mh_utils.cef_parser

Parser for MassHunter Compound Exchange Format .cef files.

A CEF file represents the compounds identified in LC-MS data by MassHunter Qualitative. It consists of a list of compounds encapsulated in a CompoundList.

A CompoundList consists of Compound objects representing the individual compounds identified in the data. Each Compound object contains information on the location of that compound within the LC data (location), the scores indicating the confidence of the match (compound_scores), a list of possible matching compounds (results), and the matching mass spectrum extracted from the LC-MS data (spectra).
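The nesting described above can be walked with ordinary iteration and attribute access. The following is a minimal sketch, not part of mh_utils: summarise_compounds is a hypothetical helper, and the SimpleNamespace objects only stand in for the Compound and Molecule objects that parse_cef would return. It assumes each molecule exposes a name attribute, as the Molecule(name, ...) signature suggests.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional, Tuple


def summarise_compounds(compounds: Iterable[Any]) -> List[Tuple[Optional[float], List[str]]]:
    """Return (retention time, candidate names) for each compound.

    Hypothetical helper: it relies only on the ``location`` and ``results``
    attributes documented for Compound.
    """
    summary = []
    for compound in compounds:
        # ``location`` has optional keys, so read it defensively with .get()
        rt = compound.location.get("rt")
        # Assumption: each entry in ``results`` has a ``name`` attribute
        names = [molecule.name for molecule in compound.results]
        summary.append((rt, names))
    return summary


# Made-up stand-in data; real objects would come from parse_cef(filename).
demo = [
    SimpleNamespace(
        location={"m": 100.0, "rt": 1.5},
        results=[SimpleNamespace(name="Example A"), SimpleNamespace(name="Example B")],
    ),
]
print(summarise_compounds(demo))
```

Because the helper is duck-typed, it should work with any iterable whose items expose location and results in this way.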

The following diagram represents this structure:

Classes:

Compound([algo, location, compound_scores, …])

Represents a compound identified in mass spectral data by MassHunter Qualitative.

CompoundList([instrument, compounds])

A list of Compound objects parsed from a CEF file.

Device(device_type, number)

Represents the device that acquired a Spectrum.

Flag(string, severity)

Represents a flag in a score, to warn that the identification of a compound is poor.

LocationDict

TypedDict representing the location of a spectrum within mass spectrometry data.

Molecule(name[, formula, matches])

Represents a molecule in a CEF file.

Peak(x, rx, y[, charge, label])

A peak in a Mass Spectrum.

RTRange([start, end])

Represents an <RTRange> element from a CEF file.

Score(score[, flag_string, flag_severity])

A score indicating how well the compound matches the observed spectrum.

Spectrum([spectrum_type, algorithm, …])

Agilent CEF Spectrum.

Functions:

make_timedelta(minutes)

Construct a timedelta from a value in minutes.

parse_cef(filename)

Construct a CompoundList object from the given .cef file.

parse_compound_scores(element)

Parse a <CompoundScores> element into a mapping of algorithms to scores.

parse_match_scores(element)

Parse a <MatchScores> element into a mapping of algorithms to scores.

class Compound(algo='', location=None, compound_scores=None, results=None, spectra=None)[source]

Bases: Dictable

Represents a compound identified in mass spectral data by MassHunter Qualitative.

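Parameters
  • algo (str) – The algorithm used to identify the compound. Default ''.

  • location – A dictionary of information to locate the compound in the spectral data. Default None.

  • compound_scores – A dictionary of compound scores. Default None.

  • results – A list of molecules that match the spectrum. Default None.

  • spectra – A list of spectra for the compound. Default None.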

Methods:

__repr__()

Returns a string representation of the Compound.

__str__()

Returns the Compound as a string.

from_xml(element)

Construct a Compound object from an XML element.

Attributes:

algo

The algorithm used to identify the compound.

compound_scores

A dictionary of compound scores.

location

A dictionary of information to locate the compound in the spectral data.

results

A list of molecules that match the spectrum.

spectra

A list of spectra for the compound.

__repr__()[source]

Returns a string representation of the Compound.

Return type

str

__str__()[source]

Returns the Compound as a string.

Return type

str

algo

Type:    str

The algorithm used to identify the compound.

compound_scores

Type:    Dict[str, Score]

A dictionary of compound scores.

classmethod from_xml(element)[source]

Construct a Compound object from an XML element.

Parameters

element (ObjectifiedElement) – a Compound XML element from a CEF file.

Return type

Compound

location

Type:    LocationDict

A dictionary of information to locate the compound in the spectral data.

results

Type:    List[Molecule]

A list of molecules that match the spectrum.

spectra

Type:    List[Spectrum]

A list of spectra for the compound.

class CompoundList(instrument='', compounds=None)[source]

Bases: NamedList

A list of Compound objects parsed from a CEF file.

The full list API is available for this class.

Parameters
  • instrument (str) – String identifying the instrument that acquired the data. Default ''.

  • compounds (Optional[Iterable[Compound]]) – List of compounds identified in the mass spectrometry data. Default None.

Methods:

__repr__()

Return a string representation of the NamedList.

__str__()

Returns the list as a string.

from_xml(element)

Construct a CompoundList object from an XML element.

Attributes:

instrument

The type of instrument that obtained the data, e.g. "LCQTOF".

__repr__()

Return a string representation of the NamedList.

Return type

str

__str__()[source]

Returns the list as a string.

Return type

str

classmethod from_xml(element)[source]

Construct a CompoundList object from an XML element.

Parameters

element (ObjectifiedElement) – The XML element to parse the data from.

Return type

CompoundList

instrument

Type:    str

The type of instrument that obtained the data, e.g. "LCQTOF".

class Device(device_type, number)[source]

Bases: object

Represents the device that acquired a Spectrum.

Parameters
  • device_type (str) – String identifying the type of device.

  • number (int)

Attributes:

device_type

String identifying the type of device.

number

Methods:

from_dict(d)

Construct an instance of Device from a dictionary.

from_xml(element)

Construct a Device object from an XML element.

to_dict([convert_values])

Returns a dictionary containing the contents of the Device object.

device_type

Type:    str

String identifying the type of device.

classmethod from_dict(d)

Construct an instance of Device from a dictionary.

Parameters

d (Mapping[str, Any]) – The dictionary.

classmethod from_xml(element)[source]

Construct a Device object from an XML element.

Parameters

element (ObjectifiedElement) – a <Device> XML element from a CEF file

Return type

Device

number

Type:    int

to_dict(convert_values=False)

Returns a dictionary containing the contents of the Device object.

Parameters

convert_values (bool) – Recursively convert values into dictionaries, lists etc. as appropriate. Default False.

Return type

MutableMapping[str, Any]

class Flag(string: str, severity: int)[source]

Bases: str

Represents a flag in a score, to warn that the identification of a compound is poor.

Parameters
  • string – The text of the flag

  • severity – The severity of the flag

Methods:

__bool__()

Returns a boolean representation of the Flag.

__eq__(other)

Return self == other.

__ne__(other)

Return self != other.

__bool__()[source]

Returns a boolean representation of the Flag.

Return type

bool

__eq__(other)[source]

Return self == other.

Return type

bool

__ne__(other)[source]

Return self != other.

Return type

bool
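Since Flag derives from str but also takes a severity, it may help to see how a string subclass can carry an extra attribute. The class below is an illustrative stand-in named FlagSketch, not the library's implementation. In particular, the truthiness rule (true if there is text or a non-zero severity) and the comparison rule are assumptions made for this sketch.

```python
class FlagSketch(str):
    """Illustrative stand-in for a string that carries a severity."""

    severity: int

    def __new__(cls, string: str, severity: int) -> "FlagSketch":
        # str is immutable, so the text has to be set in __new__
        obj = super().__new__(cls, string)
        obj.severity = int(severity)
        return obj

    def __bool__(self) -> bool:
        # Assumption: a flag counts as set if it has text or a non-zero severity
        return bool(str(self)) or bool(self.severity)

    def __eq__(self, other: object) -> bool:
        # Two flags are equal only if both text and severity agree
        if isinstance(other, FlagSketch):
            return str(self) == str(other) and self.severity == other.severity
        # Otherwise fall back to a plain string comparison
        return str(self) == other

    def __ne__(self, other: object) -> bool:
        return not self == other

    def __hash__(self) -> int:
        # Defining __eq__ removes the inherited hash; hash on the text only,
        # so that a flag equal to a plain string also hashes like it
        return hash(str(self))


warning = FlagSketch("low score", 2)
print(repr(str(warning)), warning.severity, bool(warning))
```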

typeddict LocationDict[source]

Bases: TypedDict

TypedDict representing the location of a spectrum within mass spectrometry data.

Optional Keys
  • m (float) – the accurate mass of the compound, determined from the observed mass spectrum.

  • rt (float) – The retention time at which the compound was detected.

  • a (float) – The area of the peak in the EIC.

  • y (float) – The height of the peak in the EIC.
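A TypedDict whose keys are all optional can be declared with total=False. The sketch below uses a hypothetical LocationSketch class with the four keys listed above to show how such a dictionary might be built and read. It is a stand-in for illustration and does not import the library's own LocationDict.

```python
from typing import TypedDict


class LocationSketch(TypedDict, total=False):
    """Stand-in with the four optional keys listed above."""

    m: float   # accurate mass of the compound
    rt: float  # retention time
    a: float   # area of the peak in the EIC
    y: float   # height of the peak in the EIC


# Any subset of the keys is valid because total=False
location: LocationSketch = {"m": 100.0, "rt": 1.5}

# Optional keys may be absent, so read them with .get() rather than indexing
area = location.get("a")
print(location["rt"], area)
```

At runtime a TypedDict is an ordinary dict, so the key checks only take effect under a static type checker.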

class Molecule(name, formula=None, matches=None)[source]

Bases: Dictable

Represents a molecule in a CEF file.

Parameters
  • name (str) – The name of the compound

  • formula (Union[str, Formula, None]) – The formula of the compound. If a string it must be parsable by chemistry_tools.formulae.Formula. Default None.

  • matches (Optional[Dict[str, Score]]) – Dictionary of algo: score match values. Default None.

Methods:

__repr__()

Returns a string representation of the Molecule.

__str__()

Returns the molecule as a string.

from_xml(element)

Construct a Molecule object from an XML element.

__repr__()[source]

Returns a string representation of the Molecule.

Return type

str

__str__()[source]

Returns the molecule as a string.

Return type

str

classmethod from_xml(element)[source]

Construct a Molecule object from an XML element.

Parameters

element (ObjectifiedElement) – a Molecule XML element

Return type

Molecule

parse_cef(filename)[source]

Construct a CompoundList object from the given .cef file.

Parameters

filename (Union[str, Path, PathLike]) – The filename of the CEF file to read.

Return type

CompoundList

parse_compound_scores(element)[source]

Parse a <CompoundScores> element into a mapping of algorithms to scores.

Parameters

element (ObjectifiedElement) – a CompoundScores XML element.

Return type

Dict[str, Score]

parse_match_scores(element)[source]

Parse a <MatchScores> element into a mapping of algorithms to scores.

Parameters

element (ObjectifiedElement) – a MatchScores XML element.

Return type

Dict[str, Score]
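Both score parsers return a mapping of algorithm names to scores. The sketch below illustrates that idea with the standard library's xml.etree.ElementTree instead of the lxml ObjectifiedElement the signatures above expect. The XML layout (child elements carrying algo and score attributes, the CpdScore tag name) and the sample values are assumptions for illustration, and scores_by_algorithm is a hypothetical helper that returns plain floats, not Score objects.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed layout: each child element carries "algo" and "score" attributes.
# The values are made up for the example.
SAMPLE = """
<CompoundScores>
    <CpdScore algo="fbf" score="99.79" />
    <CpdScore algo="overall" score="95.50" />
</CompoundScores>
"""


def scores_by_algorithm(element: ET.Element) -> Dict[str, float]:
    """Map each child's ``algo`` attribute to its ``score`` as a float."""
    return {child.attrib["algo"]: float(child.attrib["score"]) for child in element}


scores = scores_by_algorithm(ET.fromstring(SAMPLE.strip()))
print(scores)
```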

class Peak(x, rx, y, charge=0, label='')[source]

Bases: object

A peak in a Mass Spectrum.

Parameters
  • x (float)

  • rx (float)

  • y (float) – The height of the peak in the EIC.

  • charge (int) – The charge on the peak. Default 0.

  • label (str) – The label of the peak, e.g. “M+H”. Default ''.

Attributes:

charge

The charge on the peak.

label

The label of the peak.

rx

x

y

Methods:

from_dict(d)

Construct an instance of Peak from a dictionary.

from_xml(element)

Construct a Peak object from an XML element.

to_dict([convert_values])

Returns a dictionary containing the contents of the Peak object.

charge

Type:    int

The charge on the peak.

classmethod from_dict(d)

Construct an instance of Peak from a dictionary.

Parameters

d (Mapping[str, Any]) – The dictionary.

classmethod from_xml(element)[source]

Construct a Peak object from an XML element.

Parameters

element (ObjectifiedElement) – a <p> XML element from an <MSPeaks> element of a CEF file

Return type

Peak

label

Type:    str

The label of the peak, e.g. “M+H”.

rx

Type:    float

to_dict(convert_values=False)

Returns a dictionary containing the contents of the Peak object.

Parameters

convert_values (bool) – Recursively convert values into dictionaries, lists etc. as appropriate. Default False.

Return type

MutableMapping[str, Any]

x

Type:    float

y

Type:    float

The height of the peak in the EIC.
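The from_dict and to_dict methods suggest that a Peak can be round-tripped through a plain dictionary. As a rough illustration of that pattern, the sketch below defines a hypothetical PeakSketch dataclass that mirrors the documented Peak(x, rx, y, charge, label) signature. It is a stand-in, and the real class may serialise its values differently.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Mapping


@dataclass
class PeakSketch:
    """Stand-in mirroring the documented Peak(x, rx, y, charge, label) signature."""

    x: float
    rx: float
    y: float
    charge: int = 0
    label: str = ""

    @classmethod
    def from_dict(cls, d: Mapping[str, Any]) -> "PeakSketch":
        # Keys of the mapping are passed straight through as keyword arguments
        return cls(**d)

    def to_dict(self) -> Dict[str, Any]:
        # asdict() returns a new dictionary of field names to values
        return asdict(self)


# Made-up values for illustration
peak = PeakSketch(x=100.0, rx=100.1, y=5000.0, charge=1, label="M+H")
restored = PeakSketch.from_dict(peak.to_dict())
print(restored == peak)
```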

class RTRange(start=0.0, end=0.0)[source]

Bases: object

Represents an <RTRange> element from a CEF file.

Parameters
  • start – The start time in minutes. Default 0.0.

  • end – The end time in minutes. Default 0.0.

Attributes:

end

The end time in minutes

start

The start time in minutes

Methods:

from_dict(d)

Construct an instance of RTRange from a dictionary.

from_xml(element)

Construct an RTRange object from an XML element.

to_dict([convert_values])

Returns a dictionary containing the contents of the RTRange object.

end

Type:    timedelta

The end time in minutes

classmethod from_dict(d)

Construct an instance of RTRange from a dictionary.

Parameters

d (Mapping[str, Any]) – The dictionary.

classmethod from_xml(element)[source]

Construct an RTRange object from an XML element.

Parameters

element (ObjectifiedElement) – The <RTRange> XML element to parse the data from.

Return type

RTRange

start

Type:    timedelta

The start time in minutes

to_dict(convert_values=False)

Returns a dictionary containing the contents of the RTRange object.

Parameters

convert_values (bool) – Recursively convert values into dictionaries, lists etc. as appropriate. Default False.

Return type

MutableMapping[str, Any]
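The start and end attributes are typed as timedelta although they are described in minutes. The sketch below shows one way a value in minutes could be turned into a timedelta and back, using only the standard library. minutes_to_timedelta is a hypothetical helper written for this example. It is similar in spirit to make_timedelta, but it is not the library's code.

```python
from datetime import timedelta


def minutes_to_timedelta(minutes: float) -> timedelta:
    """Hypothetical helper: build a timedelta from a value in minutes."""
    return timedelta(minutes=float(minutes))


# Made-up retention time range, in minutes
start = minutes_to_timedelta(12.5)
end = minutes_to_timedelta(13.0)

# timedelta arithmetic gives the width of the range
width_seconds = (end - start).total_seconds()

# Dividing by a one-minute timedelta converts back to minutes
start_minutes = start / timedelta(minutes=1)
print(width_seconds, start_minutes)
```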

class Score(score, flag_string='', flag_severity=0)[source]

Bases: float

A score indicating how well the compound matches the observed spectrum.

Parameters
  • score – The score

  • flag_string (str) – Optional flag. See Flag for details. Default ''.

  • flag_severity (int) – The severity of the flag. Default 0.
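Because Score derives from float, instances can take part in ordinary numeric comparisons while still carrying flag information. The sketch below illustrates this with a hypothetical ScoreSketch class. Storing the flag as two plain attributes named flag_string and flag_severity is an assumption made for brevity; the real class may instead expose a Flag object.

```python
class ScoreSketch(float):
    """Stand-in: a float that also remembers an optional flag."""

    def __new__(cls, score, flag_string: str = "", flag_severity: int = 0) -> "ScoreSketch":
        # float is immutable, so the numeric value has to be set in __new__
        obj = super().__new__(cls, float(score))
        # Assumption: keep the flag as two simple attributes
        obj.flag_string = str(flag_string)
        obj.flag_severity = int(flag_severity)
        return obj


# Made-up scores for illustration
flagged = ScoreSketch("56.24", flag_string="low score", flag_severity=2)
plain = ScoreSketch(99.79)

# Scores compare like ordinary floats
best = max(flagged, plain)
print(float(best), best.flag_string == "")
```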

class Spectrum(spectrum_type='', algorithm='', saturation_limit=0, scans=0, scan_type='', ionisation='', polarity=0, voltage=0.0, device=None, peaks=None, rt_ranges=None)[source]

Bases: Dictable

Agilent CEF Spectrum.

Parameters
  • spectrum_type (str) – The type of spectrum e.g. 'FbF'. Default ''.

  • algorithm (str) – The algorithm used to identify the compound. Default ''.

  • saturation_limit (int) – Unknown; possibly the saturation limit. Default 0.

  • scans (int) – Unknown; presumably the number of scans that make up the spectrum. Default 0.

  • scan_type (str) – Default ''.

  • ionisation (str) – The type of ionisation e.g. ESI. Default ''.

  • polarity (Union[str, int]) – The polarity of the ionisation. Default 0.

  • device (Optional[Device]) – The device that acquired the data. Default None.

  • peaks (Optional[Sequence[Peak]]) – A list of identified peaks in the mass spectrum. Default None.

  • rt_ranges (Optional[Sequence[RTRange]]) – A list of retention time ranges for the mass spectrum. Default None.

classmethod from_xml(element)[source]

Construct a Spectrum object from an XML element.

Parameters

element (ObjectifiedElement) – a Spectrum XML element from a CEF file

Return type

Spectrum