mh_utils.cef_parser
Parser for MassHunter Compound Exchange Format (.cef) files.
A CEF file represents the compounds identified in LC-MS data by MassHunter Qualitative. It consists of a list of compounds encapsulated in a CompoundList.
A CompoundList consists of Compound objects representing the individual compounds identified in the data. Each Compound object contains information on the location of that compound within the LC data (location), the scores indicating the confidence of the match (compound_scores), a list of possible matching compounds (results), and the matching mass spectrum extracted from the LC-MS data (spectra).
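The following outline represents this structure:
CompoundList
  Compound
    location (LocationDict)
    compound_scores (Dict[str, Score])
    results (Sequence[Molecule])
    spectra (Sequence[Spectrum])
      device (Device)
      peaks (Sequence[Peak])
      rt_ranges (Sequence[RTRange])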
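The same nesting can be sketched with plain dataclasses. This is only an illustration: the `Sketch*` classes and `count_spectra` are hypothetical stand-ins that mirror the field names documented on this page, they are not the mh_utils classes, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchSpectrum:
    """Stand-in for Spectrum, with only two of its documented fields."""
    spectrum_type: str = ''
    peaks: List[float] = field(default_factory=list)


@dataclass
class SketchCompound:
    """Stand-in for Compound, mirroring its documented fields."""
    algo: str = ''
    location: Dict[str, float] = field(default_factory=dict)
    compound_scores: Dict[str, float] = field(default_factory=dict)
    results: List[str] = field(default_factory=list)
    spectra: List[SketchSpectrum] = field(default_factory=list)


def count_spectra(compounds: List[SketchCompound]) -> int:
    """Walk the list -> compound -> spectra nesting and count the spectra."""
    return sum(len(compound.spectra) for compound in compounds)


# Invented example data: a list of two compounds, one with a spectrum.
compound_list = [
    SketchCompound(
        algo='FindByFormula',
        results=['Caffeine'],
        spectra=[SketchSpectrum('FbF', [195.0877])],
    ),
    SketchCompound(algo='FindByFormula'),
]
print(count_spectra(compound_list))
```

In the real module the outer list would be a CompoundList and each field would hold the richer types described in the class entries.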
Classes:
| Compound | Represents a compound identified in mass spectral data by MassHunter Qualitative. |
| CompoundList | A list of Compound objects parsed from a CEF file. |
| Device | Represents the device that acquired a Spectrum. |
| Flag | Represents a flag in a score, to warn that the identification of a compound is poor. |
| LocationDict | TypedDict representing the location of a spectrum within mass spectrometry data. |
| Molecule | Represents a molecule in a CEF file. |
| Peak | A peak in a Mass Spectrum. |
| RTRange | Represents an <RTRange> element from a CEF file. |
| Score | A score indicating how well the compound matches the observed spectrum. |
| Spectrum | Agilent CEF Spectrum. |
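Functions:
| | Construct a timedelta from a value in minutes. |
| parse_cef | Construct a CompoundList object from the given .cef file. |
| parse_compound_scores | Parse a <CompoundScores> element into a mapping of algorithms to scores. |
| parse_match_scores | Parse a <MatchScores> element into a mapping of algorithms to scores. |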
-
class Compound(algo='', location=None, compound_scores=None, results=None, spectra=None)
Bases: Dictable
Represents a compound identified in mass spectral data by MassHunter Qualitative.
- Parameters
  algo (str) – The algorithm used to identify the compound. Default ''.
  location (Optional[LocationDict]) – A dictionary of information to locate the compound in the spectral data. Default None.
  compound_scores (Optional[Dict[str, Score]]) – A dictionary of compound scores. Default None.
  results (Optional[Sequence[Molecule]]) – A list of molecules that match the spectrum. Default None.
  spectra (Optional[Sequence[Spectrum]]) – A list of spectra for the compound. Default None.
Methods:
  __repr__() – Returns a string representation of the Compound.
  __str__() – Returns the Compound as a string.
  from_xml(element) – Construct a Compound object from an XML element.
Attributes:
  algo – The algorithm used to identify the compound.
  compound_scores – A dictionary of compound scores.
  location – A dictionary of information to locate the compound in the spectral data.
  results – A list of molecules that match the spectrum.
  spectra – A list of spectra for the compound.
classmethod from_xml(element)
  Construct a Compound object from an XML element.
  - Parameters
    element (ObjectifiedElement) – A Compound XML element from a CEF file.
  - Return type
    Compound
location
  Type: LocationDict
  A dictionary of information to locate the compound in the spectral data.
-
class CompoundList(instrument='', compounds=None)
Bases: NamedList
A list of Compound objects parsed from a CEF file.
The full list API is available for this class.
- Parameters
Methods:
  __repr__() – Return a string representation of the NamedList.
  __str__() – Returns the list as a string.
  from_xml(element) – Construct a CompoundList object from an XML element.
Attributes:
  instrument – The type of instrument that obtained the data, e.g.
classmethod from_xml(element)
  Construct a CompoundList object from an XML element.
  - Parameters
    element (ObjectifiedElement) – The XML element to parse the data from.
  - Return type
    CompoundList
-
class Device(device_type, number)
Bases: object
Represents the device that acquired a Spectrum.
Attributes:
  device_type – String identifying the type of device.
Methods:
  from_dict(d) – Construct an instance of Device from a dictionary.
  from_xml(element) – Construct a Device object from an XML element.
  to_dict([convert_values]) – Returns a dictionary containing the contents of the Device object.
-
class Flag(string: str, severity: int)
Bases: str
Represents a flag in a score, to warn that the identification of a compound is poor.
- Parameters
  string – The text of the flag.
  severity – The severity of the flag.
Methods:
  __bool__() – Returns a boolean representation of the Flag.
  __eq__(other) – Return self == other.
  __ne__(other) – Return self != other.
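As a rough illustration of how a `str` subclass can carry a severity alongside its text, consider the sketch below. `SketchFlag` is a hypothetical stand-in, not the library's `Flag`; in particular, the truthiness and equality rules shown here are assumptions, and the flag text is invented.

```python
class SketchFlag(str):
    """Hypothetical stand-in: a string that also carries a severity."""

    severity: int

    def __new__(cls, string: str, severity: int) -> "SketchFlag":
        # str is immutable, so the text has to be set in __new__.
        obj = super().__new__(cls, string)
        obj.severity = int(severity)
        return obj

    def __bool__(self) -> bool:
        # Assumption: a flag counts as set only if it has text
        # and a non-zero severity.
        return len(self) > 0 and self.severity != 0

    def __eq__(self, other) -> bool:
        # Assumption: two flags are equal if text and severity both match;
        # a plain string is compared on text alone.
        if isinstance(other, SketchFlag):
            return str(self) == str(other) and self.severity == other.severity
        return str(self) == other

    def __ne__(self, other) -> bool:
        return not self == other

    # Defining __eq__ clears the inherited __hash__, so restore it.
    __hash__ = str.__hash__


flag = SketchFlag("low score", severity=2)
print(flag.upper(), flag.severity, bool(flag))
print(bool(SketchFlag("", 0)))
```

Because the object is still a string, every `str` method remains available; the severity simply travels with it.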
-
typeddict LocationDict
Bases: TypedDict
TypedDict representing the location of a spectrum within mass spectrometry data.
-
class Molecule(name, formula=None, matches=None)
Bases: Dictable
Represents a molecule in a CEF file.
- Parameters
Methods:
  __repr__() – Returns a string representation of the Molecule.
  __str__() – Returns the molecule as a string.
  from_xml(element) – Construct a Molecule object from an XML element.
-
parse_cef(filename)
  Construct a CompoundList object from the given .cef file.
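A typical use might look like the sketch below, which collects the first candidate name for each compound. It assumes mh_utils is installed, that a CompoundList can be iterated like a list (the page states the full list API is available), and that each Molecule exposes its `name` constructor argument as an attribute; `list_compound_names` is a hypothetical helper.

```python
from typing import List


def list_compound_names(filename: str) -> List[str]:
    """Return the name of the first candidate molecule for each compound."""
    # Imported lazily so this snippet can be loaded without mh_utils installed.
    from mh_utils.cef_parser import parse_cef

    names = []
    for compound in parse_cef(filename):
        # Assumption: `results` holds Molecule objects exposing `.name`,
        # and may be empty or None when nothing matched.
        if compound.results:
            names.append(compound.results[0].name)
    return names
```

Calling `list_compound_names("sample.cef")` (a placeholder path) would then return one name per compound that has at least one candidate match.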
-
parse_compound_scores(element)
  Parse a <CompoundScores> element into a mapping of algorithms to scores.
-
parse_match_scores(element)
  Parse a <MatchScores> element into a mapping of algorithms to scores.
-
class Peak(x, rx, y, charge=0, label='')
Bases: object
A peak in a Mass Spectrum.
- Parameters
Attributes:
  charge – The charge on the peak.
  label – The label of the peak.
Methods:
  from_dict(d) – Construct an instance of Peak from a dictionary.
  from_xml(element) – Construct a Peak object from an XML element.
  to_dict([convert_values]) – Returns a dictionary containing the contents of the Peak object.
classmethod from_xml(element)
  Construct a Peak object from an XML element.
  - Parameters
    element (ObjectifiedElement) – A <p> XML element from an <MSPeaks> element of a CEF file.
  - Return type
    Peak
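To illustrate the idea of building a peak from a `<p>` element, here is a minimal sketch using the standard library's `xml.etree.ElementTree` rather than the `ObjectifiedElement` type named above. `SketchPeak` and `peak_from_element` are hypothetical, the sample XML is invented, and the attribute names (`x`, `rx`, `y`, `z`, `s`) are an assumption about the file layout, not something this page specifies.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class SketchPeak(NamedTuple):
    """Hypothetical stand-in with the same fields as the Peak signature."""
    x: float
    rx: float
    y: float
    charge: int = 0
    label: str = ''


def peak_from_element(element: ET.Element) -> SketchPeak:
    # Assumption: z holds the charge and s the label; both are optional here.
    return SketchPeak(
        x=float(element.get('x')),
        rx=float(element.get('rx')),
        y=float(element.get('y')),
        charge=int(element.get('z', 0)),
        label=element.get('s', ''),
    )


# Invented sample fragment, for illustration only.
xml_text = (
    '<MSPeaks>'
    '<p x="195.0612" rx="195.0652" y="690.56" z="1" s="M+H"/>'
    '</MSPeaks>'
)
peaks = [peak_from_element(p) for p in ET.fromstring(xml_text).findall('p')]
print(peaks[0])
```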
-
class RTRange(start=0.0, end=0.0)
Bases: object
Represents an <RTRange> element from a CEF file.
- Parameters
Attributes:
  end – The end time in minutes.
  start – The start time in minutes.
Methods:
  from_dict(d) – Construct an instance of RTRange from a dictionary.
  from_xml(element) – Construct an RTRange object from an XML element.
  to_dict([convert_values]) – Returns a dictionary containing the contents of the RTRange object.
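Since the start and end times are expressed in minutes, a small sketch may help show how such a range could round-trip through a dictionary and be converted to a `timedelta`. `SketchRTRange`, its `width` property and `minutes_to_timedelta` are hypothetical names for illustration; they are not the module's own classes or functions.

```python
import datetime
from dataclasses import asdict, dataclass


def minutes_to_timedelta(minutes: float) -> datetime.timedelta:
    """Hypothetical helper: build a timedelta from a value in minutes."""
    return datetime.timedelta(minutes=float(minutes))


@dataclass
class SketchRTRange:
    """Hypothetical stand-in for a retention time range."""
    start: float = 0.0  # minutes
    end: float = 0.0    # minutes

    @classmethod
    def from_dict(cls, d: dict) -> "SketchRTRange":
        return cls(**d)

    def to_dict(self) -> dict:
        return asdict(self)

    @property
    def width(self) -> datetime.timedelta:
        # Duration covered by the range.
        return minutes_to_timedelta(self.end - self.start)


rt = SketchRTRange.from_dict({'start': 5.5, 'end': 6.0})
print(rt.width.total_seconds())
```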
-
class Score(score, flag_string='', flag_severity=0)
Bases: float
A score indicating how well the compound matches the observed spectrum.
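Because the class derives from `float`, a score can take part in ordinary arithmetic and comparisons while still carrying its warning. The sketch below shows one way that could work; `SketchScore` is a hypothetical stand-in rather than the library's implementation, and the dictionary keys and flag text are invented.

```python
class SketchScore(float):
    """Hypothetical stand-in: a float that carries a warning flag."""

    def __new__(cls, score, flag_string: str = '', flag_severity: int = 0):
        # float is immutable, so the numeric value is set in __new__.
        obj = super().__new__(cls, score)
        obj.flag_string = str(flag_string)
        obj.flag_severity = int(flag_severity)
        return obj

    def __repr__(self) -> str:
        return (
            f"SketchScore({float(self)!r}, "
            f"{self.flag_string!r}, {self.flag_severity!r})"
        )


# Invented mapping of algorithm names to scores.
scores = {
    'fbf': SketchScore(98.5),
    'abs': SketchScore(42.0, 'low abundance', 2),
}
best = max(scores, key=scores.get)
print(best, scores[best] > 90)
```

A mapping like `scores` has the same general shape as the "mapping of algorithms to scores" that the parsing functions on this page are described as returning.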
-
class Spectrum(spectrum_type='', algorithm='', saturation_limit=0, scans=0, scan_type='', ionisation='', polarity=0, voltage=0.0, device=None, peaks=None, rt_ranges=None)
Bases: Dictable
Agilent CEF Spectrum.
- Parameters
  spectrum_type (str) – The type of spectrum, e.g. 'FbF'. Default ''.
  algorithm (str) – The algorithm used to identify the compound. Default ''.
  saturation_limit (int) – Unknown. Might mean saturation limit? Default 0.
  scans (int) – Unknown. Presumably the number of scans that make up the spectrum? Default 0.
  scan_type (str) – Default ''.
  ionisation (str) – The type of ionisation, e.g. ESI. Default ''.
  polarity (Union[str, int]) – The polarity of the ionisation. Default 0.
  device (Optional[Device]) – The device that acquired the data. Default None.
  peaks (Optional[Sequence[Peak]]) – A list of identified peaks in the mass spectrum. Default None.
  rt_ranges (Optional[Sequence[RTRange]]) – A list of retention time ranges for the mass spectrum. Default None.