mh_utils¶
Utilities for handling ancillary files produced by MassHunter.
The current utilities are as follows:
mh_utils.worklist_parser
: Parse Agilent MassHunter Worklists (*.wkl files).
mh_utils.cef_parser
: Parse Agilent MassHunter Compound Exchange Format files (*.cef files).
Installation¶
python3 -m pip install mh_utils --user
First add the required channels
conda config --add channels https://conda.anaconda.org/conda-forge
conda config --add channels https://conda.anaconda.org/domdfcoding
Then install
conda install mh_utils
python3 -m pip install git+https://github.com/domdfcoding/mh_utils@master --user
mh_utils.utils
¶
General utility functions.
Functions:
|
Returns |
|
Convert |
|
Returns the boolean representation of |
|
Construct a timedelta from a value in minutes. |
|
Returns |
-
as_path(val)[source]¶
Returns val as a PureWindowsPath, or None if the value is empty/None/False.
- Parameters
val (Any) – The value to convert to a path
- Return type
Optional[PureWindowsPath]
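The behaviour described for as_path can be sketched with the standard library alone. This is an illustrative stand-in rather than the library's own code; the function name as_path_sketch and the example path are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Any, Optional


def as_path_sketch(val: Any) -> Optional[PureWindowsPath]:
    """Illustrative stand-in for as_path, not the library's implementation."""
    # Empty strings, None and False are all falsy, so they map to None.
    if not val:
        return None
    return PureWindowsPath(str(val))


# Hypothetical MassHunter-style path; parsed with Windows rules on any OS.
method = as_path_sketch(r"D:\MassHunter\Methods\Example.m")
print(method.name)
```

A PureWindowsPath never touches the filesystem, which suits paths recorded on the acquisition PC and read elsewhere.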
-
element_to_bool(val)[source]¶
Returns the boolean representation of val.
Values of -1 are counted as True for the purposes of this function.
True values are 'y', 'yes', 't', 'true', 'on', '1', 1, -1, and '-1'.
False values are 'n', 'no', 'f', 'false', 'off', '0', and 0.
- Raises
ValueError if ‘val’ is anything else.
- Return type
bool
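The truth table above can be sketched as a small lookup. This is a sketch under assumptions, not the library's implementation: the name element_to_bool_sketch is hypothetical, and treating strings case-insensitively is an assumption the text does not confirm.

```python
from typing import Any

TRUE_VALUES = {"y", "yes", "t", "true", "on", "1", "-1"}
FALSE_VALUES = {"n", "no", "f", "false", "off", "0"}


def element_to_bool_sketch(val: Any) -> bool:
    """Illustrative stand-in for element_to_bool."""
    # Normalise to a lowercase string so that 1, -1 and 0 share the lookup
    # with their string forms (case-insensitivity is an assumption).
    text = str(val).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"invalid truth value {val!r}")


print(element_to_bool_sketch(-1))
print(element_to_bool_sketch("off"))
```

Counting -1 as true matches the "Yes = -1. No = 0." convention noted for locked_run_mode in the Worklist class.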
mh_utils.xml
¶
Functions and classes for handling XML files.
Classes:
ABC mixin to provide a function for instantiating the class from an XML file. |
Functions:
|
Returns a validated lxml objectify from the given XML file, validated against the schema file. |
-
class
XMLFileMixin
[source]¶ Bases:
ABC
ABC mixin to provide a function for instantiating the class from an XML file.
Methods:
from_xml(element)
Construct an object from an XML element.
from_xml_file(filename)
Generate an instance of this class by parsing it from an XML file.
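The mixin pattern described here, an abstract from_xml plus a concrete from_xml_file that delegates to it, might look roughly like the following. This sketch swaps in the standard library's xml.etree.ElementTree for lxml and skips schema validation; XMLFileMixinSketch, Sample and the XML content are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from pathlib import Path


class XMLFileMixinSketch(ABC):
    @classmethod
    @abstractmethod
    def from_xml(cls, element: ET.Element):
        """Construct an object from an XML element."""

    @classmethod
    def from_xml_file(cls, filename):
        # Parse the file, then delegate to the subclass's from_xml.
        root = ET.parse(str(filename)).getroot()
        return cls.from_xml(root)


class Sample(XMLFileMixinSketch):  # hypothetical subclass
    def __init__(self, name: str):
        self.name = name

    @classmethod
    def from_xml(cls, element: ET.Element):
        return cls(element.findtext("Name", default=""))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.xml"
    path.write_text("<Sample><Name>Methanol Blank</Name></Sample>", encoding="utf-8")
    sample = Sample.from_xml_file(path)

print(sample.name)
```

Subclasses only implement from_xml; file handling stays in one place.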
-
get_validated_tree
(xml_file, schema_file=None)[source]¶ Returns a validated lxml objectify from the given XML file, validated against the schema file.
- Parameters
- Return type
_ElementTree
- Returns
An lxml ElementTree object. When .getroot() is called on the tree the root will be an instance of lxml.objectify.ObjectifiedElement.
mh_utils.cef_parser
¶
Parser for MassHunter Compound Exchange Format .cef files.
A CEF file represents the compounds identified in LC-MS data by MassHunter Qualitative.
It consists of a list of compounds encapsulated in a CompoundList.
A CompoundList consists of Compound objects representing the individual compounds identified in the data. Each Compound object contains information on the location of that compound within the LC data (location), the scores indicating the confidence of the match (compound_scores), a list of possible matching compounds (results), and the matching mass spectrum extracted from the LC-MS data (spectra).
The following diagram represents this structure:
-
-
Compound.location
⇨Optional
[LocationDict
]Compound.compound_scores
⇨Optional
[Dict
[str
,Score
] ]-
-
Another
Compound
...
-
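The nesting described above can be mirrored with plain dataclasses. This is only a structural sketch: the class names carry a Sketch suffix to mark them as hypothetical, the field types are simplified, and every value is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MoleculeSketch:
    name: str
    formula: Optional[str] = None


@dataclass
class CompoundSketch:
    algo: str = ""
    location: Optional[Dict[str, float]] = None
    compound_scores: Dict[str, float] = field(default_factory=dict)
    results: List[MoleculeSketch] = field(default_factory=list)
    spectra: list = field(default_factory=list)


@dataclass
class CompoundListSketch:
    instrument: str = ""
    compounds: List[CompoundSketch] = field(default_factory=list)


# Made-up values, for illustration only.
compound = CompoundSketch(
    algo="FindByFormula",
    location={"rt": 13.649},
    compound_scores={"fbf": 99.79},
    results=[MoleculeSketch("Dimethyl Phthalate", "C10 H10 O4")],
)
cef = CompoundListSketch(instrument="LCQTOF", compounds=[compound])

# Pick the compound with the highest score for one algorithm key.
best = max(cef.compounds, key=lambda c: c.compound_scores.get("fbf", 0.0))
print(best.results[0].name)
```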
Classes:
|
Represents a compound identified in mass spectral data by MassHunter Qualitative. |
|
A list of Compound objects parsed from a CEF file. |
|
Represents the device that acquired a |
|
Represents a flag in a score, to warn that the identification of a compound is poor. |
|
|
|
Represents a molecule in a CEF file. |
|
A peak in a Mass Spectrum. |
|
Represents an |
|
A score indicating how well the compound matches the observed spectrum. |
|
Agilent CEF Spectrum. |
Functions:
|
Construct a timedelta from a value in minutes. |
|
Construct an |
|
Parse a |
|
Parse a |
-
class
Compound
(algo='', location=None, compound_scores=None, results=None, spectra=None)[source]¶ Bases:
Dictable
Represents a compound identified in mass spectral data by MassHunter Qualitative.
- Parameters
algo (str) – The algorithm used to identify the compound. Default ''.
location (Optional[LocationDict]) – A dictionary of information to locate the compound in the spectral data. Default None.
compound_scores (Optional[Dict[str, Score]]) – A dictionary of compound scores. Default None.
results (Optional[Sequence[Molecule]]) – A list of molecules that match the spectrum. Default None.
spectra (Optional[Sequence[Spectrum]]) – A list of spectra for the compound. Default None.
Methods:
__repr__
()Returns a string representation of the
Compound
.__str__
()Returns the
Compound
as a string.from_xml
(element)Construct a
Compound
object from an XML element.Attributes:
The algorithm used to identify the compound.
A dictionary of compound scores
A dictionary of information to locate the compound in the spectral data
A list of molecules that match the spectrum.
A list of spectra for the compound.
-
classmethod
from_xml
(element)[source]¶ Construct a
Compound
object from an XML element.- Parameters
element (
ObjectifiedElement
) – a Compound XML element from a CEF file.- Return type
-
location
¶ Type:
LocationDict
A dictionary of information to locate the compound in the spectral data
-
class
CompoundList
(instrument='', compounds=None)[source]¶ Bases:
NamedList
A list of Compound objects parsed from a CEF file.
The full
list
API is available for this class.- Parameters
Methods:
__repr__
()Return a string representation of the
NamedList
.__str__
()Returns the list as a string.
from_xml
(element)Construct a
CompoundList
object from an XML element.Attributes:
The type of instrument that obtained the data, e.g.
-
classmethod
from_xml
(element)[source]¶ Construct a
CompoundList
object from an XML element.- Parameters
element (
ObjectifiedElement
) – The XML element to parse the data from.- Return type
-
class
Device
(device_type, number)[source]¶ Bases:
object
Represents the device that acquired a
Spectrum
.- Parameters
device_type – String identifying the type of device.
number
Attributes:
String identifying the type of device.
Methods:
from_dict
(d)Construct an instance of
Device
from a dictionary.from_xml
(element)Construct a
Device
object from an XML element.to_dict
([convert_values])Returns a dictionary containing the contents of the
Device
object.
-
class
Flag
(string: str, severity: int)[source]¶ Bases:
str
Represents a flag in a score, to warn that the identification of a compound is poor.
- Parameters
string – The text of the flag
severity – The severity of the flag
Methods:
__bool__
()Returns a boolean representation of the
Flag
.__eq__
(other)Return
self == other
.__ne__
(other)Return
self != other
.__repr__
()Returns a string representation of the
Flag
.
-
typeddict
LocationDict
[source]¶ Bases:
dict
TypedDict
representing the location of a spectrum within mass spectrometry data.
-
class
Molecule
(name, formula=None, matches=None)[source]¶ Bases:
Dictable
Represents a molecule in a CEF file.
- Parameters
Methods:
__repr__
()Returns a string representation of the
Molecule
.__str__
()Returns the molecule as a string.
from_xml
(element)Construct a
Molecule
object from an XML element.
-
parse_cef(filename)[source]¶
Construct a CompoundList object from the given .cef file.
-
parse_compound_scores
(element)[source]¶ Parse a
<CompoundScores>
element into a dictionary mapping algorithms to scores.
-
parse_match_scores
(element)[source]¶ Parse a
<MatchScores>
element into a dictionary mapping algorithms to scores.
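As a rough illustration of mapping algorithms to scores, the sketch below builds such a dictionary from an XML fragment with the standard library. The child tag CpdScore and the attribute names algo and score are assumptions about the file layout, and parse_scores_sketch is a hypothetical name; the library itself works on lxml objectified elements and returns Score objects.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical fragment; tag and attribute names are assumptions.
XML = """
<CompoundScores>
    <CpdScore algo="fbf" score="99.79"/>
    <CpdScore algo="tgt" score="87.5"/>
</CompoundScores>
""".strip()


def parse_scores_sketch(element: ET.Element) -> Dict[str, float]:
    # One entry per child element: algorithm name -> numeric score.
    return {child.get("algo"): float(child.get("score")) for child in element}


scores = parse_scores_sketch(ET.fromstring(XML))
print(scores)
```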
-
class
Peak
(x, rx, y, charge=0, label='')[source]¶ Bases:
object
A peak in a Mass Spectrum.
- Parameters
x
rx
y – The height of the peak in the EIC.
charge – The charge on the peak. Default 0.
label – The label of the peak, e.g. “M+H”. Default ''.
Attributes:
The charge on the peak.
The label of the peak.
Methods:
from_dict
(d)Construct an instance of
Peak
from a dictionary.from_xml
(element)Construct a
Peak
object from an XML element.to_dict
([convert_values])Returns a dictionary containing the contents of the
Peak
object.-
classmethod
from_xml
(element)[source]¶ Construct a
Peak
object from an XML element.- Parameters
element (
ObjectifiedElement
) – a<p>
XML element from an <MSPeaks> element of a CEF file- Return type
-
class
RTRange
(start=0.0, end=0.0)[source]¶ Bases:
object
Represents an <RTRange> element from a CEF file.
- Parameters
start – The start time in minutes. Default 0.0.
end – The end time in minutes. Default 0.0.
Attributes:
The end time in minutes
The start time in minutes
Methods:
from_dict(d)
Construct an instance of RTRange from a dictionary.
from_xml(element)
Construct an RTRange object from an XML element.
to_dict([convert_values])
Returns a dictionary containing the contents of the RTRange object.
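Because start and end are expressed in minutes, converting a range to a datetime.timedelta is a one-liner. The sketch below assumes nothing about the real RTRange beyond those two fields; RTRangeSketch and its duration property are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RTRangeSketch:
    start: float = 0.0  # minutes
    end: float = 0.0  # minutes

    @property
    def duration(self) -> timedelta:
        # timedelta accepts fractional minutes directly.
        return timedelta(minutes=self.end - self.start)


window = RTRangeSketch(start=12.5, end=13.0)
print(window.duration.total_seconds())
```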
-
class
Score
(score, flag_string='', flag_severity=0)[source]¶ Bases:
float
A score indicating how well the compound matches the observed spectrum.
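A float subclass that carries extra flag information can be built by overriding __new__, since float is immutable. This is a minimal sketch under that assumption and not the library's Score class; ScoreSketch and the example flag text are hypothetical.

```python
class ScoreSketch(float):
    """A float that also remembers a warning flag (illustrative only)."""

    def __new__(cls, score, flag_string="", flag_severity=0):
        # float is immutable, so the numeric value is set in __new__.
        obj = super().__new__(cls, score)
        obj.flag_string = flag_string
        obj.flag_severity = flag_severity
        return obj


score = ScoreSketch(62.9, flag_string="low score", flag_severity=2)
print(score > 60, score.flag_string)
```

The object still compares and sorts like an ordinary number, so existing numeric code needs no changes.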
-
class
Spectrum
(spectrum_type='', algorithm='', saturation_limit=0, scans=0, scan_type='', ionisation='', polarity=0, voltage=0.0, device=None, peaks=None, rt_ranges=None)[source]¶ Bases:
Dictable
Agilent CEF Spectrum.
- Parameters
spectrum_type (str) – The type of spectrum, e.g. 'FbF'. Default ''.
algorithm (str) – The algorithm used to identify the compound. Default ''.
saturation_limit (int) – Unknown. Might mean saturation limit? Default 0.
scans (int) – Unknown. Presumably the number of scans that make up the spectrum? Default 0.
scan_type (str) – Default ''.
ionisation (str) – The type of ionisation, e.g. ESI. Default ''.
polarity (Union[str, int]) – The polarity of the ionisation. Default 0.
device (Optional[Device]) – The device that acquired the data. Default None.
peaks (Optional[Sequence[Peak]]) – A list of identified peaks in the mass spectrum. Default None.
rt_ranges (Optional[Sequence[RTRange]]) – A list of retention time ranges for the mass spectrum. Default None.
Methods:
__repr__
()Returns a string representation of the
Spectrum
.from_xml
(element)Construct a
Spectrum
object from an XML element.
mh_utils.worklist_parser
¶
Parser for MassHunter worklists.
Only one function is defined here: read_worklist, which reads the given worklist file and returns a mh_utils.worklist_parser.classes.Worklist object representing it.
The other functions and classes must be imported from submodules of this package.
Functions:
|
Read the worklist from the given file. |
mh_utils.worklist_parser.classes
¶
Main classes for the worklist parser.
Classes:
|
Represents an Attribute. |
|
Represents a checksum for a worklist. |
|
Class that represents an entry in the worklist. |
|
Represents a macro in a worklist. |
|
Class that represents an Agilent MassHunter worklist. |
-
class
Attribute
(attribute_id, attribute_type, field_type, system_name, header_name, data_type, default_data_value, reorder_id, show_hide_status, column_width)[source]¶ Bases:
object
Represents an Attribute.
- Parameters
attribute_id
attribute_type – Can be System Defined (0), System Used (1), or User Added (2).
field_type – Each of the system defined columns has a field type starting from sampleid = 0 to reserved6 = 24. The system used column can be ‘compound param’ = 35, ‘optim param’ = 36, ‘mass param’ = 37 and ‘protein param’ = 38. The User added columns start from 45.
system_name
header_name
data_type
default_data_value
reorder_id
show_hide_status
column_width
Attributes:
Can be System Defined (
0
), System Used (1
), or User Added (2
).Each of the system defined columns have a field type starting from sampleid = 0 to reserved6 = 24.
Methods:
from_dict
(d)Construct an instance of
Attribute
from a dictionary.from_xml
(element)Construct an
Attribute
object from an XML element.to_dict
([convert_values])Returns a dictionary containing the contents of the
Attribute
object.-
attribute_type
¶ Type:
AttributeType
Can be System Defined (
0
), System Used (1
), or User Added (2
).
-
field_type
¶ Type:
int
Each of the system defined columns have a field type starting from sampleid = 0 to reserved6 = 24. The system used column can be ‘compound param’ = 35, ‘optim param’ = 36, ‘mass param’ = 37 and ‘protein param’ = 38. The User added columns start from 45.
-
class
Checksum
(SchemaVersion, ALGO_VERSION, HASHCODE)[source]¶ Bases:
object
Represents a checksum for a worklist.
The format of the checksum is unknown.
- Parameters
SchemaVersion
ALGO_VERSION
HASHCODE
Attributes:
Methods:
from_dict
(d)Construct an instance of
Checksum
from a dictionary.from_xml
(element)Construct a
Checksum
object from an XML element.to_dict
([convert_values])Returns a dictionary containing the contents of the
Checksum
object.
-
class
JobData
(id, job_type, run_status, sample_info=None)[source]¶ Bases:
Dictable
Class that represents an entry in the worklist.
- Parameters
Methods:
__repr__
()Return a string representation of the
Dictable
.from_xml
(element[, user_columns])Construct a
JobData
object from an XML element.
-
class
Macro
(project_name, procedure_name, input_parameter, output_data_type, output_parameter, display_string)[source]¶ Bases:
object
Represents a macro in a worklist.
- Parameters
project_name
procedure_name
input_parameter
output_data_type
output_parameter
display_string
Attributes:
Returns whether the macro is undefined.
Methods:
from_dict
(d)Construct an instance of
Macro
from a dictionary.from_xml
(element)Construct a
Macro
object from an XML element.to_dict
([convert_values])Returns a dictionary containing the contents of the
Macro
object.
-
class
Worklist
(version, locked_run_mode, instrument_name, params, user_columns, jobs, checksum)[source]¶ Bases:
XMLFileMixin
,Dictable
Class that represents an Agilent MassHunter worklist.
- Parameters
version (float) – WorklistInfo version number
locked_run_mode (bool) – Flag to indicate whether the data was acquired in locked mode. Yes = -1. No = 0.
instrument_name (str) – The name of the instrument.
params (dict) – Mapping of parameter names to values. TODO: Check
user_columns (dict) – Mapping of user columns to ??? TODO
checksum (Checksum) – The checksum of the worklist file. The format is unknown.
Methods:
__repr__()
Return repr(self).
as_dataframe()
Returns the Worklist as a pandas.DataFrame.
from_xml(element)
Construct a Worklist object from an XML element.
-
as_dataframe()[source]¶
Returns the Worklist as a pandas.DataFrame.
- Return type
DataFrame
mh_utils.worklist_parser.columns
¶
Properties for columns in a Worklist.
Classes:
|
Represents a column in a worklist. |
Functions:
|
Handle special case for injection volume of |
-
class
Column
(name, attribute_id, attribute_type, dtype, default_value, field_type=None, reorder_id=None)[source]¶ Bases:
object
Represents a column in a worklist.
- Parameters
name – The name of the column
attribute_id
attribute_type – can be System Defined = 0, System Used = 1, User Added = 2
dtype (Callable)
default_value (Any)
field_type (Optional[int]) – Each of the system defined columns has a field type starting from sampleid = 0 to reserved6 = 24. The system used column can be ‘compound param’ = 35, ‘optim param’ = 36, ‘mass param’ = 37 and ‘protein param’ = 38. The User added columns start from 45. Default None.
Methods:
__eq__
(other)Return
self == other
.__ge__
(other)Return
self >= other
.Used for pickling.
__gt__
(other)Return
self > other
.__le__
(other)Return
self <= other
.__lt__
(other)Return
self < other
.__ne__
(other)Return
self != other
.__repr__
()Return a string representation of the
Column
.__setstate__
(state)Used for pickling.
cast_value
(value)Cast
value
to the dtype of this column.from_attribute
(attribute)Construct a column for an
Attribute
.from_dict
(d)Construct an instance of
Column
from a dictionary.to_dict
([convert_values])Returns a dictionary containing the contents of the
Column
object.Attributes:
can be System Defined = 0, System Used = 1, User Added = 2
Each of the system defined columns have a field type starting from sampleid = 0 to reserved6 = 24.
The name of the column
-
attribute_type
¶ Type:
AttributeType
can be System Defined = 0, System Used = 1, User Added = 2
-
field_type
¶ -
Each of the system defined columns have a field type starting from sampleid = 0 to reserved6 = 24. The system used column can be ‘compound param’ = 35, ‘optim param’ = 36, ‘mass param’ = 37 and ‘protein param’ = 38. The User added columns start from 45.
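Since dtype is documented as a Callable, casting a raw value presumably amounts to calling it. The sketch below shows that idea with a stripped-down stand-in; ColumnSketch is hypothetical and omits most of the real class's parameters.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ColumnSketch:
    name: str
    dtype: Callable[[Any], Any]
    default_value: Any = None

    def cast_value(self, value: Any) -> Any:
        # The dtype callable does the conversion, e.g. float("14") -> 14.0.
        return self.dtype(value)


drying_gas = ColumnSketch("Drying Gas", dtype=float)
print(drying_gas.cast_value("14"))
```

Any callable works as a dtype, so special-case converters can be plugged in alongside int, float and str.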
-
injection_volume(val)[source]¶
Handle special case for injection volume of -1, which indicates “As Method”.
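The sentinel handling described here can be sketched as follows. What the real function returns for either case is not stated in the text, so returning the string "As Method" and a float otherwise are both assumptions; injection_volume_sketch is a hypothetical name.

```python
from typing import Union


def injection_volume_sketch(val: Union[str, float]) -> Union[str, float]:
    """Illustrative stand-in; return types are assumptions."""
    volume = float(val)
    # -1 is the sentinel meaning "use the volume defined in the method".
    if volume == -1:
        return "As Method"
    return volume


print(injection_volume_sketch("-1"))
print(injection_volume_sketch("5"))
```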
-
columns
¶ Mapping of column names to column objects.
mh_utils.worklist_parser.enums
¶
Enumerations of values.
Classes:
|
Enumeration of values for column/attribute types. |
-
enum
AttributeType
(value)[source]¶ Bases:
enum_tools.custom_enums.IntEnum
Enumeration of values for column/attribute types.
- Member Type
Valid values are as follows:
-
SystemDefined
= <AttributeType.SystemDefined: 0>¶ Attributes defined by the system.
-
SystemUsed
= <AttributeType.SystemUsed: 1>¶ Attributes used by the system.
-
UserAdded
= <AttributeType.UserAdded: 2>¶ Attributes added by the user.
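The three members behave like ordinary integers. The sketch below uses the standard library's enum.IntEnum as a stand-in for the enum_tools base class, so AttributeTypeSketch is a hypothetical equivalent rather than the library's own enum.

```python
from enum import IntEnum


class AttributeTypeSketch(IntEnum):
    SystemDefined = 0  # attributes defined by the system
    SystemUsed = 1  # attributes used by the system
    UserAdded = 2  # attributes added by the user


# Look a member up from the integer stored in the worklist.
kind = AttributeTypeSketch(2)
print(kind.name, int(kind))
```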
mh_utils.worklist_parser.parser
¶
MassHunter worklist parser.
Functions:
|
Parse a datetime from a worklist or contents file. |
|
Parse the worklist execution parameters from XML. |
|
Parse information about a sample in a worklist from XML. |
Data:
Mapping of XML tag names to attribute names. |
-
parse_sample_info
(element, user_columns=None)[source]¶ Parse information about a sample in a worklist from XML.
Mapping of XML tag names to attribute names.
Example Usage¶
# 3rd party
from pandas import DataFrame
# this package
from mh_utils.worklist_parser import Worklist, read_worklist
# Replace 'worklist.wkl' with the filename of your worklist.
wkl: Worklist = read_worklist("worklist.wkl")
df: DataFrame = wkl.as_dataframe()
# Filter columns
df = df[["Sample Name", "Method", "Data File", "Drying Gas", "Gas Temp", "Nebulizer"]]
# Get just the filename from 'Method' and 'Data File'
df["Method"] = [x.name for x in df["Method"]]
df["Data File"] = [x.name for x in df["Data File"]]
# Show the DataFrame
print(df.to_string())
# save as CSV
df.to_csv("worklist.csv")
# save as JSON
df.to_json("worklist.json", indent=2)
Output¶
Sample Name Method Data File Drying Gas Gas Temp Nebulizer
0 Methanol Blank +ve Maitre Gunshot Residue Positive.m Methanol_Blank_+ve_191121-0001-r001.d
1 Propellant 1mg +ve Maitre Gunshot Residue Positive.m Propellant_1mg_+ve_191121-0002-r001.d
2 Propellant 1ug +ve Maitre Gunshot Residue Positive.m Propellant_1ug_+ve_191121-0003-r001.d
3 Methanol Blank -ve Maitre Gunshot Residue Negative.m Methanol_Blank_-ve_191121-0004-r001.d
4 Propellant 1mg -ve Maitre Gunshot Residue Negative.m Propellant_1mg_-ve_191121-0005-r001.d
5 Propellant 1ug -ve Maitre Gunshot Residue Negative.m Propellant_1ug_-ve_191121-0006-r001.d
6 Methanol Blank +ve 5ul Maitre Gunshot Residue Positive 5ul.m Methanol_Blank_+ve_5ul_191121-0007-r001.d
7 Propellant 1mg +ve 5ul Maitre Gunshot Residue Positive 5ul.m Propellant_1mg_+ve_5ul_191121-0008-r001.d
8 Propellant 1ug +ve 5ul Maitre Gunshot Residue Positive 5ul.m Propellant_1ug_+ve_5ul_191121-0009-r001.d
9 Methanol Blank Maitre Gunshot Residue Positive 5ul.m Methanol_Blank_191121-0010-r001.d
10 Propellant 1mg gas 200 Maitre Gunshot Residue Positive 5ul.m Propellant_1mg_gas_200_191121-0011-r001.d 200
11 Propellant 1ug gas 200 Maitre Gunshot Residue Positive 5ul.m Propellant_1ug_gas_200_191121-0012-r001.d 200
12 Propellant 1mg gas 280 Maitre Gunshot Residue Positive 5ul.m Propellant_1mg_gas_280_191121-0013-r001.d 280
13 Propellant 1ug gas 280 Maitre Gunshot Residue Positive 5ul.m Propellant_1ug_gas_280_191121-0014-r001.d 280
14 Propellant 1mg drying 14 Maitre Gunshot Residue Positive 5ul.m Propellant_1mg_drying_14_191121-0015-r001.d 14
15 Propellant 1ug drying 14 Maitre Gunshot Residue Positive 5ul.m Propellant_1ug_drying_14_191121-0016-r001.d 14
16 Propellant 1mg drying 16 Maitre Gunshot Residue Positive 5ul.m Propellant_1mg_drying_16_191121-0017-r001.d 16
17 Propellant 1ug drying 16 Maitre Gunshot Residue Positive 5ul.m Propellant_1ug_drying_16_191121-0018-r001.d 16
18 Propellant 1mg drying 18 Maitre Gunshot Residue Positive 5ul.m Propellant_1mg_drying_18_191121-0019-r001.d 18
19 Propellant 1ug drying 18 Maitre Gunshot Residue Positive 5ul.m Propellant_1ug_drying_18_191121-0020-r001.d 18
20 Propellant 1mg nebul 40 Maitre Gunshot Residue Positive 5ul.m Propellant_1mg_nebul_40_191121-0021-r001.d 40
21 Propellant 1ug nebul 40 Maitre Gunshot Residue Positive 5ul.m Propellant_1ug_nebul_40_191121-0022-r001.d 40
22 Propellant 1mg nebul 50 Maitre Gunshot Residue Positive 5ul.m Propellant_1mg_nebul_50_191121-0023-r001.d 50
23 Propellant 1ug nebul 50 Maitre Gunshot Residue Positive 5ul.m Propellant_1ug_nebul_50_191121-0024-r001.d 50
24 Propellant 1mg nebul 60 Maitre Gunshot Residue Positive 5ul.m Propellant_1mg_nebul_60_191121-0025-r001.d 60
25 Propellant 1ug nebul 60 Maitre Gunshot Residue Positive 5ul.m Propellant_1ug_nebul_60_191121-0026-r001.d 60
...
Overview¶
mh_utils
uses tox to automate testing and packaging,
and pre-commit to maintain code quality.
Install pre-commit
with pip
and install the git hook:
python -m pip install pre-commit
pre-commit install
Coding style¶
formate is used for code formatting.
It can be run manually via pre-commit
:
pre-commit run formate -a
Or, to run the complete autoformatting suite:
pre-commit run -a
Automated tests¶
Tests are run with tox
and pytest
.
To run tests for a specific Python version, such as Python 3.6:
tox -e py36
To run tests for all Python versions, simply run:
tox
Build documentation locally¶
The documentation is powered by Sphinx. A local copy of the documentation can be built with tox
:
tox -e docs
Downloading source code¶
The mh_utils
source code is available on GitHub,
and can be accessed from the following URL: https://github.com/domdfcoding/mh_utils
If you have git
installed, you can clone the repository with the following command:
$ git clone https://github.com/domdfcoding/mh_utils
> Cloning into 'mh_utils'...
> remote: Enumerating objects: 47, done.
> remote: Counting objects: 100% (47/47), done.
> remote: Compressing objects: 100% (41/41), done.
> remote: Total 173 (delta 16), reused 17 (delta 6), pack-reused 126
> Receiving objects: 100% (173/173), 126.56 KiB | 678.00 KiB/s, done.
> Resolving deltas: 100% (66/66), done.

Downloading a ‘zip’ file of the source code¶
Building from source¶
The recommended way to build mh_utils
is to use tox:
tox -e build
The source and wheel distributions will be in the directory dist
.
If you wish, you may also use pep517.build or another PEP 517-compatible build tool.