`mh_utils.csv_parser.classes`

Classes to model parts of MassHunter CSV files.

New in version 0.2.0.

Classes:

`BaseSamplePropertyDict`	OrderedDict to store a single property of a set of samples.
`Result`(cas, name, hits[, index, formula, …])	Represents a Result in a MassHunter CSV file.
`Sample`(sample_name, sample_type, …[, results])	Represents a sample in a MassHunter CSV file.
`SampleList`([iterable])	A list of `mh_utils.csv_parser.classes.Sample` objects.
`SamplesAreaDict`	`collections.OrderedDict` to store area information parsed from MassHunter results CSV files.
`SamplesScoresDict`	`collections.OrderedDict` to store score information parsed from MassHunter results CSV files.

Data:

`_R`	Invariant `TypeVar` bound to `mh_utils.csv_parser.classes.Result`.
`_S`	Invariant `TypeVar` bound to `mh_utils.csv_parser.classes.Sample`.
`_SL`	Invariant `TypeVar` bound to `mh_utils.csv_parser.classes.SampleList`.

class BaseSamplePropertyDict[source]

Bases: OrderedDict

OrderedDict to store a single property of a set of samples.

Keys are the sample names and the values are dictionaries mapping compound names to property values.

Attributes:

`n_compounds`	Returns the number of compounds in the `BaseSamplePropertyDict`.
`n_samples`	Returns the number of samples in the `BaseSamplePropertyDict`.
`sample_names`	Returns a list of sample names in the `BaseSamplePropertyDict`.

property n_compounds

Returns the number of compounds in the BaseSamplePropertyDict.

Return type: int

property n_samples

Returns the number of samples in the BaseSamplePropertyDict.

Return type: int

property sample_names

Returns a list of sample names in the BaseSamplePropertyDict.

Return type: List[str]
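
The layout described above can be sketched with a plain `collections.OrderedDict`. This only illustrates the data shape, not the class itself: the sample names, compound names and values are made up, and counting each compound once across all samples is an assumption about what `n_compounds` reports.

```python
from collections import OrderedDict

# Hypothetical data: sample name -> {compound name -> property value}
areas = OrderedDict([
    ("Sample A", {"Diphenylamine": 1200.0, "Ethyl Centralite": 850.5}),
    ("Sample B", {"Diphenylamine": 990.0}),
])

# The keys are the sample names, in insertion order.
sample_names = list(areas.keys())
n_samples = len(areas)

# Assumption: each compound is counted once, however many samples contain it.
compounds = {name for per_sample in areas.values() for name in per_sample}
n_compounds = len(compounds)

print(sample_names, n_samples, n_compounds)
```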

class Result(cas, name, hits, index=-1, formula='', score=0.0, abundance=0, height=0, area=0, diff_mDa=0.0, diff_ppm=0.0, rt=0.0, start=0.0, end=0.0, width=0.0, tgt_rt=0.0, rt_diff=0.0, mz=0.0, product_mz=0.0, base_peak=0.0, mass=0.0, average_mass=0.0, tgt_mass=0.0, mining_algorithm='', z_count=0, max_z=0, min_z=0, n_ions=0, polarity='', label='', flags='', flag_severity='', flag_severity_code=0)[source]

Bases: Dictable

Represents a Result in a MassHunter CSV file.

Parameters

cas
name (str)
hits
index (int) – Default -1.
formula (str) – Default ''.
score (float) – Default 0.0.
abundance (float) – Default 0.
height (float) – Default 0.
area (float) – Default 0.
diff_mDa (float) – Default 0.0.
diff_ppm (float) – Default 0.0.
rt (float) – Default 0.0.
start (float) – Default 0.0.
end (float) – Default 0.0.
width (float) – Default 0.0.
tgt_rt (float) – Default 0.0.
rt_diff (float) – Default 0.0.
mz (float) – Default 0.0.
product_mz (float) – Default 0.0.
base_peak (float) – Default 0.0.
mass (float) – Default 0.0.
average_mass (float) – Default 0.0.
tgt_mass (float) – Default 0.0.
mining_algorithm (str) – Default ''.
z_count (int) – Default 0.
max_z (int) – Default 0.
min_z (int) – Default 0.
n_ions (int) – Default 0.
polarity (str) – Default ''.
label (str) – Default ''.
flags (str) – Default ''.
flag_severity (str) – Default ''.
flag_severity_code (int) – Default 0.

Methods:

`from_series`(series)	Construct a `Result` from a `pandas.Series`.

classmethod from_series(series)[source]

Construct a Result from a pandas.Series.

Parameters: series (Series)
Return type: ~_R

class Sample(sample_name, sample_type, instrument_name, position, user, acq_method, da_method, irm_cal_status, filename, results=None)[source]

Bases: Dictable

Represents a sample in a MassHunter CSV file.

Parameters

sample_name
sample_type
instrument_name
position
user
acq_method
da_method
irm_cal_status
filename
results – Default None.

Methods:

`add_result`(result)	Add a result to the sample.
`from_series`(series)	Construct a `Sample` from a `pandas.Series`.

Attributes:

`results_list`	Returns a list of results in the order in which they were identified.

add_result(result)[source]

Add a result to the sample.

Parameters: result

classmethod from_series(series)[source]

Construct a Sample from a pandas.Series.

Parameters: series
Return type: ~_S
Returns

property results_list

Returns a list of results in the order in which they were identified.

That is, the results are sorted by the Cpd value from the CSV export.

Return type: List[Result]
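
The ordering can be sketched as a sort on a numeric field. `ResultStub` is a hypothetical stand-in for `Result`, and treating `index` as the field that holds the Cpd value is an assumption rather than something this page states.

```python
from dataclasses import dataclass


@dataclass
class ResultStub:
    """Hypothetical stand-in for Result with only the fields needed here."""

    name: str
    index: int  # assumed to hold the Cpd value from the CSV export


# Results as they might be stored, in no particular order.
results = [
    ResultStub("Compound C", 3),
    ResultStub("Compound A", 1),
    ResultStub("Compound B", 2),
]

# Sort by the assumed Cpd value to recover the identification order.
ordered = sorted(results, key=lambda r: r.index)
ordered_names = [r.name for r in ordered]
print(ordered_names)
```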

class SampleList(iterable=(), /)[source]

Bases: List[Sample]

A list of mh_utils.csv_parser.classes.Sample objects.

Methods:

`add_new_sample`(*args, **kwargs)	Add a new sample to the list and return the `Sample` object representing it.
`add_sample`(sample)	Add a `Sample` object to the list.
`add_sample_from_series`(series)	Create a new sample object from a `pandas.Series` and add it to the list.
`filter`(sample_names[, key, exclude])	Filter the list to only contain samples whose name is in `sample_names`.
`from_json_file`(filename, **kwargs)	Construct a `SampleList` from JSON file.
`get_areas_and_scores`(compound_name[, …])	Returns two dictionaries: one containing sample names and peak areas for the compound with the given name, the other containing sample names and scores.
`get_areas_and_scores_for_compounds`(…[, …])	Returns two dictionaries: one containing sample names and peak areas for the compounds with the given names, the other containing sample names and scores.
`get_areas_for_compounds`(compound_names[, …])	Returns a dictionary containing sample names and peak areas for the compounds with the given names.
`get_compounds`()	Returns a list containing the names of the compounds present in the samples in alphabetical order.
`get_peak_areas`(compound_name[, include_none])	Returns a dictionary containing sample names and peak areas for the compound with the given name.
`get_retention_times`(compound_name[, …])	Returns a dictionary containing sample names and retention times for the compound with the given name.
`get_scores`(compound_name[, include_none])	Returns a dictionary containing sample names and scores for the compound with the given name.
`rename_samples`(rename_mapping[, key])	Rename the samples in the list.
`reorder_samples`(order_mapping[, key])	Reorder the list of `Samples` in place.
`sort_samples`(key[, reverse])	Sort the list of `Samples` in place.

Attributes:

`sample_names`	Returns a list of sample names in the `SampleList`.

add_new_sample(*args, **kwargs)[source]: Add a new sample to the list and return the Sample object representing it.

add_sample(sample)[source]

Add a Sample object to the list.

Parameters: sample (Sample)
Return type: Sample

add_sample_from_series(series)[source]

Create a new sample object from a pandas.Series and add it to the list.

Parameters: series (Series)
Return type: Sample
Returns: The newly created Sample object.

filter(sample_names, key='sample_name', exclude=False)[source]

Filter the list to only contain samples whose name is in sample_names.

Parameters

sample_names (Iterable[str]) – A list of sample names to include
key (str) – The name of the property in the sample to filter by. Default 'sample_name'.
exclude (bool) – If True, any sample whose name is in sample_names will be excluded from the output, rather than included. Default False.

Return type

~_SL
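
The include/exclude behaviour described above can be sketched with a standalone function. `filter_samples` and the `SimpleNamespace` stand-ins are hypothetical, and returning a new list is an assumption; the sketch only mirrors the documented parameters.

```python
from types import SimpleNamespace
from typing import Iterable, List


def filter_samples(
    samples: List,
    sample_names: Iterable[str],
    key: str = "sample_name",
    exclude: bool = False,
) -> List:
    """Keep samples whose `key` attribute is in sample_names, or drop them if exclude is True."""
    wanted = set(sample_names)
    # A sample is kept when its membership in `wanted` differs from `exclude`.
    return [s for s in samples if (getattr(s, key) in wanted) != exclude]


# Hypothetical stand-ins for Sample objects.
samples = [SimpleNamespace(sample_name=n) for n in ("Blank", "Std 1", "Std 2")]

kept = [s.sample_name for s in filter_samples(samples, ["Std 1", "Std 2"])]
dropped = [s.sample_name for s in filter_samples(samples, ["Blank"], exclude=True)]
print(kept, dropped)
```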

classmethod from_json_file(filename, **kwargs)[source]

Construct a SampleList from JSON file.

Parameters

filename (Union[str, Path, PathLike]) – The filename of the JSON file.
**kwargs – Keyword arguments passed to domdf_python_tools.paths.PathPlus.load_json().

Return type

~_SL

get_areas_and_scores(compound_name, include_none=False)[source]

Returns two dictionaries: one containing sample names and peak areas for the compound with the given name, the other containing sample names and scores.

Parameters

compound_name (str)
include_none (bool) – Whether samples where the compound was not found should be included in the results. Default False.

Return type

Tuple[OrderedDict, OrderedDict]

get_areas_and_scores_for_compounds(compound_names, include_none=False)[source]

Returns two dictionaries: one containing sample names and peak areas for the compounds with the given names, the other containing sample names and scores.

Parameters

compound_names (Iterable[str])
include_none (bool) – Whether samples where none of the specified compounds were found should be included in the results. Default False.

Return type

Tuple[SamplesAreaDict, SamplesScoresDict]

get_areas_for_compounds(compound_names, include_none=False)[source]

Returns a dictionary containing sample names and peak areas for the compounds with the given names.

Parameters

compound_names (Iterable[str])
include_none (bool) – Whether samples where none of the specified compounds were found should be included in the results. Default False.

Return type

SamplesAreaDict

get_compounds()[source]

Returns a list containing the names of the compounds present in the samples in alphabetical order.

Return type: List[str]

get_peak_areas(compound_name, include_none=False)[source]

Returns a dictionary containing sample names and peak areas for the compound with the given name.

Parameters

compound_name (str)
include_none (bool) – Whether samples where the compound was not found should be included in the results. Default False.

Return type

OrderedDict
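
The effect of `include_none` can be sketched on plain dictionaries. The function `peak_areas` and the data are hypothetical, and recording a missing compound as `None` is an assumption about the returned values, not confirmed behaviour.

```python
from collections import OrderedDict

# Hypothetical data: sample name -> {compound name -> peak area}
samples = OrderedDict([
    ("Sample A", {"Diphenylamine": 1200.0}),
    ("Sample B", {"Ethyl Centralite": 850.5}),
])


def peak_areas(samples, compound_name, include_none=False):
    """Map sample names to the peak area of one compound."""
    areas = OrderedDict()
    for sample_name, found in samples.items():
        area = found.get(compound_name)  # None when the compound is absent
        # Assumption: absent compounds appear as None only when include_none is True.
        if area is not None or include_none:
            areas[sample_name] = area
    return areas


without_missing = peak_areas(samples, "Diphenylamine")
with_missing = peak_areas(samples, "Diphenylamine", include_none=True)
print(without_missing, with_missing)
```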

get_retention_times(compound_name, include_none=False)[source]

Returns a dictionary containing sample names and retention times for the compound with the given name.

Parameters

compound_name (str)
include_none (bool) – Whether samples where the compound was not found should be included in the results. Default False.

Return type

OrderedDict

get_scores(compound_name, include_none=False)[source]

Returns a dictionary containing sample names and scores for the compound with the given name.

Parameters

compound_name (str)
include_none (bool) – Whether samples where the compound was not found should be included in the results. Default False.

Return type

OrderedDict

rename_samples(rename_mapping, key='sample_name')[source]

Rename the samples in the list.

Parameters

rename_mapping (Dict) – A mapping between current sample names and their new names.
key (str) – The name of the property in the sample to match against the keys of rename_mapping. Default 'sample_name'.

Map a sample to None, or omit it from rename_mapping entirely, to leave its name unchanged.

For example:

rename_mapping = {
    "Propellant 1ug +ve": "Alliant Unique 1µg/L +ESI",
    "Propellant 1mg +ve": "Alliant Unique 1mg/L +ESI",
    "Propellant 1mg -ve": None,
    }
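
How such a mapping might be applied can be sketched as follows. The `rename` function and the `SimpleNamespace` stand-ins are hypothetical, and the sketch handles only the default `key` of 'sample_name'.

```python
from types import SimpleNamespace

rename_mapping = {
    "Propellant 1ug +ve": "Alliant Unique 1µg/L +ESI",
    "Propellant 1mg +ve": "Alliant Unique 1mg/L +ESI",
    "Propellant 1mg -ve": None,
}


def rename(samples, rename_mapping):
    """Rename samples in place; None or a missing entry leaves the name unchanged."""
    for sample in samples:
        new_name = rename_mapping.get(sample.sample_name)
        if new_name is not None:
            sample.sample_name = new_name


# Hypothetical stand-ins for Sample objects; the last one is not in the mapping.
samples = [
    SimpleNamespace(sample_name=n)
    for n in (
        "Propellant 1ug +ve",
        "Propellant 1mg +ve",
        "Propellant 1mg -ve",
        "Propellant 1ug -ve",
    )
]

rename(samples, rename_mapping)
new_names = [s.sample_name for s in samples]
print(new_names)
```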

reorder_samples(order_mapping, key='sample_name')[source]

Reorder the list of Samples in place.

Parameters

order_mapping (Dict) –

A mapping between sample names and their new position in the list. For example:

order_mapping = {
    "Propellant 1ug +ve": 0,
    "Propellant 1mg +ve": 1,
    "Propellant 1ug -ve": 2,
    "Propellant 1mg -ve": 3,
    }

key (str) – The name of the property in the sample to sort by. Default 'sample_name'.
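
Reordering by such a mapping can be sketched as an in-place sort. The `SimpleNamespace` stand-ins are hypothetical, and the sketch assumes every sample appears in `order_mapping`; how unmapped samples are handled is not covered here.

```python
from types import SimpleNamespace

order_mapping = {
    "Propellant 1ug +ve": 0,
    "Propellant 1mg +ve": 1,
    "Propellant 1ug -ve": 2,
    "Propellant 1mg -ve": 3,
}

# Hypothetical stand-ins for Sample objects, initially in a different order.
samples = [
    SimpleNamespace(sample_name=n)
    for n in (
        "Propellant 1mg -ve",
        "Propellant 1ug +ve",
        "Propellant 1ug -ve",
        "Propellant 1mg +ve",
    )
]

# Sort in place, using each sample's target position as the sort key.
samples.sort(key=lambda s: order_mapping[s.sample_name])
reordered = [s.sample_name for s in samples]
print(reordered)
```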

property sample_names

Returns a list of sample names in the SampleList.

Return type: List[str]

sort_samples(key, reverse=False)[source]

Sort the list of Samples in place.

Parameters

key (str) – The name of the property in the sample to sort by.
reverse (bool) – Whether the list should be sorted in reverse order. Default False.

Return type