mh_utils.csv_parser.classes

Classes to model parts of MassHunter CSV files.

New in version 0.2.0.

Classes:

BaseSamplePropertyDict

OrderedDict to store a single property of a set of samples.

Result(cas, name, hits[, index, formula, …])

Represents a Result in a MassHunter CSV file.

Sample(sample_name, sample_type, …[, results])

Represents a sample in a MassHunter CSV file.

SampleList([iterable])

A list of mh_utils.csv_parser.classes.Sample objects.

SamplesAreaDict

collections.OrderedDict to store area information parsed from MassHunter results CSV files.

SamplesScoresDict

collections.OrderedDict to store score information parsed from MassHunter results CSV files.

Data:

_R

Invariant TypeVar bound to mh_utils.csv_parser.classes.Result.

_S

Invariant TypeVar bound to mh_utils.csv_parser.classes.Sample.

_SL

Invariant TypeVar bound to mh_utils.csv_parser.classes.SampleList.

class BaseSamplePropertyDict[source]

Bases: OrderedDict

OrderedDict to store a single property of a set of samples.

Keys are the sample names and the values are dictionaries mapping compound names to property values.
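The nesting described above can be pictured with a plain collections.OrderedDict. This is only a sketch of the data layout, not the library's implementation: the sample names, compound names and values are made up, and counting compounds as the distinct names across all samples is an assumption about what n_compounds reports.

```python
from collections import OrderedDict

# Hypothetical data: sample name -> {compound name -> property value}
areas = OrderedDict([
    ("Sample A", {"Compound X": 1200.0, "Compound Y": 350.5}),
    ("Sample B", {"Compound X": 980.0}),
])

# Rough hand-computed equivalents of the three properties
sample_names = list(areas.keys())
n_samples = len(areas)
# Assumption: compounds are counted as distinct names across all samples
n_compounds = len({name for compounds in areas.values() for name in compounds})

print(sample_names, n_samples, n_compounds)
```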

Attributes:

n_compounds

Returns the number of compounds in the BaseSamplePropertyDict.

n_samples

Returns the number of samples in the BaseSamplePropertyDict.

sample_names

Returns a list of sample names in the BaseSamplePropertyDict.

property n_compounds

Returns the number of compounds in the BaseSamplePropertyDict.

Return type

int

property n_samples

Returns the number of samples in the BaseSamplePropertyDict.

Return type

int

property sample_names

Returns a list of sample names in the BaseSamplePropertyDict.

Return type

List[str]

class Result(cas, name, hits, index=-1, formula='', score=0.0, abundance=0, height=0, area=0, diff_mDa=0.0, diff_ppm=0.0, rt=0.0, start=0.0, end=0.0, width=0.0, tgt_rt=0.0, rt_diff=0.0, mz=0.0, product_mz=0.0, base_peak=0.0, mass=0.0, average_mass=0.0, tgt_mass=0.0, mining_algorithm='', z_count=0, max_z=0, min_z=0, n_ions=0, polarity='', label='', flags='', flag_severity='', flag_severity_code=0)[source]

Bases: Dictable

Represents a Result in a MassHunter CSV file.

Parameters
  • cas

  • name (str)

  • hits

  • index (int) – Default -1.

  • formula (str) – Default ''.

  • score (float) – Default 0.0.

  • abundance (float) – Default 0.

  • height (float) – Default 0.

  • area (float) – Default 0.

  • diff_mDa (float) – Default 0.0.

  • diff_ppm (float) – Default 0.0.

  • rt (float) – Default 0.0.

  • start (float) – Default 0.0.

  • end (float) – Default 0.0.

  • width (float) – Default 0.0.

  • tgt_rt (float) – Default 0.0.

  • rt_diff (float) – Default 0.0.

  • mz (float) – Default 0.0.

  • product_mz (float) – Default 0.0.

  • base_peak (float) – Default 0.0.

  • mass (float) – Default 0.0.

  • average_mass (float) – Default 0.0.

  • tgt_mass (float) – Default 0.0.

  • mining_algorithm (str) – Default ''.

  • z_count (int) – Default 0.

  • max_z (int) – Default 0.

  • min_z (int) – Default 0.

  • n_ions (int) – Default 0.

  • polarity (str) – Default ''.

  • label (str) – Default ''.

  • flags (str) – Default ''.

  • flag_severity (str) – Default ''.

  • flag_severity_code (int) – Default 0.

Methods:

from_series(series)

Construct a Result from a pandas.Series.

classmethod from_series(series)[source]

Construct a Result from a pandas.Series.

Parameters

series (Series)

Return type

~_R

class Sample(sample_name, sample_type, instrument_name, position, user, acq_method, da_method, irm_cal_status, filename, results=None)[source]

Bases: Dictable

Represents a sample in a MassHunter CSV file.

Parameters
  • sample_name

  • sample_type

  • instrument_name

  • position

  • user

  • acq_method

  • da_method

  • irm_cal_status

  • filename

  • results – Default None.

Methods:

add_result(result)

Add a result to the sample.

from_series(series)

Construct a Sample from a pandas.Series.

Attributes:

results_list

Returns a list of results in the order in which they were identified.

add_result(result)[source]

Add a result to the sample.

Parameters

result

classmethod from_series(series)[source]

Construct a Sample from a pandas.Series.

Parameters

series

Return type

~_S

property results_list

Returns a list of results in the order in which they were identified.

That is, sorted by the Cpd value from the CSV export.

Return type

List[Result]
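One way to picture this ordering is a sort on the index value, which is assumed here to hold the Cpd number from the export. The FakeResult class below is a hypothetical stand-in that carries only two fields. It is not the library's Result class.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeResult:
    """Hypothetical stand-in carrying only the fields needed here."""
    name: str
    index: int  # assumed to correspond to the Cpd column


# Results stored in arbitrary order
results = [
    FakeResult("Compound Y", index=2),
    FakeResult("Compound X", index=1),
    FakeResult("Compound Z", index=3),
]

# Sort by index to recover the order of identification
results_list: List[FakeResult] = sorted(results, key=lambda r: r.index)
ordered_names = [r.name for r in results_list]
```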

class SampleList(iterable=(), /)[source]

Bases: List[Sample]

A list of mh_utils.csv_parser.classes.Sample objects.

Methods:

add_new_sample(*args, **kwargs)

Add a new sample to the list and return the Sample object representing it.

add_sample(sample)

Add a Sample object to the list.

add_sample_from_series(series)

Create a new Sample object from a pandas.Series and add it to the list.

filter(sample_names[, key, exclude])

Filter the list to only contain samples whose name is in sample_names.

from_json_file(filename, **kwargs)

Construct a SampleList from a JSON file.

get_areas_and_scores(compound_name[, …])

Returns two dictionaries: one containing sample names and peak areas for the compound with the given name, the other containing sample names and scores.

get_areas_and_scores_for_compounds(…[, …])

Returns two dictionaries: one containing sample names and peak areas for the compounds with the given names, the other containing sample names and scores.

get_areas_for_compounds(compound_names[, …])

Returns a dictionary containing sample names and peak areas for the compounds with the given names.

get_compounds()

Returns a list containing the names of the compounds present in the samples in alphabetical order.

get_peak_areas(compound_name[, include_none])

Returns a dictionary containing sample names and peak areas for the compound with the given name.

get_retention_times(compound_name[, …])

Returns a dictionary containing sample names and retention times for the compound with the given name.

get_scores(compound_name[, include_none])

Returns a dictionary containing sample names and scores for the compound with the given name.

rename_samples(rename_mapping[, key])

Rename the samples in the list.

reorder_samples(order_mapping[, key])

Reorder the list of Samples in place.

sort_samples(key[, reverse])

Sort the list of Samples in place.

Attributes:

sample_names

Returns a list of sample names in the SampleList.

add_new_sample(*args, **kwargs)[source]

Add a new sample to the list and return the Sample object representing it.

add_sample(sample)[source]

Add a Sample object to the list.

Parameters

sample (Sample)

Return type

Sample

add_sample_from_series(series)[source]

Create a new Sample object from a pandas.Series and add it to the list.

Return type

Sample

Returns

The newly created Sample object.

Parameters

series (Series)

filter(sample_names, key='sample_name', exclude=False)[source]

Filter the list to only contain samples whose name is in sample_names.

Parameters
  • sample_names (Iterable[str]) – A list of sample names to include.

  • key (str) – The name of the property in the sample to filter by. Default 'sample_name'.

  • exclude (bool) – If True, any sample whose name is in sample_names will be excluded from the output, rather than included. Default False.

Return type

~_SL
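The include and exclude behaviour described above can be sketched with a small stand-in. FakeSample and filter_samples are hypothetical names used only for illustration. The sketch returns a new list, which may differ from how the library handles the original list.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FakeSample:
    """Hypothetical stand-in for a sample; only the name is modelled."""
    sample_name: str


def filter_samples(
        samples: List[FakeSample],
        sample_names: Iterable[str],
        key: str = "sample_name",
        exclude: bool = False,
        ) -> List[FakeSample]:
    """Keep (or, with exclude=True, drop) samples whose ``key`` is listed."""
    wanted = set(sample_names)
    # A sample is kept when "is listed" differs from "exclude"
    return [s for s in samples if (getattr(s, key) in wanted) != exclude]


samples = [FakeSample("Blank"), FakeSample("Std 1"), FakeSample("Std 2")]
kept = [s.sample_name for s in filter_samples(samples, ["Std 1", "Std 2"])]
dropped = [s.sample_name for s in filter_samples(samples, ["Blank"], exclude=True)]
```

Both calls leave the same two samples: the first by naming them, the second by naming the one to leave out.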

classmethod from_json_file(filename, **kwargs)[source]

Construct a SampleList from a JSON file.

Parameters
  • filename

  • **kwargs

Return type

~_SL

get_areas_and_scores(compound_name, include_none=False)[source]

Returns two dictionaries: one containing sample names and peak areas for the compound with the given name, the other containing sample names and scores.

Parameters
  • compound_name (str)

  • include_none (bool) – Whether samples where the compound was not found should be included in the results. Default False.

Return type

Tuple[OrderedDict, OrderedDict]

get_areas_and_scores_for_compounds(compound_names, include_none=False)[source]

Returns two dictionaries: one containing sample names and peak areas for the compounds with the given names, the other containing sample names and scores.

Parameters
  • compound_names (Iterable[str])

  • include_none (bool) – Whether samples where none of the specified compounds were found should be included in the results. Default False.

Return type

Tuple[SamplesAreaDict, SamplesScoresDict]

get_areas_for_compounds(compound_names, include_none=False)[source]

Returns a dictionary containing sample names and peak areas for the compounds with the given names.

Parameters
  • compound_names (Iterable[str])

  • include_none (bool) – Whether samples where none of the specified compounds were found should be included in the results. Default False.

Return type

SamplesAreaDict

get_compounds()[source]

Returns a list containing the names of the compounds present in the samples in alphabetical order.

Return type

List[str]

get_peak_areas(compound_name, include_none=False)[source]

Returns a dictionary containing sample names and peak areas for the compound with the given name.

Parameters
  • compound_name (str)

  • include_none (bool) – Whether samples where the compound was not found should be included in the results. Default False.

Return type

OrderedDict
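The effect of include_none can be illustrated with plain dictionaries. The peak_areas function and the data below are hypothetical, and recording a missing compound as None is an assumption drawn from the parameter name rather than from the library's source.

```python
from collections import OrderedDict
from typing import Dict, Optional

# Hypothetical per-sample results: sample name -> {compound name -> peak area}
samples: Dict[str, Dict[str, float]] = {
    "Sample A": {"Compound X": 1200.0},
    "Sample B": {},  # Compound X was not found in this sample
}


def peak_areas(
        compound_name: str,
        include_none: bool = False,
        ) -> "OrderedDict[str, Optional[float]]":
    """Collect the area of one compound per sample."""
    out: "OrderedDict[str, Optional[float]]" = OrderedDict()
    for sample_name, compounds in samples.items():
        if compound_name in compounds:
            out[sample_name] = compounds[compound_name]
        elif include_none:
            # Assumption: a missing compound is recorded as None
            out[sample_name] = None
    return out


found_only = peak_areas("Compound X")
with_missing = peak_areas("Compound X", include_none=True)
```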

get_retention_times(compound_name, include_none=False)[source]

Returns a dictionary containing sample names and retention times for the compound with the given name.

Parameters
  • compound_name (str)

  • include_none (bool) – Whether samples where the compound was not found should be included in the results. Default False.

Return type

OrderedDict

get_scores(compound_name, include_none=False)[source]

Returns a dictionary containing sample names and scores for the compound with the given name.

Parameters
  • compound_name (str)

  • include_none (bool) – Whether samples where the compound was not found should be included in the results. Default False.

Return type

OrderedDict

rename_samples(rename_mapping, key='sample_name')[source]

Rename the samples in the list.

Parameters
  • rename_mapping (Dict) – A mapping between current sample names and their new names.

  • key (str) – The name of the property in the sample to look up in rename_mapping. Default 'sample_name'.

Map a sample name to None, or omit the sample from rename_mapping entirely, to leave the name unchanged.

For example:

rename_mapping = {
    "Propellant 1ug +ve": "Alliant Unique 1µg/L +ESI",
    "Propellant 1mg +ve": "Alliant Unique 1mg/L +ESI",
    "Propellant 1mg -ve": None,
    }
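
The effect of such a mapping can be sketched on a plain list of names. This only illustrates the stated rule that None or a missing entry leaves the name unchanged. It is not the library's own code, and current_names and new_names are hypothetical.

```python
from typing import Dict, List, Optional

rename_mapping: Dict[str, Optional[str]] = {
    "Propellant 1ug +ve": "Alliant Unique 1µg/L +ESI",
    "Propellant 1mg +ve": "Alliant Unique 1mg/L +ESI",
    "Propellant 1mg -ve": None,
}

current_names: List[str] = [
    "Propellant 1ug +ve",
    "Propellant 1mg +ve",
    "Propellant 1ug -ve",  # absent from the mapping
    "Propellant 1mg -ve",  # mapped to None
]

# A name is replaced only when the mapping supplies a non-None value
new_names = [
    name if rename_mapping.get(name) is None else rename_mapping[name]
    for name in current_names
]
```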

reorder_samples(order_mapping, key='sample_name')[source]

Reorder the list of Samples in place.

Parameters
  • order_mapping (Dict) –

    A mapping between sample names and their new position in the list. For example:

    order_mapping = {
        "Propellant 1ug +ve": 0,
        "Propellant 1mg +ve": 1,
        "Propellant 1ug -ve": 2,
        "Propellant 1mg -ve": 3,
        }
    

  • key (str) – The name of the property in the sample to sort by. Default 'sample_name'.
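An in-place reorder driven by such a mapping can be sketched with list.sort on a plain list of names. The starting order is made up. How the library treats names missing from order_mapping is not stated here, and this sketch would simply raise KeyError for them.

```python
order_mapping = {
    "Propellant 1ug +ve": 0,
    "Propellant 1mg +ve": 1,
    "Propellant 1ug -ve": 2,
    "Propellant 1mg -ve": 3,
}

# Names in a hypothetical starting order
sample_names = [
    "Propellant 1mg -ve",
    "Propellant 1ug +ve",
    "Propellant 1ug -ve",
    "Propellant 1mg +ve",
]

# list.sort reorders in place, using the mapped position as the sort key
sample_names.sort(key=lambda name: order_mapping[name])
```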

property sample_names

Returns a list of sample names in the SampleList.

Return type

List[str]

sort_samples(key, reverse=False)[source]

Sort the list of Samples in place.

Parameters
  • key (str) – The name of the property in the sample to sort by.

  • reverse (bool) – Whether the list should be sorted in reverse order. Default False.

Return type

class SamplesAreaDict[source]

Bases: BaseSamplePropertyDict

collections.OrderedDict to store area information parsed from MassHunter results CSV files.

Methods:

get_compound_areas(compound_name)

Get the peak areas for the given compound in every sample.

get_compound_areas(compound_name)[source]

Get the peak areas for the given compound in every sample.

Parameters

compound_name (str)

Return type

List[float]
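Given the nested layout inherited from BaseSamplePropertyDict, collecting one compound's areas amounts to one lookup per sample. The compound_areas function and the data are hypothetical. The sketch assumes every sample has an entry for the compound, since the handling of missing compounds is not described here.

```python
from collections import OrderedDict
from typing import List

# Hypothetical data: sample name -> {compound name -> peak area}
areas = OrderedDict([
    ("Sample A", {"Compound X": 1200.0, "Compound Y": 350.5}),
    ("Sample B", {"Compound X": 980.0, "Compound Y": 410.0}),
])


def compound_areas(compound_name: str) -> List[float]:
    """Return the area of one compound for each sample, in sample order."""
    return [compounds[compound_name] for compounds in areas.values()]


x_areas = compound_areas("Compound X")
```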

class SamplesScoresDict[source]

Bases: BaseSamplePropertyDict

collections.OrderedDict to store score information parsed from MassHunter results CSV files.

Methods:

get_compound_scores(compound_name)

Get the peak scores for the given compound in every sample.

get_compound_scores(compound_name)[source]

Get the peak scores for the given compound in every sample.

Parameters

compound_name (str)

Return type

List[float]

_R = TypeVar('_R', bound=Result)

Type:    TypeVar

Invariant TypeVar bound to mh_utils.csv_parser.classes.Result.

_S = TypeVar('_S', bound=Sample)

Type:    TypeVar

Invariant TypeVar bound to mh_utils.csv_parser.classes.Sample.

_SL = TypeVar('_SL', bound=SampleList)

Type:    TypeVar

Invariant TypeVar bound to mh_utils.csv_parser.classes.SampleList.