diffpy.utils.parsers package

Various utilities related to data parsing and manipulation.

Submodules

diffpy.utils.parsers.loaddata module

class diffpy.utils.parsers.loaddata.TextDataLoader(minrows=10, usecols=None, skiprows=None)

Bases: object

Smart loading of text data that may contain multiple datasets.

Parameters:

minrows (int) – Minimum number of rows in the first data block. (Default 10.)
usecols (tuple) – Which columns in our dataset to use. Ignores all other columns. If None (default), use all columns.
skiprows – Rows in dataset to skip. (Currently not functional.)

read(filename)

Open a file and run readfp.

Use this when the file is not already open in binary read mode.

readfp(fp, append=False)

Get file details.

File details include:

File name.
All data blocks findable by loadData.
Headers (if present) for each data block. (Generally the headers contain column name information).

diffpy.utils.parsers.loaddata.loadData(filename, minrows=10, headers=False, hdel='=', hignore=None, **kwargs)

Find and load data from a text file.

The data block is identified as the first matrix block with at least minrows rows and a constant number of columns. This heuristic works for most data files, including those generated by diffpy programs.
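The block-detection rule described above can be sketched in plain Python. This is only an illustration of the stated rule, not the library's implementation: the helper names parse_row and find_first_block are hypothetical, and the sketch assumes whitespace-delimited columns.

```python
def parse_row(line):
    """Return the line as a list of floats, or None if it is not purely numeric."""
    fields = line.split()
    if not fields:
        return None
    try:
        return [float(field) for field in fields]
    except ValueError:
        return None


def find_first_block(lines, minrows=10):
    """Return the first run of at least `minrows` consecutive numeric rows
    that all have the same number of columns, or None if there is none.

    Hypothetical helper: it mirrors the rule in the text above only.
    """
    block = []
    for line in lines:
        row = parse_row(line)
        # Extend the current block while the column count stays constant.
        if row is not None and (not block or len(row) == len(block[0])):
            block.append(row)
            continue
        # The run ended: accept it if it is long enough.
        if len(block) >= minrows:
            return block
        # Otherwise start over, possibly from the current numeric row.
        block = [row] if row is not None else []
    return block if len(block) >= minrows else None


lines = ["qmax = 10", "1 2 3", "0.0 1.0", "0.1 1.1", "0.2 1.2", "end"]
block = find_first_block(lines, minrows=3)
```

Here the single three-column row is too short to count as a block, so the sketch returns the three two-column rows that follow it.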

Parameters:

filename – Name of the file we want to load data from.
minrows (int) – Minimum number of rows in the first data block. All rows must have the same number of floating point values.
headers (bool) – When False (default), the function returns a numpy array of the data in the data block. When True, the function instead returns a dictionary of parameters and their corresponding values parsed from the header (the information prior to the data block). See hdel and hignore for options to help with parsing header information.
hdel (str) – (Only used when headers enabled.) Delimiter for parsing header information (default ‘=’). e.g. using default hdel, the line ‘ parameter = p_value’ is put into the dictionary as {parameter: p_value}.
hignore (list) – (Only used when headers enabled.) Ignore header rows beginning with any element of hignore. e.g. hignore=['# ', '['] causes the following lines to be skipped: ‘# qmax=10’, ‘[defaults]’.
kwargs – Keyword arguments passed to numpy.loadtxt; pass only keywords that numpy.loadtxt accepts. (See numpy.loadtxt for more details.) Useful keyword arguments include the following:
comments (str, sequence of str) – The characters or list of characters used to indicate the start of a comment (default ‘#’). Comment lines are ignored.
delimiter (str) – Delimiter for the data in the block (default: whitespace). For comma-separated data blocks, set delimiter to ‘,’.
unpack (bool) – Return data as a sequence of columns that allows tuple unpacking such as x, y = loadData(FILENAME, unpack=True). Note transposing the loaded array as loadData(FILENAME).T has the same effect.
usecols – Zero-based indices of the columns to load; by default, all detected columns are used. Reading skips data blocks that do not have the usecols-specified columns.

Returns:

data_block (ndarray) – A numpy array containing the found data block. (This is not returned if headers is enabled.)
hdata (dict) – If headers are enabled, return a dictionary of parameters read from the header.
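The roles of hdel and hignore can be illustrated with a small standalone sketch. It is not the library's parser: parse_header is a hypothetical helper, and it keeps every value as a string, whereas the actual function may convert values to other types.

```python
def parse_header(lines, hdel="=", hignore=None):
    """Collect 'key <hdel> value' header lines into a dictionary.

    Hypothetical helper illustrating the hdel and hignore options;
    values are left as strings (an assumption made for this sketch).
    """
    prefixes = hignore or []
    hdata = {}
    for line in lines:
        # Skip rows that begin with any of the ignored prefixes.
        if any(line.startswith(prefix) for prefix in prefixes):
            continue
        key, sep, value = line.partition(hdel)
        if not sep:
            # No delimiter on this line, so it carries no parameter.
            continue
        hdata[key.strip()] = value.strip()
    return hdata


header_lines = ["# qmax=10", "[defaults]", " parameter = p_value", "free text"]
hdata = parse_header(header_lines, hdel="=", hignore=["# ", "["])
```

With these inputs only the line ‘ parameter = p_value’ contributes an entry; the first two lines match an ignored prefix and the last has no delimiter.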

diffpy.utils.parsers.custom_exceptions module

exception diffpy.utils.parsers.custom_exceptions.ImproperSizeError(bad_object, message=None)

Bases: Exception

When the size of an object does not match expectations.

Parameters:

bad_object – Object with improper size.
message (str) – Overwrites default message.

exception diffpy.utils.parsers.custom_exceptions.UnsupportedTypeError(file, supported_types=None, message=None)

Bases: Exception

For file types not supported by our parsers.

Parameters:

file – Name of file triggering the error.
supported_types (list) – Supported file types.
message (str) – Overwrites default message.
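Both exceptions take an optional message that replaces a default one. The pattern can be sketched with a stand-in class. SizeMismatchError and its default wording are hypothetical, chosen for illustration; they are not the classes or messages defined in this module.

```python
class SizeMismatchError(Exception):
    """Hypothetical stand-in showing the 'message overrides default' pattern."""

    def __init__(self, bad_object, message=None):
        if message is None:
            # Assumed default wording, used only in this sketch.
            message = f"Object of size {len(bad_object)} has an unexpected size."
        self.bad_object = bad_object
        super().__init__(message)


default_text = str(SizeMismatchError([1, 2, 3]))
custom_text = str(SizeMismatchError([1, 2, 3], message="Expected two columns."))
```

Keeping a reference to the offending object on the exception lets a caller inspect it after catching the error.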

diffpy.utils.parsers.serialization module

diffpy.utils.parsers.serialization.deserialize_data(filename, filetype=None)

Load a dictionary from a serial file.

Parameters:

filename – Serial file to load from.
filetype – For specifying the extension type (e.g. ‘.json’).

Returns:

A dictionary read from a serial file.

Return type:

dict
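As a rough picture of what loading a dictionary from a serial file involves, the sketch below uses only the standard json module. The helper load_serial, the SUPPORTED tuple, and the choice of ValueError are assumptions made for this example; the real function may resolve the file type and report errors differently.

```python
import json
from pathlib import Path

SUPPORTED = (".json",)  # assumed set of extensions for this sketch


def load_serial(filename, filetype=None):
    """Load a dictionary from a JSON file.

    Hypothetical helper: the extension comes from `filetype` when given,
    otherwise from the file name.
    """
    ext = filetype if filetype is not None else Path(filename).suffix
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported file type {ext!r}; expected one of {SUPPORTED}.")
    with open(filename, "r", encoding="utf-8") as fp:
        return json.load(fp)
```

Passing filetype explicitly is useful when the file name lacks a recognizable extension.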

diffpy.utils.parsers.serialization.serialize_data(filename, hdata: dict, data_table, dt_colnames=None, show_path=True, serial_file=None)

Serialize file data into a dictionary. Can also save the dictionary to a serial language file. The dictionary is formatted as {filename: data}.

Requires hdata and data_table (can be generated by loadData).

Parameters:

filename – Name of the file whose data is being serialized.
hdata (dict) – File metadata (generally related to data table).
data_table (list or ndarray) – Data table.
dt_colnames (list) – Names of each column in data_table. Every name in dt_colnames will be put into the dictionary as a key whose value is the corresponding column of data_table (stored as a list). Put None for columns without names. If dt_colnames has fewer non-None entries than there are columns in data_table, the pair {‘data table’: data_table} will be put in the dictionary. (Default None: only the entry {‘data table’: data_table} will be added to the dictionary.)
show_path (bool) – Include a path element in the database entry (default True). If ‘path’ is not included in hdata, extract the path from filename.
serial_file – Serial language file to dump dictionary into. If None (default), no dumping will occur.

Returns:

Returns the dictionary loaded from/into the updated database file.

Return type:

dict
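The {filename: data} layout described above can be sketched with plain Python lists and the standard json module. The helper build_entry is hypothetical and omits the path element and file dumping; it only mirrors the dt_colnames rules as stated in the parameter list, assuming data_table is a list of rows.

```python
import json


def build_entry(filename, hdata, data_table, dt_colnames=None):
    """Build a {filename: data} dictionary from header data and a table.

    Hypothetical helper: named columns become keys holding lists, and the
    whole table is stored under 'data table' when some columns lack names.
    """
    data = dict(hdata)
    ncols = len(data_table[0]) if data_table else 0
    named = 0
    for idx, name in enumerate(dt_colnames or []):
        if name is None or idx >= ncols:
            continue
        # Store the column as a plain list so it can be serialized.
        data[name] = [row[idx] for row in data_table]
        named += 1
    if named < ncols:
        data["data table"] = [list(row) for row in data_table]
    return {filename: data}


table = [[0.0, 1.0], [0.1, 1.1]]
named_entry = build_entry("scan.dat", {"qmax": 10}, table, dt_colnames=["r", "gr"])
unnamed_entry = build_entry("scan.dat", {"qmax": 10}, table)
as_text = json.dumps(named_entry)
```

Because every value is a plain list, number, or string, the resulting dictionary can be written to a JSON file without further conversion.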