diffpy.utils.parsers package
Various utilities related to data parsing and manipulation.
For a sample data extraction workflow, see parsers example.
diffpy.utils.parsers.loaddata module
- class diffpy.utils.parsers.loaddata.TextDataLoader(minrows=10, usecols=None, skiprows=None)[source]
Bases:
object
Smart loading of text data with possibly multiple datasets.
Parameters
- minrows: int
Minimum number of rows in the first data block. (Default 10.)
- usecols: tuple
Which columns in our dataset to use. Ignores all other columns. If None (default), use all columns.
- skiprows
Rows in dataset to skip. (Currently not functional.)
- diffpy.utils.parsers.loaddata.loadData(filename, minrows=10, headers=False, hdel='=', hignore=None, **kwargs)[source]
Find and load data from a text file.
The data block is identified as the first matrix block of at least minrows rows with a constant number of columns. This works for most data files, including those generated by diffpy programs.
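The block-detection rule described above can be illustrated with a small standard-library sketch. This is not the library's implementation: the helper name find_first_block is hypothetical, rows are split on whitespace only, and comment handling is omitted.

```python
def find_first_block(lines, minrows=10):
    """Return the first run of at least `minrows` consecutive rows that
    all parse as floats and share the same column count, else None.

    Hypothetical helper for illustration; not part of diffpy.utils.
    """

    def parse(line):
        # A row qualifies only if every whitespace-separated token is a float.
        try:
            return [float(tok) for tok in line.split()]
        except ValueError:
            return None

    block = []
    for line in lines:
        row = parse(line)
        if row and block and len(row) == len(block[0]):
            # Same column count as the current candidate block: extend it.
            block.append(row)
            continue
        if len(block) >= minrows:
            # The candidate block just ended and is long enough.
            return block
        # Start a new candidate block, or reset on a non-numeric line.
        block = [row] if row else []
    return block if len(block) >= minrows else None
```

With minrows=3, a file whose first numeric run has only two rows is skipped past, and the first run of three or more equally wide rows is returned.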
Parameters
- filename
Name of the file we want to load data from.
- minrows: int
Minimum number of rows in the first data block. All rows must have the same number of floating point values.
- headers: bool
When False (default), the function returns a numpy array of the data in the data block.
When True, the function instead returns a dictionary of parameters and their corresponding values parsed from the header (information prior to the data block). See hdel and hignore for options to help with parsing header information.
- hdel: str
(Only used when headers is enabled.) Delimiter for parsing header information (default '='). For example, with the default hdel, the line 'parameter = p_value' is put into the dictionary as {parameter: p_value}.
- hignore: list
(Only used when headers is enabled.) Ignore header rows beginning with any element of hignore. For example, hignore=['# ', '['] causes the following lines to be skipped: '# qmax=10', '[defaults]'.
- kwargs:
Keyword arguments passed to numpy.loadtxt, including those listed below. (See numpy.loadtxt for more details.) Only pass kwargs accepted by numpy.loadtxt.
Useful kwargs
- comments: str, sequence of str
The characters or list of characters used to indicate the start of a comment (default ‘#’). Comment lines are ignored.
- delimiter: str
Delimiter for the data in the block (default: whitespace). For comma-separated data blocks, set delimiter to ','.
- unpack: bool
Return data as a sequence of columns, which allows tuple unpacking such as x, y = loadData(FILENAME, unpack=True). Note that transposing the loaded array, as in loadData(FILENAME).T, has the same effect.
- usecols:
Zero-based indices of the columns to load. By default, all detected columns are used. Data blocks that lack the columns specified by usecols are skipped.
Returns
- data_block: ndarray
A numpy array containing the found data block. (This is not returned if headers is enabled.)
- hdata: dict
If headers are enabled, return a dictionary of parameters read from the header.
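The roles of hdel and hignore can be sketched with a minimal header parser. This is an illustration under stated assumptions, not the code loadData runs: the name parse_header is hypothetical, values are kept as stripped strings, and lines without the delimiter are silently skipped.

```python
def parse_header(lines, hdel="=", hignore=None):
    """Collect 'key <hdel> value' lines into a dictionary.

    Hypothetical helper for illustration; not part of diffpy.utils.
    """
    prefixes = hignore or []
    hdata = {}
    for line in lines:
        # Skip any row that begins with one of the ignored prefixes.
        if any(line.startswith(prefix) for prefix in prefixes):
            continue
        # Split on the first occurrence of the delimiter only.
        key, sep, value = line.partition(hdel)
        if not sep:
            continue  # no delimiter found, so nothing to record
        hdata[key.strip()] = value.strip()
    return hdata


header = ["# qmax=10", "[defaults]", "wavelength = 0.1", "composition = Ni"]
print(parse_header(header, hignore=["# ", "["]))
```

Without hignore, the line '# qmax=10' would contribute a '# qmax' key, which is the situation hignore is meant to avoid.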
diffpy.utils.parsers.serialization module
- diffpy.utils.parsers.serialization.deserialize_data(filename, filetype=None)[source]
Load a dictionary from a serial file.
Parameters
- filename
Serial file to load from.
- filetype
For specifying extension type (i.e. ‘.json’).
Returns
- dict
A dictionary read from a serial file.
- diffpy.utils.parsers.serialization.serialize_data(filename, hdata: dict, data_table, dt_colnames=None, show_path=True, serial_file=None)[source]
Serialize file data into a dictionary. Can also save dictionary into a serial language file. Dictionary is formatted as {filename: data}.
Requires hdata and data_table (can be generated by loadData).
Parameters
- filename
Name of the file whose data is being serialized.
- hdata: dict
File metadata (generally related to data table).
- data_table: list or ndarray
Data table.
- dt_colnames: list
Names of each column in data_table. Every name in dt_colnames will be put into the dictionary as a key whose value is the corresponding column of data_table (stored as a list). Put None for columns without names. If dt_colnames has fewer non-None entries than there are columns in data_table, the pair {'data table': data_table} will be put in the dictionary. (Default None: only the entry {'data table': data_table} will be added to the dictionary.)
- show_path: bool
Include a path element in the database entry (default True). If 'path' is not included in hdata, extract the path from filename.
- serial_file
Serial language file to dump the dictionary into. If None (default), no dumping will occur.
Returns
- dict:
Returns the dictionary loaded from/into the updated database file.
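The dictionary layout described for dt_colnames can be sketched with the standard json module. The helper build_entry is hypothetical, the table is assumed to be a list of rows, and the path handling controlled by show_path is left out; the real function's output may differ in detail.

```python
import json


def build_entry(filename, hdata, data_table, dt_colnames=None):
    """Assemble a {filename: data} dictionary from metadata and a table.

    Hypothetical helper for illustration; not part of diffpy.utils.
    """
    data = dict(hdata)  # start from a copy of the header metadata
    ncols = len(data_table[0]) if len(data_table) else 0
    names = list(dt_colnames or [])[:ncols]
    for idx, name in enumerate(names):
        if name is not None:
            # Store each named column as a plain list under its name.
            data[name] = [row[idx] for row in data_table]
    named = sum(name is not None for name in names)
    if named < ncols:
        # Some columns are unnamed, so keep the whole table as well.
        data["data table"] = [list(row) for row in data_table]
    return {filename: data}


entry = build_entry("sample.gr", {"qmax": 10.0}, [[0.0, 1.0], [0.1, 2.0]], ["r", None])
text = json.dumps(entry)   # what a serial file would contain
restored = json.loads(text)  # what deserializing it would give back
```

Round-tripping through JSON returns an equal dictionary here because the entry holds only strings, floats, and lists.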
diffpy.utils.parsers.resample module
Various utilities related to data parsing and manipulation.
- diffpy.utils.parsers.resample.resample(r, s, dr)[source]
Resample a PDF on a new grid.
This uses the Whittaker-Shannon interpolation formula to put s on a new grid if dr is less than the sampling interval of r, or linear interpolation if dr is greater than the sampling interval of r.
Parameters
- r
The r-grid used for s.
- s
The signal to be resampled.
- dr
The new sampling interval.
Returns
Returns resampled (r, s).
- diffpy.utils.parsers.resample.wsinterp(x, xp, fp, left=None, right=None)[source]
One-dimensional Whittaker-Shannon interpolation.
This uses the Whittaker-Shannon interpolation formula to interpolate the value of fp (array), which is defined over xp (array), at x (array or float).
Parameters
- x: ndarray
Desired range for interpolation.
- xp: ndarray
Defined range for fp.
- fp: ndarray
Function to be interpolated.
- left: float
If given, set fp for x < xp[0] to left. Otherwise, if left is None (default) or not given, set fp for x < xp[0] to fp evaluated at xp[0].
- right: float
If given, set fp for x > xp[-1] to right. Otherwise, if right is None (default) or not given, set fp for x > xp[-1] to fp evaluated at xp[-1].
Returns
- float:
If input x is a scalar (not an array), return the interpolated value at x.
- ndarray:
If input x is an array, return the interpolated array with dimensions of x.
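For reference, the Whittaker-Shannon formula sums the samples weighted by a normalized sinc centred on each grid point. The pure-Python sketch below handles a scalar x on a uniform grid. The name ws_interp is hypothetical, and falling back to the nearest edge sample outside the grid is an assumption of this sketch, not a statement about the library's behaviour.

```python
import math


def sinc(v):
    """Normalized sinc: sin(pi v) / (pi v), with sinc(0) = 1."""
    return 1.0 if v == 0 else math.sin(math.pi * v) / (math.pi * v)


def ws_interp(x, xp, fp, left=None, right=None):
    """Whittaker-Shannon interpolation of fp, sampled on the uniform
    grid xp, at a scalar x.

    Hypothetical helper for illustration; not part of diffpy.utils.
    """
    if x < xp[0]:
        # Assumed fallback: nearest edge sample when left is not given.
        return fp[0] if left is None else left
    if x > xp[-1]:
        return fp[-1] if right is None else right
    step = xp[1] - xp[0]  # uniform spacing is assumed
    # Each sample contributes fp[n] * sinc((x - xp[n]) / step).
    return sum(f * sinc((x - p) / step) for p, f in zip(xp, fp))
```

At a grid point every sinc term but one vanishes (up to rounding), so the sketch reproduces the sampled value there.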