diffpy.utils.parsers package

Various utilities related to data parsing and manipulation.

For a sample data extraction workflow, see parsers example.

diffpy.utils.parsers.loaddata module

class diffpy.utils.parsers.loaddata.TextDataLoader(minrows=10, usecols=None, skiprows=None)[source]

Bases: object

Smart loading of text data that may contain multiple datasets.

Parameters

minrows: int

Minimum number of rows in the first data block. (Default 10.)

usecols: tuple

Which columns in our dataset to use. Ignores all other columns. If None (default), use all columns.

skiprows

Rows in dataset to skip. (Currently not functional.)

read(filename)[source]

Open a file and run readfp.

Use this when the file has not already been opened for reading in binary mode.

readfp(fp, append=False)[source]

Get file details.

File details include:
  • File name.

  • All data blocks findable by loadData.

  • Headers (if present) for each data block. (Generally, the headers contain column name information.)

diffpy.utils.parsers.loaddata.isfloat(s)[source]

True if s is convertible to float.
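A check of this kind is commonly written as a try/except around float(). The following is a minimal sketch of the idea, not the library's source; the name is_float_like is hypothetical.

```python
def is_float_like(s):
    """Return True if float(s) succeeds, False otherwise (hypothetical helper)."""
    try:
        float(s)
    except (TypeError, ValueError):
        # ValueError: text that is not a number; TypeError: e.g. None
        return False
    return True


# Classify a few sample tokens such as might appear in a data file.
checks = {token: is_float_like(token) for token in ["1.5", "-2e3", "nan", "qmax", ""]}
```

Note that strings such as "nan" and "inf" are accepted by float(), so they count as convertible in this sketch.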

diffpy.utils.parsers.loaddata.loadData(filename, minrows=10, headers=False, hdel='=', hignore=None, **kwargs)[source]

Find and load data from a text file.

The data block is identified as the first matrix block with at least minrows rows and a constant number of columns. This approach works for most data files, including those generated by diffpy programs.
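The block-detection rule described above can be sketched in plain Python. This is an illustration under stated assumptions (whitespace-separated fields, no comment handling), not the library's implementation; find_first_block and count_floats are hypothetical names.

```python
def find_first_block(lines, minrows=10):
    """Return (start, stop) indices of the first run of at least `minrows`
    consecutive lines that all hold the same number of floats, else None."""

    def count_floats(line):
        # Number of fields if every field parses as a float, otherwise 0.
        fields = line.split()
        try:
            for field in fields:
                float(field)
        except ValueError:
            return 0
        return len(fields)

    start = None  # index where the current candidate run began
    ncols = 0     # column count of the current candidate run
    for i, line in enumerate(lines):
        n = count_floats(line)
        if n and n == ncols:
            continue  # same column count: the run continues
        # The run is broken here; accept it if it was long enough.
        if start is not None and i - start >= minrows:
            return start, i
        start, ncols = (i, n) if n else (None, 0)
    # Handle a run that extends to the end of the input.
    if start is not None and len(lines) - start >= minrows:
        return start, len(lines)
    return None


sample = ["# header", "qmax = 10", "1 2", "3 4", "5 6", "end"]
block = find_first_block(sample, minrows=3)
```

With the sample lines above, the three numeric rows form the only candidate block, so a larger minrows (for instance the default of 10) finds nothing.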

Parameters

filename

Name of the file we want to load data from.

minrows: int

Minimum number of rows in the first data block. All rows must have the same number of floating point values.

headers: bool

When False (default), the function returns a numpy array of the data in the data block.

When True, the function instead returns a dictionary of parameters and their corresponding values parsed from the header (the information preceding the data block). See hdel and hignore for options to help with parsing header information.

hdel: str

(Only used when headers is enabled.) Delimiter for parsing header information (default ‘=’). For example, with the default hdel, the line ‘parameter = p_value’ is put into the dictionary as {parameter: p_value}.

hignore: list

(Only used when headers is enabled.) Ignore header rows beginning with any of the elements in hignore. For example, hignore=["# ", "["] causes the following lines to be skipped: ‘# qmax=10’, ‘[defaults]’.

kwargs:

Keyword arguments that are passed to numpy.loadtxt, including those listed below. (See numpy.loadtxt for more details.) Pass only keyword arguments that numpy.loadtxt accepts.

Useful kwargs

comments: str, sequence of str

The characters or list of characters used to indicate the start of a comment (default ‘#’). Comment lines are ignored.

delimiter: str

Delimiter for the data in the block (default: whitespace). For comma-separated data blocks, set delimiter to ‘,’.

unpack: bool

Return data as a sequence of columns, which allows tuple unpacking such as x, y = loadData(FILENAME, unpack=True). Note that transposing the loaded array as loadData(FILENAME).T has the same effect.

usecols:

Zero-based indices of the columns to load; by default, all detected columns are used. The reading skips data blocks that do not have the usecols-specified columns.
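Because these keyword arguments are forwarded to numpy.loadtxt, their effect can be previewed by calling numpy.loadtxt directly. The sketch below uses an in-memory sample (the text and variable names are made up for illustration) rather than loadData itself.

```python
import io

import numpy as np

# A small comma-separated sample with one comment line (made-up data).
text = "# r, g, extra\n1.0,10.0,0.1\n2.0,20.0,0.2\n3.0,30.0,0.3\n"

# Full array: one row per data line; the line starting with '#' is skipped.
table = np.loadtxt(io.StringIO(text), delimiter=",", comments="#")

# usecols selects zero-based column indices; unpack=True returns columns,
# which allows tuple unpacking.
r, g = np.loadtxt(io.StringIO(text), delimiter=",", usecols=(0, 1), unpack=True)
```

Here table.T[0] holds the same values as r, which mirrors the note on transposing above.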

Returns

data_block: ndarray

A numpy array containing the found data block. (This is not returned if headers is enabled.)

hdata: dict

If headers are enabled, return a dictionary of parameters read from the header.
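The roles of hdel and hignore in building this dictionary can be sketched as follows. This is an illustration only: parse_header is a hypothetical helper, and it keeps every value as a string, which is an assumption rather than documented loadData behavior.

```python
def parse_header(lines, hdel="=", hignore=None):
    """Build a {parameter: value} dict from header lines (hypothetical helper)."""
    prefixes = tuple(hignore or ())
    hdata = {}
    for line in lines:
        # Skip rows that begin with any of the ignored prefixes.
        if prefixes and line.startswith(prefixes):
            continue
        # Split on the first occurrence of the delimiter only.
        key, sep, value = line.partition(hdel)
        if not sep:
            continue  # no delimiter on this line, so nothing to record
        hdata[key.strip()] = value.strip()
    return hdata


header = ["# qmax=10", "[defaults]", "parameter = p_value", "wavelength = 0.1"]
parsed = parse_header(header, hignore=["# ", "["])
```

Without hignore, the first sample line would be recorded under the key "# qmax", which is why prefix filtering is useful for commented headers.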

diffpy.utils.parsers.serialization module

diffpy.utils.parsers.serialization.deserialize_data(filename, filetype=None)[source]

Load a dictionary from a serial file.

Parameters

filename

Serial file to load from.

filetype

Extension type of the serial file (e.g. ‘.json’).

Returns

dict

A dictionary read from a serial file.

diffpy.utils.parsers.serialization.serialize_data(filename, hdata: dict, data_table, dt_colnames=None, show_path=True, serial_file=None)[source]

Serialize file data into a dictionary. Optionally, also save the dictionary to a serial language file. The dictionary is formatted as {filename: data}.

Requires hdata and data_table (can be generated by loadData).

Parameters

filename

Name of the file whose data is being serialized.

hdata: dict

File metadata (generally related to data table).

data_table: list or ndarray

Data table.

dt_colnames: list

Names of the columns in data_table. Every name in dt_colnames will be put into the dictionary as a key whose value is the corresponding column of data_table (stored as a list). Put None for columns without names. If dt_colnames has fewer non-None entries than there are columns in data_table, the pair {‘data table’: data_table} will be put in the dictionary. (Default None: only the entry {‘data table’: data_table} will be added to the dictionary.)

show_path: bool

Include a path element in the database entry (default True). If ‘path’ is not included in hdata, extract the path from filename.

serial_file

Serial language file to dump the dictionary into. If None (default), no dumping will occur.

Returns

dict:

Returns the dictionary loaded from/into the updated database file.
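The dictionary layout described above can be sketched with the standard json module. This is not the library's code: build_entry is a hypothetical helper, the path handling of show_path is left out, and placing the hdata items alongside the column entries is an assumption.

```python
import json


def build_entry(filename, hdata, data_table, dt_colnames=None):
    """Assemble {filename: data} from metadata and a list-of-rows table."""
    data = dict(hdata)  # assumption: metadata sits beside the column entries
    names = dt_colnames or []
    ncols = len(data_table[0]) if data_table else 0
    named = 0
    for index, name in enumerate(names[:ncols]):
        if name is None:
            continue  # unnamed column: no key of its own
        data[name] = [row[index] for row in data_table]
        named += 1
    # Fewer names than columns: keep the whole table under 'data table'.
    if named < ncols:
        data["data table"] = [list(row) for row in data_table]
    return {filename: data}


hdata = {"qmax": 10.0}
table = [[1.0, 0.5], [2.0, 0.7]]
entry = build_entry("sample.gr", hdata, table, dt_colnames=["r", "gr"])
partial = build_entry("sample.gr", hdata, table, dt_colnames=["r", None])

# The entry survives a JSON round trip unchanged.
round_trip = json.loads(json.dumps(entry))
```

The sketch takes data_table as a list of rows; a numpy array would first need converting, for example with its tolist() method, before it can be dumped as JSON.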

diffpy.utils.parsers.resample module

Various utilities related to data parsing and manipulation.

diffpy.utils.parsers.resample.resample(r, s, dr)[source]

Resample a PDF on a new grid.

This uses the Whittaker-Shannon interpolation formula to put s on a new grid if dr is less than the sampling interval of r, or linear interpolation if dr is greater than the sampling interval of r.

Parameters

r

The r-grid on which s is defined.

s

The signal to be resampled.

dr

The new sampling interval.

Returns

Returns resampled (r, s).
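The linear-interpolation case (dr larger than the existing spacing) can be sketched with numpy.interp. This is a minimal illustration and not the library's code; coarsen_linear is a hypothetical name, and the choice of new grid (starting at r[0] and including the end point) is an assumption.

```python
import numpy as np


def coarsen_linear(r, s, dr):
    """Linearly interpolate s(r) onto a coarser grid with spacing dr."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    # New grid from r[0] up to r[-1]; the half-step keeps the end point.
    r_new = np.arange(r[0], r[-1] + 0.5 * dr, dr)
    s_new = np.interp(r_new, r, s)
    return r_new, s_new


r = np.linspace(0.0, 1.0, 11)  # original spacing 0.1
s = 2.0 * r                    # a straight line, reproduced by linear interpolation
r_new, s_new = coarsen_linear(r, s, 0.2)
```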

diffpy.utils.parsers.resample.wsinterp(x, xp, fp, left=None, right=None)[source]

One-dimensional Whittaker-Shannon interpolation.

This uses the Whittaker-Shannon interpolation formula to interpolate the value of fp (array), which is defined over xp (array), at x (array or float).

Parameters

x: ndarray

Desired range for interpolation.

xp: ndarray

Defined range for fp.

fp: ndarray

Function to be interpolated.

left: float

If given, set fp for x < xp[0] to left. Otherwise, if left is None (default) or not given, set fp for x < xp[0] to fp evaluated at xp[0].

right: float

If given, set fp for x > xp[-1] to right. Otherwise, if right is None (default) or not given, set fp for x > xp[-1] to fp evaluated at xp[-1].

Returns

float:

If input x is a scalar (not an array), return the interpolated value at x.

ndarray:

If input x is an array, return the interpolated array with dimensions of x.
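For orientation, the Whittaker-Shannon formula sums the samples fp weighted by a sinc kernel centred on each grid point of xp. The sketch below assumes a uniformly spaced xp and omits the left and right handling; ws_interpolate is a hypothetical name, not the library function.

```python
import numpy as np


def ws_interpolate(x, xp, fp):
    """Whittaker-Shannon interpolation of fp (on uniform grid xp) at x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xp = np.asarray(xp, dtype=float)
    fp = np.asarray(fp, dtype=float)
    step = xp[1] - xp[0]  # assumes uniform spacing
    # kernel[i, j] = sinc((x[i] - xp[j]) / step), where
    # np.sinc(t) is the normalized sinc, sin(pi * t) / (pi * t).
    kernel = np.sinc((x[:, None] - xp[None, :]) / step)
    # Each output value is the kernel-weighted sum of all samples.
    return kernel @ fp


xp = np.arange(0.0, 10.0, 0.5)
fp = np.sin(xp)
on_grid = ws_interpolate(xp, xp, fp)  # evaluating on the grid returns the samples
```

Unlike the documented function, this sketch always returns an array, even for scalar x.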