diffpy.snmf package

A Python package implementing the stretched NMF algorithm.

Submodules

diffpy.snmf.subroutines module

diffpy.snmf.subroutines.construct_component_matrix(components)[source]

Constructs the component matrix.

Parameters:

components (tuple of ComponentSignal objects) – The tuple containing the component signals in ComponentSignal objects.

Returns:

The matrix containing the component signal values. Has dimensions signal_length x number_of_components.

Return type:

2d array
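A minimal sketch of the layout described above, assuming each component exposes its signal values through an `iq` attribute as listed for ComponentSignal in diffpy.snmf.containers. `FakeComponent` and `stack_components` are hypothetical stand-ins for illustration and are not part of diffpy.snmf.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class FakeComponent:
    """Hypothetical stand-in for ComponentSignal; only `iq` is used here."""
    iq: np.ndarray


def stack_components(components):
    """Place each component's signal values in its own column (N x K)."""
    return np.column_stack([c.iq for c in components])


components = (FakeComponent(np.array([1.0, 2.0, 3.0])),
              FakeComponent(np.array([4.0, 5.0, 6.0])))
component_matrix = stack_components(components)
print(component_matrix.shape)  # (3, 2): signal_length x number_of_components
```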

diffpy.snmf.subroutines.construct_stretching_matrix(components, number_of_components, number_of_signals)[source]

Constructs the stretching factor matrix.

Parameters:
  • components (tuple of ComponentSignal objects) – The tuple containing the component signals in ComponentSignal objects.

  • number_of_components (int) – The number of component signals in the decomposition.

  • number_of_signals (int) – The number of signals in the data provided by the user.

Returns:

The matrix containing the stretching factors for the component signals for each of the signals in the raw data. Has dimensions number_of_components x number_of_signals.

Return type:

2d array

diffpy.snmf.subroutines.construct_weight_matrix(components)[source]

Constructs the weights matrix.

Constructs a K x M matrix where K is the number of components and M is the number of signals. Each element is the weight of a specific component for a specific signal from the data input.

Parameters:

components (tuple of ComponentSignal objects) – The tuple containing the component signals.

Returns:

The 2d array containing the weightings for each component for each signal.

Return type:

2d array like

diffpy.snmf.subroutines.get_residual_matrix(component_matrix, weights_matrix, stretching_matrix, data_input, moment_amount, component_amount, signal_length)[source]

Obtains the residual matrix between the experimental data and calculated data.

Calculates the difference between the experimental data and the reconstructed experimental data created from the calculated components, weights, and stretching factors. For each experimental pattern, the stretched and weighted components making up that pattern are subtracted.

Parameters:
  • component_matrix (2d array like) – The matrix containing the calculated component signals. Has dimensions N x K where N is the length of the signal and K is the number of calculated component signals.

  • weights_matrix (2d array like) – The matrix containing the calculated weights of the stretched component signals. Has dimensions K x M where K is the number of components and M is the number of moments or experimental PDF/XRD patterns.

  • stretching_matrix (2d array like) – The matrix containing the calculated stretching factors of the calculated component signals. Has dimensions K x M where K is the number of components and M is the number of moments or experimental PDF/XRD patterns.

  • data_input (2d array like) – The matrix containing the experimental PDF/XRD data. Has dimensions N x M where N is the length of the signals and M is the number of signal patterns.

  • moment_amount (int) – The number of patterns in the experimental data. Represents the number of moments in time in the data series.

  • component_amount (int) – The number of component signals the user would like to obtain from the experimental data.

  • signal_length (int) – The length of the signals in the experimental data.

Returns:

The matrix containing the residual between the experimental data and reconstructed data from calculated values. Has dimensions N x M where N is the signal length and M is the number of moments. Each column contains the difference between an experimental signal and a reconstruction of that signal from the calculated weights, components, and stretching factors.

Return type:

2d array like
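The residual described above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the sign convention (data minus reconstruction) follows the description, the helper names `stretch` and `residual_sketch` are hypothetical, and stretching is approximated with linear interpolation via `numpy.interp`.

```python
import numpy as np


def stretch(component, factor):
    """Approximate x(r / factor) on the integer grid 0..N-1, zero past the end."""
    grid = np.arange(len(component))
    return np.interp(grid / factor, grid, component, left=0.0, right=0.0)


def residual_sketch(component_matrix, weights_matrix, stretching_matrix, data_input):
    """Subtract every stretched, weighted component from each data column."""
    residual = np.array(data_input, dtype=float)  # copy of the N x M data
    n_components, n_moments = weights_matrix.shape
    for m in range(n_moments):
        for k in range(n_components):
            stretched = stretch(component_matrix[:, k], stretching_matrix[k, m])
            residual[:, m] -= weights_matrix[k, m] * stretched
    return residual


component_matrix = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]])  # N x K
weights_matrix = np.array([[0.5, 1.0], [0.5, 0.0]])                            # K x M
stretching_matrix = np.ones((2, 2))                                            # K x M
data_input = component_matrix @ weights_matrix                                 # N x M

residual = residual_sketch(component_matrix, weights_matrix, stretching_matrix, data_input)
print(residual.shape)  # (4, 2)
```

With all stretching factors equal to 1 and data built exactly from the components and weights, every entry of the residual is zero.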

diffpy.snmf.subroutines.get_stretched_component(stretching_factor, component, signal_length)[source]

Applies a stretching factor to a component signal.

Computes a stretched signal and reinterpolates it onto the original grid of points. Uses a normalized grid of evenly spaced integers counting from 0 to signal_length (exclusive) to approximate values in between grid nodes. Once this grid is stretched, values at grid nodes past the unstretched signal’s domain are set to zero. Returns the approximate values of x(r/a) from x(r) where x is a component signal.

Parameters:
  • stretching_factor (float) – The stretching factor of a component signal at a particular moment.

  • component (1d array like) – The calculated component signal without stretching or weighting. Has length N, the length of the signal.

  • signal_length (int) – The length of the component signal.

Returns:

The calculated component signal with stretching factors applied. Has length N, the length of the unstretched component signal. Also returns the gradient and Hessian of the stretching transformation.

Return type:

tuple of 1d array of floats
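A minimal sketch of the stretching step, assuming linear interpolation on the integer grid 0, 1, ..., signal_length - 1. The helper name `stretch_sketch` is hypothetical, and the sketch returns only the stretched signal, not the gradient or Hessian mentioned above.

```python
import numpy as np


def stretch_sketch(stretching_factor, component, signal_length):
    """Return approximate values of x(r / a) sampled on the original grid."""
    grid = np.arange(signal_length)
    # Points that fall past the end of the unstretched signal are set to zero.
    return np.interp(grid / stretching_factor, grid, component, left=0.0, right=0.0)


component = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
wider = stretch_sketch(2.0, component, 5)     # a > 1 spreads the signal out
narrower = stretch_sketch(0.5, component, 5)  # a < 1 compresses it
print(wider)     # [0.  0.5 1.  1.5 2. ]
print(narrower)  # [0. 2. 4. 0. 0.]
```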

diffpy.snmf.subroutines.initialize_arrays(number_of_components, number_of_moments, signal_length)[source]

Generates the initial guesses for the weight, stretching, and component matrices.

Calculates the initial guesses for the component matrix, stretching factor matrix, and weight matrix. The initial guess for the component matrix is a random (signal_length) x (number_of_components) matrix where each element is between 0 and 1. The initial stretching factor matrix is a random (number_of_components) x (number_of_moments) matrix where each element is a number slightly perturbed from 1. The initial weight matrix guess is a random (number_of_components) x (number_of_moments) matrix where each element is between 0 and 1.

Parameters:
  • number_of_components (int) – The number of component signals to obtain from the stretched nmf decomposition.

  • number_of_moments (int) – The number of signals in the user provided dataset where each signal is at a different moment.

  • signal_length (int) – The length of each signal in the user provided dataset.

Returns:

The tuple containing three elements: the initial component matrix guess, the initial stretching factor matrix guess, and the initial weight factor matrix guess in that order.

Return type:

tuple of 2d arrays of floats
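The shapes and value ranges described above can be sketched as follows. The function name `initial_guesses`, the `seed` argument, and the perturbation size of 1e-3 are assumptions made for illustration; the source does not state how large the perturbation from 1 is.

```python
import numpy as np


def initial_guesses(number_of_components, number_of_moments, signal_length, seed=0):
    """Return (component, stretching, weight) initial guesses, in that order."""
    rng = np.random.default_rng(seed)
    component_matrix = rng.random((signal_length, number_of_components))   # values in [0, 1)
    weights_matrix = rng.random((number_of_components, number_of_moments))  # values in [0, 1)
    # Assumed perturbation scale: uniform within +/- 1e-3 of 1.
    stretching_matrix = 1.0 + rng.uniform(
        -1e-3, 1e-3, (number_of_components, number_of_moments))
    return component_matrix, stretching_matrix, weights_matrix


components0, stretching0, weights0 = initial_guesses(2, 5, 100)
print(components0.shape, stretching0.shape, weights0.shape)  # (100, 2) (2, 5) (2, 5)
```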

diffpy.snmf.subroutines.initialize_components(number_of_components, number_of_signals, grid_vector)[source]

Initializes ComponentSignals for each of the components in the decomposition.

Parameters:
  • number_of_components (int) – The number of component signals in the NMF decomposition.

  • number_of_signals (int) – The number of signals in the data provided by the user.

  • grid_vector (1d array) – The grid of the user provided signals.

Returns:

The tuple containing number_of_components of initialized ComponentSignal objects.

Return type:

tuple of ComponentSignal objects

diffpy.snmf.subroutines.lift_data(data_input, lift=1)[source]

Lifts values of data_input.

Adds ‘lift’ * the minimum value in data_input to data_input element-wise.

Parameters:
  • data_input (2d array like) – The matrix containing a series of signals to be decomposed. Has dimensions N x M where N is the length of each signal and M is the number of signals.

  • lift (float) – The factor representing how much to lift ‘data_input’.

Returns:

The matrix that contains data_input - (min(data_input) * lift).

Return type:

2d array like
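A minimal sketch following the expression given under Returns, data_input - (min(data_input) * lift). The name `lift_sketch` is hypothetical. With the default lift of 1 and a negative minimum, this raises the smallest value in the data to zero.

```python
import numpy as np


def lift_sketch(data_input, lift=1):
    """Shift every element by lift times the global minimum of the data."""
    data_input = np.asarray(data_input, dtype=float)
    return data_input - np.min(data_input) * lift


data = np.array([[-2.0, 0.0],
                 [1.0, 3.0]])
lifted = lift_sketch(data)
print(lifted)
# [[0. 2.]
#  [3. 5.]]
```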

diffpy.snmf.subroutines.objective_function(residual_matrix, stretching_factor_matrix, smoothness, smoothness_term, component_matrix, sparsity)[source]

Defines the objective function of the algorithm and returns its value.

Calculates the value of ‘(||residual_matrix||_F) ** 2 + smoothness * (||smoothness_term * stretching_factor_matrix.T||)**2 + sparsity * sum(component_matrix ** .5)’ and returns its value.

Parameters:
  • residual_matrix (2d array like) – The matrix where each column is the difference between an experimental PDF/XRD pattern and a calculated PDF/XRD pattern at each grid point. Has dimensions R x M where R is the length of each pattern and M is the number of patterns.

  • stretching_factor_matrix (2d array like) – The matrix containing the stretching factors of the calculated component signal. Has dimensions K x M where K is the number of components and M is the number of experimental PDF/XRD patterns.

  • smoothness (float) – The coefficient of the smoothness term which determines the intensity of the smoothness term and its behavior. It is not very sensitive and is usually adjusted by multiplying it by ten.

  • smoothness_term (2d array like) – The regularization term that ensures that smooth changes in the component stretching signals are favored. Has dimensions (M-2) x M where M is the number of experimentally obtained PDF/XRD patterns, the moment amount.

  • component_matrix (2d array like) – The matrix containing the calculated component signals of the experimental PDF/XRD patterns. Has dimensions R x K where R is the signal length and K is the number of component signals.

  • sparsity (float) – The parameter determining the intensity of the sparsity regularization term which enables the algorithm to exploit the sparse nature of XRD data. It is usually adjusted by doubling.

Returns:

The value of the objective function.

Return type:

float
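The expression quoted above can be evaluated term by term as in the sketch below. Two points are assumptions: the second norm is taken to be the Frobenius norm, and the example builds smoothness_term as a second-difference operator of shape (M-2) x M, which matches the stated dimensions but is not confirmed by this page. The name `objective_sketch` is hypothetical.

```python
import numpy as np


def objective_sketch(residual_matrix, stretching_factor_matrix, smoothness,
                     smoothness_term, component_matrix, sparsity):
    """Sum of the residual, smoothness, and sparsity terms as quoted above."""
    residual_part = np.linalg.norm(residual_matrix, "fro") ** 2
    smooth_part = smoothness * np.linalg.norm(
        smoothness_term @ stretching_factor_matrix.T, "fro") ** 2
    sparse_part = sparsity * np.sum(np.sqrt(component_matrix))
    return residual_part + smooth_part + sparse_part


M, K, R = 4, 2, 3
residual_matrix = np.ones((R, M))                        # squared Frobenius norm is 12
stretching_factor_matrix = np.ones((K, M))               # constant in time, so no penalty
smoothness_term = np.diff(np.eye(M), n=2, axis=0)        # assumed (M-2) x M second difference
component_matrix = 4.0 * np.ones((R, K))                 # sqrt is 2 everywhere, sum is 12

value = objective_sketch(residual_matrix, stretching_factor_matrix, 10.0,
                         smoothness_term, component_matrix, 0.5)
print(value)  # 18.0
```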

diffpy.snmf.subroutines.reconstruct_data(components)[source]

Reconstructs the input_data matrix.

Reconstructs the input_data matrix from calculated component signals, weights, and stretching factors.

Parameters:

components (tuple of ComponentSignal objects) – The tuple containing the component signals.

Returns:

The 2d array containing the reconstruction of input_data.

Return type:

2d array

diffpy.snmf.subroutines.reconstruct_signal(components, signal_idx)[source]

Reconstructs a specific signal from its weighted and stretched components.

Calculates the linear combination of stretched components where each term is the stretched component multiplied by its weight factor.

Parameters:
  • components (tuple of ComponentSignal objects) – The tuple containing the ComponentSignal objects.

  • signal_idx (int) – The index of the specific signal in the input data to be reconstructed.

Returns:

The reconstruction of a signal from calculated weights, stretching factors, and iq values.

Return type:

1d array like

diffpy.snmf.subroutines.update_weights(components, data_input, method=None)[source]

Updates the weights matrix.

Updates the weights matrix and the weights vector for each ComponentSignal object.

Parameters:
  • components (tuple of ComponentSignal objects) – The tuple containing the component signals.

  • method (str) – The string specifying which method should be used to find a new weight matrix: non-negative least squares or a quadratic program.

  • data_input (2d array) – The 2d array containing the user-provided signals.

Returns:

The 2d array containing the weight factors for each component for each signal from data_input. Has dimensions K x M where K is the number of components and M is the number of signals in data_input.

Return type:

2d array

diffpy.snmf.subroutines.update_weights_matrix(component_amount, signal_length, stretching_factor_matrix, component_matrix, data_input, moment_amount, weights_matrix, method)[source]

Update the weight factors matrix.

Parameters:
  • component_amount (int) – The number of component signals the user would like to determine from the experimental data.

  • signal_length (int) – The length of the experimental signal patterns.

  • stretching_factor_matrix (2d array like) – The matrix containing the stretching factors of the calculated component signals. Has dimensions K x M where K is the number of component signals and M is the number of XRD/PDF patterns.

  • component_matrix (2d array like) – The matrix containing the unstretched calculated component signals. Has dimensions N x K where N is the length of the signals and K is the number of component signals.

  • data_input (2d array like) – The experimental series of PDF/XRD patterns. Has dimensions N x M where N is the length of the PDF/XRD signals and M is the number of PDF/XRD patterns.

  • moment_amount (int) – The number of PDF/XRD patterns from the experimental data.

  • weights_matrix (2d array like) – The matrix containing the weights of the stretched component signals. Has dimensions K x M where K is the number of component signals and M is the number of XRD/PDF patterns.

  • method (str) – The string specifying the method for obtaining individual weights.

Returns:

The matrix containing the new weight factors of the stretched component signals.

Return type:

2d array like

diffpy.snmf.containers module

class diffpy.snmf.containers.ComponentSignal(grid, number_of_signals, id_number, perturbation=0.001)[source]

Bases: object

grid

The vector containing the grid points of the component.

Type:

1d array of floats

iq

The intensity/g(r) values of the component.

Type:

1d array of floats

weights

The vector containing the weight of the component signal for each signal.

Type:

1d array of floats

stretching_factors

The vector containing the stretching factor for the component signal for each signal.

Type:

1d array of floats

id

The number identifying the component.

Type:

int

apply_stretch(m)[source]

Applies a stretching factor to a component.

Parameters:

m (int) – The index specifying which stretching factor to apply.

Returns:

The tuple of vectors where one vector is the stretched component, one vector is the first derivative of the stretching operation, and one vector is the second derivative of the stretching operation.

Return type:

tuple of 1d arrays

apply_weight(m, stretched_component=None)[source]

Applies a weight factor to a component signal.

Parameters:
  • m (int) – The index specifying which weight to apply.

  • stretched_component (1d array) – The 1d array containing a stretched component.

Returns:

The vector containing a component signal or stretched component signal with a weight factor applied.

Return type:

1d array

diffpy.snmf.io module

diffpy.snmf.io.initialize_variables(data_input, number_of_components, data_type, sparsity=1, smoothness=1e+18)[source]

Determines the variables and initial values used in the SNMF algorithm.

Parameters:
  • data_input (2d array like) – The observed or simulated PDF or XRD data provided by the user. Has dimensions R x N where R is the signal length and N is the number of PDF/XRD signals.

  • number_of_components (int) – The number of component signals the user would like to decompose ‘data_input’ into.

  • data_type (str) – The type of data the user has passed into the program. Can assume the value of ‘PDF’ or ‘XRD’.

  • sparsity (float, optional) – The regularization parameter that behaves as the coefficient of a “sparseness” regularization term that enhances the ability to decompose signals in the case of sparse data e.g. X-ray Diffraction data. A non-zero value indicates sparsity in the data; greater magnitudes indicate greater amounts of sparsity.

  • smoothness (float, optional) – The regularization parameter that behaves as the coefficient of a “smoothness” term that ensures that component signal weightings change smoothly with time. Assumes a default value of 1e18.

Returns:

The collection of the names and values of the constants used in the algorithm. Contains the number of observed PDF/XRD patterns, the length of each pattern, the type of the data, the number of components the user would like to decompose the data into, an initial guess for the component matrix, an initial guess for the weight factor matrix, an initial guess for the stretching factor matrix, a parameter controlling smoothness of the solution, a parameter controlling sparseness of the solution, the matrix representing the smoothness term, and a matrix used to construct a Hessian matrix.

Return type:

dictionary

diffpy.snmf.io.load_input_signals(file_path=None)[source]

Processes a directory of a series of PDF/XRD patterns into a usable format.

Constructs a 2d array out of a directory of PDF/XRD patterns, placing each file’s dependent variable column in a new column. Constructs a 1d array containing the grid values.

Parameters:

file_path (str or Path object, optional) – The path to the directory containing the input XRD/PDF data. If no path is specified, defaults to the current working directory. Accepts a string or a pathlib.Path object. Input data not on the same grid as the first file read will be ignored.

Returns:

The tuple whose first element is an R x M 2d array with one PDF/XRD pattern per column, where R is the length of the signal and M is the number of patterns. The second element is a 1d array of length R containing the values of the grid points.

Return type:

tuple
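The loading behavior described above can be sketched as follows. This is an illustration only: it assumes every file is plain two-column, whitespace-delimited text with no header, which may not match real PDF/XRD files, and the name `load_patterns` is hypothetical. Files whose grid differs from that of the first file read are skipped, as the description states.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_patterns(directory):
    """Stack the second column of each file that shares the first file's grid."""
    grid, columns = None, []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        x, y = np.loadtxt(path, unpack=True)
        if grid is None:
            grid = x  # the first file read defines the reference grid
        if x.shape != grid.shape or not np.allclose(x, grid):
            continue  # ignore data that is not on the reference grid
        columns.append(y)
    return np.column_stack(columns), grid


with tempfile.TemporaryDirectory() as tmp:
    r = np.linspace(0.0, 1.0, 5)
    np.savetxt(Path(tmp) / "a.gr", np.column_stack([r, r ** 2]))
    np.savetxt(Path(tmp) / "b.gr", np.column_stack([r, 2 * r]))
    np.savetxt(Path(tmp) / "c.gr", np.column_stack([r[:3], r[:3]]))  # different grid
    data, grid = load_patterns(tmp)

print(data.shape, grid.shape)  # (5, 2) (5,)
```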

diffpy.snmf.polynomials module

diffpy.snmf.polynomials.rooth(linear_coefficient, constant_term)[source]

Returns the largest real root of x^3 + (linear_coefficient) * x + constant_term. If the roots are not all real, returns 0.

Parameters:
  • linear_coefficient (nd array like of floats) – The matrix coefficient of the linear term

  • constant_term (0d array like, 1d array like of floats or scalar) – The constant scalar term of the problem

Returns:

The largest real root of x^3 + (linear_coefficient) * x + constant_term if the roots are real; otherwise an array of zeros.

Return type:

ndarray of floats
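A minimal sketch of this behavior, assuming that "roots are real" means all three roots are real, which for the depressed cubic x^3 + p*x + q holds when (q/2)^2 + (p/3)^3 is negative. The name `largest_real_root` is hypothetical, and the sketch takes constant_term as a scalar.

```python
import numpy as np


def largest_real_root(linear_coefficient, constant_term):
    """Element-wise largest root of x^3 + p*x + q, or 0 where roots are not all real."""
    p = np.asarray(linear_coefficient, dtype=float)
    out = np.zeros_like(p)
    for index, value in np.ndenumerate(p):
        # A negative discriminant expression means three distinct real roots.
        if (constant_term / 2) ** 2 + (value / 3) ** 3 < 0:
            roots = np.roots([1.0, 0.0, value, constant_term])
            out[index] = np.max(np.real(roots))
    return out


# x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3) has largest root 2;
# x^3 + x + 6 has only one real root, so the sketch returns 0 for it.
result = largest_real_root(np.array([-7.0, 1.0]), 6.0)
print(result)
```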

diffpy.snmf.optimizers module

diffpy.snmf.optimizers.get_weights(stretched_component_gram_matrix, linear_coefficient, lower_bound, upper_bound)[source]

Finds the weights of stretched component signals under a two-sided constraint.

Solves min J(y) = (linear_coefficient)’ * y + (1/2) * y’ * (stretched_component_gram_matrix) * y where lower_bound <= y <= upper_bound and stretched_component_gram_matrix is symmetric positive definite.

Parameters:
  • stretched_component_gram_matrix (2d array like) – The Gram matrix constructed from the stretched component matrix. Has dimensions C x C where C is the number of component signals. Must be symmetric positive definite.

  • linear_coefficient (1d array like) – The vector containing the product of the stretched component matrix and the transpose of the observed data matrix. Has length C.

  • lower_bound (1d array like) – The lower bound on the values of the output weights. Has the same dimensions as the function output. Each element in ‘lower_bound’ determines the minimum value the corresponding element in the function output may take.

  • upper_bound (1d array like) – The upper bound on the values of the output weights. Has the same dimensions as the function output. Each element in ‘upper_bound’ determines the maximum value the corresponding element in the function output may take.

Returns:

The vector containing the weightings of the components needed to reconstruct a given input signal from the input set. Has length C.

Return type:

1d array like
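The box-constrained quadratic program above can be illustrated with projected gradient descent, a generic technique swapped in here for clarity; this page does not say which solver diffpy.snmf uses. The name `box_qp_sketch` and the `iterations` argument are hypothetical.

```python
import numpy as np


def box_qp_sketch(gram, linear_coefficient, lower_bound, upper_bound, iterations=500):
    """Minimize c'y + 0.5 * y'Qy subject to lower_bound <= y <= upper_bound."""
    gram = np.asarray(gram, dtype=float)
    c = np.asarray(linear_coefficient, dtype=float)
    # A step of 1 / (largest eigenvalue) guarantees descent for a symmetric
    # positive definite Q.
    step = 1.0 / np.linalg.eigvalsh(gram).max()
    y = np.clip(np.zeros_like(c), lower_bound, upper_bound)
    for _ in range(iterations):
        gradient = gram @ y + c
        # Take a gradient step, then project back onto the box.
        y = np.clip(y - step * gradient, lower_bound, upper_bound)
    return y


gram = np.array([[2.0, 0.0],
                 [0.0, 2.0]])
linear_coefficient = np.array([-1.0, -8.0])
y = box_qp_sketch(gram, linear_coefficient, np.zeros(2), np.ones(2))
print(y)  # [0.5 1. ]
```

The unconstrained minimizer of this example is (0.5, 4), so the second weight is held at its upper bound of 1 while the first stays in the interior.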

diffpy.snmf.factorizers module

diffpy.snmf.factorizers.lsqnonneg(stretched_component_matrix, target_signal)[source]

Finds the weights of stretched component signals under a one-sided constraint.

Solves argmin_x || Ax - b ||_2 for x>=0 where A is the stretched_component_matrix and b is the target_signal vector. Finds the weights of component signals given undecomposed signal data and stretched components under a one-sided constraint on the weights.

Parameters:
  • stretched_component_matrix (2d array like) – The component matrix where each column contains a stretched component signal. Has dimensions R x C where R is the length of the signal and C is the number of components. Does not need to be nonnegative. Corresponds with ‘A’ from the objective function.

  • target_signal (1d array like) – The signal that is used as reference against which weight factors will be determined. Any column from the matrix of the entire, unfactorized input data could be used. Has length R. Does not need to be nonnegative. Corresponds with ‘b’ from the objective function.

Returns:

The vector containing component signal weights at a moment. Has length C.

Return type:

1d array like
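The same non-negative least squares problem, argmin_x || Ax - b ||_2 for x >= 0, can be illustrated with `scipy.optimize.nnls`. This shows the problem being solved and is not a claim about how lsqnonneg is implemented. The matrix and target below are made-up values.

```python
import numpy as np
from scipy.optimize import nnls

# Columns of A play the role of stretched component signals (R = 3, C = 2).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
# The target signal does not need to be nonnegative.
b = np.array([2.0, -1.0, 1.0])

weights, residual_norm = nnls(A, b)
print(weights)  # [1.5 0. ]
```

The unconstrained least squares solution here is (2, -1). Because the second weight would be negative, the constrained solution sets it to zero and refits the first weight alone.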

diffpy.snmf.stretchednmfapp module

diffpy.snmf.stretchednmfapp.create_parser()[source]
diffpy.snmf.stretchednmfapp.main()[source]