Writing a script with diffpy.cmi

This guide provides a step-by-step walkthrough for creating a basic script using the diffpy.cmi framework. We will follow the linefit example so you can familiarize yourself with the structure of a CMI script.

If you would like, copy the linefit example to your current working directory by running,

cmi copy linefit

To demonstrate how to fit data in diffpy.cmi, we will start by trying to fit a simple line to some synthetic data.

First, lets install the plotting pack if you haven’t already done so.

cmi install plotting

Now, open up a new Python script and start by importing the necessary modules,

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import leastsq

from diffpy.srfit.fitbase import (
    FitContribution,
    FitRecipe,
    FitResults,
    Profile,
)

Generate Synthetic Data

Let’s generate some synthetic data to fit, This data will be a simple line with some added noise,

# Generate synthetic data
xobs = np.arange(-10, 10.1)
sigma_yobs = 0.3 * np.ones_like(xobs) * np.random.randn(xobs.size) # random error
yobs = 0.5 * xobs + 3 + sigma_yobs

Lets now visualize the data by plotting it,

plt.figure(figsize=(10, 6))
plt.plot(xobs, yobs, marker='o', linestyle="none", label="Observed Data")
plt.legend()
plt.show()

You should see a scatter plot that looks something like this,

`Profile`

Great, now lets begin setting up our fitting framework. We will start by creating a Profile to hold our data. This is done to encapsulate the observed data so that it can be used in the fitting process.

# Create a Profile to hold the data
linedata = Profile()
linedata.setObservedProfile(xobs, yobs, sigma_yobs)

`FitContribution`

Now it’s time to define our model in a FitContribution. A FitContribution is a container that holds the model function and its associated parameters.

Setting Fit Equation

In this case, our model is a simple linear function defined by the equation y = Ax + B,

# Define a FitContribution: linear model A*x + B
linefit = FitContribution("linefit")
linefit.setProfile(linedata)
linefit.setEquation("A * x + B")

To see the parameters we just created, we can print them out,

linefit.show()

This should output something like,

Parameters
------------------------------------------------------------------------------
x   [-10.  -9.  -8.  -7.  -6.  -5.  -4.  -3.  -2.  -1.   0.   1.   2.   3.
y   [-1.8053485  -1.56263035 -1.11627164 -0.4137478   0.68210214  0.58758801
dy  [0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
B   None
A   None

Initialize Fitting Parameters

Note that A and B are set to None since we haven’t assigned them any initial values yet. To do that, we can set them like this,

linefit.A = 3
linefit.B = 5
print("Initial A value:", linefit.A.value)
print("Initial B value:", linefit.B.value)

This should output,

Initial A value: 3
Initial B value: 5

`FitRecipe`

Now that we have our model defined and initlialized, we can create a FitRecipe. The FitRecipe is the main container that holds all the FitContributions and manages the fitting process.

# Create a FitRecipe to manage fitting
recipe = FitRecipe()

Adding `FitContribution` to `FitRecipe`

We can now add our linefit contribution to the recipe and print out the parameters,

recipe.clearFitHooks()
recipe.addContribution(linefit)
recipe.show()

This should output something like,

Parameters
------------------------------------------------------------------------------
linefit.x   [-10.  -9.  -8.  -7.  -6.  -5.  -4.  -3.  -2.  -1.   0.   1.   2.
linefit.y   [-1.67605745 -1.19031421 -1.40162608 -0.90829441  0.38607679  0.44
linefit.dy  [0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0
linefit.B   5
linefit.A   3

Now you can see that the parameters are prefixed with linefit., indicating they belong to the linefit contribution. Also, note that A and B now have initial values assigned.

Add Variables to Fit

With everything set up, we can now mark the parameters we want to refine during the fitting process. This is done using the .addVar() method of the FitRecipe,

recipe.addVar(recipe.linefit.A)
recipe.addVar(recipe.linefit.B)

To see the variables that will be refined, we can print them out,

print("recipe.A =", recipe.A)
print("recipe.A.value =", recipe.A.value)
print("recipe.values =", recipe.values)
print("recipe.names =", recipe.names)
print("recipe.residual() =", recipe.residual())

This should output something like,

recipe.A = ParameterProxy(A)
recipe.A.value = 3
recipe.values = [3 5]
recipe.names = ['A', 'B']
recipe.residual() = [-78.18212598 -67.62039727 -60.63792074 -52.12278541 -41.97888213
-34.76069913 -27.49568671 -17.88327019 -10.00997001  -1.21702776
6.79454602  15.44637505  21.39664018  32.98689369  40.26685555
47.96728622  57.81169457  65.48416201  73.74878531  81.40669851
90.08765793]

The recipe.residual() method computes the difference between the observed data and the model prediction. As you can see, the residuals are quite large at this point since we haven’t performed any fitting yet.

To better visualize the residuals, we can plot them,

ycalc = linefit.A.value * xobs + linefit.B.value

plt.figure(figsize=(10, 6))
plt.plot(xobs, yobs, marker='o', linestyle='None', label="Observed Data")
plt.plot(xobs, ycalc, label="Calculated Fit: Ax + B")
plt.legend()
plt.show()

You should see a plot that looks something like this,

As you can see, the initial fit is quite poor, which is expected since we haven’t refined the parameters yet. So… lets refine them!

Perform the Least-Squares Fit

Now we are ready to perform the least squares fitting using the leastsq function from scipy.optimize. The leastsq function minimizes the sum of squares between the observed data and the model prediction by adjusting the parameters we marked for refinement. The least-squares algorithm is given by,

\[\text{minimize} \sum_{i} \left( y_{\text{obs}, i} - y_{\text{calc}, i} \right)^2\]

This is what is known as the objective function. The objective function is a function that you want to minimize (or maximize) during the fitting process. In this case, the squared difference between the calculated and observed values is what the algorithm will minimize. In theory, our objective function could be anything. You could even customize your own! But for now, we will use least-squares to optimize our fit.

To perform a least-squares regression, we pass the residuals (recipe.residual), which represent the differences to minimize, as a function of the current parameter values (recipe.values) to leastsq.

leastsq(recipe.residual, recipe.values)
print("After leastsq:", recipe.names, "-->", recipe.values)
linefit.show()

This should output something like,

After leastsq: ['A', 'B'] --> [0.51240164 3.00692156]
Parameters
------------------------------------------------------------------------------
x   [-10.  -9.  -8.  -7.  -6.  -5.  -4.  -3.  -2.  -1.   0.   1.   2.   3.
y   [-2.33763938 -1.9018065  -0.9653354  -0.63162421  0.22997984 -0.17955301
dy  [0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
A   0.512401638015315
B   3.006921559716823

You can see that the parameters A and B have been updated to values that better fit the data. Lets print the residual sum of squares to see how well the fit performed,

print("recipe.residual() =", recipe.residual())

This should output something like,

recipe.residual() = [ 0.83916097  0.6429651  -0.76778838 -0.89453514  0.42300657  0.97663332
-1.50946072 -0.21983212 -0.3464077  -1.84471305 -0.11995115  1.14584239
1.04321154  0.92606764  0.53831321 -0.06905766 -0.02106872 -0.78551651
1.3405239  -0.01692016 -1.28047336]

As you can see, the residuals have significantly decreased, indicating a much better fit to the data.

Lets visualize the final fit by plotting the observed data and the fitted line,

ycalc = linefit.A.value * xobs + linefit.B.value

plt.figure(figsize=(10, 6))
plt.plot(xobs, yobs, marker='o', linestyle='None', label="Observed Data")
plt.plot(xobs, ycalc, label="Fitted Line: Ax + B")
plt.legend()
plt.show()

You should see a plot that looks something like this,

Plot of observed data and final fitted line

Wow! Look at that fit!

`FitResults`

To summarize the fitting results, we can create a FitResults object and pass our recipe to it,

# Create FitResults to summarize the fit
results = FitResults(recipe)
print(results)

This should output something like,

Overall
------------------------------------------------------------------------------
Residual       12.59360793
Contributions  12.59360793
Restraints     0.00000000
Chi2           12.59360793
Reduced Chi2   0.66282147
Rw             0.05372518

Variables
------------------------------------------------------------------------------
A  5.18663792e-01 +/- 1.08112496e-02
B  2.96331208e+00 +/- 6.54653670e-02

Variable Correlations greater than 25%
------------------------------------------------------------------------------
No correlations greater than 25%

The value Rw is a measure of the quality of the fit, with lower values indicating a better fit.

How to Fix Parameters

Note

Because in this example we are fitting a simple line with only two parameters, we can select any values for A and B and still get a good fit. However, we demonstrate how to fix parameters for educational purposes.

There are two ways to fix parameters in diffpy.cmi, the first one is by using the .fix() method on a parameter,

recipe.fix(B=0)
print("Free:", recipe.names, "-->", recipe.values)
print("Fixed:", recipe.fixednames, "-->", recipe.fixedvalues)

This should output something like,

Free: ['A'] --> [0.51315825]
Fixed: ['B'] --> [0]

The other way to fix parameters is by setting fixed=True when adding the variable with .addVar() to the recipe,

recipe.addVar(recipe.linefit.B, fixed=True)
print("Free:", recipe.names, "-->", recipe.values)
print("Fixed:", recipe.fixednames, "-->", recipe.fixedvalues)

This should output something like,

Free: ['A'] --> [0.49144738]
Fixed: ['B'] --> [5]

Tips for Fitting Data

Before we wrap up this tutorial, here are some tips on selecting and refining parameters for fitting.

Selecting Good Parameters

Because the leastsq regressor is a local optimization, you typically need to provide good starting values for the parameters. The process of finding good starting values is an art which will be carefully crafted over time as you gain experience with fitting data. Use your scientific intuition to select good starting values!

Refining Strategy

Oftentimes, you need to fix certain parameters during the fitting process to find a solution that better fits your data. This is a common practice in data fitting to improve the stability of the fit.

Typically, the process goes like this:

Refining Strategy

Good initial guess: Start with good initial guesses for all parameters.
Refine on Parameters: Refine all (or a subset of) parameters. If you see large correlations or instability, try again with a different initial guess or a different subset of parameters.
Fixing Parameters: Fix some parameters based on the results from step 2.
Refine again: Refine the remaining free parameters.

This process is where you spend most of your time when fitting data.

Words of Encouragement

Remember, all the computer knows how to do is minimize a value. It has no understanding of the physical meaning of the parameters. That is why we need smart people like yourself to guide the fitting process!

What’s Next?

Now that you have a basic understanding of how to write a script using diffpy.cmi, you can explore more advanced examples such as fitting a gaussian (gaussianfit example) or fitting intensity (intensityfit example) data with diffpy.cmi. To see available examples, run

cmi info examples

To copy an example to your current working directory, use the command

cmi copy <example-name>

Happy fitting!

Writing a script with diffpy.cmi

Generate Synthetic Data

Profile

FitContribution

Setting Fit Equation

Initialize Fitting Parameters

FitRecipe

Adding FitContribution to FitRecipe

Add Variables to Fit

Perform the Least-Squares Fit

FitResults

How to Fix Parameters

Tips for Fitting Data

Selecting Good Parameters

Refining Strategy

Words of Encouragement

What’s Next?

`Profile`

`FitContribution`

`FitRecipe`

Adding `FitContribution` to `FitRecipe`

`FitResults`