Advanced structure solution: parallel process & testing multiple configurations#
You should first try the cimetidine structure solution notebook before this one.
In this notebook, you can try solving a structure while:
testing different spacegroups and adapting the crystal contents (number of independent molecules)
using multiple parallel process to go faster
This is an example of meta-structure solution, ‘meta’ meaning that instead if having a single description for the contents of you crystal (spacegroup, number of molecules, atoms or polyhedra), you can try any different combinations using python. It requires a little programming but can be very powerful when several choices for the configuration of your structure are possible.
For this to work you need to install the following packages:
multiprocess
ipywidgets
andpy3dmol
(optional)
[1]:
%matplotlib widget
import os
import sys
import io
import timeit
# Note: we need 'multiprocess' instead of standard multiprocessing because of the
# following issue (specific to ipython or notebooks):
# https://bugs.python.org/issue25053
# https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
#
# In a python script (not in ipython or a notebook) multiprocessing could be used instead
try:
from multiprocess import Pool, current_process
except ImportError:
print("Please install `multiprocess` using 'pip', 'conda' or 'mamba' to run this notebook")
print()
sys.exit()
import pyobjcryst
import numpy as np
import matplotlib.pyplot as plt
from pyobjcryst.crystal import *
from pyobjcryst.powderpattern import *
from pyobjcryst.indexing import *
from pyobjcryst.molecule import *
from pyobjcryst.globaloptim import MonteCarlo
from pyobjcryst.io import xml_cryst_file_save_global
try:
import ipywidgets as widgets
except ImportError:
widgets = None
try:
# Get the real number of processor cores available - requires psutil
# os.sched_getaffinity is only available on some *nix platforms
import psutil
nproc = len(os.sched_getaffinity(0)) * psutil.cpu_count(logical=False) // psutil.cpu_count(logical=True)
except:
nproc = os.cpu_count()
print("Number of processors or cores available: ", nproc)
Number of processors or cores available: 8
Create powder pattern object, index & fit profile#
Same as the cimetidine structure solution notebook, so read that for details
[2]:
import os
p = PowderPattern()
data_dir = "./data"
dat_path = os.path.join(data_dir, "cime.dat")
if not os.path.exists(dat_path):
os.makedirs(data_dir, exist_ok=True)
os.system("curl -o {} https://raw.githubusercontent.com/vincefn/objcryst/master/Fox/example/tutorial-cimetidine/cime.dat".format(dat_path))
p.ImportPowderPatternFullprof(dat_path)
p.SetWavelength(1.52904)
# Index
pl = p.FindPeaks(1.5, -1, 1000, verbose=False)
if len(pl) > 20:
pl.resize(20) # Only keep 20 peaks
ex = quick_index(pl, verbose=False)
print("Indexed unit cell:")
for s in ex.GetSolutions():
print(s)
# Use solution to create a crystal
uc = ex.GetSolutions()[0][0].DirectUnitCell()
c = Crystal(uc[0], uc[1], uc[2], uc[3], uc[4], uc[5], "P1")
pdiff = p.AddPowderPatternDiffraction(c)
# Fit profile
p.SetMaxSinThetaOvLambda(0.3)
p.quick_fit_profile(auto_background=True,plot=False, init_profile=True,verbose=True)
p.quick_fit_profile(plot=False, init_profile=False, asym=True, displ_transl=True, verbose=False)
# Plot
p.plot(diff=True, fig=None, hkl=True)
print("Fit result: Rw=%6.2f%% Chi2=%10.2f GoF=%8.2f LLK=%10.3f" %
(p.rw * 100, p.chi2, p.chi2/p.GetNbPointUsed(), p.llk))
Imported powder pattern: 7699 points, 2theta= 8.010 -> 84.990, step= 0.010
Indexed unit cell:
( 6.83 18.82 10.39 90.0 106.4 90.0 V=1281 MONOCLINIC P, 130.0296630859375)
No background, adding one automatically
Selected PowderPatternDiffraction: with Crystal:
Profile fitting finished.
Remember to use SetExtractionMode(False) on the PowderPatternDiffraction object
to disable profile fitting and optimise the structure.
Fit result: Rw= 5.51% Chi2= 34095.69 GoF= 4.43 LLK= 6397.991
Find the spacegroup#
We’ll use part of the list of possible spacegroups as options to test
[3]:
p.SetMaxSinThetaOvLambda(0.2) # Important for stability of profile fit. And faster !
spgex = SpaceGroupExplorer(pdiff)
# NB:verbose C++ output does not appear in a notebook
spgex.RunAll(keep_best=True, update_display=False, fitprofile_p1=False)
for sol in spgex.GetScores():
#if sol.nGoF > 4 * spgex.GetScores()[0].nGoF:
if sol.GoF <= 2 * spgex.GetScores()[0].GoF:
print(sol)
print("Chosen spacegroup (smallest nGoF): ", c.GetSpaceGroup())
Beginning spacegroup exploration... 37 to go...
P 1 21/c 1 nGoF= 1.4866 GoF= 13.575 Rw= 6.50 [ 92 reflections, extinct446= 17]
P 1 21 1 nGoF= 1.6488 GoF= 14.108 Rw= 6.61 [101 reflections, extinct446= 2]
P 1 21/m 1 nGoF= 1.6488 GoF= 14.108 Rw= 6.61 [101 reflections, extinct446= 2]
P 1 c 1 nGoF= 1.6649 GoF= 13.922 Rw= 6.58 [ 96 reflections, extinct446= 15]
P 1 2/c 1 nGoF= 1.6649 GoF= 13.922 Rw= 6.58 [ 96 reflections, extinct446= 15]
P 1 m 1 nGoF= 1.8392 GoF= 14.458 Rw= 6.69 [105 reflections, extinct446= 0]
P 1 2/m 1 nGoF= 1.8392 GoF= 14.458 Rw= 6.69 [105 reflections, extinct446= 0]
P 1 2 1 nGoF= 1.8392 GoF= 14.458 Rw= 6.69 [105 reflections, extinct446= 0]
P 1 nGoF= 3.2428 GoF= 14.982 Rw= 6.80 [186 reflections, extinct446= 0]
P -1 nGoF= 3.2428 GoF= 14.982 Rw= 6.80 [186 reflections, extinct446= 0]
P 1 21/n 1 nGoF= 5.4155 GoF= 26.697 Rw= 9.11 [ 92 reflections, extinct446= 19]
P 1 n 1 nGoF= 5.7672 GoF= 27.063 Rw= 9.17 [ 96 reflections, extinct446= 17]
P 1 2/n 1 nGoF= 5.7672 GoF= 27.063 Rw= 9.17 [ 96 reflections, extinct446= 17]
Chosen spacegroup (smallest nGoF): P 1 21/c 1
(# 1) P 1 : Rwp= 6.80% GoF= 14.98 nGoF= 3.24 (186 reflections, 0 extinct)
(# 2) P -1 : Rwp= 6.80% GoF= 14.98 nGoF= 3.24 (186 reflections, 0 extinct) [same extinctions as:P 1]
(# 3) P 1 2 1 : Rwp= 6.69% GoF= 14.46 nGoF= 1.84 (105 reflections, 0 extinct)
(# 4) P 1 21 1 : Rwp= 6.61% GoF= 14.11 nGoF= 1.65 (101 reflections, 2 extinct)
(# 5) C 1 2 1 : Rwp= 62.70% GoF= 1246.13 nGoF= 311.68 ( 52 reflections, 84 extinct)
(# 5) A 1 2 1 : Rwp= 62.83% GoF= 1253.74 nGoF= 313.87 ( 53 reflections, 85 extinct)
(# 5) I 1 2 1 : Rwp= 60.91% GoF= 1196.43 nGoF= 246.94 ( 52 reflections, 87 extinct)
(# 6) P 1 m 1 : Rwp= 6.69% GoF= 14.46 nGoF= 1.84 (105 reflections, 0 extinct) [same extinctions as:P 1 2 1]
(# 7) P 1 c 1 : Rwp= 6.58% GoF= 13.92 nGoF= 1.66 ( 96 reflections, 15 extinct)
(# 7) P 1 n 1 : Rwp= 9.17% GoF= 27.06 nGoF= 5.77 ( 96 reflections, 17 extinct)
(# 7) P 1 a 1 : Rwp= 9.26% GoF= 27.58 nGoF= 5.97 ( 97 reflections, 14 extinct)
(# 8) C 1 m 1 : Rwp= 62.70% GoF= 1246.13 nGoF= 311.68 ( 52 reflections, 84 extinct) [same extinctions as:C 1 2 1]
(# 8) A 1 m 1 : Rwp= 62.83% GoF= 1253.74 nGoF= 313.87 ( 53 reflections, 85 extinct) [same extinctions as:A 1 2 1]
(# 8) I 1 m 1 : Rwp= 60.91% GoF= 1196.43 nGoF= 246.94 ( 52 reflections, 87 extinct) [same extinctions as:I 1 2 1]
(# 9) C 1 c 1 : Rwp= 62.51% GoF= 1236.15 nGoF= 280.76 ( 47 reflections, 93 extinct)
(# 9) A 1 n 1 : Rwp= 62.99% GoF= 1258.62 nGoF= 291.60 ( 49 reflections, 93 extinct)
(# 9) I 1 a 1 : Rwp= 59.00% GoF= 1120.91 nGoF= 221.05 ( 48 reflections, 93 extinct)
(# 9) A 1 a 1 : Rwp= 62.99% GoF= 1258.62 nGoF= 291.60 ( 49 reflections, 93 extinct) [same extinctions as:A 1 n 1]
(# 9) C 1 n 1 : Rwp= 62.51% GoF= 1236.15 nGoF= 280.76 ( 47 reflections, 93 extinct) [same extinctions as:C 1 c 1]
(# 9) I 1 c 1 : Rwp= 59.00% GoF= 1120.91 nGoF= 221.05 ( 48 reflections, 93 extinct) [same extinctions as:I 1 a 1]
(# 10) P 1 2/m 1 : Rwp= 6.69% GoF= 14.46 nGoF= 1.84 (105 reflections, 0 extinct) [same extinctions as:P 1 2 1]
(# 11) P 1 21/m 1 : Rwp= 6.61% GoF= 14.11 nGoF= 1.65 (101 reflections, 2 extinct) [same extinctions as:P 1 21 1]
(# 12) C 1 2/m 1 : Rwp= 62.70% GoF= 1246.13 nGoF= 311.68 ( 52 reflections, 84 extinct) [same extinctions as:C 1 2 1]
(# 12) A 1 2/m 1 : Rwp= 62.83% GoF= 1253.74 nGoF= 313.87 ( 53 reflections, 85 extinct) [same extinctions as:A 1 2 1]
(# 12) I 1 2/m 1 : Rwp= 60.91% GoF= 1196.43 nGoF= 246.94 ( 52 reflections, 87 extinct) [same extinctions as:I 1 2 1]
(# 13) P 1 2/c 1 : Rwp= 6.58% GoF= 13.92 nGoF= 1.66 ( 96 reflections, 15 extinct) [same extinctions as:P 1 c 1]
(# 13) P 1 2/n 1 : Rwp= 9.17% GoF= 27.06 nGoF= 5.77 ( 96 reflections, 17 extinct) [same extinctions as:P 1 n 1]
(# 13) P 1 2/a 1 : Rwp= 9.26% GoF= 27.58 nGoF= 5.97 ( 97 reflections, 14 extinct) [same extinctions as:P 1 a 1]
(# 14) P 1 21/c 1 : Rwp= 6.50% GoF= 13.58 nGoF= 1.49 ( 92 reflections, 17 extinct)
(# 14) P 1 21/n 1 : Rwp= 9.11% GoF= 26.70 nGoF= 5.42 ( 92 reflections, 19 extinct)
(# 14) P 1 21/a 1 : Rwp= 9.20% GoF= 27.22 nGoF= 5.61 ( 93 reflections, 16 extinct)
(# 15) C 1 2/c 1 : Rwp= 62.51% GoF= 1236.15 nGoF= 280.76 ( 47 reflections, 93 extinct) [same extinctions as:C 1 c 1]
(# 15) A 1 2/n 1 : Rwp= 62.99% GoF= 1258.62 nGoF= 291.60 ( 49 reflections, 93 extinct) [same extinctions as:A 1 n 1]
(# 15) I 1 2/a 1 : Rwp= 59.00% GoF= 1120.91 nGoF= 221.05 ( 48 reflections, 93 extinct) [same extinctions as:I 1 a 1]
(# 15) A 1 2/a 1 : Rwp= 62.99% GoF= 1258.62 nGoF= 291.60 ( 49 reflections, 93 extinct) [same extinctions as:A 1 n 1]
(# 15) C 1 2/n 1 : Rwp= 62.51% GoF= 1236.15 nGoF= 280.76 ( 47 reflections, 93 extinct) [same extinctions as:C 1 c 1]
(# 15) I 1 2/c 1 : Rwp= 59.00% GoF= 1120.91 nGoF= 221.05 ( 48 reflections, 93 extinct) [same extinctions as:I 1 a 1]
Restoring best spacegroup: P 1 21/c 1
Setup the ‘meta’-structure solution#
In the spacegroup search above the first solution P 1 21/c 1
actually is the correct one, and with its multiplicity (4) only requires a single independent cimetidine molecule in the asymmetric unit cell. But let’s proceed as if we did not actually know that.
What we do is:
download the cimetidine z-matrix
create a list of the possible spacegroups that we want to test
create a function which, given a spacegroup, will :
apply the spacegroup to the crystal
determine the appropriate number of independent molecules (as a function of the multiplicity)
optimise for 5 runs and 2 million trials
return the solutions using the XML string of the Crystal
use
multiprocessing
ormultiprocess
to test the different spacegroups in parallel with multiple processor cores
Note: this is a generic approach - it would be possible also to completely change the contents of the crystal, e.g. if the number of independent units (atoms, molecule, polyhedra) was unknown, as is often the case for inorganic or metallic structures due to special positions
[ ]:
import os
# Get the cimetidine z-matrix
fhz_path = os.path.join(data_dir, "cime.fhz")
if not os.path.exists(fhz_path):
os.makedirs(data_dir, exist_ok=True)
os.system("curl -o {} https://raw.githubusercontent.com/vincefn/objcryst/master/Fox/example/tutorial-cimetidine/cime.fhz".format(fhz_path))
# Disable dynamical occupancy correction (no special position expected, usual in organic structures)
c.GetOption(1).SetChoice(0)
# Create the Monte-Carlo object with the crystal and the powder pattern
mc = MonteCarlo()
mc.AddRefinableObj(c)
mc.AddRefinableObj(p)
mc.GetOption("Automatic Least Squares Refinement").SetChoice(2)
nb_step = 1e6 # Increase this for more complex problems - it's ok if the spacegroup is correct (1 independent molecule)
# Disable profile fitting for the powder pattern (or nothing gets optimised !)
pdiff.SetExtractionMode(False)
def solve_for_spacegroup(spgname):
# Set spacegroup
spg = c.GetSpaceGroup()
spg.ChangeSpaceGroup(spgname)
# Multiplicity
mult = spg.GetNbSymmetrics()
# Empty Crystal of existing scatterers (should not be necessary here)
for i in range(c.GetNbScatterer()):
c.RemoveScatterer(c.GetScatterer(0))
# Add 4/mult independent cimetidine molecules
nb_mol = 4//mult
for i in range(nb_mol):
m = ImportFenskeHallZMatrix(c, fhz_path)
# Disable all display update if not in the Main process-or strange things happen !
if current_process().name != 'MainProcess':
# we must do that for the crystal, monte-carlo and powder pattern object
for o in (c, mc, p):
o.disable_display_update()
# Run the parallel tempering optimisation
# We could run multiple times the optimisation (using nb_run), but it is more
# efficient to distribute that using the multiprocessing pool
nb_run = 1
t0 = timeit.default_timer()
for i in range(nb_run):
mc.MultiRunOptimize(nb_run=1, nb_step=nb_step)
dt = timeit.default_timer() - t0
print("Spacegroup: %12s LLK: %12.2f dt=%3.0fs" % # /run (%3.0fs remaining)
(spgname,mc.GetLogLikelihood(), dt,)) # dt/(i+1) * (nb_run-i-1)
# Extract all solutions as the XML output of the crystal
vsol = []
for i in range(mc.GetNbParamSet()-nb_run, mc.GetNbParamSet()):
mc.RestoreParamSet(i, update_display=False)
s = c.xml()
llk = mc.GetLogLikelihood()
vsol.append({'xml': s, 'llk': llk, 'spg': spgname, 'nb_mol': nb_mol})
return vsol
Run the tests: this will take a little while (5 to 20 minutes for each spacegroup, in parallel processes), and is longer for spacegroups with lower multiplicity and not centrosymmetric (more independent atoms).
[5]:
# List of spacegroups to test (this can be larger than the number
# of available processor cores, process will loop over possible spacegroups)
v_spacegroup = ["P 1 21/c 1", "P 1 2/c 1", "P 1 c 1", "P 1 21 1",
"P 1 2 1", "P 1 m 1", "P 1 21/m 1", "P -1"] * 5
# Use a multiprocess pool to solve in parallel - this should use all available processor cores
print("Solving structures in // - this will take a little while, be patient !")
with Pool(nproc) as pool:
res = pool.map(solve_for_spacegroup, v_spacegroup)
# Merge all the results
vsol = []
for v in res:
vsol += v
Solving structures in // - this will take a little while, be patient !
Spacegroup: P 1 21/c 1 LLK: 18495.13 dt= 30s
Spacegroup: P 1 21/c 1 LLK: 18497.70 dt= 30s
Spacegroup: P 1 21/m 1 LLK: 256613.45 dt= 31s
Spacegroup: P 1 21/m 1 LLK: 297780.55 dt= 31s
Spacegroup: P 1 2/c 1 LLK: 335891.83 dt= 30s
Spacegroup: P 1 2/c 1 LLK: 288920.12 dt= 30s
Spacegroup: P 1 c 1 LLK: 91573.42 dt= 79s
Spacegroup: P 1 c 1 LLK: 72883.09 dt= 81s
Spacegroup: P 1 2 1 LLK: 236363.10 dt= 89s
Spacegroup: P 1 21/c 1 LLK: 18498.68 dt= 30s
Spacegroup: P 1 2 1 LLK: 180490.60 dt= 92s
Spacegroup: P -1 LLK: 142046.67 dt= 67s
Spacegroup: P -1 LLK: 137882.23 dt= 68s
Spacegroup: P 1 2/c 1 LLK: 268907.40 dt= 30s
Spacegroup: P 1 21/m 1 LLK: 254364.25 dt= 31s
Spacegroup: P 1 c 1 LLK: 130411.66 dt= 80s
Spacegroup: P 1 21/c 1 LLK: 18495.45 dt= 30s
Spacegroup: P 1 21 1 LLK: 95965.21 dt= 78s
Spacegroup: P 1 21 1 LLK: 104310.84 dt= 79s
Spacegroup: P 1 m 1 LLK: 313023.54 dt= 87s
Spacegroup: P 1 m 1 LLK: 291594.11 dt= 87s
Spacegroup: P 1 2/c 1 LLK: 318014.98 dt= 30s
Spacegroup: P 1 2 1 LLK: 255142.75 dt= 86s
Spacegroup: P -1 LLK: 111644.89 dt= 67s
Spacegroup: P 1 21/m 1 LLK: 273165.66 dt= 31s
Spacegroup: P 1 21/c 1 LLK: 18494.21 dt= 29s
Spacegroup: P 1 21 1 LLK: 93373.67 dt= 84s
Spacegroup: P 1 2/c 1 LLK: 338329.99 dt= 30s
Spacegroup: P 1 c 1 LLK: 93566.20 dt= 82s
Spacegroup: P 1 2 1 LLK: 203632.21 dt= 87s
Spacegroup: P 1 21/m 1 LLK: 262459.32 dt= 30s
Spacegroup: P 1 c 1 LLK: 100817.93 dt= 78s
Spacegroup: P 1 m 1 LLK: 345530.12 dt= 85s
Spacegroup: P -1 LLK: 121915.00 dt= 66s
Spacegroup: P 1 2 1 LLK: 224594.45 dt= 83s
Spacegroup: P 1 21 1 LLK: 77726.30 dt= 67s
Spacegroup: P -1 LLK: 126293.48 dt= 57s
Spacegroup: P 1 m 1 LLK: 210979.84 dt= 72s
Spacegroup: P 1 21 1 LLK: 95344.71 dt= 67s
Spacegroup: P 1 m 1 LLK: 243392.35 dt= 63s
Look at solutions#
If you have ipywidgets, just use the dropdown menu to select available solutions
[6]:
# restore the best solution (first in list)
c.XMLInput(vsol[0]['xml'])
# sort solutions as a function of the log-likelihood
vsol = sorted(vsol, key=lambda sol: sol['llk'])
if widgets is not None:
v = []
for i in range(len(vsol)):
sol = vsol[i]
v.append("#%2d %12s : %d mol LLK=%12.2f"%(i, sol['spg'], sol['nb_mol'], sol['llk']))
w = widgets.Dropdown(options=v, description='Solutions:', disabled=False,)
def show_solution(solution):
i = v.index(solution)
# Crystal display is automatically updated when loaded
c.XMLInput(vsol[i]['xml'])
# Update powder pattern display manually
p.FitScaleFactorForIntegratedRw()
p.UpdateDisplay()
widgets.interact(show_solution, solution=v)
else:
# print solutions
for i in range(len(vsol)):
sol = vsol[i]
print("#%2d %12s : %d mol LLK=%12.2f"%(i, sol['spg'], sol['nb_mol'], sol['llk']))
# displays - requires ipywidgets for crystal 3D display
display(c.widget_3d())
p.plot(fig=None,diff=True,hkl=True)
XML: Loading Crystal:
XML: Loading Crystal:(spg:P 1 21/c 1)
Input ScatteringPowerAtom:C(C)
Input ScatteringPowerAtom:N(N)
Input ScatteringPowerAtom:S(S)
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Try some other solutions#
If ipywidgets is not shown in the previous cell, load solutions manually
[7]:
# Crystal display is automatically updated when loaded
c.XMLInput(vsol[2]['xml'])
# Update powder pattern display manually
p.UpdateDisplay()
XML: Loading Crystal:
XML: Loading Crystal:(spg:P 1 21/c 1)
Input ScatteringPowerAtom:C(C)
Input ScatteringPowerAtom:N(N)
Input ScatteringPowerAtom:S(S)
Save the selected result to CIF and Fox (.xmlgz) formats#
All the solutions (or just the best ones) could be saved automatically that way by looping over the results
[8]:
# Save result so it can be opened by Fox
xml_cryst_file_save_global('result.xmlgz')
# Also export to the CIF format
c.CIFOutput("result.cif")
[ ]: