diffpy.structure.parsers package

Conversion plugins for various structure formats.

The recognized structure formats are defined by subclassing StructureParser. By convention each parser class is named P_<format> and is defined in the module p_<format>.py. The parser classes should override the parse_lines() and to_lines() methods of StructureParser. Every structure parser needs to be registered in the parser_index dictionary of the parser_index_mod module.

For normal usage it should be sufficient to use the routines provided in this module.

Content:
  • StructureParser: base class for a concrete Parser

  • parser_index: dictionary of known structure formats

  • get_parser: factory for Parser at given format

  • input_formats: list of available input formats

  • output_formats: list of available output formats
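The plugin pattern described above (a base class, one subclass per format, a registry, and a factory) can be sketched in plain Python. This is a standalone illustration that does not import diffpy.structure. Every name in it (SketchParser, P_demo, sketch_index, sketch_get_parser, SketchFormatError) is hypothetical, and the way parse() and tostring() delegate to parse_lines() and to_lines() is an assumption of the sketch, not a statement about the library's implementation.

```python
# Standalone sketch of the parser plugin pattern; all names are hypothetical.


class SketchFormatError(Exception):
    """Stand-in for StructureFormatError."""


class SketchParser:
    """Stand-in for the StructureParser base class."""

    format = None

    def parse_lines(self, lines):
        raise NotImplementedError("override in a derived class")

    def to_lines(self, stru):
        raise NotImplementedError("override in a derived class")

    def parse(self, s):
        # Assumed convenience wrapper: split the string, then delegate.
        return self.parse_lines(s.splitlines())

    def tostring(self, stru):
        # Assumed convenience wrapper: join the lines from to_lines().
        return "\n".join(self.to_lines(stru)) + "\n"


class P_demo(SketchParser):
    """Toy format with one 'element x y z' record per line."""

    format = "demo"

    def parse_lines(self, lines):
        atoms = []
        for line in lines:
            words = line.split()
            if not words:
                continue  # ignore blank lines
            if len(words) != 4:
                raise SketchFormatError("expected 4 columns: %r" % line)
            try:
                xyz = tuple(float(w) for w in words[1:])
            except ValueError:
                raise SketchFormatError("bad coordinate: %r" % line)
            atoms.append((words[0], xyz))
        return atoms

    def to_lines(self, stru):
        return ["%s %g %g %g" % ((el,) + xyz) for el, xyz in stru]


# Registry mapping format names to parser classes.
sketch_index = {"demo": P_demo}


def sketch_get_parser(format, **kw):
    """Return a parser instance for the named format."""
    try:
        cls = sketch_index[format]
    except KeyError:
        raise SketchFormatError("no such format: %r" % format)
    return cls(**kw)
```

With this layout, supporting a new format means writing one subclass and adding one registry entry; callers only ever go through the factory.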

diffpy.structure.parsers.getParser(format, **kw)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.get_parser instead.

diffpy.structure.parsers.get_parser(format, **kw)[source]

Return Parser instance for a given structure format.

Parameters:
  • format (str) – String with the format name, see parser_index_mod.

  • **kw (dict) – Keyword arguments passed to the Parser init function.

Returns:

Parser instance for the given format.

Return type:

Parser

Raises:

StructureFormatError – When the format is not defined.

diffpy.structure.parsers.inputFormats()

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.input_formats instead.

diffpy.structure.parsers.input_formats()[source]

Return list of implemented input structure formats.

diffpy.structure.parsers.outputFormats()

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.output_formats instead.

diffpy.structure.parsers.output_formats()[source]

Return list of implemented output structure formats.

Submodules

diffpy.structure.parsers.p_rawxyz module

Parser for raw XYZ file format.

Raw XYZ is a 3- or 4-column text file with Cartesian coordinates of atoms and an optional first column for atom types.
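To make the column layout concrete, here is a minimal standalone sketch of reading such data. The function name parse_rawxyz_sketch is hypothetical, and two behaviors are assumptions of the sketch rather than documented behavior of P_rawxyz: blank lines are skipped, and a missing atom type is returned as an empty string.

```python
def parse_rawxyz_sketch(lines):
    """Return a list of (element, (x, y, z)) tuples.

    The element is "" when the optional first column is absent.
    """
    atoms = []
    for lineno, line in enumerate(lines, start=1):
        words = line.split()
        if not words:
            continue  # assumption: blank lines are ignored
        if len(words) == 3:
            element, fields = "", words
        elif len(words) == 4:
            element, fields = words[0], words[1:]
        else:
            raise ValueError("line %d: expected 3 or 4 columns" % lineno)
        try:
            xyz = tuple(float(f) for f in fields)
        except ValueError:
            raise ValueError("line %d: non-numeric coordinate" % lineno)
        atoms.append((element, xyz))
    return atoms
```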

class diffpy.structure.parsers.p_rawxyz.P_rawxyz[source]

Bases: StructureParser

StructureParser subclass for RAWXYZ format.

format

Format name, default “rawxyz”.

Type:

str

parseLines(lines)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_rawxyz.parse_lines instead.

parse_lines(lines)[source]

Parse list of lines in RAWXYZ format.

Parameters:

lines (list of str) – List of lines in RAWXYZ format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – Invalid RAWXYZ format.

toLines(stru)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_rawxyz.to_lines instead.

to_lines(stru)[source]

Convert Structure stru to a list of lines in RAWXYZ format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in RAWXYZ format.

Return type:

list of str

diffpy.structure.parsers.p_rawxyz.getParser()

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_rawxyz.get_parser instead.

diffpy.structure.parsers.p_rawxyz.get_parser()[source]

Return new parser object for RAWXYZ format.

Returns:

Instance of P_rawxyz.

Return type:

P_rawxyz

diffpy.structure.parsers.structureparser module

Definition of StructureParser, a base class for specific parsers.

class diffpy.structure.parsers.structureparser.StructureParser[source]

Bases: object

Base class for all structure parsers.

format

Format name of particular parser.

Type:

str

filename

Path to structure file that is read or written.

Type:

str

parse(s)[source]

Create Structure instance from a string.

parseFile(filename)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.StructureParser.parse_file instead.

parseLines(lines)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.StructureParser.parse_lines instead.

parse_file(filename)[source]

Create Structure instance from an existing file.

parse_lines(lines)[source]

Create Structure instance from a list of lines.

Return Structure object or raise StructureFormatError exception.

Note

This method has to be overridden in a derived class.

toLines(stru)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.StructureParser.to_lines instead.

to_lines(stru)[source]

Convert Structure stru to a list of lines.

Return list of strings.

Note

This method has to be overridden in a derived class.

tostring(stru)[source]

Convert Structure instance to a string.

diffpy.structure.parsers.p_cif module

Parser for basic CIF file format.

diffpy.structure.parsers.p_cif.rx_float

Constant regular expression for leading_float().

Type:

re.Pattern

diffpy.structure.parsers.p_cif.symvec

Helper dictionary for get_symop().

Type:

dict

class diffpy.structure.parsers.p_cif.P_cif(eps=None)[source]

Bases: StructureParser

Simple parser for CIF structure format.

Reads Structure from the first block containing _atom_site_label key. Following blocks, if any, are ignored.

Parameters:

eps (float, Optional) – Fractional coordinates cutoff for duplicate positions. When None use the default for ExpandAsymmetricUnit: 1.0e-5.

format

Structure format name.

Type:

str

ciffile

Instance of CifFile from PyCifRW.

Type:

CifFile

stru

Structure instance used for CIF input or output.

Type:

Structure

spacegroup

Instance of SpaceGroup used for symmetry expansion.

Type:

SpaceGroup

eps

Resolution in fractional coordinates for non-equal positions. Used for expansion of asymmetric unit.

Type:

float

eau

Instance of ExpandAsymmetricUnit from SymmetryUtilities.

Type:

ExpandAsymmetricUnit

asymmetric_unit

List of Atom instances for the original asymmetric unit in the CIF file.

Type:

list

labelindex

Dictionary mapping unique atom label to index of Atom in self.asymmetric_unit.

Type:

dict

anisotropy

Dictionary mapping unique atom label to displacement anisotropy resolved at that site.

Type:

dict

cif_sgname

Space group name obtained by looking up the value of the _space_group_name_Hall, _symmetry_space_group_name_Hall, _space_group_name_H-M_alt, and _symmetry_space_group_name_H-M items. None when none of these items is defined.

Type:

str or None

BtoU = 0.012665147955292222

Conversion factor from B values to U values.

Type:

float
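The numeric value of BtoU follows from the standard relation between the isotropic displacement parameters, B = 8π²U, so the factor is 1/(8π²). The short check below is a standalone sketch; B_TO_U and b_to_u are hypothetical names used only for illustration.

```python
import math

# B = 8 * pi**2 * U, hence U = B / (8 * pi**2).
B_TO_U = 1.0 / (8 * math.pi ** 2)


def b_to_u(b_iso):
    """Convert an isotropic B value to the corresponding U value."""
    return b_iso * B_TO_U
```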

parse(s)[source]

Create Structure instance from a string in CIF format.

Parameters:

s (str) – A string in CIF format.

Returns:

Structure instance.

Return type:

Structure

Raises:

StructureFormatError – When the data do not constitute a valid CIF format.

parseFile(filename)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_cif.parse_file instead.

parseLines(lines)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_cif.parse_lines instead.

parse_file(filename)[source]

Create Structure from an existing CIF file.

Parameters:

filename (str) – Path to structure file.

Returns:

Structure instance.

Return type:

Structure

Raises:
  • StructureFormatError – When the data do not constitute a valid CIF format.

  • IOError – When the file cannot be opened.

parse_lines(lines)

Parse list of lines in CIF format.

Parameters:

lines (list) – List of strings stripped of line terminator.

Returns:

Structure instance.

Return type:

Structure

Raises:

StructureFormatError – When the data do not constitute a valid CIF format.

toLines(stru)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_cif.to_lines instead.

to_lines(stru)[source]

Convert Structure to a list of lines in basic CIF format.

Parameters:

stru (Structure) – The structure to be converted.

Returns:

List of lines in basic CIF format.

Return type:

list

diffpy.structure.parsers.p_cif.getParser(eps=None)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.get_parser instead.

diffpy.structure.parsers.p_cif.getSymOp(s)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.get_symop instead.

diffpy.structure.parsers.p_cif.get_parser(eps=None)[source]

Return new parser object for CIF format.

Parameters:

eps (float, Optional) – fractional coordinates cutoff for duplicate positions. When None use the default for ExpandAsymmetricUnit: 1.0e-5.

Returns:

Instance of P_cif.

Return type:

P_cif

diffpy.structure.parsers.p_cif.get_symop(s)[source]

Create SpaceGroups.SymOp instance from a string.

Parameters:

s (str) – Formula for equivalent coordinates, for example 'x,1/2-y,1/2+z'.

Returns:

Instance of SymOp.

Return type:

SymOp
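To illustrate what a formula such as 'x,1/2-y,1/2+z' encodes, the standalone sketch below splits it into a 3x3 rotation part and a translation part. It is not the library's implementation: parse_symop_sketch is a hypothetical name, the result is a pair of plain lists rather than a SymOp instance, and only terms of the form ±x, ±y, ±z and simple numeric shifts are handled.

```python
import re
from fractions import Fraction

_AXES = {"x": 0, "y": 1, "z": 2}
# One term: optional sign, then an axis letter, a fraction, or a decimal.
_TERM = re.compile(r"([+-]?)(x|y|z|\d+/\d+|\d+(?:\.\d+)?)")


def parse_symop_sketch(s):
    """Return (rotation, translation) for a formula like 'x,1/2-y,1/2+z'."""
    rotation, translation = [], []
    for component in s.lower().replace(" ", "").split(","):
        if not component:
            raise ValueError("empty component in %r" % s)
        row, shift, pos = [0, 0, 0], Fraction(0), 0
        while pos < len(component):
            m = _TERM.match(component, pos)
            # Every term after the first must carry an explicit sign.
            if m is None or (pos > 0 and not m.group(1)):
                raise ValueError("cannot parse %r" % component)
            sign = -1 if m.group(1) == "-" else 1
            token = m.group(2)
            if token in _AXES:
                row[_AXES[token]] += sign
            else:
                shift += sign * Fraction(token)
            pos = m.end()
        rotation.append(row)
        translation.append(shift)
    if len(rotation) != 3:
        raise ValueError("expected 3 components in %r" % s)
    return rotation, translation
```

Exact fractions are used for the translation so that shifts such as 1/3 are not rounded before they are applied.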

diffpy.structure.parsers.p_cif.leading_float(s, d=0.0)[source]

Extract the first float from a string and ignore trailing characters.

Useful for extracting values from “value(std)” syntax.

Parameters:
  • s (str) – The string to be scanned for floating point value.

  • d (float, Optional) – The default value when s is “.” or “?”, which in CIF format stands for inapplicable and unknown, respectively.

Returns:

The extracted floating point value.

Return type:

float

Raises:

ValueError – When string does not start with a float.
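The behavior documented for leading_float() can be approximated with a short regular expression. The sketch below is a standalone reimplementation for illustration only; leading_float_sketch and _RX_FLOAT are hypothetical names, and the exact pattern used by the library may differ.

```python
import re

# Optional sign, digits with optional fraction (or a bare fraction),
# and an optional exponent.
_RX_FLOAT = re.compile(r"[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?")


def leading_float_sketch(s, d=0.0):
    """Return the float at the start of s, ignoring trailing characters."""
    sbare = s.strip()
    if sbare in (".", "?"):
        return d  # CIF markers for inapplicable and unknown values
    m = _RX_FLOAT.match(sbare)
    if m is None:
        raise ValueError("no leading float in %r" % s)
    return float(m.group())
```

For a CIF value written as "1.2345(6)", the match stops at the opening parenthesis, so the standard uncertainty is dropped and only 1.2345 is converted.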

diffpy.structure.parsers.p_auto module

Parser for automatic file format detection.

This Parser does not provide the to_lines() method.

class diffpy.structure.parsers.p_auto.P_auto(**kw)[source]

Bases: StructureParser

Parser with automatic detection of structure format.

This parser attempts to automatically detect the format of a given structure file and parse it accordingly. When successful, it sets its format attribute to the detected structure format.
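One plausible way to implement this kind of detection is to try each candidate parser in turn and keep the first one that succeeds. The standalone sketch below assumes that strategy; it is not taken from the P_auto source, and AutoSketch, SketchFormatError, and the two toy formats ("counted" and "plain") are hypothetical.

```python
class SketchFormatError(Exception):
    """Stand-in for StructureFormatError."""


def _parse_counted(lines):
    """Toy format: first nonblank line holds the number of records."""
    body = [ln for ln in lines if ln.strip()]
    try:
        count = int(body[0])
    except (IndexError, ValueError):
        raise SketchFormatError("missing record count")
    if count != len(body) - 1:
        raise SketchFormatError("record count mismatch")
    return body[1:]


def _parse_plain(lines):
    """Toy format: every nonblank line is 'x y z'."""
    rows = []
    for ln in lines:
        words = ln.split()
        if not words:
            continue
        if len(words) != 3:
            raise SketchFormatError("expected 3 columns")
        try:
            rows.append(tuple(float(w) for w in words))
        except ValueError:
            raise SketchFormatError("non-numeric coordinate")
    if not rows:
        raise SketchFormatError("no records")
    return rows


class AutoSketch:
    """Try candidate parsers in order and remember which one worked."""

    candidates = [("counted", _parse_counted), ("plain", _parse_plain)]

    def __init__(self):
        self.format = "auto"

    def parse_lines(self, lines):
        for name, func in self.candidates:
            try:
                result = func(lines)
            except SketchFormatError:
                continue  # not this format, try the next candidate
            self.format = name  # record the detected format
            return result
        raise SketchFormatError("unknown or invalid structure format")
```

The order of the candidate list matters with this approach: stricter formats should come first so that a permissive parser does not accept data meant for another format.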

Parameters:

**kw (dict) – Keyword arguments for the structure parser.

format

Detected structure format. Initially set to “auto” and updated after successful detection of the structure format.

Type:

str

pkw

Keyword arguments passed to the parser.

Type:

dict

parse(s)[source]

Detect format and create Structure instance from a string.

Set format attribute to the detected file format.

Parameters:

s (str) – String with structure data.

Returns:

Structure object.

Return type:

Structure

Raises:

StructureFormatError

parseFile(filename)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_auto.parse_file instead.

parseLines(lines)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_auto.parse_lines instead.

parse_file(filename)[source]

Detect format and create Structure instance from an existing file.

Set format attribute to the detected file format.

Parameters:

filename (str) – Path to structure file.

Returns:

Structure object.

Return type:

Structure

Raises:
  • StructureFormatError – If the structure format is unknown or invalid.

  • IOError – If the file cannot be read.

parse_lines(lines)[source]

Detect format and create Structure instance from a list of lines.

Set format attribute to the detected file format.

Parameters:

lines (list) – List of lines with structure data.

Returns:

Structure object.

Return type:

Structure

Raises:

StructureFormatError

diffpy.structure.parsers.p_auto.getParser(**kw)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_auto.get_parser instead.

diffpy.structure.parsers.p_auto.get_parser(**kw)[source]

Return a new instance of the automatic parser.

Parameters:

**kw (dict) – Keyword arguments for the structure parser.

Returns:

Instance of P_auto.

Return type:

P_auto

diffpy.structure.parsers.p_pdffit module

Parser for PDFfit structure format.

class diffpy.structure.parsers.p_pdffit.P_pdffit[source]

Bases: StructureParser

Parser for PDFfit structure format.

format

Format name, default “pdffit”.

Type:

str

ignored_lines

List of lines ignored during parsing.

Type:

list

stru

Structure instance used for PDFfit input or output.

Type:

PDFFitStructure

parseLines(lines)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_pdffit.parse_lines instead.

parse_lines(lines)[source]

Parse list of lines in PDFfit format.

Parameters:

lines (list of str) – List of lines in PDFfit format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – File not in PDFfit format.

toLines(stru)[source]

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_pdffit.to_lines instead.

to_lines(stru)[source]

Convert Structure stru to a list of lines in PDFfit format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in PDFfit format.

Return type:

list of str

diffpy.structure.parsers.p_pdffit.getParser()[source]

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_pdffit.get_parser instead.

diffpy.structure.parsers.p_pdffit.get_parser()[source]

Return new parser object for PDFfit format.

Returns:

Instance of P_pdffit.

Return type:

P_pdffit

diffpy.structure.parsers.p_xcfg module

Parser for extended CFG format used by AtomEye.

diffpy.structure.parsers.p_xcfg.AtomicMass

Dictionary of atomic masses for elements.

Type:

dict

class diffpy.structure.parsers.p_xcfg.P_xcfg[source]

Bases: StructureParser

Parser for AtomEye extended CFG format.

format

Format name, default “xcfg”.

Type:

str

cluster_boundary = 2

Width of boundary around corners of non-periodic cluster to avoid periodic boundary condition (PBC) effects in AtomEye.

Type:

int

parseLines(lines)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_xcfg.parse_lines instead.

parse_lines(lines)[source]

Parse list of lines in XCFG format.

Parameters:

lines (list of str) – List of lines in XCFG format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – Invalid XCFG format.

toLines(stru)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_xcfg.to_lines instead.

to_lines(stru)[source]

Convert Structure stru to a list of lines in AtomEye XCFG format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in XCFG format.

Return type:

list of str

Raises:

StructureFormatError – Cannot convert empty structure to XCFG format.

diffpy.structure.parsers.p_xcfg.getParser()

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_xcfg.get_parser instead.

diffpy.structure.parsers.p_xcfg.get_parser()[source]

Return new parser object for XCFG format.

Returns:

Instance of P_xcfg.

Return type:

P_xcfg

diffpy.structure.parsers.parser_index_mod module

Index of recognized structure formats, their IO capabilities and associated modules where they are defined.

diffpy.structure.parsers.parser_index_mod.parser_index

Dictionary of recognized structure formats. The keys are format names and the values are dictionaries with the following keys:

  • module (str) – Name of the module that defines the parser class.

  • file_extension (str) – File extension for the format, including the leading dot.

  • file_pattern (str) – File pattern for the format, using ‘|’ as separator for multiple patterns.

  • has_input (bool) – True if the parser can read the format.

  • has_output (bool) – True if the parser can write the format.

Type:

dict

Note

Plugins for new structure formats need to be added to the parser_index dictionary in this module.
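The shape of such an index, and how lists of input and output formats can be derived from it, is sketched below. This is a standalone illustration: the dictionary name, the helper functions, and all entry values are hypothetical and are not copied from the library's actual parser_index.

```python
# Illustrative index with the five keys described above; values are made up.
sketch_parser_index = {
    "xyz": {
        "module": "p_xyz",
        "file_extension": ".xyz",
        "file_pattern": "*.xyz",
        "has_input": True,
        "has_output": True,
    },
    "cif": {
        "module": "p_cif",
        "file_extension": ".cif",
        "file_pattern": "*.cif|*.CIF",
        "has_input": True,
        "has_output": True,
    },
    "auto": {
        "module": "p_auto",
        "file_extension": "",
        "file_pattern": "*.*",
        "has_input": True,
        "has_output": False,  # a detection-only parser cannot write
    },
}


def sketch_input_formats(index):
    """Return sorted names of formats that can be read."""
    return sorted(fmt for fmt, info in index.items() if info["has_input"])


def sketch_output_formats(index):
    """Return sorted names of formats that can be written."""
    return sorted(fmt for fmt, info in index.items() if info["has_output"])


def sketch_file_patterns(index, fmt):
    """Split the '|'-separated file_pattern entry into a list."""
    return index[fmt]["file_pattern"].split("|")
```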

diffpy.structure.parsers.p_pdb module

Basic parser for PDB structure format.

class diffpy.structure.parsers.p_pdb.P_pdb[source]

Bases: StructureParser

Simple parser for PDB format.

The parser understands the following PDB records: TITLE, CRYST1, SCALE1, SCALE2, SCALE3, ATOM, SIGATM, ANISOU, SIGUIJ, TER, HETATM, END.
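PDB records are fixed-column text: the record name occupies columns 1 to 6, and the remaining fields sit at positions set by the PDB file format specification. As a standalone sketch of how one such record can be read, the code below extracts the cell parameters from a CRYST1 line using the columns given in that specification. The function names are hypothetical and this is not the P_pdb implementation.

```python
def pdb_record_name(line):
    """Return the record name stored in columns 1-6 of a PDB line."""
    return line[:6].strip()


def parse_cryst1_sketch(line):
    """Return (a, b, c, alpha, beta, gamma) from a CRYST1 record.

    Columns (1-based): a 7-15, b 16-24, c 25-33,
    alpha 34-40, beta 41-47, gamma 48-54.
    """
    if pdb_record_name(line) != "CRYST1":
        raise ValueError("not a CRYST1 record: %r" % line)
    a = float(line[6:15])
    b = float(line[15:24])
    c = float(line[24:33])
    alpha = float(line[33:40])
    beta = float(line[40:47])
    gamma = float(line[47:54])
    return a, b, c, alpha, beta, gamma
```

Slicing by column rather than splitting on whitespace matters here because adjacent fixed-width fields may touch when the values are large.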

format

Format name, default “pdb”.

Type:

str

atomLines(stru, idx)

Build ATOM records and possibly SIGATM, ANISOU or SIGUIJ records for structure stru atom number idx.

atom_lines(stru, idx)[source]

Build ATOM records and possibly SIGATM, ANISOU or SIGUIJ records for structure stru atom number idx.

cryst1Lines(stru)

Build lines corresponding to CRYST1 record.

cryst1_lines(stru)[source]

Build lines corresponding to CRYST1 record.

orderOfRecords = ['HEADER', 'OBSLTE', 'TITLE', 'CAVEAT', 'COMPND', 'SOURCE', 'KEYWDS', 'EXPDTA', 'AUTHOR', 'REVDAT', 'SPRSDE', 'JRNL', 'REMARK', 'REMARK', 'REMARK', 'REMARK', 'DBREF', 'SEQADV', 'SEQRES', 'MODRES', 'HET', 'HETNAM', 'HETSYN', 'FORMUL', 'HELIX', 'SHEET', 'TURN', 'SSBOND', 'LINK', 'HYDBND', 'SLTBRG', 'CISPEP', 'SITE', 'CRYST1', 'ORIGX1', 'ORIGX2', 'ORIGX3', 'SCALE1', 'SCALE2', 'SCALE3', 'MTRIX1', 'MTRIX2', 'MTRIX3', 'TVECT', 'MODEL', 'ATOM', 'SIGATM', 'ANISOU', 'SIGUIJ', 'TER', 'HETATM', 'ENDMDL', 'CONECT', 'MASTER', 'END']

Ordered list of PDB record labels.

Type:

list

parseLines(lines)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_pdb.parse_lines instead.

parse_lines(lines)[source]

Parse list of lines in PDB format.

Parameters:

lines (list of str) – List of lines in PDB format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – Invalid PDB record.

titleLines(stru)

Build lines corresponding to TITLE record.

title_lines(stru)[source]

Build lines corresponding to TITLE record.

toLines(stru)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_pdb.to_lines instead.

to_lines(stru)[source]

Convert Structure stru to a list of lines in PDB format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in PDB format.

Return type:

list of str

validRecords = {'ANISOU': None, 'ATOM': None, 'AUTHOR': None, 'CAVEAT': None, 'CISPEP': None, 'COMPND': None, 'CONECT': None, 'CRYST1': None, 'DBREF': None, 'END': None, 'ENDMDL': None, 'EXPDTA': None, 'FORMUL': None, 'HEADER': None, 'HELIX': None, 'HET': None, 'HETATM': None, 'HETNAM': None, 'HETSYN': None, 'HYDBND': None, 'JRNL': None, 'KEYWDS': None, 'LINK': None, 'MASTER': None, 'MODEL': None, 'MODRES': None, 'MTRIX1': None, 'MTRIX2': None, 'MTRIX3': None, 'OBSLTE': None, 'ORIGX1': None, 'ORIGX2': None, 'ORIGX3': None, 'REMARK': None, 'REVDAT': None, 'SCALE1': None, 'SCALE2': None, 'SCALE3': None, 'SEQADV': None, 'SEQRES': None, 'SHEET': None, 'SIGATM': None, 'SIGUIJ': None, 'SITE': None, 'SLTBRG': None, 'SOURCE': None, 'SPRSDE': None, 'SSBOND': None, 'TER': None, 'TITLE': None, 'TURN': None, 'TVECT': None}

Dictionary of PDB record labels.

Type:

dict

diffpy.structure.parsers.p_pdb.getParser()

Return new parser object for PDB format.

Returns:

Instance of P_pdb.

Return type:

P_pdb

diffpy.structure.parsers.p_pdb.get_parser()[source]

Return new parser object for PDB format.

Returns:

Instance of P_pdb.

Return type:

P_pdb

diffpy.structure.parsers.p_discus module

Parser for DISCUS structure format.

class diffpy.structure.parsers.p_discus.P_discus[source]

Bases: StructureParser

Parser for DISCUS structure format. The parser fails on molecule and generator records.

format

File format name, default “discus”.

Type:

str

nl

Line number of the current line being parsed.

Type:

int

lines

List of lines from the input file.

Type:

list of str

line

Current line being parsed.

Type:

str

stru

Structure being parsed.

Type:

PDFFitStructure

ignored_lines

List of lines that were ignored during parsing.

Type:

list of str

cell_read

True if cell record processed.

Type:

bool

ncell_read

True if ncell record processed.

Type:

bool

parseLines(lines)

Parse list of lines in DISCUS format.

Parameters:

lines (list of str) – List of lines from the input file.

Returns:

Parsed PDFFitStructure instance.

Return type:

PDFFitStructure

Raises:

StructureFormatError – If the file is not in DISCUS format.

parse_lines(lines)[source]

Parse list of lines in DISCUS format.

Parameters:

lines (list of str) – List of lines from the input file.

Returns:

Parsed PDFFitStructure instance.

Return type:

PDFFitStructure

Raises:

StructureFormatError – If the file is not in DISCUS format.

toLines(stru)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_discus.to_lines instead.

to_lines(stru)[source]

Convert Structure stru to a list of lines in DISCUS format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in DISCUS format.

Return type:

list of str

diffpy.structure.parsers.p_discus.getParser()

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_discus.get_parser instead.

diffpy.structure.parsers.p_discus.get_parser()[source]

Return new parser object for DISCUS format.

Returns:

Instance of P_discus.

Return type:

P_discus

diffpy.structure.parsers.p_vesta module

Parser for VESTA format used by VESTA (Visualization for Electronic and Structural Analysis).

This module replaces the AtomEye XCFG parser (P_xcfg). The XCFG parser and all its original attributes are preserved for backward compatibility. VESTA is the actively maintained successor viewer.

diffpy.structure.parsers.p_vesta.AtomicMass

Dictionary of atomic masses for elements.

Type:

dict

class diffpy.structure.parsers.p_vesta.P_vesta[source]

Bases: StructureParser

Parser for VESTA native structure format (.vesta).

VESTA (Visualization for Electronic and Structural Analysis) is the actively maintained successor to AtomEye. This parser writes the native VESTA format understood by VESTA 3.x and later.

format

Format name, default “vesta”.

Type:

str

Notes

The cluster_boundary attribute is retained from the original AtomEye/XCFG parser for API compatibility; it is not used by VESTA because VESTA handles periodicity natively.

cluster_boundary = 2

Width of boundary around corners of non-periodic cluster. Retained from the original AtomEye/XCFG parser for API compatibility. VESTA handles periodicity natively so this value has no effect on output.

Type:

int

parse_lines(lines)[source]

Parse list of lines in VESTA format.

Reads the STRUC, ATOMT, and COORD sections of a .vesta file to reconstruct a Structure.

Parameters:

lines (list of str) – Lines of a VESTA format file.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – When the file does not conform to the VESTA format.

to_lines(stru)[source]

Convert Structure stru to a list of lines in VESTA format.

Produces a .vesta file readable by VESTA 3.x and later, containing STRUC, ATOMT, and COORD sections derived from the structure’s lattice and atomic data.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

Lines of a VESTA format file.

Return type:

list of str

Raises:

StructureFormatError – Cannot convert empty structure to VESTA format.

diffpy.structure.parsers.p_vesta.get_parser()[source]

Return new parser object for VESTA format.

Returns:

Instance of P_vesta.

Return type:

P_vesta

diffpy.structure.parsers.p_xyz module

Parser for XYZ file format, where:

  • First line gives number of atoms.

  • Second line has optional title.

  • Remaining lines contain element, x, y, z.
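The three-part layout listed above can be read with a few lines of Python. The following is a standalone sketch, not the P_xyz implementation; parse_xyz_sketch is a hypothetical name, and the strict check that the atom count matches the number of nonblank records is an assumption of the sketch.

```python
def parse_xyz_sketch(lines):
    """Return (title, atoms); atoms is a list of (element, (x, y, z))."""
    if len(lines) < 2:
        raise ValueError("XYZ data needs a count line and a title line")
    try:
        natoms = int(lines[0].strip())
    except ValueError:
        raise ValueError("first line must be the number of atoms")
    title = lines[1].strip()  # the title may be empty
    records = [ln for ln in lines[2:] if ln.strip()]
    if len(records) != natoms:
        raise ValueError(
            "expected %d atom records, found %d" % (natoms, len(records))
        )
    atoms = []
    for ln in records:
        words = ln.split()
        if len(words) < 4:
            raise ValueError("expected 'element x y z': %r" % ln)
        # float() raises ValueError for non-numeric coordinates.
        x, y, z = (float(w) for w in words[1:4])
        atoms.append((words[0], (x, y, z)))
    return title, atoms
```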

class diffpy.structure.parsers.p_xyz.P_xyz[source]

Bases: StructureParser

Parser for standard XYZ structure format.

format

Format name, default “xyz”.

Type:

str

parseLines(lines)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_xyz.parse_lines instead.

parse_lines(lines)[source]

Parse list of lines in XYZ format.

Parameters:

lines (list of str) – List of lines in XYZ format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – Invalid XYZ format.

toLines(stru)

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_xyz.to_lines instead.

to_lines(stru)[source]

Convert Structure stru to a list of lines in XYZ format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in XYZ format.

Return type:

list of str

diffpy.structure.parsers.p_xyz.getParser()

This function has been deprecated and will be removed in version 4.0.0.

Please use diffpy.structure.P_xyz.get_parser instead.

diffpy.structure.parsers.p_xyz.get_parser()[source]

Return new parser object for XYZ format.

Returns:

Instance of P_xyz.

Return type:

P_xyz