diffpy.structure.parsers package
Conversion plugins for various structure formats.
The recognized structure formats are defined by subclassing StructureParser; by convention these classes are named P_<format>.py. The parser classes should override the parseLines() and toLines() methods of StructureParser. Every structure parser needs to be registered in the parser_index module.
For normal usage it should be sufficient to use the routines provided in this module.
- Content:
- StructureParser: base class for a concrete Parser 
- parser_index: dictionary of known structure formats 
- getParser: factory for Parser at given format 
- inputFormats: list of available input formats 
- outputFormats: list of available output formats 
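The plugin convention described above (subclass the base parser, override parseLines() and toLines(), register the class, obtain it through a factory) can be sketched without the library. This is a minimal illustration under stated assumptions, not the package's implementation: BaseParser, FormatError, P_demo, demo_index and get_demo_parser are all hypothetical names, and the "demo" format is invented for the example.

```python
class FormatError(Exception):
    """Hypothetical stand-in for StructureFormatError."""


class BaseParser:
    """Hypothetical stand-in for StructureParser."""

    format = None

    def parseLines(self, lines):
        raise NotImplementedError("parseLines must be overridden")

    def toLines(self, stru):
        raise NotImplementedError("toLines must be overridden")


class P_demo(BaseParser):
    """Invented toy format: one 'element x y z' record per line."""

    format = "demo"

    def parseLines(self, lines):
        atoms = []
        for line in lines:
            words = line.split()
            if len(words) != 4:
                raise FormatError("expected 4 columns: %r" % line)
            atoms.append((words[0], tuple(float(w) for w in words[1:])))
        return atoms

    def toLines(self, stru):
        return ["%s %g %g %g" % ((el,) + xyz) for el, xyz in stru]


# Registry playing the role of a parser index: format name -> parser class.
demo_index = {"demo": P_demo}


def get_demo_parser(fmt, **kw):
    """Factory in the spirit of getParser(); raises for unknown formats."""
    if fmt not in demo_index:
        raise FormatError("no parser for %r format" % fmt)
    return demo_index[fmt](**kw)


parser = get_demo_parser("demo")
atoms = parser.parseLines(["C 0 0 0", "O 1.2 0 0"])
lines = parser.toLines(atoms)
```

Keeping the registry separate from the classes means a new format only needs a new subclass and one registry entry; callers keep using the factory.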
 
- diffpy.structure.parsers.getParser(format, **kw)
- Return Parser instance for a given structure format. - Parameters:
- format (str) – String with the format name, see parser_index_mod. 
- **kw (dict) – Keyword arguments passed to the Parser init function. 
 
- Returns:
- Parser instance for the given format. 
- Return type:
- Parser 
- Raises:
- StructureFormatError – When the format is not defined. 
 
- diffpy.structure.parsers.inputFormats()
- Return list of implemented input structure formats. 
- diffpy.structure.parsers.outputFormats()
- Return list of implemented output structure formats. 
Submodules
diffpy.structure.parsers.p_rawxyz module
Parser for raw XYZ file format.
Raw XYZ is a 3 or 4 column text file with cartesian coordinates of atoms and an optional first column for atom types.
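The layout described above (three coordinate columns, with an optional leading column for the atom type) can be illustrated with a small stand-alone reader. This is a sketch, not the code of P_rawxyz: the function name parse_rawxyz, the tuple-based return value, the empty string for a missing atom type, and the skipping of blank lines are all assumptions of the example.

```python
def parse_rawxyz(lines):
    """Return a list of (element, (x, y, z)) tuples from raw XYZ lines."""
    atoms = []
    for lineno, line in enumerate(lines, start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines (an assumption of this sketch)
        if len(words) == 3:
            # No atom type column: record an empty element symbol.
            element, coords = "", words
        elif len(words) == 4:
            element, coords = words[0], words[1:]
        else:
            raise ValueError("line %d: expected 3 or 4 columns" % lineno)
        atoms.append((element, tuple(float(w) for w in coords)))
    return atoms


atoms3 = parse_rawxyz(["0.0 0.0 0.0", "0.5 0.5 0.5"])
atoms4 = parse_rawxyz(["Na 0 0 0", "Cl 0.5 0.5 0.5"])
```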
- class diffpy.structure.parsers.p_rawxyz.P_rawxyz
- Bases: StructureParser
- StructureParser subclass for the RAWXYZ format.
 - format
- Format name, default “rawxyz”. - Type:
- str 
 
 - parseLines(lines)
- Parse list of lines in RAWXYZ format. - Parameters:
- lines (list of str) – List of lines in RAWXYZ format. 
- Returns:
- Parsed structure instance. 
- Return type:
- Raises:
- StructureFormatError – Invalid RAWXYZ format. 
 
 
diffpy.structure.parsers.structureparser module
Definition of StructureParser, a base class for specific parsers.
- class diffpy.structure.parsers.structureparser.StructureParser
- Bases: object
- Base class for all structure parsers.
 - format
- Format name of particular parser. - Type:
- str 
 
 - filename
- Path to structure file that is read or written. - Type:
- str 
 
 - parseLines(lines)
- Create Structure instance from a list of lines. Return Structure object or raise StructureFormatError exception.
- Note: This method has to be overridden in the derived class.
 
diffpy.structure.parsers.p_cif module
Parser for basic CIF file format.
- diffpy.structure.parsers.p_cif.rx_float
- Constant regular expression for leading_float(). - Type:
- re.Pattern 
 
- diffpy.structure.parsers.p_cif.symvec
- Helper dictionary for getSymOp(). - Type:
- dict 
 
Note
References: https://www.iucr.org/resources/cif
- class diffpy.structure.parsers.p_cif.P_cif(eps=None)
- Bases: StructureParser
- Simple parser for CIF structure format. Reads Structure from the first block containing the _atom_site_label key. Following blocks, if any, are ignored. - Parameters:
- eps (float, Optional) – Fractional coordinates cutoff for duplicate positions. When None, use the default for ExpandAsymmetricUnit: 1.0e-5.
 - format
- Structure format name. - Type:
- str 
 
 - ciffile
- Instance of CifFile from PyCifRW. - Type:
- CifFile 
 
 - spacegroup
- Instance of SpaceGroup used for symmetry expansion. - Type:
- SpaceGroup
 
 - eps
- Resolution in fractional coordinates for non-equal positions. Used for expansion of asymmetric unit. - Type:
- float 
 
 - eau
- Instance of ExpandAsymmetricUnit from SymmetryUtilities. - Type:
- ExpandAsymmetricUnit
 
 - asymmetric_unit
- List of Atom instances for the original asymmetric unit in the CIF file. - Type:
- list 
 
 - labelindex
- Dictionary mapping unique atom label to index of Atom in self.asymmetric_unit. - Type:
- dict 
 
 - anisotropy
- Dictionary mapping unique atom label to displacement anisotropy resolved at that site. - Type:
- dict 
 
 - cif_sgname
- Space group name obtained by looking up the value of the _space_group_name_Hall, _symmetry_space_group_name_Hall, _space_group_name_H-M_alt, _symmetry_space_group_name_H-M items. None when none of them is defined. - Type:
- str or None 
 
 - BtoU = 0.012665147955292222
- Conversion factor from B values to U values. - Type:
- float 
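The value of BtoU follows from the standard crystallographic relation B = 8π²U between the two displacement parameter conventions, so U = B / (8π²). A short check, where B_TO_U, b_to_u and u_to_b are illustrative names rather than library functions:

```python
import math

# Conversion factor from B values to U values: U = B / (8 * pi**2).
B_TO_U = 1.0 / (8 * math.pi ** 2)


def b_to_u(b):
    """Convert a B displacement parameter to the U convention."""
    return b * B_TO_U


def u_to_b(u):
    """Convert a U displacement parameter to the B convention."""
    return u / B_TO_U
```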
 
 - parse(s)
- Create Structure instance from a string in CIF format. - Parameters:
- s (str) – A string in CIF format. 
- Returns:
- Structure instance. 
- Return type:
- Structure
- Raises:
- StructureFormatError – When the data do not constitute a valid CIF format. 
 
 - parseFile(filename)
- Create Structure from an existing CIF file. - Parameters:
- filename (str) – Path to structure file. 
- Returns:
- Structure instance. 
- Return type:
- Structure
- Raises:
- StructureFormatError – When the data do not constitute a valid CIF format. 
- IOError – When the file cannot be opened. 
 
 
 - parseLines(lines)
- Parse list of lines in CIF format. - Parameters:
- lines (list) – List of strings stripped of line terminator. 
- Returns:
- Structure instance. 
- Return type:
- Structure
- Raises:
- StructureFormatError – When the data do not constitute a valid CIF format. 
 
 
- diffpy.structure.parsers.p_cif.getParser(eps=None)
- Return new parser object for CIF format. - Parameters:
- eps (float, Optional) – Fractional coordinates cutoff for duplicate positions. When None, use the default for ExpandAsymmetricUnit: 1.0e-5.
- Returns:
- Instance of P_cif. 
- Return type:
- P_cif
 
- diffpy.structure.parsers.p_cif.getSymOp(s)
- Create SpaceGroups.SymOp instance from a string. - Parameters:
- s (str) – Formula for equivalent coordinates, for example 'x,1/2-y,1/2+z'.
- Returns:
- Instance of SymOp. 
- Return type:
- SymOp
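To make the formula syntax concrete, the sketch below splits a string such as 'x,1/2-y,1/2+z' into a 3x3 rotation part and a translation vector. It does not reproduce getSymOp() and does not build a SymOp object; parse_symop, AXES and the plain list-of-Fraction return value are assumptions chosen for illustration.

```python
import re
from fractions import Fraction

AXES = {"x": 0, "y": 1, "z": 2}


def parse_symop(s):
    """Return (R, t) for a formula like 'x,1/2-y,1/2+z'.

    R is a 3x3 list of Fractions (rotation part) and t a list of
    three Fractions (translation part).
    """
    comps = s.replace(" ", "").lower().split(",")
    if len(comps) != 3:
        raise ValueError("expected three comma-separated components")
    R = [[Fraction(0)] * 3 for _ in range(3)]
    t = [Fraction(0)] * 3
    for row, comp in enumerate(comps):
        # Split the component into signed terms, e.g. '1/2-y' -> ['1/2', '-y'].
        for term in re.findall(r"[+-]?[^+-]+", comp):
            sign = -1 if term[0] == "-" else 1
            body = term.lstrip("+-")
            if body[-1] in AXES:
                # Term with a variable; a missing coefficient means 1.
                coef = body[:-1].rstrip("*") or "1"
                R[row][AXES[body[-1]]] += sign * Fraction(coef)
            else:
                # Pure number: contributes to the translation.
                t[row] += sign * Fraction(body)
    return R, t


R, t = parse_symop("x,1/2-y,1/2+z")
```

Exact fractions are used so that translations such as 1/3 are not rounded.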
 
- diffpy.structure.parsers.p_cif.leading_float(s, d=0.0)
- Extract the first float from a string and ignore trailing characters. - Useful for extracting values from “value(std)” syntax. - Parameters:
- s (str) – The string to be scanned for floating point value. 
- d (float, Optional) – The default value when s is “.” or “?”, which in CIF format stands for inapplicable and unknown, respectively. 
 
- Returns:
- The extracted floating point value. 
- Return type:
- float 
- Raises:
- ValueError – When string does not start with a float. 
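The behavior documented for leading_float() can be approximated with a regular expression. The sketch below is an independent illustration: the pattern RX_LEADING_FLOAT is an assumption and may differ from the library's rx_float, and leading_float_sketch is a hypothetical name.

```python
import re

# Assumed pattern for a float at the start of a string; the library's
# rx_float may be defined differently.
RX_LEADING_FLOAT = re.compile(r"[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?")


def leading_float_sketch(s, d=0.0):
    """Return the float at the start of s, ignoring trailing characters.

    The CIF placeholders '.' (inapplicable) and '?' (unknown) yield
    the default value d.
    """
    s = s.strip()
    if s in (".", "?"):
        return d
    m = RX_LEADING_FLOAT.match(s)
    if m is None:
        raise ValueError("string does not start with a float: %r" % s)
    return float(m.group())


value = leading_float_sketch("1.234(5)")
```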
 
diffpy.structure.parsers.p_auto module
Parser for automatic file format detection.
This parser does not provide the toLines() method.
- class diffpy.structure.parsers.p_auto.P_auto(**kw)
- Bases: StructureParser
- Parser with automatic detection of structure format. This parser attempts to automatically detect the format of a given structure file and parse it accordingly. When successful, it sets its format attribute to the detected structure format. - Parameters:
- **kw (dict) – Keyword arguments for the structure parser. 
 - format
- Detected structure format. Initially set to “auto” and updated after successful detection of the structure format. - Type:
- str 
 
 - pkw
- Keyword arguments passed to the parser. - Type:
- dict 
 
 - parse(s)
- Detect format and create Structure instance from a string. - Set format attribute to the detected file format. - Parameters:
- s (str) – String with structure data. 
- Returns:
- Structure object. 
- Return type:
- Structure
- Raises:
 
 - parseFile(filename)
- Detect format and create Structure instance from an existing file. - Set format attribute to the detected file format. - Parameters:
- filename (str) – Path to structure file. 
- Returns:
- Structure object. 
- Return type:
- Structure
- Raises:
- StructureFormatError – If the structure format is unknown or invalid. 
- IOError – If the file cannot be read. 
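The text above does not say how the format is detected. One plausible strategy, shown here purely as an assumption, is to try candidate parsers in turn and keep the first one that succeeds. The names detect_and_parse, parse_counted and parse_plain are hypothetical, and the two toy formats are invented for the example.

```python
def parse_counted(lines):
    """Toy format: atom count, title line, then 'element x y z' rows."""
    if not lines:
        raise ValueError("empty input")
    n = int(lines[0])  # ValueError when the first line is not a count
    rows = [ln.split() for ln in lines[2:2 + n]]
    if len(rows) != n or any(len(r) != 4 for r in rows):
        raise ValueError("atom rows do not match the declared count")
    return rows


def parse_plain(lines):
    """Toy format: every line is an 'element x y z' row."""
    rows = [ln.split() for ln in lines]
    if not rows or any(len(r) != 4 for r in rows):
        raise ValueError("expected 4 columns on every line")
    return rows


def detect_and_parse(lines, parsers):
    """Try each parser in order; return (format name, parsed rows)."""
    for fmt, parse in parsers.items():
        try:
            return fmt, parse(lines)
        except ValueError:
            continue  # this format does not fit, try the next one
    raise ValueError("unknown or invalid structure format")


candidates = {"counted": parse_counted, "plain": parse_plain}
fmt1, rows1 = detect_and_parse(["1", "title", "C 0 0 0"], candidates)
fmt2, rows2 = detect_and_parse(["C 0 0 0"], candidates)
```

The order of candidates matters in such a scheme, since a permissive format listed first would shadow stricter ones.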
 
 
 
diffpy.structure.parsers.p_pdffit module
Parser for PDFfit structure format.
- class diffpy.structure.parsers.p_pdffit.P_pdffit
- Bases: StructureParser
- Parser for PDFfit structure format.
 - format
- Format name, default “pdffit”. - Type:
- str 
 
 - ignored_lines
- List of lines ignored during parsing. - Type:
- list 
 
 - stru
- Structure instance used for PDFfit input or output. - Type:
 
 - parseLines(lines)
- Parse list of lines in PDFfit format. - Parameters:
- lines (list of str) – List of lines in PDFfit format. 
- Returns:
- Parsed structure instance. 
- Return type:
- Raises:
- StructureFormatError – File not in PDFfit format. 
 
 
diffpy.structure.parsers.p_xcfg module
Parser for extended CFG format used by atomeye.
- diffpy.structure.parsers.p_xcfg.AtomicMass
- Dictionary of atomic masses for elements. - Type:
- dict 
 
- class diffpy.structure.parsers.p_xcfg.P_xcfg
- Bases: StructureParser
- Parser for AtomEye extended CFG format.
 - format
- Format name, default “xcfg”. - Type:
- str 
 
 - cluster_boundary = 2
- Width of boundary around corners of non-periodic cluster to avoid PBC effects in atomeye. - Type:
- int 
 
 - parseLines(lines)
- Parse list of lines in XCFG format. - Parameters:
- lines (list of str) – List of lines in XCFG format. 
- Returns:
- Parsed structure instance. 
- Return type:
- Raises:
- StructureFormatError – Invalid XCFG format. 
 
 - toLines(stru)
- Convert Structure stru to a list of lines in XCFG atomeye format. - Parameters:
- stru (Structure) – Structure to be converted. 
- Returns:
- List of lines in XCFG format. 
- Return type:
- list of str 
- Raises:
- StructureFormatError – Cannot convert empty structure to XCFG format. 
 
 
diffpy.structure.parsers.parser_index_mod module
Index of recognized structure formats, their IO capabilities and associated modules where they are defined.
- diffpy.structure.parsers.parser_index_mod.parser_index
- Dictionary of recognized structure formats. The keys are format names and the values are dictionaries with the following keys:
- module (str) – Name of the module that defines the parser class.
- file_extension (str) – File extension for the format, including the leading dot.
- file_pattern (str) – File pattern for the format, using ‘|’ as separator for multiple patterns.
- has_input (bool) – True if the parser can read the format.
- has_output (bool) – True if the parser can write the format.
 - Type:
- dict 
 
Note
Plugins for new structure formats need to be added to the parser_index dictionary in this module.
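The entry layout listed above lends itself to simple queries, such as listing readable or writable formats or matching a file name against the '|'-separated patterns. The following sketch uses made-up entries: demo_index, its format names and values, and the helper functions are hypothetical and only mirror the documented keys.

```python
from fnmatch import fnmatchcase

# Hypothetical entries; the format names and values are illustrative only.
demo_index = {
    "demo": {
        "module": "p_demo",
        "file_extension": ".demo",
        "file_pattern": "*.demo|*.dm",
        "has_input": True,
        "has_output": True,
    },
    "demo_readonly": {
        "module": "p_demo_readonly",
        "file_extension": ".dro",
        "file_pattern": "*.dro",
        "has_input": True,
        "has_output": False,
    },
}


def input_formats(index):
    """Sorted names of formats that can be read."""
    return sorted(fmt for fmt, info in index.items() if info["has_input"])


def output_formats(index):
    """Sorted names of formats that can be written."""
    return sorted(fmt for fmt, info in index.items() if info["has_output"])


def formats_matching(filename, index):
    """Sorted names of formats whose file pattern matches filename."""
    matches = []
    for fmt, info in index.items():
        patterns = info["file_pattern"].split("|")
        if any(fnmatchcase(filename, p) for p in patterns):
            matches.append(fmt)
    return sorted(matches)
```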
diffpy.structure.parsers.p_pdb module
Basic parser for PDB structure format.
- class diffpy.structure.parsers.p_pdb.P_pdb
- Bases: StructureParser
- Simple parser for PDB format. The parser understands the following PDB records: TITLE, CRYST1, SCALE1, SCALE2, SCALE3, ATOM, SIGATM, ANISOU, SIGUIJ, TER, HETATM, END.
 - format
- Format name, default “pdb”. - Type:
- str 
 
 - atomLines(stru, idx)
- Build ATOM records and possibly SIGATM, ANISOU or SIGUIJ records for atom number idx of structure stru. 
 - orderOfRecords = ['HEADER', 'OBSLTE', 'TITLE', 'CAVEAT', 'COMPND', 'SOURCE', 'KEYWDS', 'EXPDTA', 'AUTHOR', 'REVDAT', 'SPRSDE', 'JRNL', 'REMARK', 'REMARK', 'REMARK', 'REMARK', 'DBREF', 'SEQADV', 'SEQRES', 'MODRES', 'HET', 'HETNAM', 'HETSYN', 'FORMUL', 'HELIX', 'SHEET', 'TURN', 'SSBOND', 'LINK', 'HYDBND', 'SLTBRG', 'CISPEP', 'SITE', 'CRYST1', 'ORIGX1', 'ORIGX2', 'ORIGX3', 'SCALE1', 'SCALE2', 'SCALE3', 'MTRIX1', 'MTRIX2', 'MTRIX3', 'TVECT', 'MODEL', 'ATOM', 'SIGATM', 'ANISOU', 'SIGUIJ', 'TER', 'HETATM', 'ENDMDL', 'CONECT', 'MASTER', 'END']
- Ordered list of PDB record labels. - Type:
- list 
 
 - parseLines(lines)
- Parse list of lines in PDB format. - Parameters:
- lines (list of str) – List of lines in PDB format. 
- Returns:
- Parsed structure instance. 
- Return type:
- Raises:
- StructureFormatError – Invalid PDB record. 
 
 - toLines(stru)
- Convert Structure stru to a list of lines in PDB format. - Parameters:
- stru (Structure) – Structure to be converted. 
- Returns:
- List of lines in PDB format. 
- Return type:
- list of str 
 
 - validRecords = {'ANISOU': None, 'ATOM': None, 'AUTHOR': None, 'CAVEAT': None, 'CISPEP': None, 'COMPND': None, 'CONECT': None, 'CRYST1': None, 'DBREF': None, 'END': None, 'ENDMDL': None, 'EXPDTA': None, 'FORMUL': None, 'HEADER': None, 'HELIX': None, 'HET': None, 'HETATM': None, 'HETNAM': None, 'HETSYN': None, 'HYDBND': None, 'JRNL': None, 'KEYWDS': None, 'LINK': None, 'MASTER': None, 'MODEL': None, 'MODRES': None, 'MTRIX1': None, 'MTRIX2': None, 'MTRIX3': None, 'OBSLTE': None, 'ORIGX1': None, 'ORIGX2': None, 'ORIGX3': None, 'REMARK': None, 'REVDAT': None, 'SCALE1': None, 'SCALE2': None, 'SCALE3': None, 'SEQADV': None, 'SEQRES': None, 'SHEET': None, 'SIGATM': None, 'SIGUIJ': None, 'SITE': None, 'SLTBRG': None, 'SOURCE': None, 'SPRSDE': None, 'SSBOND': None, 'TER': None, 'TITLE': None, 'TURN': None, 'TVECT': None}
- Dictionary of PDB record labels. - Type:
- dict 
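In the PDB format the record name occupies the first six columns of each line, which is what makes a lookup table of record labels useful. The sketch below shows that lookup in isolation; it is not the code of P_pdb. UNDERSTOOD repeats the records the class description lists as understood, while VALID is a deliberately trimmed stand-in for validRecords, and record_label and sort_records are hypothetical helpers.

```python
# Records the parser description lists as understood.
UNDERSTOOD = {
    "TITLE", "CRYST1", "SCALE1", "SCALE2", "SCALE3", "ATOM",
    "SIGATM", "ANISOU", "SIGUIJ", "TER", "HETATM", "END",
}
# Trimmed stand-in for the full table of valid record labels.
VALID = UNDERSTOOD | {"HEADER", "REMARK"}


def record_label(line):
    """Return the record name stored in columns 1-6 of a PDB line."""
    return line[:6].strip()


def sort_records(lines):
    """Split labels into understood and skipped; reject unknown ones."""
    understood, skipped = [], []
    for lineno, line in enumerate(lines, start=1):
        label = record_label(line)
        if label not in VALID:
            raise ValueError("line %d: invalid record %r" % (lineno, label))
        (understood if label in UNDERSTOOD else skipped).append(label)
    return understood, skipped


used, ignored = sort_records([
    "TITLE     demo structure",
    "REMARK    not interpreted in this sketch",
    "ATOM      1  C",
    "END",
])
```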
 
 
diffpy.structure.parsers.p_discus module
Parser for DISCUS structure format.
- class diffpy.structure.parsers.p_discus.P_discus
- Bases: StructureParser
- Parser for DISCUS structure format. The parser fails on molecule and generator records.
 - format
- File format name, default “discus”. - Type:
- str 
 
 - nl
- Line number of the current line being parsed. - Type:
- int 
 
 - lines
- List of lines from the input file. - Type:
- list of str 
 
 - line
- Current line being parsed. - Type:
- str 
 
 - stru
- Structure being parsed. - Type:
 
 - ignored_lines
- List of lines that were ignored during parsing. - Type:
- list of str 
 
 - cell_read
- True if the cell record was processed. - Type:
- bool 
 
 - ncell_read
- True if the ncell record was processed. - Type:
- bool 
 
 - parseLines(lines)
- Parse list of lines in DISCUS format. - Parameters:
- lines (list of str) – List of lines from the input file. 
- Returns:
- Parsed PDFFitStructure instance. 
- Return type:
- PDFFitStructure
- Raises:
- StructureFormatError – If the file is not in DISCUS format. 
 
 
diffpy.structure.parsers.p_xyz module
Parser for XYZ file format, where:
- First line gives number of atoms. 
- Second line has optional title. 
- Remaining lines contain element, x, y, z. 
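The three-part layout listed above can be read with a few lines of code. This is a stand-alone sketch rather than the implementation of P_xyz: the function name parse_xyz, the (title, atoms) return value, and the choice to skip blank lines after the title are assumptions of the example.

```python
def parse_xyz(lines):
    """Return (title, atoms) from lines in the XYZ layout.

    atoms is a list of (element, (x, y, z)) tuples.
    """
    if len(lines) < 2:
        raise ValueError("XYZ data needs a count line and a title line")
    natoms = int(lines[0].strip())  # first line: number of atoms
    title = lines[1].strip()        # second line: optional title
    records = [ln for ln in lines[2:] if ln.strip()]
    if len(records) < natoms:
        raise ValueError("expected %d atom lines, found %d"
                         % (natoms, len(records)))
    atoms = []
    for ln in records[:natoms]:
        words = ln.split()
        if len(words) < 4:
            raise ValueError("expected 'element x y z': %r" % ln)
        atoms.append((words[0], tuple(float(w) for w in words[1:4])))
    return title, atoms


title, atoms = parse_xyz([
    "2",
    "two-atom example",
    "O 0.0 0.0 0.0",
    "H 0.96 0.0 0.0",
])
```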
- class diffpy.structure.parsers.p_xyz.P_xyz
- Bases: StructureParser
- Parser for standard XYZ structure format.
 - format
- Format name, default “xyz”. - Type:
- str 
 
 - parseLines(lines)
- Parse list of lines in XYZ format. - Parameters:
- lines (list of str) – List of lines in XYZ format. 
- Returns:
- Parsed structure instance. 
- Return type:
- Raises:
- StructureFormatError – Invalid XYZ format.