.. _Parsers Example:

:tocdepth: 2

Parsers Example
###############

This example demonstrates how diffpy.utils lets us easily process and serialize files.
Using the ``parsers`` module, we can load file data into simple, easy-to-work-with Python objects.

1) To begin, unzip :download:`parserdata<./exampledata/parserdata.zip>` and take a look at ``data.txt``.
   Our goal will be to extract and serialize the data table as well as the parameters listed in the
   header of this file.

2) To get the data table, we will use the ``loadData`` function. By default, this function finds and
   extracts a data table from a file. ::

       from diffpy.utils.parsers import loadData
       data_table = loadData('')

   While this works for most datasets, running it on our ``data.txt`` file raises a ``ValueError``.
   This is caused by the comments ``$ Phase Transition Near This Temperature Range`` and
   ``--> Note Significant Jump in Rw <--`` embedded within the dataset. To fix this, use the
   ``comments`` parameter. ::

       data_table = loadData('', comments=['$', '-->'])

   This parameter tells ``loadData`` that any line beginning with ``$`` or ``-->`` is just a comment,
   and that more entries in our data table may follow it.

   Here are a few other parameters to test out:

   * ``delimiter=','``: Look for a comma-separated data table. Useful for CSV files.
     However, since ``data.txt`` is whitespace-separated, running ::

         loadData('', comments=['$', '-->'], delimiter=',')

     returns an empty list.

   * ``minrows=50``: Only look for data tables with at least 50 rows. Since our data table has
     far fewer rows than that, running ::

         loadData('', comments=['$', '-->'], minrows=50)

     returns an empty list.

   * ``usecols=[0, 3]``: Only return the 0th and 3rd columns (zero-indexed) of the data table.
     For ``data.txt``, this corresponds to the temperature and rw columns.
     Running ::

         loadData('', comments=['$', '-->'], usecols=[0, 3])

     returns only those two columns.

3) Next, to get the header information, we can again use ``loadData``, but this time with the
   ``headers`` parameter enabled. ::

       hdata = loadData('', comments=['$', '-->'], headers=True)

4) Rather than working with separate ``data_table`` and ``hdata`` objects, it may be easier to combine
   them into a single dictionary. We can do so using the ``serialize_data`` function. ::

       from diffpy.utils.parsers import serialize_data

       # data_table_column_names must list one name per column of data_table, with 'temperature'
       # for column 0 and 'rw' for column 3 so that the lookups below work.
       file_data = serialize_data('', hdata, data_table, dt_colnames=data_table_column_names)
       data_dict = file_data['data.txt']

   Now we can extract specific data table columns from the dictionary. ::

       data_table_temperature_column = data_dict['temperature']
       data_table_rw_column = data_dict['rw']

5) When we are done working with the data, we can store it on disk for later use. This can also be
   done using the ``serialize_data`` function with an additional ``serial_file`` parameter. ::

       parsed_file_data = serialize_data('', hdata, data_table, serial_file='')

   The returned value, ``parsed_file_data``, is the dictionary we just added to ``serialfile.json``.
   To extract the data from the serial file, we use ``deserialize_data``. ::

       from diffpy.utils.parsers import deserialize_data
       parsed_file_data = deserialize_data('')

6) Finally, ``serialize_data`` allows us to store data from multiple text files in a single serial file.
   For one last bit of practice, we will extract and add the data from ``moredata.txt`` into the same
   ``serialfile.json`` file. ::

       data_table = loadData('')
       hdata = loadData('', headers=True)
       serialize_data('', hdata, data_table, serial_file='')

   The serial file ``serialfile.json`` should now contain two entries: ``data.txt`` and ``moredata.txt``.
   The data from each file can then be accessed by its file name. ::

       serial_data = deserialize_data('')
       data_txt_data = serial_data['data.txt']  # Access data.txt data
       moredata_txt_data = serial_data['moredata.txt']  # Access moredata.txt data

For more information, check out the :ref:`documentation` of the ``parsers`` module.
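
To build intuition for the ``comments``, ``delimiter``, ``minrows``, and ``usecols`` parameters from
step 2, here is a small standard-library-only sketch. The function ``extract_table`` is hypothetical
and is not how ``loadData`` is implemented: it keeps the longest run of numeric rows that share a
column count, skips lines starting with a comment prefix without ending that run, and returns an
empty list when no run qualifies. Unlike ``loadData``, it never raises ``ValueError``; without the
comment prefixes it simply splits the table at each comment line. The sample lines are made up.

.. code-block:: python

    from typing import List, Optional, Sequence


    def extract_table(lines: Sequence[str], comments: Sequence[str] = ("#",),
                      delimiter: Optional[str] = None, minrows: int = 1,
                      usecols: Optional[Sequence[int]] = None) -> List[List[float]]:
        """Hypothetical sketch, not diffpy's loadData: return the longest run of
        numeric rows that share a column count."""
        best: List[List[float]] = []
        current: List[List[float]] = []
        ncols: Optional[int] = None
        for raw in lines:
            line = raw.strip()
            if line.startswith(tuple(comments)):
                continue  # comment line: skip it without ending the current run
            try:
                row = [float(x) for x in line.split(delimiter)]
            except ValueError:
                row = []  # not a numeric row (header text, wrong delimiter, ...)
            if row and len(row) == (ncols or len(row)):
                current.append(row)
                ncols = len(row)
                continue
            # Any other line ends the current run; keep it if it is the longest so far.
            if len(current) > len(best):
                best = current
            current, ncols = ([row], len(row)) if row else ([], None)
        if len(current) > len(best):
            best = current
        if len(best) < minrows:
            return []
        if usecols is not None:
            best = [[row[i] for i in usecols] for row in best]
        return best


    # Made-up sample lines: a table interrupted by two comment lines.
    sample = [
        "header text, not numeric",
        "300 1.0 0.1 0.20",
        "$ Phase Transition Near This Temperature Range",
        "310 1.1 0.1 0.25",
        "--> Note Significant Jump in Rw <--",
        "320 1.2 0.2 0.40",
    ]

    full = extract_table(sample, comments=["$", "-->"])  # all three numeric rows
    first_only = extract_table(sample)  # [[300.0, 1.0, 0.1, 0.2]]: split at each comment
    as_csv = extract_table(sample, comments=["$", "-->"], delimiter=",")  # []
    too_short = extract_table(sample, comments=["$", "-->"], minrows=50)  # []
    two_cols = extract_table(sample, comments=["$", "-->"], usecols=[0, 3])

The ``first_only`` result shows why declaring comment prefixes matters: an undeclared comment line
looks like the end of the table.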
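
The dictionary layout described in steps 4 to 6 (one entry per file, keyed by file name, holding the
header parameters and the named data columns) can be sketched with Python's standard ``json``
module. The functions ``serialize_sketch`` and ``deserialize_sketch`` are hypothetical stand-ins,
not the implementation of ``serialize_data`` or ``deserialize_data``. Keying entries by the base
file name, replacing an existing entry with the same name, and the names ``wavelength``, ``x``, and
``y`` are all assumptions made for this illustration.

.. code-block:: python

    import json
    import os
    import tempfile
    from typing import Any, Dict, Optional, Sequence


    def serialize_sketch(filename: str, hdata: Dict[str, Any],
                         table: Sequence[Sequence[float]], colnames: Sequence[str],
                         serial_file: Optional[str] = None) -> Dict[str, Dict[str, Any]]:
        """Hypothetical stand-in for serialize_data: one entry per file."""
        entry: Dict[str, Any] = dict(hdata)  # header parameters become keys
        for i, name in enumerate(colnames):
            entry[name] = [row[i] for row in table]  # each named column becomes a list
        # Assumption: entries are keyed by the file's base name, e.g. "data.txt".
        record = {os.path.basename(filename): entry}
        if serial_file is not None:
            stored: Dict[str, Dict[str, Any]] = {}
            if os.path.exists(serial_file):
                with open(serial_file) as f:
                    stored = json.load(f)
            stored.update(record)  # assumption: same file name replaces the old entry
            with open(serial_file, "w") as f:
                json.dump(stored, f, indent=2)
        return record


    def deserialize_sketch(serial_file: str) -> Dict[str, Dict[str, Any]]:
        """Hypothetical stand-in for deserialize_data: read the whole JSON file."""
        with open(serial_file) as f:
            return json.load(f)


    # Usage with made-up values; "wavelength", "x" and "y" are hypothetical names.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "serialfile.json")
        first = serialize_sketch("data.txt", {"wavelength": 0.1},
                                 [[300.0, 0.2], [310.0, 0.25]],
                                 ["temperature", "rw"], serial_file=path)
        serialize_sketch("moredata.txt", {}, [[1.0, 2.0]], ["x", "y"], serial_file=path)
        loaded = deserialize_sketch(path)

    rw_column = loaded["data.txt"]["rw"]  # [0.2, 0.25]

Because each call reads the existing JSON file and merges its new entry into it, a single file
accumulates one entry per input file.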