Tools Example

The tools module contains various tools that may be useful when you manipulate and analyze diffraction data.

Automatically Capture User Info

One task we would like to do is to capture and propagate useful metadata that describes the diffraction data. Some is essential such as wavelength and radiation type. Other metadata is useful such as information about the sample, co-workers and so on. However, one of the most important bits of information is the name of the data owner. For example, in DiffractionObjects this is stored in the metadata dictionary as owner_name, owner_email, and owner_orcid.

To reduce experimenter overhead when collecting this information, we have developed an infrastructure that helps to capture this information automatically when you are using DiffractionObjects and other diffpy tools. You may also reuse this infrastructure for your own projects using tools in this tutorial.

This example will demonstrate how diffpy.utils allows us to conveniently load and manage user and package information. Using the tools module, we can efficiently get them in terms of a dictionary.

Load user info into your program

To use this functionality in your own code make use of the get_user_info function in diffpy.utils.tools which will search for information about the user, parse it, and return it in a dictionary object e.g. if the user is “Jane Doe” with email “janedoe@gmail.com” and ORCID “0000-0000-0000-0000”, and if the function can find the information (more on this below), if you type this

from diffpy.utils.tools import get_user_info
user_info = get_user_info()

The function will return

{"owner_name": "Jane Doe", "owner_email": "janedoe@email.com", "owner_orcid": "0000-0000-0000-0000"}

Where does get_user_info() get the user information from?

The function will first attempt to load the information from configuration files with the name diffpyconfig.json on your hard-drive. It looks for files in the current working directory and in the computer-user’s home (i.e., login) directory. For example, it might be in C:/Users/yourname`` or something like that, but to find this directory, open a terminal and a unix or mac system type

cd ~
pwd

Or type Echo $HOME. On a Windows computer

echo %USERPROFILE%"

It is also possible to override the values in the config files at run-time by passing values directly into the function according to get_user_info, for example, get_user_info(owner_name="Janet Doe", owner_email="janetdoe@email.com", owner_orcid="1111-1111-1111-1111"). The information to pass into get_user_info could be entered by a user through a command-line interface or into a gui.

What if no config files exist yet?

If no configuration files can be found, they can be created using a text editor, or by using a diffpy tool called check_and_build_global_config() which, if no global config file can be found, prompts the user for the information then writes the config file in the user’s home directory.

When building an application where you want to capture data-owner information, we recommend you execute check_and_build_global_config() first followed by get_user_info in your app workflow. E.g.,

from diffpy.utils.tools import check_and_build_global_config, get_user_info
from datetime import datetime
import json

def my_cool_data_enhancer_app_main(data, filepath):
    check_and_build_global_config()
    metadata_enhanced_data = get_user_info()
    metadata_enhanced_data.update({"creation_time": datetime.now(),
                                   "data": data})
    with open(filepath, "w") as f:
        json.dump(metadata_enhanced_data, f)

check_and_build_global_config() only interrupts execution if it can’t find a valid config file, and so if the user enters valid information it will only run once. However, if you want to bypass this behavior, check_and_build_global_config() takes an optional boolean skip_config_creation parameter that could be set to True at runtime to override the config creation.

What happens when you run check_and_build_global_config()?

When you set skip_config_creation to False and there is no existing global configuration file, the function will prompt you for inputs (name, email, ORCID). An example of the prompts you may see is:

Please enter the name you would want future work to be credited to: Jane Doe
Please enter your email: janedoe@example.com
Please enter your orcid ID if you know it: 0000-0000-0000-0000

After receiving the inputs, the function will write the information to the diffpyconfig.json file in your home directory.

check_and_build_global_config() returns True if the config file exists (whether it created it or not) and False if the config file does not exist in the user’s home allowing you to develop your own workflow for handling missing config files after running it with skip_config_creation=True.

I entered the wrong information in my config file so it always loads incorrect information, how do I fix that?

It is easy to fix this simply by deleting the global and/or local config files, which will allow you to re-enter the information during the check_and_build_global_config() initialization workflow. You can also simply edit the diffpyconfig.json file directly using a text editor.

Locate the file diffpyconfig.json, in your home directory and open it in an editor

{
    "owner_name": "John Doe",
    "owner_email": "john.doe@example.com"
    "owner_orcid": "0000-0000-4321-1234"
}

Then you can edit the username and email as needed, make sure to save your edits.

Automatically Capture Info about a Software Package Being Used

We also have a handy tool for capturing information about a python package that is being used to save in the metadata. To use this functionality, use he function get_package_info, which inserts or updates software package names and versions in a given metadata dictionary under the key “package_info”, e.g.,

{"package_info": {"diffpy.utils": "0.3.0", "my_package": "0.3.1"}}

If the installed version of the package “my_package” is 0.3.1.

This function can be used in your code as follows

from diffpy.utils.tools import get_package_info
package_metadata = get_package_info("my_package")

or

package_metadata = get_package_info(["first_package", "second_package"])

which returns (for example)

{"package_info": {"diffpy.utils": "0.3.0", "first_package": "1.0.1", "second_package": "0.0.7"}}

You can also specify an existing dictionary to be updated with the information.

existing_dict = {"key": "value"}
updated_dict = get_package_info("my_package", metadata=existing_dict))

Which returns

{"key": "value", "package_info": {"diffpy.utils": "0.3.0", "my_package": "0.3.1"}}

Note that "diffpy.utils" is automatically included in the package info since the get_user_info function is part of diffpy.utils.