pyUSID

Python framework for storing, visualizing, and processing spectroscopy, imaging or any observational / experimental data

What?

  • The Universal Spectroscopic and Imaging Data (USID) model:
    • facilitates the representation of any spectroscopic or imaging data regardless of its origin, modality, size, or dimensionality.
    • enables the development of instrument- and modality-agnostic data processing and analysis algorithms.
    • is just a definition or a blueprint rather than something tangible and readily usable.
  • pyUSID is a Python package that currently provides three pieces of functionality:
    1. io: Primarily, it enables the storage and access of USID data in Hierarchical Data Format (HDF5) files (referred to as h5USID files) using Python.
    2. viz: It has handy tools for visualizing USID and general scientific data.
    3. processing: It provides a framework for formulating scientific problems as computational problems. See pycroscopy, a sister project that uses pyUSID for analysis of microscopy data.
  • pyUSID uses a data-centric approach wherein the raw data collected from the instrument and the results from analysis and processing routines are all written to the same h5USID file for traceability, reproducibility, and provenance.
  • Just as scipy uses numpy underneath, scientific packages like pycroscopy use pyUSID for file handling, general plotting utilities, and its data processing framework.
  • pyUSID uses popular packages such as numpy, h5py, joblib, matplotlib, etc. for most of the storage, computation, and visualization.
  • See a high-level overview of pyUSID in this presentation
  • Jump to our GitHub project
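
To make the "regardless of dimensionality" idea above concrete, here is a minimal sketch of the kind of reshaping the USID model relies on: a multidimensional measurement is stored as a two-dimensional matrix (one row per position, one column per spectral point) together with index tables that remember the original dimensions. This uses only numpy and does not call the pyUSID API. The function name `flatten_to_usid`, its arguments, and the assumption that position axes come first are purely illustrative, and the row / column ordering follows numpy's default C order rather than claiming to match the exact h5USID conventions.

```python
import numpy as np


def flatten_to_usid(data, num_pos_dims):
    """Illustrative helper (not part of pyUSID).

    Reshapes an N-dimensional array into a 2D matrix of shape
    (number of positions, number of spectral points), assuming the
    first `num_pos_dims` axes are position dimensions and the
    remaining axes are spectroscopic dimensions.
    """
    if not 1 <= num_pos_dims < data.ndim:
        raise ValueError("need at least one position and one spectroscopic axis")

    pos_shape = data.shape[:num_pos_dims]
    spec_shape = data.shape[num_pos_dims:]

    # One row per position, one column per spectral point
    main = data.reshape(int(np.prod(pos_shape)), int(np.prod(spec_shape)))

    # Index tables recording where each row / column came from
    pos_inds = np.indices(pos_shape).reshape(len(pos_shape), -1).T
    spec_inds = np.indices(spec_shape).reshape(len(spec_shape), -1)
    return main, pos_inds, spec_inds


# Hypothetical measurement: a 3 x 4 grid of positions, 5 spectral points each
data = np.arange(60).reshape(3, 4, 5)
main, pos_inds, spec_inds = flatten_to_usid(data, num_pos_dims=2)

print(main.shape)       # (12, 5)
print(pos_inds.shape)   # (12, 2)
print(spec_inds.shape)  # (1, 5)

# Any row of the flattened matrix can be traced back to its original position
row = 5
assert np.array_equal(main[row], data[tuple(pos_inds[row])])
```

Because the flattened matrix has the same form whatever the original number of dimensions, an algorithm written against it does not need to know which instrument or modality produced the data.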

Why?

As we see it, there are a few opportunities in scientific imaging (that surely apply to several other scientific domains):

1. Growing data sizes
  • Cannot use desktop computers for analysis
  • Need: High performance computing, storage resources and compatible, scalable file structures
2. Increasing data complexity
  • Sophisticated imaging and spectroscopy modes resulting in data with 5, 6, 7, or more dimensions
  • Need: Robust software and generalized data formatting
3. Multiple file formats
  • Different formats from each instrument. Proprietary in most cases
  • Data from different instruments cannot easily be correlated
  • Need: Open, instrument-independent data format
4. Expensive analysis software
  • Software supplied with instruments is often insufficient for, or incapable of, custom analysis routines
  • Commercial software (e.g., MATLAB, Origin) is often prohibitively expensive
  • Need: Free, powerful, open source, user-friendly software
5. Closed science
  • Analysis software and data not shared
  • No guarantees of reproducibility or traceability
  • Need: Open source data structures and file formats, and centralized code and data repositories

Who?

  • This project began largely as an effort by scientists and engineers at the Institute for Functional Imaging of Materials (IFIM) to standardize data representation, storage, and processing for a very large variety of imaging and spectroscopy instruments.
  • It is now being developed and maintained by Suhas Somnath of the Advanced Data & Workflows Group (ADWG) at the Oak Ridge National Laboratory Leadership Computing Facility (OLCF) and Chris R. Smith of IFIM. Please visit our credits and acknowledgements page for more information.
  • pyUSID was originally a part of pycroscopy, up to 2017; it now serves as an independent, science-agnostic data handling package. pyUSID was born so that it could integrate with other existing mature packages in any domain. If you are interested in integrating our data model with your existing package, please get in touch with us.