Scientific analysis of nanoscale materials imaging data
pycroscopy is a Python package for image processing and scientific analysis of imaging modalities such as multi-frequency scanning probe microscopy, scanning tunneling spectroscopy, X-ray diffraction microscopy, and transmission electron microscopy. pycroscopy uses a data-centric model wherein the raw data collected from the microscope, as well as the results of analysis and processing routines, are all written to standardized hierarchical data format (HDF5) files for traceability, reproducibility, and provenance.
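To illustrate the data-centric idea, here is a minimal sketch that uses h5py directly rather than pycroscopy's own I/O classes. The raw measurement and a processed result are written into the same HDF5 file, and the processing parameters are attached as attributes so the result can be traced back to its source. The dataset paths, attribute keys, the `write_with_provenance` helper, and the moving-average "processing" step are all hypothetical choices for illustration, not pycroscopy's file layout.

```python
import os
import tempfile

import h5py
import numpy as np


def write_with_provenance(path, raw, window):
    """Store raw data and a derived result side by side in one HDF5 file.

    Hypothetical helper: the group/dataset names and attribute keys below only
    illustrate keeping data, results, and parameters together.
    """
    # Moving average along the last axis, standing in for a real processing step
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), -1, raw
    )
    with h5py.File(path, "w") as h5:
        # Intermediate group "measurement" is created automatically
        raw_dset = h5.create_dataset("measurement/raw", data=raw)
        raw_dset.attrs["instrument"] = "example-AFM"  # hypothetical metadata
        res = h5.create_dataset("measurement/smoothed", data=smoothed)
        # Record how the result was produced and where its input lives
        res.attrs["operation"] = "moving_average"
        res.attrs["window"] = window
        res.attrs["source"] = raw_dset.name
    return smoothed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.h5")
    raw = np.random.default_rng(0).normal(size=(4, 100))
    smoothed = write_with_provenance(path, raw, window=5)
    # Re-open the file and read back both the data and its provenance
    with h5py.File(path, "r") as h5:
        provenance = dict(h5["measurement/smoothed"].attrs)
        stored_raw = h5["measurement/raw"][()]

print(provenance)
```

Because the parameters travel with the result, anyone opening the file later, in Python or any other HDF5-aware tool, can see how the smoothed dataset was produced without consulting separate notes.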
With pycroscopy we aim to:
- serve as a hub for collaboration across scientific domains (microscopists, materials scientists, biologists…)
- provide a community-developed, open standard for data formatting
- provide a framework for developing data analysis routines
- significantly lower the barrier to advanced data analysis procedures by simplifying I/O, processing, visualization, etc.
To learn more about the motivation, general structure, and philosophy of pycroscopy, please read this short introduction.
This project began largely as an effort by scientists and engineers at the Center for Nanophase Materials Sciences (CNMS) to implement a Python library that supports the I/O, processing, and analysis of the gargantuan stream of images that their microscopes generate (thanks to the large CNMS user community!).
By sharing our methodology and code for analyzing materials imaging data, we hope to benefit the wider materials science and physics community. We also hope, quite ardently, that other materials scientists will follow suit.
The core pycroscopy team consists of:
- @ssomnath (Suhas Somnath),
- @CompPhysChris (Chris R. Smith),
- @nlaanait (Numan Laanait),
- @stephenjesse (Stephen Jesse)
Substantial contributions from many developers including:
There is that little thing called open science…
As we see it, there are a few opportunities in microscopy / imaging and materials science:
- 1. Growing data sizes
- Cannot use desktop computers for analysis
- Need: High performance computing, storage resources and compatible, scalable file structures
- 2. Increasing data complexity
- Sophisticated imaging and spectroscopy modes that produce 5-, 6-, 7-… dimensional data
- Need: Robust software and generalized data formatting
- 3. Multiple file formats
- Each instrument uses a different format, proprietary in most cases
- Formats are incompatible with one another, which hinders correlative analysis
- Need: Open, instrument independent data format
- 4. Disjoint communities
- Similar analysis routines written independently by each community (SPM, STEM, TOF-SIMS, XRD…)!
- Need: A centralized repository and instrument-agnostic analysis routines that bring communities together
- 5. Expensive analysis software
- Software supplied with instruments is often insufficient and cannot accommodate custom analysis routines
- Commercial software (e.g., MATLAB, Origin) is often prohibitively expensive.
- Need: Free, powerful, open-source, user-friendly software
- pycroscopy uses an instrument-agnostic data structure that can store data regardless of dimensionality (from conventional 2D images to 9D multispectral SPM datasets) or instrument of origin (AFMs, STMs, STEMs, TOF-SIMS, and many more).
- This general definition of data allows us to write a single, generalized version of each analysis and processing function that can be applied to any kind of data.
- The data is stored in hierarchical data format (HDF5) files, which:
- Allow easy and open access to data from most programming languages.
- Accommodate datasets ranging from kilobytes (kB) to petabytes (PB).
- Are readily compatible with supercomputers and support parallel I/O.
- Allow storage of relevant parameters along with the data for improved traceability and reproducibility of analysis.
- Scientific workflows are developed and disseminated through Jupyter notebooks: interactive, portable web applications containing text, images, code/scripts, and text-based and graphical results.
- Once a user converts their microscope’s data format into HDF5, simply by extending some of the classes in `io`, they gain access to the rest of the utilities in `pycroscopy.*`.
- The package structure is simple, with 4 main modules:
- io: Reading and writing to HDF5 files + translating data from custom & proprietary microscope formats to HDF5.
- processing: Multivariate statistics, machine learning, and signal filtering.
- analysis: Model-dependent analysis of data.
- viz: Plotting functions and interactive Jupyter widgets to visualize multidimensional data.
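The instrument-agnostic data structure described above can be sketched as follows, under our own simplifying assumptions: any N-dimensional dataset is folded into a 2D matrix whose rows are positions and whose columns are spectroscopic points, so that a routine written once against that matrix works for a 2D image and a 4D spectral map alike. The helper names (`flatten_to_2d`, `unflatten`, `mean_spectrum`) are hypothetical and are not pycroscopy's API; only NumPy is required.

```python
import numpy as np


def flatten_to_2d(data, n_pos_dims):
    """Fold an N-D array into (positions, spectroscopic) form.

    The first ``n_pos_dims`` axes are treated as position axes (e.g. X, Y)
    and the remaining axes as spectroscopic axes (e.g. bias, frequency).
    """
    pos_shape = data.shape[:n_pos_dims]
    spec_shape = data.shape[n_pos_dims:]
    # np.prod(()) == 1, so data with no spectroscopic axes gets one column
    matrix = data.reshape(int(np.prod(pos_shape)), int(np.prod(spec_shape)))
    return matrix, pos_shape, spec_shape


def unflatten(matrix, pos_shape, spec_shape):
    """Invert flatten_to_2d, restoring the original N-D shape."""
    return matrix.reshape(*pos_shape, *spec_shape)


def mean_spectrum(matrix):
    """A 'generic' routine: it only ever sees the 2D matrix, never the original N."""
    return matrix.mean(axis=0)


# 2D image: 64 x 64 pixels, one value per pixel
image = np.random.default_rng(1).random((64, 64))
# 4D spectral map: 10 x 12 positions, 5 bias steps x 30 frequency bins
spectral_map = np.random.default_rng(2).random((10, 12, 5, 30))

for data in (image, spectral_map):
    matrix, pos_shape, spec_shape = flatten_to_2d(data, n_pos_dims=2)
    # The round trip must be lossless
    assert np.array_equal(unflatten(matrix, pos_shape, spec_shape), data)
    print(f"{data.ndim}D -> {matrix.shape}, mean spectrum length {mean_spectrum(matrix).shape[0]}")
```

The design choice here is that dimensionality lives only in the stored shapes, so analysis code never needs a separate branch for 2D, 4D, or 9D data.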
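Converting a proprietary format by extending a class in `io` can be pictured as the translator pattern below: a base class owns the shared conversion logic, and each instrument-specific subclass only implements parsing. `BaseTranslator`, `ToyLineScanTranslator`, `StandardDataset`, and the toy text format are all invented for this sketch. pycroscopy's actual translator classes and method names may differ, and the final write to HDF5 is omitted to keep the example short.

```python
import abc
from dataclasses import dataclass, field
from io import StringIO

import numpy as np


@dataclass
class StandardDataset:
    """Hypothetical instrument-independent container: 2D data plus metadata."""
    data: np.ndarray  # shape (positions, spectroscopic points)
    pos_shape: tuple
    spec_shape: tuple
    metadata: dict = field(default_factory=dict)


class BaseTranslator(abc.ABC):
    """Hypothetical base class: subclasses only implement vendor-specific parsing."""

    @abc.abstractmethod
    def read_raw(self, stream):
        """Return (N-D ndarray, number of position axes, metadata dict)."""

    def translate(self, stream):
        # Shared logic: fold whatever the vendor reader returns into 2D form
        array, n_pos, metadata = self.read_raw(stream)
        pos_shape, spec_shape = array.shape[:n_pos], array.shape[n_pos:]
        matrix = array.reshape(int(np.prod(pos_shape)), int(np.prod(spec_shape)))
        return StandardDataset(matrix, pos_shape, spec_shape, metadata)


class ToyLineScanTranslator(BaseTranslator):
    """Parses an invented format: a '# points=N' header, then one spectrum per line."""

    def read_raw(self, stream):
        header = stream.readline().strip()
        n_points = int(header.split("=")[1])
        # Remaining lines: one whitespace-separated spectrum per position
        spectra = np.loadtxt(stream, ndmin=2)
        if spectra.shape[1] != n_points:
            raise ValueError("header does not match the number of columns")
        return spectra, 1, {"points_per_spectrum": n_points}


vendor_file = StringIO("# points=3\n1 2 3\n4 5 6\n")
dataset = ToyLineScanTranslator().translate(vendor_file)
print(dataset.data.shape, dataset.pos_shape, dataset.spec_shape, dataset.metadata)
```

Supporting a new instrument in this pattern means writing only a new `read_raw`; everything downstream of `translate` stays unchanged.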
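As an example of the kind of multivariate statistics the `processing` module is meant to cover, the sketch below applies a plain NumPy singular value decomposition (SVD) to a positions-by-spectra matrix of synthetic data, then keeps only the two dominant components to suppress noise. It calls `numpy.linalg.svd` directly rather than any pycroscopy function, and the synthetic "endmember" spectra and mixing weights are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 400 positions, each a random mix of two Gaussian
# "endmember" spectra plus a small amount of white noise
freq = np.linspace(0, 1, 128)
endmembers = np.stack([np.exp(-((freq - 0.3) / 0.05) ** 2),
                       np.exp(-((freq - 0.7) / 0.05) ** 2)])  # shape (2, 128)
abundances = rng.random((400, 2))                             # shape (400, 2)
matrix = abundances @ endmembers + 0.01 * rng.normal(size=(400, 128))

# Economy-size SVD: matrix == U @ np.diag(S) @ Vt
U, S, Vt = np.linalg.svd(matrix, full_matrices=False)

# Fraction of the total sum of squares ("energy") carried by each component;
# the data are not mean-centered, so this is not strictly PCA variance
energy = S**2 / np.sum(S**2)
print("energy in first 3 components:", np.round(energy[:3], 4))

# Rank-k reconstruction keeps the dominant components and drops most of the noise
k = 2
denoised = (U[:, :k] * S[:k]) @ Vt[:k]
rel_error = np.linalg.norm(matrix - denoised) / np.linalg.norm(matrix)
print("relative rank-2 reconstruction error:", round(float(rel_error), 4))
```

The choice of `k = 2` matches how the synthetic data were built; with real measurements one would typically inspect the `energy` curve (a scree plot) before choosing how many components to keep.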