Frequently asked questions¶
pyUSID philosophy¶
How is pyUSID different from h5py?¶
Given that the vast majority of pyUSID’s capability is focused on accessing and manipulating data in hierarchical data format (HDF5) files, it is only natural that comparisons are drawn between pyUSID and h5py (the popular python package used by many for reading and writing HDF5 files). pyUSID does indeed use h5py underneath and is in no way an alternative to h5py. Rather, pyUSID provides functions that greatly simplify the task of reading and writing Universal Spectroscopy and Imaging Data (USID) formatted HDF5 files. In addition to file and data handling tools, pyUSID also provides data visualization tools and a framework for processing data.
Is pyUSID specific to any communities?¶
Not at all. We have ensured that the basic data model, file formatting, and processing paradigm are general enough that they can be extended to any other scientific domain so long as each experiment involves N identical observations of S values.
Also, please see our answer to ‘Who uses pyUSID’ below:
Who uses pyUSID?¶
The Institute for Functional Imaging of Materials (IFIM) at Oak Ridge National Laboratory uses pycroscopy (built on pyUSID) exclusively for in-house research and for supporting the numerous users who visit IFIM to use its state-of-the-art scanning probe microscopy techniques.
Synchrotron Radiation Research at Lund University
Nuclear Engineering and Health Physics, Idaho State University
Prof. David Ginger’s group at Department of Chemistry, University of Washington
Idaho National Laboratory
Central Michigan University
Iowa State University
George Western University
Brown University
University of Mons
and many more groups in universities and national labs.
Please get in touch with us if you would like your group / university to be added here.
Why is pyUSID written in python and not C / Fortran / Julia?¶
Here are some of the main reasons pyUSID is written in python:
Ease of use: One of the main objectives of pyUSID is to lower the barrier to advanced data analytics for domain scientists such as materials scientists. A C++ / Fortran version of pyUSID would certainly have been more efficient than the current python code-base. However, for the average domain scientist, the learning curve for writing efficient C code is far steeper than that of python / Julia / Matlab. Focusing on science is a big enough job for domain scientists and we want to make it as easy as possible to adopt pyUSID even for those who are novices at programming.
Optimized core packages: Our code makes heavy use of highly efficient numerical and scientific libraries such as NumPy and SciPy that are comparable in speed to C, so we do not expect our code to be substantially slower than C / Fortran.
Support: Julia is a (relatively) new language similar to python that is purpose-built for efficient computing and promises to be as fast as C and as easy as python. However, as of this writing, Julia unfortunately still does not have an open-source package ecosystem that is as large or diverse as python’s (think, for example, of the many packages necessary to read obscure proprietary file formats generated by instruments).
Industry standard: python’s unchallenged leadership in the data analytics / deep learning field has only validated it as the language of choice.
We welcome you to develop application programming interfaces (APIs) for languages besides python.
Why not implement Dask as a backend to pyUSID?¶
We have explored Dask.distributed and Dask.arrays as backends for how pyUSID handles data and (parallel) computing.
We encountered the following challenges that are beyond the scope of this package:
Serialization: Our plan was to create lazy Dask.Array objects based on the HDF5 Datasets. However, Dask was unable to serialize and pickle h5py.Dataset objects. Though there are potential work-arounds, further incompatibilities make this challenging to reconcile for every child class built on pyUSID.Process.
Bookkeeping: A fundamental feature of pyUSID.Process is the ability to continue computation on a dataset after interruption. Using Dask.distributed to scale the computation means giving up on clean and predictable checkpointing, which means being unable to continue an interrupted computation.
Please see this issue on Dask’s repository for more information.
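The bookkeeping idea described above (saving progress so that an interrupted computation can pick up where it left off) can be illustrated with a small, generic sketch. This is not how pyUSID.Process is implemented; the function name `process_with_checkpoints`, the JSON checkpoint file, and the batch size are all hypothetical choices made only for illustration.

```python
import json
import os
import tempfile


def process_with_checkpoints(values, func, checkpoint_path, batch_size=2):
    """Apply func to values in batches, saving all results after each batch.

    If checkpoint_path already exists, previously saved results are loaded
    and only the remaining values are computed.
    """
    results = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            results = json.load(handle)

    # Resume from the first value that has no saved result yet
    for start in range(len(results), len(values), batch_size):
        batch = values[start:start + batch_size]
        batch_results = [func(v) for v in batch]
        results.extend(batch_results)
        # Checkpoint only after the whole batch has succeeded
        with open(checkpoint_path, "w") as handle:
            json.dump(results, handle)
    return results


def flaky_square(v):
    """Square a number, but simulate an interruption at v == 4."""
    if v == 4:
        raise RuntimeError("simulated interruption")
    return v * v


calls = []


def square(v):
    """Square a number and record that it was computed."""
    calls.append(v)
    return v * v


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
values = [1, 2, 3, 4, 5]

try:
    process_with_checkpoints(values, flaky_square, checkpoint)
except RuntimeError:
    pass  # the first batch (1 and 2) was already saved to disk

# The second attempt skips the saved batch and computes only 3, 4 and 5
resumed = process_with_checkpoints(values, square, checkpoint)
print(resumed, calls)
```

The important property is that progress is recorded at predictable points (here, after every batch), which is the kind of control that becomes hard to keep once scheduling is handed over to a distributed framework.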
Why not just use the base HDF libraries or h5py instead?¶
USID represents all data, regardless of dimensionality, as a flattened 2D matrix. The base HDF libraries and h5py do not know anything about USID and do not support USID-specific operations via simple 1-line commands such as:
writing a complex, N-dimensional data cube to disk,
folding back the 2D USID data from HDF5 into its original N-dimensional shape (if and where available)
slicing datasets by dimensions without reading the entire dataset into memory and reshaping it,
reducing specified dimensions, etc.
interactively visualizing N-dimensional datasets
All of the above functionalities (and far more) are available in pyUSID.
Moreover, pyUSID also provides numerous plotting utilities that we ourselves have used for generating publication-quality figures.
Finally, pyUSID also provides computational tools and a convenient framework for translating scientific problems into computational problems via the pyUSID.Process class.
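To make the "flattened 2D matrix" idea concrete, here is a minimal NumPy-only sketch of flattening an N-dimensional measurement into a (positions x spectroscopic values) matrix and folding it back. It deliberately does not use any pyUSID or h5py calls. The dimension names, sizes, and the row-major ordering are assumptions chosen for illustration; in a real USID file the shape and ordering are described by ancillary datasets, not hard-coded as they are here.

```python
import numpy as np

# Hypothetical 4D measurement: a 3 x 4 grid of positions (rows, cols), with
# 2 cycles x 5 bias steps recorded at every position.
rows, cols, cycles, steps = 3, 4, 2, 5
data_nd = np.arange(rows * cols * cycles * steps, dtype=float)
data_nd = data_nd.reshape(rows, cols, cycles, steps)

# Flatten: each position becomes one row, each spectroscopic value one column
data_2d = data_nd.reshape(rows * cols, cycles * steps)

# Fold back into the original N-dimensional shape
folded = data_2d.reshape(rows, cols, cycles, steps)

# Slice a single position straight out of the 2D matrix:
# position (row=1, col=2) lives at flat row index 1 * cols + 2
spectrum = data_2d[1 * cols + 2]

print(data_2d.shape, folded.shape, spectrum.shape)
```

With plain h5py, this kind of index arithmetic and reshaping has to be written and kept consistent by hand for every dataset; the point of the list above is that pyUSID wraps such steps in single calls.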
pyUSID is written in python, so it is going to be slow since it cannot use all the cores on my CPU, right?¶
Actually, all data processing / analysis algorithms we have written using pyUSID.Process so far can use every single core on your CPU. Given N CPU cores, you should notice a nearly N-fold speed up in your computation.
Note that the goal of pyUSID was never to maximize performance but rather to simplify and lower the barrier for the average scientist who may not be an expert programmer.
By default, we set aside 1-2 cores for the operating system and other user applications such as an internet browser, Microsoft Word, etc.
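As a rough illustration of this pattern (spreading an independent per-position computation over most, but not all, CPU cores), here is a sketch that uses only the python standard library. It is not pyUSID's actual implementation; the helper names `recommended_cores`, `fit_one_position`, and `compute_all`, and the choice of reserving exactly two cores, are assumptions made for this example.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def recommended_cores(total_cores, reserved=2):
    """Leave `reserved` cores free for other applications, but use at least one."""
    return max(1, total_cores - reserved)


def fit_one_position(spectrum):
    """Stand-in for an expensive per-position computation: here, just the mean."""
    return sum(spectrum) / len(spectrum)


def compute_all(spectra, cores):
    """Run fit_one_position on every spectrum using `cores` worker processes."""
    with ProcessPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(fit_one_position, spectra))


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single core
    cores = recommended_cores(os.cpu_count() or 1)
    # Eight hypothetical positions, each with a four-point spectrum
    spectra = [[float(p + s) for s in range(4)] for p in range(8)]
    print(cores, compute_all(spectra, cores))
```

Because each position is processed independently, the work divides cleanly among the worker processes, which is why a nearly N-fold speed-up is a reasonable expectation for computations of this kind.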
Using pyUSID¶
I don’t know programming. Does this preclude me from using pyUSID?¶
Not at all. One of the tenets of pyUSID is lowering the barrier for scientists and researchers. To this end, we have put together a list of useful tutorials and examples to guide you. You should have no trouble getting started even if you do not know programming. That being said, you would be able to make the fullest use of pyUSID if you knew basic programming in python.
What sort of computer do I need to run pyUSID?¶
You can use practically any laptop / desktop / virtual machine running Windows / Mac OS / Linux. pyUSID is not tested on 32-bit operating systems, which are very rare.
I am not able to find an example on topic X / I find tutorial Y confusing / I need help!¶
We appreciate your feedback regarding the documentation. Please contact us and we will add / improve our documentation.
What do I do when something is broken?¶
Often, others may have encountered the same problem and may have brought up a similar issue. Try searching on Google and trying out some of the suggested solutions. If this does not work, raise an issue here and one of us will work with you to resolve the problem.
How can I reference pyUSID?¶
Please reference our arXiv paper for now. This manuscript was submitted to Advanced Structural and Chemical Imaging recently and is currently being peer-reviewed.
Becoming a part of the effort¶
I don’t know python / I don’t think I write great python code. Does this preclude me from contributing to pyUSID?¶
Not really. Python is far easier to learn than many languages. If you know Matlab, Julia, C++, Fortran or any other programming language, you should not have a hard time reading our code or contributing to the codebase.
You can still contribute your code.
I would like to help but I don’t know programming¶
Your contributions are very valuable to the imaging and scientific community at large. You can help even if you DON’T know how to program!
You can spread the word - tell anyone who you think may benefit from using pyUSID.
Tell us what you think of our documentation or share your own.
Let us know what you would like to see in pyUSID.
Put us in touch with others working on similar efforts so that we can join forces.
I would like to help and I am OK at programming¶
Chances are that you are far better at python than you might think! Interesting tidbit - The (first version of the) first module of pyUSID was written less than a week after we learnt how to write code in python. We weren’t great programmers when we began but we would like to think that we have gotten a lot better since then.
There are several things we want to improve or add. Please get in touch to start a conversation.
Can you add my code to pyUSID?¶
Please see our guidelines for contributing code