Motivation¶

Suhas Somnath

8/8/2017

The quest for understanding more about matter has necessitated the development of a multitude of instruments, each capable of numerous measurement modalities.

The Center for Nanophase Materials Sciences (CNMS) at Oak Ridge National Laboratory is home to several dozen cutting-edge research instruments. Nearly all of these instruments are commercially available, and they generate and store data in different ways. The diversity of data formats was significantly impeding the sharing, correlation, analysis, and curation of data. These challenges were only exacerbated by the steady stream of visiting researchers who come to the CNMS to conduct their research. As researchers supporting the user facility, we desperately needed a solution for handling data from our instruments. The sections below describe the challenges and concerns regarding data structuring, storage, archival, curation, etc. in greater detail.

Proprietary file formats¶

Typically, each commercial instrument generates data files in a proprietary file format defined by the instrument manufacturer. The proprietary nature of these file formats and the obfuscated data model within the files impede scientific progress in the following ways:

Making it challenging for researchers to extract data from these files
Impeding the correlation of data acquired from different instruments
Preventing results from being stored back into the same file
Lacking the flexibility to accommodate anywhere from a few kilobytes to several gigabytes of data
Requiring different versions of analysis routines for each data format
In some cases, requiring the proprietary software provided with the instrument to access the data

Future concerns¶

Several fields are moving towards the open science paradigm, which will require journals and researchers to support journal papers with data and analysis software.
US federal agencies that support scientific research mandate that data be stored in a manner that is open, standardized, and curation-ready, in order to both meet the guidelines for data sharing and satisfy the implementation of digital data management as outlined by the United States Department of Energy.
