Write/read SIDpy Dataset via pyNSID

Author: Maxim Ziatdinov

Date: September 2020

update: - Gerd Duscher 01/2021 (compatibility to pyNSID version 0.0.2)

A fast introduction into how to write SIDpy datasets to NSID formatted HDF5 files

Start with standard imports:

[1]:
# Ensure python 3 compatibility:
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import warnings

import h5py
import matplotlib.pylab as plt
import numpy as np

# we will also need a sidpy package
try:
    import sidpy
except ModuleNotFoundError:
    !pip3 install sidpy
    import sidpy

import pyNSID

warnings.filterwarnings("ignore", module="numpy.core.fromnumeric")
warnings.filterwarnings("ignore", module="pyNSID.io.nsi_reader")

Creating sidpy.Dataset object(s)

Let’s create a simple sidpy Dataset from a numpy array:

[2]:
dataset = sidpy.Dataset.from_array(np.random.random([4, 5, 10]), name='new')
dataset
[2]:
Array Chunk
Bytes 1.60 kB 1.60 kB
Shape (4, 5, 10) (4, 5, 10)
Count 1 Tasks 1 Chunks
Type float64 numpy.ndarray
10 5 4

Let’s also define the dataset attributes…

[3]:
dataset.data_type = 'SPECTRAL_IMAGE'
dataset.units = 'nA'
dataset.quantity = 'Current'

… and set individual dimensions. In the case of spectroscopic datasets, the first two dimensions are typically spatial units (e.g. nm) and the third one can be energy (e.g. \(meV\) or \(nm^{-1}\)).

[4]:
dataset.set_dimension(0, sidpy.Dimension(np.arange(dataset.shape[0]), 'x',
                                        units='nm', quantity='Length',
                                        dimension_type='spatial'))
dataset.set_dimension(1, sidpy.Dimension(np.linspace(-2, 2, num=dataset.shape[1], endpoint=True), 'y',
                                        units='nm', quantity='Length',
                                        dimension_type='spatial'))
dataset.set_dimension(2, sidpy.Dimension(np.sin(np.linspace(0, 2 * np.pi, num=dataset.shape[2])), 'bias',
                                        units='mV', quantity='Voltage',
                                        dimension_type='spectral'))
[5]:
print(dataset.dim_0)
print(dataset.dim_1)
print(dataset.dim_2)
x:  Length (nm) of size (4,)
y:  Length (nm) of size (5,)
bias:  Voltage (mV) of size (10,)

Writing sidpy.Dataset object(s) to HDF5 files

Load NSID-formatted h5 file:

[6]:
hf = h5py.File("test.hf5", 'a')

Let’s create a new channel where we are going to save our sidpy dataset:

[7]:
hf.create_group('Measurement_000/Channel_000')
print(*hf["Measurement_000"].keys())
Channel_000

Now let’s write our sidpy dataset into the newly created channel:

[8]:
pyNSID.hdf_io.write_nsid_dataset(dataset, hf['Measurement_000/Channel_000'], main_data_name="new_spectrum");
C:\Users\gduscher\Anaconda3\lib\site-packages\pyNSID\io\hdf_utils.py:351: FutureWarning: validate_h5_dimension may be removed in a future version
  warn('validate_h5_dimension may be removed in a future version',

Close h5 file:

[9]:
hf.close()

Reading sidpy.Dataset object(s)

Load back the file:

[10]:
hf = h5py.File("test.hf5", 'r+')
print(*hf["Measurement_000"].keys())
Channel_000

Find our dataset:

[11]:
dataset_hdf5  =  pyNSID.io.hdf_utils.find_dataset(hf,'new_spectrum')[0]
dataset_hdf5

[11]:
<HDF5 dataset "new_spectrum": shape (4, 5, 10), type "<f8">
[12]:
hf.close()

Read the dataset stored in HDF5 format as a sidpy object (Dataset) using NSIDReader:

[17]:
dr = pyNSID.NSIDReader('test.hf5')
dataset_sid = dr.read()[0]
assert isinstance(dataset_sid, sidpy.Dataset)
dataset_sid.plot()
../../_images/notebooks_00_basic_usage_write_read_sidpy_dataset_28_0.png
[18]:
print(dataset_sid)
dataset_sid
sidpy.Dataset of type SPECTRAL_IMAGE with:
 dask.array<generic, shape=(4, 5, 10), dtype=float64, chunksize=(4, 5, 10), chunktype=numpy.ndarray>
 data contains: Current (nA)
 and Dimensions:
x:  Length (nm) of size (4,)
y:  Length (nm) of size (5,)
bias:  Voltage (mV) of size (10,)
[18]:
Array Chunk
Bytes 1.60 kB 1.60 kB
Shape (4, 5, 10) (4, 5, 10)
Count 1 Tasks 1 Chunks
Type float64 numpy.ndarray
10 5 4

Close File

[19]:
dataset_sid.h5_dataset.file.close()
[ ]: