{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating a Reader\n", "\n", "**Suhas Somnath, Sudhajit Misra**\n", "\n", "10/9/2020\n", "\n", "This document illustrates an example of extracting data and metadata out of proprietary raw\n", "data files, thereby describing how one would write a ``sidpy.Reader`` class.\n", "\n", "The captured information would be populated into one or more ``sidpy.Dataset``\n", "objects, as appropriate.\n", "\n", "## Introduction\n", "In most scientific disciplines, commercial instruments tend to write the data and metadata out into proprietary file\n", "formats that significantly impede access to the data and metadata, thwart sharing of data and correlation of data from\n", "multiple instruments, and complicate long-term archival, among other things. One of the data wrangling steps in science\n", "is the extraction of the data and metadata out of the proprietary file formats and writing the information into files\n", "that are easier to access, share, etc. The bulk of this data wrangling effort lies in investigating how to\n", "extract the data and metadata into memory. Often, the data and parameters in these files are **not** straightforward to\n", "access. In certain cases, dedicated software packages are necessary to access the data, while in many other\n", "cases, it is possible to extract the necessary information using **numpy** or similar Python packages included\n", "with **anaconda**. Once the information is accessible in the computer memory, such as in the\n", "form of numpy arrays, scientists have a wide variety of tools to write the data out into files.\n", "\n", "Simpler data such as images or single spectra can easily be written into plain text files. Simple or complex / large /\n", "multidimensional data can certainly be stored as numpy data files. 
However, there are significant drawbacks to writing\n", "data into non-standardized structures or file formats. First, while the structure of the data and metadata may be\n", "intuitive for the original author of the data, that may not be the case for another researcher. Furthermore, such\n", "formatting may change from day to day. As a consequence, it becomes challenging to develop code that can accept\n", "such data whose format keeps changing.\n", "\n", "One solution to these challenges is to write the data out into standardized files such as ``h5USID`` files.\n", "The USID model aims to make data access, storage, curation, etc. simple by storing the data along with all\n", "relevant parameters in a single file (HDF5 for now).\n", "\n", "The process of copying data from the original format to **h5USID** files is called\n", "**Translation** and the classes available in pyUSID and child packages such as pycroscopy that perform these\n", "operations are called **Translators**.\n", "\n", "As we alluded to earlier, the process of developing a ``sidpy.Reader`` can be\n", "broken down into two basic components:\n", "\n", "1. Extracting data and metadata out of the proprietary file format\n", "2. Populating one or more ``sidpy.Dataset`` objects as necessary\n", "\n", "This process is the same regardless of the origin, complexity, or size of the scientific data. It is not necessary that\n", "the two components be disjoint - there are many situations where both components may need to happen simultaneously,\n", "especially when the data sizes are very large.\n", "\n", "The goal of this document is to demonstrate how one would extract data and parameters from a raw data file. 
For the purposes of demonstration, we use a Scanning Tunneling Spectroscopy (STS) raw data file obtained from an Omicron Scanning Tunneling Microscope (STM).\n", "In this dataset, a spectrum was collected for each position in a two-dimensional grid of spatial locations, thereby\n", "resulting in a 3D dataset.\n", "\n", "The code in this example is an abbreviation of the\n", "``AscTranslator``\n", "available in our sister package - `pycroscopy`.\n", "\n", "### Recommended pre-requisite reading\n", "\n", "Before proceeding with this example, we recommend learning about ``sidpy.Reader``.\n", "\n", ".. tip::\n", "    You can download and run this document as a Jupyter notebook using the link at the bottom of this page.\n", "\n", "### Import all necessary packages\n", "There are a few setup procedures that need to be followed before any code is written. In this step, we simply load a\n", "few Python packages that will be necessary in the later steps." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Ensure python 3 compatibility:\n", "from __future__ import division, print_function, absolute_import, unicode_literals\n", "\n", "# The package for accessing files in directories, etc.:\n", "import os\n", "import zipfile\n", "\n", "# Warning package in case something goes wrong\n", "from warnings import warn\n", "import subprocess\n", "import sys\n", "\n", "\n", "def install(package):\n", "    # Install a missing package with pip, using the current Python interpreter\n", "    subprocess.call([sys.executable, \"-m\", \"pip\", \"install\", package])\n", "\n", "\n", "# Package for downloading online files:\n", "try:\n", "    # This package is not part of anaconda and may need to be installed.\n", "    import wget\n", "except ImportError:\n", "    warn('wget not found. Will install with pip.')\n", "    # Pass the package name as a string; the wget module is not defined at this point\n", "    install('wget')\n", "    import wget\n", "\n", "# The mathematical computation package:\n", "import numpy as np\n", "\n", "# The package used for creating and manipulating HDF5 files:\n", "import h5py\n", "\n", "# Packages for plotting:\n", "import matplotlib.pyplot as plt\n", "\n", "# import sidpy - supporting package for creating Dataset object:\n", "try:\n", "    import sidpy\n", "except ImportError:\n", "    warn('sidpy not found. Will install with pip.')\n", "    install('sidpy')\n", "    import sidpy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Procure the Raw Data file\n", "\n", "Here we will download a compressed data file from GitHub and unpack it:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "url = 'https://raw.githubusercontent.com/pycroscopy/pyUSID/master/data/STS.zip'\n", "zip_path = 'STS.zip'\n", "if os.path.exists(zip_path):\n", "    os.remove(zip_path)\n", "_ = wget.download(url, zip_path, bar=None)\n", "\n", "zip_path = os.path.abspath(zip_path)\n", "# figure out the folder to unzip the zip file to\n", "folder_path, _ = os.path.split(zip_path)\n", "zip_ref = zipfile.ZipFile(zip_path, 'r')\n", "# unzip the file\n", "zip_ref.extractall(folder_path)\n", "zip_ref.close()\n", "# delete the zip file\n", "os.remove(zip_path)\n", "\n", "data_file_path = 'STS.asc'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Extracting data and metadata from proprietary files\n", "\n", "### 1.1 Explore the raw data file\n", "\n", "Inherently, one may not know how to read these ``.asc`` files. One option is to try and read the file as a text file\n", "one line at a time.\n", "\n", "If one is lucky, as in the case of these ``.asc`` files, the file can be read like a conventional text file.\n", "\n", "Here is how we tested to see if the ``.asc`` files could be interpreted as text files. 
Below, we read just the first 10\n", "lines in the file:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# File Format = ASCII\n", "# Created by SPIP 4.6.5.0 2016-09-22 13:32\n", "# Original file: C:\\Users\\Administrator\\AppData\\Roaming\\Omicron NanoTechnology\\MATRIX\\default\\Results\\16-Sep-2016\\I(V) TraceUp Tue Sep 20 09.17.08 2016 [14-1] STM_Spectroscopy STM\n", "# x-pixels = 100\n", "# y-pixels = 100\n", "# x-length = 29.7595\n", "# y-length = 29.7595\n", "# x-offset = -967.807\n", "# y-offset = -781.441\n", "# z-points = 500\n" ] } ], "source": [ "with open(data_file_path, 'r') as file_handle:\n", "    # Print the first 10 lines, dropping the trailing new-line character of each\n", "    for line_ind in range(10):\n", "        print(file_handle.readline().replace('\\n', ''))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2 Read the contents of the file\n", "\n", "Now that we know that these files are simple text files, we can manually go through the file to find out which lines\n", "are important, at which line the data starts, etc.\n", "Manual investigation of such ``.asc`` files revealed that these files are always formatted in the same way. Also, they\n", "contain instrument- and experiment-related parameters in the first ``403`` lines and then contain data which is\n", "arranged as one pixel per row.\n", "\n", "STS experiments result in three-dimensional datasets ``(X, Y, current)``. In other words, a 1D array of current data (as a\n", "function of excitation bias) is sampled at every location on a two-dimensional grid of points on the sample.\n", "By knowing where the parameters are located and how the data is structured, it is possible to extract the necessary\n", "information from these files.\n", "\n", "Since we know that the data sizes (<200 MB) are much smaller than the physical memory of most computers, we can start\n", "by safely loading the contents of the entire file to memory." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Reading the entire file into memory\n", "with open(data_file_path, 'r') as file_handle:\n", "    string_lines = file_handle.readlines()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.3 Extract the metadata\n", "\n", "In the case of these ``.asc`` files, the parameters are present in the first few lines of the file. Below we will\n", "demonstrate how we parse the first 17 lines to extract some very important parameters. Note that there are several\n", "other important parameters in the next 350 or so lines. However, in the interest of brevity, we will focus only on the\n", "first few lines of the file.\n", "\n", "Interested readers are encouraged to read the ``ASCTranslator`` available in ``pycroscopy`` for more complete details." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x-pixels :\t 100\n", "y-pixels :\t 100\n", "x-length :\t 29.7595\n", "y-length :\t 29.7595\n", "x-offset :\t -967.807\n", "y-offset :\t -781.441\n", "z-points :\t 500\n", "z-section :\t 491\n", "z-unit :\t nV\n", "z-range :\t 2000000000\n", "z-offset :\t 1116.49\n", "value-unit :\t nA\n", "scanspeed :\t 59519000000\n", "voidpixels :\t 0\n" ] } ], "source": [ "# Preparing an empty dictionary to store the metadata / parameters as key-value pairs\n", "parm_dict = dict()\n", "\n", "# Reading parameters stored in the first few rows of the file\n", "for line in string_lines[3:17]:\n", "    # Remove the hash / pound symbol, if any\n", "    line = line.replace('# ', '')\n", "    # Remove new-line escape-character, if any\n", "    line = line.replace('\\n', '')\n", "    # Break the line into two parts - the parameter name and the corresponding value\n", "    temp = line.split('=')\n", "    # Remove spaces in the value. Remember, the value is still a string and not a number\n", "    test = temp[1].strip()\n", "    # Now, attempt to convert the value to a number (floating point):\n", "    try:\n", "        test = float(test)\n", "        # In certain cases, the number is actually an integer, check and convert if it is:\n", "        if test % 1 == 0:\n", "            test = int(test)\n", "    except ValueError:\n", "        # The value is not a number (e.g. a unit such as 'nA'); keep it as a string\n", "        pass\n", "    parm_dict[temp[0].strip()] = test\n", "\n", "# Print out the parameters extracted\n", "for key in parm_dict.keys():\n", "    print(key, ':\\t', parm_dict[key])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, we recommend reformatting the parameter names to standardized nomenclature.\n", "We realize that the materials imaging community has not yet agreed upon standardized nomenclature for metadata.\n", "Therefore, we leave this as an optional, yet recommended step.\n", "\n", "For example, in pycroscopy, we may categorize the number of rows and columns in an image under ``grid`` and\n", "data sampling parameters under ``IO``. As an example, we may rename ``x-pixels`` to ``positions_num_cols`` and ``y-pixels`` to ``positions_num_rows``.\n", "\n", "### 1.4 Extract parameters that define dimensions\n", "\n", "Just having the metadata above and the main measurement data is insufficient to fully describe experimental data.\n", "We also need to know how the experimental parameters were varied to acquire the multidimensional dataset at hand.\n", "In other words, we need to answer how the grid of locations was defined and how the bias was varied to acquire the\n", "current information at each location. This is precisely what we will do below.\n", "\n", "Since we did not parse the entire list of parameters present in the file above, we will need to make some up.\n", "Please refer to the formal ``ASCTranslator`` to see how this step would have been different." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "\n", "num_rows = int(parm_dict['y-pixels'])\n", "num_cols = int(parm_dict['x-pixels'])\n", "num_pos = num_rows * num_cols\n", "spectra_length = int(parm_dict['z-points'])\n", "\n", "# We will assume that data was collected from -3 nm to +7 nm on the Y-axis or along the rows\n", "y_vec = np.linspace(-3, 7, num_rows, endpoint=True)\n", "\n", "# We will assume that data was collected from -5 nm to +5 nm on the X-axis or along the columns\n", "x_vec = np.linspace(-5, 5, num_cols, endpoint=True)\n", "\n", "# The bias was sampled from -1 to +1 V in the experiment. Here is how we generate the Bias axis:\n", "bias_vec = np.linspace(-1, 1, spectra_length)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.5 Extract the data\n", "\n", "We have observed that the data in these ``.asc`` files are consistently present after the first ``403`` lines of\n", "parameters. Using this knowledge, we need to populate a data array using data that is currently present as text lines\n", "in memory (from step 1.2).\n", "\n", "These ``.asc`` files store the 3D data (X, Y, spectra) as a 2D matrix (positions, spectra). In other words, the spectra\n", "are arranged one below another. Thus, reading the 2D matrix from top to bottom, the data is arranged column-by-column,\n", "and then row-by-row. So, for simplicity, we will prepare an empty 2D numpy array to store the data as it exists in the\n", "raw data file.\n", "\n", "Recall that in step 1.2, we were lucky enough to read the entire data file into memory given its small size.\n", "The data is already present in memory as a list of strings that need to be parsed as a matrix of numbers." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "num_headers = 403\n", "\n", "raw_data_2d = np.zeros(shape=(num_pos, spectra_length), dtype=np.float32)\n", "\n", "# Iterate over every measurement position:\n", "for pos_index in range(num_pos):\n", "    # First, get the correct (string) line corresponding to the current measurement position.\n", "    # Recall that we would need to skip the many header lines to get to the data\n", "    this_line = string_lines[num_headers + pos_index]\n", "    # Each (string) line contains numbers separated by tabs (``\\t``). Let us break the line into several shorter strings\n", "    # each containing one number. We will ignore the last entry since it is empty.\n", "    string_spectrum = this_line.split('\\t')[:-1]  # omitting the new line\n", "    # Now that we have a list of numbers represented as strings, we need to convert this list to a 1D numpy array\n", "    # the converted array is set to the appropriate position in the main 2D array.\n", "    raw_data_2d[pos_index] = np.array(string_spectrum, dtype=np.float32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the data is so large that it cannot fit into memory, we would need to read data one (or a few) position(s) at a time, process it (e.g. convert from string to numbers), and write it to the HDF5 file without keeping much or any data in memory.\n", "\n", "The three-dimensional dataset (``Y``, ``X``, ``Bias``) is currently represented as a two-dimensional array:\n", "(``X`` * ``Y``, ``Bias``). 
To make it easier for us to understand and visualize, we can turn it into a 3D array:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape of 2D data: (10000, 500), Shape of 3D data: (100, 100, 500)\n" ] } ], "source": [ "raw_data_3d = raw_data_2d.reshape(num_rows, num_cols, spectra_length)\n", "print('Shape of 2D data: {}, Shape of 3D data: {}'.format(raw_data_2d.shape, raw_data_3d.shape))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just as we did for the parameters (``X``, ``Y``, and ``Bias``) that were varied in the experiment, we need to specify the quantity that is recorded from the sensors / detectors, units, and what the data represents:\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "main_data_name = 'STS'\n", "main_qty = 'Current'\n", "main_units = 'nA'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize the extracted data\n", "Here is a visualization of the current-voltage spectra at a few locations:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sidpy\\viz\\plot_utils\\image.py:404: MatplotlibDeprecationWarning: Since 3.2, mpl_toolkits's own colorbar implementation is deprecated; it will be removed two minor releases later. Set the 'mpl_toolkits.legacy_colorbar' rcParam to False to use Matplotlib's default colorbar implementation and suppress this deprecation warning.\n", " cb = axes.cbar_axes[0].colorbar(im)\n", "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\mpl_toolkits\\axes_grid1\\axes_grid.py:51: MatplotlibDeprecationWarning: \n", "The mpl_toolkits.axes_grid1.colorbar module was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use matplotlib.colorbar instead.\n", " from .colorbar import Colorbar\n" ] } ], "source": [ "fig, axes = sidpy.plot_utils.plot_map_stack(raw_data_3d, reverse_dims=True, pad_mult=(0.15, 0.15),\n", " title='Spatial maps of current at different bias', stdevs=2,\n", " color_bar_mode='single', num_ticks=3, x_vec=x_vec, y_vec=y_vec,\n", " evenly_spaced=True, fig_mult=(3, 3), title_yoffset=0.95)\n", "\n", "for axis, bias_ind in zip(axes, np.linspace(0, len(bias_vec), 9, endpoint=False, dtype=np.uint)):\n", " axis.set_title('Bias = %3.2f V' % bias_vec[bias_ind])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Populating the ``Dataset`` object\n", "\n", "Now that we are able to read the data from the Raw Data file, we can write this data into a ``sidpy.Dataset`` object" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sidpy.Dataset of type UNKNOWN with:\n", " dask.array\n", " data contains: generic (generic)\n", " and Dimensions: \n", " a: generic (generic) of size (100,)\n", " b: generic (generic) of size (100,)\n", " c: generic (generic) of size (500,)\n" ] } ], "source": [ "data_set = sidpy.Dataset.from_array(raw_data_3d, name='Raw_Data')\n", "print(data_set)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we specify the dimensions. Since ``data_set`` is a ``sidpy.Dataset`` object, we use the ``set_dimension`` method of ``sidpy`` to set the ``Dimension`` attributes." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "data_set.set_dimension(0, sidpy.Dimension(y_vec, name='y',units='nm',\n", " quantity='Length',\n", " dimension_type='spatial'))\n", "data_set.set_dimension(1, sidpy.Dimension(x_vec, name='x', units='nm',\n", " quantity='Length',\n", " dimension_type='spatial'))\n", "data_set.set_dimension(2, sidpy.Dimension( bias_vec, name='bias',\n", " quantity='Bias',\n", " dimension_type='spectral'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Generic top level metadata can be added as you go along" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "data_set.data_type = sidpy.DataTypes.SPECTRAL_IMAGE\n", "data_set.units = main_units\n", "data_set.quantity = 'Current'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instrument-specific metadata" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "data_set.metadata = parm_dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Viewing the metadata" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'x-pixels': 100,\n", " 'y-pixels': 100,\n", " 'x-length': 29.7595,\n", " 'y-length': 29.7595,\n", " 'x-offset': -967.807,\n", " 'y-offset': -781.441,\n", " 'z-points': 500,\n", " 'z-section': 491,\n", " 'z-unit': 'nV',\n", " 'z-range': 2000000000,\n", " 'z-offset': 1116.49,\n", " 'value-unit': 'nA',\n", " 'scanspeed': 59519000000,\n", " 'voidpixels': 0}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_set.metadata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Visualize the ``Dataset`` object" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "my_data.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also bin the data for viewing. Let us try viewing the data as 10 X 10 pixel size" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "my_data.view.set_bin([10, 10])\n", "my_data.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cleaning up\n", "Remove the original data file to free up space:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "os.remove(data_file_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More information\n", "\n", "``sidpy`` package is used to create the ``Dataset`` object in this example. This package provides utilities for storing, processing and visualizing spectroscopic and imaging data. \n", "It is recommended the user familiarize themselves with ``sidpy``. 
These example `notebooks `_ demonstrate how to create and visualize ``Dataset`` objects.\n", "\n", "Our sister package, pycroscopy, has several\n", "`translators `_ that translate popular\n", "file formats generated by nanoscale imaging instruments.\n", "These will be moved to ScopeReaders soon.\n", "\n", "We have found Python packages online that open a few proprietary file formats and have written translators using these\n", "packages. If you are having trouble reading the data in your files and cannot find any packages online, consider\n", "contacting the manufacturer of the instrument that generated the data in the proprietary format for help.\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }