{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Speed up computations with parallel_compute()\n\n**Suhas Somnath, Chris R. Smith**\n\n9/8/2017\n\n**This document will demonstrate how ``sidpy.proc.comp_utils.parallel_compute()`` can significantly speed up data processing by\nusing all available CPU cores in a computer**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\nQuite often, we need to perform the same operation on every single component in our data. One of the most popular\nexamples is functional fitting applied to spectra collected at each location on a grid. While, the operation itself\nmay not take very long, computing this operation thousands of times, once per location, using a single CPU core can\ntake a long time to complete. Most personal computers today come with at least two cores, and in many cases, each of\nthese cores is represented via two logical cores, thereby summing to a total of at least four cores. Thus, it is\nprudent to make use of these unused cores whenever possible. Fortunately, there are a few python packages that\nfacilitate the efficient use of all CPU cores with minimal modifications to the existing code.\n\n``sidpy.proc.comp_utils.parallel_compute()`` is a very handy function that simplifies parallel computation significantly to a\n**single function call** and will be discussed in this document.\n\n## Example scientific problem\nFor this example, we will be working with a ``Band Excitation Piezoresponse Force Microscopy (BE-PFM)`` imaging dataset\nacquired from advanced atomic force microscopes. In this dataset, a spectra was collected for each position in a two\ndimensional grid of spatial locations. 
Thus, this is a three\ndimensional dataset that has been flattened to a two\ndimensional matrix in accordance with the **Universal Spectroscopy and Imaging Data (USID)** model.\n\nEach spectrum in this dataset is expected to have a single peak. The goal is to find the position of the peak in each\nspectrum. Clearly, the operation of finding the peak in one spectrum is independent of the same operation on another\nspectrum. Thus, we could in theory divide the dataset into N parts and use N CPU cores to compute the results much\nfaster than a single core could. There is an important caveat to this statement, and it\nwill be discussed at the end of this document.\n\n**Here, we will learn how to fit thousands of spectra using all available cores on a computer.**\nNote that this applies only to a single CPU. Please refer to another advanced example for multi-CPU computing.\n\n
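As a sketch of what this looks like in code, the snippet below finds the peak position in each row of a toy 2D dataset. The call signature of ``parallel_compute()`` is assumed from the sidpy documentation, and the peak finder is a deliberately simple stand-in for a real functional fit; the ``try``/``except`` falls back to a serial loop when sidpy is not installed.

```python
import numpy as np

def find_peak_position(vec):
    """Toy stand-in for a functional fit: index of the largest value."""
    return int(np.argmax(vec))

# Toy dataset: 100 "spectra" of 21 points each; every row peaks at index 10
x = np.arange(21)
raw_data = np.tile(x * (20 - x), (100, 1))

try:
    # parallel_compute() maps find_peak_position over the rows of raw_data,
    # splitting the rows across the requested number of CPU cores
    from sidpy.proc.comp_utils import parallel_compute
    peak_positions = parallel_compute(raw_data, find_peak_position, cores=2)
except ImportError:
    # Serial fallback so this sketch runs even without sidpy installed
    peak_positions = [find_peak_position(vec) for vec in raw_data]
```

Regardless of which path runs, the result is one peak index per spectrum; the parallel call simply distributes the rows across workers.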
In order to run this document on your own computer, you will need to:\n\n 1. Download the document as a Jupyter notebook using the link at the bottom of this page.\n 2. Save the contents of `this python file
This documentation is being generated automatically by a computer in the cloud whose workload cannot be controlled\n or predicted. Therefore, the computational times reported in this document may not be consistent and can even be\n contradictory. For best results, we recommend that you download and run this document as a Jupyter notebook.
If everything ran correctly, you should see the computational time decrease substantially from 1 to 2 cores, but\n the decrease from 2 to 3 or 3 to 4 cores should be minimal or negligible.
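To check this on your own machine, you can time each configuration yourself. The snippet below shows the measurement pattern using only the standard library's ``time.perf_counter()``, with a serial loop as the baseline; the dataset and peak finder are toy stand-ins. Swap in the ``parallel_compute()`` call with different ``cores`` values to reproduce the comparison.

```python
import time

def find_peak_position(vec):
    """Toy stand-in for a functional fit: index of the largest value."""
    return max(range(len(vec)), key=vec.__getitem__)

# Toy dataset: 1000 "spectra" of 21 points each; every row peaks at index 10
raw_data = [[i * (20 - i) for i in range(21)] for _ in range(1000)]

# Time the serial baseline; replace the list comprehension with a
# parallel_compute() call (cores=1, 2, 3, ...) to compare configurations
t_start = time.perf_counter()
peak_positions = [find_peak_position(vec) for vec in raw_data]
elapsed = time.perf_counter() - t_start

print('Serial baseline: {:.4f} s for {} spectra'.format(elapsed, len(raw_data)))
```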