sidpy.proc.fitter_refactor.SidpyFitterRefactor

class sidpy.proc.fitter_refactor.SidpyFitterRefactor(dataset, model_function, guess_function, ind_dims=None, num_params=None)

Bases: object

A parallelized fitter for sidpy.Dataset objects that can use K-Means-based initial guesses to improve convergence on large datasets.

dataset

The original sidpy dataset containing data and metadata.

Type:

sidpy.Dataset

dask_data

The underlying dask array used for parallel computation.

Type:

dask.array.Array

model_func

The function to fit. Expected signature: f(x_axis, *params).

Type:

callable

guess_func

The function to generate initial guesses. Expected signature: f(x_axis, y_data).

Type:

callable
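To make the two expected signatures concrete, the following sketch pairs a Gaussian peak model with a matching guess function. Both functions (gaussian, gaussian_guess) and the choice of a Gaussian are illustrative assumptions, not part of sidpy.

```python
import numpy as np


def gaussian(x_axis, amp, center, sigma):
    """Hypothetical model with the expected signature f(x_axis, *params)."""
    return amp * np.exp(-0.5 * ((x_axis - center) / sigma) ** 2)


def gaussian_guess(x_axis, y_data):
    """Hypothetical guess with the expected signature f(x_axis, y_data).

    Returns [amp, center, sigma] in the same order gaussian() takes them.
    """
    i_max = int(np.argmax(y_data))
    amp = float(y_data[i_max])
    center = float(x_axis[i_max])
    # Estimate the full width at half maximum from points above amp / 2
    above = x_axis[y_data >= amp / 2]
    if above.size > 1:
        fwhm = float(above.max() - above.min())
    else:
        fwhm = float(np.ptp(x_axis)) / 10  # fallback for very narrow peaks
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))  # FWHM = 2*sqrt(2 ln 2)*sigma
    return np.array([amp, center, sigma])
```

The guess only has to land close enough to the true parameters for the optimizer to converge; it does not need to be accurate.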

metadata

A dictionary containing fit parameters, model source code, and configuration.

Type:

dict

Initializes the SidpyFitterRefactor.

Parameters:
  • dataset (sidpy.Dataset) – Dataset to be fitted.

  • model_function (callable) – The model function to use for fitting.

  • guess_function (callable) – The function that generates initial parameters for the model.

  • ind_dims (int or tuple of int, optional) – Indices of the dimensions to fit over. Defaults to the dataset's spectral dimensions.

  • num_params (int, optional) – The number of parameters the model function expects. Required when fitting over two or more dimensions.

Methods

do_fit

Executes the parallel fit.

do_guess

Runs the guess function in parallel across all pixels.

do_kmeans_guess

Performs K-Means clustering to find representative spectra whose fits serve as initial guesses (priors).

reconstruct_function

Reconstructs a Python function from source code stored in metadata.

setup_calc

Prepares the calculation by rechunking and determining the parameter count.

transform_to_sidpy

Converts the fit results into sidpy.Dataset(s).

do_fit(guesses=None, use_kmeans=False, n_clusters=10, fit_parameter_labels=None, loss='linear', f_scale=1.0, return_cov=False)

Executes the parallel fit.

Parameters:
  • guesses (dask.array.Array, optional) – Initial guesses. If None (the default), they are generated automatically.

  • use_kmeans (bool, optional) – Whether to use K-Means priors. Default is False.

  • n_clusters (int, optional) – Number of clusters when use_kmeans is True. Default is 10.

  • fit_parameter_labels (list of str, optional) – String labels for the fit parameters (e.g. ['Amp', 'Phase']). They are only stored in the metadata.

  • loss (str, optional) – Loss function for least_squares (e.g. 'linear', 'soft_l1', 'huber', 'cauchy', 'arctan'). Default is 'linear'.

  • f_scale (float, optional) – Value of the soft margin between inlier and outlier residuals. Default is 1.0.

  • return_cov (bool, optional) – If True, returns a tuple (fit_dataset, cov_dataset), where cov_dataset holds the covariance matrix of the fit parameters. Default is False. CAUTION: This significantly increases memory usage.

Returns:

If return_cov is False, the fit-parameter dataset. If return_cov is True, a tuple (fit-parameter dataset, covariance-matrix dataset).

Return type:

sidpy.Dataset or tuple(sidpy.Dataset, sidpy.Dataset)
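The loss and f_scale options correspond to scipy.optimize.least_squares. The sketch below fits a single spectrum with a robust loss and estimates a covariance matrix with the common Gauss-Newton approximation s^2 (J^T J)^-1. It assumes do_fit does something comparable for every pixel; the helper fit_one_pixel and the Gaussian model are hypothetical, and the exact covariance formula the class uses is not stated here.

```python
import numpy as np
from scipy.optimize import least_squares


def gaussian(x_axis, amp, center, sigma):
    """Hypothetical model with the f(x_axis, *params) signature."""
    return amp * np.exp(-0.5 * ((x_axis - center) / sigma) ** 2)


def fit_one_pixel(x_axis, y_data, p0, loss="linear", f_scale=1.0,
                  return_cov=False):
    """Hypothetical per-pixel fit; optionally returns a covariance estimate."""
    def residuals(params):
        return gaussian(x_axis, *params) - y_data

    result = least_squares(residuals, p0, loss=loss, f_scale=f_scale)
    if not return_cov:
        return result.x
    # Gauss-Newton approximation: cov ~= s^2 * (J^T J)^-1, where s^2 is the
    # residual variance (sum of squared residuals / degrees of freedom)
    jac = result.jac
    dof = max(y_data.size - result.x.size, 1)
    s_sq = np.sum(result.fun ** 2) / dof
    cov = np.linalg.pinv(jac.T @ jac) * s_sq
    return result.x, cov


rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 201)
y = gaussian(x, 2.0, 0.5, 1.0) + rng.normal(0.0, 0.02, x.size)
y[50] += 3.0  # one strong outlier; soft_l1 limits its influence
params, cov = fit_one_pixel(x, y, p0=[1.5, 0.0, 1.5], loss="soft_l1",
                            f_scale=0.1, return_cov=True)
```

Each pixel's covariance adds num_params × num_params values on top of its num_params fitted values, which is why return_cov is flagged as memory-intensive.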

do_guess()

Runs the guess function in parallel across all pixels to produce initial parameters for the fit.
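Per-pixel guessing parallelizes naturally over dask blocks. As a minimal sketch, assuming the fitted (spectral) axis is last and held in a single chunk, dask.array.map_blocks can apply a guess function to every spectrum in each block. The helpers peak_guess and guess_block are hypothetical and do not claim to mirror the method's internals.

```python
import numpy as np
import dask.array as da


def peak_guess(x_axis, y_data):
    """Hypothetical guess function f(x_axis, y_data) -> [amp, center]."""
    i = int(np.argmax(y_data))
    return np.array([y_data[i], x_axis[i]])


def guess_block(block, x_axis, guess_func, num_params):
    """Apply guess_func to every spectrum (last axis) in one dask block."""
    flat = block.reshape(-1, block.shape[-1])
    out = np.empty((flat.shape[0], num_params))
    for k, spectrum in enumerate(flat):
        out[k] = guess_func(x_axis, spectrum)
    return out.reshape(block.shape[:-1] + (num_params,))


x = np.linspace(0.0, 1.0, 50)
data_np = np.random.default_rng(0).random((8, 8, 50))
# 8 x 8 pixels with 50-point spectra; the spectral axis is one chunk
data = da.from_array(data_np, chunks=(4, 4, 50))
num_params = 2
guesses = da.map_blocks(
    guess_block, data,
    x_axis=x, guess_func=peak_guess, num_params=num_params,
    # Output keeps the pixel chunking; last axis becomes the parameters
    chunks=data.chunks[:-1] + ((num_params,),),
    dtype=float,
    meta=np.empty((0, 0, 0)),  # skip dask's trial call on an empty block
)
result = guesses.compute()
```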

do_kmeans_guess(n_clusters=10)

Performs K-Means clustering to find representative spectra whose fits serve as initial guesses (priors). Dask-ML's KMeans is used so that the clustering scales to large datasets.

Parameters:

n_clusters (int, optional) – Number of clusters to use for K-Means. Default is 10.

Returns:

A dask array containing the initial guesses for every pixel.

Return type:

dask.array.Array
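The K-Means prior idea can be sketched as follows: cluster the spectra, fit each cluster centroid once, and give every pixel the fitted parameters of its cluster as its initial guess. This sketch substitutes scikit-learn's KMeans for Dask-ML's so it runs in memory, and uses scipy.optimize.curve_fit for the centroid fits; the function kmeans_guesses, the Gaussian model, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.cluster import KMeans  # stand-in for Dask-ML's KMeans


def gaussian(x_axis, amp, center, sigma):
    """Hypothetical model with the f(x_axis, *params) signature."""
    return amp * np.exp(-0.5 * ((x_axis - center) / sigma) ** 2)


def kmeans_guesses(x_axis, spectra, n_clusters=10, seed=0):
    """spectra: (n_pixels, n_points). Returns (n_pixels, n_params) guesses."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(spectra)
    cluster_params = []
    for centroid in km.cluster_centers_:
        # Crude starting point for the centroid fit: peak height and position
        i = int(np.argmax(centroid))
        p0 = [centroid[i], x_axis[i], 1.0]
        popt, _ = curve_fit(gaussian, x_axis, centroid, p0=p0, maxfev=5000)
        cluster_params.append(popt)
    cluster_params = np.array(cluster_params)
    # Each pixel inherits the fitted parameters of its cluster
    return cluster_params[km.labels_]


rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 101)
centers = rng.choice([-2.0, 2.0], size=200)  # two populations of pixels
spectra = np.array([gaussian(x, 1.0, c, 0.8) for c in centers])
spectra += rng.normal(0.0, 0.01, spectra.shape)
guesses = kmeans_guesses(x, spectra, n_clusters=2)
```

Only n_clusters centroids are fitted instead of every pixel, which keeps the prior step cheap; the per-pixel fits then start near a reasonable solution.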

static reconstruct_function(source_code_input, context=None)

Reconstructs a Python function from source code stored in metadata. Accepts the source as a string or a list of lines and tolerates indentation issues.
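A minimal standard-library sketch of rebuilding a function from stored source: join a list of lines, remove common indentation with textwrap.dedent, execute the source in a fresh namespace, and return the function it defines. The helper rebuild_function is hypothetical, and treating context as a dict of names the function needs (for example an imported module) is an assumption about that parameter. Because exec runs arbitrary code, only rebuild source you trust.

```python
import math
import textwrap
import types


def rebuild_function(source_code_input, context=None):
    """Hypothetical rebuild of a function from a str or list of lines.

    `context` is assumed here to be a dict of names, such as imported
    modules, that the rebuilt function needs at call time.
    """
    if isinstance(source_code_input, (list, tuple)):
        # Stored lines may or may not keep their trailing newlines
        source = "\n".join(str(line).rstrip("\n") for line in source_code_input)
    else:
        source = str(source_code_input)
    # Remove common leading indentation (e.g. from a method or nested def)
    source = textwrap.dedent(source).strip("\n")

    namespace = dict(context or {})
    existing = set(namespace)
    exec(source, namespace)  # executes arbitrary code: use trusted source only

    new_funcs = [obj for name, obj in namespace.items()
                 if name not in existing and isinstance(obj, types.FunctionType)]
    if not new_funcs:
        raise ValueError("source code does not define a function")
    return new_funcs[-1]  # the last function defined in the source


line = rebuild_function(["    def line(x, a, b):\n", "        return a * x + b\n"])
root = rebuild_function("def root(x):\n    return math.sqrt(x)",
                        context={"math": math})
```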

setup_calc(chunks='auto')

Prepares the calculation by rechunking and determining the parameter count.

Parameters:

chunks (str or tuple, optional) – The chunk size for the dask array. Default is 'auto'.
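A sketch of this preparation step with dask, assuming the fitted axis should sit in a single chunk (so each task sees whole spectra) and that the parameter count can be inferred by running the guess function on one spectrum. The dictionary form of rechunk and the 'auto' value are standard dask; toy_guess and the array sizes are hypothetical.

```python
import numpy as np
import dask.array as da


def toy_guess(x_axis, y_data):
    """Hypothetical guess returning [amp, center, offset]."""
    i = int(np.argmax(y_data))
    return np.array([y_data[i], x_axis[i], float(y_data.min())])


x = np.linspace(0.0, 1.0, 64)
data = da.from_array(np.random.default_rng(0).random((32, 32, 64)),
                     chunks=(16, 16, 16))

# Keep the fitted (spectral) axis in one chunk; let dask size the others
fit_axis = 2
rechunked = data.rechunk({0: "auto", 1: "auto", fit_axis: -1})

# Infer the parameter count by running the guess on a single spectrum
first_spectrum = rechunked[0, 0].compute()
num_params = len(toy_guess(x, first_spectrum))
```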

transform_to_sidpy(fit_dask_array)

Converts the fit results into sidpy.Dataset(s), splitting out the fit parameters and, if present, the covariance matrices.
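As an illustration of the splitting step, the sketch below assumes a hypothetical layout in which each pixel's result holds num_params fitted values followed by a row-major num_params × num_params covariance matrix. The actual layout used by transform_to_sidpy is not documented here, and split_fit_output is a made-up helper.

```python
import numpy as np


def split_fit_output(fit_array, num_params):
    """Split (..., P + P*P) into params (..., P) and covariance (..., P, P).

    Assumed (hypothetical) layout: the first P entries of the last axis are
    the fitted parameters, the remaining P*P are the row-major covariance.
    If the last axis holds only P entries, no covariance is returned.
    """
    n_last = fit_array.shape[-1]
    params = fit_array[..., :num_params]
    if n_last == num_params:
        return params, None
    expected = num_params + num_params ** 2
    if n_last != expected:
        raise ValueError(
            f"expected {num_params} or {expected} values per pixel, got {n_last}")
    cov_shape = fit_array.shape[:-1] + (num_params, num_params)
    cov = fit_array[..., num_params:].reshape(cov_shape)
    return params, cov


# 4 x 5 pixels, 3 parameters, covariance appended after the parameters
P = 3
fit = np.zeros((4, 5, P + P * P))
fit[..., :P] = [1.0, 2.0, 3.0]
fit[..., P:] = np.eye(P).ravel()
params, cov = split_fit_output(fit, P)
```

Each resulting array could then be wrapped in its own sidpy.Dataset, for example with sidpy.Dataset.from_array.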