sidpy.proc.fitter_refactor.SidpyFitterRefactor

class sidpy.proc.fitter_refactor.SidpyFitterRefactor(dataset, model_function, guess_function, ind_dims=None, num_params=None, lower_bounds=None, upper_bounds=None)

Bases: object

A parallelized fitter for sidpy.Dataset objects that supports K-Means-based initial guesses for improved convergence on large datasets.

dataset

The original sidpy dataset containing data and metadata.

Type: sidpy.Dataset

dask_data

The underlying dask array used for parallel computation.

Type: dask.array.Array

model_func

The function to fit. Expected signature: f(x_axis, *params).

Type: callable

guess_func

The function to generate initial guesses. Expected signature: f(x_axis, y_data).

Type: callable
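
As an illustration of the two expected signatures, here is a minimal sketch of a Gaussian model and a matching guess function. The names gaussian_model and gaussian_guess are hypothetical and not part of sidpy. The sketch assumes the guess returns one value per model parameter, and it uses plain Python sequences only to stay dependency-free; real inputs would presumably be NumPy arrays.

```python
import math

def gaussian_model(x_axis, *params):
    """Hypothetical model with the signature f(x_axis, *params)."""
    amplitude, center, width = params
    return [amplitude * math.exp(-0.5 * ((x - center) / width) ** 2) for x in x_axis]

def gaussian_guess(x_axis, y_data):
    """Hypothetical guess with the signature f(x_axis, y_data)."""
    peak_index = max(range(len(y_data)), key=lambda i: y_data[i])
    amplitude = y_data[peak_index]
    center = x_axis[peak_index]
    # Rough width: a quarter of the x-range, purely a starting point.
    width = (x_axis[-1] - x_axis[0]) / 4.0
    return [amplitude, center, width]

x_axis = [i * 0.5 for i in range(21)]            # 0.0, 0.5, ..., 10.0
y_data = gaussian_model(x_axis, 2.0, 4.0, 1.0)   # synthetic spectrum
initial_guess = gaussian_guess(x_axis, y_data)   # [amplitude, center, width]
```

The guess only needs to land near the true parameters; the optimizer refines it from there.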

metadata

A dictionary containing fit parameters, model source code, and configuration.

Type: dict

Initializes the SidpyFitterRefactor.

Inputs

dataset : sidpy.Dataset

Dataset to be fitted.

model_function : callable

The model function to use for fitting.

guess_function : callable

The function to generate initial parameters for the model.

ind_dims : int or tuple of int, optional

The indices of the dimensions to fit over. Defaults to the spectral dimensions of the dataset.

num_params : int, optional (required when fitting over two or more dimensions)

The number of parameters the fitting function expects.

lower_bounds : None, float, or array-like, optional

Lower bounds for the fit parameters. Can be:

None (default): no lower bound (-inf) on any parameter.
scalar float: the same lower bound applied to every parameter.
array-like of length num_params: per-parameter lower bounds.

Must satisfy lower_bounds <= upper_bounds element-wise.

upper_bounds : None, float, or array-like, optional

Upper bounds for the fit parameters. Same rules as lower_bounds. Must satisfy upper_bounds >= lower_bounds element-wise.
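
The bound rules above can be sketched as a small helper that expands each accepted form into one value per parameter. This is an illustration only: normalize_bounds is a hypothetical name, not a sidpy function, and the sketch assumes that None for upper_bounds means +inf, mirroring the -inf default for lower_bounds.

```python
import math

def normalize_bounds(lower_bounds, upper_bounds, num_params):
    """Hypothetical helper: expand None / scalar / array-like bounds per parameter."""
    def expand(bound, default):
        if bound is None:
            return [default] * num_params          # unbounded
        if isinstance(bound, (int, float)):
            return [float(bound)] * num_params     # same bound for every parameter
        values = [float(b) for b in bound]         # per-parameter bounds
        if len(values) != num_params:
            raise ValueError(f"expected {num_params} bounds, got {len(values)}")
        return values

    lower = expand(lower_bounds, -math.inf)
    upper = expand(upper_bounds, math.inf)
    if any(lo > hi for lo, hi in zip(lower, upper)):
        raise ValueError("lower_bounds must be <= upper_bounds element-wise")
    return lower, upper

# Scalar lower bound combined with per-parameter upper bounds.
lower, upper = normalize_bounds(0.0, [10.0, 5.0, math.inf], num_params=3)
```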

Methods

`do_fit`	Executes the parallel fit.
`do_guess`	Parallelized guess logic across all pixels.
`do_kmeans_guess`	Performs K-Means clustering to find representative spectra for prior fitting.
`reconstruct_function`	Reconstructs a python function from source code stored in metadata.
`setup_calc`	Prepares the calculation by rechunking and determining the parameter count.
`transform_to_sidpy`	Convert the fit results into sidpy.Dataset(s).

do_fit(guesses=None, use_kmeans=False, n_clusters=10, fit_parameter_labels=None, loss='linear', f_scale=1.0, return_cov=False)

Executes the parallel fit.

Parameters:

guesses (dask.array.Array, optional) – Initial guesses. If None, generated automatically.
use_kmeans (bool, optional) – Whether to use K-means priors. Default is False.
n_clusters (int, optional) – Number of clusters if use_kmeans is True. Default is 10.
fit_parameter_labels (list of str, optional) – Labels for the fit parameters (e.g. ['Amp', 'Phase']). These are only stored in the metadata.
loss (str, optional) – Loss function for least_squares (e.g., 'linear', 'soft_l1', 'huber', 'cauchy', 'arctan'). Default is 'linear'.
f_scale (float, optional) – Value of soft margin between inlier and outlier residuals. Default is 1.0.
return_cov (bool, optional) – If True, returns a tuple (fit_dataset, cov_dataset). The cov_dataset contains the covariance matrix for the fit parameters. CAUTION: This significantly increases memory usage.

Returns:

The fit parameter dataset if return_cov is False; otherwise a tuple (fit parameter dataset, covariance matrix dataset).

Return type:

sidpy.Dataset or tuple(sidpy.Dataset, sidpy.Dataset)
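
The loss names match those of scipy.optimize.least_squares, which is presumably the solver behind do_fit (an assumption; the text above only says least_squares). In SciPy's formulation, each scaled squared residual z = (r / f_scale)**2 passes through a function rho(z), and the residual contributes f_scale**2 * rho(z) to the cost, up to a constant factor. The sketch below illustrates those formulas with a hypothetical residual_cost helper; it is not sidpy's code.

```python
import math

# rho(z) for each loss name, as documented for scipy.optimize.least_squares.
RHO = {
    "linear": lambda z: z,
    "soft_l1": lambda z: 2.0 * (math.sqrt(1.0 + z) - 1.0),
    "huber": lambda z: z if z <= 1.0 else 2.0 * math.sqrt(z) - 1.0,
    "cauchy": lambda z: math.log1p(z),
    "arctan": lambda z: math.atan(z),
}

def residual_cost(residual, loss="linear", f_scale=1.0):
    """Hypothetical helper: cost contribution of a single residual."""
    z = (residual / f_scale) ** 2
    return f_scale ** 2 * RHO[loss](z)

# Compare how each loss treats a small residual and a large one.
inlier, outlier = 0.5, 10.0
costs = {name: (residual_cost(inlier, name), residual_cost(outlier, name)) for name in RHO}
```

With 'linear' the outlier contributes its full squared residual, while the robust losses grow more slowly beyond the soft margin set by f_scale, so a few bad points pull the fit less.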

do_guess()

Runs the guess logic in parallel across all pixels.

do_kmeans_guess(n_clusters=10)

Performs K-Means clustering to find representative spectra for prior fitting. Dask-ML's KMeans is used so that the clustering scales to large datasets.

Parameters: n_clusters (int, optional) – Number of clusters to use for K-Means. Default is 10.
Returns: A dask array containing the initial guesses for every pixel.
Return type: dask.array.Array
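
The idea behind K-Means priors can be sketched in a few lines: cluster the spectra, derive one set of starting parameters per cluster centroid, then hand every pixel the parameters of its cluster. The sketch below is a dependency-free illustration, not the Dask-ML-based implementation. tiny_kmeans and peak_guess are hypothetical names, the data is made up, and a simple peak guess stands in for the fit of each representative spectrum.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def tiny_kmeans(spectra, n_clusters, n_iter=10):
    """Plain Lloyd iterations; the first n_clusters spectra seed the centroids."""
    centroids = [list(s) for s in spectra[:n_clusters]]
    labels = [0] * len(spectra)
    for _ in range(n_iter):
        # Assignment step: each spectrum joins its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda k: squared_distance(s, centroids[k]))
            for s in spectra
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(n_clusters):
            members = [s for s, label in zip(spectra, labels) if label == k]
            if members:
                centroids[k] = [sum(column) / len(members) for column in zip(*members)]
    return centroids, labels

def peak_guess(spectrum):
    """Hypothetical guess: [peak height, peak index]."""
    peak_index = max(range(len(spectrum)), key=lambda i: spectrum[i])
    return [spectrum[peak_index], peak_index]

# Four "pixels" with four spectral points each (made-up data).
spectra = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
    [0.0, 1.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.2],
]
centroids, labels = tiny_kmeans(spectra, n_clusters=2)
cluster_guesses = [peak_guess(c) for c in centroids]
# Every pixel starts from the guess of the cluster it belongs to.
pixel_guesses = [cluster_guesses[label] for label in labels]
```

Only n_clusters representative spectra need careful treatment, which is what makes this attractive for large datasets.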

static reconstruct_function(source_code_input, context=None)

Reconstructs a Python function from source code stored in metadata. Robustly handles lists, strings, and indentation issues.
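
A minimal sketch of how a function might be rebuilt from stored source, assuming the source is either one string or a list of lines and that context supplies the names the function body needs. rebuild_function is a hypothetical stand-in and does not reproduce sidpy's implementation.

```python
import ast
import math
import textwrap

def rebuild_function(source, context=None):
    """Hypothetical stand-in: rebuild the first function defined in source."""
    if isinstance(source, (list, tuple)):
        # Join a list of lines, adding newlines where they are missing.
        source = "".join(line if line.endswith("\n") else line + "\n" for line in source)
    source = textwrap.dedent(source)               # undo uniform extra indentation
    names = [node.name for node in ast.parse(source).body
             if isinstance(node, ast.FunctionDef)]
    if not names:
        raise ValueError("no function definition found in source")
    namespace = dict(context or {})                # names available to the function
    exec(source, namespace)                        # runs arbitrary code: trusted input only
    return namespace[names[0]]

# Source stored as an indented string.
line = rebuild_function("    def line(x, m, b):\n        return m * x + b\n")

# Source stored as a list of lines that relies on a name from context.
scaled = rebuild_function(
    ["def scaled(x, a):", "    return a * math.sqrt(x)"],
    context={"math": math},
)
```

Because exec runs whatever the metadata contains, this approach is only appropriate for files from a trusted source.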

setup_calc(chunks='auto')

Prepares the calculation by rechunking and determining the parameter count.

Parameters: chunks (str or tuple, optional) – The chunk size for the dask array. Default is 'auto'.

transform_to_sidpy(fit_dask_array)

Converts the fit results into sidpy.Dataset(s). Handles splitting parameters and covariance if present.