Skip to content

recovar.heterogeneity

Heterogeneity analysis pipeline: covariance estimation, PCA, latent embedding, and volume reconstruction.

covariance_estimation

Regularized covariance matrix estimation from half-set cryo-EM data.

recovar.heterogeneity.covariance_estimation

Regularized covariance matrix estimation from half-set cryo-EM data.

get_default_covariance_computation_options(grid_size=None, adaptive_n_pcs=False)

Return default options dict for covariance computation.

Uses a fixed number of principal components (200) by default for reproducibility across different GPU configurations. If adaptive_n_pcs=True, the number of PCs is reduced to fit in available GPU memory (useful for smaller GPUs, but results will depend on hardware).

Parameters:

Name Type Description Default
grid_size

Side length of the 3-D reconstruction grid. Used for memory estimation and adaptive PC count.

None
adaptive_n_pcs

If True, reduce the number of PCs to fit in GPU memory. Default False (always use 200 PCs).

False

Returns:

Type Description

Dictionary with keys reg_fn, left_kernel,

right_kernel, column_sampling_scheme,

n_pcs_to_compute, among others.

compute_regularized_covariance_columns_in_batch(dataset, means, mean_prior, volume_mask, dilated_volume_mask, valid_idx, gpu_memory, options, picked_frequencies, use_multi_gpu=False, n_gpus=None)

Compute regularized covariance matrix columns in GPU-sized batches.

Iterates over picked_frequencies in batches that fit in GPU memory and concatenates the results.

Parameters:

Name Type Description Default
dataset

A CryoEMDataset with halfset_indices set.

required
means

Dict with keys 'combined', 'prior', 'lhs'.

required
mean_prior

Prior mean volume (Fourier coefficients).

required
volume_mask

Binary mask selecting valid voxels.

required
dilated_volume_mask

Dilated version of volume_mask.

required
valid_idx

Indices of valid Fourier frequencies.

required
gpu_memory

Available GPU memory in GB.

required
options

Pipeline options namespace.

required
picked_frequencies

1-D array of frequency indices to compute.

required
use_multi_gpu

Distribute across multiple GPUs.

False
n_gpus

Number of GPUs (None = auto-detect).

None

Returns:

Type Description

Tuple (covariance_cols, picked_frequencies, fscs) where

covariance_cols is a dict with key 'est_mask' and fscs

contains per-column FSC curves.

variance_relion_kernel_trilinear(config, images, mean_estimate, volume_mask, image_mask, rotation_matrices, translations, ctf_params, noise_variance, soften=5, Ft_y=None, Ft_ctf=None, Ft_im=None, Ft_one=None)

Variance estimation via RELION-style trilinear kernel — Equinox API.

Backprojects into half-volume accumulators (Ft_y, Ft_ctf, Ft_im, Ft_one). Pass None to initialise from zero; pass an existing half-volume to accumulate.

compute_both_H_B(dataset, means, dilated_volume_mask, picked_frequencies, gpu_memory, options, use_multi_gpu=False, n_gpus=None)

Compute H and B matrices for both half-sets.

preprocess_covariance_batch(config, images, rotation_matrices, translations, ctf_params, particle_indices, mean_estimate, volume_mask, image_stack_mask, opts)

Preprocess one image batch for covariance column computation.

Applies tight mask, centers images (y_i - A_i*mu), masks, computes CTF, and gets grid-point coordinates. Each sub-step calls existing JIT-compiled helpers; this function is a Python-level orchestrator (not itself jitted).

Returns

tuple of (centered_images, ctf_on_grid, plane_coords, image_mask, tilt_labels)

compute_H_B_for_halfset(cryo, mean_estimate, volume_mask, picked_frequencies, gpu_memory, options, image_subset=None, halfset_id=None)

Compute H and B matrices for one half-set.

Replaces the old compute_H_B + compute_H_B_in_volume_batch pair. Uses explicit dataset batch iteration and :class:CovColumnOpts for static kernel options.

mean_estimate must already be in the correct form for disc_type: raw Fourier coefficients for 'linear_interp', or spline coefficients (from cubic_interpolation.calculate_spline_coefficients) for 'cubic'. The caller (compute_both_H_B) is responsible for the conversion.

reduce_covariance_inner(config, images, model, opts, image_mask, rotation_matrices, translations, ctf_params, noise_variance, hermitian_weights=None, lhs=None, rhs=None, tilt_labels=None)

Covariance estimation inner loop — Equinox API.

Replaces the 24-param reduce_covariance_est_inner with structured inputs. All geometry/CTF/discretization params come from config. Per-batch arrays are passed explicitly. Reconstruction state (mean, mask, basis) comes from model. Boolean flags come from opts.

group_sum_by_labels(array, tilt_labels, max_groups)

General function to group and sum arrays by tilt_labels. This is JIT-compatible and assumes tilt_labels are consecutive indices (0, 1, 2, ...).

Parameters:

Name Type Description Default
array

Array to sum, shape (n_images, n_features)

required
tilt_labels

Group labels, shape (n_images,)

required
max_groups

Maximum number of groups (should be >= max(tilt_labels) + 1)

required

Returns:

Type Description

Array with same shape as input, where each element is replaced by the sum of its group

adjoint_kernel_slice(images, rotation_matrices, image_shape, volume_shape, kernel='triangular', volumes=None)

Backproject images to volume(s).

When images is 3-D (batch, n_images, n_pix), vmaps over the batch. When images is 2-D (n_images, n_pix), backprojects a single batch. volumes is an optional accumulator: the backprojection is added into volumes instead of starting from zeros.

compute_freq_batch(config, opts, freq_batch, images, ctf_on_grid, plane_coords, rotation_matrices, noise_variances, image_mask, tilt_labels, premultiplied_ctf, shared_label, no_mask, H_accum=None, B_accum=None)

Compute H and B for a batch of frequencies in a single XLA program.

Replaces the Python for-loop over frequencies with jax.lax.fori_loop. Each iteration computes one frequency's images and backprojects them, accumulating into H_accum / B_accum via .at[k].add().

Memory: only one frequency's intermediate images live at a time.

preprocess_tilt_labels_for_batch(tilt_labels)

Pre-process tilt_labels to be consecutive indices starting from 0. This should be called outside JIT to handle arbitrary tilt label values.

Parameters:

Name Type Description Default
tilt_labels

Array of arbitrary tilt label values

required

Returns:

Type Description

Array of consecutive indices 0, 1, 2, ... with the same grouping

structure as the input labels.

principal_components

Principal component analysis of the estimated covariance operator.

recovar.heterogeneity.principal_components

Principal component analysis of the estimated covariance operator.

estimate_principal_components(dataset, options, means, mean_prior, volume_mask, dilated_volume_mask, valid_idx, batch_size, gpu_memory_to_use, covariance_options=None, variance_estimate=None, use_reg_mean_in_contrast=False, use_multi_gpu=False, n_gpus=None)

Estimate principal components of the covariance operator.

Computes regularized covariance columns, then extracts their leading eigenvectors and eigenvalues via SVD.

Parameters:

Name Type Description Default
dataset

A CryoEMDataset with halfset_indices set.

required
options

Pipeline options namespace.

required
means

Dict with mean volume estimates.

required
mean_prior

Prior mean volume (Fourier coefficients).

required
volume_mask

Binary mask selecting valid voxels.

required
dilated_volume_mask

Dilated version of volume_mask.

required
valid_idx

Indices of valid Fourier frequencies.

required
batch_size

Image batch size for GPU processing.

required
gpu_memory_to_use

Available GPU memory in GB.

required
covariance_options

Options dict (default: auto-generated).

None
variance_estimate

Pre-computed variance estimate.

None
use_reg_mean_in_contrast

Use regularized mean for contrast estimation.

False
use_multi_gpu

Distribute across multiple GPUs.

False
n_gpus

Number of GPUs (None = auto-detect).

None

Returns:

Type Description

Tuple (u, s, covariance_cols, picked_frequencies, column_fscs)

where u and s are dicts with keys 'real' and 'rescaled'

containing eigenvectors and eigenvalues respectively.

flip_columns_structured(columns, volume_shape)

Hermitian conjugate flip using structured numpy ops instead of fancy indexing.

For frequency index (x,y,z), the negated frequency maps to ((N-x)%N, (N-y)%N, (N-z)%N). Boundary voxels (where any coordinate is 0) are zeroed because the mapping is degenerate there (clipping artifact). This is equivalent to the old batch_flip_vec2 but uses np.flip (a zero-copy view) instead of random-access fancy indexing (columns[mapped_idx,:]), giving better cache behavior and lower peak memory on large arrays.

embedding

Per-image latent coordinate estimation via linear projection.

recovar.heterogeneity.embedding

Per-image latent coordinate estimation via linear projection.

get_per_image_embedding(mean, u, s, basis_size, dataset, volume_mask, gpu_memory, disc_type='linear_interp', contrast_grid=None, contrast_option='contrast', compute_covariances=True, ignore_zero_frequency=False, contrast_mean=1, contrast_variance=np.inf, compute_bias=False, image_subset_in_tilt_series=None)

Compute per-image latent coordinates by projecting onto principal components.

For each image, estimates the linear coefficients (latent embedding) that best explain the image given the mean volume and eigenvectors, optionally estimating per-image contrast and covariance.

Parameters:

Name Type Description Default
mean

Mean volume in Fourier space, shape (volume_size,).

required
u

Eigenvectors, shape (volume_size, n_components).

required
s

Eigenvalues, shape (n_components,).

required
basis_size

Number of principal components to use.

required
dataset

A CryoEMDataset with halfset_indices set, or a list of two half-set CryoEMDataset instances.

required
volume_mask

Binary mask selecting valid voxels.

required
gpu_memory

Available GPU memory in GB.

required
disc_type

Discretization type ('linear_interp' or 'cubic').

'linear_interp'
contrast_grid

Grid of contrast values to search over.

None
contrast_option

Contrast estimation mode ('contrast', 'contrast_shared', or 'none').

'contrast'
compute_covariances

Compute per-image latent covariance matrices.

True
ignore_zero_frequency

Exclude the DC component.

False
contrast_mean

Prior mean for contrast estimation.

1
contrast_variance

Prior variance for contrast estimation.

inf
compute_bias

Compute per-image bias terms.

False
image_subset_in_tilt_series

Subset of tilt images to use.

None

Returns:

Type Description

Tuple (zs, precision_zs, est_contrasts, bias) where zs has shape

(n_images, basis_size), precision_zs is the per-image posterior

precision matrix (inverse covariance) with shape

(n_images, basis_size, basis_size) (or None),

est_contrasts has shape (n_images,), and bias is

None unless compute_bias is True.

compute_batch_coords(config, images, model, opts, image_mask, contrast_grid, contrast_mean=1.0, contrast_variance=np.inf, hermitian_weights=None, *, rotation_matrices, translations, ctf_params, noise_variance)

Compute latent coordinates for a batch — Equinox API.

Replaces the 27-param compute_single_batch_coords_split.

Parameters

hermitian_weights : jax.Array or None Precomputed sqrt(w) weights for half-spectrum inner products (see :func:_rfft2_hermitian_weights). When provided, all inner products are computed in the compressed half-spectrum representation, roughly halving memory use and compute for the inner-product step.

compute_grouped_shared_batch_coords(config, images, model, image_mask, contrast_grid, contrast_mean, contrast_variance, group_ids, n_groups, compute_covariances, compute_bias, hermitian_weights=None, *, rotation_matrices, translations, ctf_params, noise_variance)

Solve shared-label, shared-contrast batches for multiple particles at once.

This path is used when a single iterator batch contains images from multiple particles. It computes the AU statistics once over all images, then segment-sums by particle id and solves all particles in one batched call. This avoids per-particle Python/JIT dispatch overhead while preserving shared-label semantics.

set_contrasts_in_cryos(dataset, contrasts)

Apply per-image contrast factors to CTF parameters.

heterogeneity_volume

Kernel-regression volume reconstruction from latent embeddings.

recovar.heterogeneity.heterogeneity_volume

Kernel-regression volume reconstruction from latent embeddings.

make_volumes_kernel_estimate_from_results(latent_point, results, ndim, cryos=None, n_bins=11, output_folder=None, B_factor=0, metric_used='locmost_likely', n_min_particles=50)

Reconstruct a volume at an arbitrary latent-space point.

Convenience wrapper around :func:make_volumes_kernel_estimate_local that loads the dataset and computes heterogeneity distances from a pipeline results dict.

Parameters:

Name Type Description Default
latent_point

Target point in latent space, shape (zdim,).

required
results

Pipeline output dictionary (loaded from pickle).

required
ndim

Latent dimensionality to use.

required
cryos

Pre-loaded dataset (CryoEMDataset or CryoEMDataset); loaded from results if None.

None
n_bins

Number of heterogeneity bins for kernel regression.

11
output_folder

Directory for output MRC files.

None
B_factor

B-factor sharpening in Angstroms squared.

0
metric_used

Volume quality metric for selection.

'locmost_likely'
n_min_particles

Minimum particles per bin.

50

make_volumes_kernel_estimate_local(heterogeneity_distances, dataset, vol_paths, ndim, bins, B_factor, tau=None, n_min_particles=50, metric_used='locshellmost_likely', upsampling_for_ests=1, use_mask_ests=False, grid_correct_ests=False, locres_sampling=25, locres_maskrad=None, locres_edgwidth=None, kernel_rad=4, save_all_estimates=False, heterogeneity_kernel='parabola', use_fast_rfft=False)

Reconstruct volumes along a heterogeneity path using kernel regression.

For each bin along the heterogeneity axis, selects nearby images weighted by a kernel function and performs a local 3-D reconstruction with local-resolution filtering. Results are written as MRC files.

Parameters:

Name Type Description Default
heterogeneity_distances

Per-half-set log-likelihood distances, list of two arrays each of shape (n_images_half,).

required
dataset

CryoEMDataset with halfset_indices set. Halfset datasets are obtained via dataset.get_halfset(k).

required
vol_paths

:class:VolumeOutputPaths defining where to write outputs.

required
ndim

Latent dimensionality (-1 for automatic).

required
bins

Number of bins (int) or explicit bin edges (array).

required
B_factor

B-factor sharpening in Angstroms squared.

required
tau

Regularization parameter (None = auto).

None
n_min_particles

Minimum particles per bin.

50
metric_used

Volume quality metric for local-resolution selection.

'locshellmost_likely'
upsampling_for_ests

Upsampling factor for estimates.

1
use_mask_ests

Apply mask to estimates.

False
grid_correct_ests

Apply gridding correction.

False
locres_sampling

Number of local-resolution shells.

25
locres_maskrad

Local-resolution mask radius.

None
locres_edgwidth

Local-resolution mask edge width.

None
kernel_rad

Radius of the heterogeneity kernel.

4
save_all_estimates

Save all intermediate estimates.

False
heterogeneity_kernel

Kernel shape ('parabola' or 'flat').

'parabola'

get_inds_for_subvolume(path_to_vol_folder, subvolume_idx, prefix=None, index=None)

Get image indices contributing to a local subvolume.

Supports both the new diagnostics/{stem}/ layout and the old flat layout where all files live directly in path_to_vol_folder.

Parameters

path_to_vol_folder : str Path to the volume output directory. subvolume_idx : int Index of the local subvolume to query. prefix : str, optional Volume name prefix for new layout resolution. index : int, optional Volume index for new layout resolution.

latent_density

Kernel density estimation and stable-state detection in latent space.

recovar.heterogeneity.latent_density

Kernel density estimation and stable-state detection in latent space.

trajectory

Minimum-energy path finding between states in latent space.

recovar.heterogeneity.trajectory

Minimum-energy path finding between states in latent space.

adaptive_kernel_discretization

Adaptive kernel methods for heterogeneity volume estimation.

recovar.heterogeneity.adaptive_kernel_discretization

Adaptive kernel discretization for heterogeneous cryo-EM reconstruction.

Implements the kernel-based volume estimation with cross-validated bandwidth selection described in the RECOVAR method. Provides both polynomial (nearest- neighbor) and triangular (linear-interpolation) kernel precomputation, along with residual-based model selection across discretization parameters.

vec_index_to_half_vec_index(indices, volume_shape, flip_positive=False)

Map full-volume flat indices to canonical half-volume flat indices.

Returns (half_indices, non_redundant) where non_redundant is True for voxels whose value is stored directly (no conjugation needed when expanding back to the full volume).

When flip_positive is True, redundant voxels (negative last-axis frequencies in the rfft sense) are mapped to their Hermitian conjugate partner in the non-redundant half.

half_vec_index_to_vec_index(indices_half, volume_shape)

Map canonical half-volume flat indices back to full-volume flat indices.

precompute_kernel_batch(config, images, rotation_matrices, translations, ctf_params, noise_variance, pol_degree=0, XWX=None, F=None, heterogeneity_distances=None, heterogeneity_bins=None)

Precompute kernel for one batch — Equinox API.

Uses nearest-neighbor scatter so images are full-spectrum (not half-image). noise_variance is passed explicitly.

precompute_triangular_kernel_batch(config, images, rotation_matrices, translations, ctf_params, noise_variance, pol_degree=0, XWX=None, F=None, heterogeneity_distances=None, heterogeneity_bins=None)

Precompute triangular kernel for one batch — Equinox API.

Expects images in half-image (rfft) format and explicit noise_variance from noise.get() (full-spectrum OK, auto-converted to half-pixel noise internally).

compute_residuals_batch(config, images, weights, rotation_matrices, translations, ctf_params, pol_degree=0, use_linear_interp=False)

Compute residuals for many weights — Equinox API.

batch_im_adjoint_forward(config, slices, ctf_params, rotation_matrices, half_image=False, half_volume=False)

Adjoint forward model vmapped over last axis of slices.

Returns shape (n_bins, volume_size) — batch axis first.

locres

Local resolution estimation and local filtering of reconstructed volumes.

recovar.heterogeneity.locres

Local resolution estimation and local filtering of reconstructed volumes.

local_resolution(map1, map2, B_factor, voxel_size, locres_sampling=25, locres_maskrad=None, locres_edgwidth=None, locres_minres=50, use_filter=True, fsc_threshold=1 / 7, use_v2=True, filter_edgewidth=2, filter_map1=False)

Estimate local FSC resolution and optionally build locally filtered maps.

The implementation keeps legacy behavior but stores large real-valued Fourier arrays in packed half-volume layout whenever possible to reduce GPU memory.

apply_fsc_weighting(FT, fsc, volume_shape=None)

Apply FSC-based sharpening weights to full or half-volume spectra.

filter_with_local_fsc(ft_sum, fsc, local_resol, voxel_size, filter_edgewidth, volume_shape=None)

Local FSC filter for either full-spectrum or half-spectrum inputs.

low_pass_filter_map(FT, ori_size, low_pass, voxel_size, filter_edgewidth, do_highpass_instead=False, volume_shape=None)

Raised-cosine low-pass/high-pass filter in Fourier space.

Supports both full complex spectra and packed half-volume spectra.

get_sampling_points(grid_size, locres_sampling, locres_maskrad, voxel_size)

Sampling coordinates for local-resolution windows in centered frequency grid.

Returned points use the same centered integer-coordinate convention as legacy code ([-N/2, ..., N/2) for even sizes), but generation is vectorized to avoid Python triple-loops for large grids.

filter_with_global_fsc(ft_sum, fsc, voxel_size, filter_edgewidth, mask=None, fsc_mask=None, B_factor=None, volume_shape=None)

Apply global FSC-based filtering to a Fourier transform.

Parameters: - ft_sum: Fourier transform of the map to filter - fsc: Fourier Shell Correlation curve - voxel_size: Voxel size in Angstroms - filter_edgewidth: Width of the filter edge in pixels - mask: Optional mask to apply to final result (if None, no mask applied) - fsc_mask: Optional mask used only for FSC estimation (if None, no FSC mask) - B_factor: Optional B-factor for sharpening (if None, no B-factor applied)

Returns: - Filtered map in real space

filter_with_global_fsc_and_mask(ft_sum, fsc, voxel_size, filter_edgewidth, mask_radius=None, mask_edgewidth=None, fsc_mask_radius=None, fsc_mask_edgewidth=None, B_factor=None, mask=None, fsc_mask=None, volume_shape=None)

Apply global FSC-based filtering with automatic spherical mask.

Parameters: - ft_sum: Fourier transform of the map to filter - fsc: Fourier Shell Correlation curve - voxel_size: Voxel size in Angstroms - filter_edgewidth: Width of the filter edge in pixels - mask_radius: Radius of the spherical mask for final result (if None, no mask applied) - mask_edgewidth: Width of the mask edge (if None, uses 10% of mask_radius) - fsc_mask_radius: Radius of the spherical mask for FSC estimation (if None, no FSC mask) - fsc_mask_edgewidth: Width of the FSC mask edge (if None, uses 10% of fsc_mask_radius) - B_factor: Optional B-factor for sharpening (if None, no B-factor applied) - mask: Custom mask for final result (overrides mask_radius/mask_edgewidth if provided) - fsc_mask: Custom mask for FSC estimation (overrides fsc_mask_radius/fsc_mask_edgewidth if provided)

Returns: - Filtered map in real space with spherical mask applied

filter_maps_with_global_fsc(map1, map2, voxel_size, filter_edgewidth=2, mask_radius=None, mask_edgewidth=None, fsc_mask_radius=None, fsc_mask_edgewidth=None, B_factor=None, fsc_threshold=1 / 7, mask=None, fsc_mask=None)

Convenience function to filter two maps using global FSC-based filtering.

Parameters: - map1: First map - map2: Second map - voxel_size: Voxel size in Angstroms - filter_edgewidth: Width of the filter edge in pixels (default: 2) - mask_radius: Radius of the spherical mask for final result (if None, no mask applied) - mask_edgewidth: Width of the mask edge (if None, uses 10% of mask_radius) - fsc_mask_radius: Radius of the spherical mask for FSC estimation (if None, no FSC mask) - fsc_mask_edgewidth: Width of the FSC mask edge (if None, uses 10% of fsc_mask_radius) - B_factor: Optional B-factor for sharpening (if None, no B-factor applied) - fsc_threshold: FSC threshold for resolution determination (default: 1/7) - mask: Custom mask for final result (overrides mask_radius/mask_edgewidth if provided) - fsc_mask: Custom mask for FSC estimation (overrides fsc_mask_radius/fsc_mask_edgewidth if provided)

Returns: - filtered_combined: Filtered version of the combined map (average of map1 and map2) - fsc: The FSC curve used for filtering - global_resol: The global resolution determined from FSC

filter_single_map_with_global_fsc(map1, map2, voxel_size, filter_edgewidth=2, mask_radius=None, mask_edgewidth=None, fsc_mask_radius=None, fsc_mask_edgewidth=None, B_factor=None, fsc_threshold=1 / 7, mask=None, fsc_mask=None)

Filter a single map using FSC computed from two maps.

Parameters: - map1: Map to filter - map2: Reference map for FSC computation - voxel_size: Voxel size in Angstroms - filter_edgewidth: Width of the filter edge in pixels (default: 2) - mask_radius: Radius of the spherical mask for final result (if None, no mask applied) - mask_edgewidth: Width of the mask edge (if None, uses 10% of mask_radius) - fsc_mask_radius: Radius of the spherical mask for FSC estimation (if None, no FSC mask) - fsc_mask_edgewidth: Width of the FSC mask edge (if None, uses 10% of fsc_mask_radius) - B_factor: Optional B-factor for sharpening (if None, no B-factor applied) - fsc_threshold: FSC threshold for resolution determination (default: 1/7) - mask: Custom mask for final result (overrides mask_radius/mask_edgewidth if provided) - fsc_mask: Custom mask for FSC estimation (overrides fsc_mask_radius/fsc_mask_edgewidth if provided)

Returns: - filtered_map: Filtered version of map1 - fsc: The FSC curve used for filtering - global_resol: The global resolution determined from FSC