recovar.heterogeneity¶
Heterogeneity analysis pipeline: covariance estimation, PCA, latent embedding, and volume reconstruction.
covariance_estimation¶
Regularized covariance matrix estimation from half-set cryo-EM data.
recovar.heterogeneity.covariance_estimation
¶
Regularized covariance matrix estimation from half-set cryo-EM data.
get_default_covariance_computation_options(grid_size=None, adaptive_n_pcs=False)
¶
Return default options dict for covariance computation.
Uses a fixed number of principal components (200) by default for
reproducibility across different GPU configurations. If
adaptive_n_pcs=True, the number of PCs is reduced to fit in
available GPU memory (useful for smaller GPUs, but results will
depend on hardware).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
grid_size
|
Side length of the 3-D reconstruction grid. Used for memory estimation and adaptive PC count. |
None
|
|
adaptive_n_pcs
|
If True, reduce the number of PCs to fit in GPU memory. Default False (always use 200 PCs). |
False
|
Returns:
| Type | Description |
|---|---|
|
Dictionary with keys |
|
|
|
|
|
|
compute_regularized_covariance_columns_in_batch(dataset, means, mean_prior, volume_mask, dilated_volume_mask, valid_idx, gpu_memory, options, picked_frequencies, use_multi_gpu=False, n_gpus=None)
¶
Compute regularized covariance matrix columns in GPU-sized batches.
Iterates over picked_frequencies in batches that fit in GPU memory and concatenates the results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dataset
|
A |
required | |
means
|
Dict with keys |
required | |
mean_prior
|
Prior mean volume (Fourier coefficients). |
required | |
volume_mask
|
Binary mask selecting valid voxels. |
required | |
dilated_volume_mask
|
Dilated version of volume_mask. |
required | |
valid_idx
|
Indices of valid Fourier frequencies. |
required | |
gpu_memory
|
Available GPU memory in GB. |
required | |
options
|
Pipeline options namespace. |
required | |
picked_frequencies
|
1-D array of frequency indices to compute. |
required | |
use_multi_gpu
|
Distribute across multiple GPUs. |
False
|
|
n_gpus
|
Number of GPUs ( |
None
|
Returns:
| Type | Description |
|---|---|
|
Tuple |
|
|
covariance_cols is a dict with key |
|
|
contains per-column FSC curves. |
variance_relion_kernel_trilinear(config, images, mean_estimate, volume_mask, image_mask, rotation_matrices, translations, ctf_params, noise_variance, soften=5, Ft_y=None, Ft_ctf=None, Ft_im=None, Ft_one=None)
¶
Variance estimation via RELION-style trilinear kernel — Equinox API.
Backprojects into half-volume accumulators (Ft_y, Ft_ctf, Ft_im, Ft_one). Pass None to initialise from zero; pass an existing half-volume to accumulate.
compute_both_H_B(dataset, means, dilated_volume_mask, picked_frequencies, gpu_memory, options, use_multi_gpu=False, n_gpus=None)
¶
Compute H and B matrices for both half-sets.
preprocess_covariance_batch(config, images, rotation_matrices, translations, ctf_params, particle_indices, mean_estimate, volume_mask, image_stack_mask, opts)
¶
Preprocess one image batch for covariance column computation.
Applies tight mask, centers images (y_i - A_i*mu), masks, computes CTF, and gets grid-point coordinates. Each sub-step calls existing JIT-compiled helpers; this function is a Python-level orchestrator (not itself jitted).
Returns¶
tuple of (centered_images, ctf_on_grid, plane_coords, image_mask, tilt_labels)
compute_H_B_for_halfset(cryo, mean_estimate, volume_mask, picked_frequencies, gpu_memory, options, image_subset=None, halfset_id=None)
¶
Compute H and B matrices for one half-set.
Replaces the old compute_H_B + compute_H_B_in_volume_batch pair.
Uses explicit dataset batch iteration and :class:CovColumnOpts for
static kernel options.
mean_estimate must already be in the correct form for disc_type:
raw Fourier coefficients for 'linear_interp', or spline coefficients
(from cubic_interpolation.calculate_spline_coefficients) for 'cubic'.
The caller (compute_both_H_B) is responsible for the conversion.
reduce_covariance_inner(config, images, model, opts, image_mask, rotation_matrices, translations, ctf_params, noise_variance, hermitian_weights=None, lhs=None, rhs=None, tilt_labels=None)
¶
Covariance estimation inner loop — Equinox API.
Replaces the 24-param reduce_covariance_est_inner with structured inputs.
All geometry/CTF/discretization params come from config.
Per-batch arrays are passed explicitly.
Reconstruction state (mean, mask, basis) comes from model.
Boolean flags come from opts.
group_sum_by_labels(array, tilt_labels, max_groups)
¶
General function to group and sum arrays by tilt_labels. This is JIT-compatible and assumes tilt_labels are consecutive indices (0, 1, 2, ...).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
array
|
Array to sum, shape (n_images, n_features) |
required | |
tilt_labels
|
Group labels, shape (n_images,) |
required | |
max_groups
|
Maximum number of groups (should be >= max(tilt_labels) + 1) |
required |
Returns:
| Type | Description |
|---|---|
|
Array with same shape as input, where each element is replaced by the sum of its group |
adjoint_kernel_slice(images, rotation_matrices, image_shape, volume_shape, kernel='triangular', volumes=None)
¶
Backproject images to volume(s).
When images is 3-D (batch, n_images, n_pix), vmaps over the batch.
When images is 2-D (n_images, n_pix), backprojects a single batch.
volumes is an optional accumulator: the backprojection is added into
volumes instead of starting from zeros.
compute_freq_batch(config, opts, freq_batch, images, ctf_on_grid, plane_coords, rotation_matrices, noise_variances, image_mask, tilt_labels, premultiplied_ctf, shared_label, no_mask, H_accum=None, B_accum=None)
¶
Compute H and B for a batch of frequencies in a single XLA program.
Replaces the Python for-loop over frequencies with
jax.lax.fori_loop. Each iteration computes one frequency's images
and backprojects them, accumulating into H_accum / B_accum via
.at[k].add().
Memory: only one frequency's intermediate images live at a time.
preprocess_tilt_labels_for_batch(tilt_labels)
¶
Pre-process tilt_labels to be consecutive indices starting from 0. This should be called outside JIT to handle arbitrary tilt label values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tilt_labels
|
Array of arbitrary tilt label values |
required |
Returns:
| Type | Description |
|---|---|
|
Array of consecutive indices |
|
|
structure as the input labels. |
principal_components¶
Principal component analysis of the estimated covariance operator.
recovar.heterogeneity.principal_components
¶
Principal component analysis of the estimated covariance operator.
estimate_principal_components(dataset, options, means, mean_prior, volume_mask, dilated_volume_mask, valid_idx, batch_size, gpu_memory_to_use, covariance_options=None, variance_estimate=None, use_reg_mean_in_contrast=False, use_multi_gpu=False, n_gpus=None)
¶
Estimate principal components of the covariance operator.
Computes regularized covariance columns, then extracts their leading eigenvectors and eigenvalues via SVD.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dataset
|
A |
required | |
options
|
Pipeline options namespace. |
required | |
means
|
Dict with mean volume estimates. |
required | |
mean_prior
|
Prior mean volume (Fourier coefficients). |
required | |
volume_mask
|
Binary mask selecting valid voxels. |
required | |
dilated_volume_mask
|
Dilated version of volume_mask. |
required | |
valid_idx
|
Indices of valid Fourier frequencies. |
required | |
batch_size
|
Image batch size for GPU processing. |
required | |
gpu_memory_to_use
|
Available GPU memory in GB. |
required | |
covariance_options
|
Options dict (default: auto-generated). |
None
|
|
variance_estimate
|
Pre-computed variance estimate. |
None
|
|
use_reg_mean_in_contrast
|
Use regularized mean for contrast estimation. |
False
|
|
use_multi_gpu
|
Distribute across multiple GPUs. |
False
|
|
n_gpus
|
Number of GPUs ( |
None
|
Returns:
| Type | Description |
|---|---|
|
Tuple |
|
|
where u and s are dicts with keys |
|
|
containing eigenvectors and eigenvalues respectively. |
flip_columns_structured(columns, volume_shape)
¶
Hermitian conjugate flip using structured numpy ops instead of fancy indexing.
For frequency index (x,y,z), the negated frequency maps to ((N-x)%N, (N-y)%N, (N-z)%N). Boundary voxels (where any coordinate is 0) are zeroed because the mapping is degenerate there (clipping artifact). This is equivalent to the old batch_flip_vec2 but uses np.flip (a zero-copy view) instead of random-access fancy indexing (columns[mapped_idx,:]), giving better cache behavior and lower peak memory on large arrays.
embedding¶
Per-image latent coordinate estimation via linear projection.
recovar.heterogeneity.embedding
¶
Per-image latent coordinate estimation via linear projection.
get_per_image_embedding(mean, u, s, basis_size, dataset, volume_mask, gpu_memory, disc_type='linear_interp', contrast_grid=None, contrast_option='contrast', compute_covariances=True, ignore_zero_frequency=False, contrast_mean=1, contrast_variance=np.inf, compute_bias=False, image_subset_in_tilt_series=None)
¶
Compute per-image latent coordinates by projecting onto principal components.
For each image, estimates the linear coefficients (latent embedding) that best explain the image given the mean volume and eigenvectors, optionally estimating per-image contrast and covariance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mean
|
Mean volume in Fourier space, shape |
required | |
u
|
Eigenvectors, shape |
required | |
s
|
Eigenvalues, shape |
required | |
basis_size
|
Number of principal components to use. |
required | |
dataset
|
A |
required | |
volume_mask
|
Binary mask selecting valid voxels. |
required | |
gpu_memory
|
Available GPU memory in GB. |
required | |
disc_type
|
Discretization type ( |
'linear_interp'
|
|
contrast_grid
|
Grid of contrast values to search over. |
None
|
|
contrast_option
|
Contrast estimation mode ( |
'contrast'
|
|
compute_covariances
|
Compute per-image latent covariance matrices. |
True
|
|
ignore_zero_frequency
|
Exclude the DC component. |
False
|
|
contrast_mean
|
Prior mean for contrast estimation. |
1
|
|
contrast_variance
|
Prior variance for contrast estimation. |
inf
|
|
compute_bias
|
Compute per-image bias terms. |
False
|
|
image_subset_in_tilt_series
|
Subset of tilt images to use. |
None
|
Returns:
| Type | Description |
|---|---|
|
Tuple |
|
|
|
|
|
precision matrix (inverse covariance) with shape |
|
|
|
|
|
est_contrasts has shape |
|
|
|
compute_batch_coords(config, images, model, opts, image_mask, contrast_grid, contrast_mean=1.0, contrast_variance=np.inf, hermitian_weights=None, *, rotation_matrices, translations, ctf_params, noise_variance)
¶
Compute latent coordinates for a batch — Equinox API.
Replaces the 27-param compute_single_batch_coords_split.
Parameters¶
hermitian_weights : jax.Array or None
Precomputed sqrt(w) weights for half-spectrum inner products
(see :func:_rfft2_hermitian_weights). When provided, all inner
products are computed in the compressed half-spectrum representation,
roughly halving memory use and compute for the inner-product step.
compute_grouped_shared_batch_coords(config, images, model, image_mask, contrast_grid, contrast_mean, contrast_variance, group_ids, n_groups, compute_covariances, compute_bias, hermitian_weights=None, *, rotation_matrices, translations, ctf_params, noise_variance)
¶
Solve shared-label, shared-contrast batches for multiple particles at once.
This path is used when a single iterator batch contains images from multiple particles. It computes the AU statistics once over all images, then segment-sums by particle id and solves all particles in one batched call. This avoids per-particle Python/JIT dispatch overhead while preserving shared-label semantics.
set_contrasts_in_cryos(dataset, contrasts)
¶
Apply per-image contrast factors to CTF parameters.
heterogeneity_volume¶
Kernel-regression volume reconstruction from latent embeddings.
recovar.heterogeneity.heterogeneity_volume
¶
Kernel-regression volume reconstruction from latent embeddings.
make_volumes_kernel_estimate_from_results(latent_point, results, ndim, cryos=None, n_bins=11, output_folder=None, B_factor=0, metric_used='locmost_likely', n_min_particles=50)
¶
Reconstruct a volume at an arbitrary latent-space point.
Convenience wrapper around :func:make_volumes_kernel_estimate_local
that loads the dataset and computes heterogeneity distances from a
pipeline results dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
latent_point
|
Target point in latent space, shape |
required | |
results
|
Pipeline output dictionary (loaded from pickle). |
required | |
ndim
|
Latent dimensionality to use. |
required | |
cryos
|
Pre-loaded dataset ( |
None
|
|
n_bins
|
Number of heterogeneity bins for kernel regression. |
11
|
|
output_folder
|
Directory for output MRC files. |
None
|
|
B_factor
|
B-factor sharpening in Angstroms squared. |
0
|
|
metric_used
|
Volume quality metric for selection. |
'locmost_likely'
|
|
n_min_particles
|
Minimum particles per bin. |
50
|
make_volumes_kernel_estimate_local(heterogeneity_distances, dataset, vol_paths, ndim, bins, B_factor, tau=None, n_min_particles=50, metric_used='locshellmost_likely', upsampling_for_ests=1, use_mask_ests=False, grid_correct_ests=False, locres_sampling=25, locres_maskrad=None, locres_edgwidth=None, kernel_rad=4, save_all_estimates=False, heterogeneity_kernel='parabola', use_fast_rfft=False)
¶
Reconstruct volumes along a heterogeneity path using kernel regression.
For each bin along the heterogeneity axis, selects nearby images weighted by a kernel function and performs a local 3-D reconstruction with local-resolution filtering. Results are written as MRC files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
heterogeneity_distances
|
Per-half-set log-likelihood distances,
list of two arrays each of shape |
required | |
dataset
|
|
required | |
vol_paths
|
:class: |
required | |
ndim
|
Latent dimensionality ( |
required | |
bins
|
Number of bins (int) or explicit bin edges (array). |
required | |
B_factor
|
B-factor sharpening in Angstroms squared. |
required | |
tau
|
Regularization parameter ( |
None
|
|
n_min_particles
|
Minimum particles per bin. |
50
|
|
metric_used
|
Volume quality metric for local-resolution selection. |
'locshellmost_likely'
|
|
upsampling_for_ests
|
Upsampling factor for estimates. |
1
|
|
use_mask_ests
|
Apply mask to estimates. |
False
|
|
grid_correct_ests
|
Apply gridding correction. |
False
|
|
locres_sampling
|
Number of local-resolution shells. |
25
|
|
locres_maskrad
|
Local-resolution mask radius. |
None
|
|
locres_edgwidth
|
Local-resolution mask edge width. |
None
|
|
kernel_rad
|
Radius of the heterogeneity kernel. |
4
|
|
save_all_estimates
|
Save all intermediate estimates. |
False
|
|
heterogeneity_kernel
|
Kernel shape ( |
'parabola'
|
get_inds_for_subvolume(path_to_vol_folder, subvolume_idx, prefix=None, index=None)
¶
Get image indices contributing to a local subvolume.
Supports both the new diagnostics/{stem}/ layout and the old
flat layout where all files live directly in path_to_vol_folder.
Parameters¶
path_to_vol_folder : str Path to the volume output directory. subvolume_idx : int Index of the local subvolume to query. prefix : str, optional Volume name prefix for new layout resolution. index : int, optional Volume index for new layout resolution.
latent_density¶
Kernel density estimation and stable-state detection in latent space.
recovar.heterogeneity.latent_density
¶
Kernel density estimation and stable-state detection in latent space.
trajectory¶
Minimum-energy path finding between states in latent space.
recovar.heterogeneity.trajectory
¶
Minimum-energy path finding between states in latent space.
adaptive_kernel_discretization¶
Adaptive kernel methods for heterogeneity volume estimation.
recovar.heterogeneity.adaptive_kernel_discretization
¶
Adaptive kernel discretization for heterogeneous cryo-EM reconstruction.
Implements the kernel-based volume estimation with cross-validated bandwidth selection described in the RECOVAR method. Provides both polynomial (nearest- neighbor) and triangular (linear-interpolation) kernel precomputation, along with residual-based model selection across discretization parameters.
vec_index_to_half_vec_index(indices, volume_shape, flip_positive=False)
¶
Map full-volume flat indices to canonical half-volume flat indices.
Returns (half_indices, non_redundant) where non_redundant is True
for voxels whose value is stored directly (no conjugation needed when
expanding back to the full volume).
When flip_positive is True, redundant voxels (negative last-axis frequencies in the rfft sense) are mapped to their Hermitian conjugate partner in the non-redundant half.
half_vec_index_to_vec_index(indices_half, volume_shape)
¶
Map canonical half-volume flat indices back to full-volume flat indices.
precompute_kernel_batch(config, images, rotation_matrices, translations, ctf_params, noise_variance, pol_degree=0, XWX=None, F=None, heterogeneity_distances=None, heterogeneity_bins=None)
¶
Precompute kernel for one batch — Equinox API.
Uses nearest-neighbor scatter so images are full-spectrum (not half-image).
noise_variance is passed explicitly.
precompute_triangular_kernel_batch(config, images, rotation_matrices, translations, ctf_params, noise_variance, pol_degree=0, XWX=None, F=None, heterogeneity_distances=None, heterogeneity_bins=None)
¶
Precompute triangular kernel for one batch — Equinox API.
Expects images in half-image (rfft) format and explicit
noise_variance from noise.get() (full-spectrum OK, auto-converted
to half-pixel noise internally).
compute_residuals_batch(config, images, weights, rotation_matrices, translations, ctf_params, pol_degree=0, use_linear_interp=False)
¶
Compute residuals for many weights — Equinox API.
batch_im_adjoint_forward(config, slices, ctf_params, rotation_matrices, half_image=False, half_volume=False)
¶
Adjoint forward model vmapped over last axis of slices.
Returns shape (n_bins, volume_size) — batch axis first.
locres¶
Local resolution estimation and local filtering of reconstructed volumes.
recovar.heterogeneity.locres
¶
Local resolution estimation and local filtering of reconstructed volumes.
local_resolution(map1, map2, B_factor, voxel_size, locres_sampling=25, locres_maskrad=None, locres_edgwidth=None, locres_minres=50, use_filter=True, fsc_threshold=1 / 7, use_v2=True, filter_edgewidth=2, filter_map1=False)
¶
Estimate local FSC resolution and optionally build locally filtered maps.
The implementation keeps legacy behavior but stores large real-valued Fourier arrays in packed half-volume layout whenever possible to reduce GPU memory.
apply_fsc_weighting(FT, fsc, volume_shape=None)
¶
Apply FSC-based sharpening weights to full or half-volume spectra.
filter_with_local_fsc(ft_sum, fsc, local_resol, voxel_size, filter_edgewidth, volume_shape=None)
¶
Local FSC filter for either full-spectrum or half-spectrum inputs.
low_pass_filter_map(FT, ori_size, low_pass, voxel_size, filter_edgewidth, do_highpass_instead=False, volume_shape=None)
¶
Raised-cosine low-pass/high-pass filter in Fourier space.
Supports both full complex spectra and packed half-volume spectra.
get_sampling_points(grid_size, locres_sampling, locres_maskrad, voxel_size)
¶
Sampling coordinates for local-resolution windows in centered frequency grid.
Returned points use the same centered integer-coordinate convention as legacy
code ([-N/2, ..., N/2) for even sizes), but generation is vectorized to
avoid Python triple-loops for large grids.
filter_with_global_fsc(ft_sum, fsc, voxel_size, filter_edgewidth, mask=None, fsc_mask=None, B_factor=None, volume_shape=None)
¶
Apply global FSC-based filtering to a Fourier transform.
Parameters: - ft_sum: Fourier transform of the map to filter - fsc: Fourier Shell Correlation curve - voxel_size: Voxel size in Angstroms - filter_edgewidth: Width of the filter edge in pixels - mask: Optional mask to apply to final result (if None, no mask applied) - fsc_mask: Optional mask used only for FSC estimation (if None, no FSC mask) - B_factor: Optional B-factor for sharpening (if None, no B-factor applied)
Returns: - Filtered map in real space
filter_with_global_fsc_and_mask(ft_sum, fsc, voxel_size, filter_edgewidth, mask_radius=None, mask_edgewidth=None, fsc_mask_radius=None, fsc_mask_edgewidth=None, B_factor=None, mask=None, fsc_mask=None, volume_shape=None)
¶
Apply global FSC-based filtering with automatic spherical mask.
Parameters: - ft_sum: Fourier transform of the map to filter - fsc: Fourier Shell Correlation curve - voxel_size: Voxel size in Angstroms - filter_edgewidth: Width of the filter edge in pixels - mask_radius: Radius of the spherical mask for final result (if None, no mask applied) - mask_edgewidth: Width of the mask edge (if None, uses 10% of mask_radius) - fsc_mask_radius: Radius of the spherical mask for FSC estimation (if None, no FSC mask) - fsc_mask_edgewidth: Width of the FSC mask edge (if None, uses 10% of fsc_mask_radius) - B_factor: Optional B-factor for sharpening (if None, no B-factor applied) - mask: Custom mask for final result (overrides mask_radius/mask_edgewidth if provided) - fsc_mask: Custom mask for FSC estimation (overrides fsc_mask_radius/fsc_mask_edgewidth if provided)
Returns: - Filtered map in real space with spherical mask applied
filter_maps_with_global_fsc(map1, map2, voxel_size, filter_edgewidth=2, mask_radius=None, mask_edgewidth=None, fsc_mask_radius=None, fsc_mask_edgewidth=None, B_factor=None, fsc_threshold=1 / 7, mask=None, fsc_mask=None)
¶
Convenience function to filter two maps using global FSC-based filtering.
Parameters: - map1: First map - map2: Second map - voxel_size: Voxel size in Angstroms - filter_edgewidth: Width of the filter edge in pixels (default: 2) - mask_radius: Radius of the spherical mask for final result (if None, no mask applied) - mask_edgewidth: Width of the mask edge (if None, uses 10% of mask_radius) - fsc_mask_radius: Radius of the spherical mask for FSC estimation (if None, no FSC mask) - fsc_mask_edgewidth: Width of the FSC mask edge (if None, uses 10% of fsc_mask_radius) - B_factor: Optional B-factor for sharpening (if None, no B-factor applied) - fsc_threshold: FSC threshold for resolution determination (default: 1/7) - mask: Custom mask for final result (overrides mask_radius/mask_edgewidth if provided) - fsc_mask: Custom mask for FSC estimation (overrides fsc_mask_radius/fsc_mask_edgewidth if provided)
Returns: - filtered_combined: Filtered version of the combined map (average of map1 and map2) - fsc: The FSC curve used for filtering - global_resol: The global resolution determined from FSC
filter_single_map_with_global_fsc(map1, map2, voxel_size, filter_edgewidth=2, mask_radius=None, mask_edgewidth=None, fsc_mask_radius=None, fsc_mask_edgewidth=None, B_factor=None, fsc_threshold=1 / 7, mask=None, fsc_mask=None)
¶
Filter a single map using FSC computed from two maps.
Parameters: - map1: Map to filter - map2: Reference map for FSC computation - voxel_size: Voxel size in Angstroms - filter_edgewidth: Width of the filter edge in pixels (default: 2) - mask_radius: Radius of the spherical mask for final result (if None, no mask applied) - mask_edgewidth: Width of the mask edge (if None, uses 10% of mask_radius) - fsc_mask_radius: Radius of the spherical mask for FSC estimation (if None, no FSC mask) - fsc_mask_edgewidth: Width of the FSC mask edge (if None, uses 10% of fsc_mask_radius) - B_factor: Optional B-factor for sharpening (if None, no B-factor applied) - fsc_threshold: FSC threshold for resolution determination (default: 1/7) - mask: Custom mask for final result (overrides mask_radius/mask_edgewidth if provided) - fsc_mask: Custom mask for FSC estimation (overrides fsc_mask_radius/fsc_mask_edgewidth if provided)
Returns: - filtered_map: Filtered version of map1 - fsc: The FSC curve used for filtering - global_resol: The global resolution determined from FSC