
recovar.data_io

Dataset loading, metadata extraction, and image access for cryo-EM and cryo-ET data.

Flow

flowchart TD
    A[CLI / pipeline args] --> B[halfsets.py<br/>split policy + halfset loading]
    A --> C[cryoem_dataset.py<br/>load_dataset(...)]
    C --> D[image_sources.py<br/>image source assembly]
    C --> E[metadata_readers.py<br/>STAR / CS metadata parsing]
    D --> F[image_backends.py<br/>file-backed SPA / cryo-ET loaders]
    F --> G[image_loader.py<br/>MRC / MRCS / HDF5 I/O]
    E --> H[image_metadata.py<br/>ImageMetadata]
    D --> I[_index_utils.py<br/>image/group remapping]
    B --> I
    D --> J[cryoem_dataset.py<br/>CryoEMDataset]
    H --> J
    I --> J
    B --> J
    J --> K[iter_batches(...)]
    K --> L[pipeline / compute_state / analyze]

SPA / cryo-ET load path
  -> cryoem_dataset.load_dataset(...)
       -> image_sources.create_image_source(...)
            -> image_backends.py
            -> image_loader.py
       -> metadata_readers.auto_parse_poses / auto_parse_ctf
            -> image_metadata.ImageMetadata
       -> CryoEMDataset(...)

Halfset / subset path
  -> halfsets.get_split_indices / get_split_tilt_indices
  -> halfsets.load_halfset_dataset / load_halfset_dataset_from_args
  -> CryoEMDataset with halfset_indices

Downstream runtime path
  -> CryoEMDataset.iter_batches(...)
  -> explicit tuples:
     (images, rotation_matrices, translations, ctf_params, noise_variance, particle_indices, image_indices)

Cross-cutting indexing
  _index_utils.py
    - DatasetIndexLayout: local <-> original image/group ids
    - TiltSeriesOriginalIndexMap: particle <-> image ids in the original file

Keep these responsibilities separate:

  • image_sources.py owns raw image access, lazy/eager loading, and subset views.
  • image_metadata.py owns rotations, translations, and CTF rows.
  • cryoem_dataset.py is the only high-level coordinator and batch iterator surface.
  • halfsets.py owns split policy and halfset bookkeeping.
  • _index_utils.py owns image/group/particle remapping logic.
  • image_backends.py owns only the low-level stack and tilt-series loaders used underneath image sources.

Public surface used by the main runtime:

  • cryoem_dataset.load_dataset
  • halfsets.load_halfset_dataset
  • halfsets.load_halfset_dataset_from_args
  • CryoEMDataset.iter_batches
  • CryoEMDataset.subset

cryoem_dataset

Core dataset classes and loading functions.

recovar.data_io.cryoem_dataset

Top-level cryo-EM / cryo-ET dataset assembly and batch iteration.

Architecture:

  • image_sources.py owns raw image loading, lazy/eager access, and subset views.
  • image_metadata.py owns poses and CTF metadata only.
  • CryoEMDataset coordinates both layers and exposes the single explicit batch iterator used by downstream code.

CryoEMDataset(image_source, voxel_size, metadata, ctf_evaluator=None, dtype=np.complex64, dataset_indices=None, grid_size=None, tilt_series_flag=False, premultiplied_ctf=False)

Core dataset class for cryo-EM heterogeneity analysis.

Wraps particle images with per-image metadata (poses, CTF parameters) and provides geometry helpers for 3-D reconstruction and embedding.

For half-set reconstructions, a single CryoEMDataset carries halfset_indices, and the two half-set datasets are derived from it on demand (see get_halfset and materialize_halfset_datasets).

Attributes:

  • grid_size: Side length of the image (and default 3-D reconstruction grid).
  • voxel_size: Pixel / voxel size in Angstroms.
  • n_images: Number of particle images in this dataset.
  • image_source: Underlying image-loading layer (None for simulation).
  • tilt_series_flag: True when the dataset represents tilt-series data.

metadata property

The per-image metadata store.

image_source property

Image-loading layer for this dataset.

index_layout property

Explicit local/original image/group mapping for this dataset.

original_image_indices property

Original source-file image index for each local image.

original_group_indices property

Original source-file group index for each local group.

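rotation_matrices property writable

Per-image rotation matrices. Reading returns a read-only view; the property also has a setter.

translations property writable

Per-image translations. Reading returns a read-only view; the property also has a setter.

CTF_params property writable

Per-image CTF parameters. Reading returns a read-only view; the property also has a setter.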

image_mask property

Circular window mask from the image stack.

data_multiplier property writable

Sign multiplier for data inversion (±1).

dataset_tilt_indices property

Per-particle tilt index lists (tilt-series only).

tilt_particles property

List of per-particle tilt index arrays (tilt-series only).

ctf_evaluator property

The recovar.core.ctf.CTFEvaluator for this dataset.

get_ctf_column(col)

Read a single CTF parameter column for all images.

get_ctf_params_copy()

Return a mutable copy of the full CTF parameter array.

update_poses(rots, trans)

Replace all poses.

update_ctf(ctf_params)

Replace all CTF parameters.

process_images(images, apply_image_mask=False)

Apply windowing + full DFT preprocessing to raw images.

process_images_half(images, apply_image_mask=False)

Apply windowing + rfft2 preprocessing → half-spectrum output.

subset(indices)

Return a new CryoEMDataset containing only the images at indices.

The returned dataset uses an ImageSource subset view, so the subset/remap logic stays inside the image-loading layer rather than being duplicated in the dataset class.

can_reload_from_original_images()

Whether this dataset can rebuild a file-backed view by original ids.

reload_from_original_images(original_image_indices, *, lazy=None)

Reload a dataset view from original file image indices.

This is used only when an independent file-backed dataset is required. The input indices are always in original file ordering, never this dataset's local ordering.

get_halfset_dataset(halfset_id, *, independent=False, lazy=None)

Return one halfset as either a lightweight view or independent reload.

prefers_independent_halfset_datasets()

Return whether hot halfset iteration should reload independent datasets.

Lazy file-backed datasets pay extra per-batch remapping cost when they iterate through subset views. Reloading each halfset directly from the original files preserves the same batch contents while avoiding that remap layer in heavy downstream loops.

materialize_halfset_datasets(*, independent=None, lazy=None)

Build the two halfset datasets used by downstream kernels.

Parameters

  • independent (bool, optional): Whether to reload each halfset from the original files instead of constructing subset views. Defaults to prefers_independent_halfset_datasets().
  • lazy (bool, optional): Laziness flag used only for independent reloads. Defaults to the parent dataset's image-source laziness.

get_predicted_image(indices, volume, skip_ctf=False, spatial=True)

Get predicted images for given indices using forward model.

Parameters:

  • indices: Array of indices to predict images for. Required.
  • volume: Volume to use for prediction. Required.
  • skip_ctf: Whether to skip CTF application. Default: False.
  • spatial: Whether to return images in real space (True) or Fourier space (False). Default: True.

Returns:

  Predicted images in real space if spatial=True, otherwise in Fourier space.

n_halfset_images(halfset_id)

Number of images in a given halfset.

get_particle_halfset_indices()

Per-half canonical particle indices for tilt-series datasets.

For SPA datasets, this simply returns halfset_indices (images and particles are 1-to-1). For tilt-series, it maps each half's image indices through the image→particle mapping and returns the unique canonical (dataset_tilt_indices) particle ids per half.
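For the tilt-series case, the mapping amounts to "look up each image's particle, then deduplicate per half". The sketch below is a hypothetical helper, not this method's implementation: it assumes the image-to-particle mapping is available as a plain integer list (named image_to_particle here, echoing TiltSeriesOriginalIndexMap), and sorting the result is an assumption made for readability.

```python
def particle_halfset_indices(halfset_image_indices, image_to_particle):
    """Map each half's image indices to its unique particle ids (sorted).

    Hypothetical helper: `image_to_particle[i]` is the particle id of image i.
    """
    return [
        sorted({image_to_particle[i] for i in half})
        for half in halfset_image_indices
    ]


# Three particles with 3, 2 and 1 tilt images respectively.
image_to_particle = [0, 0, 1, 1, 1, 2]
halves = [[0, 1, 5], [2, 3, 4]]
print(particle_halfset_indices(halves, image_to_particle))  # [[0, 2], [1]]
```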

split_halfset_array(arr, per_particle=False)

Split a dataset-local-ordered array by halfset membership.

Parameters

  • per_particle (bool): If True and this is a tilt-series dataset, split by particle/group indices instead of image indices.

iter_batches(batch_size, *, halfset_id=None, indices=None, noise_model=None, noise_half=True, noise_by_particle=False, by_image=True, prefetch=True, pack_groups=False)

Iterate over dataset batches, yielding explicit batch fields.

Parameters

  • batch_size (int)
  • halfset_id (int, optional): Halfset index (0 or 1). Mutually exclusive with indices.
  • indices (array-like, optional): Iterate over this subset of image indices.
  • noise_model (optional): Noise model used to populate the yielded noise_variance field.
  • noise_half (bool): Use half-spectrum noise (default True, for mean reconstruction).
  • noise_by_particle (bool): Index noise by particle group (for the covariance path).
  • by_image (bool): True for flat per-image iteration; False for particle-grouped (tilt-series) iteration.
  • prefetch (bool): Enable a one-batch lookahead prefetch buffer (default True).
  • pack_groups (bool): Pack multiple tilt-series particles into each batch, up to batch_size images. Only applies when by_image=False. Padded entries get the sentinel particle_id=-1.

Yields

  tuple (images, rotation_matrices, translations, ctf_params, noise_variance, particle_indices, image_indices)
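To make the pack_groups behaviour concrete, here is one plausible packing rule: greedily append whole particles until the next one would overflow batch_size, then pad with the -1 sentinel. This is a hypothetical sketch, not recovar's packer; it assumes no particle has more than batch_size tilt images and that every batch is padded to exactly batch_size slots.

```python
def pack_particle_batches(group_sizes, batch_size):
    """Greedily pack whole particles (groups of tilt images) into batches.

    Returns a list of batches; each batch lists one particle id per image
    slot, padded to `batch_size` with the sentinel -1.
    """
    batches, current = [], []
    for pid, n in enumerate(group_sizes):
        if n > batch_size:
            raise ValueError(f"particle {pid} has {n} images > batch_size")
        if len(current) + n > batch_size:
            # Next particle does not fit: pad and close the current batch.
            batches.append(current + [-1] * (batch_size - len(current)))
            current = []
        current.extend([pid] * n)
    if current:
        batches.append(current + [-1] * (batch_size - len(current)))
    return batches


print(pack_particle_batches([2, 3, 2, 4], batch_size=5))
# [[0, 0, 1, 1, 1], [2, 2, -1, -1, -1], [3, 3, 3, 3, -1]]
```

Keeping particles whole is what lets a particle-grouped consumer treat every tilt of one particle together; the padding keeps batch shapes fixed.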

get_halfset(halfset_id)

Return a halfset dataset, lazily materializing and caching.

The cache is invalidated automatically when mutable state (contrasts, noise, poses, CTF) changes on this dataset.
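A common way to get this kind of automatic invalidation is a version counter that every mutator bumps, with each cached halfset tagged by the version it was built at. The class below is a hypothetical toy that shows only that pattern; the names HalfsetCacheDemo and _version are illustrative, and this is not how recovar is known to store its cache.

```python
class HalfsetCacheDemo:
    """Toy dataset whose halfset views are cached until state changes."""

    def __init__(self, halfset_indices):
        self.halfset_indices = halfset_indices
        self._version = 0      # bumped by every mutating setter
        self._cache = {}       # halfset_id -> (version, view)
        self.builds = 0        # how many times a view was (re)built

    def set_noise(self, noise_variance):
        self.noise_variance = noise_variance
        self._version += 1     # invalidates every cached halfset

    def get_halfset(self, halfset_id):
        hit = self._cache.get(halfset_id)
        if hit is not None and hit[0] == self._version:
            return hit[1]
        self.builds += 1
        view = list(self.halfset_indices[halfset_id])  # stand-in for a subset view
        self._cache[halfset_id] = (self._version, view)
        return view


ds = HalfsetCacheDemo([[0, 2], [1, 3]])
ds.get_halfset(0)
ds.get_halfset(0)          # served from the cache
ds.set_noise(1.0)
ds.get_halfset(0)          # rebuilt after the mutation
print(ds.builds)  # 2
```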

set_contrasts(contrasts)

Multiply per-image CTF contrast column by contrasts.

contrasts must be in this dataset's ordering (original ordering for a full dataset, or local ordering for a subset). For tilt-series with per-particle contrasts (len < n_images), each particle's tilt images share a single contrast value.

set_noise(noise_variance)

Set the radial noise model for this dataset.

If the dataset already has a VariableRadialNoiseModel, updates it; otherwise sets a RadialNoiseModel.

load_dataset(particles_file, poses_file=None, ctf_file=None, datadir=None, n_images=None, ind=None, lazy=True, padding=0, uninvert_data=False, tilt_series=False, tilt_series_ctf=None, dose_per_tilt=2.9, angle_per_tilt=3, premultiplied_ctf=False, strip_prefix=None, sort_with_Bfac=False, downsample_D=None)

Load a cryo-EM / cryo-ET dataset.

Poses and CTF can come from:

  • pickle files (legacy cryoDRGN format) via poses_file / ctf_file, or
  • the particles STAR or CS file itself, auto-extracted when poses_file / ctf_file are None.

reorder_to_original_indexing(arr, ds, use_tilt_indices=False)

Reorder a halfset-concatenated array back to original file ordering.

For SPA (use_tilt_indices=False), uses ds.halfset_indices (image-level). For tilt-series (use_tilt_indices=True), uses the canonical particle indices derived from each half's images so that per-particle data is scattered to its original particle position.
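In the SPA case this is a scatter: row k of the halfset-concatenated array belongs at position k of the concatenated halfset index list. A minimal sketch, assuming the two halfsets together cover every position exactly once (the helper name is illustrative); the tilt-series variant would scatter with particle ids in place of image ids.

```python
def reorder_to_original(arr, halfset_indices):
    """Scatter a [half-0 rows, half-1 rows] sequence back to original order."""
    order = list(halfset_indices[0]) + list(halfset_indices[1])
    if sorted(order) != list(range(len(arr))):
        raise ValueError("halfsets must cover 0..N-1 exactly once")
    out = [None] * len(arr)
    for row, original_idx in zip(arr, order):
        out[original_idx] = row
    return out


halfsets = [[3, 0], [1, 2]]
concat = ["img3", "img0", "img1", "img2"]   # half-0 rows, then half-1 rows
print(reorder_to_original(concat, halfsets))  # ['img0', 'img1', 'img2', 'img3']
```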

reorder_to_dataset_indexing(arr, ds, use_tilt_indices=False)

Reorder a halfset-concatenated array back to this dataset's local ordering.

subsample_cryoem_dataset(cryo, good_indices)

Return a new CryoEMDataset containing only the images at good_indices.

image_backends

Low-level Grain-backed image backends.

recovar.data_io.image_backends

File-backed image backends used underneath recovar.data_io.image_sources.

This module owns the low-level Grain-backed loaders for:

  • single-particle image stacks
  • cryo-ET tilt-series grouped by particle

It does not own metadata, halfset policy, or the top-level dataset view. Those live in image_metadata.py, halfsets.py, and cryoem_dataset.py respectively.

ParticleImageDataset(image_file, lazy=True, ind=None, invert_data=False, datadir='', padding=0, max_threads=16, strip_prefix=None, downsample_D=None, device=None, **kwargs)

Dataset for cryo-EM particle images.

Implements __getitem__ and __len__, the protocol expected by both grain.RandomAccessDataSource and the downstream loaders.

process_images_half(images, apply_image_mask=False)

Return half-spectrum images using the legacy full-FFT path.

The old pipeline applied process_images first and then converted the full FFT layout to half-spectrum storage. A direct rfft is mathematically equivalent but not numerically identical, and that floating-point drift is enough to change downstream PCA / outlier regression results.
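The underlying identity can be checked with NumPy: the rfft2 half spectrum equals the first D // 2 + 1 columns of the full fft2, up to floating-point rounding. This sketch only illustrates that identity on a random float32 image; it does not reproduce recovar's windowing or its half-spectrum storage layout.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64
img = rng.standard_normal((D, D)).astype(np.float32)

full = np.fft.fft2(img)                  # full complex spectrum, shape (D, D)
half_from_full = full[:, : D // 2 + 1]   # legacy route: slice the full FFT
half_direct = np.fft.rfft2(img)          # direct route, shape (D, D // 2 + 1)

print(half_direct.shape)                                    # (64, 33)
print(np.allclose(half_from_full, half_direct, atol=1e-3))  # True
```

Agreement to tolerance is all the identity guarantees; bit-for-bit equality is not promised, which is why a pipeline with fixed regression baselines may prefer one route consistently.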

TiltSeriesDataset(starfile_path, lazy=True, num_tilts=None, random_tilts=False, ind=None, voltage=None, dose_per_tilt=None, angle_per_tilt=None, expected_res=None, tilt_file_option='relion5', **kwargs)

Bases: ParticleImageDataset

Dataset for tilt series with automatic particle grouping.

image_sources

Raw image loading abstraction and subset/image-group remapping.

recovar.data_io.image_sources

Image-source layer for cryo-EM / cryo-ET datasets.

This module cleanly separates image loading from metadata storage and from the top-level dataset/view object. It provides:

  • backend sources that load images from files, lazily or eagerly
  • subset views that remap image/group indices without leaking that logic into the dataset class

ImageSource

Abstract image-source interface used by the dataset layer.

already_prefetches property

Whether batch iteration already performs background prefetching.

BackendImageSource(backend, *, info)

Bases: ImageSource

Image source backed by the low-level file/image backends.

SubsetImageSource(parent, image_indices)

Bases: ImageSource

Image-source view over a subset of images.
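The useful property of a subset view is that a view of a view still resolves to indices in the root source, so no remapping logic has to live in the dataset class. A toy sketch of that composition; ListSource and SubsetView are hypothetical names, not the ImageSource API.

```python
class ListSource:
    """Toy root source: images are just items in a list."""

    def __init__(self, images):
        self.images = list(images)

    def __len__(self):
        return len(self.images)

    def get(self, indices):
        return [self.images[i] for i in indices]


class SubsetView:
    """View over `parent` exposing only `image_indices` (local -> parent)."""

    def __init__(self, parent, image_indices):
        self.parent = parent
        self.image_indices = list(image_indices)

    def __len__(self):
        return len(self.image_indices)

    def get(self, indices):
        # Translate local indices to parent indices, then delegate upward.
        return self.parent.get([self.image_indices[i] for i in indices])


root = ListSource(["a", "b", "c", "d", "e"])
view = SubsetView(root, [4, 2, 0])   # local 0, 1, 2 -> root 4, 2, 0
nested = SubsetView(view, [2, 0])    # local 0, 1 -> view 2, 0 -> root 0, 4
print(nested.get([0, 1]))  # ['a', 'e']
```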

image_metadata

Typed metadata container for poses and CTF rows.

recovar.data_io.image_metadata

Per-image metadata storage for poses and CTF parameters.

ImageMetadata(rotation_matrices, translations, ctf_params, *, rotation_dtype=np.float32, ctf_dtype=np.float32, real_dtype=np.float32)

Per-image metadata store.

This class owns only metadata arrays. It has no loading, iteration, subset-view, or halfset logic.

halfsets

Halfset and split logic for SPA and cryo-ET.

recovar.data_io.halfsets

Half-set splitting logic for cryo-EM reconstruction.

Provides functions for splitting a dataset into two independent half-sets used for FSC-based resolution estimation. Supports random splits, RELION _rlnRandomSubset, explicit halfset files, and tilt-series-aware particle-level splitting.

HalfsetDatasetSpec(particles_file, poses_file=None, ctf_file=None, datadir=None, uninvert_data=False, padding=0, n_images=None, tilt_series=False, tilt_series_ctf=None, angle_per_tilt=None, dose_per_tilt=None, premultiplied_ctf=False, strip_prefix=None, downsample_D=None) dataclass

Normalized file/loader settings for constructing a halfset dataset.

split_index_list(all_valid_image_indices, split_random_seed=0)

Split a list of indices into two balanced halves with reproducible randomization.

Parameters:

  • all_valid_image_indices: Array of indices to split. Required.
  • split_random_seed: Random seed for reproducible splits. Default: 0.

Returns:

  List of two numpy arrays containing the split indices.
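A minimal standard-library sketch of a seeded, balanced split: shuffle a copy with random.Random(seed) and cut at the midpoint, so the halves differ in size by at most one. It returns sorted lists for readability rather than numpy arrays, and it will not reproduce recovar's exact permutation, which may use a different RNG.

```python
import random


def split_index_list_sketch(indices, split_random_seed=0):
    """Shuffle a copy of `indices` with a fixed seed and cut it in half."""
    shuffled = list(indices)
    random.Random(split_random_seed).shuffle(shuffled)
    mid = (len(shuffled) + 1) // 2          # first half gets any extra element
    return [sorted(shuffled[:mid]), sorted(shuffled[mid:])]


a, b = split_index_list_sketch(range(11), split_random_seed=0)
print(len(a), len(b))  # 6 5
```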

get_split_indices(particles_file, datadir=None, strip_prefix=None, ind_file=None, split_random_seed=0, validate_split=True, n_images=None)

Get indices for splitting dataset into halfsets.

Parameters:

  • particles_file: Path to particles STAR file. Required.
  • datadir: Data directory (optional). Default: None.
  • strip_prefix: Prefix to strip from file paths (optional). Default: None.
  • ind_file: File containing specific indices to use (optional). Default: None.
  • split_random_seed: Random seed for reproducible splits. Default: 0.
  • validate_split: Whether to validate that the split is balanced. Default: True.
  • n_images: Pre-computed image count (avoids re-reading the file). Default: None.

Returns:

  List of two numpy arrays containing indices for each halfset.

get_split_tilt_indices(particles_file, ind_file=None, tilt_ind_file=None, ntilts=None, datadir=None, particle_halfset_indices_file=None)

Split a tilt-series dataset into two halfsets (image indices).

Supports optional filtering by image/particle indices and precomputed splits.
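The key constraint for tilt-series data is that all tilt images of a particle land in the same half; otherwise the two half-maps are not independent. A sketch of particle-level splitting followed by expansion to image indices, assuming a particle-to-images mapping like TiltSeriesOriginalIndexMap.particle_to_images passed in as a plain list of lists (the function name is hypothetical).

```python
import random


def split_tilt_images(particle_to_images, seed=0):
    """Split particles 50/50, then expand each half to its image indices."""
    particles = list(range(len(particle_to_images)))
    random.Random(seed).shuffle(particles)
    mid = (len(particles) + 1) // 2
    return [
        sorted(img for p in half for img in particle_to_images[p])
        for half in (particles[:mid], particles[mid:])
    ]


particle_to_images = [[0, 1, 2], [3, 4], [5, 6, 7], [8]]
h0, h1 = split_tilt_images(particle_to_images)
print(sorted(h0 + h1) == list(range(9)))  # True: every image used exactly once
```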

load_halfset_dataset(spec, *, ind_split, lazy=False)

Load one dataset view and attach halfset-local indices for iteration.

resolve_halfset_indices(args)

Determine which images belong to each reconstruction half-set.

Priority order
  1. Explicit halfsets file (--halfsets).
  2. _rlnRandomSubset column in the STAR file (RELION convention).
  3. Random 50/50 split of all valid images.
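The priority order is a simple fall-through. In this sketch the inputs (an explicit halfsets pair and a list of _rlnRandomSubset values) are hypothetical stand-ins for what the real function reads from args and the STAR file, and step 3 splits all images rather than only the valid ones. RELION labels its two random subsets 1 and 2.

```python
import random


def resolve_halfsets_sketch(n_images, halfsets=None, random_subset=None, seed=0):
    """Return [half0, half1] image indices using the documented priority."""
    # 1. An explicit halfsets selection wins.
    if halfsets is not None:
        return [list(halfsets[0]), list(halfsets[1])]
    # 2. RELION's _rlnRandomSubset column (values 1 and 2).
    if random_subset is not None:
        return [
            [i for i, s in enumerate(random_subset) if s == 1],
            [i for i, s in enumerate(random_subset) if s == 2],
        ]
    # 3. Seeded random 50/50 split.
    order = list(range(n_images))
    random.Random(seed).shuffle(order)
    mid = (n_images + 1) // 2
    return [sorted(order[:mid]), sorted(order[mid:])]


print(resolve_halfsets_sketch(4, random_subset=[1, 2, 2, 1]))  # [[0, 3], [1, 2]]
```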

load_halfset_dataset_from_args(args, lazy=False, ind_split=None)

Resolve halfsets from args and load the shared dataset view.

_index_utils

Canonical local/original image, group, and particle index mapping helpers.

recovar.data_io._index_utils

Explicit index-domain helpers for cryo-EM / cryo-ET datasets.

This module centralizes the translation between:

  • local image indices inside a loaded dataset view
  • original image indices in the source file
  • local group indices inside a loaded dataset view
  • original group indices in the source file

For SPA datasets, image and group domains are identical. For grouped datasets such as cryo-ET tilt series, a group corresponds to one particle / tilt series and expands to one or more local images.

DatasetIndexLayout(original_image_indices, grouped=False, original_group_indices=None, group_local_image_indices=None) dataclass

Index mapping for one dataset view.

Parameters

  • original_image_indices: For each local image index, the source-file image index.
  • grouped: False for SPA, where image and group domains are the same; True for grouped datasets such as tilt-series data.
  • original_group_indices: For each local group index, the source-file group index.
  • group_local_image_indices: Only used when grouped=True. Each entry lists the local images belonging to one local group.

Notes

Original image/group ids may repeat in SPA subsets created from duplicate selections. Reverse lookup therefore uses explicit "last-write-wins" semantics, matching the previous subset remap behavior.
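A small sketch of the image-level half of such a layout, showing last-write-wins reverse lookup when an original id is selected twice. ImageLayoutSketch is a hypothetical stand-in, not the real DatasetIndexLayout dataclass, and it ignores the group domain entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ImageLayoutSketch:
    """local image index -> original image index, plus a reverse lookup."""

    original_image_indices: list
    _reverse: dict = field(init=False, repr=False)

    def __post_init__(self):
        self._reverse = {}
        for local, original in enumerate(self.original_image_indices):
            self._reverse[original] = local   # later duplicates overwrite earlier ones

    def to_original(self, local):
        return self.original_image_indices[local]

    def to_local(self, original):
        return self._reverse[original]


layout = ImageLayoutSketch([10, 7, 10, 3])   # original id 10 selected twice
print(layout.to_original(2))  # 10
print(layout.to_local(10))    # 2 (last write wins)
```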

TiltSeriesOriginalIndexMap(particle_to_images, image_to_particle, tilt_numbers=None)

Original-file particle/image mapping used by cryo-ET selection logic.

normalize_indices(values, n_total, *, name='indices', allow_none=False)

Normalize int/bool indices to an int32 array with bounds checking.

load_index_like(value)

Return an in-memory index selection from an array-like or pickle path.

normalize_image_indices(values, *, n_total=None, name='indices')

Normalize image indices, optionally without a known dataset size.

When n_total is known, this is strict bounds-checked normalization. When it is unknown, the function still validates rank, dtype, and non-negativity, but cannot reject out-of-range values.
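A NumPy sketch of normalizing integer or boolean indices into a bounds-checked int32 array, in the spirit of normalize_indices. The exact exception types and messages recovar raises are not documented here, so the ValueError / IndexError choices below are assumptions.

```python
import numpy as np


def normalize_indices_sketch(values, n_total, *, name="indices", allow_none=False):
    """Return a bounds-checked int32 index array from ints or a boolean mask."""
    if values is None:
        if allow_none:
            return None
        raise ValueError(f"{name} must not be None")
    arr = np.asarray(values)
    if arr.ndim != 1:
        raise ValueError(f"{name} must be 1-D, got shape {arr.shape}")
    if arr.size == 0:
        return np.empty(0, dtype=np.int32)
    if arr.dtype == bool:
        if arr.size != n_total:
            raise ValueError(f"boolean {name} must have length {n_total}")
        arr = np.flatnonzero(arr)          # mask -> positions of True entries
    elif not np.issubdtype(arr.dtype, np.integer):
        raise ValueError(f"{name} must be integer or boolean, got {arr.dtype}")
    if arr.size and (arr.min() < 0 or arr.max() >= n_total):
        raise IndexError(f"{name} out of range [0, {n_total})")
    return arr.astype(np.int32)


print(normalize_indices_sketch([True, False, True], 3))  # [0 2]
```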

deduplicate_preserve_order(values, *, name='indices')

Drop duplicate values while keeping the first occurrence order.

filter_preserve_order(values, allowed)

Return the subset of values that appears in allowed, keeping order.

metadata_readers

Extract poses and CTF parameters from RELION .star and cryoSPARC .cs files.

recovar.data_io.metadata_readers

Extract poses and CTF parameters directly from RELION .star and cryoSPARC .cs files.

This module eliminates the need for cryoDRGN preprocessing. The output formats match exactly what load_utils.load_ctf_params and load_utils.load_poses return, so downstream code (load_cryodrgn_dataset) can consume either source interchangeably.

parse_poses_from_star(star_path, D)

Extract rotation matrices and translations from a RELION .star file.

Parameters:

Name Type Description Default
star_path str

Path to .star file.

required
D int

Target image dimension in pixels (used for translation normalisation).

required

Returns:

Name Type Description
rotations ndarray

(N, 3, 3) rotation matrices (float64).

translations ndarray

(N, 2) translations in fractional units (|val| <= 1). Multiply by D to obtain pixel offsets.

parse_ctf_from_star(star_path, D)

Extract CTF parameters from a RELION .star file.

Parameters:

  • star_path (str): Path to .star file. Required.
  • D (int): Target image dimension in pixels. Pixel size is adjusted for the ratio original_D / D. Required.

Returns:

  ndarray: (N, 8) array with columns [Apix, DFU, DFV, DFANG, VOLT, CS, W, PHASE_SHIFT]. This matches the output format of load_utils.load_ctf_params.

parse_poses_from_cs(cs_path, D)

Extract rotation matrices and translations from a cryoSPARC .cs file.

CryoSPARC stores rotations as 3-vector exponential maps (axis-angle / Rodrigues). Translations are in pixels.

Parameters:

  • cs_path (str): Path to .cs file. Required.
  • D (int): Target image dimension in pixels. Required.

Returns:

  • rotations (ndarray): (N, 3, 3) rotation matrices.
  • translations (ndarray): (N, 2) in fractional units.
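Turning an exponential-map (axis-angle) 3-vector into a rotation matrix is Rodrigues' formula, R = I + sin(theta) K + (1 - cos(theta)) K^2, where theta = |v| and K is the skew-symmetric cross-product matrix of the unit axis v / theta. A NumPy sketch follows; whether recovar additionally transposes the result to match its own rotation convention is not stated in this page, so treat the output convention as an assumption.

```python
import numpy as np


def expmap_to_rotation(v):
    """Rodrigues' formula for a batch of axis-angle vectors of shape (N, 3)."""
    v = np.asarray(v, dtype=np.float64).reshape(-1, 3)
    theta = np.linalg.norm(v, axis=1)                     # rotation angles
    axis = np.divide(v, theta[:, None], out=np.zeros_like(v),
                     where=theta[:, None] > 0)            # zero axis when theta == 0
    x, y, z = axis.T
    zeros = np.zeros_like(x)
    K = np.stack([                                        # (N, 3, 3) skew matrices
        np.stack([zeros, -z, y], axis=-1),
        np.stack([z, zeros, -x], axis=-1),
        np.stack([-y, x, zeros], axis=-1),
    ], axis=1)
    s = np.sin(theta)[:, None, None]
    c = np.cos(theta)[:, None, None]
    return np.eye(3) + s * K + (1 - c) * (K @ K)


R = expmap_to_rotation([[0.0, 0.0, np.pi / 2]])[0]   # 90 degrees about z
print(np.round(R @ [1.0, 0.0, 0.0], 6))  # approximately [0, 1, 0]
```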

parse_ctf_from_cs(cs_path, D)

Extract CTF parameters from a cryoSPARC .cs file.

Parameters:

  • cs_path (str): Path to .cs file. Required.
  • D (int): Target image dimension in pixels. Required.

Returns:

  ndarray: (N, 8) array: [Apix, DFU, DFV, DFANG, VOLT, CS, W, PHASE_SHIFT].

can_extract_poses(filepath)

Return True if poses can be auto-extracted from this file type.

auto_parse_poses(filepath, D)

Auto-extract poses from STAR or CS file based on extension.

auto_parse_ctf(filepath, D)

Auto-extract CTF parameters from STAR or CS file based on extension.
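Extension-based dispatch like this can be sketched as a lookup table keyed on the file suffix. The two parser functions below are placeholders standing in for parse_poses_from_star and parse_poses_from_cs, and lower-casing the suffix is an assumption; this page does not say how recovar treats upper-case extensions.

```python
from pathlib import Path


def _star_parser(path, D):
    return ("star", path, D)      # placeholder for parse_poses_from_star


def _cs_parser(path, D):
    return ("cs", path, D)        # placeholder for parse_poses_from_cs


POSE_PARSERS = {".star": _star_parser, ".cs": _cs_parser}


def can_extract_poses_sketch(filepath):
    return Path(filepath).suffix.lower() in POSE_PARSERS


def auto_parse_poses_sketch(filepath, D):
    suffix = Path(filepath).suffix.lower()
    try:
        parser = POSE_PARSERS[suffix]
    except KeyError:
        raise ValueError(f"cannot auto-extract poses from {suffix!r} files") from None
    return parser(filepath, D)


print(can_extract_poses_sketch("run_data.star"))  # True
print(can_extract_poses_sketch("stack.mrcs"))     # False
```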

starfile

RELION .star file reading and writing.

recovar.data_io.starfile

Utilities for reading and writing RELION .star files.

Supports both RELION 3.0 (single data table) and RELION 3.1 (with optics table).

Equivalent to cryodrgn/starfile

StarFile(starfile=None, *, data=None, data_optics=None)

Container for RELION .star file data with convenient access methods.

Attributes:

  • df: Main data table.
  • data_optics: Optics table (None for RELION 3.0).

Initialize from file or data tables.

Parameters:

  • starfile (Optional[str]): Path to .star file (mutually exclusive with data). Default: None.
  • data (Optional[DataFrame]): Main data table (keyword only). Default: None.
  • data_optics (Optional[DataFrame]): Optics table (keyword only). Default: None.

has_optics property

Whether this is RELION 3.1 format with optics table.

relion31 property

Alias for has_optics (compatibility).

apix property

Pixel size (Angstroms/pixel) for each particle.

Tries _rlnImagePixelSize first (RELION 3.1+), then falls back to _rlnDetectorPixelSize * 1e4 / _rlnMagnification (older RELION).
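The fallback is a unit conversion: _rlnDetectorPixelSize is given in micrometres, so multiplying by 1e4 gives Angstroms at the detector, and dividing by the magnification gives Angstroms at the specimen. A sketch over a single row represented as a dict, a simplification of the per-particle DataFrame column the property actually returns.

```python
def apix_from_row(row):
    """Pixel size in Angstroms from one STAR row given as {label: value}."""
    if "_rlnImagePixelSize" in row:
        return float(row["_rlnImagePixelSize"])
    # Older RELION: detector pixel size (micrometres) scaled by magnification.
    return float(row["_rlnDetectorPixelSize"]) * 1e4 / float(row["_rlnMagnification"])


print(apix_from_row({"_rlnImagePixelSize": "1.06"}))                              # 1.06
print(apix_from_row({"_rlnDetectorPixelSize": 5.0, "_rlnMagnification": 50000}))  # 1.0
```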

resolution property

Image size (pixels) for each particle.

Tries _rlnImageSize first (RELION 3.1 optics table). Falls back to reading the MRC header of the first particle stack referenced in _rlnImageName (RELION 3.0 files).

load(filepath) classmethod

Load from .star file (convenience method).

save(filepath)

Save to .star file.

write(filepath)

Alias for save().

__len__()

Number of particles in main data table.

__eq__(other)

Check equality with another StarFile.

get_optics_values(field, dtype=None)

Get per-particle values for a field, consulting optics table if available.

Parameters:

  • field (str): Field name to retrieve. Required.
  • dtype (Optional[dtype]): Optional dtype to cast values to. Default: None.

Returns:

  Optional[ndarray]: Array of values (one per particle), or None if the field is not found.
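In RELION 3.1 files each particle row carries _rlnOpticsGroup, and optics-level fields live in the optics table keyed by that group. A pure-Python sketch of the per-particle lookup, with rows as dicts instead of DataFrames; checking the main table before the optics table is an assumption about the lookup order.

```python
def get_optics_values_sketch(particles, optics, field):
    """Per-particle values for `field`, falling back to the optics table."""
    if particles and field in particles[0]:
        return [p[field] for p in particles]
    if not optics or field not in optics[0]:
        return None
    by_group = {str(o["_rlnOpticsGroup"]): o[field] for o in optics}
    return [by_group[str(p["_rlnOpticsGroup"])] for p in particles]


particles = [{"_rlnOpticsGroup": 1}, {"_rlnOpticsGroup": 2}, {"_rlnOpticsGroup": 1}]
optics = [
    {"_rlnOpticsGroup": 1, "_rlnVoltage": 300.0},
    {"_rlnOpticsGroup": 2, "_rlnVoltage": 200.0},
]
print(get_optics_values_sketch(particles, optics, "_rlnVoltage"))  # [300.0, 200.0, 300.0]
```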

set_optics_values(field, values)

Set per-particle values for a field in appropriate table.

Parameters:

Name Type Description Default
field str

Field name to set

required
values Union[float, List, ndarray]

Single value or array of values

required

flatten_to_relion30()

Convert to RELION 3.0 format by flattening optics into main table.

Returns:

Type Description
DataFrame

DataFrame with all optics fields merged into main table

to_relion30()

Alias for flatten_to_relion30 (compatibility).

read_star(filepath)

Parse a RELION .star file into main data and optional optics tables.

Results are cached by normalised absolute path + file mtime so that repeated calls for the same unchanged file (e.g. once for halfsets, once per halfset for CTF/poses/image loading) incur only one disk read, while any write to the file automatically triggers a re-parse.
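A standard-library sketch of the cache-key idea: key on (absolute path, modification time), so any write that changes the mtime forces a re-parse. Reading lines stands in for real STAR parsing. Stale entries are never evicted here, and mtime resolution is filesystem-dependent, so two writes within one timestamp tick could in principle reuse an entry.

```python
import os
import tempfile

_CACHE = {}      # (absolute path, mtime in ns) -> parsed result
PARSE_CALLS = 0  # counts real disk reads, for demonstration


def read_cached(filepath):
    """Parse `filepath` at most once per (absolute path, mtime) pair."""
    global PARSE_CALLS
    path = os.path.abspath(filepath)
    key = (path, os.stat(path).st_mtime_ns)
    if key not in _CACHE:
        PARSE_CALLS += 1
        with open(path) as fh:
            _CACHE[key] = fh.read().splitlines()   # stand-in for STAR parsing
    return _CACHE[key]


with tempfile.TemporaryDirectory() as tmp:
    star = os.path.join(tmp, "particles.star")
    with open(star, "w") as fh:
        fh.write("data_particles\n")
    read_cached(star)
    read_cached(star)                              # served from the cache
    with open(star, "a") as fh:
        fh.write("loop_\n")
    st = os.stat(star)
    os.utime(star, ns=(st.st_atime_ns, st.st_mtime_ns + 10**9))  # force a new mtime
    lines = read_cached(star)                      # re-parsed after the write
print(PARSE_CALLS, lines)  # 2 ['data_particles', 'loop_']
```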

Parameters:

  • filepath (str): Path to .star file. Required.

Returns:

  Tuple[DataFrame, Optional[DataFrame]]: (main_data, optics_data), where optics_data is None for RELION 3.0.

write_star(filepath, data, data_optics=None)

Write data to a RELION .star file.

Parameters:

  • filepath (str): Output file path. Required.
  • data (DataFrame): Main data table. Required.
  • data_optics (Optional[DataFrame]): Optional optics table (for RELION 3.1 format). Default: None.

load_utils

CTF and pose loading utilities (legacy pickle format).

recovar.data_io.load_utils

Utilities for loading CTF parameters and pose information from pickle files. Equivalent to cryodrgn/load

load_ctf_params(D, ctf_params_pkl)

Load and adjust CTF parameters for a given image size.

Parameters:

  • D (int): Target image dimension (must be even). Required.
  • ctf_params_pkl (str): Path to pickle file containing CTF parameters. Required.

Returns:

  ndarray: CTF parameters array with shape (N, 8), excluding the image size column.
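A NumPy sketch of the adjustment this implies, assuming the cryoDRGN-style 9-column layout [D, Apix, DFU, DFV, DFANG, VOLT, CS, W, PHASE_SHIFT] with the original box size in column 0: the pixel size is rescaled by original_D / D and the size column is dropped. The column layout is an assumption, and reading the pickle itself is omitted.

```python
import numpy as np


def adjust_ctf_params(ctf_params, D):
    """Rescale pixel size to target box size D and drop the size column.

    Assumed input layout: [D, Apix, DFU, DFV, DFANG, VOLT, CS, W, PHASE_SHIFT].
    """
    if D % 2 != 0:
        raise ValueError("D must be even")
    params = np.array(ctf_params, dtype=np.float64)   # copy; input left untouched
    if params.ndim != 2 or params.shape[1] != 9:
        raise ValueError("expected an (N, 9) array")
    original_D = params[0, 0]
    params[:, 1] *= original_D / D                    # Apix grows when downsampling
    return params[:, 1:]                              # (N, 8)


raw = np.array([[256, 1.0, 15000, 14000, 30, 300, 2.7, 0.1, 0]])
out = adjust_ctf_params(raw, D=128)
print(out.shape, out[0, 0])  # (1, 8) 2.0
```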

load_poses(infile, Nimg, D, ind=None)

Load pose information (rotations and translations) from pickle files.

Parameters:

  • infile (Union[str, List[str]]): Path to pickle file(s): a single file containing a (rotations, translations) tuple, a single file containing rotations only, or a list of two files [rotations_file, translations_file]. Required.
  • Nimg (int): Expected number of images. Required.
  • D (int): Image dimension in pixels. Required.
  • ind (Optional[ndarray]): Optional index array to filter poses. Default: None.

Returns:

  Tuple[ndarray, Optional[ndarray], int]: (rotations, translations, D), where rotations is an (Nimg, 3, 3) array of rotation matrices, translations is an (Nimg, 2) array of translations in pixels (or None), and D is the image dimension (passed through).

image_loader

Image loading from MRC/MRCS stacks and HDF5 files.

recovar.data_io.image_loader

Utilities for loading cryo-EM particle images from various file formats.

Supported formats:

  • MRC/MRCS: single or multi-image MRC stacks
  • STAR: RELION star files referencing MRC stacks
  • CS: cryoSPARC particle files
  • TXT: text file listing MRC paths

All loaders share the ImageLoader base class which provides a uniform interface for indexing, batching, and caching.

ImageLoader(num_images, image_size, dtype=np.float32)

Base class for loading particle images.

Provides a uniform interface for indexing (int, slice, array, bool mask), lazy/eager loading, caching, and batched iteration.

n property

Compatibility alias for num_images.

D property

Compatibility alias for image_size.

selection_indices property

Original source-row indices represented by this loader.

from_file(filepath, lazy=True, indices=None, datadir='', max_threads=1, strip_prefix=None) staticmethod

Compatibility alias for load_images().

image_count(filepath, datadir=None, strip_prefix=None) classmethod

Get image count without constructing a full loader.

For MRC/MRCS files, reads only the header (1 kB). For other formats, falls back to full lazy construction.
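Reading only the header suffices because the MRC format stores NX, NY and NZ as the first three 32-bit integers of its fixed 1024-byte header, and for a particle stack NZ is the number of images. A standard-library sketch that assumes a little-endian file; a robust reader would check the machine stamp first.

```python
import os
import struct
import tempfile


def mrc_image_count(path):
    """Return (n_images, ny, nx) from an MRC/MRCS header (little-endian assumed)."""
    with open(path, "rb") as fh:
        header = fh.read(1024)
    if len(header) < 1024:
        raise ValueError("file too short to contain an MRC header")
    nx, ny, nz = struct.unpack("<3i", header[:12])   # header words 1-3
    return nz, ny, nx


# Header-only dummy stack: 10 images of 64 x 64 pixels.
with tempfile.NamedTemporaryFile(suffix=".mrcs", delete=False) as fh:
    fh.write(struct.pack("<3i", 64, 64, 10) + bytes(1012))
print(mrc_image_count(fh.name))  # (10, 64, 64)
os.remove(fh.name)
```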

__getitem__(key)

Get images using indexing syntax.

get(indices=None)

Get images at specified indices.

Parameters:

  • indices: Indices to retrieve (int, slice, array, or None for all). Default: None.

Returns:

  ndarray: Array of shape (N, image_size, image_size).

images(indices=None, require_contiguous=False)

Compatibility alias for get().

iter_batches(batch_size=1000)

Iterate over images in batches.

Yields:

  Tuple[ndarray, ndarray]: (indices, images) tuples.

chunks(chunksize=1000)

Compatibility alias for iter_batches().

load_all()

Load and cache all images in memory.

MRCLoader(filepath, indices=None, lazy=True, skip_staging=False)

Bases: ImageLoader

Load images from a single MRC/MRCS file.

Uses contiguous seek+fromfile for sequential reads and individual seek+fromfile for scattered random access. A lazy np.memmap view is available via _get_memmap() for bulk access patterns.
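Deciding between one contiguous read and several scattered reads starts with grouping the requested indices into runs of consecutive images. A sketch of that grouping and the byte ranges it implies, assuming a float32 (mode 2) stack with no extended header, so image i starts at byte 1024 + i * D * D * 4. A real loader would then reorder the results to match the requested order.

```python
def contiguous_runs(indices):
    """Group sorted, unique indices into (start, count) runs of consecutive values."""
    runs = []
    for i in sorted(set(indices)):
        if runs and i == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((i, 1))                         # start a new run
    return runs


def run_byte_ranges(runs, D, header_bytes=1024, bytes_per_pixel=4):
    """(offset, length) in bytes to seek to and read for each run."""
    image_bytes = D * D * bytes_per_pixel
    return [(header_bytes + start * image_bytes, count * image_bytes)
            for start, count in runs]


runs = contiguous_runs([7, 3, 4, 5, 9])
print(runs)                        # [(3, 3), (7, 1), (9, 1)]
print(run_byte_ranges(runs, D=2))  # [(1072, 48), (1136, 16), (1168, 16)]
```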

Local staging: if RECOVAR_CACHE_DIR (or $TMPDIR) is set, the MRC file is transparently copied to that directory on first access and all subsequent reads go to the fast local copy. See recovar.data_io.staging for details and performance numbers.

close()

Release memory-mapped file resources.

MultiMRCLoader(file_map, indices=None, lazy=True, max_threads=1, raw_paths=None, skip_staging=False)

Bases: ImageLoader

Load images distributed across multiple MRC files.

close()

Release resources for all sub-loaders.

from_txt(filepath, indices=None, lazy=True, max_threads=1, skip_staging=False) staticmethod

Create loader from text file listing MRC paths.

StarLoader(filepath, indices=None, datadir='', lazy=True, max_threads=1, strip_prefix=None, skip_staging=False)

Bases: MultiMRCLoader

Load images from RELION STAR file.

CryoSparcLoader(filepath, indices=None, datadir='', lazy=True, max_threads=1, strip_prefix=None, skip_staging=False)

Bases: MultiMRCLoader

Load images from cryoSPARC CS file.

DownsamplingImageLoader(base_loader, target_D)

Bases: ImageLoader

Wrapper that Fourier-crops images on the fly during loading.
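Fourier cropping keeps only the central target_D x target_D block of the centred spectrum and inverse-transforms it. A NumPy sketch for even sizes; the (target_D / D)^2 factor keeps mean intensity unchanged under NumPy's FFT normalisation, and whether recovar's wrapper rescales intensities the same way is not stated here.

```python
import numpy as np


def fourier_crop(images, target_D):
    """Downsample (N, D, D) real images to (N, target_D, target_D)."""
    images = np.asarray(images, dtype=np.float64)
    D = images.shape[-1]
    if target_D > D or target_D % 2 or D % 2:
        raise ValueError("expected even D >= even target_D")
    F = np.fft.fftshift(np.fft.fft2(images), axes=(-2, -1))   # DC at (D//2, D//2)
    start = D // 2 - target_D // 2
    F = F[..., start:start + target_D, start:start + target_D]  # keep low frequencies
    out = np.fft.ifft2(np.fft.ifftshift(F, axes=(-2, -1))).real
    return out * (target_D / D) ** 2        # preserve mean intensity


imgs = np.ones((2, 64, 64))
small = fourier_crop(imgs, 32)
print(small.shape, np.allclose(small, 1.0))  # (2, 32, 32) True
```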

load_images(filepath, indices=None, datadir='', lazy=True, max_threads=1, strip_prefix=None, skip_staging=False)

Load cryo-EM images from file.

Parameters:

  • filepath (str): Path to data file (.mrcs, .star, .txt, .cs). Required.
  • indices (Optional[ndarray]): Optional subset of image indices to load. Default: None.
  • datadir (str): Base directory for resolving relative paths. Default: ''.
  • lazy (bool): If True, defer loading until access. Default: True.
  • max_threads (int): Number of threads for parallel I/O. Default: 1.
  • strip_prefix (Optional[str]): Prefix to strip from paths in metadata. Default: None.
  • skip_staging (bool): If True, skip local staging (useful for one-shot reads such as downsampling, where staging the full-resolution data is wasteful). Default: False.

Returns:

  ImageLoader instance for the specified file.