# Troubleshooting

## Data loading errors

### "Cannot find MRC file(s)"
```
FileNotFoundError: Cannot find 15 MRC file(s) referenced in the metadata.
First missing: /old/path/to/Micrographs/image.mrcs
```
RECOVAR automatically tries extension swaps (`.mrc` ↔ `.mrcs`) and flat-directory fallbacks before failing. If it still can't find files:

- Diagnose with `recovar check_paths` to see exactly what paths are tried.
- Fix with `--datadir` and/or `--strip-prefix`.

See Fixing broken paths.
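A sketch of the diagnose-then-fix workflow, assuming a STAR input named `particles.star`; `recovar check_paths`, `--datadir`, and `--strip-prefix` come from the text above, while the `recovar pipeline` entry point and the paths are placeholders:

```shell
# 1. Diagnose: print every candidate path RECOVAR tries for the missing files
recovar check_paths particles.star

# 2. Fix: point at the directory that actually holds the stacks,
#    and strip the stale prefix recorded in the metadata
recovar pipeline particles.star -o results/ \
    --datadir /data/my_project/Micrographs \
    --strip-prefix /old/path/to
```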
> **Tip:** The web GUI validates input files when you select them; it checks that paths resolve and shows the particle count before you submit a job.
### "CS file has no alignments3D/pose field"

Your `.cs` file doesn't contain pose information (e.g., it's a passthrough or import file). Use the `*_particles.cs` file from a refinement job instead.
### "Must provide --poses and --ctf for .mrcs input"

When using `.mrcs` files directly, you must provide pose and CTF pickle files via `--poses` and `--ctf`. Use `.star` or `.cs` files instead for automatic extraction.
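A hedged sketch of the `.mrcs` route; `--poses` and `--ctf` are the flags named above, while the `recovar pipeline` entry point and file names are placeholders:

```shell
# Raw particle stack plus externally supplied pose and CTF pickles
recovar pipeline particles.mrcs -o results/ \
    --poses poses.pkl \
    --ctf ctf_params.pkl
```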
## GPU and memory

### "No GPU found"
JAX can't see your GPU. Check which devices JAX reports.
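One way to check is to ask JAX directly which devices it sees, and confirm the driver sees the card at all with `nvidia-smi`:

```shell
# On a working GPU setup this prints a CUDA device; CPU-only prints [CpuDevice(id=0)]
python -c "import jax; print(jax.devices())"

# Confirm the NVIDIA driver and card are visible to the system
nvidia-smi
```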
If only a CPU device shows up, reinstall JAX with CUDA support.
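For example, on a CUDA 12 system (the `jax[cuda12]` extra installs CUDA-enabled wheels; match the extra to your CUDA major version):

```shell
pip install -U "jax[cuda12]"
```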
On clusters, you may need to load CUDA modules first.
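For example, on a typical Environment Modules/Lmod cluster (module names vary by site; `module avail cuda` lists what's installed):

```shell
module load cuda
python -c "import jax; print(jax.devices())"
```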
### Out of GPU memory
Try these in order:
- **Downsample:** `--downsample 128` reduces memory by ~4x vs 256
- **Limit memory:** `--gpu-gb 8` to control allocation
- **Low memory mode:** `--low-memory-option` or `--very-low-memory-option`
- **Lazy loading:** `--lazy` to avoid loading the full dataset into RAM
- **Fewer images:** `--n-images 50000` for initial exploration
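A sketch combining several of these options; the flags are the ones listed above, while the `recovar pipeline` entry point and file names are placeholders:

```shell
recovar pipeline particles.star -o results/ \
    --downsample 128 \
    --gpu-gb 8 \
    --lazy \
    --n-images 50000
```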
### JAX pre-allocation

By default, JAX pre-allocates most GPU memory, which can starve other processes on shared machines.
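A minimal sketch using JAX's own environment knobs (these are standard JAX settings, not RECOVAR flags):

```shell
# Allocate GPU memory on demand instead of pre-allocating most of it up front
export XLA_PYTHON_CLIENT_PREALLOCATE=false

# Alternatively, cap the fraction of GPU memory JAX may claim
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.50
```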
### Custom CUDA extension is unavailable or `nvcc` is missing
RECOVAR prefers its custom CUDA backproject/project extension on GPU because it is substantially faster than the pure JAX fallback. Installing `recovar[gpu]` or `.[gpu]` gives you CUDA-enabled JAX wheels, but a working JAX GPU install alone is not enough to build that extension.
If RECOVAR stops with a custom CUDA build/load error, make sure a local CUDA toolkit/compiler is available through one of these mechanisms:
- `NVCC=/full/path/to/nvcc` or `CUDACXX=/full/path/to/nvcc` set in the environment
- `nvcc` available on `PATH`
- `LOCAL_CUDA_PATH`, `CUDA_HOME`, or `CUDA_PATH` pointing at a toolkit root
Then either rerun RECOVAR so it can auto-build the shared library, or build it explicitly first.
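For example, exporting the variables listed above before rerunning RECOVAR; `/usr/local/cuda` is an assumed location, so adjust it to where your toolkit actually lives:

```shell
# Point the build at your CUDA toolkit root
export CUDA_HOME=/usr/local/cuda
# Tell the build exactly which nvcc to use
export NVCC="$CUDA_HOME/bin/nvcc"
# Also make nvcc discoverable on PATH
export PATH="$CUDA_HOME/bin:$PATH"
```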
If you need to get unblocked temporarily, force the slower JAX GPU path by setting `RECOVAR_DISABLE_CUDA=1`. That workaround is supported, but it is not the preferred configuration.
### `JaxRuntimeError: INTERNAL: Autotuning failed for HLO ... NOT_FOUND: No valid config found!`
This error comes from XLA's GPU autotuner (the part of JAX that picks an optimal CUDA kernel at JIT time) failing because it has no tuning data for your GPU. Symptoms include the message above, sometimes preceded by repeated `Allocator (GPU_X_bfc) ran out of memory` warnings; those are autotuner candidate-kernel probes failing, not real OOM.
This happens on GPUs newer than your JAX version was tuned for, most often Blackwell (`sm_100`, `sm_120`: B100/B200, RTX 50-series, RTX PRO Blackwell) on JAX versions cut before Blackwell support landed.
Workaround: disable autotuning with `--xla_gpu_autotune_level=0` so XLA falls back to its default heuristic kernel selection.
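Concretely, the flag is passed through the `XLA_FLAGS` environment variable:

```shell
# Skip the autotuner entirely; XLA falls back to heuristic kernel selection
export XLA_FLAGS="--xla_gpu_autotune_level=0"
```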
You can stack this with `RECOVAR_DISABLE_CUDA=1` if you also need to avoid recovar's custom kernel.
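Setting both variables before launching RECOVAR:

```shell
# Disable the XLA autotuner and RECOVAR's custom CUDA kernel together
export XLA_FLAGS="--xla_gpu_autotune_level=0"
export RECOVAR_DISABLE_CUDA=1
```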
If `--xla_gpu_autotune_level=0` doesn't help, two more knobs to try:
```shell
export XLA_FLAGS="--xla_gpu_autotune_level=0 --xla_gpu_enable_triton_gemm=false"
# or
export XLA_FLAGS="--xla_gpu_autotune_level=0 --xla_gpu_enable_command_buffer="
```
If none of these work, your GPU is past what your JAX version's XLA can lower at all. The fix is upgrading JAX once a release with tuning data for your hardware is available — there is no recovar-side workaround. This is a JAX/XLA limitation, not a recovar bug.
## Pipeline issues

### Mean looks wrong
- Check your mask isn't inverted or too tight: `--mask-dilate-iter 10`
- Run with `--only-mean` first to quickly verify setup
- Try `--mask=sphere` to rule out mask issues
- Check that poses are from a good consensus refinement
- Use the GUI's slice viewer (`recovar gui`) to inspect the mean volume and mask side by side
### Results differ between runs
This can happen if:
- Half-set splits differ (use `--halfsets` to fix a specific split)
- Image ordering changed (use `--ind` to fix image selection)
- JAX non-determinism on different hardware
### Pipeline is slow
- **Downsample:** `--downsample 128` is the biggest speedup
- **Multi-GPU:** `--multi-gpu` for parallel processing
- **Fewer PCs:** `--zdim=4,10` instead of `1,2,4,10,20`
- **Skip analysis steps:** use `--only-mean` for quick checks
## Analysis issues

### UMAP is slow
For large datasets (>200k particles), UMAP can take a long time. You can run UMAP separately later on a subset of particles.
### Density estimation runtime
Runtime scales as O(N^`pca_dim`), so keep `--pca_dim` at 4 or below.
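For example (the `recovar analyze` subcommand and results path are placeholders; `--pca_dim` is the flag discussed above):

```shell
# pca_dim 4 keeps the O(N^pca_dim) cost tractable; higher values blow up quickly
recovar analyze results/ --pca_dim 4
```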
Check `all_densities.png` and `Lcurve.png` to verify that the optimal regularization was selected.
## Installation issues

### Native fast-marching extension build failure
The in-tree C++ fast-marching extension is optional. Published Linux and macOS wheels include it on supported builds, while source installs compile it locally when a C++ toolchain is available. If that build fails, installation still succeeds and RECOVAR uses the pure-Python fallback.
If you want the native backend, install a working C++ toolchain and then reinstall RECOVAR.
### JAX version conflicts

RECOVAR requires JAX 0.9.0.1. Pin the version so upgrades of other packages don't replace it.
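For example (version taken from the requirement above; add the matching CUDA extra if you are on GPU):

```shell
pip install "jax==0.9.0.1"
```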
### Multiple recovar installations

If you have multiple editable installs, the wrong one may be imported.
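Two quick checks to find out which copy wins (standard Python/pip diagnostics, not RECOVAR-specific commands):

```shell
# Show which installation Python actually imports
python -c "import recovar; print(recovar.__file__)"

# List every installed copy pip knows about
pip list | grep -i recovar
```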