Outlier Detection
RECOVAR includes tools for detecting junk particles and outliers in your dataset.
Manual outlier detection via k-means
A common approach is to run recovar analyze with k-means clustering, then inspect the PC scatter plots for isolated clusters. The Tutorial demonstrates this on EMPIAR-10076: cluster 0 (1.3% of particles) is visibly separated from the main body and is removed using extract_image_subset_from_kmeans before re-running the pipeline.
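To make the inspection step concrete, the following is a minimal sketch in plain Python of how one might compute each cluster's share of particles and collect the indices that remain after dropping a small, isolated cluster. The `labels` list and both helper functions are hypothetical stand-ins for the k-means assignments. This is not RECOVAR's `extract_image_subset_from_kmeans`, which the Tutorial uses for the actual extraction.

```python
from collections import Counter


def cluster_fractions(labels):
    """Return {cluster_id: fraction of particles assigned to that cluster}."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: count / total for cluster, count in counts.items()}


def indices_excluding(labels, drop_clusters):
    """Return particle indices whose cluster is not in drop_clusters."""
    drop = set(drop_clusters)
    return [i for i, cluster in enumerate(labels) if cluster not in drop]


# Toy assignments (hypothetical): 2 of 10 particles fall in cluster 0.
labels = [1, 2, 0, 1, 2, 2, 1, 0, 1, 2]

fractions = cluster_fractions(labels)
kept = indices_excluding(labels, drop_clusters=[0])

print(fractions[0])  # share of particles in cluster 0
print(kept)          # indices of particles that would be retained
```

A very small fraction alone does not prove a cluster is junk; the separation in the PC scatter plots is what motivates removing it.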
Junk particle detection
Junk particle detection analyzes the pipeline output to identify particles that are likely junk (ice, aggregates, etc.) based on how well they fit the model. Output is organized into plots/ and data/ subdirectories. By default, only the detected indices and a summary are written; pass --save-all-plots for a full dump of diagnostic plots.
Outlier detection
Outlier detection identifies statistical outliers in the dataset. As with junk detection, output is organized into plots/ and data/ subdirectories, and --save-all-plots produces the full set of diagnostic plots.
Pipeline with outliers
For a combined workflow that runs the pipeline and outlier detection together:
Using results
Both commands output indices of detected outliers/junk. You can use these to filter your dataset:
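As a rough illustration of the filtering logic only, the sketch below merges two lists of flagged indices and computes the indices to keep. The function name, the particle count, and the example index values are all hypothetical. How the indices are stored on disk and how the kept subset is passed back to RECOVAR are not shown here, so this should be adapted to the actual output files.

```python
def keep_indices(n_particles, flagged):
    """Return sorted indices in range(n_particles) that are not flagged."""
    flagged_set = set(flagged)
    # Guard against indices that do not refer to a particle in the dataset.
    out_of_range = [i for i in flagged_set if not 0 <= i < n_particles]
    if out_of_range:
        raise ValueError(f"flagged indices out of range: {sorted(out_of_range)}")
    return [i for i in range(n_particles) if i not in flagged_set]


# Hypothetical inputs: indices reported by junk and outlier detection.
junk = [3, 7]
outliers = [7, 9]

# A particle flagged by both detections is simply removed once.
kept = keep_indices(12, junk + outliers)
print(kept)
```

Taking the union of both lists is the conservative choice; using only one list, or the intersection, keeps more particles at the cost of leaving more suspect ones in the dataset.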