PCA feature compression for organelle attribution #10

Open: gav-sturm wants to merge 35 commits into main from pca_feature_compression

gav-sturm commented Mar 17, 2026

Summary

  • Adds per-channel and pooled (downsampled) PCA variance sweep pipeline (pca_optimization.py) to find the optimal number of PCs per biological signal using copairs mAP scoring
  • Outputs guide/gene-level h5ads at peak PCs with embeddings stored in obsm following scanpy convention (X_pca, X_umap, X_phate)
  • Adds DINO vs CellProfiler mAP comparison script (compare_dino_cp_map.py) with scatterplot + linear regression and discordant gene highlighting
  • Adds hconcat_by_perturbation and multi-level aggregation helpers to anndata_utils.py
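
To make the sweep idea in the first bullet concrete, here is a minimal sketch of scoring several candidate PC counts and keeping the peak. It is not the code in pca_optimization.py: the function names (pca_scores, replicate_retrieval_score, sweep_n_pcs) are hypothetical, and the scoring function is a simple nearest-neighbor label match used as a stand-in for copairs mAP, whose API is not shown in this PR.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def replicate_retrieval_score(Z, labels):
    """Hypothetical stand-in for copairs mAP.

    Returns the fraction of rows whose nearest neighbor by cosine
    similarity carries the same perturbation label.
    """
    labels = np.asarray(labels)
    norms = np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    Zn = Z / norms
    sim = Zn @ Zn.T
    np.fill_diagonal(sim, -np.inf)  # a row must not match itself
    nearest = sim.argmax(axis=1)
    return float(np.mean(labels[nearest] == labels))


def sweep_n_pcs(X, labels, candidates):
    """Score each candidate number of PCs and return the peak plus all scores."""
    scores = {
        k: replicate_retrieval_score(pca_scores(X, k), labels)
        for k in candidates
    }
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Usage on synthetic data: 5 perturbations, 8 replicates each, 20 features.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 8)
X = rng.normal(size=(5, 20))[labels] + 0.1 * rng.normal(size=(40, 20))
best_k, scores = sweep_n_pcs(X, labels, candidates=[2, 5, 10])
```

In the real pipeline the score would come from copairs and the sweep would run once per biological signal; the sketch only shows the select-the-peak loop.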

Dependencies

Requires czbiohub-sf/ops_utils#4 to be merged to main first.

Test plan

  • Run --slurm --downsampled on DINO features, verify per-signal h5ads written to dino/per_signal/
  • Run --aggregate-only --downsampled, verify guide_pca_optimized.h5ad has obsm['X_pca'], obsm['X_umap'], obsm['X_phate']
  • Run python -m ops_utils.validation.embedding_convention guide_pca_optimized.h5ad — all checks pass
  • Run --cell-profiler --slurm --downsampled, verify outputs land in cellprofiler/
  • Run compare_dino_cp_map.py once both feature types complete

🤖 Generated with Claude Code

gav-sturm requested a review from ahillsley on March 17, 2026, 23:31