feat: add feature vector extraction to classification responses#77
mohamedelabbas1996 wants to merge 23 commits into `main`
Conversation
🔍 Existing Issues For Review — your pull request is modifying functions with the following pre-existing issues:
📄 File: trapdata/api/models/classification.py
```toml
plotly = "^5.21.0"
scikit-learn = "^1.3.0"
```
I think we should make these optional dependencies and just use numpy in the tests, unless we need to use them in the core app.
```toml
[tool.poetry.extras]
dev = ["plotly", "scikit-learn"]
```
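If these become extras, code paths that use them can degrade gracefully when the package is absent. A minimal, generic sketch (the `optional_import` helper is illustrative, not from this repo):

```python
import importlib

def optional_import(name: str):
    """Return the module if it is installed, else None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# Plotting/clustering code can then check for the optional dependency
# and raise a clear "install the dev extra" error instead of crashing.
plotly = optional_import("plotly")
```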
```python
        model.eval()
        return model
```

```python
    def get_features(self, batch_input: torch.Tensor) -> torch.Tensor:
```
Nice work on this method of extracting features! It seems more flexible than our current feature extractor. Perhaps we should add a comment in both feature extractors that the other one exists. And eventually update the old one to use this code.
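The pattern being praised here is: run the backbone only and skip the classification head. A dependency-free sketch using numpy arrays in place of torch tensors (the class and weight names are illustrative, not from the repo):

```python
import numpy as np

class TinyClassifier:
    """Toy classifier: a 'backbone' projection followed by a linear head."""

    def __init__(self, in_dim=8, feature_dim=4, num_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w_backbone = rng.standard_normal((in_dim, feature_dim))
        self.w_head = rng.standard_normal((feature_dim, num_classes))

    def forward_features(self, x: np.ndarray) -> np.ndarray:
        # Backbone output: the embedding used as the "feature vector".
        return x @ self.w_backbone

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Full forward pass: features, then classification head -> logits.
        return self.forward_features(x) @ self.w_head

    def get_features(self, batch_input: np.ndarray) -> np.ndarray:
        # Same entry point as the PR's method, stopping before the head.
        return self.forward_features(batch_input)

model = TinyClassifier()
batch = np.ones((2, 8))
features = model.get_features(batch)  # shape (2, 4)
logits = model.forward(batch)         # shape (2, 3)
```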
**Plan: Bringing this branch up to date with main**

This branch is 30 commits behind main and has merge conflicts in 3 files. Main has since refactored the classification code significantly. Merge main into this branch and resolve the conflicts, adapting the feature extraction additions to work with main's refactored code.

Conflicts to resolve:

1. Main refactored the classification code. Resolution: adapt the feature extraction to main's refactored structure.
2. PSv2 worker / Antenna integration: the PSv2 worker serializes classification results for Antenna. Feature vectors are 2048 floats per classification. Models that don't implement `get_features()` fall back to returning `None` from the base class.

Files to Modify

Verification
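On the serialization path: 2048 floats per classification is a nontrivial JSON payload. A stdlib-only size estimate (the numbers depend on float precision and are only indicative):

```python
import json
import random

random.seed(0)
features = [random.random() for _ in range(2048)]

# Plain JSON list of floats, as a list[float] schema field would emit.
raw = json.dumps(features)

# Rounding to 6 decimal places shrinks the payload considerably while
# keeping plenty of precision for clustering or similarity search.
compact = json.dumps([round(f, 6) for f in features])

sizes_kib = (len(raw) // 1024, len(compact) // 1024)
```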
…tion foundation Merge main into feat/add-classification-features-to-response. Conflicts in pyproject.toml, poetry.lock, and api/models/classification.py resolved by taking main's version. Mohamed's get_features() and features schema field came through auto-merge and will be refined in subsequent commits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds include_features and include_logits flags to PipelineConfigRequest (API) and Settings (worker). Adds features field to ClassificationResponse. Makes logits field conditional (default None). Both default to off for backward compatibility and reduced response size.
APIMothClassifier now accepts include_features and include_logits flags. When enabled, predict_batch() extracts features via get_features() and post_process_batch() conditionally includes logits. Both flow through ClassifierResult → update_detection_classification() → ClassificationResponse. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
API endpoint passes both flags from PipelineConfigRequest to classifier. Worker passes both from Settings (AMI_INCLUDE_FEATURES, AMI_INCLUDE_LOGITS env vars) to classifier constructor. No changes needed to _process_batch() since the predict_batch()/post_process_batch() overrides handle the flow.
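The env-var plumbing described here reduces to a small boolean parser; this sketch uses a plain dict in place of the real pydantic `Settings` class (everything except the `AMI_*` variable names is illustrative):

```python
import os

def read_flag(name: str, env=None) -> bool:
    """Parse a boolean env var the way typical settings loaders do."""
    env = os.environ if env is None else env
    return str(env.get(name, "")).strip().lower() in {"1", "true", "yes", "on"}

fake_env = {"AMI_INCLUDE_FEATURES": "true", "AMI_INCLUDE_LOGITS": "0"}
include_features = read_flag("AMI_INCLUDE_FEATURES", fake_env)  # True
include_logits = read_flag("AMI_INCLUDE_LOGITS", fake_env)      # False
```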
Tests that features are 2048-dim when enabled, logits present when enabled, both absent when disabled (default), and both present when both flags set. Replaces Mohamed's original tests with opt-in config pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 Walkthrough

Adds opt-in extraction and inclusion of model feature vectors and raw logits in classification responses. Flags flow from Settings → API request config → worker → classifier; a model-level feature hook is implemented; response schemas and tests are updated accordingly.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API
    participant Worker as Worker
    participant Classifier as APIMothClassifier
    participant Model as Resnet50TimmClassifier
    participant Response as Response
    Client->>API: POST /process (PipelineRequest incl. include_features, include_logits)
    API->>Worker: enqueue/process job (with config flags)
    Worker->>Classifier: instantiate (include_features, include_logits)
    Worker->>Classifier: predict_batch(batch)
    Classifier->>Model: forward(batch) -> logits
    alt include_features
        Classifier->>Model: get_features(batch)
        Model-->>Classifier: feature tensor (e.g., 2048-dim)
    end
    Classifier->>Classifier: post_process_batch(logits)
    Classifier-->>Response: ClassificationResult(logits?, features?)
    Response-->>Client: PipelineResponse
```
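The sequence above boils down to: run the forward pass, optionally run the feature hook, then attach only the requested fields to each result. A self-contained sketch with plain Python lists standing in for tensors (class and field names are illustrative):

```python
class SketchClassifier:
    """Mimics the opt-in flow: constructor flags decide what results carry."""

    def __init__(self, include_features=False, include_logits=False):
        self.include_features = include_features
        self.include_logits = include_logits

    def forward(self, batch):
        # Fake logits: one fixed vector per input.
        return [[0.1, 0.9] for _ in batch]

    def get_features(self, batch):
        # Fake 4-dim embeddings standing in for the 2048-dim ones.
        return [[float(i)] * 4 for i, _ in enumerate(batch)]

    def predict_batch(self, batch):
        logits = self.forward(batch)
        features = self.get_features(batch) if self.include_features else None
        results = []
        for i, item_logits in enumerate(logits):
            results.append({
                # Both fields default to None, keeping responses small.
                "logits": item_logits if self.include_logits else None,
                "features": features[i] if features is not None else None,
            })
        return results

default = SketchClassifier().predict_batch(["img1", "img2"])
full = SketchClassifier(include_features=True, include_logits=True).predict_batch(["img1"])
```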
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues
Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Resolve conflicts in worker.py by taking origin/main's version (from PR #122) and re-applying our include_features/include_logits classifier constructor change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 2
🧹 Nitpick comments (1)
trapdata/settings.py (1)
46-48: Consider adding documentation entries for new settings.

The new `include_features` and `include_logits` settings work correctly but lack entries in the `fields` dict (lines 73-183) that other settings use for Kivy UI integration and documentation. This is optional since these are likely only used in worker/API contexts, not the GUI.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@trapdata/settings.py` around lines 46 - 48, Add documentation/UI metadata entries for the two new booleans by adding keys "include_features" and "include_logits" to the fields dict so Kivy and docs pick them up; mirror the format used by other boolean settings in the same fields dict (provide a human-friendly label, a short description, type/validator as boolean, and default value) so the Settings/include_features and Settings/include_logits options appear in the UI/docs like the other settings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@trapdata/api/models/classification.py`:
- Around line 73-79: predict_batch currently returns (logits, features) which
breaks InferenceBaseClass.run() timing (it expects a tensor and uses
len(batch_output) as batch size); change predict_batch in classification.py (the
predict_batch method that calls self.model and self.get_features) to return only
the logits tensor and move feature extraction to a separate method or populate
self.last_features (or leave get_features unused here) so callers that need
features can call get_features(batch_input) explicitly; ensure no callers expect
a tuple from predict_batch and that APIMothClassifier.run() /
InferenceBaseClass.run() continue to receive a tensor for timing.
In `@trapdata/api/tests/test_features_extraction.py`:
- Around line 50-53: Replace the bare assertion of the response status with a
unittest assertion: in the test block that uses self.file_server and calls
self.client.post("/process", json=pipeline_request.model_dump()) (the block that
then constructs PipelineResponse(**response.json())), change the bare "assert
response.status_code == 200" to use self.assertEqual(response.status_code, 200)
so the test uses unittest's assertion style and yields better failure messages.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4d622b1d-d5ec-46e8-8540-e7f09551e702
📒 Files selected for processing (9)
* trapdata/antenna/worker.py
* trapdata/api/api.py
* trapdata/api/models/classification.py
* trapdata/api/schemas.py
* trapdata/api/tests/test_features_extraction.py
* trapdata/common/constants.py
* trapdata/ml/models/base.py
* trapdata/ml/models/classification.py
* trapdata/settings.py
💤 Files with no reviewable changes (1)
- trapdata/common/constants.py
- predict_batch() now stores features in self._last_features instead of returning a tuple, preserving compatibility with base class run() which uses len(batch_output) for timing calculation
- Existing tests that assert logits are present now pass include_logits=True since logits are opt-in (default off)
- Use self.assertEqual for status code assertion in test helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
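The fix described in this commit — keeping `predict_batch()`'s return type a single tensor and stashing features on the instance — can be illustrated without torch (list-of-lists stand in for tensors; names other than `_last_features` are illustrative):

```python
class BaseInference:
    """Base class whose run() relies on len(batch_output) for timing."""

    def run(self, batch):
        batch_output = self.predict_batch(batch)
        items_processed = len(batch_output)  # would break on a tuple
        return batch_output, items_processed

class FeatureAwareClassifier(BaseInference):
    def __init__(self, include_features=False):
        self.include_features = include_features
        self._last_features = None

    def predict_batch(self, batch):
        logits = [[0.2, 0.8] for _ in batch]
        if self.include_features:
            # Side-channel: store features instead of widening the return type.
            self._last_features = [[1.0, 2.0] for _ in batch]
        return logits  # still a plain "tensor": len() is the batch size

    def post_process_batch(self, batch_output):
        # Read then clear, so features don't linger between batches.
        features, self._last_features = self._last_features, None
        return [
            {"logits": out, "features": features[i] if features else None}
            for i, out in enumerate(batch_output)
        ]

clf = FeatureAwareClassifier(include_features=True)
output, n = clf.run(["a", "b"])
results = clf.post_process_batch(output)
```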
- Worker path test verifies features flow through predict_batch/post_process_batch
- Validity test checks features are non-zero, have variance, and differ between detections (not just checking existence and dimension)
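Validity checks like these (non-zero, variance, per-detection differences) are straightforward to express with numpy; a sketch that assumes nothing about the repo's actual test fixtures:

```python
import numpy as np

def check_feature_validity(features: np.ndarray) -> None:
    """features: (n_detections, feature_dim) array of extracted embeddings."""
    assert not np.allclose(features, 0.0), "features are all zero"
    assert features.std() > 0.0, "features have no variance"
    # Any two detections should not produce identical embeddings.
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            assert not np.allclose(features[i], features[j]), (
                f"detections {i} and {j} have identical features"
            )

rng = np.random.default_rng(42)
fake_features = rng.standard_normal((3, 2048))
check_feature_validity(fake_features)
```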
🧹 Nitpick comments (1)
docs/superpowers/plans/2026-03-25-feature-vector-extraction.md (1)
675-678: Consider adding a language specifier for consistency.

The environment variable example block (lines 676-678) lacks a language specifier. Adding one would improve syntax highlighting and satisfy the markdownlint rule.
♻️ Suggested fix
**Worker:** Set environment variables (change the fence opener from a bare ``` to ```bash):

```bash
AMI_INCLUDE_FEATURES=true
AMI_INCLUDE_LOGITS=true
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@docs/superpowers/plans/2026-03-25-feature-vector-extraction.md` around lines 675-678: add a language specifier to the environment variable code fence so markdownlint and syntax highlighting work; update the fenced block containing AMI_INCLUDE_FEATURES and AMI_INCLUDE_LOGITS so it starts with "```bash" instead of a bare "```", preserving the two env lines unchanged.
ℹ️ Review info

⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3d7c6ef0-da9b-455f-b4f2-5c36b9a33381

📥 Commits
Reviewing files that changed from the base of the PR and between aa530fc2976a76ad98d1f1777c923fdd48952720 and 598d6edbe6c9e40fc834302d37db3524b3e8f160.

📒 Files selected for processing (4)
* docs/superpowers/plans/2026-03-25-feature-vector-extraction.md
* trapdata/api/models/classification.py
* trapdata/api/tests/test_api.py
* trapdata/api/tests/test_features_extraction.py

✅ Files skipped from review due to trivial changes (2)
* trapdata/api/tests/test_api.py
* trapdata/api/tests/test_features_extraction.py
- Clear self._last_features after post_process_batch reads it to free GPU memory between batches - Add include_features and include_logits to Kivy settings fields dict for discoverability in the desktop app UI Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
trapdata/api/models/classification.py (1)
39-47: ⚠️ Potential issue | 🟡 Minor

Add type hints to override methods and move `include_features`/`include_logits` to keyword-only parameters.

Three issues:

1. Type hints missing: `predict_batch()` (line 73) and `post_process_batch()` (line 81) override base-class methods but lack type hints, violating the project's type hint requirement.
2. Unconditional CPU transfer: line 91 unconditionally calls `logits.cpu()`, but the result is only used when `include_logits=True` (line 100). Move this transfer inside the conditional to avoid overhead on the default inference path.
3. Positional argument contract: adding `include_features` and `include_logits` before `*args` changes the positional constructor contract. Moving them after `*args` as keyword-only parameters prevents silent rebinding if any caller passes extra positional arguments.

♻️ Suggested signature change

```diff
 def __init__(
     self,
     source_images: typing.Iterable[SourceImage],
     detections: typing.Iterable[DetectionResponse],
     terminal: bool = True,
-    include_features: bool = False,
-    include_logits: bool = False,
     *args,
+    include_features: bool = False,
+    include_logits: bool = False,
     **kwargs,
 ):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@trapdata/api/models/classification.py` around lines 39 - 47, The constructor __init__ currently places include_features and include_logits before *args (changing the positional contract) and lacks keyword-only semantics—move include_features and include_logits to be keyword-only parameters after *args/**kwargs to preserve positional behavior; add proper type hints to the overriding methods predict_batch(...) and post_process_batch(...) to match the base-class signatures (use the exact parameter and return types from the base class) so static typing passes; finally, avoid unconditional CPU transfer by moving logits.cpu() so it is only called inside the conditional where include_logits is True (use the local variable logits only when include_logits) to prevent unnecessary overhead.
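The keyword-only fix called out above can be demonstrated in isolation — placing the flags after `*args` means a stray positional argument can no longer rebind them (the class name here is illustrative):

```python
class ClassifierSketch:
    """Illustrative constructor with keyword-only feature/logit flags."""

    def __init__(self, source_images, detections, terminal=True, *args,
                 include_features=False, include_logits=False, **kwargs):
        self.include_features = include_features
        self.include_logits = include_logits
        self.extra_args = args

# A stray positional argument lands in *args instead of silently
# overwriting include_features.
c = ClassifierSketch([], [], True, "stray positional")
```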
🧹 Nitpick comments (1)
trapdata/api/models/classification.py (1)
73-79: Add explicit tensor/result annotations to the new overrides.

These two methods are now part of the feature/logit contract, but both signatures are still untyped. Adding concrete `torch.Tensor` / `list[ClassifierResult]` annotations will make the flow easier to reason about and keeps this file aligned with the repo standard.

📝 Suggested annotations

```diff
-    def predict_batch(self, batch):
+    def predict_batch(self, batch: torch.Tensor) -> torch.Tensor:
@@
-    def post_process_batch(self, batch_output):
+    def post_process_batch(
+        self, batch_output: torch.Tensor
+    ) -> list[ClassifierResult]:
```

As per coding guidelines, "Use type hints in function signatures to document expected types without requiring extensive documentation".
Also applies to: 81-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@trapdata/api/models/classification.py` around lines 73 - 79, The methods (e.g., predict_batch and the companion override in the same class between lines 81-114) lack type annotations—add explicit type hints so predict_batch returns torch.Tensor and accepts a torch.Tensor (or appropriate torch.Tensor subtype) for the batch parameter, and annotate the other override to return list[ClassifierResult] (import or reference ClassifierResult as needed); update the function signatures (e.g., def predict_batch(self, batch: torch.Tensor) -> torch.Tensor) and the companion method signature accordingly to match the feature/logit contract and repo typing conventions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 26361a03-760f-4551-84a2-1e5354138216
📒 Files selected for processing (2)
* trapdata/api/models/classification.py
* trapdata/settings.py
🚧 Files skipped from review as they are similar to previous changes (1)
- trapdata/settings.py
Description
This PR adds support for returning model feature vectors (embeddings) alongside classification results in the Data Companion API and worker.
The classification pipeline now supports returning a vector embedding per classification, derived from the classification model backbone (ResNet50). Feature vectors are 2048-dim embeddings extracted before the classification head, useful for downstream tasks like clustering, similarity search, and tracking.
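As a concrete example of the downstream uses mentioned here, similarity search over the 2048-dim embeddings is a few lines of numpy (the helper names and data are illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query: np.ndarray, gallery: np.ndarray) -> int:
    """Index of the gallery embedding most similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
gallery = rng.standard_normal((5, 2048))  # e.g. stored detection features
query = gallery[3] + 0.01 * rng.standard_normal(2048)  # near-duplicate of item 3
best = most_similar(query, gallery)
```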
The changes are fully backward-compatible for models that do not implement a custom `get_features()`, as they fall back to returning `None` from the base class. Raw logits are also optionally available in the response.
Both features and logits are opt-in via request config or worker environment variables to keep default response size unchanged.
Related Issues
#752
Screenshots
Detection features clustering visualization using K-means + PCA

Changes
* `InferenceBaseClass.get_features()` base method (returns `None` for models that don't support it)
* `Resnet50TimmClassifier.get_features()` extracts 2048-dim embeddings via `model.forward_features()`
* `ClassifierResult.features` field flows through the classification pipeline
* `ClassificationResponse.features` and conditional `logits` in the API schema
* `include_features`/`include_logits` flags in `PipelineConfigRequest` and `Settings`

Usage
API request:
```json
{
  "pipeline": "global_moths_2024",
  "source_images": [...],
  "config": {
    "include_features": true,
    "include_logits": true
  }
}
```

Worker environment:

```bash
AMI_INCLUDE_FEATURES=true
AMI_INCLUDE_LOGITS=true
```
Credits
Original feature extraction implementation by @mohamedelabbas1996. Updated to work with the current codebase and extended with opt-in config toggles. His original branch is preserved at `archive/feat/add-classification-features-to-response-original`.

Follow-up Work

* Replace the standalone `FeatureExtractor` pipeline stage with the classifier-integrated `get_features()` introduced here, eliminating redundant forward passes and simplifying the pipeline from 5 to 4 stages.

Related PRs
Test plan
* `pytest trapdata/api/tests/test_features_extraction.py -v` — feature/logits opt-in, worker path, and feature validity tests
* `pytest trapdata/ -x` — full test suite

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
* `include_features` configuration flag
* `include_logits` configuration flag

Tests