
feat: add Mothbot YOLO11m detection pipeline#141

Open
mihow wants to merge 22 commits into main from worktree-mothbot-pipeline


@mihow
Collaborator

@mihow mihow commented Apr 15, 2026

Summary

Adds the Mothbot YOLO11m-OBB detector to the AMI Data Companion and pairs it with existing classifiers in two new pipelines:

  • mothbot_insect_orders_2025 — YOLO-OBB → ConvNeXt insect-order classifier (no binary prefilter).
  • mothbot_panama_moths_2023 — YOLO-OBB → binary moth/non-moth prefilter → Panama 2023 species classifier (148 species).

Both are verified end-to-end via the local Antenna NATS loop (55 images → 1015 detections → 1665 classifications on the Panama pipeline).

What changed

Core features

  • New ML-layer detector MothObjectDetector_YOLO11m_Mothbot in trapdata/ml/models/localization.py. Wraps ultralytics.YOLO, converts OBB output (4 rotated corners) to a YoloDetection dataclass via _corners_to_yolo_detection. Coordinates are clamped to image bounds via result.orig_shape to prevent downstream crop failures.
  • New API wrapper APIMothDetector_YOLO11m_Mothbot in trapdata/api/models/localization.py. Populates the new rotation field on DetectionResponse from the OBB angle.
  • Two new pipeline classes MothbotInsectOrderClassifier and MothbotMothClassifierPanama, each inheriting 100% of classifier logic from the existing pipeline and only overriding detector_cls = APIMothDetector_YOLO11m_Mothbot.
  • detector_cls class attribute on APIMothClassifier: each pipeline pairs a classifier with a specific detector. All pre-existing pipelines inherit the default (FasterRCNN 2023) — no behavior change.
  • pipeline_description class attribute on APIMothClassifier: lets the pipeline level carry a description of the full detector+classifier combo for /info, distinct from the classifier algorithm's own description.
  • Optional rotation: float | None field on DetectionResponse. Populated by the YOLO detector; no consumer reads it yet. Future species classifiers could use it for tighter rotated crops.
  • Rename CLASSIFIER_CHOICES → PIPELINE_CHOICES across 7 files. Honest naming: the dict is used as a pipeline registry, not just a list of classifiers.
  • should_filter_detections() now uses issubclass() instead of an exact membership (in) check, so subclasses of exempt classes inherit the exemption.
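The OBB-to-envelope conversion and clamping described above can be sketched in a few lines. This is a minimal numpy-only sketch, not the PR's `_corners_to_yolo_detection`: the helper name `corners_to_detection` is hypothetical, and the rotation is taken from the first edge via `arctan2` instead of `cv2.minAreaRect`, so its angle convention may differ from the real detector's.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class YoloDetection:
    # Axis-aligned envelope in pixels, plus the OBB angle in degrees.
    x1: float
    y1: float
    x2: float
    y2: float
    rotation: float
    score: float


def corners_to_detection(corners, score, orig_shape):
    """Hypothetical helper: 4 OBB corners -> clamped axis-aligned envelope.

    corners: array-like of shape (4, 2) holding (x, y) points.
    orig_shape: (height, width), the same order as ultralytics' orig_shape.
    """
    pts = np.asarray(corners, dtype=np.float32).reshape(4, 2)
    height, width = orig_shape[:2]
    # Envelope = min/max over the corners, clamped to the image so that no
    # coordinate is negative or past the edge (avoids empty crops later).
    x1 = float(np.clip(pts[:, 0].min(), 0, width))
    x2 = float(np.clip(pts[:, 0].max(), 0, width))
    y1 = float(np.clip(pts[:, 1].min(), 0, height))
    y2 = float(np.clip(pts[:, 1].max(), 0, height))
    # Assumption: rotation measured along the edge from corner 0 to corner 1.
    dx, dy = pts[1] - pts[0]
    rotation = float(np.degrees(np.arctan2(dy, dx)))
    return YoloDetection(x1, y1, x2, y2, rotation=rotation, score=score)
```

For example, corners that reach `y=-274` on a 3280×2464 image come back with `y1 == 0.0`, which is the edge case the clamping bullet above describes.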

Bug fixes discovered during integration testing

  • RGB→BGR channel swap (0726b23): ultralytics' LoadPilAndNumpy._single_check passes numpy inputs as-is (no channel conversion), but the model was trained on cv2-loaded BGR images. Our antenna path fed RGB tensors, causing the detector to produce large, low-quality full-image boxes. Median detection width dropped from ~1227 px to ~245 px after the fix.
  • Negative OBB coordinate clamping (68e0c16): YOLO-OBB emits corners outside image bounds for detections near edges (observed: y1=-274 on a 2464-tall image). Negative coords caused PyTorch slicing to treat them as end-relative indices, yielding empty (H=0) crops that crash the classifier's Resize((128,128)). Fixed at the detector level using result.orig_shape and defence-in-depth in the antenna worker.
  • Degenerate zero-area detection filter (7edc9d9): OBB results with all 4 corners collinear produce zero-area axis-aligned envelopes. Filtered in post_process_single.
  • Tensor batch handling (0df47bf): antenna's REST dataloader passes tensors (not PIL); added torch→numpy conversion in predict_batch for the YOLO path.
  • Hardcoded detector in antenna worker (9f2b5f9): worker was instantiating APIMothDetector directly instead of using classifier_class.detector_cls.
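Three of the failure modes above reproduce easily in isolation. The sketch below uses plain numpy, and the function names are hypothetical rather than the PR's. It shows why an unclamped negative `y1` yields an empty crop (Python slicing treats the negative start as end-relative), how a numpy image is flipped RGB→BGR before it reaches ultralytics, and a zero-area check in the spirit of the `post_process_single` filter.

```python
import numpy as np


def rgb_to_bgr(image_rgb: np.ndarray) -> np.ndarray:
    # Reverse the channel axis (R,G,B -> B,G,R); make it contiguous so
    # downstream code does not receive a negatively strided view.
    return np.ascontiguousarray(image_rgb[..., ::-1])


def is_degenerate(x1: float, y1: float, x2: float, y2: float) -> bool:
    # Collinear OBB corners collapse to zero width or zero height.
    return (x2 - x1) <= 0 or (y2 - y1) <= 0


def naive_crop(image: np.ndarray, x1: int, y1: int, x2: int, y2: int) -> np.ndarray:
    # No clamping: a negative y1 is interpreted relative to the image bottom.
    return image[y1:y2, x1:x2]


def clamped_crop(image: np.ndarray, x1: int, y1: int, x2: int, y2: int) -> np.ndarray:
    # Clamp every coordinate into [0, width] / [0, height] before slicing.
    h, w = image.shape[:2]
    x1, x2 = max(0, min(x1, w)), max(0, min(x2, w))
    y1, y2 = max(0, min(y1, h)), max(0, min(y2, h))
    return image[y1:y2, x1:x2]
```

On a 2464-row image, `naive_crop(img, 100, -274, 300, 100)` slices rows 2190 to 100 and returns a `(0, 200, 3)` array, the H=0 crop that crashed `Resize((128,128))`; the clamped version returns `(100, 200, 3)`.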

Licensing

  • yolo11m_4500_imgsz1600_b1_2024-01-18.pt weights: AGPL-3.0 (tagged in checkpoint)
  • ultralytics library: AGPL-3.0
  • Mothbot_Process repo: no explicit license
  • AMI Data Companion: AGPL-3.0

No verbatim-ported code from Mothbot's unlicensed repo — the detection wrapper is a fresh implementation.

Test plan

  • Unit tests (pure-CPU, no model load): test_mothbot_yolo.py — 9 tests covering OBB→envelope math, degenerate/negative/out-of-bounds coords, RGB→BGR conversion, post_process_single filtering.
  • End-to-end integration tests: test_mothbot_pipeline.py sends one Vermont image through each pipeline via /process and asserts detections, the rotation field, correct algorithm keys, and the binary prefilter for Panama.
  • Full suite: uv run pytest trapdata/ — all tests pass.
  • Antenna NATS e2e: test_ml_job_e2e — 55 images through mothbot_panama_moths_2023, 16 through mothbot_insect_orders_2025, zero failures.
  • Detection quality verified: median bbox 245×235 px on 3280×2464 source images.
  • CI runs against Arbutus-hosted weights.

Follow-ups

  • Rotated crops for classification: the YOLO-OBB rotation field is populated but unused. A future classifier could use cv2.warpAffine + cv2.getRectSubPix (matching Mothbot's crop_rect() approach) to produce tighter, rotation-corrected crops instead of the current axis-aligned envelope. Would improve classification accuracy for angled specimens.
  • Accuracy/latency evaluation: YOLO-OBB vs FasterRCNN 2023 comparison on the same image set. The YOLO model produces more detections (~18/image vs ~8 for FasterRCNN on Panama images) — unclear whether these are better recall or more false positives.
  • Antenna create_pipelines rename handling: sync inserts new rows but doesn't rename existing ones. Pipeline name changes require manual DB fixup on each deployment. Worth a fix in the Antenna create_pipelines logic (update name/description when slug matches).
  • Empty algorithm descriptions in Antenna: all pipelines ship description: '' for their algorithm rows in Antenna. The description field is set on the Python classes but not flowing through registration. Pre-existing issue, not mothbot-specific.
  • BioCLIP classifier pipeline: add Mothbot's pybioclip order/species classifier as a third pipeline variant. See classifier integration notes in the previous PR description for the reference implementation pointers.
  • Binary filter bbox clamping: the _apply_binary_classification function in worker.py has the same unclamped int(bbox.y1) pattern that was fixed in the main classifier crop loop. Currently safe because the detector now clamps at the source, but worth adding defence-in-depth for consistency.
  • Runbook: see docs/claude/runbook-mothbot-local-testing.md for the local testing loop used during development.

🤖 Generated with Claude Code

mihow and others added 11 commits April 14, 2026 18:16
New API pipeline (mothbot_insect_orders_2025) pairing the Mothbot
YOLO11m-OBB detector with the existing InsectOrderClassifier2025.
Architecture: detector_cls class attribute on APIMothClassifier,
CLASSIFIER_CHOICES renamed to PIPELINE_CHOICES, optional rotation
field on DetectionResponse.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
YOLO weights and ultralytics are AGPL-3 (compatible with our
AGPL-3 project). Mothbot repo has no explicit license, so we
re-implement rather than verbatim-port; one adapted snippet
(torch 2.6 weights_only fallback) will be attributed inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step-by-step plan derived from the spec at
docs/superpowers/specs/2026-04-14-mothbot-detection-pipeline-design.md.
Nine tasks, TDD-style, each ending with a green test run and a commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The dict maps pipeline slug to the classifier class, but it's used
as the pipeline registry. Rename for honesty. No behavior change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces a detector_cls class attribute on APIMothClassifier,
defaulting to APIMothDetector (FasterRCNN 2023). The /process and
/info handlers now read Classifier.detector_cls instead of a
hardcoded reference. No behavior change — every existing pipeline
keeps the default.

Enables pairing a non-FasterRCNN detector with a specific classifier
in a future commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
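A stripped-down sketch of the `detector_cls` pairing this commit introduces. The classes are hypothetical stand-ins, not the real `APIMothClassifier` or `APIMothDetector`; only the attribute name `detector_cls`, the pipeline slugs, and the idea of reading the detector off the classifier class come from the PR. The annotation uses the shared base class, in line with the later change to `type[APIInferenceBaseClass]`.

```python
class InferenceBase:
    """Stand-in for the common base class of all detectors."""

    name = "base"


class FasterRCNNDetector(InferenceBase):
    name = "fasterrcnn_2023"


class YoloObbDetector(InferenceBase):
    # A sibling of FasterRCNNDetector, not a subclass of it.
    name = "yolo11m_mothbot"


class PipelineClassifier:
    # Default pairing; every existing pipeline inherits it unchanged.
    detector_cls: type[InferenceBase] = FasterRCNNDetector


class MothbotPipeline(PipelineClassifier):
    # The only override: swap the detector, keep all classifier logic.
    detector_cls = YoloObbDetector


PIPELINES: dict[str, type[PipelineClassifier]] = {
    "insect_orders_2025": PipelineClassifier,
    "mothbot_insect_orders_2025": MothbotPipeline,
}


def detector_for(slug: str) -> InferenceBase:
    # Handlers read the class attribute instead of a hardcoded detector.
    return PIPELINES[slug].detector_cls()
```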
Forward-looking schema addition for detectors that produce oriented
bounding boxes (first consumer: Mothbot YOLO11m-OBB in a follow-up
commit). Existing detectors leave it None. The downstream classifier
still crops axis-aligned; this field is preserved so a future species
classifier can use it for rotated crops without re-running detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Required for the Mothbot YOLO11m detector (follow-up commit). No
code imports it in this commit.

Note: ultralytics is AGPL-3.0. This is not a license escalation —
the project is already AGPL-3 (PR #137).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements MothObjectDetector_YOLO11m_Mothbot, a single-class
('creature') insect detector trained by Digital Naturalism
Laboratories. Weights hosted on Arbutus and lazily downloaded
via the existing InferenceBaseClass machinery.

Adds a YoloDetection dataclass and a _corners_to_yolo_detection
helper that converts OBB corners into an axis-aligned envelope +
rotation angle, with unit tests on the coordinate math.

The torch 2.6 weights_only fallback is adapted from
Mothbot_Process/pipeline/detect.py (unlicensed repo; pattern is
standard ultralytics PyTorch 2.6 compat handling).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wraps MothObjectDetector_YOLO11m_Mothbot for the /process endpoint:
consumes SourceImage objects, builds DetectionResponses with the new
rotation field populated from the YOLO-OBB angle. No pipeline uses
this detector yet — registration follows in the next commit.

Also fixes a bug in MothObjectDetector_YOLO11m_Mothbot.get_labels:
the base class __init__ called get_labels(None) and overwrote the
class-level category_map with {}; the override returns the class-level
map when no labels_path is set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
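The `get_labels` bug is a general Python pattern: a base `__init__` that assigns `self.category_map = self.get_labels(...)` shadows the class attribute with an instance attribute. A minimal reproduction with hypothetical stand-in classes (the real base is `InferenceBaseClass`, whose exact body is assumed here):

```python
class BaseModel:
    category_map: dict = {}
    labels_path = None

    def __init__(self):
        # Instance assignment shadows any class-level category_map.
        self.category_map = self.get_labels(self.labels_path)

    def get_labels(self, labels_path):
        if labels_path is None:
            return {}
        raise NotImplementedError("label-file loading is out of scope for this sketch")


class BrokenDetector(BaseModel):
    category_map = {0: "creature"}


class FixedDetector(BaseModel):
    category_map = {0: "creature"}

    def get_labels(self, labels_path):
        # Fall back to the class-level map when no labels file is configured.
        if labels_path is None:
            return type(self).category_map
        return super().get_labels(labels_path)
```

`BrokenDetector().category_map` ends up `{}`, while `FixedDetector().category_map` keeps `{0: "creature"}`.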
Pairs the Mothbot YOLO11m detector with the existing
InsectOrderClassifier2025 (ConvNeXt-T, 16 insect orders). Binary
prefilter is skipped — same policy as the existing
insect_orders_2025 pipeline, since the order classifier already
distinguishes non-moth insects.

Also tightens should_filter_detections() to use issubclass() so
subclasses of the exempt classifier set inherit the policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
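The difference between an exact membership check and `issubclass()` shows up as soon as a pipeline subclasses an exempt classifier. The classes and the `EXEMPT` tuple below are hypothetical stand-ins; the real exempt set and function body live in the PR.

```python
class BaseClassifier:
    pass


class InsectOrderClassifier(BaseClassifier):
    pass


class MothbotInsectOrderClassifier(InsectOrderClassifier):
    # Pairs a different detector; classifier logic is inherited.
    pass


# Classifiers that skip the binary moth/non-moth prefilter (assumed contents).
EXEMPT = (InsectOrderClassifier,)


def should_filter_membership(cls: type) -> bool:
    # Exact-class membership: subclasses are NOT exempt.
    return cls not in EXEMPT


def should_filter_detections(cls: type) -> bool:
    # issubclass accepts a tuple, so subclasses inherit the exemption.
    return not issubclass(cls, EXEMPT)
```

With the membership check, `MothbotInsectOrderClassifier` would be routed through the binary prefilter; with `issubclass()` it is exempt like its parent.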
Sends one test image through the /process endpoint with the
mothbot_insect_orders_2025 slug, asserts detections are returned,
at least one has a populated rotation field, and each has an
order-level terminal classification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 15, 2026

Warning

Rate limit exceeded

@mihow has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 25 seconds before requesting another review.



📥 Commits

Reviewing files that changed from the base of the PR and between 0df47bf and 59ea781.

📒 Files selected for processing (10)
  • docs/claude/runbook-mothbot-local-testing.md
  • trapdata/antenna/client.py
  • trapdata/antenna/registration.py
  • trapdata/antenna/worker.py
  • trapdata/api/api.py
  • trapdata/api/models/classification.py
  • trapdata/api/tests/test_mothbot_pipeline.py
  • trapdata/cli/worker.py
  • trapdata/ml/models/localization.py
  • trapdata/ml/models/tests/test_mothbot_yolo.py
📝 Walkthrough


This PR adds a new pipeline slug mothbot_insect_orders_2025 using a YOLO11m-OBB detector paired with the insect-order classifier, renames the pipeline registry from CLASSIFIER_CHOICES to PIPELINE_CHOICES, adds a detector_cls attribute to classifiers, extends DetectionResponse with optional rotation, and implements YOLO ML/API detector classes plus tests and dependency updates.

Changes

Cohort / File(s) Summary
Documentation
docs/superpowers/plans/2026-04-14-mothbot-detection-pipeline.md, docs/superpowers/specs/2026-04-14-mothbot-detection-pipeline-design.md
Added plan and design docs describing architecture, API/ML classes, schema change (rotation), dependency note (ultralytics>=8.3), and test plan.
Pipeline Registry Migration
trapdata/antenna/registration.py, trapdata/antenna/worker.py, trapdata/api/api.py, trapdata/cli/base.py, trapdata/cli/worker.py, trapdata/api/tests/utils.py
Replaced CLASSIFIER_CHOICES with PIPELINE_CHOICES; updated lookups, worker pipeline selection, CLI validation/defaults, registration logging, and test helpers to use the renamed registry.
API Models & Schemas
trapdata/api/models/classification.py, trapdata/api/models/localization.py, trapdata/api/schemas.py
Added detector_cls to APIMothClassifier; introduced MothbotInsectOrderClassifier bound to the YOLO detector; added APIMothDetector_YOLO11m_Mothbot; extended DetectionResponse with an optional `rotation: float | None` field.
ML YOLO Implementation
trapdata/ml/models/localization.py
Implemented MothObjectDetector_YOLO11m_Mothbot, YoloDetection dataclass, Ultralytics loader with compatibility retry, PIL-list dataloader contract, predict_batch integration, and post_process_single converting OBB corners → axis-aligned envelope + rotation.
API Wiring & Behavior
trapdata/api/api.py, trapdata/antenna/worker.py
API and worker now resolve classifier via PIPELINE_CHOICES and instantiate detectors via classifier_class.detector_cls(...); detection-filtering logic adjusted to use subclass checks for skipping binary filter.
Tests
trapdata/api/tests/test_api.py, trapdata/api/tests/test_mothbot_pipeline.py, trapdata/ml/models/tests/test_mothbot_yolo.py
Updated tests to use PIPELINE_CHOICES; added YOLO post-process unit tests; added end-to-end integration test asserting detections include rotation and terminal insect-order classifications.
Dependencies & Metadata
pyproject.toml
Changed license text to AGPL-3.0; added runtime dependency ultralytics>=8.3.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant APIServer as API Server\n/process
    participant Detector as YOLO11m Detector
    participant Classifier as InsectOrder Classifier
    participant Response as PipelineResponse

    Client->>APIServer: POST /process (pipeline=mothbot_insect_orders_2025)
    APIServer->>Detector: Classifier.detector_cls.run(source images)
    Detector->>Detector: get_model() -> ultralytics.YOLO
    Detector->>Detector: predict_batch(imgs) -> ultralytics.predict(imgsz, conf, max_det)
    Detector->>Detector: post_process_single() -> YoloDetection (x1,y1,x2,y2,rotation,score)
    Detector-->>APIServer: DetectionResponse[] (rotation populated when available)
    APIServer->>Classifier: classify detections
    Classifier-->>APIServer: classifications (including terminal flags)
    APIServer-->>Response: assemble PipelineResponse
    Response-->>Client: 200 OK with detections + classifications

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇 I hopped through code with whiskers keen,
A YOLO hum, a moth’s lean spin,
Rotations caught, orders told,
Pipelines joined, new stories bold—
Hooray, the mothbot’s in!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 46.51% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title accurately describes the main change: introducing a new Mothbot YOLO11m detection pipeline. It is concise, clear, and directly relates to the changeset's primary objective.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
trapdata/api/api.py (1)

143-163: ⚠️ Potential issue | 🟠 Major

/info generation still cold-loads pipeline models at startup.

make_pipeline_config_response() constructs both the detector and classifier just to read metadata. Because initialize_service_info() calls this for every entry in PIPELINE_CHOICES, adding the Mothbot pipeline now makes startup/readiness depend on downloading and loading its weights too. That turns metadata generation into a heavy, failure-prone cold-start path instead of keeping model loading lazy.

A safer pattern here is to build PipelineConfigResponse from class-level metadata (name, description, weights path, category map, detector pairing) without instantiating the models.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/api/api.py` around lines 143 - 163, The pipeline config builder is
cold-instantiating models (via Classifier(), detector =
Classifier.detector_cls(), MothClassifierBinary()) which forces weights
download; change make_pipeline_config_response / initialize_service_info to
assemble PipelineConfigResponse from class-level metadata instead of
constructing instances: read attributes like Classifier.name / description /
weights_path / category_map and Classifier.detector_cls (use its class-level
metadata, not detector_cls(...)), and whether
should_filter_detections(Classifier) is true to add a binary entry using
MothClassifierBinary's class attributes (not instantiation); update
make_algorithm_config_response callers to accept class-metadata objects or build
AlgorithmConfigResponse directly from those class attributes so no model
constructors are called during /info generation.
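A sketch of the suggested direction: build `/info` entries purely from class attributes so that no constructor (and therefore no weight download) runs. `AlgorithmInfo`, `pipeline_info`, and the `name`/`key`/`description` attributes are hypothetical stand-ins, not the real `PipelineConfigResponse` schema; the constructors deliberately raise to prove they are never called.

```python
from dataclasses import dataclass


@dataclass
class AlgorithmInfo:
    name: str
    key: str
    description: str


class Detector:
    name = "YOLO11m Mothbot detector"
    key = "mothbot_yolo11m"
    description = "Oriented-box insect detector"

    def __init__(self):
        # In the real service, construction is where weights get loaded.
        raise RuntimeError("model construction must not happen during /info")


class Classifier:
    name = "Insect Order Classifier"
    key = "insect_order_classifier"
    description = "ConvNeXt-T insect-order classifier"
    detector_cls = Detector

    def __init__(self):
        raise RuntimeError("model construction must not happen during /info")


def algorithm_info(cls: type) -> AlgorithmInfo:
    # Read class-level metadata only; never call cls().
    return AlgorithmInfo(name=cls.name, key=cls.key, description=cls.description)


def pipeline_info(classifier_cls: type) -> list[AlgorithmInfo]:
    return [algorithm_info(classifier_cls.detector_cls), algorithm_info(classifier_cls)]
```

Anything currently known only after loading (for example a category map read from a labels file) would need to become class-level metadata for this approach to cover it.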
🧹 Nitpick comments (1)
trapdata/api/models/classification.py (1)

40-43: Widen detector_cls's type annotation to APIInferenceBaseClass.

APIMothDetector_YOLO11m_Mothbot is not a subclass of APIMothDetector—they are siblings inheriting separately from APIInferenceBaseClass. The type annotation on line 43 (type[APIMothDetector]) does not match the override on line 248 (detector_cls = APIMothDetector_YOLO11m_Mothbot), creating a type contract mismatch. Change the annotation to type[APIInferenceBaseClass] to accurately reflect the supported detector types.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/api/models/classification.py` around lines 40 - 43, Change the type
annotation of the detector_cls attribute from type[APIMothDetector] to
type[APIInferenceBaseClass] so subclasses like APIMothDetector_YOLO11m_Mothbot
(which inherit from APIInferenceBaseClass, not APIMothDetector) can override it
without violating the type contract; update the annotation on the detector_cls
declaration and ensure APIInferenceBaseClass is imported or accessible in
classification.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Line 34: The pyproject.toml license metadata is incorrect (declares MIT) and
must match the repository LICENSE (AGPL-3); update the license field in
pyproject.toml (change the value of the license key from "MIT" to "AGPL-3.0" or
"AGPL-3") so the package metadata reflects the actual AGPL-3 license.

In `@trapdata/antenna/worker.py`:
- Around line 428-429: The code currently indexes PIPELINE_CHOICES[pipeline]
which can raise KeyError for stale/invalid pipeline slugs; update the processing
path to guard resolution by checking membership first (e.g., if pipeline not in
PIPELINE_CHOICES) and handle unknown slugs by logging a warning and
returning/acknowledging/skipping the message instead of proceeding, otherwise
set classifier_class = PIPELINE_CHOICES[pipeline] and continue to call
should_filter_detections(classifier_class); ensure the skip path uses the same
failure/ack behavior as other non-retriable errors to avoid repeated failed
processing cycles.

In `@trapdata/ml/models/localization.py`:
- Around line 395-400: The rotation is computed from cv2.minAreaRect but pts are
being cast to int32, losing sub-pixel precision; remove the pts.astype(np.int32)
cast and pass the float32 pts directly to cv2.minAreaRect so the rect angle
preserves sub-pixel accuracy (update the call that creates rect from pts and
keep the rest returning YoloDetection(x1, y1, x2, y2, rotation=angle,
score=...)).


📥 Commits

Reviewing files that changed from the base of the PR and between a33746a and aba15b3.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • docs/superpowers/plans/2026-04-14-mothbot-detection-pipeline.md
  • docs/superpowers/specs/2026-04-14-mothbot-detection-pipeline-design.md
  • pyproject.toml
  • trapdata/antenna/registration.py
  • trapdata/antenna/worker.py
  • trapdata/api/api.py
  • trapdata/api/models/classification.py
  • trapdata/api/models/localization.py
  • trapdata/api/schemas.py
  • trapdata/api/tests/test_api.py
  • trapdata/api/tests/test_mothbot_pipeline.py
  • trapdata/api/tests/utils.py
  • trapdata/cli/base.py
  • trapdata/cli/worker.py
  • trapdata/ml/models/localization.py
  • trapdata/ml/models/tests/__init__.py
  • trapdata/ml/models/tests/test_mothbot_yolo.py

Comment thread pyproject.toml
Comment on lines +428 to 429
classifier_class = PIPELINE_CHOICES[pipeline]
use_binary_filter = should_filter_detections(classifier_class)

@coderabbitai coderabbitai bot Apr 15, 2026


⚠️ Potential issue | 🟡 Minor

Guard unknown pipeline slugs before classifier resolution.

Line 428 can throw KeyError for stale/invalid queue payloads, which turns into repeated failed processing cycles instead of a controlled skip path.

Suggested resilience fix
-    classifier_class = PIPELINE_CHOICES[pipeline]
+    classifier_class = PIPELINE_CHOICES.get(pipeline)
+    if classifier_class is None:
+        logger.error(f"Unknown pipeline slug received from queue: {pipeline}")
+        return False

Collaborator Author


Claude says: Valid concern but deferring — this behavior is pre-existing (same code as pre-rename CLASSIFIER_CHOICES[pipeline]), not introduced by this PR. Unknown slugs in the queue indicate an out-of-sync registration/worker deployment, which today fails loudly with a KeyError + retry loop. Converting to a quiet skip is a behavior change worth its own PR with explicit thought about how to surface the misconfiguration (metrics? dead-letter?). Leaving for a follow-up.



@mihow — Understood, that's a reasonable call. The pre-existing loud failure on an unknown slug does at least surface the misconfiguration immediately (stack trace in logs), whereas a quiet skip could silently drop queue messages without any signal. A dedicated follow-up that considers the full observability story (metrics, dead-letter, alerting) is the right place for this.

Would you like me to open a follow-up GitHub issue to track this, so it doesn't get lost?

Comment thread trapdata/ml/models/localization.py
- pyproject.toml: license metadata MIT -> AGPL-3.0 to match LICENSE (AGPLv3
  since PR #137). Metadata was stale, not specific to this PR, but directly
  relates to the licensing claims in this PR's description.
- trapdata/ml/models/localization.py: drop int32 cast before cv2.minAreaRect.
  cv2 accepts float points and the cast was dropping ~0.2 deg of rotation
  precision on shallow-angled boxes. Verified empirically.
- trapdata/api/models/classification.py: widen detector_cls annotation from
  type[APIMothDetector] to type[APIInferenceBaseClass]. APIMothDetector_YOLO11m_Mothbot
  is a sibling of APIMothDetector, not a subclass — common ancestor is
  APIInferenceBaseClass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mihow
Collaborator Author

mihow commented Apr 15, 2026

Claude says: Addressed 3 of the 4 CodeRabbit findings in 3665064. Summary:

  • 🔴 pyproject license (fixed) — MIT → AGPL-3.0 to match LICENSE.
  • 🟡 int32 cast before minAreaRect (fixed) — drops ~0.2° of rotation precision on shallow-angled boxes.
  • 🧹 detector_cls type annotation (fixed) — widened to type[APIInferenceBaseClass] since the YOLO detector is a sibling of APIMothDetector, not a subclass.
  • 🟡 KeyError on unknown pipeline slug (deferred) — pre-existing behavior, replied in thread.

Not addressed — 🟠 /info cold-loading models

Valid and real, but pre-existing: make_pipeline_config_response() has been instantiating every pipeline at startup since long before this PR. I confirmed this empirically: curl /info on a fresh server loaded weights for all 11 pipelines sequentially and took minutes even with cached weights. Adding the Mothbot pipeline makes it 11 instead of 10, not a qualitative change.

The proper fix — building PipelineConfigResponse from class-level metadata without instantiation — is a worthwhile follow-up but touches enough of the info pathway that it deserves its own PR rather than being bundled into the Mothbot work. Happy to open it next if you agree.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
trapdata/ml/models/localization.py (1)

466-466: Consider making category_map immutable or annotating as ClassVar.

Ruff flags this mutable dict as a class attribute (RUF012). While the code never mutates it, accidental modification would affect all instances. Two options:

  1. Annotate with ClassVar to document intent:
  2. Use an immutable mapping
♻️ Option 1: ClassVar annotation
+from typing import ClassVar
+
 class MothObjectDetector_YOLO11m_Mothbot(ObjectDetector):
     ...
-    category_map = {0: "creature"}
+    category_map: ClassVar[dict[int, str]] = {0: "creature"}
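Option 2, an immutable mapping, would look roughly like this; the class is a hypothetical stand-in for the detector. `types.MappingProxyType` wraps the dict in a read-only view, so an accidental write raises `TypeError` instead of silently changing every instance.

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import ClassVar


class YoloDetectorStub:
    # Read-only view over the label map; item assignment raises TypeError.
    category_map: ClassVar[Mapping[int, str]] = MappingProxyType({0: "creature"})


def try_mutate() -> bool:
    """Return True if the write was rejected."""
    try:
        YoloDetectorStub.category_map[1] = "other"  # type: ignore[index]
    except TypeError:
        return True
    return False
```

One trade-off: if `get_labels` returns the class-level map as its fallback, code that expects a mutable `dict` would need an explicit `dict(...)` copy.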
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/ml/models/localization.py` at line 466, The class-level mutable dict
category_map should be made immutable or explicitly annotated as a class
variable to avoid accidental instance-wide mutation: either annotate it as
ClassVar[Dict[int, str]] (add "from typing import ClassVar, Dict" and change the
declaration to "category_map: ClassVar[Dict[int, str]] = {0: 'creature'}") or
convert it to an immutable mapping (e.g., wrap with types.MappingProxyType by
importing MappingProxyType and doing "category_map = MappingProxyType({0:
'creature'})"), or mark it Final (from typing import Final) if you prefer;
update imports accordingly and ensure references use the new symbol name
category_map.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cd952a92-0fae-49e5-9021-e0c1aab2f512

📥 Commits

Reviewing files that changed from the base of the PR and between aba15b3 and 3665064.

📒 Files selected for processing (3)
  • pyproject.toml
  • trapdata/api/models/classification.py
  • trapdata/ml/models/localization.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • trapdata/api/models/classification.py

mihow and others added 3 commits April 15, 2026 07:01
Antenna's pipeline registry keys on the pipeline name. Without this,
`mothbot_insect_orders_2025` and `insect_orders_2025` both registered as
"Insect Order Classifier" and the second POST silently dedupes against
the existing record, so the mothbot pipeline never appears in the
Antenna UI.

Override name on MothbotInsectOrderClassifier so Antenna treats it as a
separate pipeline. The terminal classifier's algorithm key changes to
`insect_order_classifier_mothbot_yolo_detector`; updated the integration
test to match.
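
The collision is easy to reproduce with a toy registry keyed on name. `register` and `slugify_key` are hypothetical; the slug rule is an assumption that happens to reproduce the `insect_order_classifier_mothbot_yolo_detector` key quoted above, not Antenna's or the worker's actual code:

```python
import re
from typing import Dict


def slugify_key(name: str) -> str:
    """Hypothetical key rule: lower-case, runs of non-alphanumerics become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def register(registry: Dict[str, str], name: str, slug: str) -> bool:
    """Toy registry keyed on pipeline name.

    Returns False when a record with the same name already exists: the POST is
    silently deduplicated and the new slug never shows up.
    """
    if name in registry:
        return False
    registry[name] = slug
    return True
```
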

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… APIMothDetector

The original detector_cls refactor in #141 updated the API path (api.py)
but missed the antenna worker path. When a pipeline declares a non-default
detector (e.g. mothbot_insect_orders_2025 with YOLO11m), the worker was
still running FasterRCNN detection, then posting results with the FasterRCNN
algorithm key, and Antenna rejected them because the /info endpoint declared
only the YOLO and order-classifier algorithms. NATS re-delivered the message
in a loop.

Discovered when testing job 1492 on the local Antenna at beast:
  Error processing pipeline result for job 1492: Detection algorithm
  fasterrcnn_for_ami_moth_traps_2023 is not a known algorithm. Known:
  ['insect_order_classifier_mothbot_yolo_detector',
  'mothbot_yolo11m_creature_detector'].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
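
A toy reproduction of the mismatch, assuming the worker builds its detector from the pipeline class. All names except the two algorithm keys quoted in the error are hypothetical. The buggy path hardcodes FasterRCNN; the fixed path instantiates `pipeline_cls.detector_cls`, so the detection algorithm key matches what /info advertised:

```python
from typing import ClassVar, Set, Type


class FasterRCNNDetector:
    algorithm_key = "fasterrcnn_for_ami_moth_traps_2023"


class MothbotYoloDetector:
    algorithm_key = "mothbot_yolo11m_creature_detector"


class ClassifierPipeline:
    # Default pairing: pipelines that do not override this keep FasterRCNN.
    detector_cls: ClassVar[type] = FasterRCNNDetector


class MothbotPipeline(ClassifierPipeline):
    detector_cls = MothbotYoloDetector


def make_detector_hardcoded(pipeline_cls: Type[ClassifierPipeline]):
    # The bug: the worker ignores the pipeline's declared pairing.
    return FasterRCNNDetector()


def make_detector(pipeline_cls: Type[ClassifierPipeline]):
    # The fix: build whichever detector the pipeline declares.
    return pipeline_cls.detector_cls()


def check_known(detector, known_keys: Set[str]) -> None:
    # Imitates the server-side rejection quoted in the commit message.
    if detector.algorithm_key not in known_keys:
        raise ValueError(
            f"Detection algorithm {detector.algorithm_key} is not a known algorithm."
        )
```
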
The Mothbot YOLO detector's predict_batch strictly required list[PIL.Image],
which works for the API/ml-layer path but fails for the antenna worker
path. The antenna RESTDataset applies torchvision.transforms.ToTensor()
to images before they reach the detector, producing tensor batches that
the original check rejected with:

  Mothbot YOLO11m Creature Detector expects a list of PIL images from
  the collate fn; got <class 'torch.Tensor'>

FasterRCNN's detection path already accepts both, which is why this only
surfaced when wiring a YOLO-based pipeline into the worker.

Accept three input forms now:
- list[PIL.Image] (ML-layer dataloader, unchanged)
- torch.Tensor (B, C, H, W) from REST stacked batches
- list[torch.Tensor] (C, H, W) from REST mixed-size fallback

For tensor inputs, convert back to HWC uint8 numpy so ultralytics does
its own letterboxing / normalization at imgsz=1600, matching the PIL
path's behavior.

Discovered running job 1493 through the beast Antenna worker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
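
A sketch of the shape dispatch, written against NumPy so it runs without torch (a torch tensor would first go through `.cpu().numpy()`). `split_batch` is hypothetical and only normalizes the three forms into one list of per-image items; per the later commit in this PR, array items must also be converted to HWC uint8 in BGR order before ultralytics sees them:

```python
import numpy as np


def split_batch(batch):
    """Normalize the three accepted batch forms into a list of per-image items.

    - list of non-array items (e.g. PIL images): returned unchanged
    - one 4-D array shaped (B, C, H, W): split along the batch axis
    - list of 3-D arrays shaped (C, H, W), possibly of different sizes: kept as-is
    """
    if isinstance(batch, np.ndarray):
        if batch.ndim != 4:
            raise ValueError(f"expected (B, C, H, W), got shape {batch.shape}")
        return list(batch)  # iterating a 4-D array yields (C, H, W) arrays
    if isinstance(batch, (list, tuple)):
        items = list(batch)
        for item in items:
            if isinstance(item, np.ndarray) and item.ndim != 3:
                raise ValueError(f"expected (C, H, W) per image, got shape {item.shape}")
        return items
    raise TypeError(f"unsupported batch type: {type(batch)!r}")
```
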

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88b1a7ad-3b51-4184-aaa3-a1804f51d258

📥 Commits

Reviewing files that changed from the base of the PR and between 3665064 and 0df47bf.

📒 Files selected for processing (4)
  • trapdata/antenna/worker.py
  • trapdata/api/models/classification.py
  • trapdata/api/tests/test_mothbot_pipeline.py
  • trapdata/ml/models/localization.py
✅ Files skipped from review due to trivial changes (1)
  • trapdata/api/tests/test_mothbot_pipeline.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • trapdata/antenna/worker.py
  • trapdata/api/models/classification.py

Comment on lines +420 to +449
except Exception as err:
    if "Weights only load failed" not in str(err):
        raise

    logger.info(
        "Retrying YOLO load with torch.load(weights_only=False) compatibility "
        "(trusted local checkpoint)"
    )
    original_load = _torch.load
    original_force_wo = os.environ.get("TORCH_FORCE_WEIGHTS_ONLY_LOAD")
    original_force_no_wo = os.environ.get("TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD")

    def _patched_load(*args, **kwargs):
        kwargs["weights_only"] = False
        return original_load(*args, **kwargs)

    _torch.load = _patched_load
    try:
        os.environ["TORCH_FORCE_WEIGHTS_ONLY_LOAD"] = "0"
        os.environ["TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD"] = "1"
        return YOLO(str(weights_path))
    finally:
        _torch.load = original_load
        if original_force_wo is None:
            os.environ.pop("TORCH_FORCE_WEIGHTS_ONLY_LOAD", None)
        else:
            os.environ["TORCH_FORCE_WEIGHTS_ONLY_LOAD"] = original_force_wo
        if original_force_no_wo is None:
            os.environ.pop("TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD", None)
        else:

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For Ultralytics 8.3 with PyTorch 2.5/2.6, is there a supported way to load local YOLO checkpoints without temporarily monkey-patching `torch.load` or changing `TORCH_FORCE_WEIGHTS_ONLY_LOAD`/`TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD` process-wide?

💡 Result:

On older Ultralytics releases, no; on current ones, no user-side workaround is needed. Ultralytics addresses the PyTorch weights_only=True default (introduced in 2.6) by internally patching torch.load calls with a torch_load wrapper that sets weights_only=False only where needed (PR #21260, merged in v8.3.162). This is a localized patch, not a global monkey-patch of torch.load or a process-wide env var change. Standard usage like model = YOLO('path/to/local.pt') works without user intervention, as confirmed in the source, which imports torch_load from ultralytics.utils.patches and loads models via load_checkpoint using the patched loader. Issues reporting errors were due to outdated Ultralytics versions; upgrading resolves them without user-side patches.

Citations:


🏁 Script executed:

# Find Ultralytics version requirement
find . -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" \) | head -20

Repository: RolnickLab/ami-data-companion

Length of output: 89


🏁 Script executed:

# Look at the full _load_ultralytics_yolo function
rg -n "_load_ultralytics_yolo" -A 100 trapdata/ml/models/localization.py | head -150

Repository: RolnickLab/ami-data-companion

Length of output: 6759


🏁 Script executed:

# Check for synchronization mechanisms (locks, etc) in the file
rg -n "Lock|lock|mutex|RLock|threading" trapdata/ml/models/localization.py

Repository: RolnickLab/ami-data-companion

Length of output: 231


🏁 Script executed:

# Check Ultralytics version in pyproject.toml
cat pyproject.toml | grep -i ultralytics

Repository: RolnickLab/ami-data-companion

Length of output: 96


Add synchronization to the torch.load fallback path for concurrent cold loads.

This fallback only triggers on "Weights only load failed" errors. If two models fail to load and enter this path concurrently, both threads will, without any synchronization: (1) replace _torch.load globally, (2) modify the process-wide env vars TORCH_FORCE_WEIGHTS_ONLY_LOAD / TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD, and (3) restore them in the finally block. A concurrent load that observes a partial restoration can see inconsistent state, making startup nondeterministic.

Protect this block with a module-level lock, or consider upgrading Ultralytics to 8.3.162+, which includes a built-in localized torch_load patch (PR #21260) that avoids user-side monkey-patching entirely. Standard YOLO(weights_path) should then work without fallback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/ml/models/localization.py` around lines 420 - 449, The fallback
patch that monkey-patches _torch.load and toggles TORCH_FORCE_WEIGHTS_ONLY_LOAD
/ TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD must be serialized to avoid races: add a
module-level threading.Lock (e.g., _torch_load_patch_lock) and acquire it before
saving original_load/original_force_* and before setting _torch.load,
os.environ, constructing YOLO(str(weights_path)); release the lock in the
finally block after restoring _torch.load and env vars; also import threading at
top if not present. This ensures the _patched_load, _torch.load replacement, and
environment variable changes around YOLO(...) are atomic across threads.
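
A sketch of the lock-based option, assuming the patch/restore logic is factored into a context manager. `patched_loader`, `force_weights_only_false`, and `_load_patch_lock` are hypothetical names. Besides the partial-restoration problem described above, holding one lock across save, patch, and restore also prevents a second thread from capturing the first thread's patched `torch.load` as its "original" and restoring it after the first thread finishes, which would leave `weights_only=False` in effect permanently:

```python
import os
import threading
from contextlib import contextmanager
from typing import Callable, Dict

# One module-level lock serializes every temporary patch made through this helper.
_load_patch_lock = threading.Lock()


def force_weights_only_false(original_load: Callable) -> Callable:
    """Wrap a loader so every call passes weights_only=False."""

    def _patched_load(*args, **kwargs):
        kwargs["weights_only"] = False
        return original_load(*args, **kwargs)

    return _patched_load


@contextmanager
def patched_loader(module, attr: str, make_patch: Callable, env_overrides: Dict[str, str]):
    """Temporarily replace module.<attr> and set env vars, restoring both on exit.

    The lock is held across save -> patch -> body -> restore, so no other thread
    using this helper can observe or capture a half-restored state.
    """
    with _load_patch_lock:
        original = getattr(module, attr)
        saved_env = {key: os.environ.get(key) for key in env_overrides}
        try:
            setattr(module, attr, make_patch(original))
            os.environ.update(env_overrides)
            yield
        finally:
            setattr(module, attr, original)
            for key, value in saved_env.items():
                if value is None:
                    os.environ.pop(key, None)
                else:
                    os.environ[key] = value
```

In the detector, the retry would then run as `with patched_loader(_torch, "load", force_weights_only_false, {...}): return YOLO(str(weights_path))`. The lock only coordinates loads that go through this helper: an unrelated `torch.load` call on another thread during the window still sees the patched function, and `os.environ` stays process-global. Per the analysis above, upgrading Ultralytics would remove the need for the patch altogether.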

Comment on lines +575 to +583
def save_results(self, item_ids, batch_output, *args, **kwargs):
    """The ML-layer base class expects a save method. The API wrapper
    overrides this, so the DB path is never hit when used via the API.
    Provide a no-op that logs, for symmetry with the FasterRCNN class.
    """
    logger.info(
        f"{self.name} ML-layer save_results called with {len(item_ids)} items "
        "(no-op; API wrapper handles persistence)"
    )

⚠️ Potential issue | 🟠 Major

Don't turn the ML-layer save path into a silent no-op.

ObjectDetector.save_results() persists detections, but this override drops everything on the floor. The API wrapper already has its own save_results(), so this only hurts non-API uses of MothObjectDetector_YOLO11m_Mothbot, where detections will now disappear silently instead of being written like the FasterRCNN detectors. Please either map YoloDetection into the base bbox payload here or keep the no-op confined to the API-only wrapper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/ml/models/localization.py` around lines 575 - 583, The override of
save_results in MothObjectDetector_YOLO11m_Mothbot is turning the ML-layer save
path into a silent no-op and dropping detections; instead, convert the
YOLO-specific batch_output (YoloDetection entries) into the base bbox payload
and call the base persistence logic (e.g., call super().save_results or reuse
ObjectDetector.save_results) so non-API code still persists results like
FasterRCNN; locate save_results in MothObjectDetector_YOLO11m_Mothbot, map each
YoloDetection -> bbox format expected by ObjectDetector (preserving score,
class, and coords), and then invoke the base save routine rather than logging a
no-op.
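
A sketch of the suggested mapping. The real `YoloDetection` fields and the payload `ObjectDetector.save_results()` expects are not shown in this thread, so both the dataclass and the dict payload (`bbox`, `score`, `label`) below are assumptions; the base class is a stand-in that records results instead of writing to the DB:

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Dict, List, Sequence, Tuple


@dataclass
class YoloDetectionSketch:
    """Hypothetical field layout; the PR's real YoloDetection may differ."""

    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    class_id: int
    rotation: float = 0.0


class ObjectDetectorSketch:
    """Stand-in for the ML-layer base class; the real one writes to the DB."""

    category_map: ClassVar[Dict[int, str]] = {0: "creature"}

    def __init__(self) -> None:
        self.saved: List[Tuple[Any, List[Dict[str, Any]]]] = []

    def save_results(self, item_ids, batch_output, *args, **kwargs):
        # Assumed payload: one list of {"bbox", "score", "label"} dicts per item.
        self.saved.extend(zip(item_ids, batch_output))


class YoloDetectorSketch(ObjectDetectorSketch):
    def save_results(
        self, item_ids, batch_output: Sequence[Sequence[YoloDetectionSketch]], *args, **kwargs
    ):
        # Convert YOLO detections to the base payload, keeping score and class,
        # then reuse the base persistence path instead of logging a no-op.
        payload = [
            [
                {
                    "bbox": [d.x1, d.y1, d.x2, d.y2],
                    "score": d.score,
                    "label": self.category_map[d.class_id],
                }
                for d in detections
            ]
            for detections in batch_output
        ]
        return super().save_results(item_ids, payload, *args, **kwargs)
```
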

mihow and others added 7 commits April 15, 2026 17:25
YOLO-OBB can produce detections where all 4 corners share a y coordinate
(e.g. at the image edge), yielding an axis-aligned envelope with H=0.
These propagate to ClassificationImageDataset.crop() and crash
torchvision's Resize with "Input and output sizes should be greater than 0".

Filter them out in post_process_single() with a warning log, so the
mothbot_insect_orders_2025 pipeline handles edge-of-frame detections
gracefully rather than erroring the entire job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
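
A minimal sketch of that filter on axis-aligned (x1, y1, x2, y2) envelopes; `drop_degenerate_boxes` is a hypothetical name, not the actual `post_process_single()` code. It drops zero-width as well as zero-height boxes, since either one yields an empty crop:

```python
import logging

logger = logging.getLogger(__name__)


def drop_degenerate_boxes(boxes):
    """Keep only (x1, y1, x2, y2) boxes with positive width and height."""
    kept = []
    for box in boxes:
        x1, y1, x2, y2 = box
        if x2 <= x1 or y2 <= y1:
            # e.g. four OBB corners sharing one y coordinate give H == 0
            logger.warning("Dropping degenerate detection %s (W=%s, H=%s)", box, x2 - x1, y2 - y1)
            continue
        kept.append(box)
    return kept
```
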
YOLO-OBB emits corners outside the image for detections near edges
(observed live: y1=-274.39 on a 2464-tall Panama diopsis image). The
min/max envelope then carries negative coords downstream, where the
antenna worker's int() cast preserves the sign and PyTorch's tensor
slicing treats it as end-relative indexing -- yielding an empty crop
and an H=0 RuntimeError from the classifier's Resize.

Fix at the source by clamping the envelope to [0, width] x [0, height]
using result.orig_shape in post_process_single. Adds an optional
image_shape kwarg on _corners_to_yolo_detection so the clamp is opt-in
(existing callers that don't know the image dimensions stay backward-
compatible).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
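
A sketch of the opt-in clamp, assuming corners arrive as four (x, y) pairs. `corners_to_envelope` is a hypothetical stand-in for `_corners_to_yolo_detection`. The `image_shape` argument follows ultralytics' `Results.orig_shape` convention of (height, width); the y1=-274.39 on a 2464-tall image comes from the commit above, while the other coordinates in the check are made up:

```python
from typing import Optional, Sequence, Tuple


def corners_to_envelope(
    corners: Sequence[Tuple[float, float]],
    image_shape: Optional[Tuple[int, int]] = None,
) -> Tuple[float, float, float, float]:
    """Axis-aligned (x1, y1, x2, y2) envelope of rotated-box corners.

    If image_shape is given as (height, width), clamp the envelope to
    [0, width] x [0, height]; without it, behave as before (no clamping).
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)
    if image_shape is not None:
        height, width = image_shape
        x1 = min(max(x1, 0.0), width)
        x2 = min(max(x2, 0.0), width)
        y1 = min(max(y1, 0.0), height)
        y2 = min(max(y2, 0.0), height)
    return x1, y1, x2, y2
```
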
Defence-in-depth for the YOLO-OBB out-of-bounds issue now fixed in the
detector. If any other detector ever returns negative coords or coords
past the image bounds, the previous guard (y1 >= y2 or x1 >= x2) would
not catch it -- PyTorch's tensor[:, y1:y2, x1:x2] silently produces an
empty crop when y1 is negative (treated as end-relative) or when y1 is
past the tensor height (clamped to height, making the slice empty).

Clamp to [0, H] x [0, W] before the bounds check so the guard works on
the effective slice range. The bounds-check warning now also logs the
raw float coords, which makes this class of bug obvious next time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
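
NumPy basic slicing uses the same negative-index rules as the `tensor[:, y1:y2, x1:x2]` slice described above, so the failure and the fix can be shown without torch. `safe_crop` is a hypothetical helper, not `ClassificationImageDataset.crop()` or the worker's code:

```python
import numpy as np


def safe_crop(image_chw: np.ndarray, box):
    """Crop (x1, y1, x2, y2) from a (C, H, W) array, or return None if empty.

    Coordinates are clamped to [0, W] x [0, H] *before* the emptiness check,
    so a negative value cannot become an end-relative slice index and a value
    past the edge cannot produce a silently empty slice.
    """
    _, height, width = image_chw.shape
    x1, y1, x2, y2 = (int(v) for v in box)  # int() truncates toward zero
    x1, x2 = max(0, min(x1, width)), max(0, min(x2, width))
    y1, y2 = max(0, min(y1, height)), max(0, min(y2, height))
    if y1 >= y2 or x1 >= x2:
        return None
    return image_chw[:, y1:y2, x1:x2]
```

For a 100-row image, the raw slice `img[:, -30:50, ...]` starts at row 70 and is empty, while `safe_crop` returns rows 0 to 50.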
…lytics

YOLO11m-Mothbot was trained on cv2.imread-loaded images (BGR). When we
feed numpy arrays directly, ultralytics' LoadPilAndNumpy._single_check
explicitly does NOT reorder channels (it only converts PIL RGB to BGR,
not numpy). Our antenna path was going PIL -> ToTensor (RGB) -> permute
-> numpy (still RGB), which meant YOLO was seeing swapped R and B
channels. The detector still produced output, but at much lower quality:
median detection width was 1227px on a 3280px image -- i.e. roughly
full-image boxes. After flipping to BGR, median detection width drops
to ~245px, matching expected moth sizes.

Extracted the RGB->BGR conversion into _tensor_to_bgr_numpy and covered
it with a unit test so the channel order won't silently regress.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
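
A NumPy sketch of that conversion for a ToTensor-style image (C, H, W, float in [0, 1], RGB). It is a stand-in for `_tensor_to_bgr_numpy`, whose exact signature is not shown here; with a torch tensor, call `.cpu().numpy()` first. The round-trip check in the usage mirrors the kind of unit test the commit describes:

```python
import numpy as np


def chw_float_rgb_to_hwc_uint8_bgr(chw: np.ndarray) -> np.ndarray:
    """Convert a (3, H, W) RGB float image in [0, 1] to (H, W, 3) BGR uint8."""
    if chw.ndim != 3 or chw.shape[0] != 3:
        raise ValueError(f"expected (3, H, W), got {chw.shape}")
    hwc = np.transpose(chw, (1, 2, 0))  # CHW -> HWC
    # Undo ToTensor's /255 scaling, rounding to the nearest integer.
    hwc = np.clip(hwc * 255.0 + 0.5, 0, 255).astype(np.uint8)
    # Reverse the channel axis: RGB -> BGR, the order ultralytics expects for numpy input.
    return np.ascontiguousarray(hwc[..., ::-1])
```
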
…data

Adds a second pipeline that pairs the Mothbot YOLO11m-OBB detector with
the existing 2023 Panama moth species classifier. The full chain is
Mothbot YOLO -> binary moth/non-moth prefilter -> Panama 2023 species
classifier. Verified end-to-end against the local Antenna instance
(1015 detections / 1665 classifications across 55 images).

Pipeline-level cleanup:
- Introduce APIMothClassifier.pipeline_description, used by
  make_pipeline_config_response as the pipeline's description when
  set. Falls back to the classifier's own description otherwise, so
  every existing pipeline keeps its current /info payload.
- Rename MothbotInsectOrderClassifier from
  "Insect Order Classifier (Mothbot YOLO detector)" to
  "Mothbot YOLO + Insect Orders 2025". The parenthetical read as if
  the classifier was the Mothbot-specific component, but actually
  only the detector changes; the new name makes the combo explicit.
  Also fills in pipeline_description so the /info combo reads sensibly.
- The new MothbotMothClassifierPanama follows the same naming +
  pipeline_description pattern.

Also refreshes the integration test to cover both pipelines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
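
A sketch of the fallback rule, with hypothetical class and function names. Using `or` means an empty string also falls back to the classifier's description, which seems the safer default for /info:

```python
from typing import Optional


class ClassifierPipeline:
    # Algorithm-level description: what the classifier itself is.
    description: str = "Insect order classifier"
    # Optional pipeline-level description of the detector + classifier combo.
    pipeline_description: Optional[str] = None


class MothbotOrders(ClassifierPipeline):
    pipeline_description = (
        "Mothbot YOLO11m-OBB detector followed by the insect order classifier"
    )


def info_description(pipeline_cls) -> str:
    # Prefer the pipeline-level text when set; otherwise keep the old /info payload.
    return pipeline_cls.pipeline_description or pipeline_cls.description
```
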
Covers the full iteration loop: worker startup, pipeline registration
via /info sync, triggering test_ml_job_e2e, and detection quality
verification. Documents the Antenna create_pipelines stale-name gotcha.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…run, and ami worker run alias

- `ami worker register --pipeline <slug>` now only advertises the
  specified pipelines instead of all. Filters the pipeline_configs
  list before POSTing to Antenna.
- `ami worker [run] --project <id>` limits which project jobs the
  worker pulls via `project__id__in` query param on the /jobs endpoint.
- `ami worker run` is now an explicit subcommand alias for `ami worker`.
- Registration logging now reports processing service ID, created vs
  updated pipeline counts, and specific slugs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
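
A sketch of the two filters, with hypothetical helper names. Joining several project ids with commas for `project__id__in` is an assumption (a common django-filter convention), not something this PR states; raising on an unknown slug is also a design choice of the sketch:

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlencode


def filter_pipeline_configs(configs: List[Dict], slugs: Optional[Iterable[str]]) -> List[Dict]:
    """Keep only the requested pipelines; no --pipeline flag means advertise all."""
    if not slugs:
        return list(configs)
    wanted = set(slugs)
    unknown = wanted - {c["slug"] for c in configs}
    if unknown:
        raise ValueError(f"Unknown pipeline slug(s): {sorted(unknown)}")
    return [c for c in configs if c["slug"] in wanted]


def jobs_query(project_ids: Optional[Iterable[int]]) -> str:
    """Build the /jobs query string; urlencode percent-encodes the comma."""
    params = {}
    if project_ids:
        params["project__id__in"] = ",".join(str(p) for p in project_ids)
    return urlencode(params)
```
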