feat: add additional prediction and identification fields to CSV export #1214
Conversation
…elds

Separate machine predictions from human identifications in exports and API, so researchers see both side-by-side. Previously the determination was overwritten when a human verified, losing the original ML prediction.

Model layer:
- Extract find_best_prediction() and find_best_identification() from update_occurrence_determination() for reuse by exports and API
- Set determination_score to None for human-determined occurrences (ML confidence preserved in best_machine_prediction_score)

New CSV export fields:
- best_machine_prediction_name, _algorithm, _score
- verified_by, verified_by_count, agreed_with_algorithm
- determination_matches_machine_prediction
- best_detection_bbox, best_detection_source_image_url, best_detection_occurrence_url

API changes:
- Add best_machine_prediction nested object to OccurrenceListSerializer (always populated regardless of verification status)

Closes #1213

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
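The side-by-side semantics described above can be sketched in plain Python. This is a minimal illustration using simplified dict-based records: field names such as `taxon_id` and `terminal` follow the PR description, but the functions and data shapes are assumptions, not the project's actual models.

```python
# Illustrative sketch: keep the best ML prediction alongside the human
# determination instead of overwriting it, and compare by taxon ID.

def best_machine_prediction(classifications):
    """Highest-scoring classification, preferring terminal (final) ones."""
    if not classifications:
        return None
    return max(classifications, key=lambda c: (c["terminal"], c["score"]))

def export_row(classifications, human_identification):
    prediction = best_machine_prediction(classifications)
    # A human identification wins the determination, but the ML prediction
    # is preserved in its own columns rather than being discarded.
    determination = human_identification or prediction
    return {
        "determination_name": determination["taxon_name"] if determination else None,
        "best_machine_prediction_name": prediction["taxon_name"] if prediction else None,
        "best_machine_prediction_score": prediction["score"] if prediction else None,
        # Compared by taxon ID, not by string matching.
        "determination_matches_machine_prediction": (
            determination["taxon_id"] == prediction["taxon_id"]
            if determination and prediction
            else None
        ),
    }

preds = [
    {"taxon_id": 7, "taxon_name": "Actias luna", "score": 0.91, "terminal": True},
    {"taxon_id": 9, "taxon_name": "Saturniidae", "score": 0.98, "terminal": False},
]
human = {"taxon_id": 7, "taxon_name": "Actias luna"}
row = export_row(preds, human)
```

Note that the terminal classification wins even against a higher-scoring intermediate one, matching the "terminal first, then score" ordering discussed later in the review.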
📝 Walkthrough

The PR extends the occurrence export and API to separately expose machine predictions, verification metadata, and detection details alongside human determinations. It adds new fields to the tabular serializer and API response, implements queryset annotations for efficient data retrieval, and updates verification-status computation logic to prioritize explicit participant counts.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR updates occurrence exports and the API to preserve and expose both machine predictions and human verifications side-by-side, preventing ML predictions from being lost when a human identification is added.
Changes:
- Refactors occurrence logic to expose reusable “best prediction” and “best identification” selection methods and adjusts determination score semantics for human IDs.
- Extends CSV exports with new machine prediction, verification, and detection-related fields backed by queryset annotations.
- Adds a `best_machine_prediction` nested object to the occurrences list API serializer.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| ami/main/models.py | Adds queryset annotations for best detection/prediction/verification and refactors best prediction/identification selection + determination score update logic. |
| ami/main/api/serializers.py | Adds best_machine_prediction field to occurrence list API responses. |
| ami/exports/format_types.py | Adds new CSV export fields and wires exporter queryset to include new annotations. |
| ami/exports/tests.py | Adds tests covering the new CSV export fields across ML-only and human verification scenarios. |
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/exports/format_types.py`:
- Around line 192-204: Use the annotated winning detection already attached to
the exported object instead of recomputing it: in
get_best_detection_source_image_url and get_best_detection_occurrence_url, read
the annotated detection from the obj (e.g. getattr(obj, "best_detection", None)
or the specific annotated attributes like
best_detection_source_image_path/public_base_url on that detection) and build
both the source image URL and the occurrence/context URL from that detection's
stored fields (path, public_base_url or signed_url, and occurrence/context link)
rather than calling obj.context_url() which re-queries and may use a different
ordering; this ensures URLs correspond to the exported bbox/path and avoids N+1
queries.
In `@ami/main/api/serializers.py`:
- Around line 1396-1421: get_best_machine_prediction currently calls
obj.find_best_prediction() and directly accesses prediction.taxon and
prediction.algorithm, causing an N+1; change it to reuse obj.best_prediction
(use that if present, fall back to obj.find_best_prediction() only if needed)
and avoid dereferencing relations that aren't prefetched—either use the
already-attached taxon/algorithm on the prediction object or ensure these are
provided via queryset annotations/prefetch_related and passed through context to
TaxonNestedSerializer/AlgorithmNestedSerializer to prevent per-row queries;
update get_best_machine_prediction to check obj.best_prediction first and only
call find_best_prediction as a fallback, and document that views/queries should
prefetch prediction__taxon and prediction__algorithm.
In `@ami/main/models.py`:
- Around line 3210-3218: The code only sets new_score when new_determination
changes, so when authority flips but the taxon stays the same we never recompute
determination_score; update the logic around
occurrence.find_best_identification() and occurrence.find_best_prediction()
(references: top_identification, top_prediction, new_determination, new_score,
determination_score) so that whenever the authority changes you still set
new_score appropriately (None for a human top_identification,
top_prediction.score for an ML top_prediction) and assign determination_score on
the occurrence even if the taxon equals current_determination; apply the same
fix in the analogous block handling lines ~3225-3228.
- Around line 2864-2867: The aggregate verified_by_count is over-counting
because Count("identifications") is performed across detection rows; change the
Count to be distinct so each identification is only counted once (e.g. use
Count("identifications", distinct=True,
filter=Q(identifications__withdrawn=False))). Update the verified_by_count
definition (the Count call) to include distinct=True while keeping the existing
filter on identifications.
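The over-counting this comment describes comes from SQL join multiplication, and can be illustrated without Django. The ids below are hypothetical; this is a sketch of the counting behavior only, not the project's query.

```python
# An occurrence joined to its detections multiplies identification rows:
# each (detection, identification) pair becomes one row after the join,
# so counting identification ids naively over-counts. Distinct counting
# (the Count(..., distinct=True) fix) counts each identification once.
rows = [
    # (detection_id, identification_id) pairs after the SQL join
    (1, 101),
    (2, 101),  # identification 101 repeated via a second detection
    (3, 101),
    (1, 102),
    (2, 102),
    (3, 102),
]

naive_count = len([ident for _, ident in rows])     # like Count("identifications")
distinct_count = len({ident for _, ident in rows})  # like Count(..., distinct=True)
```

With three detections and two identifications, the naive count triples the true value, which is exactly the join multiplication the fix targets.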
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ca26e0c4-7c2a-409c-8ae0-af43172af29c
📒 Files selected for processing (4)
- ami/exports/format_types.py
- ami/exports/tests.py
- ami/main/api/serializers.py
- ami/main/models.py
- Add distinct=True to verified_by_count to prevent join multiplication
- Fix update_occurrence_determination to recompute score even when taxon stays the same (handles authority flip without taxon change)
- Remove email fallback in verified_by (PII concern, name-only)
- Filter withdrawn identifications in verification_status fallback
- Use obj.best_prediction cached property instead of find_best_prediction() in API serializer to avoid N+1 queries
- Build occurrence URL from annotated fields instead of context_url() to avoid N+1 queries in export
- Use source_image.timestamp instead of datetime.now() in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
From reading the PR description, this looks great Michael! Some comments and questions:
Very astute observations @annavik !
That makes sense! Although I am leaning away from including this count at all, since we haven't talked about verification at different ranks or disagreements. I realized what I want here is just "participant_count" to highlight if more than one user is in the identification history. In general, I want to stick with the assumption that the platform is for experts to ID and the last expert gets the final word -- like a museum collection, rather than a community ID platform.
Great question! Probably yes! This would require additional changes (I am just exposing this value in this PR, not calculating it). Maybe an easy change though.
I agree it's too soon! Will remove.
I like this one since it's based on the taxon ID rather than string matching, and I think it will be a common use case.
…urrence URL

Per @annavik and @mihow's review:
- Rename `verified_by_count` to `participant_count` (counts distinct users, not total IDs)
- Remove `best_detection_occurrence_url` (URLs not stable enough to distribute in exports yet)
- Clean up unused annotations (event_id, source_image_id) from with_best_detection()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ami/exports/format_types.py (1)
112-145: ⚠️ Potential issue | 🟠 Major

Don’t bake the preview frontend domain into exported CSVs.

`best_detection_occurrence_url` is hardcoded to the preview app, so exported files become wrong as soon as the frontend base URL differs by environment or changes later. That makes the new column non-portable for exactly the reason raised in the PR discussion. Either drop this column or build it from a configured frontend base URL instead of a literal hostname.
Also applies to: 199-209
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/exports/format_types.py` around lines 112 - 145, The serializer currently hardcodes the preview frontend domain for the best_detection_occurrence_url column (the SerializerMethodField best_detection_occurrence_url and its getter), making exported CSVs non-portable; either remove best_detection_occurrence_url from the serializer fields, or change the getter (e.g., get_best_detection_occurrence_url) to construct the link from a configurable frontend base URL (read from settings or an env var like FRONTEND_BASE_URL) and fall back to None if not set; update the Meta.fields list and the corresponding method(s) (also referenced in the same file around the other similar fields at the later occurrence) so no literal hostname is embedded.

ami/main/models.py (1)

3215-3242: ⚠️ Potential issue | 🟠 Major

Compare determination IDs and allow clearing stale determinations.
The fallback query at Lines 3215-3220 returns a raw `determination` id, while `new_determination` is a `Taxon` instance. Combined with the `if new_determination ...` guard, unchanged rows look changed, and rows with no remaining identification/prediction never clear an old saved determination.

Suggested fix

```diff
-        current_determination = (
-            current_determination
-            or Occurrence.objects.select_related("determination")
-            .values("determination")
-            .get(pk=occurrence.pk)["determination"]
-        )
+        current_determination_id = getattr(current_determination, "pk", current_determination)
+        if current_determination_id is None:
+            current_determination_id = (
+                Occurrence.objects.values_list("determination_id", flat=True).get(pk=occurrence.pk)
+            )

         new_determination = None
         new_score = None
@@
-        if new_determination and new_determination != current_determination:
+        new_determination_id = new_determination.pk if new_determination else None
+        if new_determination_id != current_determination_id:
             logger.debug(f"Changing det. of {occurrence} from {current_determination} to {new_determination}")
             occurrence.determination = new_determination
             needs_update = True
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/main/models.py` around lines 3215 - 3242, The bug is that current_determination is being read as a raw determination id while new_determination is a Taxon instance, causing false positives and preventing clearing of stale determinations; fix by normalizing types before comparison: either load the actual Taxon object for current_determination (e.g. use Occurrence.objects.select_related("determination").get(pk=occurrence.pk).determination) or compare IDs consistently (e.g. get current_determination_id via .values_list("determination", flat=True) and compare to new_determination.pk), and ensure when no identification/prediction remains you set occurrence.determination = None (and update determination_score) so stale determinations are cleared; update the comparisons around current_determination, new_determination, and occurrence.determination_score accordingly (references: current_determination, new_determination, occurrence.find_best_identification, occurrence.find_best_prediction, determination_score).
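The type mismatch at the heart of this comment (a raw FK id compared against a model instance) can be reduced to a small Python sketch. The `Taxon` class and `normalize_pk` helper here are illustrative stand-ins mirroring the suggested fix's `getattr(value, "pk", value)` pattern, not the project's actual code.

```python
class Taxon:
    """Minimal stand-in for a Django model instance with a primary key."""
    def __init__(self, pk):
        self.pk = pk

def normalize_pk(value):
    """Accept an int id, a model-like instance, or None; return the id (or None)."""
    return getattr(value, "pk", value)

current = 42          # fetched via .values("determination"): a bare FK id
new = Taxon(pk=42)    # a full instance pointing at the same row

changed_naive = new != current                              # int vs Taxon: always "changed"
changed_by_id = normalize_pk(new) != normalize_pk(current)  # compares 42 != 42
```

Comparing by normalized id also makes the "clear a stale determination" case natural: when `new` is `None` and the current id is set, the comparison correctly reports a change.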
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/exports/tests.py`:
- Line 376: The unpacking at the call to _create_occurrence_with_prediction
currently assigns unused variables occurrence and classification (e.g.,
"occurrence, classification = self._create_occurrence_with_prediction()") which
triggers RUF059; change the bindings to use the underscore discard pattern
(e.g., "_" or "_classification") for both returned values at each call site
(including the other similar call around line 417) or assign only the needed
value via a single-variable assignment from the function result, updating
references accordingly so unused names are removed.
In `@ami/main/api/serializers.py`:
- Around line 1396-1421: The function get_best_machine_prediction currently
returns None when obj.best_prediction is missing but the API contract requires
the nested object to always be present; change the early-return to return a dict
with the expected keys populated with nullable defaults instead. Specifically,
inside get_best_machine_prediction (and using obj.best_prediction,
TaxonNestedSerializer, AlgorithmNestedSerializer as anchors), always return a
dict with keys taxon, algorithm, score, determination_matches_machine_prediction
where taxon and algorithm are None if no prediction, score is None, and
determination_matches_machine_prediction is None (or False if you prefer
boolean) when no prediction exists; keep the existing logic that fills these
fields when prediction is present.
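The stable response shape this comment asks for can be sketched as a plain function. Names and keys follow the comment's wording; this is an illustration of the reviewer's suggestion, not the project's serializer (and a later commit in this thread reverts to returning `None` instead).

```python
PREDICTION_KEYS = ("taxon", "algorithm", "score", "determination_matches_machine_prediction")

def serialize_best_machine_prediction(prediction):
    """Always return the same keys; values are None when no prediction exists,
    so API clients read nullable fields instead of branching on response type."""
    if prediction is None:
        return {key: None for key in PREDICTION_KEYS}
    return {
        "taxon": prediction["taxon"],
        "algorithm": prediction["algorithm"],
        "score": prediction["score"],
        "determination_matches_machine_prediction": prediction["matches"],
    }

empty = serialize_best_machine_prediction(None)
```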
In `@ami/main/models.py`:
- Around line 3078-3105: find_best_prediction currently calls self.predictions()
which applies per-algorithm max-score filtering and causes a different result
than OccurrenceQuerySet.with_best_machine_prediction(); change
find_best_prediction to mirror with_best_machine_prediction by querying
Classification objects for this occurrence directly (not via predictions()),
ordering by "-terminal", "-score" across all classifications, and returning the
first result so the API/CSV and this method pick the same best machine
prediction; update any references to find_best_prediction/best_prediction if
needed to rely on the new query behavior.
---
Outside diff comments:
In `@ami/exports/format_types.py`:
- Around line 112-145: The serializer currently hardcodes the preview frontend
domain for the best_detection_occurrence_url column (the SerializerMethodField
best_detection_occurrence_url and its getter), making exported CSVs
non-portable; either remove best_detection_occurrence_url from the serializer
fields, or change the getter (e.g., get_best_detection_occurrence_url) to
construct the link from a configurable frontend base URL (read from settings or
an env var like FRONTEND_BASE_URL) and fall back to None if not set; update the
Meta.fields list and the corresponding method(s) (also referenced in the same
file around the other similar fields at the later occurrence) so no literal
hostname is embedded.
In `@ami/main/models.py`:
- Around line 3215-3242: The bug is that current_determination is being read as
a raw determination id while new_determination is a Taxon instance, causing
false positives and preventing clearing of stale determinations; fix by
normalizing types before comparison: either load the actual Taxon object for
current_determination (e.g. use
Occurrence.objects.select_related("determination").get(pk=occurrence.pk).determination)
or compare IDs consistently (e.g. get current_determination_id via
.values_list("determination", flat=True) and compare to new_determination.pk),
and ensure when no identification/prediction remains you set
occurrence.determination = None (and update determination_score) so stale
determinations are cleared; update the comparisons around current_determination,
new_determination, and occurrence.determination_score accordingly (references:
current_determination, new_determination, occurrence.find_best_identification,
occurrence.find_best_prediction, determination_score).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3b1730fd-0066-43a0-8e14-4c23b916b341
📒 Files selected for processing (4)
- ami/exports/format_types.py
- ami/exports/tests.py
- ami/main/api/serializers.py
- ami/main/models.py
Claude says: Thanks for the great feedback @annavik! Addressed in 66d9390:
Exposes the email of the prior human identifier when the best identification was explicitly an "Agree" with another user. Lets consumers trace agreement chains without the backend having to resolve transitivity. Co-Authored-By: Claude <noreply@anthropic.com>
Claude says: Added. While implementing this I noticed a pre-existing UI bug where agreeing with a human identification from the occurrence details card sends the wrong FK type. Filed separately as #1226.
- Add `backfill_determination_score` management command to null-out determination_score on legacy human-determined occurrences. Supports --dry-run and --project filtering.
- Tighten the legacy score-backfill block in Occurrence.save() to skip human-determined occurrences (avoids a redundant query and suppresses the misleading "Could not determine score" warning).
- Fix type mismatch in update_occurrence_determination(): when the caller didn't pass current_determination, the fallback fetched the FK id (int) and compared it against a Taxon instance, which was always unequal and caused unnecessary save() calls and misleading debug logs. Normalize current_determination to an id for the comparison and accept int|Taxon|None.

Tests: BackfillDeterminationScoreTest covers the ML-only, human-determined, and withdrawn-identification cases.

Co-Authored-By: Claude <noreply@anthropic.com>
Actionable comments posted: 2
♻️ Duplicate comments (2)
ami/exports/tests.py (1)
376-376: ⚠️ Potential issue | 🟡 Minor

Fix unused unpacked variable to avoid RUF059.

At Line 376 and Line 417, `classification` is assigned but never used.

Suggested fix

```diff
- occurrence, classification = self._create_occurrence_with_prediction()
+ occurrence, _classification = self._create_occurrence_with_prediction()
...
- occurrence, classification = self._create_occurrence_with_prediction(taxon=self.taxon_a)
+ occurrence, _classification = self._create_occurrence_with_prediction(taxon=self.taxon_a)
```

Also applies to: 417-417
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/exports/tests.py` at line 376, The test assigns an unused second value from self._create_occurrence_with_prediction() to classification causing RUF059; update both call sites where you do "occurrence, classification = self._create_occurrence_with_prediction()" (and the duplicate) to ignore the unused value by renaming it to "_" or "_classification" (e.g., "occurrence, _ = self._create_occurrence_with_prediction()") so the returned but unused classification is not flagged.

ami/main/models.py (1)
3067-3075: ⚠️ Potential issue | 🟠 Major

Align `find_best_prediction()` with the export/API query.

At Line 3075 this still starts from `self.predictions()`, which pre-filters to per-algorithm max scores. `OccurrenceQuerySet.with_best_machine_prediction()` in the same file orders across all classifications, so the model method can disagree with the exported `best_machine_prediction_*` fields for the same occurrence.

Suggested fix

```diff
 def find_best_prediction(self) -> "Classification | None":
     """
     Find the best machine prediction for this occurrence.
@@
     Uses the highest scoring classification (from any algorithm) as the best prediction.
     Considers terminal classifications first, then non-terminal ones.
     (Terminal classifications are the final classifications of a pipeline,
     non-terminal are intermediate models.)
     """
-    return self.predictions().order_by("-terminal", "-score").first()
+    return Classification.objects.filter(detection__occurrence=self).order_by("-terminal", "-score").first()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/main/models.py` around lines 3067 - 3075, The method find_best_prediction() currently calls self.predictions(), which applies the per-algorithm max-score prefilter and can disagree with OccurrenceQuerySet.with_best_machine_prediction() that considers all classifications; update find_best_prediction() to query all Classification rows for this occurrence (e.g., via the Classification model filter by occurrence=self or the raw reverse relation like self.classifications.all()) and then order by "-terminal", "-score" to pick the single best prediction so it matches the logic used in with_best_machine_prediction().
🧹 Nitpick comments (1)
ami/exports/tests.py (1)
465-465: Update docstring to match renamed field semantics.

The text still mentions `verified_by_count`; tests now validate `participant_count` (distinct participants), so this wording is stale.

Suggested fix

```diff
- """Multiple identifications: verified_by_count reflects all non-withdrawn IDs."""
+ """Multiple identifications: participant_count reflects distinct non-withdrawn users."""
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/exports/tests.py` at line 465, Update the test docstring to reflect the renamed field: replace references to "verified_by_count" with "participant_count" and adjust the description to state that participant_count reflects the number of distinct non-withdrawn participants (or similar wording used elsewhere in the codebase). Locate the docstring in ami/exports/tests.py (the triple-quoted string currently reading "Multiple identifications: verified_by_count reflects all non-withdrawn IDs.") and change it to mention participant_count and distinct participants so the docstring matches the assertions in the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/exports/format_types.py`:
- Around line 163-165: The method get_agreed_with_user currently returns the
collaborator's email (agreed_with_user_email) which leaks PII; change it to
return a non-PII identifier or display name instead (e.g., agreed_with_user_id
or agreed_with_user_display_name) by reading the corresponding attribute from
obj and falling back to None if absent—update get_agreed_with_user to use that
non-PII field rather than agreed_with_user_email and ensure callers/CSV exporter
expect the new identifier.
In `@ami/main/models.py`:
- Around line 3224-3229: The current guard only sets occurrence.determination
when new_determination is truthy, leaving a stale determination when everything
authoritative is removed; update the logic around
new_determination/current_determination_id so that if new_determination is None
but current_determination_id is set you explicitly clear
occurrence.determination (set to None), mark needs_update = True, and log the
change (similar to the existing logger.debug), otherwise keep the existing
assignment when new_determination differs; touch the block that references
new_determination, occurrence.determination, current_determination_id, and
needs_update to implement this behavior.
---
Duplicate comments:
In `@ami/exports/tests.py`:
- Line 376: The test assigns an unused second value from
self._create_occurrence_with_prediction() to classification causing RUF059;
update both call sites where you do "occurrence, classification =
self._create_occurrence_with_prediction()" (and the duplicate) to ignore the
unused value by renaming it to "_" or "_classification" (e.g., "occurrence, _ =
self._create_occurrence_with_prediction()") so the returned but unused
classification is not flagged.
In `@ami/main/models.py`:
- Around line 3067-3075: The method find_best_prediction() currently calls
self.predictions(), which applies the per-algorithm max-score prefilter and can
disagree with OccurrenceQuerySet.with_best_machine_prediction() that considers
all classifications; update find_best_prediction() to query all Classification
rows for this occurrence (e.g., via the Classification model filter by
occurrence=self or the raw reverse relation like self.classifications.all()) and
then order by "-terminal", "-score" to pick the single best prediction so it
matches the logic used in with_best_machine_prediction().
---
Nitpick comments:
In `@ami/exports/tests.py`:
- Line 465: Update the test docstring to reflect the renamed field: replace
references to "verified_by_count" with "participant_count" and adjust the
description to state that participant_count reflects the number of distinct
non-withdrawn participants (or similar wording used elsewhere in the codebase).
Locate the docstring in ami/exports/tests.py (the triple-quoted string currently
reading "Multiple identifications: verified_by_count reflects all non-withdrawn
IDs.") and change it to mention participant_count and distinct participants so
the docstring matches the assertions in the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a6709b2e-0997-453a-bfba-a309722205ef
📒 Files selected for processing (5)
- ami/exports/format_types.py
- ami/exports/tests.py
- ami/main/management/commands/backfill_determination_score.py
- ami/main/models.py
- ami/main/tests.py
…otation
The API's cached `best_prediction` and the CSV export's annotated
`best_machine_prediction_*` fields could pick different rows for the same
occurrence:
- `find_best_prediction()` went through `self.predictions()`, which filters to
the max-score row per algorithm **before** applying the `-terminal, -score`
ordering. A high-score non-terminal could survive dedup and then lose to a
lower-score terminal on ordering.
- `with_best_machine_prediction()` orders directly over all classifications, so
it picks the terminal winner regardless of intra-algorithm dedup.
Align `find_best_prediction()` to query classifications directly with the same
`-terminal, -score, -pk` ordering. Add `-pk` to both so ties break
deterministically.
Also:
- Serializer `get_best_machine_prediction()` now returns a stable shape
({taxon: null, algorithm: null, score: null, ...}) when no prediction exists,
instead of null — clients read nullable fields instead of branching on type.
- `update_occurrence_determination()` now clears `occurrence.determination` when
the last non-withdrawn identification is deleted and no prediction remains,
instead of leaving a stale taxon.
Tests: new `test_api_and_csv_pick_same_best_prediction_with_mixed_terminal`
covers the divergence case.
Co-Authored-By: Claude <noreply@anthropic.com>
- Extract BEST_MACHINE_PREDICTION_ORDER as a shared constant used by both Occurrence.find_best_prediction() (row-at-a-time) and OccurrenceQuerySet.with_best_machine_prediction() (bulk annotation). Both methods now reference the constant and each other's docstrings, so future edits to the ordering touch one place and can't silently diverge. - Revert get_best_machine_prediction() to return None when no prediction exists (it was briefly returning a dict of nulls). Annotate the method's return type as `dict | None` so drf-spectacular emits nullability. - Add UpdateOccurrenceDeterminationTest covering the stale-determination clearing fix: withdrawing the only identification on an occurrence with no predictions now clears occurrence.determination (previously left a stale taxon in place). Co-Authored-By: Claude <noreply@anthropic.com>
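As a rough illustration of the shared ordering this commit extracts, the Django `order_by` fields can be translated into a Python sort key. The tuple contents are an assumption based on the "-terminal, -score, -pk" ordering named in the commits above; the constant name comes from the commit message.

```python
# Assumed contents of the shared ordering constant: terminal classifications
# first, then higher score, then higher pk so ties break deterministically.
BEST_MACHINE_PREDICTION_ORDER = ("-terminal", "-score", "-pk")

def sort_key(classification):
    """Translate the descending order_by fields into a max() key:
    larger tuples win, so True terminal > False, higher score, higher pk."""
    return (classification["terminal"], classification["score"], classification["pk"])

rows = [
    {"pk": 1, "terminal": False, "score": 0.99},  # high-score intermediate model
    {"pk": 2, "terminal": True, "score": 0.80},   # terminal pipeline output
    {"pk": 3, "terminal": True, "score": 0.80},   # tie on (terminal, score)
]
best = max(rows, key=sort_key)
```

This reproduces the divergence case the previous commit fixed: the terminal row beats the higher-scoring non-terminal row, and the `pk` tiebreaker picks one winner deterministically.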
When I verify an occurrence, the determination score goes to None as we designed in this PR, but! That makes the occurrence disappear since our default filters filter on a minimum determination score. And it sorts them last (if you disable the default filtering). Some options:
- Use the best machine prediction score as the determination score - I think this is the behavior we were expecting. It doesn't feel quite right from a data integrity perspective, but it is nice and simple.
- Update all queries and API endpoints and display sorting, etc. Default filters should allow determination scores of None or ignore determination score if the occurrence is identified. Use the best_machine_score instead of the determination score where possible & appropriate.
- Revert the behavior - This PR was meant to be about updating the export format - maybe do a custom override of determination score just for the export (determination_score = None if occurrence is verified)
- Something else - use `best_machine_score` in at least one place in the API/UI, like on the taxon details page where it shows the "best score" for all occurrences of a species.
I'm inclined to keep the behavior of determination_score as-is (in main) and focus on introducing
Oh I see! I think option 1 sounds like a good option for now, together with this logic:
Per discussion on #1214, narrow this PR to the export format changes only. Drop the determination_score=None-for-humans behavior and the backfill command that supported it.

What stays: BEST_MACHINE_PREDICTION_ORDER, with_best_machine_prediction(), with_verification_info(), the expanded with_best_detection() (source-image fields), all CSV export columns, and the best_machine_prediction nested object on OccurrenceListSerializer. None of these read or mutate determination_score, so existing sorting / filtering / default-filter behavior is unchanged.

What goes:
- Occurrence.find_best_prediction() / find_best_identification() extractions (best_prediction / best_identification cached_properties restored to main's bodies)
- get_determination_score() — restored to return best_identification.score for human-determined occurrences
- update_occurrence_determination() — restored to main (loses the FK type fix and the stale-determination-clearing fix; both are real bugs but out of scope for this PR)
- Occurrence.save() legacy backfill — restored to main
- backfill_determination_score management command + its test
- UpdateOccurrenceDeterminationTest (covered the stale-det fix that's no longer in scope)

Sortable best_machine_prediction_score on the API queryset is tracked separately in #1242.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude says: Pushed 6de816a to narrow this PR to just the export-format additions per @mihow's direction above. Reverted (back to
Kept (the actual export feature):
Actionable comments posted: 1
♻️ Duplicate comments (2)
ami/main/models.py (1)
2828-2850: ⚠️ Potential issue | 🟠 Major

Keep `Occurrence.best_prediction` aligned with this queryset annotation.

This path now picks the winner from all classifications, but `Occurrence.best_prediction` still goes through `Occurrence.predictions()`, which first collapses to per-algorithm max scores. The API and CSV can therefore disagree again when one algorithm emits a higher-scoring non-terminal row and a lower-scoring terminal row. The new mixed-terminal test also won’t catch that case because it uses two different algorithms.

Suggested fix

```diff
 @functools.cached_property
 def best_prediction(self):
-    return self.predictions().order_by("-terminal", "-score").first()
+    return (
+        Classification.objects.filter(detection__occurrence=self)
+        .select_related("taxon", "algorithm")
+        .order_by(*BEST_MACHINE_PREDICTION_ORDER)
+        .first()
+    )
```
Verify each finding against the current code and only fix it if needed. In `@ami/main/models.py` around lines 2828-2850, Occurrence.best_prediction is out-of-sync with with_best_machine_prediction because the method currently uses Occurrence.predictions() which collapses to per-algorithm max scores; update Occurrence.best_prediction to pick the winner from all Classification rows (same logic as with_best_machine_prediction) by querying Classification.objects.filter(detection__occurrence=self) and ordering by BEST_MACHINE_PREDICTION_ORDER, then returning the first result (or None) so API/CSV use the identical selection rules; ensure you reference BEST_MACHINE_PREDICTION_ORDER and use the same fields (taxon, score, algorithm, taxon_id) and tiebreaker logic as in with_best_machine_prediction.

ami/exports/tests.py (1)
376-376: ⚠️ Potential issue | 🟡 Minor

Drop the unused `classification` bindings.

Both unpacked `classification` variables are unused, so Ruff will flag these with RUF059.

Suggested fix

```diff
-        occurrence, classification = self._create_occurrence_with_prediction()
+        occurrence, _classification = self._create_occurrence_with_prediction()
 ...
-        occurrence, classification = self._create_occurrence_with_prediction(taxon=self.taxon_a)
+        occurrence, _classification = self._create_occurrence_with_prediction(taxon=self.taxon_a)
```

Also applies to: 414-414
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/exports/tests.py` at line 376, The tuple unpacking from _create_occurrence_with_prediction() binds an unused classification variable; update the call sites (e.g., where you have "occurrence, classification = self._create_occurrence_with_prediction()") to ignore the second value by using a throwaway name (e.g., "occurrence, _ = self._create_occurrence_with_prediction()") or only grab the first element, and apply the same change to the other occurrence of this pattern in the file so Ruff RUF059 is not triggered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/main/api/serializers.py`:
- Around line 1396-1425: get_best_machine_prediction currently calls
obj.best_prediction and then accesses prediction.taxon/algorithm which triggers
per-occurrence DB queries; change it to use annotated/prefetched data instead:
ensure the queryset that feeds this serializer prefetches/annotates the
best_prediction relation and its taxon and algorithm (e.g.
prefetch_related('best_prediction__taxon','best_prediction__algorithm') or
annotate best_prediction_* fields), then in get_best_machine_prediction read
those prefetched/annotated attributes (e.g. use getattr(obj, 'best_prediction',
None) only after prefetch, access obj.best_prediction.taxon and
obj.best_prediction.algorithm without causing lazy loads, or construct
taxon_data/algorithm_data from annotated fields) and call TaxonNestedSerializer
/ AlgorithmNestedSerializer on the already-prefetched instances so no extra
queries are executed per occurrence.
---
Duplicate comments:
In `@ami/exports/tests.py`:
- Line 376: The tuple unpacking from _create_occurrence_with_prediction() binds
an unused classification variable; update the call sites (e.g., where you have
"occurrence, classification = self._create_occurrence_with_prediction()") to
ignore the second value by using a throwaway name (e.g., "occurrence, _ =
self._create_occurrence_with_prediction()") or only grab the first element, and
apply the same change to the other occurrence of this pattern in the file so
Ruff RUF059 is not triggered.
In `@ami/main/models.py`:
- Around line 2828-2850: Occurrence.best_prediction is out-of-sync with
with_best_machine_prediction because the method currently uses
Occurrence.predictions() which collapses to per-algorithm max scores; update
Occurrence.best_prediction to pick the winner from all Classification rows (same
logic as with_best_machine_prediction) by querying
Classification.objects.filter(detection__occurrence=self) and ordering by
BEST_MACHINE_PREDICTION_ORDER, then returning the first result (or None) so
API/CSV use the identical selection rules; ensure you reference
BEST_MACHINE_PREDICTION_ORDER and use the same fields (taxon, score, algorithm,
taxon_id) and tiebreaker logic as in with_best_machine_prediction.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ef0a4b08-fc54-433f-8400-029a0d2dec1d
📒 Files selected for processing (3)
ami/exports/tests.py
ami/main/api/serializers.py
ami/main/models.py
Avoids per-row lazy loads when callers serialize the result. Affects both the new OccurrenceListSerializer.best_machine_prediction field and the existing get_determination_details path in the same serializer. Per CodeRabbit thread on #1214 — full annotation-driven serialization remains tracked in #1242. Co-Authored-By: Claude <noreply@anthropic.com>
…level

Sits at the same level as `determination` / `determination_score` rather than nested under `best_machine_prediction`. The flag describes the determination's relationship to the prediction, so it doesn't belong as an attribute of the prediction object — particularly once `determination` becomes its own nested object.

Co-Authored-By: Claude <noreply@anthropic.com>
…diction from API; share public-URL builder

- Drop determination_matches_machine_prediction from OccurrenceListSerializer. API consumers can compare taxon IDs directly; CSV consumers compare species name strings, so the field stays in the tabular export only.
- Add SourceImage.build_public_url(base_url, path) staticmethod and reuse it from both SourceImage.public_url() and the export's get_best_detection_source_image_url(). The export deliberately works from annotated path + public_base_url to avoid loading a SourceImage row per occurrence; presigned URLs (private buckets) aren't supported in that path for the same reason.

Addresses review feedback on PR #1214.

Co-Authored-By: Claude <noreply@anthropic.com>
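For reference, a minimal pure-Python sketch of what a shared base-URL/path joiner like this could look like. The slash-normalization rule here is an assumption for illustration; the actual `SourceImage.build_public_url` implementation may handle more cases (query strings, empty paths, etc.):

```python
def build_public_url(base_url: str, path: str) -> str:
    """Join a public base URL and an object path into one URL.

    Sketch only: assumes simple slash normalization between the two parts.
    """
    if not base_url:
        # No public base configured: return the raw path unchanged.
        return path
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


print(build_public_url("https://cdn.example.org/captures/", "/2024/01/img.jpg"))
```

Centralizing the join in one helper is what lets both the model's `public_url()` and the export's annotated-path code produce identical URLs.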
… → bool
- Rename `best_detection_source_image_url` (CSV column) and the underlying
`best_detection_source_image_{path,public_base_url}` queryset annotations to
`best_detection_capture_*` to match the public "Capture" terminology used
in the UI. Internal model is still SourceImage; only externally-visible
names change.
- Change `verification_status` from "Verified"/"Not verified" strings to a
bool (True/False). The hardcoded strings were awkward, and the field is now
consistent with the bool-shaped `determination_matches_machine_prediction`.
Addresses review feedback on PR #1214.
Co-Authored-By: Claude <noreply@anthropic.com>






Summary
Adds machine-prediction and verification fields to the CSV occurrence export and an equivalent `best_machine_prediction` object to the API list serializer, so researchers can see ML predictions side-by-side with human verifications.

Scope narrowed (per discussion): this PR no longer changes `determination_score` behavior. Sortable `best_machine_prediction_score` on the OccurrenceViewSet is split out to #1242 so this PR can land quickly.

What's in the PR
- `OccurrenceQuerySet.with_best_machine_prediction()` — annotates `best_machine_prediction_{name,score,algorithm,taxon_id}`
- `OccurrenceQuerySet.with_verification_info()` — annotates `verified_by_name`, `participant_count`, `agreed_with_{algorithm_name,user_email}`
- `OccurrenceQuerySet.with_best_detection()` extended with `best_detection_capture_{path,public_base_url}` annotations (capture = source image; matches the public UI terminology)
- `OccurrenceTabularSerializer` (CSV) — new columns below
- `OccurrenceListSerializer.best_machine_prediction` — nested object on the API list response
- `BEST_MACHINE_PREDICTION_ORDER` module-level constant — defines `("-terminal", "-score", "-pk")` ordering for the queryset annotation
- `Occurrence.predictions()` — `select_related("taxon", "algorithm")` added so callers serializing the result (`OccurrenceListSerializer.best_machine_prediction`, the existing `get_determination_details`) don't lazy-load FKs
- `SourceImage.build_public_url(base_url, path)` staticmethod — small shared helper used by both `SourceImage.public_url()` and the export's `get_best_detection_source_image_url`
- Review feedback applied: `verified_by_count` → `participant_count` (distinct users), removed `best_detection_occurrence_url`, added `agreed_with_user` (`select_related` above), `distinct=True` on counts, withdrawn-id filtering, timezone-safe tests
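As a plain-Python illustration of the `BEST_MACHINE_PREDICTION_ORDER` tiebreaker — the real annotation runs in SQL via the queryset, but the `("-terminal", "-score", "-pk")` ordering is equivalent to this toy version (class and field names here are stand-ins, not the actual model):

```python
from dataclasses import dataclass


@dataclass
class Prediction:  # stand-in for a Classification row
    pk: int
    terminal: bool
    score: float


def best_machine_prediction(rows):
    # Equivalent of order_by("-terminal", "-score", "-pk"):
    # prefer terminal classifications, then higher score, then newer pk.
    return max(rows, key=lambda r: (r.terminal, r.score, r.pk), default=None)


rows = [
    Prediction(pk=1, terminal=False, score=0.95),  # high score, but non-terminal
    Prediction(pk=2, terminal=True, score=0.70),
    Prediction(pk=3, terminal=True, score=0.88),
]
print(best_machine_prediction(rows))  # the terminal row with the highest score wins
```

The point of the shared constant is exactly this: a terminal classification at 0.88 beats a non-terminal one at 0.95, and both the CSV export and the API must agree on that rule.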
- `Occurrence.determination_score` behavior is unchanged from `main` — human verifications still write the `Identification.score` (1.0) into `determination_score`, so existing default filters / sorting / UI continue to work
- Sorting by `best_machine_prediction_score` via the API → Make best_machine_prediction_score available for sort/filter on OccurrenceViewSet #1242
- Annotation-driven `best_machine_prediction` (eliminates the pre-existing per-row `obj.best_prediction` query, shared with `get_determination_details`) → Make best_machine_prediction_score available for sort/filter on OccurrenceViewSet #1242
- `update_occurrence_determination` FK type fix → can be revisited in a focused PR

New CSV export columns
| Column | Example |
| --- | --- |
| `best_machine_prediction_name` | Idia aemula |
| `best_machine_prediction_algorithm` | moth-classifier-v2 |
| `best_machine_prediction_score` | 0.881 |
| `verified_by` | Jane Smith |
| `participant_count` | 2 |
| `agreed_with_algorithm` | moth-classifier-v2 |
| `agreed_with_user` | jane@example.org |
| `determination_matches_machine_prediction` | True |
| `best_detection_bbox` | [0.1, 0.1, 0.5, 0.5] |
| `best_detection_capture_url` | https://s3.../image.jpg |

Existing column behavior change
`verification_status` now exports a boolean (True/False) instead of the strings `"Verified"` / `"Not verified"`. CSV consumers that string-matched on those values will need to update.
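For downstream CSV scripts, the update is roughly the following sketch. It assumes the boolean is serialized as the strings `"True"`/`"False"` (how Python's `csv` writer renders `str(bool)`); verify against an actual export before relying on it:

```python
def is_verified(row: dict) -> bool:
    """Interpret the verification_status column from an occurrence CSV row.

    Handles both the new boolean format and the old string format.
    """
    value = row["verification_status"]
    if value in ("True", "False"):      # new boolean format
        return value == "True"
    return value == "Verified"          # old string format, kept for back-compat
```

Accepting both formats lets consumer scripts keep working across exports produced before and after this change.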
agreed_with_useracross rows themselves — the backend exposes the direct link only.New API fields on
OccurrenceListSerializerbest_machine_prediction— nested object, always populated regardless of verification statusAPI consumers can compare
determination.idtobest_machine_prediction.taxon.iddirectly, so the boolean isn't repeated in the API response (it's exported in the CSV asdetermination_matches_machine_prediction, where consumers can't easily compare nested taxa).determination_scoreretains its existing semantics (1.0 when a human has identified the occurrence; the ML score otherwise):{ "id": 993, "determination": {"id": 4205, "name": "Catocala relicta"}, "determination_score": 1.0, "best_machine_prediction": { "taxon": {"id": 4205, "name": "Idia aemula"}, "algorithm": {"id": 3, "name": "moth-classifier-v2"}, "score": 0.881 }, "identifications": [...] }Test plan
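Chaining the `agreed_with_user` links mentioned above can be done consumer-side in a few lines. A hedged sketch — the two dict inputs are hypothetical structures a consumer would build from the exported columns, not something the backend provides:

```python
def resolve_agreement_chain(start_user, agreed_with_user, agreed_with_algorithm):
    """Walk direct agreement links until reaching a user who agreed with
    an algorithm, or until the chain ends.

    agreed_with_user: maps a verifier's email to the email they agreed with.
    agreed_with_algorithm: maps an email to the algorithm name they agreed with.
    Returns (chain of emails, algorithm name or None).
    """
    chain = [start_user]
    seen = {start_user}
    user = start_user
    while user in agreed_with_user:
        user = agreed_with_user[user]
        if user in seen:  # guard against cycles in malformed data
            break
        chain.append(user)
        seen.add(user)
    return chain, agreed_with_algorithm.get(user)
```

So if B agreed with A and A agreed with the classifier, `resolve_agreement_chain("b@x.org", ...)` walks B → A and reports the algorithm A agreed with.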
- New export tests pass (`ami.exports.tests.ExportNewFieldsTest` — 19 tests)
- Existing occurrence tests pass (`ami.main.tests -k occurrence`)
- `lint` and `test` checks pass

Diff vs
`main`

Follow-up
- Make `best_machine_prediction_score` sortable on `OccurrenceViewSet` and drive the API field from the queryset annotation (eliminates the pre-existing per-row `Classification` query in `obj.best_prediction`)
- `update_occurrence_determination()` — separate PR

Closes #1213