
feat: add additional prediction and identification fields to CSV export#1214

Merged
mihow merged 15 commits into main from feat/export-machine-prediction-fields
Apr 17, 2026
Conversation

@mihow
Collaborator

@mihow mihow commented Apr 8, 2026

Summary

Adds machine-prediction and verification fields to the CSV occurrence export and an equivalent best_machine_prediction object to the API list serializer, so researchers can see ML predictions side-by-side with human verifications.

Scope narrowed (per discussion): this PR no longer changes determination_score behavior. Sortable best_machine_prediction_score on the OccurrenceViewSet is split out to #1242 so this PR can land quickly.

What's in the PR

  • OccurrenceQuerySet.with_best_machine_prediction() — annotates best_machine_prediction_{name,score,algorithm,taxon_id}
  • OccurrenceQuerySet.with_verification_info() — annotates verified_by_name, participant_count, agreed_with_{algorithm_name,user_email}
  • OccurrenceQuerySet.with_best_detection() extended with best_detection_capture_{path,public_base_url} annotations (capture = source image; matches the public UI terminology)
  • OccurrenceTabularSerializer (CSV) — new columns below
  • OccurrenceListSerializer.best_machine_prediction — nested object on the API list response
  • BEST_MACHINE_PREDICTION_ORDER module-level constant — defines ("-terminal", "-score", "-pk") ordering for the queryset annotation
  • Occurrence.predictions() — select_related("taxon", "algorithm") added so callers serializing the result (OccurrenceListSerializer.best_machine_prediction, the existing get_determination_details) don't lazy-load FKs
  • SourceImage.build_public_url(base_url, path) staticmethod — small shared helper used by both SourceImage.public_url() and the export's get_best_detection_source_image_url
  • Per @annavik / @mihow feedback: verified_by_count → participant_count (distinct users), removed best_detection_occurrence_url, added agreed_with_user
  • Addressed CodeRabbit review feedback: N+1 query fixes (incl. the select_related above), distinct=True on counts, withdrawn-id filtering, timezone-safe tests
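The ("-terminal", "-score", "-pk") ordering prefers terminal classifications, then higher scores, then newer rows. A pure-Python sketch of that tie-breaking (the dict fields here are illustrative stand-ins for Classification columns, not the actual ORM annotation):

```python
# Sketch of the BEST_MACHINE_PREDICTION_ORDER tie-breaking:
# terminal classifications first, then higher score, then higher pk.
# Field names are illustrative, not the real queryset annotation.

def best_machine_prediction(classifications):
    """Pick the winning classification per ("-terminal", "-score", "-pk")."""
    if not classifications:
        return None
    return max(classifications, key=lambda c: (c["terminal"], c["score"], c["pk"]))

preds = [
    {"pk": 1, "terminal": False, "score": 0.95, "taxon": "Noctuidae"},
    {"pk": 2, "terminal": True, "score": 0.881, "taxon": "Idia aemula"},
    {"pk": 3, "terminal": True, "score": 0.40, "taxon": "Catocala relicta"},
]
print(best_machine_prediction(preds)["taxon"])  # → Idia aemula
```

Note that the non-terminal family-level prediction loses despite its higher raw score, which is exactly what the annotation's ordering encodes.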

What's not in this PR (intentional)

New CSV export columns

| Column | Example value | Description |
| --- | --- | --- |
| best_machine_prediction_name | Idia aemula | Top ML prediction taxon |
| best_machine_prediction_algorithm | moth-classifier-v2 | Algorithm that produced the prediction |
| best_machine_prediction_score | 0.881 | ML confidence score |
| verified_by | Jane Smith | Display name of the most recent human identifier |
| participant_count | 2 | Number of distinct users with non-withdrawn identifications |
| agreed_with_algorithm | moth-classifier-v2 | Algorithm name if the human explicitly agreed with an ML prediction |
| agreed_with_user | jane@example.org | Email of the prior identifier if the best identification was an "Agree" with another user |
| determination_matches_machine_prediction | True | Does the human ID match the ML prediction? |
| best_detection_bbox | [0.1, 0.1, 0.5, 0.5] | Bounding box coordinates |
| best_detection_capture_url | https://s3.../image.jpg | Public URL to the source capture (full-frame original) |
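Since every CSV value arrives as a string (including the new boolean columns), a consumer-side sketch of parsing these columns with only the standard library (the data row and column subset below are made up):

```python
import csv
import io

# Hypothetical export excerpt using a few of the new columns.
csv_text = """best_machine_prediction_name,best_machine_prediction_score,determination_matches_machine_prediction,participant_count
Idia aemula,0.881,True,2
"""

parsed = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # csv.DictReader yields strings, so booleans and numbers must be parsed explicitly.
    parsed.append({
        "name": row["best_machine_prediction_name"],
        "score": float(row["best_machine_prediction_score"]),
        "matches": row["determination_matches_machine_prediction"] == "True",
        "participants": int(row["participant_count"]),
    })

print(parsed[0])  # {'name': 'Idia aemula', 'score': 0.881, 'matches': True, 'participants': 2}
```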

Existing column behavior change

  • verification_status now exports a boolean (True/False) instead of the strings "Verified"/"Not verified". CSV consumers that string-matched on those values will need to update.

Consumers that want to follow transitive agreement chains (B agreed with A, A agreed with ML) can do so by chaining agreed_with_user across rows themselves — the backend exposes the direct link only.
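A sketch of that consumer-side chain walking (the row structure, keyed by identifier email, is an assumption for illustration; the export only guarantees the direct agreed_with_user link per row):

```python
# Walk agreed_with_user links across exported rows to resolve a chain.
# Row shape is illustrative: one row per identifier, keyed by email.
rows = {
    "b@example.org": {"agreed_with_user": "a@example.org", "agreed_with_algorithm": ""},
    "a@example.org": {"agreed_with_user": "", "agreed_with_algorithm": "moth-classifier-v2"},
}

def agreement_chain(start, rows):
    """Return the chain of emails, stopping at a dead end or a cycle."""
    chain, current, seen = [start], start, {start}
    while True:
        nxt = rows.get(current, {}).get("agreed_with_user")
        if not nxt or nxt in seen:
            return chain
        chain.append(nxt)
        seen.add(nxt)
        current = nxt

print(agreement_chain("b@example.org", rows))  # → ['b@example.org', 'a@example.org']
```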

New API fields on OccurrenceListSerializer

  • best_machine_prediction — nested object, always populated regardless of verification status

API consumers can compare determination.id to best_machine_prediction.taxon.id directly, so the boolean isn't repeated in the API response (it's exported in the CSV as determination_matches_machine_prediction, where consumers can't easily compare nested taxa).

determination_score retains its existing semantics (1.0 when a human has identified the occurrence; the ML score otherwise):

{
  "id": 993,
  "determination": {"id": 4205, "name": "Catocala relicta"},
  "determination_score": 1.0,
  "best_machine_prediction": {
    "taxon": {"id": 4205, "name": "Idia aemula"},
    "algorithm": {"id": 3, "name": "moth-classifier-v2"},
    "score": 0.881
  },
  "identifications": [...]
}
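Using the payload above, the client-side equivalent of the CSV's determination_matches_machine_prediction is a one-line ID comparison:

```python
# Response dict copied from the example payload above.
response = {
    "id": 993,
    "determination": {"id": 4205, "name": "Catocala relicta"},
    "determination_score": 1.0,
    "best_machine_prediction": {
        "taxon": {"id": 4205, "name": "Idia aemula"},
        "algorithm": {"id": 3, "name": "moth-classifier-v2"},
        "score": 0.881,
    },
}

prediction = response.get("best_machine_prediction")
# Compare taxon IDs, not names, guarding against a missing prediction.
matches = bool(prediction) and response["determination"]["id"] == prediction["taxon"]["id"]
print(matches)  # → True
```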

Test plan

  • Export field tests covering: ML-only, agreeing human, disagreeing human, multiple identifications, human-agrees-with-human, bbox, field presence (ami.exports.tests.ExportNewFieldsTest — 19 tests pass)
  • All occurrence-related main tests pass (ami.main.tests -k occurrence)
  • All identification / determination main tests pass
  • CI lint and test checks pass

Diff vs main

ami/exports/format_types.py |  81 ++++++++++++--
ami/exports/tests.py        | 251 +++++++++++++++++++++++++++++++++++++++++++-
ami/main/api/serializers.py |  28 +++++
ami/main/models.py          |  90 +++++++++++++++-
4 files changed, 437 insertions(+), 13 deletions(-)

Follow-up

  • Make best_machine_prediction_score available for sort/filter on OccurrenceViewSet #1242 — Make best_machine_prediction_score sortable on OccurrenceViewSet and drive the API field from the queryset annotation (eliminates the pre-existing per-row Classification query in obj.best_prediction)
  • Stale-determination-clearing fix in update_occurrence_determination() — separate PR
  • "Verified label" UX changes (do not filter verified out by score threshold) — separate ticket

Closes #1213

…elds

Separate machine predictions from human identifications in exports and API,
so researchers see both side-by-side. Previously the determination was
overwritten when a human verified, losing the original ML prediction.

Model layer:
- Extract find_best_prediction() and find_best_identification() from
  update_occurrence_determination() for reuse by exports and API
- Set determination_score to None for human-determined occurrences
  (ML confidence preserved in best_machine_prediction_score)

New CSV export fields:
- best_machine_prediction_name, _algorithm, _score
- verified_by, verified_by_count, agreed_with_algorithm
- determination_matches_machine_prediction
- best_detection_bbox, best_detection_source_image_url,
  best_detection_occurrence_url

API changes:
- Add best_machine_prediction nested object to OccurrenceListSerializer
  (always populated regardless of verification status)

Closes #1213

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 8, 2026 02:07
@netlify

netlify bot commented Apr 8, 2026

Deploy Preview for antenna-ssec canceled.

🔨 Latest commit d50d775
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/69e1d68a80ce6b000714d214

@netlify

netlify bot commented Apr 8, 2026

Deploy Preview for antenna-preview canceled.

🔨 Latest commit d50d775
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/69e1d68ad933c40008662eea

@coderabbitai
Contributor

coderabbitai bot commented Apr 8, 2026

📝 Walkthrough

Walkthrough

The PR extends the occurrence export and API to separately expose machine predictions, verification metadata, and detection details alongside human determinations. It adds new fields to the tabular serializer and API response, implements queryset annotations for efficient data retrieval, and updates verification-status computation logic to prioritize explicit participant counts.

Changes

Cohort / File(s) Summary
Queryset Annotations
ami/main/models.py
Added BEST_MACHINE_PREDICTION_ORDER constant and three new queryset methods: with_best_machine_prediction() annotates best-classification fields (name, algorithm, score, taxon ID), with_best_detection() extended with source-image path/URL metadata, and with_verification_info() annotates verification-related fields (verified-by name, participant count, user agreement names/emails).
Tabular Export Serializer
ami/exports/format_types.py
Added 10 new fields to OccurrenceTabularSerializer: machine prediction (best_machine_prediction_name, best_machine_prediction_algorithm, best_machine_prediction_score), verification (verified_by, participant_count, agreed_with_algorithm, agreed_with_user, determination_matches_machine_prediction), and detection (best_detection_bbox, best_detection_source_image_url). Modified get_verification_status() logic to prioritize participant_count annotation when available, with fallback to non-withdrawn identification count. Implemented 6 new getter methods to populate verification/determination comparison and detection URL fields.
API Serializer
ami/main/api/serializers.py
Added best_machine_prediction nested field to OccurrenceListSerializer returning dict with taxon, algorithm, score, and determination-match boolean; handles missing prediction data gracefully.
Export Field Tests
ami/exports/tests.py
Added comprehensive ExportNewFieldsTest test case validating new CSV export columns across 8 scenarios: machine-only predictions, human agreements/disagreements with machine, cross-human agreements, multiple participants, bbox export, terminal-classification ordering, and column header completeness.
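The prioritization in get_verification_status() described above can be sketched as plain Python (the annotation value and identification dicts are illustrative; the real fallback counts non-withdrawn Identification rows):

```python
def verification_status(participant_count, identifications):
    """Prefer the precomputed participant_count annotation when present;
    otherwise fall back to counting non-withdrawn identifications."""
    if participant_count is not None:
        return participant_count > 0
    return any(not ident["withdrawn"] for ident in identifications)

print(verification_status(2, []))                        # → True  (annotation wins)
print(verification_status(None, [{"withdrawn": True}]))  # → False (fallback, all withdrawn)
```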

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 Predictions now bloom in fields anew,
Machine and human, side by side they grew,
With scores and verifiers for all to see,
The data now speaks more truthfully! ✨📊

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed The PR successfully implements all primary coding requirements from #1213: adds best_machine_prediction queryset annotations and serializer fields, implements verification/identification fields with proper N+1 query fixes, exposes detection metadata, adds comprehensive test coverage (19 passing tests), and implements the API nested object. The one intentional deviation (determination_score not set to null, per narrowed scope per discussion) is explicitly documented.
Out of Scope Changes check ✅ Passed All changes are directly aligned with #1213 objectives: queryset annotations, serializer fields, export columns, and API object additions. The PR explicitly documents and justifies narrowed scope (determination_score/backfill reverted, sortability deferred to #1242, UI/integrity checks as separate follow-ups). No unrelated changes detected.
Title check ✅ Passed The title accurately summarizes the main objective of the PR: adding prediction and identification fields to CSV export, which aligns with the primary changeset across format_types.py, models.py, and tests.py.
Description check ✅ Passed PR description provides comprehensive detail on changes, motivation, test coverage, and follow-ups. Includes Summary, What's in the PR, What's not in the PR, new CSV columns, new API fields, Test plan, and Closes #1213.




Contributor

Copilot AI left a comment


Pull request overview

This PR updates occurrence exports and the API to preserve and expose both machine predictions and human verifications side-by-side, preventing ML predictions from being lost when a human identification is added.

Changes:

  • Refactors occurrence logic to expose reusable “best prediction” and “best identification” selection methods and adjusts determination score semantics for human IDs.
  • Extends CSV exports with new machine prediction, verification, and detection-related fields backed by queryset annotations.
  • Adds a best_machine_prediction nested object to the occurrences list API serializer.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
ami/main/models.py Adds queryset annotations for best detection/prediction/verification and refactors best prediction/identification selection + determination score update logic.
ami/main/api/serializers.py Adds best_machine_prediction field to occurrence list API responses.
ami/exports/format_types.py Adds new CSV export fields and wires exporter queryset to include new annotations.
ami/exports/tests.py Adds tests covering the new CSV export fields across ML-only and human verification scenarios.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/exports/format_types.py`:
- Around line 192-204: Use the annotated winning detection already attached to
the exported object instead of recomputing it: in
get_best_detection_source_image_url and get_best_detection_occurrence_url, read
the annotated detection from the obj (e.g. getattr(obj, "best_detection", None)
or the specific annotated attributes like
best_detection_source_image_path/public_base_url on that detection) and build
both the source image URL and the occurrence/context URL from that detection's
stored fields (path, public_base_url or signed_url, and occurrence/context link)
rather than calling obj.context_url() which re-queries and may use a different
ordering; this ensures URLs correspond to the exported bbox/path and avoids N+1
queries.

In `@ami/main/api/serializers.py`:
- Around line 1396-1421: get_best_machine_prediction currently calls
obj.find_best_prediction() and directly accesses prediction.taxon and
prediction.algorithm, causing an N+1; change it to reuse obj.best_prediction
(use that if present, fall back to obj.find_best_prediction() only if needed)
and avoid dereferencing relations that aren't prefetched—either use the
already-attached taxon/algorithm on the prediction object or ensure these are
provided via queryset annotations/prefetch_related and passed through context to
TaxonNestedSerializer/AlgorithmNestedSerializer to prevent per-row queries;
update get_best_machine_prediction to check obj.best_prediction first and only
call find_best_prediction as a fallback, and document that views/queries should
prefetch prediction__taxon and prediction__algorithm.

In `@ami/main/models.py`:
- Around line 3210-3218: The code only sets new_score when new_determination
changes, so when authority flips but the taxon stays the same we never recompute
determination_score; update the logic around
occurrence.find_best_identification() and occurrence.find_best_prediction()
(references: top_identification, top_prediction, new_determination, new_score,
determination_score) so that whenever the authority changes you still set
new_score appropriately (None for a human top_identification,
top_prediction.score for an ML top_prediction) and assign determination_score on
the occurrence even if the taxon equals current_determination; apply the same
fix in the analogous block handling lines ~3225-3228.
- Around line 2864-2867: The aggregate verified_by_count is over-counting
because Count("identifications") is performed across detection rows; change the
Count to be distinct so each identification is only counted once (e.g. use
Count("identifications", distinct=True,
filter=Q(identifications__withdrawn=False))). Update the verified_by_count
definition (the Count call) to include distinct=True while keeping the existing
filter on identifications.
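The over-count that distinct=True fixes comes from the detection join fanning each identification out into multiple rows; a toy illustration of why a plain count over the joined rows is wrong:

```python
# Each (detection, identification) pair is one joined row: a plain count
# over the join multiplies identifications by the number of detections.
joined_rows = [
    ("det-1", "ident-1"),
    ("det-2", "ident-1"),  # same identification, seen via a second detection
    ("det-1", "ident-2"),
    ("det-2", "ident-2"),
]

plain_count = len(joined_rows)                             # over-counted
distinct_count = len({ident for _, ident in joined_rows})  # like Count(..., distinct=True)
print(plain_count, distinct_count)  # → 4 2
```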


📥 Commits

Reviewing files that changed from the base of the PR and between 83a970c and f21ec4d.

📒 Files selected for processing (4)
  • ami/exports/format_types.py
  • ami/exports/tests.py
  • ami/main/api/serializers.py
  • ami/main/models.py

- Add distinct=True to verified_by_count to prevent join multiplication
- Fix update_occurrence_determination to recompute score even when taxon
  stays the same (handles authority flip without taxon change)
- Remove email fallback in verified_by (PII concern, name-only)
- Filter withdrawn identifications in verification_status fallback
- Use obj.best_prediction cached property instead of find_best_prediction()
  in API serializer to avoid N+1 queries
- Build occurrence URL from annotated fields instead of context_url()
  to avoid N+1 queries in export
- Use source_image.timestamp instead of datetime.now() in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@annavik
Member

annavik commented Apr 8, 2026

From reading the PR description, this looks great Michael!

Some comments and questions:

  • verified_by_count: From the PR description it sounds like this is considering all human identifications. Since we use the word "verified", maybe this should only consider identifications in line with the final determination?

  • agreed_with_algorithm: If you agree with an algorithm, then I agree with you, have I also agreed with the algorithm...?

  • best_detection_occurrence_url: There is a risk this URL could get broken if we change the environment URL. Do you think our URLs are stable enough to start distributing in exports?

  • determination_matches_machine_prediction: I guess this is nice to have, but this information can also be derived from other columns. We could skip if we want to keep things compact! :)

@mihow
Collaborator Author

mihow commented Apr 13, 2026

Very astute observations @annavik !

  • verified_by_count: From the PR description it sounds like this is considering all human identifications. Since we use the word "verified", maybe this should only consider identifications in line with the final determination?

That makes sense! Although I am leaning away from including this count at all, since we haven't talked about verification at different ranks or disagreements. I realized what I want here is just "participant_count", to highlight when more than one person is in the identification history. In general, I want to stick with the assumption that the platform is for experts to ID and the last expert gets the final word -- like a museum collection, rather than a community ID platform.

  • agreed_with_algorithm: If you agree with an algorithm, then I agree with you, have I also agreed with the algorithm...?

Great question! Probably yes! This would require additional changes (I am just exposing this value in this PR, not calculating it). Maybe an easy change though.

  • best_detection_occurrence_url: There is a risk this URL could get broken if we change the environment URL. Do you think our URLs are stable enough to start distributing in exports?

I agree it's too soon! Will remove.

  • determination_matches_machine_prediction: I guess this is nice to have, but this information can also be derived from other columns. We could skip if we want to keep things compact! :)

I like this one since it's based on the taxon ID rather than string matching, and I think it will be a common use case.

mihow and others added 2 commits April 13, 2026 15:44
…urrence URL

Per @annavik and @mihow's review:
- Rename `verified_by_count` to `participant_count` (counts distinct users, not total IDs)
- Remove `best_detection_occurrence_url` (URLs not stable enough to distribute in exports yet)
- Clean up unused annotations (event_id, source_image_id) from with_best_detection()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
ami/exports/format_types.py (1)

112-145: ⚠️ Potential issue | 🟠 Major

Don’t bake the preview frontend domain into exported CSVs.

best_detection_occurrence_url is hardcoded to the preview app, so exported files become wrong as soon as the frontend base URL differs by environment or changes later. That makes the new column non-portable for exactly the reason raised in the PR discussion.

Either drop this column or build it from a configured frontend base URL instead of a literal hostname.

Also applies to: 199-209

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/exports/format_types.py` around lines 112 - 145, The serializer currently
hardcodes the preview frontend domain for the best_detection_occurrence_url
column (the SerializerMethodField best_detection_occurrence_url and its getter),
making exported CSVs non-portable; either remove best_detection_occurrence_url
from the serializer fields, or change the getter (e.g.,
get_best_detection_occurrence_url) to construct the link from a configurable
frontend base URL (read from settings or an env var like FRONTEND_BASE_URL) and
fall back to None if not set; update the Meta.fields list and the corresponding
method(s) (also referenced in the same file around the other similar fields at
the later occurrence) so no literal hostname is embedded.
ami/main/models.py (1)

3215-3242: ⚠️ Potential issue | 🟠 Major

Compare determination IDs and allow clearing stale determinations.

The fallback query at Lines 3215-3220 returns a raw determination id, while new_determination is a Taxon instance. Combined with the if new_determination ... guard, unchanged rows look changed, and rows with no remaining identification/prediction never clear an old saved determination.

Suggested fix
-    current_determination = (
-        current_determination
-        or Occurrence.objects.select_related("determination")
-        .values("determination")
-        .get(pk=occurrence.pk)["determination"]
-    )
+    current_determination_id = getattr(current_determination, "pk", current_determination)
+    if current_determination_id is None:
+        current_determination_id = (
+            Occurrence.objects.values_list("determination_id", flat=True).get(pk=occurrence.pk)
+        )
     new_determination = None
     new_score = None
@@
-    if new_determination and new_determination != current_determination:
+    new_determination_id = new_determination.pk if new_determination else None
+    if new_determination_id != current_determination_id:
         logger.debug(f"Changing det. of {occurrence} from {current_determination} to {new_determination}")
         occurrence.determination = new_determination
         needs_update = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/models.py` around lines 3215 - 3242, The bug is that
current_determination is being read as a raw determination id while
new_determination is a Taxon instance, causing false positives and preventing
clearing of stale determinations; fix by normalizing types before comparison:
either load the actual Taxon object for current_determination (e.g. use
Occurrence.objects.select_related("determination").get(pk=occurrence.pk).determination)
or compare IDs consistently (e.g. get current_determination_id via
.values_list("determination", flat=True) and compare to new_determination.pk),
and ensure when no identification/prediction remains you set
occurrence.determination = None (and update determination_score) so stale
determinations are cleared; update the comparisons around current_determination,
new_determination, and occurrence.determination_score accordingly (references:
current_determination, new_determination, occurrence.find_best_identification,
occurrence.find_best_prediction, determination_score).
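The type normalization this fix hinges on (comparing IDs whether the caller passed a Taxon instance, a raw FK id, or None) can be sketched in isolation; the class and helper names here are illustrative:

```python
class Taxon:
    """Stand-in for the Django model; only pk matters for the comparison."""
    def __init__(self, pk):
        self.pk = pk

def normalize_determination_id(value):
    """Accept a Taxon instance, a raw id (int), or None; return the id.

    getattr falls back to the value itself when there is no .pk attribute,
    so ints and None pass through unchanged.
    """
    return getattr(value, "pk", value)

# Comparing normalized ids treats "same taxon" consistently regardless of
# input type, and None cleanly represents a cleared determination.
print(normalize_determination_id(Taxon(4205)))  # → 4205
print(normalize_determination_id(4205))         # → 4205
print(normalize_determination_id(None))         # → None
```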
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/exports/tests.py`:
- Line 376: The unpacking at the call to _create_occurrence_with_prediction
currently assigns unused variables occurrence and classification (e.g.,
"occurrence, classification = self._create_occurrence_with_prediction()") which
triggers RUF059; change the bindings to use the underscore discard pattern
(e.g., "_" or "_classification") for both returned values at each call site
(including the other similar call around line 417) or assign only the needed
value via a single-variable assignment from the function result, updating
references accordingly so unused names are removed.

In `@ami/main/api/serializers.py`:
- Around line 1396-1421: The function get_best_machine_prediction currently
returns None when obj.best_prediction is missing but the API contract requires
the nested object to always be present; change the early-return to return a dict
with the expected keys populated with nullable defaults instead. Specifically,
inside get_best_machine_prediction (and using obj.best_prediction,
TaxonNestedSerializer, AlgorithmNestedSerializer as anchors), always return a
dict with keys taxon, algorithm, score, determination_matches_machine_prediction
where taxon and algorithm are None if no prediction, score is None, and
determination_matches_machine_prediction is None (or False if you prefer
boolean) when no prediction exists; keep the existing logic that fills these
fields when prediction is present.

In `@ami/main/models.py`:
- Around line 3078-3105: find_best_prediction currently calls self.predictions()
which applies per-algorithm max-score filtering and causes a different result
than OccurrenceQuerySet.with_best_machine_prediction(); change
find_best_prediction to mirror with_best_machine_prediction by querying
Classification objects for this occurrence directly (not via predictions()),
ordering by "-terminal", "-score" across all classifications, and returning the
first result so the API/CSV and this method pick the same best machine
prediction; update any references to find_best_prediction/best_prediction if
needed to rely on the new query behavior.

---

Outside diff comments:
In `@ami/exports/format_types.py`:
- Around line 112-145: The serializer currently hardcodes the preview frontend
domain for the best_detection_occurrence_url column (the SerializerMethodField
best_detection_occurrence_url and its getter), making exported CSVs
non-portable; either remove best_detection_occurrence_url from the serializer
fields, or change the getter (e.g., get_best_detection_occurrence_url) to
construct the link from a configurable frontend base URL (read from settings or
an env var like FRONTEND_BASE_URL) and fall back to None if not set; update the
Meta.fields list and the corresponding method(s) (also referenced in the same
file around the other similar fields at the later occurrence) so no literal
hostname is embedded.

In `@ami/main/models.py`:
- Around line 3215-3242: The bug is that current_determination is being read as
a raw determination id while new_determination is a Taxon instance, causing
false positives and preventing clearing of stale determinations; fix by
normalizing types before comparison: either load the actual Taxon object for
current_determination (e.g. use
Occurrence.objects.select_related("determination").get(pk=occurrence.pk).determination)
or compare IDs consistently (e.g. get current_determination_id via
.values_list("determination", flat=True) and compare to new_determination.pk),
and ensure when no identification/prediction remains you set
occurrence.determination = None (and update determination_score) so stale
determinations are cleared; update the comparisons around current_determination,
new_determination, and occurrence.determination_score accordingly (references:
current_determination, new_determination, occurrence.find_best_identification,
occurrence.find_best_prediction, determination_score).


📥 Commits

Reviewing files that changed from the base of the PR and between f21ec4d and 6bf3a44.

📒 Files selected for processing (4)
  • ami/exports/format_types.py
  • ami/exports/tests.py
  • ami/main/api/serializers.py
  • ami/main/models.py

@mihow
Collaborator Author

mihow commented Apr 13, 2026

Claude says: Thanks for the great feedback @annavik! Addressed in 66d9390:

  • verified_by_count → participant_count: Renamed and changed to count distinct users (not total identifications). Per @mihow's comment, this highlights when more than one person contributed to the identification history.

  • best_detection_occurrence_url: Removed entirely — agreed that platform URLs aren't stable enough to distribute in exports yet.

  • agreed_with_algorithm: No changes in this PR — the transitive agreement question ("if you agree with an algorithm, then I agree with you...") is a good one for a follow-up.

  • determination_matches_machine_prediction: Kept as-is — it compares by taxon ID rather than string matching, which avoids edge cases.

Exposes the email of the prior human identifier when the best identification
was explicitly an "Agree" with another user. Lets consumers trace agreement
chains without the backend having to resolve transitivity.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow
Collaborator Author

mihow commented Apr 13, 2026

Claude says: Added agreed_with_user column in 7aa6173 — exposes the email of the prior identifier the best identification explicitly agreed with. Rationale per @mihow: surfacing the raw link is enough; downstream consumers can walk the chain themselves if they need to.

While implementing this I noticed a pre-existing UI bug where agreeing with a human identification from the occurrence details card sends the wrong FK type. Filed separately as #1226 — it means agreed_with_identification is likely undercounted in production today.
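A sketch of what "consumers can walk the chain themselves" could look like downstream, using the new column (the column name is from this PR; the per-user row index is a hypothetical CSV-consumer structure):

```python
def agreement_chain(rows_by_user: dict, start_email: str) -> list:
    """Follow agreed_with_user links until they run out or would loop."""
    chain, seen = [], set()
    email = start_email
    while email and email not in seen:
        seen.add(email)
        chain.append(email)
        # Missing users and explicit None both terminate the chain.
        email = rows_by_user.get(email, {}).get("agreed_with_user")
    return chain

rows = {
    "carol@example.org": {"agreed_with_user": "bob@example.org"},
    "bob@example.org": {"agreed_with_user": "alice@example.org"},
    "alice@example.org": {"agreed_with_user": None},
}
assert agreement_chain(rows, "carol@example.org") == [
    "carol@example.org", "bob@example.org", "alice@example.org"
]
```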

@mihow mihow changed the title feat(exports): add machine prediction, verification, and detection fields feat: change humans core to None, add prediction and identification fields to CSV export Apr 14, 2026
- Add `backfill_determination_score` management command to null-out
  determination_score on legacy human-determined occurrences. Supports
  --dry-run and --project filtering.
- Tighten the legacy score-backfill block in Occurrence.save() to skip
  human-determined occurrences (avoids a redundant query and suppresses
  the misleading "Could not determine score" warning).
- Fix type mismatch in update_occurrence_determination(): when the caller
  didn't pass current_determination, the fallback fetched the FK id (int)
  and compared it against a Taxon instance, which was always unequal and
  caused unnecessary save() calls and misleading debug logs. Normalize
  current_determination to an id for the comparison and accept int|Taxon|None.

Tests: BackfillDeterminationScoreTest covers the ML-only, human-determined,
and withdrawn-identification cases.

Co-Authored-By: Claude <noreply@anthropic.com>
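The type-mismatch fix described above amounts to normalizing the comparison to ids. A minimal sketch, with a stand-in `Taxon` and an assumed `pk` attribute:

```python
from dataclasses import dataclass

@dataclass
class Taxon:
    pk: int

def normalize_taxon_id(value: "int | Taxon | None") -> "int | None":
    """Accept a raw FK id, a Taxon instance, or None; compare on ids only."""
    if value is None:
        return None
    return value.pk if isinstance(value, Taxon) else value

# Before the fix: comparing an int FK id against a Taxon instance was always
# unequal, triggering unnecessary save() calls and misleading debug logs.
assert normalize_taxon_id(Taxon(pk=7)) == normalize_taxon_id(7)
assert normalize_taxon_id(None) is None
```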
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
ami/exports/tests.py (1)

376-376: ⚠️ Potential issue | 🟡 Minor

Fix unused unpacked variable to avoid RUF059.

At lines 376 and 417, classification is assigned but never used.

Suggested fix
-        occurrence, classification = self._create_occurrence_with_prediction()
+        occurrence, _classification = self._create_occurrence_with_prediction()
...
-        occurrence, classification = self._create_occurrence_with_prediction(taxon=self.taxon_a)
+        occurrence, _classification = self._create_occurrence_with_prediction(taxon=self.taxon_a)

Also applies to: 417-417

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/exports/tests.py` at line 376, The test assigns an unused second value
from self._create_occurrence_with_prediction() to classification causing RUF059;
update both call sites where you do "occurrence, classification =
self._create_occurrence_with_prediction()" (and the duplicate) to ignore the
unused value by renaming it to "_" or "_classification" (e.g., "occurrence, _ =
self._create_occurrence_with_prediction()") so the returned but unused
classification is not flagged.
ami/main/models.py (1)

3067-3075: ⚠️ Potential issue | 🟠 Major

Align find_best_prediction() with the export/API query.

At Line 3075 this still starts from self.predictions(), which pre-filters to per-algorithm max scores. OccurrenceQuerySet.with_best_machine_prediction() in the same file orders across all classifications, so the model method can disagree with the exported best_machine_prediction_* fields for the same occurrence.

Suggested fix
     def find_best_prediction(self) -> "Classification | None":
         """
         Find the best machine prediction for this occurrence.
@@
         Uses the highest scoring classification (from any algorithm) as the best prediction.
         Considers terminal classifications first, then non-terminal ones.
         (Terminal classifications are the final classifications of a pipeline, non-terminal are intermediate models.)
         """
-        return self.predictions().order_by("-terminal", "-score").first()
+        return Classification.objects.filter(detection__occurrence=self).order_by("-terminal", "-score").first()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/models.py` around lines 3067 - 3075, The method
find_best_prediction() currently calls self.predictions(), which applies the
per-algorithm max-score prefilter and can disagree with
OccurrenceQuerySet.with_best_machine_prediction() that considers all
classifications; update find_best_prediction() to query all Classification rows
for this occurrence (e.g., via the Classification model filter by
occurrence=self or the raw reverse relation like self.classifications.all()) and
then order by "-terminal", "-score" to pick the single best prediction so it
matches the logic used in with_best_machine_prediction().
🧹 Nitpick comments (1)
ami/exports/tests.py (1)

465-465: Update docstring to match renamed field semantics.

The text still mentions verified_by_count; tests now validate participant_count (distinct participants), so this wording is stale.

Suggested fix
-        """Multiple identifications: verified_by_count reflects all non-withdrawn IDs."""
+        """Multiple identifications: participant_count reflects distinct non-withdrawn users."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/exports/tests.py` at line 465, Update the test docstring to reflect the
renamed field: replace references to "verified_by_count" with
"participant_count" and adjust the description to state that participant_count
reflects the number of distinct non-withdrawn participants (or similar wording
used elsewhere in the codebase). Locate the docstring in ami/exports/tests.py
(the triple-quoted string currently reading "Multiple identifications:
verified_by_count reflects all non-withdrawn IDs.") and change it to mention
participant_count and distinct participants so the docstring matches the
assertions in the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/exports/format_types.py`:
- Around line 163-165: The method get_agreed_with_user currently returns the
collaborator's email (agreed_with_user_email) which leaks PII; change it to
return a non-PII identifier or display name instead (e.g., agreed_with_user_id
or agreed_with_user_display_name) by reading the corresponding attribute from
obj and falling back to None if absent—update get_agreed_with_user to use that
non-PII field rather than agreed_with_user_email and ensure callers/CSV exporter
expect the new identifier.

In `@ami/main/models.py`:
- Around line 3224-3229: The current guard only sets occurrence.determination
when new_determination is truthy, leaving a stale determination when everything
authoritative is removed; update the logic around
new_determination/current_determination_id so that if new_determination is None
but current_determination_id is set you explicitly clear
occurrence.determination (set to None), mark needs_update = True, and log the
change (similar to the existing logger.debug), otherwise keep the existing
assignment when new_determination differs; touch the block that references
new_determination, occurrence.determination, current_determination_id, and
needs_update to implement this behavior.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a6709b2e-0997-453a-bfba-a309722205ef

📥 Commits

Reviewing files that changed from the base of the PR and between 6bf3a44 and bb46a28.

📒 Files selected for processing (5)
  • ami/exports/format_types.py
  • ami/exports/tests.py
  • ami/main/management/commands/backfill_determination_score.py
  • ami/main/models.py
  • ami/main/tests.py

Comment thread ami/exports/format_types.py
Comment thread ami/main/models.py Outdated
mihow and others added 2 commits April 13, 2026 18:12
…otation

The API's cached `best_prediction` and the CSV export's annotated
`best_machine_prediction_*` fields could pick different rows for the same
occurrence:

- `find_best_prediction()` went through `self.predictions()`, which filters to
  the max-score row per algorithm **before** applying the `-terminal, -score`
  ordering. A high-score non-terminal could survive dedup and then lose to a
  lower-score terminal on ordering.
- `with_best_machine_prediction()` orders directly over all classifications, so
  it picks the terminal winner regardless of intra-algorithm dedup.

Align `find_best_prediction()` to query classifications directly with the same
`-terminal, -score, -pk` ordering. Add `-pk` to both so ties break
deterministically.

Also:
- Serializer `get_best_machine_prediction()` now returns a stable shape
  ({taxon: null, algorithm: null, score: null, ...}) when no prediction exists,
  instead of null — clients read nullable fields instead of branching on type.
- `update_occurrence_determination()` now clears `occurrence.determination` when
  the last non-withdrawn identification is deleted and no prediction remains,
  instead of leaving a stale taxon.

Tests: new `test_api_and_csv_pick_same_best_prediction_with_mixed_terminal`
covers the divergence case.

Co-Authored-By: Claude <noreply@anthropic.com>
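The divergence this commit fixes can be reproduced with plain tuples: deduplicating to the max-score row per algorithm before applying the `-terminal, -score` ordering can pick a different winner than ordering over all rows (the scores and algorithm ids below are made up):

```python
# (algorithm, terminal, score, pk) — two rows from the same algorithm
rows = [
    ("algo-A", False, 0.95, 1),  # high-score intermediate (non-terminal) output
    ("algo-A", True, 0.80, 2),   # lower-score terminal classification
]

# Ascending key for max() mirrors descending ("-terminal", "-score", "-pk")
order_key = lambda r: (r[1], r[2], r[3])

# Old path (find_best_prediction via predictions()): max score per algorithm
# first, then order. The terminal row is discarded by the dedup step.
best_per_algo = {}
for r in rows:
    if r[0] not in best_per_algo or r[2] > best_per_algo[r[0]][2]:
        best_per_algo[r[0]] = r
old_winner = max(best_per_algo.values(), key=order_key)

# New path (with_best_machine_prediction): order across all classifications.
new_winner = max(rows, key=order_key)

assert old_winner[3] == 1  # non-terminal survived dedup and "won"
assert new_winner[3] == 2  # terminal row wins, as intended
```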
- Extract BEST_MACHINE_PREDICTION_ORDER as a shared constant used by both
  Occurrence.find_best_prediction() (row-at-a-time) and
  OccurrenceQuerySet.with_best_machine_prediction() (bulk annotation). Both
  methods now reference the constant and each other's docstrings, so future
  edits to the ordering touch one place and can't silently diverge.
- Revert get_best_machine_prediction() to return None when no prediction
  exists (it was briefly returning a dict of nulls). Annotate the method's
  return type as `dict | None` so drf-spectacular emits nullability.
- Add UpdateOccurrenceDeterminationTest covering the stale-determination
  clearing fix: withdrawing the only identification on an occurrence with
  no predictions now clears occurrence.determination (previously left a
  stale taxon in place).

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow requested a review from annavik April 14, 2026 15:13
@mihow mihow changed the title feat: change humans core to None, add prediction and identification fields to CSV export feat: change humans scores to None, add prediction and identification fields to CSV export Apr 14, 2026
@mihow mihow changed the title feat: change humans scores to None, add prediction and identification fields to CSV export feat: change human scores to None, add prediction and identification fields to CSV export Apr 15, 2026
Collaborator Author

@mihow mihow left a comment


When I verify an occurrence, the determination score goes to None as we designed in this PR, but! That makes the occurrence disappear since our default filters filter on a minimum determination score. And it sorts them last (if you disable the default filtering). Some options:

  1. Use the best machine prediction score as the determination score - I think this is the behavior we were expecting. It doesn't feel quite right from a data integrity perspective, but it is nice and simple.
  2. Update all queries and API endpoints and display sorting, etc. Default filters should allow determination scores of None or ignore determination score if the occurrence is identified. Use the best_machine_score instead of the determination score where possible & appropriate.
  3. Revert the behavior - This PR was meant to be about updating the export format - maybe do a custom override of determination score just for the export (determination_score = None if occurrence is verified)
  4. Something else - use best_machine_score in at least one place in the API/UI like on the taxon details page where it shows the "best score" for all occurrences of a species.

This looks worse: [screenshot]

This looks good! [screenshot]

This looks good! [screenshot]

@mihow
Collaborator Author

mihow commented Apr 16, 2026

I'm inclined to keep the behavior of determination_score as-is (in main) and focus on introducing best_machine_prediction_score as a new field / data source. The API & view should be able to sort on best_machine_prediction_score (and it should be efficiently added to the queryset) - that should be done in this PR! Then the UI can switch to displaying the best machine score when ready, OR just add a new column in the UI to display determination score and best prediction score side by side.

@annavik
Member

annavik commented Apr 16, 2026

When I verify an occurrence, the determination score goes to None as we designed in this PR, but! That makes the occurrence disappear since our default filters filter on a minimum determination score. And it sorts them last (if you disable the default filtering). Some options:

  1. Use the best machine prediction score as the determination score - I think this is the behavior we were expecting. It doesn't feel quite right from a data integrity perspective, but it is nice and simple.
  2. Update all queries and API endpoints and display sorting, etc. Default filters should allow determination scores of None or ignore determination score if the occurrence is identified. Use the best_machine_score instead of the determination score where possible & appropriate.
  3. Revert the behavior - This PR was meant to be about updating the export format - maybe do a custom override of determination score just for the export (determination_score = None if occurrence is verified)
  4. Something else - use best_machine_score in at least one place in the API/UI like on the taxon details page where it shows the "best score" for all occurrences of a species.

This looks worse: [screenshot]

This looks good! [screenshot]

This looks good! [screenshot]

Oh I see! I think option 1 sounds like a good option for now, together with this logic:

  • Verified occurrences should never be filtered out based on score threshold
  • In the score column, for verified occurrences we ignore the score and present label "Verified"

Per discussion on #1214, narrow this PR to the export format changes
only. Drop the determination_score=None-for-humans behavior and the
backfill command that supported it.

What stays: BEST_MACHINE_PREDICTION_ORDER, with_best_machine_prediction(),
with_verification_info(), the expanded with_best_detection() (source-image
fields), all CSV export columns, and the best_machine_prediction nested
object on OccurrenceListSerializer. None of these read or mutate
determination_score, so existing sorting / filtering / default-filter
behavior is unchanged.

What goes:
- Occurrence.find_best_prediction() / find_best_identification()
  extractions (best_prediction / best_identification cached_properties
  restored to main's bodies)
- get_determination_score() — restored to return best_identification.score
  for human-determined occurrences
- update_occurrence_determination() — restored to main (loses the FK type
  fix and the stale-determination-clearing fix; both are real bugs but
  out of scope for this PR)
- Occurrence.save() legacy backfill — restored to main
- backfill_determination_score management command + its test
- UpdateOccurrenceDeterminationTest (covered the stale-det fix that's
  no longer in scope)

Sortable best_machine_prediction_score on the API queryset is tracked
separately in #1242.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow
Collaborator Author

mihow commented Apr 17, 2026

Claude says: Pushed 6de816a to narrow this PR to just the export-format additions per @mihow's direction above.

Reverted (back to main's behavior):

  • get_determination_score() — human verifications again store Identification.score in determination_score, so existing default filters / sort / UI are unchanged
  • update_occurrence_determination() and Occurrence.save() legacy backfill — restored to main
  • backfill_determination_score management command and its test
  • Occurrence.find_best_prediction() / find_best_identification() extractions and UpdateOccurrenceDeterminationTest

Kept (the actual export feature):

  • OccurrenceQuerySet.with_best_machine_prediction(), with_verification_info(), expanded with_best_detection()
  • All new CSV columns (best_machine_prediction_*, participant_count, agreed_with_*, determination_matches_machine_prediction, best_detection_*)
  • OccurrenceListSerializer.best_machine_prediction nested object

Diff vs main is now 4 files / +425 −9 (was 6 files / +652 −44).

Sortable best_machine_prediction_score on the API queryset is filed as #1242 so we can land this PR without expanding review surface to the hot list endpoint. The "Verified label / never-filter-verified-out" UX work @annavik raised should be a separate ticket since it touches default filters and the React side.

Tests: ami.exports.tests 19/19 pass, ami.main.tests -k occurrence 11/11 pass.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
ami/main/models.py (1)

2828-2850: ⚠️ Potential issue | 🟠 Major

Keep Occurrence.best_prediction aligned with this queryset annotation.

This path now picks the winner from all classifications, but Occurrence.best_prediction still goes through Occurrence.predictions(), which first collapses to per-algorithm max scores. The API and CSV can therefore disagree again when one algorithm emits a higher-scoring non-terminal row and a lower-scoring terminal row. The new mixed-terminal test also won’t catch that case because it uses two different algorithms.

Suggested fix
 @functools.cached_property
 def best_prediction(self):
-    return self.predictions().order_by("-terminal", "-score").first()
+    return (
+        Classification.objects.filter(detection__occurrence=self)
+        .select_related("taxon", "algorithm")
+        .order_by(*BEST_MACHINE_PREDICTION_ORDER)
+        .first()
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/models.py` around lines 2828 - 2850, Occurrence.best_prediction is
out-of-sync with with_best_machine_prediction because the method currently uses
Occurrence.predictions() which collapses to per-algorithm max scores; update
Occurrence.best_prediction to pick the winner from all Classification rows (same
logic as with_best_machine_prediction) by querying
Classification.objects.filter(detection__occurrence=self) and ordering by
BEST_MACHINE_PREDICTION_ORDER, then returning the first result (or None) so
API/CSV use the identical selection rules; ensure you reference
BEST_MACHINE_PREDICTION_ORDER and use the same fields (taxon, score, algorithm,
taxon_id) and tiebreaker logic as in with_best_machine_prediction.
ami/exports/tests.py (1)

376-376: ⚠️ Potential issue | 🟡 Minor

Drop the unused classification bindings.

Both unpacked classification variables are unused, so Ruff will flag these with RUF059.

Suggested fix
-        occurrence, classification = self._create_occurrence_with_prediction()
+        occurrence, _classification = self._create_occurrence_with_prediction()
...
-        occurrence, classification = self._create_occurrence_with_prediction(taxon=self.taxon_a)
+        occurrence, _classification = self._create_occurrence_with_prediction(taxon=self.taxon_a)

Also applies to: 414-414

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/exports/tests.py` at line 376, The tuple unpacking from
_create_occurrence_with_prediction() binds an unused classification variable;
update the call sites (e.g., where you have "occurrence, classification =
self._create_occurrence_with_prediction()") to ignore the second value by using
a throwaway name (e.g., "occurrence, _ =
self._create_occurrence_with_prediction()") or only grab the first element, and
apply the same change to the other occurrence of this pattern in the file so
Ruff RUF059 is not triggered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/api/serializers.py`:
- Around line 1396-1425: get_best_machine_prediction currently calls
obj.best_prediction and then accesses prediction.taxon/algorithm which triggers
per-occurrence DB queries; change it to use annotated/prefetched data instead:
ensure the queryset that feeds this serializer prefetches/annotates the
best_prediction relation and its taxon and algorithm (e.g.
prefetch_related('best_prediction__taxon','best_prediction__algorithm') or
annotate best_prediction_* fields), then in get_best_machine_prediction read
those prefetched/annotated attributes (e.g. use getattr(obj, 'best_prediction',
None) only after prefetch, access obj.best_prediction.taxon and
obj.best_prediction.algorithm without causing lazy loads, or construct
taxon_data/algorithm_data from annotated fields) and call TaxonNestedSerializer
/ AlgorithmNestedSerializer on the already-prefetched instances so no extra
queries are executed per occurrence.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef0a4b08-fc54-433f-8400-029a0d2dec1d

📥 Commits

Reviewing files that changed from the base of the PR and between bb46a28 and 6de816a.

📒 Files selected for processing (3)
  • ami/exports/tests.py
  • ami/main/api/serializers.py
  • ami/main/models.py

Comment thread ami/main/api/serializers.py
Avoids per-row lazy loads when callers serialize the result. Affects
both the new OccurrenceListSerializer.best_machine_prediction field
and the existing get_determination_details path in the same serializer.

Per CodeRabbit thread on #1214 — full annotation-driven serialization
remains tracked in #1242.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow changed the title feat: change human scores to None, add prediction and identification fields to CSV export feat: add additional prediction and identification fields to CSV export Apr 17, 2026
…level

Sits at the same level as `determination` / `determination_score` rather
than nested under `best_machine_prediction`. The flag describes the
determination's relationship to the prediction, so it doesn't belong as
an attribute of the prediction object — particularly once `determination`
becomes its own nested object.

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread ami/exports/format_types.py Outdated
Comment thread ami/exports/format_types.py Outdated
Comment thread ami/exports/format_types.py
Comment thread ami/main/models.py
Comment thread ami/main/api/serializers.py Outdated
mihow and others added 2 commits April 16, 2026 23:30
…diction from API; share public-URL builder

- Drop determination_matches_machine_prediction from OccurrenceListSerializer.
  API consumers can compare taxon IDs directly; CSV consumers compare species
  name strings, so the field stays in the tabular export only.
- Add SourceImage.build_public_url(base_url, path) staticmethod and reuse it
  from both SourceImage.public_url() and the export's
  get_best_detection_source_image_url(). The export deliberately works from
  annotated path + public_base_url to avoid loading a SourceImage row per
  occurrence; presigned URLs (private buckets) aren't supported in that path
  for the same reason.

Addresses review feedback on PR #1214.

Co-Authored-By: Claude <noreply@anthropic.com>
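A plausible shape for the shared helper (the `build_public_url(base_url, path)` name and its two call sites are from this PR; the exact join and None-handling logic shown here is an assumption):

```python
def build_public_url(base_url: "str | None", path: "str | None") -> "str | None":
    """Join a public bucket base URL and an object path, tolerating stray
    slashes on either side. Returns None if either part is missing, which
    lets the export skip occurrences without an annotated capture."""
    if not base_url or not path:
        return None
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"

assert build_public_url("https://bucket.example.org/", "/captures/a.jpg") == (
    "https://bucket.example.org/captures/a.jpg"
)
assert build_public_url(None, "captures/a.jpg") is None
```

Working from the annotated `path` + `public_base_url` pair keeps URL construction string-only, which is why presigned URLs for private buckets can't be supported on this path without loading a SourceImage row per occurrence.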
… → bool

- Rename `best_detection_source_image_url` (CSV column) and the underlying
  `best_detection_source_image_{path,public_base_url}` queryset annotations to
  `best_detection_capture_*` to match the public "Capture" terminology used
  in the UI. Internal model is still SourceImage; only externally-visible
  names change.
- Change `verification_status` from "Verified"/"Not verified" strings to a
  bool (True/False). The hardcoded strings were awkward, and the field is now
  consistent with the bool-shaped `determination_matches_machine_prediction`.

Addresses review feedback on PR #1214.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow merged commit 5b98920 into main Apr 17, 2026
7 checks passed
@mihow mihow deleted the feat/export-machine-prediction-fields branch April 17, 2026 07:20


Development

Successfully merging this pull request may close these issues.

feat(exports): add machine prediction, verification, and detection fields to exports

3 participants