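Background
PR #1214 introduced OccurrenceQuerySet.with_best_machine_prediction() (used today only by the CSV exporter), which annotates each occurrence with its top ML prediction's score, taxon, and algorithm.
That annotation is not currently applied to the API list queryset, so:
Clients cannot sort or filter occurrences by ML confidence, only by determination_score, which collapses both human and machine signals.
The OccurrenceListSerializer.best_machine_prediction field added in #1214 (feat: add additional prediction and identification fields to CSV export) is N+1 on the list endpoint (see "Current N+1 shape" below).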
Proposal
In ami/main/api/views.py::OccurrenceViewSet:
Apply .with_best_machine_prediction() on the list action queryset (not needed on detail: the best_prediction cached_property already serves the detail serializer).
Add best_machine_prediction_score to ordering_fields so ?ordering=-best_machine_prediction_score works in the API.
Refactor OccurrenceListSerializer.get_best_machine_prediction to read from the annotated fields (best_machine_prediction_taxon_id, _algorithm name, _score, _name) instead of obj.best_prediction. Bulk-fetch the Taxon and Algorithm objects once per page (or expose them via prefetch) and stitch them into the serializer output, so the response is built without per-row Classification queries.
Optionally expose a min_machine_prediction_score=… filter alongside the existing classification_threshold parameter, scoped to the new annotation.
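The bulk-fetch-and-stitch step could look roughly like the framework-free sketch below. It is only an illustration under assumptions: AnnotatedOccurrence, stitch_best_predictions, the best_machine_prediction_algorithm_id attribute, the payload keys, and the two fake fetch functions are all hypothetical stand-ins, not names from the codebase. In the real viewset the two lookups might be served by something like Django's QuerySet.in_bulk(), which returns a dict keyed by primary key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class AnnotatedOccurrence:
    """Hypothetical stand-in for an Occurrence row carrying the annotations."""
    pk: int
    best_machine_prediction_score: Optional[float]
    best_machine_prediction_taxon_id: Optional[int]
    best_machine_prediction_algorithm_id: Optional[int]  # assumed field name


Lookup = Callable[[Set[int]], Dict[int, dict]]


def stitch_best_predictions(
    page: List[AnnotatedOccurrence],
    fetch_taxa: Lookup,
    fetch_algorithms: Lookup,
) -> Dict[int, Optional[dict]]:
    """Build one payload per occurrence using at most two bulk lookups per page."""
    # Collect the distinct ids referenced on this page, ignoring missing values.
    taxon_ids = {o.best_machine_prediction_taxon_id for o in page} - {None}
    algorithm_ids = {o.best_machine_prediction_algorithm_id for o in page} - {None}

    # One lookup per related model, skipped entirely when nothing is referenced.
    taxa = fetch_taxa(taxon_ids) if taxon_ids else {}
    algorithms = fetch_algorithms(algorithm_ids) if algorithm_ids else {}

    payloads: Dict[int, Optional[dict]] = {}
    for occ in page:
        if occ.best_machine_prediction_score is None:
            # No machine prediction was annotated for this occurrence.
            payloads[occ.pk] = None
            continue
        payloads[occ.pk] = {
            "score": occ.best_machine_prediction_score,
            "taxon": taxa.get(occ.best_machine_prediction_taxon_id),
            "algorithm": algorithms.get(occ.best_machine_prediction_algorithm_id),
        }
    return payloads


# Demo with fake lookups that record how often they are called.
calls: List[str] = []


def fake_fetch_taxa(ids: Set[int]) -> Dict[int, dict]:
    calls.append("taxa")
    names = {1: "Noctuidae", 2: "Geometridae"}  # illustrative data only
    return {i: {"id": i, "name": names[i]} for i in ids}


def fake_fetch_algorithms(ids: Set[int]) -> Dict[int, dict]:
    calls.append("algorithms")
    return {i: {"id": i, "name": "demo-classifier"} for i in ids}


page = [
    AnnotatedOccurrence(10, 0.91, 1, 7),
    AnnotatedOccurrence(11, 0.42, 2, 7),
    AnnotatedOccurrence(12, None, None, None),
]
result = stitch_best_predictions(page, fake_fetch_taxa, fake_fetch_algorithms)
```

The number of lookups stays at two regardless of page size, which is the property the refactor is after. Passing the lookups in as callables keeps the stitching logic testable without a database.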
Current N+1 shape (the "before" target)
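For each occurrence on the list endpoint, the serializer currently runs:
get_determination_details → obj.best_prediction: Classification.objects.filter(detection__occurrence=…).order_by(...).first(), the 1 query per occurrence that is pre-existing on main
get_best_machine_prediction → obj.best_prediction: no extra query (cached_property hit)
get_best_machine_prediction → prediction.taxon / .algorithm: no extra query; mitigated in 46b7d36 by select_related("taxon", "algorithm") in Occurrence.predictions()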
Net cost added by the best_machine_prediction field on the list serializer: 0 extra DB queries per row (cache hit + select_related). The pre-existing 1 × Classification query per occurrence remains and is what this issue should eliminate.
After this issue lands the per-row Classification query should be gone — the entire best_machine_prediction payload should be served from the queryset annotation + a single bulk taxon/algorithm fetch per page.
Acceptance criteria
GET /api/v2/occurrences/?project_id=…&ordering=-best_machine_prediction_score returns rows in descending ML-confidence order
List endpoint executes 0 per-row Classification queries for the best_machine_prediction payload (django-debug-toolbar / assertNumQueries to verify)
Pagination COUNT query is not regressed — strip annotations via .values('pk') if needed (see CLAUDE.md performance notes)
Existing ?ordering=-determination_score continues to work unchanged
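For the first criterion, a test could pull the scores out of the response rows and check them with a small helper such as the one below. This is a sketch only: is_descending is a hypothetical name, and because the exact response shape is not specified here, the helper takes a plain list of scores. Where rows without a machine prediction sort depends on the database and the ordering expression, so the helper assumes they should simply be ignored.

```python
from typing import Optional, Sequence


def is_descending(scores: Sequence[Optional[float]]) -> bool:
    """Return True if the non-null scores never increase from one row to the next."""
    # Rows with no machine prediction (None) are skipped rather than ranked.
    present = [s for s in scores if s is not None]
    return all(a >= b for a, b in zip(present, present[1:]))
```

Ties are allowed, since two occurrences can share the same top score.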
Related