Make best_machine_prediction_score available for sort/filter on OccurrenceViewSet #1242

@mihow

Background

PR #1214 introduced OccurrenceQuerySet.with_best_machine_prediction() (used today only by the CSV exporter) which annotates each occurrence with its top ML prediction's score, taxon, and algorithm.

That annotation is not currently applied to the API list queryset, so:

  1. Clients cannot sort or filter occurrences by ML confidence — only by determination_score, which collapses both human and machine signals.
  2. The OccurrenceListSerializer.best_machine_prediction field added in PR #1214 (feat: add additional prediction and identification fields to CSV export) is served through an N+1 query pattern on the list endpoint (see "Current N+1 shape" below).

Proposal

In ami/main/api/views.py::OccurrenceViewSet:

  1. Apply .with_best_machine_prediction() on the list action queryset (no need on detail — best_prediction cached_property already serves the detail serializer).
  2. Add best_machine_prediction_score to ordering_fields so ?ordering=-best_machine_prediction_score works in the API.
  3. Refactor OccurrenceListSerializer.get_best_machine_prediction to read from the annotated fields (best_machine_prediction_taxon_id, _algorithm name, _score, _name) instead of obj.best_prediction. Bulk-fetch the Taxon and Algorithm objects once per page (or expose them via prefetch) and stitch them into the serializer output, so the response is built without per-row Classification queries.
  4. Optionally expose a min_machine_prediction_score=… filter alongside the existing classification_threshold parameter, scoped to the new annotation.
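
The bulk-fetch-and-stitch step in item 3 could look roughly like the sketch below. It is plain Python rather than Django so that it runs on its own. The names AnnotatedRow, stitch_page, fetch_taxa and fetch_algorithms are hypothetical, and the algorithm_id field is an assumption about how the annotation exposes the algorithm. In the real serializer the two fetchers might be something like Taxon.objects.in_bulk(ids) and Algorithm.objects.in_bulk(ids).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class AnnotatedRow:
    """Stand-in for an occurrence carrying queryset annotations (hypothetical shape)."""
    pk: int
    taxon_id: Optional[int]      # stands in for best_machine_prediction_taxon_id
    algorithm_id: Optional[int]  # assumption: the annotation exposes some algorithm key
    score: Optional[float]       # stands in for best_machine_prediction_score


# A fetcher takes a set of ids and returns {id: object}, in one lookup.
Fetcher = Callable[[Set[int]], Dict[int, str]]


def stitch_page(
    rows: Iterable[AnnotatedRow],
    fetch_taxa: Fetcher,
    fetch_algorithms: Fetcher,
) -> Dict[int, Optional[dict]]:
    """Build the best_machine_prediction payload for one page of rows."""
    rows = list(rows)
    taxon_ids = {r.taxon_id for r in rows if r.taxon_id is not None}
    algorithm_ids = {r.algorithm_id for r in rows if r.algorithm_id is not None}

    # Exactly one lookup per related model per page, regardless of page size.
    taxa = fetch_taxa(taxon_ids)
    algorithms = fetch_algorithms(algorithm_ids)

    payload: Dict[int, Optional[dict]] = {}
    for r in rows:
        if r.score is None:
            # No machine prediction for this occurrence.
            payload[r.pk] = None
        else:
            payload[r.pk] = {
                "taxon": taxa.get(r.taxon_id),
                "algorithm": algorithms.get(r.algorithm_id),
                "score": r.score,
            }
    return payload


# Toy data and call counters to show that each fetcher runs once per page.
TAXA = {1: "taxon-one", 2: "taxon-two"}
ALGORITHMS = {10: "classifier-a"}
calls = {"taxa": 0, "algorithms": 0}


def fetch_taxa(ids: Set[int]) -> Dict[int, str]:
    calls["taxa"] += 1
    return {i: TAXA[i] for i in ids}


def fetch_algorithms(ids: Set[int]) -> Dict[int, str]:
    calls["algorithms"] += 1
    return {i: ALGORITHMS[i] for i in ids}


page = [
    AnnotatedRow(pk=101, taxon_id=1, algorithm_id=10, score=0.91),
    AnnotatedRow(pk=102, taxon_id=2, algorithm_id=10, score=0.55),
    AnnotatedRow(pk=103, taxon_id=None, algorithm_id=None, score=None),
]
payload = stitch_page(page, fetch_taxa, fetch_algorithms)
```

Passing the fetchers in as callables keeps the stitching logic independent of the ORM, which also makes it easy to unit-test without a database.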

Current N+1 shape (the "before" target)

For each occurrence on the list endpoint, the serializer currently runs:

| Path | Per-row query | Note |
| --- | --- | --- |
| get_determination_details → obj.best_prediction | 1 × Classification.objects.filter(detection__occurrence=…).order_by(...).first() | Pre-existing; same query in main |
| get_best_machine_prediction → obj.best_prediction | 0 (cache hit from above) | cached_property reuses the result |
| get_best_machine_prediction → prediction.taxon / .algorithm | 0 | Mitigated in 46b7d36 by select_related("taxon", "algorithm") in Occurrence.predictions() |

Net cost added by the best_machine_prediction field on the list serializer: 0 extra DB queries per row (cache hit + select_related). The pre-existing 1 × Classification query per occurrence remains and is what this issue should eliminate.

After this issue lands the per-row Classification query should be gone — the entire best_machine_prediction payload should be served from the queryset annotation + a single bulk taxon/algorithm fetch per page.
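
To make the before/after difference concrete, here is a small self-contained illustration using the standard-library sqlite3 module instead of the Django ORM. The two-table schema is a deliberate simplification and an assumption (the real models relate Classification to Occurrence through Detection), and run, list_n_plus_one and list_annotated are hypothetical helper names. The point is only the query count: one extra query per row versus a single query with a correlated subquery, which is roughly what a queryset annotation compiles to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical schema: classifications point straight at occurrences.
    CREATE TABLE occurrence (id INTEGER PRIMARY KEY);
    CREATE TABLE classification (
        id INTEGER PRIMARY KEY,
        occurrence_id INTEGER NOT NULL REFERENCES occurrence(id),
        score REAL NOT NULL
    );
    INSERT INTO occurrence (id) VALUES (1), (2), (3);
    INSERT INTO classification (occurrence_id, score) VALUES
        (1, 0.40), (1, 0.95), (2, 0.70), (3, 0.10), (3, 0.30);
    """
)

query_count = 0


def run(sql, params=()):
    """Execute one SELECT and count it, loosely mimicking assertNumQueries."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def list_n_plus_one():
    """'Before' shape: one list query plus one best-score query per row."""
    result = {}
    for (occ_id,) in run("SELECT id FROM occurrence ORDER BY id"):
        rows = run(
            "SELECT score FROM classification "
            "WHERE occurrence_id = ? ORDER BY score DESC LIMIT 1",
            (occ_id,),
        )
        result[occ_id] = rows[0][0] if rows else None
    return result


def list_annotated():
    """'After' shape: the best score arrives as an annotated column."""
    rows = run(
        """
        SELECT o.id,
               (SELECT c.score FROM classification c
                 WHERE c.occurrence_id = o.id
                 ORDER BY c.score DESC LIMIT 1) AS best_score
          FROM occurrence o
         ORDER BY o.id
        """
    )
    return dict(rows)


query_count = 0
before = list_n_plus_one()
n_plus_one_queries = query_count  # 1 list query + 3 per-row queries

query_count = 0
after = list_annotated()
annotated_queries = query_count  # a single query for the whole page
```

Both functions return the same mapping of occurrence id to best score, so the change is purely in how many round trips are needed.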

Acceptance criteria

  • GET /api/v2/occurrences/?project_id=…&ordering=-best_machine_prediction_score returns rows in descending ML-confidence order
  • List endpoint executes 0 per-row Classification queries for the best_machine_prediction payload (django-debug-toolbar / assertNumQueries to verify)
  • Pagination COUNT query is not regressed — strip annotations via .values('pk') if needed (see CLAUDE.md performance notes)
  • Existing ?ordering=-determination_score continues to work unchanged
  • OpenAPI schema regenerated
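
The ordering and COUNT criteria can be illustrated with another sqlite3 sketch. Again the schema is a simplified assumption, and list_page and count_rows are hypothetical names, not functions from the codebase. The page query orders by the annotated best score, while the count query touches only the occurrence table, which is the effect that stripping annotations before counting is meant to achieve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical schema for illustration only.
    CREATE TABLE occurrence (id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL);
    CREATE TABLE classification (
        id INTEGER PRIMARY KEY,
        occurrence_id INTEGER NOT NULL REFERENCES occurrence(id),
        score REAL NOT NULL
    );
    INSERT INTO occurrence (id, project_id) VALUES (1, 1), (2, 1), (3, 1), (4, 2);
    INSERT INTO classification (occurrence_id, score) VALUES
        (1, 0.40), (2, 0.70), (2, 0.95), (3, 0.30), (4, 0.99);
    """
)

# Correlated subquery standing in for the best_machine_prediction_score annotation.
BEST_SCORE = "(SELECT MAX(c.score) FROM classification c WHERE c.occurrence_id = o.id)"


def list_page(project_id, limit, offset=0):
    """Return (id, best_score) rows in descending ML-confidence order."""
    sql = (
        f"SELECT o.id, {BEST_SCORE} AS best_score "
        "FROM occurrence o WHERE o.project_id = ? "
        "ORDER BY best_score DESC, o.id LIMIT ? OFFSET ?"
    )
    return conn.execute(sql, (project_id, limit, offset)).fetchall()


def count_rows(project_id):
    """Pagination count: only primary keys are needed, so no subquery is evaluated."""
    sql = "SELECT COUNT(*) FROM occurrence o WHERE o.project_id = ?"
    return conn.execute(sql, (project_id,)).fetchone()[0]


first_page = list_page(project_id=1, limit=2)
second_page = list_page(project_id=1, limit=2, offset=2)
total = count_rows(project_id=1)
```

Note that this shortcut only holds while the count does not depend on the annotation. If a score-based filter such as the optional min_machine_prediction_score is active, the count query would need the annotation too.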
