
Fix N+1 queries in inheritance API #7936 #8020

Draft
acwhite211 wants to merge 2 commits into main from issue-7875

Conversation

@acwhite211
Member

Fixes #7875
Contributed by @foozleface
Based on the #7936 PR with additional fixes

The inheritance post-query processing in api.py ran a database query per row to look up catalog numbers for parent and COG inheritance. A query returning 500 rows therefore produced 500+ individual SELECT statements. This rewrite first collects the IDs that need a lookup, then bulk-prefetches all catalog numbers in 1-2 queries total. It also raises yield_per in the CSV and KML export paths from 1 to 2000; the old value of 1 caused rows to be fetched from the database one at a time.
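The difference between per-row lookups and a bulk prefetch can be sketched with the standard-library sqlite3 module. This is not the code from api.py, which uses the Django ORM: the `collectionobject` table, its columns, and the helpers `make_db`, `lookup_per_row`, `lookup_bulk`, and `count_selects` are all hypothetical stand-ins chosen for illustration.

```python
import sqlite3


def make_db(n_rows):
    """Create an in-memory table of made-up catalog numbers (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE collectionobject (id INTEGER PRIMARY KEY, catalognumber TEXT)"
    )
    conn.executemany(
        "INSERT INTO collectionobject VALUES (?, ?)",
        [(i, f"CAT-{i:05d}") for i in range(1, n_rows + 1)],
    )
    return conn


def lookup_per_row(conn, ids):
    """N+1 style: issue one SELECT for every ID."""
    result = {}
    for obj_id in ids:
        row = conn.execute(
            "SELECT catalognumber FROM collectionobject WHERE id = ?", (obj_id,)
        ).fetchone()
        result[obj_id] = row[0] if row else None
    return result


def lookup_bulk(conn, ids):
    """Bulk style: collect the IDs first, then fetch them all in one SELECT."""
    unique_ids = sorted(set(ids))
    if not unique_ids:
        return {}
    # One placeholder per ID. SQLite caps the number of bound parameters,
    # so a very large ID list would need to be split into chunks.
    placeholders = ",".join("?" * len(unique_ids))
    rows = conn.execute(
        f"SELECT id, catalognumber FROM collectionobject WHERE id IN ({placeholders})",
        unique_ids,
    )
    found = dict(rows)
    return {obj_id: found.get(obj_id) for obj_id in unique_ids}


def count_selects(conn, func, ids):
    """Run func and return its result plus the number of SELECTs it executed."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = func(conn, ids)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)
```

Calling `count_selects(conn, lookup_per_row, ids)` and `count_selects(conn, lookup_bulk, ids)` on the same ID list should return identical mappings, with the bulk version executing a single SELECT regardless of how many IDs are passed.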

Implementation

  • Rewrite parent_inheritance_post_query_processing to collect IDs needing catalog number lookup, then bulk-fetch via a single query using select_related and values_list
  • Rewrite cog_inheritance_post_query_processing with two bulk-prefetch steps: (1) get parentcog_id for each child CO, (2) get primary member catalog number for each COG
  • Change yield_per(1) to yield_per(2000) in CSV and KML export paths in execution.py
  • Add N+1 regression tests that verify bulk queries are used
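The effect of the yield_per change can be illustrated without SQLAlchemy. The sketch below uses sqlite3's `cursor.fetchmany` as a rough analogue of batched fetching; it is not the code in execution.py, and the `item` table and the helpers `make_items`, `iter_rows`, and `export_rows` are hypothetical.

```python
import sqlite3


def make_items(n_rows):
    """Create an in-memory table with n_rows made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO item VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, n_rows + 1)],
    )
    return conn


def iter_rows(cursor, batch_size, stats):
    """Yield rows one by one, pulling them from the cursor batch_size at a time."""
    while True:
        batch = cursor.fetchmany(batch_size)
        stats["fetch_calls"] += 1
        if not batch:
            return
        yield from batch


def export_rows(conn, batch_size):
    """Read every row and report how many fetch calls that took."""
    stats = {"fetch_calls": 0}
    cursor = conn.execute("SELECT id, name FROM item ORDER BY id")
    rows = list(iter_rows(cursor, batch_size, stats))
    return rows, stats["fetch_calls"]
```

The consumer still sees one row at a time in both cases; only the number of fetch calls changes. A larger batch trades some memory per batch for far fewer fetches, which is presumably the motivation for moving from 1 to 2000.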

Testing instructions

  • Run a query that includes inherited catalog numbers (e.g., a query on Components with a catalogNumber column) and verify that the results are correct
  • Export the same query to CSV and verify the file is complete and correct
  • Run the test suite: python manage.py test specifyweb.backend.inheritance.tests.test_n_plus_one
  • Optionally, enable Django query logging and confirm the inheritance processing uses 1-3 queries instead of N+1
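For the optional query-logging step, one common approach is to route Django's `django.db.backends` logger to the console at DEBUG level. The fragment below is a minimal sketch for a local settings module; where this project keeps its settings and whether it already defines `LOGGING` are assumptions, so it would need to be merged with any existing configuration.

```python
# Hypothetical addition to a local Django settings module.
# Django only emits SQL to this logger when settings.DEBUG is True.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # Print log records to stderr.
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Each SQL statement run through the Django ORM is logged here at DEBUG.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}
```

With this in place, each ORM statement appears as one log line, so the number of SELECTs issued while the inheritance processing runs can be counted by eye.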

@acwhite211
Member Author

Still working on getting this PR working...


Labels

None yet

Projects

Status: 📋Back Log

Development

Successfully merging this pull request may close these issues.

Use batched iterators to evaluate large SQLAlchemy queries

2 participants