Summary
The QRF currently imputes formula-level aggregates (e.g. taxable_pension_income) and then renames them to leaf inputs (e.g. taxable_private_pension_income) before storing. This loses information: all taxable pension income is attributed to private pensions, all interest deductions to mortgage interest, and so on.
Current workaround (PR #594)
_rename_imputed_to_inputs maps:
taxable_pension_income → taxable_private_pension_income (loses public pension split)
tax_exempt_pension_income → tax_exempt_private_pension_income (same)
interest_deduction → deductible_mortgage_interest (loses non-mortgage interest)
self_employed_pension_contribution_ald → _person (entity mapping only)
self_employed_health_insurance_ald → _person (entity mapping only)
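To make the information loss concrete, here is a minimal sketch of the three lossy renames. It is not the actual implementation: the dict name LOSSY_RENAMES, the function name, and the sample values are made up for illustration, and the two entity-only mappings are left out.

```python
# Hypothetical reconstruction of the lossy renames listed above.
# The real _rename_imputed_to_inputs may be structured differently.
LOSSY_RENAMES = {
    "taxable_pension_income": "taxable_private_pension_income",
    "tax_exempt_pension_income": "tax_exempt_private_pension_income",
    "interest_deduction": "deductible_mortgage_interest",
}


def rename_imputed_to_inputs(imputed: dict) -> dict:
    """Return a copy of `imputed` with aggregate names swapped for leaf names."""
    return {
        LOSSY_RENAMES.get(name, name): values
        for name, values in imputed.items()
    }


# Made-up imputed values for three records.
imputed = {"taxable_pension_income": [12_000.0, 0.0, 5_500.0]}
stored = rename_imputed_to_inputs(imputed)
```

After the rename, the full aggregate sits under the private-pension leaf and nothing is stored for any public-pension counterpart, which is the loss described in the summary.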
Proper fix
Train the QRF on leaf input variables from the PUF rather than formula aggregates. This would:
- Preserve the public/private pension split
- Preserve the mortgage vs non-mortgage interest split
- Eliminate the need for post-hoc renaming
- Give more accurate distributions for each sub-component
Variables to split
To be determined. This requires checking which sub-components are available in the PUF training data.
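One possible way to run that check, as a sketch only: compare a candidate list of leaf variables against the column names of the PUF training frame. The helper name and the puf_columns set below are placeholders, not the real PUF schema.

```python
def split_by_availability(candidates, puf_columns):
    """Partition candidate leaf variables by whether the PUF has a column for them."""
    available = [name for name in candidates if name in puf_columns]
    missing = [name for name in candidates if name not in puf_columns]
    return available, missing


# Leaf names taken from the current rename mapping.
candidates = [
    "taxable_private_pension_income",
    "tax_exempt_private_pension_income",
    "deductible_mortgage_interest",
]

# Placeholder column set; replace with the actual PUF training columns.
puf_columns = {"taxable_private_pension_income", "deductible_mortgage_interest"}

available, missing = split_by_availability(candidates, puf_columns)
```

Variables that end up in missing would still need either the aggregate-based workaround or a separate data source.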