fix(eval): Support non-English languages in response_match_score #3923

AhrendsW wants to merge 2 commits into google:main
Conversation
Summary of Changes

Hello @AhrendsW, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request fixes a bug in the ROUGE-1 evaluation that prevented accurate scoring for non-English languages. By adding script detection, the system now applies or disables language-specific stemming as appropriate, making evaluation scores reliable across a wide range of languages.
Code Review
This pull request effectively addresses the issue of incorrect ROUGE-1 scores for non-English languages by introducing a mechanism to conditionally disable stemming. The new _is_latin_script helper function is a reasonable approach to detect script type, and its application in _calculate_rouge_1_scores correctly disables the Porter stemmer for non-Latin text. The accompanying unit tests are very comprehensive and cover a wide range of languages and scenarios. I have a few suggestions to make the test assertions more precise, which will strengthen the test suite against future regressions. Overall, this is a solid fix.
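To make the failure mode concrete for other reviewers: as the second commit message notes, the stock `rouge_score` tokenizer keeps only ASCII alphanumerics, so identical non-Latin strings can score 0.0. A minimal reproduction, independent of this PR's code:

```python
from rouge_score import rouge_scorer

# The default tokenizer keeps only ASCII alphanumerics, so Thai text
# tokenizes to an empty list and even identical strings score 0.0.
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
print(scorer.score("สวัสดี ค่ะ", "สวัสดี ค่ะ")["rouge1"].fmeasure)  # 0.0
```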
```python
assert rouge_1_score.fmeasure > 0
assert rouge_1_score.fmeasure < 1.0
```
The assertions `> 0` and `< 1.0` are correct but not very precise. We can calculate the exact expected F-measure to make this test stronger. Given the candidate and reference texts, the F-measure should be exactly 0.4.
Calculation:
- Candidate tokens: 5
- Reference tokens: 5
- Common tokens: 2
- Precision = 2/5 = 0.4
- Recall = 2/5 = 0.4
- F-measure = 2 * (0.4 * 0.4) / (0.4 + 0.4) = 0.4
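The arithmetic is easy to sanity-check in plain Python; the counts below are taken from this comment, since the full test strings are not quoted in the thread:

```python
import math

# ROUGE-1 F-measure from the unigram counts listed above.
candidate_tokens = 5
reference_tokens = 5
common_tokens = 2

precision = common_tokens / candidate_tokens  # 2/5 = 0.4
recall = common_tokens / reference_tokens     # 2/5 = 0.4
fmeasure = 2 * (precision * recall) / (precision + recall)
assert math.isclose(fmeasure, 0.4)
```

The tiny floating-point residue left by this formula is also why the suggested assertion below uses `pytest.approx` rather than exact equality.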
Suggested change:

```diff
- assert rouge_1_score.fmeasure > 0
- assert rouge_1_score.fmeasure < 1.0
+ assert rouge_1_score.fmeasure == pytest.approx(0.4)
```
```python
reference = "สวัสดี ค่ะ"
rouge_1_score = _calculate_rouge_1_scores(candidate, reference)
# Should match "สวัสดี" (1 out of 2 words)
assert rouge_1_score.fmeasure == pytest.approx(0.5, rel=0.1)
```
The assertion uses a relative tolerance rel=0.1, which is unnecessarily loose as the expected F-measure is exactly 0.5. Using pytest.approx(0.5) without a tolerance is more precise and makes the test stricter.
Suggested change:

```diff
- assert rouge_1_score.fmeasure == pytest.approx(0.5, rel=0.1)
+ assert rouge_1_score.fmeasure == pytest.approx(0.5)
```
```python
candidate = "今天 天气 很好"  # "Today's weather is good"
reference = "今天 我 很 开心"  # "Today I am happy"
rouge_1_score = _calculate_rouge_1_scores(candidate, reference)
# Should match "今天" and "很"
assert rouge_1_score.fmeasure > 0
assert rouge_1_score.fmeasure < 1.0
```
The comment on line 276, `# Should match "今天" and "很"`, is inconsistent with the candidate string `"今天 天气 很好"`. The default tokenizer will treat `"很好"` as a single token, so only `"今天"` will be matched.

To align with the comment's intent and create a stronger test, I suggest splitting `"很好"` into `"很 好"` in the candidate string. This will result in an F-measure of 0.5.
Suggested change:

```diff
- candidate = "今天 天气 很好"  # "Today's weather is good"
+ candidate = "今天 天气 很 好"  # "Today's weather is very good"
  reference = "今天 我 很 开心"  # "Today I am happy"
  rouge_1_score = _calculate_rouge_1_scores(candidate, reference)
  # Should match "今天" and "很"
- assert rouge_1_score.fmeasure > 0
- assert rouge_1_score.fmeasure < 1.0
+ assert rouge_1_score.fmeasure == pytest.approx(0.5)
```
```python
reference = "今日 は 仕事 が 忙しい です"  # "Today work is busy"
rouge_1_score = _calculate_rouge_1_scores(candidate, reference)
# Should match "今日", "は", "が", "です"
assert rouge_1_score.fmeasure > 0.5
```
```python
assert rouge_1_score.fmeasure > 0
assert rouge_1_score.fmeasure < 1.0
```
The assertions `> 0` and `< 1.0` are correct but not very specific. The expected F-measure can be calculated precisely as 2/3 for this test case. Using a more precise assertion makes the test stronger.

Calculation:
- Candidate tokens: 3 (`오늘`, `날씨가`, `좋습니다`)
- Reference tokens: 3 (`오늘`, `기분이`, `좋습니다`)
- Common tokens: 2 (`오늘`, `좋습니다`)
- Precision = 2/3, Recall = 2/3
- F-measure = 2/3
Suggested change:

```diff
- assert rouge_1_score.fmeasure > 0
- assert rouge_1_score.fmeasure < 1.0
+ assert rouge_1_score.fmeasure == pytest.approx(2 / 3)
```
Hi @AhrendsW, thank you for your contribution! We appreciate you taking the time to submit this pull request.
Hi @seanzhou1023, can you please review this?
Hi @AhrendsW, thank you for your patience here. I apologize for the delay in getting to this review; I know this has been sitting for a while. This PR has merge conflicts that require changes from your end. Could you please rebase your branch with the latest main branch to address these? Once this is complete, please let us know so we can proceed with the review.
Force-pushed from 144ec44 to de3c53a
Hi @AhrendsW, your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.
Hi @wukath, can you please review this?
Hi @ryanaiagent @wukath, following up on this PR. All CI checks are passing — let me know if there's anything blocking the review.
The ROUGE-1 evaluation was returning a score of 0 for non-English languages (Thai, Chinese, Arabic, etc.) because the Porter stemmer only works for English text.

This fix:
- Adds an `_is_latin_script()` function to detect text script using `unicodedata`
- Disables the stemmer for non-Latin scripts while preserving it for English
- Adds comprehensive tests for Thai, Chinese, Arabic, Japanese, Korean, Portuguese, French, German, and Spanish

Fixes google#3111
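A sketch of how such a helper can gate stemming with the `rouge_score` package. The function names match the commit message, but the bodies here are an illustrative guess, since the diff itself is not reproduced in this thread:

```python
import unicodedata

from rouge_score import rouge_scorer


def _is_latin_script(text: str) -> bool:
  # Unicode names encode the script of each character, e.g.
  # "LATIN SMALL LETTER A" vs. "THAI CHARACTER SO SUA".
  return all("LATIN" in unicodedata.name(c, "") for c in text if c.isalpha())


def _calculate_rouge_1_scores(candidate: str, reference: str):
  # The Porter stemmer is English-only, so enable stemming only when
  # both strings are written in a Latin script.
  use_stemmer = _is_latin_script(candidate) and _is_latin_script(reference)
  scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=use_stemmer)
  # rouge_score's convention is score(target, prediction).
  return scorer.score(reference, candidate)["rouge1"]
```

A character-level check like this avoids pulling in a language-detection dependency and stays deterministic even for short or mixed-script strings.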
The default rouge_scorer tokenizer only handles ASCII characters, returning empty token lists for non-Latin scripts (Thai, Chinese, Arabic, Japanese, Korean). This caused ROUGE scores of 0.0 even for identical strings.

Changes:
- Add a `_UnicodeTokenizer` class using a Unicode-aware regex
- Use the custom tokenizer for non-Latin scripts
- Fix import order per isort requirements
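Since the diff body is not shown here, the following is only a guess at the tokenizer's shape. `rouge_score`'s `RougeScorer` accepts a custom tokenizer object exposing a `tokenize()` method, so a minimal Unicode-aware version could look like this (the exact regex in the PR may differ):

```python
import re

from rouge_score import rouge_scorer


class _UnicodeTokenizer:
  """Duck-typed tokenizer: rouge_score only calls tokenize(text)."""

  # In Python 3, \w matches Unicode word characters (CJK, Hangul,
  # Arabic, ...), unlike the ASCII-only default tokenizer.
  _WORD_RE = re.compile(r"\w+")

  def tokenize(self, text):
    return self._WORD_RE.findall(text.lower())


scorer = rouge_scorer.RougeScorer(
    ["rouge1"], use_stemmer=False, tokenizer=_UnicodeTokenizer()
)
# "今天" is the only shared token (two tokens per side) -> F-measure 0.5.
print(scorer.score("今天 开心", "今天 天气")["rouge1"].fmeasure)
```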
Force-pushed from 3e00b16 to 9acb4e5
Summary
- Fixes the `response_match_score` metric for non-English languages (non-Latin scripts like Chinese, Japanese, Korean, Arabic, etc.), where the `nltk` word tokenizer doesn't split properly

Test plan
- `pyink` and `isort` formatting verified via `autoformat.sh` (`pyink --check` reports no changes needed)
- Rebased on the latest `main`