feat: Add LLM screen understanding service via OpenRouter [PRL-276] #1
Conversation
Thin cooldown glyphs (1, 4) don't occlude enough pixels in the existing 30x30 icon-match region, causing isReadyForSpam to return true while a skill is still on cooldown. Add cooldownTextRegion() in BattleScreenLocations targeting the 96x48 area where the cooldown number is rendered. In SkillSpam, replace the direct imageRegion check with isReadyForSpam(), which requires both the icon similarity check (>=0.9) and an OCR confirmation that no cooldown digit is present. A 100ms stability re-check is also required before the cast is allowed. Common Tesseract misreads (O->0, I/l->1) are normalised before digit extraction. Changes are isolated to SkillSpam; command-script skill casting is unaffected. Fixes Fate-Grand-Automata#2073
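A minimal sketch of the guard described above. The names (`normalizeOcr`, `hasCooldownDigit`, `isReadyForSpam`'s parameters) are illustrative stand-ins, not FGA's actual API:

```kotlin
// Normalise common Tesseract misreads before looking for a cooldown digit.
fun normalizeOcr(raw: String): String =
    raw.replace('O', '0')
        .replace('I', '1')
        .replace('l', '1')

// A skill still on cooldown renders a digit in the cooldown text region.
fun hasCooldownDigit(ocrText: String): Boolean =
    normalizeOcr(ocrText).any { it.isDigit() }

// Readiness requires both the icon match and the absence of a cooldown digit.
fun isReadyForSpam(iconMatches: Boolean, ocrText: String): Boolean =
    iconMatches && !hasCooldownDigit(ocrText)

fun main() {
    check(!isReadyForSpam(iconMatches = true, ocrText = "l"))  // misread "1": still cooling down
    check(isReadyForSpam(iconMatches = true, ocrText = ""))    // no digit: ready
    check(!isReadyForSpam(iconMatches = false, ocrText = ""))  // icon mismatch: not ready
    println("ok")
}
```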
Per the contribution guide, manual similarity values in Region.exists() should be avoided. Replace explicit similarity=0.9 with the in operator (Region.contains) so the user's Fine-Tune min similarity setting is respected, consistent with the rest of the codebase. Remove the now-unused skillReadySimilarity constant.
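Roughly, the `in` convention works like this (a sketch with simplified stand-ins, not FGA's real `Region`/`Pattern` classes):

```kotlin
// `pattern in region` delegates to operator fun contains, which uses the
// user's configured min similarity instead of a hard-coded exists(similarity = 0.9).
class Pattern(val similarityToScreen: Double)

class Region(private val minSimilarity: Double) {
    operator fun contains(pattern: Pattern): Boolean =
        pattern.similarityToScreen >= minSimilarity
}

fun main() {
    val region = Region(minSimilarity = 0.8)  // user's Fine-Tune setting
    check(Pattern(0.85) in region)   // at or above the configured threshold
    check(Pattern(0.75) !in region)  // below threshold
    println("ok")
}
```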
- Import `Skill` and use `Skill.Servant` instead of the fully-qualified name, consistent with Caster.kt and the rest of the codebase
- Extract duplicated readiness predicate into a local `checkReady()` to avoid repeating the icon-match + cooldown-guard expression
Technical spike for LLM-powered FGO screen identification. Adds:
- `LlmService` interface with structured screen identification
- `OpenRouterLlmService` implementation using OkHttp + Gson
- `ScreenType` enum covering 20 FGO screen types
- Structured prompt template for reliable JSON output
- Comprehensive unit tests (32 tests, all passing)
- Shell script test harness for live API testing with 3 models
- Spike documentation with architecture, cost estimates, risk assessment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clean up unused SerializedName import and remove redundant kotlinx-coroutines-core dependency (already transitively available via libautomata's api() dependency). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Adds an LLM-backed “screen understanding” layer to the scripts module (OpenRouter-based) to identify FGO screen types from screenshots, plus a local spike harness and documentation for live testing. This PR also includes an OCR-based guard to reduce erroneous “skill spam” taps when skills are on cooldown.
Changes:
- Introduces `LlmService` + `OpenRouterLlmService` with structured prompt templates and parsing into `ScreenIdentificationResult`.
- Adds unit tests for prompt/content construction and response parsing, and updates `scripts` dependencies (OkHttp + Gson).
- Adds a runnable spike harness (`llm-spike/`) with results documentation; also updates skill spam logic with a cooldown OCR check.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| scripts/src/test/java/io/github/fate_grand_automata/scripts/llm/OpenRouterLlmServiceTest.kt | Adds parsing/request-body unit tests for OpenRouter service (via reflection). |
| scripts/src/test/java/io/github/fate_grand_automata/scripts/llm/LlmServiceTest.kt | Adds unit tests for new LLM models/templates/types. |
| scripts/src/main/java/io/github/fate_grand_automata/scripts/modules/SkillSpam.kt | Adds cooldown OCR guard and readiness recheck before spamming skills. |
| scripts/src/main/java/io/github/fate_grand_automata/scripts/locations/BattleScreenLocations.kt | Adds a region locator for skill cooldown text OCR. |
| scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/ScreenType.kt | Defines enumerated set of known screen types for identification. |
| scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/ScreenPromptTemplate.kt | Adds system/user prompt templates for structured JSON output. |
| scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/ScreenIdentificationResult.kt | Adds result + token-usage data models for LLM output. |
| scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/OpenRouterLlmService.kt | Implements OpenRouter chat-completions request building and response parsing. |
| scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/LlmService.kt | Introduces the LLM service interface. |
| scripts/build.gradle.kts | Adds OkHttp + Gson dependencies to the scripts module. |
| llm-spike/SPIKE-RESULTS.md | Documents spike approach, architecture, and preliminary results/costs. |
| llm-spike/run-spike.sh | Adds a shell-based live testing harness (ADB screenshot + OpenRouter call). |
| gradle/libs.versions.toml | Adds OkHttp version + library entry to the version catalog. |
| .gitignore | Ignores spike artifacts directories and fixes an existing ignore entry. |
```kotlin
override suspend fun identifyScreen(screenshotBase64: String, model: String): ScreenIdentificationResult {
    val requestBody = buildRequestBody(screenshotBase64, model)
    val request = Request.Builder()
        .url(OPENROUTER_API_URL)
        .addHeader("Authorization", "Bearer $apiKey")
        .addHeader("Content-Type", "application/json")
        .addHeader("HTTP-Referer", "https://github.com/Fate-Grand-Automata/FGA")
        .addHeader("X-Title", "FGA LLM Spike")
        .post(requestBody.toRequestBody(jsonMediaType))
        .build()

    val timeMark = TimeSource.Monotonic.markNow()

    val response = client.newCall(request).execute()
    val latencyMs = timeMark.elapsedNow().inWholeMilliseconds
```
identifyScreen is a suspend function but it performs a blocking OkHttp execute() call directly. If this is ever invoked from a main/UI coroutine context (or a limited dispatcher), it can block threads and cause responsiveness issues. Consider wrapping the network call in withContext(Dispatchers.IO) (or making the API non-suspending and documenting it as blocking) so callers get predictable coroutine behavior.
Fixed in 1e0bb58. Wrapped blocking OkHttp execute() in withContext(Dispatchers.IO).
```kotlin
    val response = client.newCall(request).execute()
    val latencyMs = timeMark.elapsedNow().inWholeMilliseconds

    val responseBody = response.body?.string()
        ?: throw LlmServiceException("Empty response body from OpenRouter")

    if (!response.isSuccessful) {
        throw LlmServiceException(
            "OpenRouter API error ${response.code}: $responseBody"
        )
    }

    return parseResponse(responseBody, model, latencyMs)
```
The OkHttp Response isn’t closed if an exception occurs before/while reading the body (e.g., body == null). To avoid leaking connections, wrap the call in response.use { ... } (or at least ensure response.close() in a finally).
Suggested change:

```kotlin
// Before
val response = client.newCall(request).execute()
val latencyMs = timeMark.elapsedNow().inWholeMilliseconds
val responseBody = response.body?.string()
    ?: throw LlmServiceException("Empty response body from OpenRouter")
if (!response.isSuccessful) {
    throw LlmServiceException(
        "OpenRouter API error ${response.code}: $responseBody"
    )
}
return parseResponse(responseBody, model, latencyMs)
```

```kotlin
// After
return client.newCall(request).execute().use { response ->
    val latencyMs = timeMark.elapsedNow().inWholeMilliseconds
    val responseBody = response.body?.string()
        ?: throw LlmServiceException("Empty response body from OpenRouter")
    if (!response.isSuccessful) {
        throw LlmServiceException(
            "OpenRouter API error ${response.code}: $responseBody"
        )
    }
    parseResponse(responseBody, model, latencyMs)
}
```
Fixed in 1e0bb58. Added Response.use{} block for proper resource cleanup.
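The guarantee `use { }` provides can be shown with a toy `Closeable` (a stand-in, not OkHttp's `Response`):

```kotlin
import java.io.Closeable

// Records whether close() ran, to demonstrate use {} semantics.
class FakeResponse : Closeable {
    var closed = false
    override fun close() { closed = true }
}

fun main() {
    val response = FakeResponse()
    try {
        // use {} closes the resource even when the block throws --
        // exactly the leak scenario the review comment points at.
        response.use { throw IllegalStateException("body was null") }
    } catch (e: IllegalStateException) {
        // expected
    }
    check(response.closed)
    println("ok")
}
```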
```kotlin
// Strip markdown code fences if present
val cleanedContent = messageContent
    .replace(Regex("^```json\\s*", RegexOption.MULTILINE), "")
    .replace(Regex("^```\\s*$", RegexOption.MULTILINE), "")
```
Markdown fence stripping only removes an opening fence when it is exactly json at the start of a line. If the model returns a plain fence (no language tag) or adds leading whitespace, parsing will fail. Consider handling optional language tags/whitespace and removing both opening and closing fences more generally.
Suggested change:

```kotlin
// Before
// Strip markdown code fences if present
val cleanedContent = messageContent
    .replace(Regex("^```json\\s*", RegexOption.MULTILINE), "")
    .replace(Regex("^```\\s*$", RegexOption.MULTILINE), "")
```

```kotlin
// After
// Strip markdown code fences (opening/closing, any language tag, optional indentation) if present
val fenceRegex = Regex("^\\s*```.*$", RegexOption.MULTILINE)
val cleanedContent = messageContent
    .replace(fenceRegex, "")
```
Fixed in 1e0bb58. Regex now handles leading whitespace, different language tags, and bare fences.
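For illustration, a self-contained version of a more tolerant fence stripper — the regex here is an assumption in the spirit of the suggestion, not the exact FGA code:

```kotlin
// Remove any line that is just an optional-indent ``` fence, with or without
// a language tag, then trim surrounding whitespace.
val fenceRegex = Regex("""^\s*```[A-Za-z0-9]*\s*$""", RegexOption.MULTILINE)

fun stripFences(content: String): String =
    content.replace(fenceRegex, "").trim()

fun main() {
    check(stripFences("```json\n{\"a\":1}\n```") == "{\"a\":1}")      // tagged fence
    check(stripFences("```\n{\"a\":1}\n```") == "{\"a\":1}")          // bare fence
    check(stripFences("  ```json\n{\"a\":1}\n  ```") == "{\"a\":1}")  // indented fence
    println("ok")
}
```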
```kotlin
    JsonParser.parseString(cleanedContent).asJsonObject
} catch (e: Exception) {
    throw LlmServiceException(
        "Failed to parse LLM JSON response: ${e.message}\nRaw: $messageContent"
```
When JSON parsing fails, the thrown LlmServiceException drops the original exception as the cause, which makes debugging harder (stack trace is lost). Pass the caught exception as the cause when throwing LlmServiceException.
Suggested change:

```kotlin
// Before
"Failed to parse LLM JSON response: ${e.message}\nRaw: $messageContent"
```

```kotlin
// After
"Failed to parse LLM JSON response: ${e.message}\nRaw: $messageContent",
e
```
Fixed in 1e0bb58. Original exception now passed as cause in LlmServiceException.
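The fix pattern in a self-contained form (`parseStrict` is a hypothetical stand-in for the real parser):

```kotlin
class LlmServiceException(message: String, cause: Throwable? = null) :
    Exception(message, cause)

fun parseStrict(raw: String): Int =
    try {
        raw.trim().toInt()
    } catch (e: NumberFormatException) {
        // Passing `e` as the cause keeps the original stack trace attached.
        throw LlmServiceException("Failed to parse: $raw", e)
    }

fun main() {
    try {
        parseStrict("not-a-number")
        check(false) { "should have thrown" }
    } catch (e: LlmServiceException) {
        check(e.cause is NumberFormatException)  // original failure preserved
    }
    println("ok")
}
```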
llm-spike/run-spike.sh
Outdated
```shell
# Base64 encode the screenshot
local base64_data
base64_data=$(base64 -i "$screenshot_path" | tr -d '\n')
```
The script uses base64 -i, which is macOS-specific; GNU base64 typically doesn’t support -i. This makes the spike script non-portable on Linux CI/dev environments. Prefer base64 < file / base64 -w 0 file with detection for the platform.
Suggested change:

```shell
# Before
base64_data=$(base64 -i "$screenshot_path" | tr -d '\n')
```

```shell
# After
base64_data=$(base64 < "$screenshot_path" | tr -d '\n')
```
Fixed in 1e0bb58. Replaced macOS-specific base64 -i with portable base64 < file.
llm-spike/run-spike.sh
Outdated
```shell
request_json=$(python3 -c "
import json
data = {
    'model': '$model',
    'messages': [
        {
            'role': 'system',
            'content': $system_prompt
        },
        {
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': 'Analyze this Fate/Grand Order screenshot and identify the current game screen. Respond with ONLY the JSON object as specified.'
                },
                {
                    'type': 'image_url',
                    'image_url': {
                        'url': 'data:image/png;base64,$base64_data'
                    }
                }
            ]
        }
    ],
    'max_tokens': 1024,
    'temperature': 0.1
}
print(json.dumps(data))
")
```
request_json is constructed by embedding the entire base64-encoded PNG into a python3 -c command string. For typical screenshots this can exceed shell/OS command-length limits and fail unpredictably. Consider generating the JSON via Python reading the image from disk (or piping base64 through stdin) instead of interpolating the full payload into the command line.
Suggested change — build the JSON in Python with the base64 payload read from a temp file rather than interpolated into the command line:

```shell
# Write the payload to a temp file so neither the argument list nor the
# environment has to carry the full image (both count against ARG_MAX).
b64_file=$(mktemp)
printf '%s' "$base64_data" > "$b64_file"

request_json=$(MODEL="$model" SYSTEM_PROMPT="$system_prompt" B64_FILE="$b64_file" python3 << 'PY'
import json
import os

model = os.environ["MODEL"]
# SYSTEM_PROMPT contains JSON for the system message content
system_prompt = json.loads(os.environ["SYSTEM_PROMPT"])
with open(os.environ["B64_FILE"]) as f:
    base64_data = f.read().strip()

data = {
    "model": model,
    "messages": [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this Fate/Grand Order screenshot and identify the current game screen. Respond with ONLY the JSON object as specified.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_data}"},
                },
            ],
        },
    ],
    "max_tokens": 1024,
    "temperature": 0.1,
}
print(json.dumps(data))
PY
)
rm -f "$b64_file"
```
Fixed in 1e0bb58. Base64 data now written to temp file instead of passed as CLI arg.
```kotlin
import assertk.assertThat
import assertk.assertions.contains
import assertk.assertions.isEqualTo
import assertk.assertions.isGreaterThan
import assertk.assertions.isInstanceOf
import assertk.assertions.isNotEmpty
import assertk.assertions.isNotNull
import com.google.gson.Gson
import com.google.gson.JsonParser
import java.lang.reflect.InvocationTargetException
import kotlin.test.Test
import kotlin.test.assertFailsWith
```
Unused test members/imports add noise (e.g., private val gson = Gson() and several assertion imports are never referenced). Removing unused imports/variables will keep the tests easier to maintain and avoids future lint failures if a linter is introduced.
Fixed in 1e0bb58. Removed unused Gson field and unused imports.
```kotlin
import assertk.assertions.isGreaterThan
import assertk.assertions.isNotEmpty
import assertk.assertions.isNotNull
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify
```
There are unused MockK imports (every, mockk, verify) and some unused assertion imports in this test file. Removing them reduces noise and avoids potential lint/CI failures if stricter checks are enabled later.
Suggested change: delete the unused import lines shown above.
Fixed in 1e0bb58. Removed unused MockK and assertion imports.
- Wrap blocking OkHttp call in withContext(Dispatchers.IO)
- Use Response.use{} for proper resource cleanup
- Make markdown fence stripping regex more robust
- Pass original exception as cause in LlmServiceException
- Fix macOS-specific base64 -i flag for cross-platform compat
- Replace inline base64 arg with temp file to avoid ARG_MAX
- Remove unused test imports and declarations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- `LlmService` interface and `OpenRouterLlmService` implementation for LLM-powered screen understanding

Linear Task
- PRL-275 — FGA LLM Agent: Autonomous FGO Game Navigation via OpenRouter
- PRL-276 — Technical Spike: LLM Screen Understanding via OpenRouter

Subtasks

Changes
- LLM service layer (`scripts/.../llm/`): new `LlmService` interface, `OpenRouterLlmService` (OkHttp-based), `ScreenPromptTemplate`, `ScreenType` enum (20 screen types), `ScreenIdentificationResult`
- OkHttp + Gson dependencies added to the `scripts` module via the version catalog
- Spike harness (`llm-spike/`): shell script for ADB screenshot capture + OpenRouter API testing across 3 models

Test Coverage
Architecture
Note
This branch includes commits from `fix/skill-spam-cooldown-ocr-guard` (OCR cooldown guard) as the worktree was based on that branch. The LLM-specific changes are in the last 2 commits.

Automated by auto-dev pipeline