
feat: Add LLM screen understanding service via OpenRouter [PRL-276] #1

Open
DCPMA wants to merge 6 commits into master from auto/PRL-275

Conversation

DCPMA (Owner) commented Mar 20, 2026

Summary

  • Added LlmService interface and OpenRouterLlmService implementation for LLM-powered screen understanding
  • Created structured prompt template covering 20+ FGO screen types with disambiguation rules
  • Built test harness script for live API testing with multiple models via OpenRouter
  • Documented spike results with architecture diagram, cost estimates, and risk assessment

Linear Tasks

PRL-275 — FGA LLM Agent: Autonomous FGO Game Navigation via OpenRouter
PRL-276 — Technical Spike: LLM Screen Understanding via OpenRouter

Subtasks

  • PRL-276: Technical Spike — LLM Screen Understanding via OpenRouter
  • PRL-277: Navigation Engine — Screen-to-Screen Movement
  • PRL-278: Event Farming Loop MVP
  • PRL-279: Polish & Upstream PR to FGA

Changes

  • LLM Service Layer (scripts/.../llm/): New LlmService interface, OpenRouterLlmService (OkHttp-based), ScreenPromptTemplate, ScreenType enum (20 screen types), ScreenIdentificationResult
  • Dependencies: Added OkHttp 4.12.0 and Gson to scripts module via version catalog
  • Test Harness (llm-spike/): Shell script for ADB screenshot capture + OpenRouter API testing across 3 models
  • Tests: 32 unit tests passing, covering the service interface, response parsing, prompt template, and error handling
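
For orientation, the shape of the new service layer might look like the sketch below. The identifyScreen signature matches the one shown in the review diff later in this thread, but the enum constants and result fields here are illustrative assumptions, not copied from the PR.

```kotlin
// Sketch only: identifyScreen(screenshotBase64, model) matches the diff shown
// in the review below; the enum constants and result fields are assumptions.
enum class ScreenType { BATTLE, SUPPORT_SELECTION, QUEST_REWARD, UNKNOWN /* ... ~20 in the PR */ }

data class ScreenIdentificationResult(
    val screenType: ScreenType,
    val confidence: Double,
    val reasoning: String? = null
)

interface LlmService {
    suspend fun identifyScreen(screenshotBase64: String, model: String): ScreenIdentificationResult
}
```

The split between a small interface and an OpenRouter-specific implementation keeps the door open for swapping providers without touching callers.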

Test Coverage

Check                 Result
Lint                  PASS
Build                 PASS (all 4 modules)
Tests                 PASS (32/32)
Agent Verification    ADB connected, screenshot pipeline validated

Architecture

FGA Existing Pipeline (fast, free):
  Screenshot → OpenCV Template Match → Known Screen? → Execute Rule
                                              │
                                         [No match]
                                              │
                                    LLM Fallback (new):
                                              │
                              OpenRouterLlmService
                              → ScreenPromptTemplate
                              → ScreenIdentificationResult
                              → Action decision

Note

This branch includes commits from fix/skill-spam-cooldown-ocr-guard (OCR cooldown guard) as the worktree was based on that branch. The LLM-specific changes are in the last 2 commits.


Automated by auto-dev pipeline

DCPMA and others added 5 commits March 16, 2026 15:10

Thin cooldown glyphs (1, 4) don't occlude enough pixels in the existing
30x30 icon-match region, causing isReadyForSpam to return true while a
skill is still on cooldown.

Add cooldownTextRegion() in BattleScreenLocations targeting the 96x48
area where the cooldown number is rendered. In SkillSpam, replace the
direct imageRegion check with isReadyForSpam(), which requires both the
icon similarity check (>=0.9) and an OCR confirmation that no cooldown
digit is present. A 100ms stability re-check is also required before
the cast is allowed.

Common Tesseract misreads (O->0, I/l->1) are normalised before digit
extraction. Changes are isolated to SkillSpam; command-script skill
casting is unaffected.

Fixes Fate-Grand-Automata#2073
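
The OCR-normalisation step this commit describes can be sketched as below. The helper names normaliseOcrDigits and hasCooldownDigit are hypothetical, since the actual SkillSpam code is not shown in this excerpt.

```kotlin
// Hypothetical helpers illustrating the commit's misread normalisation:
// Tesseract commonly reads 0 as O and 1 as I/l, so map those back before
// deciding whether a cooldown digit is present.
fun normaliseOcrDigits(raw: String): String =
    raw.replace('O', '0').replace('I', '1').replace('l', '1')

fun hasCooldownDigit(ocrText: String): Boolean =
    normaliseOcrDigits(ocrText).any { it.isDigit() }

// Per the commit message, isReadyForSpam then requires all of:
//   1. the icon similarity check on the skill region,
//   2. no cooldown digit detected in the 96x48 cooldownTextRegion, and
//   3. a 100ms stability re-check before the cast is allowed.
```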

Per the contribution guide, manual similarity values in Region.exists()
should be avoided. Replace explicit similarity=0.9 with the in operator
(Region.contains) so the user's Fine-Tune min similarity setting is
respected, consistent with the rest of the codebase.

Remove the now-unused skillReadySimilarity constant.

- Import Skill and use Skill.Servant instead of fully-qualified name,
  consistent with Caster.kt and the rest of the codebase
- Extract duplicated readiness predicate into local checkReady() to
  avoid repeating the icon-match + cooldown-guard expression

Technical spike for LLM-powered FGO screen identification. Adds:
- LlmService interface with structured screen identification
- OpenRouterLlmService implementation using OkHttp + Gson
- ScreenType enum covering 20 FGO screen types
- Structured prompt template for reliable JSON output
- Comprehensive unit tests (32 tests, all passing)
- Shell script test harness for live API testing with 3 models
- Spike documentation with architecture, cost estimates, risk assessment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Clean up unused SerializedName import and remove redundant
kotlinx-coroutines-core dependency (already transitively available
via libautomata's api() dependency).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 20, 2026 23:04

Copilot AI left a comment


Pull request overview

Adds an LLM-backed “screen understanding” layer to the scripts module (OpenRouter-based) to identify FGO screen types from screenshots, plus a local spike harness and documentation for live testing. This PR also includes an OCR-based guard to reduce erroneous “skill spam” taps when skills are on cooldown.

Changes:

  • Introduces LlmService + OpenRouterLlmService with structured prompt templates and parsing into ScreenIdentificationResult.
  • Adds unit tests for prompt/content construction and response parsing, and updates scripts dependencies (OkHttp + Gson).
  • Adds a runnable spike harness (llm-spike/) with results documentation; also updates skill spam logic with a cooldown OCR check.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 8 comments.

Per-file summary:

  • scripts/src/test/java/io/github/fate_grand_automata/scripts/llm/OpenRouterLlmServiceTest.kt — Parsing/request-body unit tests for the OpenRouter service (via reflection).
  • scripts/src/test/java/io/github/fate_grand_automata/scripts/llm/LlmServiceTest.kt — Unit tests for the new LLM models/templates/types.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/modules/SkillSpam.kt — Cooldown OCR guard and readiness recheck before spamming skills.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/locations/BattleScreenLocations.kt — Region locator for skill cooldown text OCR.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/ScreenType.kt — Enumerated set of known screen types for identification.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/ScreenPromptTemplate.kt — System/user prompt templates for structured JSON output.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/ScreenIdentificationResult.kt — Result + token-usage data models for LLM output.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/OpenRouterLlmService.kt — OpenRouter chat-completions request building and response parsing.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/LlmService.kt — The LLM service interface.
  • scripts/build.gradle.kts — OkHttp + Gson dependencies for the scripts module.
  • llm-spike/SPIKE-RESULTS.md — Spike approach, architecture, and preliminary results/costs.
  • llm-spike/run-spike.sh — Shell-based live testing harness (ADB screenshot + OpenRouter call).
  • gradle/libs.versions.toml — OkHttp version + library entry in the version catalog.
  • .gitignore — Ignores spike artifacts directories and fixes an existing ignore entry.


Comment on lines +51 to +66
override suspend fun identifyScreen(screenshotBase64: String, model: String): ScreenIdentificationResult {
    val requestBody = buildRequestBody(screenshotBase64, model)
    val request = Request.Builder()
        .url(OPENROUTER_API_URL)
        .addHeader("Authorization", "Bearer $apiKey")
        .addHeader("Content-Type", "application/json")
        .addHeader("HTTP-Referer", "https://github.com/Fate-Grand-Automata/FGA")
        .addHeader("X-Title", "FGA LLM Spike")
        .post(requestBody.toRequestBody(jsonMediaType))
        .build()

    val timeMark = TimeSource.Monotonic.markNow()

    val response = client.newCall(request).execute()
    val latencyMs = timeMark.elapsedNow().inWholeMilliseconds

Copilot AI commented Mar 20, 2026:

identifyScreen is a suspend function but it performs a blocking OkHttp execute() call directly. If this is ever invoked from a main/UI coroutine context (or a limited dispatcher), it can block threads and cause responsiveness issues. Consider wrapping the network call in withContext(Dispatchers.IO) (or making the API non-suspending and documenting it as blocking) so callers get predictable coroutine behavior.

DCPMA (Owner, Author):
Fixed in 1e0bb58. Wrapped blocking OkHttp execute() in withContext(Dispatchers.IO).
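
Both the threading fix here and the resource-cleanup fix in the next thread reduce to a pattern like the following generic sketch (a hypothetical helper, assuming OkHttp and kotlinx-coroutines on the classpath; not the actual commit):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import okhttp3.OkHttpClient
import okhttp3.Request

// Sketch: confine the blocking execute() call to Dispatchers.IO and close the
// Response deterministically via use {}, so the suspend caller never blocks
// its own dispatcher and connections are not leaked on error paths.
suspend fun <T> blockingCall(client: OkHttpClient, request: Request, parse: (String) -> T): T =
    withContext(Dispatchers.IO) {
        client.newCall(request).execute().use { response ->
            val body = response.body?.string()
                ?: error("Empty response body")
            check(response.isSuccessful) { "HTTP ${response.code}: $body" }
            parse(body)
        }
    }
```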

Comment on lines +64 to +76
val response = client.newCall(request).execute()
val latencyMs = timeMark.elapsedNow().inWholeMilliseconds

val responseBody = response.body?.string()
    ?: throw LlmServiceException("Empty response body from OpenRouter")

if (!response.isSuccessful) {
    throw LlmServiceException(
        "OpenRouter API error ${response.code}: $responseBody"
    )
}

return parseResponse(responseBody, model, latencyMs)
Copilot AI commented Mar 20, 2026:

The OkHttp Response isn’t closed if an exception occurs before/while reading the body (e.g., body == null). To avoid leaking connections, wrap the call in response.use { ... } (or at least ensure response.close() in a finally).

Suggested change
val response = client.newCall(request).execute()
val latencyMs = timeMark.elapsedNow().inWholeMilliseconds
val responseBody = response.body?.string()
    ?: throw LlmServiceException("Empty response body from OpenRouter")
if (!response.isSuccessful) {
    throw LlmServiceException(
        "OpenRouter API error ${response.code}: $responseBody"
    )
}
return parseResponse(responseBody, model, latencyMs)

return client.newCall(request).execute().use { response ->
    val latencyMs = timeMark.elapsedNow().inWholeMilliseconds
    val responseBody = response.body?.string()
        ?: throw LlmServiceException("Empty response body from OpenRouter")
    if (!response.isSuccessful) {
        throw LlmServiceException(
            "OpenRouter API error ${response.code}: $responseBody"
        )
    }
    parseResponse(responseBody, model, latencyMs)
}

DCPMA (Owner, Author):
Fixed in 1e0bb58. Added Response.use{} block for proper resource cleanup.

Comment on lines +142 to +145
// Strip markdown code fences if present
val cleanedContent = messageContent
    .replace(Regex("^```json\\s*", RegexOption.MULTILINE), "")
    .replace(Regex("^```\\s*$", RegexOption.MULTILINE), "")
Copilot AI commented Mar 20, 2026:

Markdown fence stripping only removes an opening fence when it is exactly json at the start of a line. If the model returns a plain fence (no language tag) or adds leading whitespace, parsing will fail. Consider handling optional language tags/whitespace and removing both opening and closing fences more generally.

Suggested change
// Strip markdown code fences if present
val cleanedContent = messageContent
    .replace(Regex("^```json\\s*", RegexOption.MULTILINE), "")
    .replace(Regex("^```\\s*$", RegexOption.MULTILINE), "")

// Strip markdown code fences (opening/closing, any language tag, optional indentation) if present
val fenceRegex = Regex("^\\s*```.*$", RegexOption.MULTILINE)
val cleanedContent = messageContent
    .replace(fenceRegex, "")

DCPMA (Owner, Author):
Fixed in 1e0bb58. Regex now handles leading whitespace, different language tags, and bare fences.
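
One robust form the commit may have landed on (illustrative; the exact regex in 1e0bb58 is not shown in this thread):

````kotlin
// Match an optionally indented fence line with any language tag,
// e.g. "```", "  ```json", "```JSON5", then strip it and trim leftovers.
val fenceRegex = Regex("""^\s*```[\w-]*\s*$""", RegexOption.MULTILINE)
val cleanedContent = messageContent.replace(fenceRegex, "").trim()
````

Anchoring on whole lines (MULTILINE with ^...$) avoids accidentally matching triple backticks that appear inside the JSON payload itself.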

    JsonParser.parseString(cleanedContent).asJsonObject
} catch (e: Exception) {
    throw LlmServiceException(
        "Failed to parse LLM JSON response: ${e.message}\nRaw: $messageContent"
Copilot AI commented Mar 20, 2026:

When JSON parsing fails, the thrown LlmServiceException drops the original exception as the cause, which makes debugging harder (stack trace is lost). Pass the caught exception as the cause when throwing LlmServiceException.

Suggested change
"Failed to parse LLM JSON response: ${e.message}\nRaw: $messageContent"
"Failed to parse LLM JSON response: ${e.message}\nRaw: $messageContent",
e

DCPMA (Owner, Author):
Fixed in 1e0bb58. Original exception now passed as cause in LlmServiceException.


# Base64 encode the screenshot
local base64_data
base64_data=$(base64 -i "$screenshot_path" | tr -d '\n')
Copilot AI commented Mar 20, 2026:

The script uses base64 -i, which is macOS-specific; GNU base64 typically doesn’t support -i. This makes the spike script non-portable on Linux CI/dev environments. Prefer base64 < file / base64 -w 0 file with detection for the platform.

Suggested change
base64_data=$(base64 -i "$screenshot_path" | tr -d '\n')
base64_data=$(base64 < "$screenshot_path" | tr -d '\n')

DCPMA (Owner, Author):
Fixed in 1e0bb58. Replaced macOS-specific base64 -i with portable base64 < file.

Comment on lines +106 to +135
request_json=$(python3 -c "
import json
data = {
    'model': '$model',
    'messages': [
        {
            'role': 'system',
            'content': $system_prompt
        },
        {
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': 'Analyze this Fate/Grand Order screenshot and identify the current game screen. Respond with ONLY the JSON object as specified.'
                },
                {
                    'type': 'image_url',
                    'image_url': {
                        'url': 'data:image/png;base64,$base64_data'
                    }
                }
            ]
        }
    ],
    'max_tokens': 1024,
    'temperature': 0.1
}
print(json.dumps(data))
")
Copilot AI commented Mar 20, 2026:

request_json is constructed by embedding the entire base64-encoded PNG into a python3 -c command string. For typical screenshots this can exceed shell/OS command-length limits and fail unpredictably. Consider generating the JSON via Python reading the image from disk (or piping base64 through stdin) instead of interpolating the full payload into the command line.

Suggested change
request_json=$(python3 -c "
import json
data = {
    'model': '$model',
    'messages': [
        {
            'role': 'system',
            'content': $system_prompt
        },
        {
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': 'Analyze this Fate/Grand Order screenshot and identify the current game screen. Respond with ONLY the JSON object as specified.'
                },
                {
                    'type': 'image_url',
                    'image_url': {
                        'url': 'data:image/png;base64,$base64_data'
                    }
                }
            ]
        }
    ],
    'max_tokens': 1024,
    'temperature': 0.1
}
print(json.dumps(data))
")

request_json=$(MODEL="$model" SYSTEM_PROMPT="$system_prompt" python3 - << 'PY' <<< "$base64_data"
import json
import os
import sys

model = os.environ["MODEL"]
# SYSTEM_PROMPT contains JSON for the system message content
system_prompt = json.loads(os.environ["SYSTEM_PROMPT"])
base64_data = sys.stdin.read().strip()
data = {
    "model": model,
    "messages": [
        {
            "role": "system",
            "content": system_prompt,
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this Fate/Grand Order screenshot and identify the current game screen. Respond with ONLY the JSON object as specified.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_data}",
                    },
                },
            ],
        },
    ],
    "max_tokens": 1024,
    "temperature": 0.1,
}
print(json.dumps(data))
PY
)

DCPMA (Owner, Author):
Fixed in 1e0bb58. Base64 data now written to temp file instead of passed as CLI arg.

Comment on lines +3 to +15
import assertk.assertThat
import assertk.assertions.contains
import assertk.assertions.isEqualTo
import assertk.assertions.isGreaterThan
import assertk.assertions.isInstanceOf
import assertk.assertions.isNotEmpty
import assertk.assertions.isNotNull
import com.google.gson.Gson
import com.google.gson.JsonParser
import java.lang.reflect.InvocationTargetException
import kotlin.test.Test
import kotlin.test.assertFailsWith

Copilot AI commented Mar 20, 2026:

Unused test members/imports add noise (e.g., private val gson = Gson() and several assertion imports are never referenced). Removing unused imports/variables will keep the tests easier to maintain and avoids future lint failures if a linter is introduced.

DCPMA (Owner, Author):
Fixed in 1e0bb58. Removed unused Gson field and unused imports.

Comment on lines +5 to +10
import assertk.assertions.isGreaterThan
import assertk.assertions.isNotEmpty
import assertk.assertions.isNotNull
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify
Copilot AI commented Mar 20, 2026:

There are unused MockK imports (every, mockk, verify) and some unused assertion imports in this test file. Removing them reduces noise and avoids potential lint/CI failures if stricter checks are enabled later.

Suggested change
import assertk.assertions.isGreaterThan
import assertk.assertions.isNotEmpty
import assertk.assertions.isNotNull
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify

DCPMA (Owner, Author):
Fixed in 1e0bb58. Removed unused MockK and assertion imports.

- Wrap blocking OkHttp call in withContext(Dispatchers.IO)
- Use Response.use{} for proper resource cleanup
- Make markdown fence stripping regex more robust
- Pass original exception as cause in LlmServiceException
- Fix macOS-specific base64 -i flag for cross-platform compat
- Replace inline base64 arg with temp file to avoid ARG_MAX
- Remove unused test imports and declarations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>