
feat: Add LLM screen understanding service via OpenRouter [PRL-276] #1

Open
DCPMA wants to merge 6 commits into master from auto/PRL-275

Conversation

DCPMA (Owner) commented Mar 20, 2026

Summary

  • Added LlmService interface and OpenRouterLlmService implementation for LLM-powered screen understanding
  • Created structured prompt template covering 20+ FGO screen types with disambiguation rules
  • Built test harness script for live API testing with multiple models via OpenRouter
  • Documented spike results with architecture diagram, cost estimates, and risk assessment

Linear Tasks

PRL-275 — FGA LLM Agent: Autonomous FGO Game Navigation via OpenRouter
PRL-276 — Technical Spike: LLM Screen Understanding via OpenRouter

Subtasks

  • PRL-276: Technical Spike — LLM Screen Understanding via OpenRouter
  • PRL-277: Navigation Engine — Screen-to-Screen Movement
  • PRL-278: Event Farming Loop MVP
  • PRL-279: Polish & Upstream PR to FGA

Changes

  • LLM Service Layer (scripts/.../llm/): New LlmService interface, OpenRouterLlmService (OkHttp-based), ScreenPromptTemplate, ScreenType enum (20 screen types), ScreenIdentificationResult
  • Dependencies: Added OkHttp 4.12.0 and Gson to scripts module via version catalog
  • Test Harness (llm-spike/): Shell script for ADB screenshot capture + OpenRouter API testing across 3 models
  • Tests: 32 unit tests passing, covering the service interface, response parsing, prompt template, and error handling
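
For orientation, the shape of the new service layer might look like the sketch below. The identifyScreen signature matches the one shown in the review diff later in this thread, but the enum constants and result fields here are illustrative assumptions, not copied from the PR.

```kotlin
// Sketch only: identifyScreen(screenshotBase64, model) matches the diff shown
// in the review below; the enum constants and result fields are assumptions.
enum class ScreenType { BATTLE, SUPPORT_SELECTION, QUEST_REWARD, UNKNOWN /* ... ~20 in the PR */ }

data class ScreenIdentificationResult(
    val screenType: ScreenType,
    val confidence: Double,
    val reasoning: String? = null
)

interface LlmService {
    suspend fun identifyScreen(screenshotBase64: String, model: String): ScreenIdentificationResult
}
```

The split between a small interface and an OpenRouter-specific implementation keeps the door open for swapping providers without touching callers.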

Test Coverage

Check                 Result
Lint                  PASS
Build                 PASS (all 4 modules)
Tests                 PASS (32/32)
Agent Verification    ADB connected, screenshot pipeline validated

Architecture

FGA Existing Pipeline (fast, free):
  Screenshot → OpenCV Template Match → Known Screen? → Execute Rule
                                              │
                                         [No match]
                                              │
                                    LLM Fallback (new):
                                              │
                              OpenRouterLlmService
                              → ScreenPromptTemplate
                              → ScreenIdentificationResult
                              → Action decision

Note

This branch includes commits from fix/skill-spam-cooldown-ocr-guard (OCR cooldown guard) as the worktree was based on that branch. The LLM-specific changes are in the last 2 commits.


Automated by auto-dev pipeline

DCPMA and others added 5 commits March 16, 2026 15:10

Thin cooldown glyphs (1, 4) don't occlude enough pixels in the existing
30x30 icon-match region, causing isReadyForSpam to return true while a
skill is still on cooldown.

Add cooldownTextRegion() in BattleScreenLocations targeting the 96x48
area where the cooldown number is rendered. In SkillSpam, replace the
direct imageRegion check with isReadyForSpam(), which requires both the
icon similarity check (>=0.9) and an OCR confirmation that no cooldown
digit is present. A 100ms stability re-check is also required before
the cast is allowed.

Common Tesseract misreads (O->0, I/l->1) are normalised before digit
extraction. Changes are isolated to SkillSpam; command-script skill
casting is unaffected.

Fixes Fate-Grand-Automata#2073
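
The OCR-normalisation step this commit describes can be sketched as below. The helper names normaliseOcrDigits and hasCooldownDigit are hypothetical, since the actual SkillSpam code is not shown in this excerpt.

```kotlin
// Hypothetical helpers illustrating the commit's misread normalisation:
// Tesseract commonly reads 0 as O and 1 as I/l, so map those back before
// deciding whether a cooldown digit is present.
fun normaliseOcrDigits(raw: String): String =
    raw.replace('O', '0').replace('I', '1').replace('l', '1')

fun hasCooldownDigit(ocrText: String): Boolean =
    normaliseOcrDigits(ocrText).any { it.isDigit() }

// Per the commit message, isReadyForSpam then requires all of:
//   1. the icon similarity check on the skill region,
//   2. no cooldown digit detected in the 96x48 cooldownTextRegion, and
//   3. a 100ms stability re-check before the cast is allowed.
```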

Per the contribution guide, manual similarity values in Region.exists()
should be avoided. Replace explicit similarity=0.9 with the in operator
(Region.contains) so the user's Fine-Tune min similarity setting is
respected, consistent with the rest of the codebase.

Remove the now-unused skillReadySimilarity constant.

- Import Skill and use Skill.Servant instead of fully-qualified name,
  consistent with Caster.kt and the rest of the codebase
- Extract duplicated readiness predicate into local checkReady() to
  avoid repeating the icon-match + cooldown-guard expression

Technical spike for LLM-powered FGO screen identification. Adds:
- LlmService interface with structured screen identification
- OpenRouterLlmService implementation using OkHttp + Gson
- ScreenType enum covering 20 FGO screen types
- Structured prompt template for reliable JSON output
- Comprehensive unit tests (32 tests, all passing)
- Shell script test harness for live API testing with 3 models
- Spike documentation with architecture, cost estimates, risk assessment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Clean up unused SerializedName import and remove redundant
kotlinx-coroutines-core dependency (already transitively available
via libautomata's api() dependency).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 20, 2026 23:04

Copilot AI left a comment


Pull request overview

Adds an LLM-backed “screen understanding” layer to the scripts module (OpenRouter-based) to identify FGO screen types from screenshots, plus a local spike harness and documentation for live testing. This PR also includes an OCR-based guard to reduce erroneous “skill spam” taps when skills are on cooldown.

Changes:

  • Introduces LlmService + OpenRouterLlmService with structured prompt templates and parsing into ScreenIdentificationResult.
  • Adds unit tests for prompt/content construction and response parsing, and updates scripts dependencies (OkHttp + Gson).
  • Adds a runnable spike harness (llm-spike/) with results documentation; also updates skill spam logic with a cooldown OCR check.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 8 comments.

Per-file summary:

  • scripts/src/test/java/io/github/fate_grand_automata/scripts/llm/OpenRouterLlmServiceTest.kt — Parsing/request-body unit tests for the OpenRouter service (via reflection).
  • scripts/src/test/java/io/github/fate_grand_automata/scripts/llm/LlmServiceTest.kt — Unit tests for the new LLM models/templates/types.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/modules/SkillSpam.kt — Cooldown OCR guard and readiness recheck before spamming skills.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/locations/BattleScreenLocations.kt — Region locator for skill cooldown text OCR.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/ScreenType.kt — Enumerated set of known screen types for identification.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/ScreenPromptTemplate.kt — System/user prompt templates for structured JSON output.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/ScreenIdentificationResult.kt — Result + token-usage data models for LLM output.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/OpenRouterLlmService.kt — OpenRouter chat-completions request building and response parsing.
  • scripts/src/main/java/io/github/fate_grand_automata/scripts/llm/LlmService.kt — The LLM service interface.
  • scripts/build.gradle.kts — OkHttp + Gson dependencies for the scripts module.
  • llm-spike/SPIKE-RESULTS.md — Spike approach, architecture, and preliminary results/costs.
  • llm-spike/run-spike.sh — Shell-based live testing harness (ADB screenshot + OpenRouter call).
  • gradle/libs.versions.toml — OkHttp version + library entry in the version catalog.
  • .gitignore — Ignores spike artifacts directories and fixes an existing ignore entry.


Comment on lines +51 to +66
override suspend fun identifyScreen(screenshotBase64: String, model: String): ScreenIdentificationResult {
    val requestBody = buildRequestBody(screenshotBase64, model)
    val request = Request.Builder()
        .url(OPENROUTER_API_URL)
        .addHeader("Authorization", "Bearer $apiKey")
        .addHeader("Content-Type", "application/json")
        .addHeader("HTTP-Referer", "https://github.com/Fate-Grand-Automata/FGA")
        .addHeader("X-Title", "FGA LLM Spike")
        .post(requestBody.toRequestBody(jsonMediaType))
        .build()

    val timeMark = TimeSource.Monotonic.markNow()

    val response = client.newCall(request).execute()
    val latencyMs = timeMark.elapsedNow().inWholeMilliseconds

Copilot AI commented Mar 20, 2026:

identifyScreen is a suspend function but it performs a blocking OkHttp execute() call directly. If this is ever invoked from a main/UI coroutine context (or a limited dispatcher), it can block threads and cause responsiveness issues. Consider wrapping the network call in withContext(Dispatchers.IO) (or making the API non-suspending and documenting it as blocking) so callers get predictable coroutine behavior.

DCPMA (Owner, Author):
Fixed in 1e0bb58. Wrapped blocking OkHttp execute() in withContext(Dispatchers.IO).
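
Both the threading fix here and the resource-cleanup fix in the next thread reduce to a pattern like the following generic sketch (a hypothetical helper, assuming OkHttp and kotlinx-coroutines on the classpath; not the actual commit):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import okhttp3.OkHttpClient
import okhttp3.Request

// Sketch: confine the blocking execute() call to Dispatchers.IO and close the
// Response deterministically via use {}, so the suspend caller never blocks
// its own dispatcher and connections are not leaked on error paths.
suspend fun <T> blockingCall(client: OkHttpClient, request: Request, parse: (String) -> T): T =
    withContext(Dispatchers.IO) {
        client.newCall(request).execute().use { response ->
            val body = response.body?.string()
                ?: error("Empty response body")
            check(response.isSuccessful) { "HTTP ${response.code}: $body" }
            parse(body)
        }
    }
```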

Comment on lines +64 to +76
val response = client.newCall(request).execute()
val latencyMs = timeMark.elapsedNow().inWholeMilliseconds

val responseBody = response.body?.string()
    ?: throw LlmServiceException("Empty response body from OpenRouter")

if (!response.isSuccessful) {
    throw LlmServiceException(
        "OpenRouter API error ${response.code}: $responseBody"
    )
}

return parseResponse(responseBody, model, latencyMs)
Copilot AI commented Mar 20, 2026:

The OkHttp Response isn’t closed if an exception occurs before/while reading the body (e.g., body == null). To avoid leaking connections, wrap the call in response.use { ... } (or at least ensure response.close() in a finally).

Suggested change
val response = client.newCall(request).execute()
val latencyMs = timeMark.elapsedNow().inWholeMilliseconds
val responseBody = response.body?.string()
    ?: throw LlmServiceException("Empty response body from OpenRouter")
if (!response.isSuccessful) {
    throw LlmServiceException(
        "OpenRouter API error ${response.code}: $responseBody"
    )
}
return parseResponse(responseBody, model, latencyMs)

return client.newCall(request).execute().use { response ->
    val latencyMs = timeMark.elapsedNow().inWholeMilliseconds
    val responseBody = response.body?.string()
        ?: throw LlmServiceException("Empty response body from OpenRouter")
    if (!response.isSuccessful) {
        throw LlmServiceException(
            "OpenRouter API error ${response.code}: $responseBody"
        )
    }
    parseResponse(responseBody, model, latencyMs)
}

DCPMA (Owner, Author):
Fixed in 1e0bb58. Added Response.use{} block for proper resource cleanup.

Comment on lines +142 to +145
// Strip markdown code fences if present
val cleanedContent = messageContent
    .replace(Regex("^```json\\s*", RegexOption.MULTILINE), "")
    .replace(Regex("^```\\s*$", RegexOption.MULTILINE), "")
Copilot AI commented Mar 20, 2026:

Markdown fence stripping only removes an opening fence when it is exactly json at the start of a line. If the model returns a plain fence (no language tag) or adds leading whitespace, parsing will fail. Consider handling optional language tags/whitespace and removing both opening and closing fences more generally.

Suggested change
// Strip markdown code fences if present
val cleanedContent = messageContent
    .replace(Regex("^```json\\s*", RegexOption.MULTILINE), "")
    .replace(Regex("^```\\s*$", RegexOption.MULTILINE), "")

// Strip markdown code fences (opening/closing, any language tag, optional indentation) if present
val fenceRegex = Regex("^\\s*```.*$", RegexOption.MULTILINE)
val cleanedContent = messageContent
    .replace(fenceRegex, "")

DCPMA (Owner, Author):
Fixed in 1e0bb58. Regex now handles leading whitespace, different language tags, and bare fences.
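
One robust form the commit may have landed on (illustrative; the exact regex in 1e0bb58 is not shown in this thread):

````kotlin
// Match an optionally indented fence line with any language tag,
// e.g. "```", "  ```json", "```JSON5", then strip it and trim leftovers.
val fenceRegex = Regex("""^\s*```[\w-]*\s*$""", RegexOption.MULTILINE)
val cleanedContent = messageContent.replace(fenceRegex, "").trim()
````

Anchoring on whole lines (MULTILINE with ^...$) avoids accidentally matching triple backticks that appear inside the JSON payload itself.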

    JsonParser.parseString(cleanedContent).asJsonObject
} catch (e: Exception) {
    throw LlmServiceException(
        "Failed to parse LLM JSON response: ${e.message}\nRaw: $messageContent"
Copilot AI commented Mar 20, 2026:

When JSON parsing fails, the thrown LlmServiceException drops the original exception as the cause, which makes debugging harder (stack trace is lost). Pass the caught exception as the cause when throwing LlmServiceException.

Suggested change
"Failed to parse LLM JSON response: ${e.message}\nRaw: $messageContent"
"Failed to parse LLM JSON response: ${e.message}\nRaw: $messageContent",
e

DCPMA (Owner, Author):
Fixed in 1e0bb58. Original exception now passed as cause in LlmServiceException.


# Base64 encode the screenshot
local base64_data
base64_data=$(base64 -i "$screenshot_path" | tr -d '\n')
Copilot AI commented Mar 20, 2026:

The script uses base64 -i, which is macOS-specific; GNU base64 typically doesn’t support -i. This makes the spike script non-portable on Linux CI/dev environments. Prefer base64 < file / base64 -w 0 file with detection for the platform.

Suggested change
base64_data=$(base64 -i "$screenshot_path" | tr -d '\n')
base64_data=$(base64 < "$screenshot_path" | tr -d '\n')

DCPMA (Owner, Author):
Fixed in 1e0bb58. Replaced macOS-specific base64 -i with portable base64 < file.

Comment on lines +106 to +135
request_json=$(python3 -c "
import json
data = {
    'model': '$model',
    'messages': [
        {
            'role': 'system',
            'content': $system_prompt
        },
        {
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': 'Analyze this Fate/Grand Order screenshot and identify the current game screen. Respond with ONLY the JSON object as specified.'
                },
                {
                    'type': 'image_url',
                    'image_url': {
                        'url': 'data:image/png;base64,$base64_data'
                    }
                }
            ]
        }
    ],
    'max_tokens': 1024,
    'temperature': 0.1
}
print(json.dumps(data))
")
Copilot AI commented Mar 20, 2026:

request_json is constructed by embedding the entire base64-encoded PNG into a python3 -c command string. For typical screenshots this can exceed shell/OS command-length limits and fail unpredictably. Consider generating the JSON via Python reading the image from disk (or piping base64 through stdin) instead of interpolating the full payload into the command line.

Suggested change
request_json=$(python3 -c "
import json
data = {
    'model': '$model',
    'messages': [
        {
            'role': 'system',
            'content': $system_prompt
        },
        {
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': 'Analyze this Fate/Grand Order screenshot and identify the current game screen. Respond with ONLY the JSON object as specified.'
                },
                {
                    'type': 'image_url',
                    'image_url': {
                        'url': 'data:image/png;base64,$base64_data'
                    }
                }
            ]
        }
    ],
    'max_tokens': 1024,
    'temperature': 0.1
}
print(json.dumps(data))
")

request_json=$(MODEL="$model" SYSTEM_PROMPT="$system_prompt" python3 - << 'PY' <<< "$base64_data"
import json
import os
import sys

model = os.environ["MODEL"]
# SYSTEM_PROMPT contains JSON for the system message content
system_prompt = json.loads(os.environ["SYSTEM_PROMPT"])
base64_data = sys.stdin.read().strip()
data = {
    "model": model,
    "messages": [
        {
            "role": "system",
            "content": system_prompt,
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this Fate/Grand Order screenshot and identify the current game screen. Respond with ONLY the JSON object as specified.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_data}",
                    },
                },
            ],
        },
    ],
    "max_tokens": 1024,
    "temperature": 0.1,
}
print(json.dumps(data))
PY
)

DCPMA (Owner, Author):
Fixed in 1e0bb58. Base64 data now written to temp file instead of passed as CLI arg.

Comment on lines +3 to +15
import assertk.assertThat
import assertk.assertions.contains
import assertk.assertions.isEqualTo
import assertk.assertions.isGreaterThan
import assertk.assertions.isInstanceOf
import assertk.assertions.isNotEmpty
import assertk.assertions.isNotNull
import com.google.gson.Gson
import com.google.gson.JsonParser
import java.lang.reflect.InvocationTargetException
import kotlin.test.Test
import kotlin.test.assertFailsWith

Copilot AI commented Mar 20, 2026:

Unused test members/imports add noise (e.g., private val gson = Gson() and several assertion imports are never referenced). Removing unused imports/variables will keep the tests easier to maintain and avoids future lint failures if a linter is introduced.

DCPMA (Owner, Author):
Fixed in 1e0bb58. Removed unused Gson field and unused imports.

Comment on lines +5 to +10
import assertk.assertions.isGreaterThan
import assertk.assertions.isNotEmpty
import assertk.assertions.isNotNull
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify
Copilot AI commented Mar 20, 2026:

There are unused MockK imports (every, mockk, verify) and some unused assertion imports in this test file. Removing them reduces noise and avoids potential lint/CI failures if stricter checks are enabled later.

Suggested change
import assertk.assertions.isGreaterThan
import assertk.assertions.isNotEmpty
import assertk.assertions.isNotNull
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify

DCPMA (Owner, Author):
Fixed in 1e0bb58. Removed unused MockK and assertion imports.

- Wrap blocking OkHttp call in withContext(Dispatchers.IO)
- Use Response.use{} for proper resource cleanup
- Make markdown fence stripping regex more robust
- Pass original exception as cause in LlmServiceException
- Fix macOS-specific base64 -i flag for cross-platform compat
- Replace inline base64 arg with temp file to avoid ARG_MAX
- Remove unused test imports and declarations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>