
[Draft] inworld tts auto mode #1008

Open
ianbbqzy wants to merge 2 commits into livekit:main from ianbbqzy:ian/inworld-auto-mode

Conversation

@ianbbqzy ianbbqzy commented Jan 29, 2026

auto_mode will be added as a config parameter in a separate PR, once the word tokenizer and user-controlled manual flushes are supported. For now, auto mode is always enabled, which should improve the quality and naturalness of the agent's responses.

Description

Changes Made

Pre-Review Checklist

  • Build passes: All builds (lint, typecheck, tests) pass locally
  • AI-generated code reviewed: Removed unnecessary comments and ensured code quality
  • Changes explained: All changes are properly documented and justified above
  • Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included
  • Video demo: A small video demo showing changes works as expected and did not break any existing functionality using Agent Playground (if applicable)

Testing

  • Automated tests added/updated (if applicable)
  • All tests pass
  • Make sure both restaurant_agent.ts and realtime_agent.ts work properly (for major changes)

Additional Notes


Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

Summary by CodeRabbit

  • Improvements
    • Enhanced text-to-speech with automatic streaming for more responsive audio synthesis.
    • Improved timing and alignment of words/characters in streamed audio for smoother, monotonic playback.
    • Better handling of stream completion to ensure continuous, correctly-timed audio output.


changeset-bot bot commented Jan 29, 2026

⚠️ No Changeset found

Latest commit: 2baf4a1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Ian Lee does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.


coderabbitai bot commented Jan 29, 2026

📝 Walkthrough


Added cumulative timestamp handling and flush semantics to the TTS synthesis stream; introduced flushCompleted?: boolean on InworldResult and autoMode?: boolean on CreateContextConfig; context creation now forces autoMode: true. Adjusted alignment timestamp offsets and generation-end tracking in the stream implementation.

Changes

Cohort / File(s): TTS implementation & types (plugins/inworld/src/tts.ts)
Summary: Added autoMode?: boolean to CreateContextConfig and flushCompleted?: boolean to InworldResult. Implemented cumulative timestamp tracking (#cumulativeTime, #generationEndTime) in SynthesizeStream, applied cumulative offsets to word/char alignments, reset cumulative time on flushCompleted, and always set autoMode: true in context creation. Added explanatory comments about monotonic timestamps and autoMode rationale.
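The cumulative-offset mechanism summarized above can be illustrated with a small, self-contained sketch. The names AlignmentTimes, CumulativeTimestampTracker, apply, and onFlushCompleted are hypothetical, as is the alignment shape (parallel start/end arrays in seconds); only the idea of a #cumulativeTime offset that takes the value of #generationEndTime on flushCompleted comes from this PR.

```typescript
// Hypothetical alignment shape; the real InworldResult fields are not shown in this thread.
interface AlignmentTimes {
  startTimes: number[]; // seconds, relative to the start of the current generation
  endTimes: number[];
}

class CumulativeTimestampTracker {
  // Offset added to every server timestamp so output stays monotonic across generations.
  #cumulativeTime = 0;
  // Largest offset-adjusted end time seen so far.
  #generationEndTime = 0;

  // Shifts one batch of word or character timestamps by the cumulative offset.
  apply(times: AlignmentTimes): AlignmentTimes {
    const startTimes = times.startTimes.map((t) => t + this.#cumulativeTime);
    const endTimes = times.endTimes.map((t) => t + this.#cumulativeTime);
    for (const end of endTimes) {
      // Math.max keeps the end time correct even if batches arrive out of order.
      this.#generationEndTime = Math.max(this.#generationEndTime, end);
    }
    return { startTimes, endTimes };
  }

  // On flushCompleted the server restarts timestamps at zero, so the next
  // generation is shifted to begin where this one ended.
  onFlushCompleted(): void {
    this.#cumulativeTime = this.#generationEndTime;
  }
}

const tracker = new CumulativeTimestampTracker();
tracker.apply({ startTimes: [0, 0.5], endTimes: [0.5, 1] }); // first generation
tracker.onFlushCompleted();
const second = tracker.apply({ startTimes: [0, 0.25], endTimes: [0.25, 0.75] });
console.log(second.startTimes.join(', ')); // 1, 1.25
console.log(second.endTimes.join(', ')); // 1.25, 1.75
```

Word and character batches can share one tracker, which matches the review note that both alignment types contribute to the generation end time.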

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hop through timestamps, neat and spry,
Offsets stacked so words don't lie,
A tiny flag — autoMode true,
Flushes tidy, streaming through,
Carrots sync up — audio by-by-by 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Description check: ⚠️ Warning
    Explanation: The PR description contains only a brief note about auto_mode without filling out the required template sections like 'Description', 'Changes Made', 'Testing', and most checklist items remain unchecked.
    Resolution: Complete the PR description by providing a clear description of changes, listing specific modifications made (interface additions, timestamp tracking logic), and documenting testing approach and checklist completion status.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed
    Explanation: The title 'inworld tts auto mode' is directly related to the main change in the changeset, which adds auto mode functionality to the Inworld TTS system for enhancing response quality.
  • Docstring Coverage: ✅ Passed
    Explanation: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52d155b and 2baf4a1.

📒 Files selected for processing (1)
  • plugins/inworld/src/tts.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

  • plugins/inworld/src/tts.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • plugins/inworld/src/tts.ts
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

  • plugins/inworld/src/tts.ts
🧠 Learnings (3)
📓 Common learnings
Learnt from: cshape
Repo: livekit/agents-js PR: 1008
File: plugins/inworld/src/tts.ts:639-641
Timestamp: 2026-02-02T23:20:23.828Z
Learning: The `autoMode` field in Inworld's WebSocket TTS API `create_context` configuration is a forward-compatible feature that will be officially released by Inworld. It is safe to include this field in the configuration as Inworld's API will silently ignore unsupported fields until the feature is available.
📚 Learning: 2026-02-02T23:20:17.980Z
Learnt from: cshape
Repo: livekit/agents-js PR: 1008
File: plugins/inworld/src/tts.ts:639-641
Timestamp: 2026-02-02T23:20:17.980Z
Learning: Include the autoMode field in the Inworld WebSocket TTS API create_context configuration in plugins/inworld/src/tts.ts as a forward-compatible option. Since the API will silently ignore unsupported fields until the feature is released, adding autoMode now is safe and prepares for future usage. Ensure you don’t rely on autoMode for current behavior and consider adding a comment indicating it's forward-compatible. If possible, add a test to verify that existing behavior remains unchanged when autoMode is not yet recognized by the API.

Applied to files:

  • plugins/inworld/src/tts.ts
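The forward-compatible pattern described in these learnings might look roughly like the sketch below, together with the kind of check the second learning suggests. Everything here is hypothetical: the trimmed CreateContextConfig type, buildCreateContextConfig, dropUnknownFields (a stand-in for a server that ignores unrecognized fields), and the voiceId/sampleRate base fields are illustrations, not the plugin's actual code or Inworld's message format.

```typescript
// Hypothetical, trimmed-down config type; the real CreateContextConfig in
// plugins/inworld/src/tts.ts has other fields not shown in this thread.
interface CreateContextConfig {
  autoMode?: boolean;
  [key: string]: unknown;
}

// Builds the create_context configuration and always opts in to auto mode.
// autoMode is forward-compatible, so current behavior must not depend on it.
function buildCreateContextConfig(base: Record<string, unknown>): CreateContextConfig {
  return { ...base, autoMode: true };
}

// Simulates a server that silently ignores fields it does not recognize yet.
function dropUnknownFields(
  config: Record<string, unknown>,
  knownKeys: readonly string[],
): Record<string, unknown> {
  return Object.fromEntries(Object.entries(config).filter(([key]) => knownKeys.includes(key)));
}

// Hypothetical base fields, for illustration only.
const base = { voiceId: 'example-voice', sampleRate: 24000 };
const sent = buildCreateContextConfig(base);
const seenByOldServer = dropUnknownFields(sent, Object.keys(base));
console.log(sent.autoMode); // true
console.log(JSON.stringify(seenByOldServer) === JSON.stringify(base)); // true
```

The comparison at the end is the property the learning asks to protect: a server that does not yet know autoMode sees exactly the configuration it saw before.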
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to examples/src/test_*.ts : For plugin component debugging (STT, TTS, LLM), create test example files prefixed with `test_` under the examples directory and run with `pnpm build && node ./examples/src/test_my_plugin.ts`

Applied to files:

  • plugins/inworld/src/tts.ts
🔇 Additional comments (5)
plugins/inworld/src/tts.ts (5)

77-77: LGTM!

The optional interface additions for autoMode and flushCompleted properly extend the existing types without breaking backward compatibility.

Also applies to: 106-106


481-486: LGTM!

The cumulative timestamp tracking fields are well-documented. The comment clearly explains the monotonic timestamp invariant and why this offset mechanism is needed when the server resets timestamps after each generation.


515-519: LGTM!

The flushCompleted handler correctly captures the generation end time as the new cumulative offset, ensuring subsequent generation timestamps continue monotonically from where the previous generation ended.


525-535: LGTM!

The cumulative offset is correctly applied to word alignment timestamps. Using Math.max to update #generationEndTime properly handles potential out-of-order timestamp arrivals within a generation.


547-557: LGTM!

Character alignment timestamp handling mirrors the word alignment logic correctly. Both contribute to tracking #generationEndTime, which is appropriate when both alignment types are present.

Optional: The word and character alignment processing blocks share similar structure. Consider extracting a helper if this pattern expands further.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@plugins/inworld/src/tts.ts`:
- Around lines 639-641: Remove the unsupported autoMode field and its comment
from the create_context message builder in the TTS WebSocket code: find the
autoMode: true property (and the preceding comment referencing auto_mode) in the
code that constructs the Inworld create_context payload and delete both the
property and the misleading comment. If you believe auto mode must be enabled,
instead add a TODO or a verification step (for example, confirming support with
Inworld), or implement sentence-tokenizer-driven flush behavior locally rather
than relying on a non-existent API flag.
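The review's fallback suggestion, driving flushes from a client-side sentence tokenizer instead of an API flag, could look roughly like this. This is a hypothetical sketch: SentenceFlushBuffer and its naive punctuation regex stand in for a real sentence tokenizer and are not LiveKit's tokenizer API, and the Inworld messages used to send text and request a flush are not shown in this thread.

```typescript
// Naive stand-in for a real sentence tokenizer: a sentence ends at ., !, or ?
// followed by whitespace. Each completed sentence would be sent, then flushed.
class SentenceFlushBuffer {
  #pending = '';

  // Appends an LLM text chunk and returns any sentences that are now complete.
  push(chunk: string): string[] {
    this.#pending += chunk;
    const ready: string[] = [];
    const boundary = /[.!?]\s+/g;
    let lastCut = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.#pending)) !== null) {
      const end = match.index + match[0].length;
      ready.push(this.#pending.slice(lastCut, end).trim());
      lastCut = end;
    }
    // Keep the unfinished tail until more text (or the end of the turn) arrives.
    this.#pending = this.#pending.slice(lastCut);
    return ready;
  }

  // Called at the end of the turn: returns leftover text even without a terminator.
  flush(): string | null {
    const rest = this.#pending.trim();
    this.#pending = '';
    return rest.length > 0 ? rest : null;
  }
}

const buf = new SentenceFlushBuffer();
const firstBatch = buf.push('Hello there. How'); // ['Hello there.']
const secondBatch = buf.push(' are you? I am'); // ['How are you?']
const remainder = buf.flush(); // 'I am'
console.log(firstBatch, secondBatch, remainder);
```

Requiring whitespace after the punctuation means a terminator at the very end of a chunk waits for the next chunk, which avoids splitting inputs such as "3.5" that stream in as "3." and "5".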
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c02ff2 and 52d155b.

📒 Files selected for processing (1)
  • plugins/inworld/src/tts.ts
🔇 Additional comments (1)
plugins/inworld/src/tts.ts (1)

68-77: LGTM: optional autoMode in CreateContextConfig is a clean, non-breaking extension.
