Skip to content

feat: add multimodal support to OCI Generative AI provider#4964

Draft
fede-kamel wants to merge 10 commits intocrewAIInc:mainfrom
fede-kamel:feat/oci-multimodal
Draft

feat: add multimodal support to OCI Generative AI provider#4964
fede-kamel wants to merge 10 commits intocrewAIInc:mainfrom
fede-kamel:feat/oci-multimodal

Conversation

@fede-kamel
Copy link
Copy Markdown

@fede-kamel fede-kamel commented Mar 19, 2026

Summary

  • vision.py: vision model lists, to_data_uri, load_image, encode_image, is_vision_model helpers
  • _build_generic_content handles image_url, document_url, video_url, audio_url content types → OCI SDK ImageContent, DocumentContent, VideoContent, AudioContent
  • _message_has_multimodal_content detects non-text payloads for Cohere rejection
  • Cohere models reject multimodal with clear error (text-only)
  • supports_multimodal()True

Depends on #4963, #4962, #4961, #4959. Draft until all merge.
Tracking issue: #4944

Diff breakdown (vs structured output PR)

Change Lines
vision.py (new file) +62
completion.py (multimodal content + checks) +58/-4
__init__.py (vision exports) +15/-1
test_oci_multimodal.py (10 unit tests) +195
test_oci_integration_multimodal.py (1 live test) +106
Total +440

Test plan

  • 10 unit tests: image/document/video/audio content building, unsupported type rejection, Cohere multimodal rejection, multimodal detection, supports_multimodal, vision helpers
  • Live integration: sent 2x2 red PNG to google.gemini-2.5-flash via data URI — correctly identified color
  • All 33 prior unit tests still pass (no regressions)

@fede-kamel fede-kamel force-pushed the feat/oci-multimodal branch from 3fab3ad to 45617b1 Compare March 29, 2026 16:16
fede-kamel added a commit to fede-kamel/crewAI that referenced this pull request Mar 29, 2026
Add OCI embedding support integrated with CrewAI's RAG pipeline:
- OCIEmbeddingFunction: ChromaDB-compatible embedding callable with
  batching, config serialization, image embedding support
- OCIProvider: Pydantic-based provider with alias validation for
  env vars and config keys
- Factory registration in embeddings/factory.py + types.py
- Supports text and image embeddings, output dimensions,
  custom endpoints, all 4 OCI auth modes

Tested live against cohere.embed-english-v3.0 with API_KEY auth.

Depends on: crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959
Tracking issue: crewAIInc#4944
fede-kamel added a commit to fede-kamel/crewAI that referenced this pull request Mar 29, 2026
Add OCI embedding support integrated with CrewAI's RAG pipeline:
- OCIEmbeddingFunction: ChromaDB-compatible embedding callable with
  batching, config serialization, image embedding support
- OCIProvider: Pydantic-based provider with alias validation for
  env vars and config keys
- Factory registration in embeddings/factory.py + types.py
- Supports text and image embeddings, output dimensions,
  custom endpoints, all 4 OCI auth modes

Tested live against cohere.embed-english-v3.0 with API_KEY auth.

Depends on: crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959
Tracking issue: crewAIInc#4944
fede-kamel added a commit to fede-kamel/crewAI that referenced this pull request Mar 29, 2026
Replace asyncio.to_thread wrappers with true async I/O using aiohttp
for acall() and astream(). The OCI SDK is sync-only, so we bypass it
for HTTP and use its signer for request authentication directly.

- oci_async.py: OCIAsyncClient with aiohttp, OCI request signing,
  native SSE parsing, connection pooling
- acall(): true async chat completion (no thread pool)
- astream(): true async SSE streaming (no thread+queue bridge)
- Graceful fallback to asyncio.to_thread when aiohttp unavailable
  or client is mocked (unit tests)
- aiohttp + certifi added to crewai[oci] optional deps

Temporary measure until OCI SDK ships native async support.

Tested live: acall, astream, and concurrent acall against
meta.llama-3.3-70b-instruct with API_KEY auth.

Depends on: crewAIInc#4966, crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959
Tracking issue: crewAIInc#4944
Add native OCI Generative AI support to CrewAI with basic text
completion for generic (Meta, Google, OpenAI, xAI) and Cohere model
families. This is the first in a series of PRs to incrementally
build out full OCI support (streaming, tool calling, structured
output, embeddings, and multimodal in follow-up PRs).

Tracking issue: crewAIInc#4944
Supersedes: crewAIInc#4885
Tool calling is not implemented in this PR. Returning True would
cause CrewAI to choose the native tools path, silently dropping
tools from agents. Flagged by Cursor Bugbot review.
Both methods are unnecessary in this PR. The base class and callers
already default correctly when the methods are absent:
- supports_function_calling: callers use getattr with False default
- supports_stop_words: base class already returns True

These will be added back in the tool calling follow-up PR.
Remove json, re imports and _OCI_SCHEMA_NAME_PATTERN regex that
are only needed for structured output (not in this PR scope).
Use model_lower instead of model in the dot check to match the
convention used by all other providers in _matches_provider_pattern.
Flagged by Cursor Bugbot.
Add streaming text completion via OCI SSE events:
- stream=True in call() routes to _stream_call_impl with chunk events
- iter_stream() yields raw text chunks (sync generator)
- astream() wraps iter_stream via thread+queue for async callers
- _stream_chat_events holds client lock for full stream duration
- SSE event parsing handles both string and mapping payloads

Tested live against meta.llama-3.3-70b-instruct,
cohere.command-r-plus-08-2024, google.gemini-2.5-flash,
and openai.gpt-5.2-chat-latest.

Depends on: crewAIInc#4959
Tracking issue: crewAIInc#4944
Add native function calling for generic and Cohere model families:
- _format_tools converts CrewAI tool specs to OCI SDK format
- _extract_tool_calls normalizes responses back to CrewAI shape
- _handle_tool_calls executes tools and recurses until model finishes
- Cohere tool message handling with trailing tool results
- Tool choice control (auto/none/required/function)
- Passthrough parameter filtering via SDK introspection
- Streaming tool call accumulation from SSE fragments
- supports_function_calling() returns True

Tested live against meta.llama-3.3-70b-instruct with
raw tool call return and recursive tool execution.

Depends on: crewAIInc#4961 (streaming), crewAIInc#4959 (basic text)
Tracking issue: crewAIInc#4944
Add response_model (Pydantic) support for structured output:
- _build_response_format converts Pydantic schema to OCI
  JsonSchemaResponseFormat (generic) or CohereResponseJsonFormat
- _parse_structured_response validates and returns typed models
- response_model threaded through call, _call_impl, _stream_call_impl,
  and _handle_tool_calls for full coverage
- Handles JSON in markdown fences via base class _validate_structured_output

Tested live against meta.llama-3.3-70b-instruct and
google.gemini-2.5-flash.

Depends on: crewAIInc#4962 (tool calling), crewAIInc#4961 (streaming), crewAIInc#4959 (basic text)
Tracking issue: crewAIInc#4944
Add multimodal content handling for generic model families:
- vision.py: model lists, data URI helpers, image encoding utilities
- _build_generic_content handles image_url, document_url, video_url,
  audio_url content types mapped to OCI SDK content objects
- _message_has_multimodal_content detects non-text payloads
- Cohere models reject multimodal with clear error message
- supports_multimodal() returns True

Depends on: crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959
Tracking issue: crewAIInc#4944
Send a 2x2 red PNG to google.gemini-2.5-flash via data URI and
verify it identifies the color. Tests the full image_url content
pipeline end-to-end against a live OCI vision model.
@fede-kamel fede-kamel force-pushed the feat/oci-multimodal branch from 45617b1 to 52d905c Compare April 24, 2026 17:11
fede-kamel added a commit to fede-kamel/crewAI that referenced this pull request Apr 24, 2026
Add OCI embedding support integrated with CrewAI's RAG pipeline:
- OCIEmbeddingFunction: ChromaDB-compatible embedding callable with
  batching, config serialization, image embedding support
- OCIProvider: Pydantic-based provider with alias validation for
  env vars and config keys
- Factory registration in embeddings/factory.py + types.py
- Supports text and image embeddings, output dimensions,
  custom endpoints, all 4 OCI auth modes

Tested live against cohere.embed-english-v3.0 with API_KEY auth.

Depends on: crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959
Tracking issue: crewAIInc#4944
fede-kamel added a commit to fede-kamel/crewAI that referenced this pull request Apr 24, 2026
Add OCI embedding support integrated with CrewAI's RAG pipeline:
- OCIEmbeddingFunction: ChromaDB-compatible embedding callable with
  batching, config serialization, image embedding support
- OCIProvider: Pydantic-based provider with alias validation for
  env vars and config keys
- Factory registration in embeddings/factory.py + types.py
- Supports text and image embeddings, output dimensions,
  custom endpoints, all 4 OCI auth modes

Tested live against cohere.embed-english-v3.0 with API_KEY auth.

Depends on: crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959
Tracking issue: crewAIInc#4944
fede-kamel added a commit to fede-kamel/crewAI that referenced this pull request Apr 24, 2026
Replace asyncio.to_thread wrappers with true async I/O using aiohttp
for acall() and astream(). The OCI SDK is sync-only, so we bypass it
for HTTP and use its signer for request authentication directly.

- oci_async.py: OCIAsyncClient with aiohttp, OCI request signing,
  native SSE parsing, connection pooling
- acall(): true async chat completion (no thread pool)
- astream(): true async SSE streaming (no thread+queue bridge)
- Graceful fallback to asyncio.to_thread when aiohttp unavailable
  or client is mocked (unit tests)
- aiohttp + certifi added to crewai[oci] optional deps

Temporary measure until OCI SDK ships native async support.

Tested live: acall, astream, and concurrent acall against
meta.llama-3.3-70b-instruct with API_KEY auth.

Depends on: crewAIInc#4966, crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959
Tracking issue: crewAIInc#4944
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant