feat: add multimodal support to OCI Generative AI provider by fede-kamel · Pull Request #4964 · crewAIInc/crewAI

fede-kamel · 2026-03-19T19:47:51Z

Summary

vision.py: vision model lists, to_data_uri, load_image, encode_image, is_vision_model helpers
_build_generic_content handles image_url, document_url, video_url, audio_url content types → OCI SDK ImageContent, DocumentContent, VideoContent, AudioContent
_message_has_multimodal_content detects non-text payloads for Cohere rejection
Cohere models reject multimodal with clear error (text-only)
supports_multimodal() → True

Depends on #4963, #4962, #4961, #4959. Draft until all merge.
Tracking issue: #4944

Diff breakdown (vs structured output PR)

Change	Lines
`vision.py` (new file)	+62
`completion.py` (multimodal content + checks)	+58/-4
`__init__.py` (vision exports)	+15/-1
`test_oci_multimodal.py` (10 unit tests)	+195
`test_oci_integration_multimodal.py` (1 live test)	+106
Total	+440

Test plan

10 unit tests: image/document/video/audio content building, unsupported type rejection, Cohere multimodal rejection, multimodal detection, supports_multimodal, vision helpers
Live integration: sent 2x2 red PNG to google.gemini-2.5-flash via data URI — correctly identified color
All 33 prior unit tests still pass (no regressions)

Add OCI embedding support integrated with CrewAI's RAG pipeline: - OCIEmbeddingFunction: ChromaDB-compatible embedding callable with batching, config serialization, image embedding support - OCIProvider: Pydantic-based provider with alias validation for env vars and config keys - Factory registration in embeddings/factory.py + types.py - Supports text and image embeddings, output dimensions, custom endpoints, all 4 OCI auth modes Tested live against cohere.embed-english-v3.0 with API_KEY auth. Depends on: crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959 Tracking issue: crewAIInc#4944

Replace asyncio.to_thread wrappers with true async I/O using aiohttp for acall() and astream(). The OCI SDK is sync-only, so we bypass it for HTTP and use its signer for request authentication directly. - oci_async.py: OCIAsyncClient with aiohttp, OCI request signing, native SSE parsing, connection pooling - acall(): true async chat completion (no thread pool) - astream(): true async SSE streaming (no thread+queue bridge) - Graceful fallback to asyncio.to_thread when aiohttp unavailable or client is mocked (unit tests) - aiohttp + certifi added to crewai[oci] optional deps Temporary measure until OCI SDK ships native async support. Tested live: acall, astream, and concurrent acall against meta.llama-3.3-70b-instruct with API_KEY auth. Depends on: crewAIInc#4966, crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959 Tracking issue: crewAIInc#4944

fede-kamel · 2026-04-01T15:36:22Z

OCI GenAI PR series (tracking issue: #4944)

feat: add OCI Generative AI provider — basic text completion #4959 — Basic text completion
feat: add streaming support to OCI Generative AI provider #4961 — Streaming
feat: add tool calling support to OCI Generative AI provider #4962 — Tool calling
feat: add structured output support to OCI Generative AI provider #4963 — Structured output
feat: add multimodal support to OCI Generative AI provider #4964 — Multimodal ← this PR
feat: add OCI Generative AI embeddings provider #4966 — Embeddings
feat: add true async support to OCI provider via aiohttp #4982 — True async (aiohttp)

Depends on #4963.

Add native OCI Generative AI support to CrewAI with basic text completion for generic (Meta, Google, OpenAI, xAI) and Cohere model families. This is the first in a series of PRs to incrementally build out full OCI support (streaming, tool calling, structured output, embeddings, and multimodal in follow-up PRs). Tracking issue: crewAIInc#4944 Supersedes: crewAIInc#4885

Tool calling is not implemented in this PR. Returning True would cause CrewAI to choose the native tools path, silently dropping tools from agents. Flagged by Cursor Bugbot review.

Both methods are unnecessary in this PR. The base class and callers already default correctly when the methods are absent: - supports_function_calling: callers use getattr with False default - supports_stop_words: base class already returns True These will be added back in the tool calling follow-up PR.

Remove json, re imports and _OCI_SCHEMA_NAME_PATTERN regex that are only needed for structured output (not in this PR scope).

Use model_lower instead of model in the dot check to match the convention used by all other providers in _matches_provider_pattern. Flagged by Cursor Bugbot.

Add streaming text completion via OCI SSE events: - stream=True in call() routes to _stream_call_impl with chunk events - iter_stream() yields raw text chunks (sync generator) - astream() wraps iter_stream via thread+queue for async callers - _stream_chat_events holds client lock for full stream duration - SSE event parsing handles both string and mapping payloads Tested live against meta.llama-3.3-70b-instruct, cohere.command-r-plus-08-2024, google.gemini-2.5-flash, and openai.gpt-5.2-chat-latest. Depends on: crewAIInc#4959 Tracking issue: crewAIInc#4944

Add native function calling for generic and Cohere model families: - _format_tools converts CrewAI tool specs to OCI SDK format - _extract_tool_calls normalizes responses back to CrewAI shape - _handle_tool_calls executes tools and recurses until model finishes - Cohere tool message handling with trailing tool results - Tool choice control (auto/none/required/function) - Passthrough parameter filtering via SDK introspection - Streaming tool call accumulation from SSE fragments - supports_function_calling() returns True Tested live against meta.llama-3.3-70b-instruct with raw tool call return and recursive tool execution. Depends on: crewAIInc#4961 (streaming), crewAIInc#4959 (basic text) Tracking issue: crewAIInc#4944

Add response_model (Pydantic) support for structured output: - _build_response_format converts Pydantic schema to OCI JsonSchemaResponseFormat (generic) or CohereResponseJsonFormat - _parse_structured_response validates and returns typed models - response_model threaded through call, _call_impl, _stream_call_impl, and _handle_tool_calls for full coverage - Handles JSON in markdown fences via base class _validate_structured_output Tested live against meta.llama-3.3-70b-instruct and google.gemini-2.5-flash. Depends on: crewAIInc#4962 (tool calling), crewAIInc#4961 (streaming), crewAIInc#4959 (basic text) Tracking issue: crewAIInc#4944

Add multimodal content handling for generic model families: - vision.py: model lists, data URI helpers, image encoding utilities - _build_generic_content handles image_url, document_url, video_url, audio_url content types mapped to OCI SDK content objects - _message_has_multimodal_content detects non-text payloads - Cohere models reject multimodal with clear error message - supports_multimodal() returns True Depends on: crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959 Tracking issue: crewAIInc#4944

Send a 2x2 red PNG to google.gemini-2.5-flash via data URI and verify it identifies the color. Tests the full image_url content pipeline end-to-end against a live OCI vision model.

Add OCI embedding support integrated with CrewAI's RAG pipeline: - OCIEmbeddingFunction: ChromaDB-compatible embedding callable with batching, config serialization, image embedding support - OCIProvider: Pydantic-based provider with alias validation for env vars and config keys - Factory registration in embeddings/factory.py + types.py - Supports text and image embeddings, output dimensions, custom endpoints, all 4 OCI auth modes Tested live against cohere.embed-english-v3.0 with API_KEY auth. Depends on: crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959 Tracking issue: crewAIInc#4944

Replace asyncio.to_thread wrappers with true async I/O using aiohttp for acall() and astream(). The OCI SDK is sync-only, so we bypass it for HTTP and use its signer for request authentication directly. - oci_async.py: OCIAsyncClient with aiohttp, OCI request signing, native SSE parsing, connection pooling - acall(): true async chat completion (no thread pool) - astream(): true async SSE streaming (no thread+queue bridge) - Graceful fallback to asyncio.to_thread when aiohttp unavailable or client is mocked (unit tests) - aiohttp + certifi added to crewai[oci] optional deps Temporary measure until OCI SDK ships native async support. Tested live: acall, astream, and concurrent acall against meta.llama-3.3-70b-instruct with API_KEY auth. Depends on: crewAIInc#4966, crewAIInc#4964, crewAIInc#4963, crewAIInc#4962, crewAIInc#4961, crewAIInc#4959 Tracking issue: crewAIInc#4944

fede-kamel mentioned this pull request Mar 19, 2026

feat: add OCI Generative AI embeddings provider #4966

Draft

3 tasks

fede-kamel mentioned this pull request Mar 20, 2026

feat: add true async support to OCI provider via aiohttp #4982

Draft

2 tasks

fede-kamel force-pushed the feat/oci-multimodal branch from 3fab3ad to 45617b1 Compare March 29, 2026 16:16

fede-kamel added 10 commits April 24, 2026 13:11

fix: return False from supports_function_calling until tool PR

1d8f8bc

Tool calling is not implemented in this PR. Returning True would cause CrewAI to choose the native tools path, silently dropping tools from agents. Flagged by Cursor Bugbot review.

cleanup: remove unused imports and dead code

2187120

Remove json, re imports and _OCI_SCHEMA_NAME_PATTERN regex that are only needed for structured output (not in this PR scope).

fix: use model_lower consistently in OCI pattern check

3c62aab

Use model_lower instead of model in the dot check to match the convention used by all other providers in _matches_provider_pattern. Flagged by Cursor Bugbot.

test: add live multimodal integration test

52d905c

Send a 2x2 red PNG to google.gemini-2.5-flash via data URI and verify it identifies the color. Tests the full image_url content pipeline end-to-end against a live OCI vision model.

fede-kamel force-pushed the feat/oci-multimodal branch from 45617b1 to 52d905c Compare April 24, 2026 17:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add multimodal support to OCI Generative AI provider#4964

feat: add multimodal support to OCI Generative AI provider#4964
fede-kamel wants to merge 10 commits intocrewAIInc:mainfrom
fede-kamel:feat/oci-multimodal

fede-kamel commented Mar 19, 2026 •

edited

Loading

Uh oh!

fede-kamel commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fede-kamel commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Diff breakdown (vs structured output PR)

Test plan

Uh oh!

fede-kamel commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fede-kamel commented Mar 19, 2026 •

edited

Loading