Both the vLLM and OpenAI documentation describe vLLM's support for the Responses API. However, I already hit an error when connecting a client to my RunPod serverless endpoint, because the worker does not support /v1/responses. See the documentation excerpts below:
Source: https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
Fragment:
Usage
Once the vllm serve runs and INFO: Application startup complete has been displayed, you can send requests using HTTP request or OpenAI SDK to the following endpoints:
/v1/responses endpoint can perform tool use (browsing, python, mcp) in between chain-of-thought and deliver a final response. This endpoint leverages the openai-harmony library for input rendering and output parsing. Stateful operation and full streaming API are work in progress. Responses API is recommended by OpenAI as the way to interact with this model.
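For context, this is the shape of request my client was sending to the endpoint described above. A minimal sketch, assuming a `vllm serve` instance at the hypothetical base URL `http://localhost:8000` and the placeholder model name `openai/gpt-oss-20b` (the request is built but not sent here):

```python
import json
import urllib.request


def build_responses_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /v1/responses endpoint."""
    url = base_url.rstrip("/") + "/v1/responses"
    # The Responses API takes the prompt under an `input` field.
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_responses_request("http://localhost:8000", "openai/gpt-oss-20b", "Hello")
print(req.full_url)  # → http://localhost:8000/v1/responses
```

Against a local `vllm serve` this path works as the docs describe; against the RunPod serverless worker the same request fails, which is the issue reported here.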
Source: https://cookbook.openai.com/articles/gpt-oss/run-vllm
Fragment:
Create a model response
post https://api.openai.com/v1/responses
Creates a model response. Provide text or image inputs to generate text or JSON outputs. Have the model call your own custom code or use built-in tools like web search or file search to use your own data as input for the model's response.
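Until the worker supports /v1/responses, one workaround is to translate Responses-style calls into the /v1/chat/completions shape that OpenAI-compatible workers typically do expose. A minimal sketch of that mapping; the field correspondence here (plain string `input` to a user message, `instructions` to a system message) is my assumption and covers only the simplest case, not the full Responses API:

```python
from typing import Optional


def responses_to_chat_payload(model: str, input_text: str,
                              instructions: Optional[str] = None) -> dict:
    """Map a minimal Responses-style call onto a Chat Completions payload."""
    messages = []
    if instructions:
        # Responses-style `instructions` roughly corresponds to a system message.
        messages.append({"role": "system", "content": instructions})
    messages.append({"role": "user", "content": input_text})
    return {"model": model, "messages": messages}


payload = responses_to_chat_payload("openai/gpt-oss-20b", "Hello", "Be brief.")
# POST this payload to <base_url>/v1/chat/completions instead of /v1/responses.
```

Note this loses the Responses-specific features the vLLM docs mention (built-in tool use in between chain-of-thought), so it is only a stopgap.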