
Bedrock Imported Models (Deepseek, Deepseek R1, Qwen, OpenAI-compatible models)

Deepseek R1

This is a separate route, as the chat template is different.

Provider Route: bedrock/deepseek_r1/{model_arn}
Provider Documentation: Bedrock Imported Models, Deepseek Bedrock Imported Model

from litellm import completion
import os

response = completion(
    model="bedrock/deepseek_r1/arn:aws:bedrock:us-east-1:086734376398:imported-model/r4c4kewx2s0n",  # bedrock/deepseek_r1/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
)
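
These routes also work behind the LiteLLM proxy, using the same config format shown in the OpenAI-compatible section below. A minimal sketch for the R1 route (the model_name is a placeholder):

model_list:
  - model_name: deepseek-r1-imported # hypothetical alias
    litellm_params:
      model: bedrock/deepseek_r1/arn:aws:bedrock:us-east-1:086734376398:imported-model/r4c4kewx2s0n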

Deepseek (not R1)

Provider Route: bedrock/llama/{model_arn}
Provider Documentation: Bedrock Imported Models, Deepseek Bedrock Imported Model

Use this route to call Bedrock Imported Models that follow the Llama Invoke request/response spec.

from litellm import completion
import os

response = completion(
    model="bedrock/llama/arn:aws:bedrock:us-east-1:086734376398:imported-model/r4c4kewx2s0n",  # bedrock/llama/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
)

Qwen3 Imported Models

Provider Route: bedrock/qwen3/{model_arn}
Provider Documentation: Bedrock Imported Models, Qwen3 Models

from litellm import completion
import os

response = completion(
    model="bedrock/qwen3/arn:aws:bedrock:us-east-1:086734376398:imported-model/your-qwen3-model",  # bedrock/qwen3/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
    max_tokens=100,
    temperature=0.7
)

Qwen2 Imported Models

Provider Route: bedrock/qwen2/{model_arn}
Provider Documentation: Bedrock Imported Models

Note: The Qwen2 and Qwen3 architectures are largely similar; the main difference is the response format. Qwen2 returns its output in a "text" field, while Qwen3 uses a "generation" field (see the sketch after the example below).

from litellm import completion
import os

response = completion(
    model="bedrock/qwen2/arn:aws:bedrock:us-east-1:086734376398:imported-model/your-qwen2-model",  # bedrock/qwen2/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
    max_tokens=100,
    temperature=0.7
)
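
To illustrate the note above, the raw Invoke response bodies differ roughly as follows. The field names come from the note; the surrounding structure is assumed, not captured from a live call. LiteLLM normalizes both into the same OpenAI-style response:

# Hypothetical raw Bedrock Invoke response bodies (illustrative only)
qwen2_raw = {"text": "Why did the chicken cross the road? ..."}        # Qwen2: "text" field
qwen3_raw = {"generation": "Why did the chicken cross the road? ..."}  # Qwen3: "generation" field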

OpenAI-Compatible Imported Models (Qwen 2.5 VL, etc.)

Use this route for Bedrock imported models that follow the OpenAI Chat Completions API spec. This includes models like Qwen 2.5 VL that accept OpenAI-formatted messages with support for vision (images), tool calling, and other OpenAI features.

Provider Route: bedrock/openai/{model_arn}
Provider Documentation: Bedrock Imported Models
Supported Features: Vision (images), tool calling, streaming, system messages

LiteLLM SDK Usage

Basic Usage

from litellm import completion

response = completion(
    model="bedrock/openai/arn:aws:bedrock:us-east-1:046319184608:imported-model/0m2lasirsp6z",  # bedrock/openai/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
    max_tokens=300,
    temperature=0.5
)

With Vision (Images)

import base64
from litellm import completion

# Load and encode image
with open("image.jpg", "rb") as f:
image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = completion(
model="bedrock/openai/arn:aws:bedrock:us-east-1:046319184608:imported-model/0m2lasirsp6z",
messages=[
{
"role": "system",
"content": "You are a helpful assistant that can analyze images."
},
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_base64}"}
}
]
}
],
max_tokens=300,
temperature=0.5
)

Comparing Multiple Images

import base64
from litellm import completion

# Load images
with open("image1.jpg", "rb") as f:
image1_base64 = base64.b64encode(f.read()).decode("utf-8")
with open("image2.jpg", "rb") as f:
image2_base64 = base64.b64encode(f.read()).decode("utf-8")

response = completion(
model="bedrock/openai/arn:aws:bedrock:us-east-1:046319184608:imported-model/0m2lasirsp6z",
messages=[
{
"role": "system",
"content": "You are a helpful assistant that can analyze images."
},
{
"role": "user",
"content": [
{"type": "text", "text": "Spot the difference between these two images?"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image1_base64}"}
},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image2_base64}"}
}
]
}
],
max_tokens=300,
temperature=0.5
)
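
With Tool Calling

The feature table above also lists tool calling, which isn't demonstrated elsewhere in this section. A minimal sketch, reusing the example ARN from above; the get_weather tool and its schema are illustrative, not from the source:

from litellm import completion

response = completion(
    model="bedrock/openai/arn:aws:bedrock:us-east-1:046319184608:imported-model/0m2lasirsp6z",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Get the current weather in a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "The city name"}
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    max_tokens=300
)

if response.choices[0].message.tool_calls:
    print(response.choices[0].message.tool_calls[0].function.name)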

LiteLLM Proxy Usage (AI Gateway)

1. Add to config

model_list:
  - model_name: qwen-25vl-72b
    litellm_params:
      model: bedrock/openai/arn:aws:bedrock:us-east-1:046319184608:imported-model/0m2lasirsp6z

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

Basic text request:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-25vl-72b",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
    "max_tokens": 300
}'

With vision (image):

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-25vl-72b",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant that can analyze images."
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSkZ..."}
                }
            ]
        }
    ],
    "max_tokens": 300,
    "temperature": 0.5
}'
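
The proxy also speaks the OpenAI API, so any OpenAI client works against it. A sketch using the OpenAI Python SDK, assuming the proxy is running locally with the sk-1234 key from the curl examples:

from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy
client = OpenAI(base_url="http://0.0.0.0:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="qwen-25vl-72b",
    messages=[{"role": "user", "content": "what llm are you"}],
    max_tokens=300,
)
print(response.choices[0].message.content)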

Moonshot Kimi K2 Thinking

Moonshot AI's Kimi K2 Thinking model is available on Amazon Bedrock. It is a reasoning model; LiteLLM automatically extracts its reasoning output and returns it as reasoning_content.

Provider Route: bedrock/moonshot.kimi-k2-thinking, bedrock/invoke/moonshot.kimi-k2-thinking
Provider Documentation: AWS Bedrock Moonshot Announcement
Supported Parameters: temperature, max_tokens, top_p, stream, tools, tool_choice
Special Features: Reasoning content extraction, tool calling

Supported Features

  • Reasoning Content Extraction: Automatically extracts <reasoning> tags and returns them as reasoning_content (similar to OpenAI's o1 models)
  • Tool Calling: Full support for function/tool calling with tool responses
  • Streaming: Both streaming and non-streaming responses
  • System Messages: Supported

Basic Usage

Moonshot Kimi K2 SDK Usage
from litellm import completion
import os

os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-aws-secret-key"
os.environ["AWS_REGION_NAME"] = "us-west-2" # or your preferred region

# Basic completion
response = completion(
    model="bedrock/moonshot.kimi-k2-thinking",  # or bedrock/invoke/moonshot.kimi-k2-thinking
    messages=[
        {"role": "user", "content": "What is 2+2? Think step by step."}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)

# Access reasoning content if present (getattr guards against a missing attribute)
if getattr(response.choices[0].message, "reasoning_content", None):
    print("Reasoning:", response.choices[0].message.reasoning_content)

Tool Calling Example

Kimi K2 with Tool Calling
from litellm import completion
import os

os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-aws-secret-key"
os.environ["AWS_REGION_NAME"] = "us-west-2"

# Tool calling example
response = completion(
    model="bedrock/moonshot.kimi-k2-thinking",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather in a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city name"
                        }
                    },
                    "required": ["location"]
                }
            }
        }
    ]
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Tool called: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")

Streaming Example

Kimi K2 Streaming
from litellm import completion
import os

os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-aws-secret-key"
os.environ["AWS_REGION_NAME"] = "us-west-2"

response = completion(
    model="bedrock/moonshot.kimi-k2-thinking",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    stream=True,
    temperature=0.7
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

    # Check for reasoning content in streaming
    if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
        print(f"\n[Reasoning: {chunk.choices[0].delta.reasoning_content}]")

Supported Parameters

Parameter   | Type          | Description                   | Supported
temperature | float (0-1)   | Controls randomness in output | ✅
max_tokens  | integer       | Maximum tokens to generate    | ✅
top_p       | float         | Nucleus sampling parameter    | ✅
stream      | boolean       | Enable streaming responses    | ✅
tools       | array         | Tool/function definitions     | ✅
tool_choice | string/object | Tool choice specification     | ✅
stop        | array         | Stop sequences                | ❌ (not supported on Bedrock)
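
Since stop isn't supported, one option is LiteLLM's drop_params flag, which drops provider-unsupported parameters instead of raising an error. A sketch (drop_params is a general LiteLLM option, not specific to this model):

from litellm import completion

# drop_params tells LiteLLM to silently drop provider-unsupported params
# (like `stop` here) rather than error out.
response = completion(
    model="bedrock/moonshot.kimi-k2-thinking",
    messages=[{"role": "user", "content": "Count to five."}],
    stop=["\n\n"],   # not supported on Bedrock; dropped instead of erroring
    drop_params=True,
)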