/embeddings
Quick Start
from litellm import embedding
import os
os.environ['OPENAI_API_KEY'] = ""
response = embedding(model='text-embedding-ada-002', input=["good morning from litellm"])
Proxy Usage
NOTE: For vertex_ai, export your service account credentials:
export GOOGLE_APPLICATION_CREDENTIALS="absolute/path/to/service_account.json"
Add model to config
model_list:
- model_name: textembedding-gecko
  litellm_params:
    model: vertex_ai/textembedding-gecko
general_settings:
  master_key: sk-1234
Start proxy
litellm --config /path/to/config.yaml 
# RUNNING on http://0.0.0.0:4000
Test
- Curl
curl --location 'http://0.0.0.0:4000/embeddings' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"input": ["Academia.edu uses"], "model": "textembedding-gecko", "encoding_format": "base64"}'
- OpenAI (python)
from openai import OpenAI
client = OpenAI(
  api_key="sk-1234",
  base_url="http://0.0.0.0:4000"
)
client.embeddings.create(
  model="textembedding-gecko",
  input="The food was delicious and the waiter...",
  encoding_format="float"
)
- Langchain Embeddings
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="textembedding-gecko", openai_api_base="http://0.0.0.0:4000", openai_api_key="sk-1234")
text = "This is a test document."
query_result = embeddings.embed_query(text)
print("VERTEX AI EMBEDDINGS")
print(query_result[:5])
Image Embeddings
For models that support image embeddings, you can pass in a base64 encoded image string to the input param.
- SDK
from litellm import embedding
import os
# set your api key
os.environ["COHERE_API_KEY"] = ""
response = embedding(model="cohere/embed-english-v3.0", input=["<base64 encoded image>"])
- PROXY
  - Setup config.yaml
model_list:
  - model_name: cohere-embed
    litellm_params:
      model: cohere/embed-english-v3.0
      api_key: os.environ/COHERE_API_KEY
  - Start proxy
litellm --config /path/to/config.yaml 
# RUNNING on http://0.0.0.0:4000
  - Test it!
curl -X POST 'http://0.0.0.0:4000/v1/embeddings' \
-H 'Authorization: Bearer sk-54d77cd67b9febbb' \
-H 'Content-Type: application/json' \
-d '{
  "model": "cohere-embed",
  "input": ["<base64 encoded image>"]
}'
Input Params for litellm.embedding()
Any non-OpenAI params are treated as provider-specific params and sent in the request body as kwargs to the provider.
Required Fields
- model: string - ID of the model to use. Example: model='text-embedding-ada-002'
- input: string or array - Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays. The input must not exceed the max input tokens for the model (8192 tokens for text-embedding-ada-002), cannot be an empty string, and any array must be 2048 dimensions or less. Example: input=["good morning from litellm"]
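Long input lists have to be split across several requests to stay within the array limit. A minimal sketch of that batching, reading the 2048 limit above as a cap on the number of array items (an assumption, and other providers may use a different cap). The helper name `chunk_inputs` is hypothetical and not part of LiteLLM:

```python
from typing import List

MAX_ITEMS_PER_REQUEST = 2048  # assumption: the array limit quoted above, read as item count


def chunk_inputs(texts: List[str], size: int = MAX_ITEMS_PER_REQUEST) -> List[List[str]]:
    """Split `texts` into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if any(text == "" for text in texts):
        raise ValueError("input must not contain empty strings")
    return [texts[start:start + size] for start in range(0, len(texts), size)]


batches = chunk_inputs([f"document {i}" for i in range(5000)])
print([len(batch) for batch in batches])  # [2048, 2048, 904]
```

Each batch can then be passed as `input=` to a separate `embedding()` call.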
Optional LiteLLM Fields
- user: string (optional) - A unique identifier representing your end-user.
- dimensions: integer (optional) - The number of dimensions the resulting output embeddings should have. Only supported in OpenAI/Azure text-embedding-3 and later models.
- encoding_format: string (optional) - The format to return the embeddings in. Can be either "float" or "base64". Defaults to encoding_format="float".
- timeout: integer (optional) - The maximum time, in seconds, to wait for the API to respond. Defaults to 600 seconds (10 minutes).
- api_base: string (optional) - The API endpoint you want to call the model with.
- api_version: string (optional) - (Azure-specific) The API version for the call.
- api_key: string (optional) - The API key to authenticate and authorize requests. If not provided, the default API key is used.
- api_type: string (optional) - The type of API to use.
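When `encoding_format="base64"` is requested, each embedding comes back as a string rather than a list of floats. The sketch below shows one way to decode it, assuming the payload is a packed array of little-endian float32 values (the layout OpenAI uses; other providers may differ). The function name `decode_base64_embedding` is hypothetical:

```python
import base64
import struct
from typing import List


def decode_base64_embedding(encoded: str) -> List[float]:
    """Decode a base64 string of packed little-endian float32 values."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check with values that are exactly representable in float32.
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.25, 2.0)).decode("ascii")
print(decode_base64_embedding(sample))  # [0.5, -1.25, 2.0]
```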
Output from litellm.embedding()
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.0022326677571982145,
        0.010749882087111473,
        ...
        ...
        ...
   
      ]
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}
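To use the result, the vectors usually need to be pulled out of `data` in the same order as the inputs. A small sketch, assuming the response is available as a plain dict in the shape shown above; the helper name `vectors_in_input_order` and the toy values are illustrative only:

```python
from typing import Any, Dict, List


def vectors_in_input_order(response: Dict[str, Any]) -> List[List[float]]:
    """Return the embedding vectors sorted by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Toy response in the documented shape; the numbers are made up for illustration.
example = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "text-embedding-ada-002-v2",
    "usage": {"prompt_tokens": 10, "total_tokens": 10},
}
print(vectors_in_input_order(example))  # [[0.1, 0.2], [0.3, 0.4]]
```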
OpenAI Embedding Models
Usage
from litellm import embedding
import os
os.environ['OPENAI_API_KEY'] = ""
response = embedding(
    model="text-embedding-3-small",
    input=["good morning from litellm", "this is another item"],
    metadata={"anything": "good day"},
    dimensions=5 # Only supported in text-embedding-3 and later models.
)
| Model Name | Function Call | Required OS Variables | 
|---|---|---|
| text-embedding-3-small | embedding('text-embedding-3-small', input) | os.environ['OPENAI_API_KEY'] | 
| text-embedding-3-large | embedding('text-embedding-3-large', input) | os.environ['OPENAI_API_KEY'] | 
| text-embedding-ada-002 | embedding('text-embedding-ada-002', input) | os.environ['OPENAI_API_KEY'] | 
OpenAI Compatible Embedding Models
Use this for calling /embedding endpoints on OpenAI-compatible servers, for example https://github.com/xorbitsai/inference
Note: add the openai/ prefix to the model name so LiteLLM knows to route the request to OpenAI.
Usage
from litellm import embedding
response = embedding(
  model = "openai/<your-llm-name>",     # add `openai/` prefix to model so litellm knows to route to OpenAI
  api_base="http://0.0.0.0:4000/",      # set API Base of your Custom OpenAI Endpoint
  input=["good morning from litellm"]
)
Bedrock Embedding
API keys
These can be set as environment variables or passed as params to litellm.embedding()
import os
os.environ["AWS_ACCESS_KEY_ID"] = ""  # Access key
os.environ["AWS_SECRET_ACCESS_KEY"] = "" # Secret access key
os.environ["AWS_REGION_NAME"] = "" # us-east-1, us-east-2, us-west-1, us-west-2
Usage
from litellm import embedding
response = embedding(
    model="amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
)
print(response)
| Model Name | Function Call | 
|---|---|
| Titan Embeddings - G1 | embedding(model="amazon.titan-embed-text-v1", input=input) | 
| Cohere Embeddings - English | embedding(model="cohere.embed-english-v3", input=input) | 
| Cohere Embeddings - Multilingual | embedding(model="cohere.embed-multilingual-v3", input=input) | 
| TwelveLabs Marengo (Async) | embedding(model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0", input=input, input_type="text") | 
TwelveLabs Bedrock Embedding Models
TwelveLabs Marengo models support multimodal embeddings (text, image, video, audio) and require the input_type parameter to specify the input format.
Usage
from litellm import embedding
import os
# Set AWS credentials
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = "us-east-1"
# Text embedding
response = embedding(
    model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["Hello world from LiteLLM!"],
    input_type="text"  # Required parameter
)
# Image embedding (base64)
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ..."],
    input_type="image",  # Required parameter
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)
# Video embedding (S3 URL)
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["s3://your-bucket/video.mp4"],
    input_type="video",  # Required parameter
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)
Required Parameters
| Parameter | Description | Values | 
|---|---|---|
| input_type | Type of input content | "text","image","video","audio" | 
Supported Models
| Model Name | Function Call | Notes | 
|---|---|---|
| TwelveLabs Marengo 2.7 (Sync) | embedding(model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0", input=input, input_type="text") | Text embeddings only | 
| TwelveLabs Marengo 2.7 (Async) | embedding(model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0", input=input, input_type="text/image/video/audio") | All input types, requires output_s3_uri | 
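The choice between the two routes in the table can be expressed as a small helper that assembles keyword arguments for `embedding()`. This is a sketch under the constraints listed above (sync for text only, async for every input type and always with `output_s3_uri`). The function `marengo_kwargs` is hypothetical and only builds a dict; it does not call AWS:

```python
from typing import Any, Dict, List, Optional

SYNC_MODEL = "bedrock/us.twelvelabs.marengo-embed-2-7-v1:0"
ASYNC_MODEL = "bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0"
INPUT_TYPES = {"text", "image", "video", "audio"}


def marengo_kwargs(
    inputs: List[str],
    input_type: str,
    output_s3_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for embedding() from the table's rules."""
    if input_type not in INPUT_TYPES:
        raise ValueError(f"input_type must be one of {sorted(INPUT_TYPES)}")
    # Text without an S3 output location can use the synchronous model.
    if input_type == "text" and output_s3_uri is None:
        return {"model": SYNC_MODEL, "input": inputs, "input_type": input_type}
    # Everything else goes through async invocation, which needs an S3 output URI.
    if output_s3_uri is None:
        raise ValueError("async invocation requires output_s3_uri")
    return {
        "model": ASYNC_MODEL,
        "input": inputs,
        "input_type": input_type,
        "output_s3_uri": output_s3_uri,
    }
```

The returned dict can be unpacked into the call, for example `embedding(**marengo_kwargs(["s3://your-bucket/video.mp4"], "video", "s3://your-bucket/async-invoke-output/"))`.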
Cohere Embedding Models
https://docs.cohere.com/reference/embed
Usage
from litellm import embedding
import os
os.environ["COHERE_API_KEY"] = "cohere key"
# cohere call
response = embedding(
    model="embed-english-v3.0", 
    input=["good morning from litellm", "this is another item"], 
    input_type="search_document" # optional param for v3 llms
)
| Model Name | Function Call | 
|---|---|
| embed-english-v3.0 | embedding(model="embed-english-v3.0", input=["good morning from litellm", "this is another item"]) | 
| embed-english-light-v3.0 | embedding(model="embed-english-light-v3.0", input=["good morning from litellm", "this is another item"]) | 
| embed-multilingual-v3.0 | embedding(model="embed-multilingual-v3.0", input=["good morning from litellm", "this is another item"]) | 
| embed-multilingual-light-v3.0 | embedding(model="embed-multilingual-light-v3.0", input=["good morning from litellm", "this is another item"]) | 
| embed-english-v2.0 | embedding(model="embed-english-v2.0", input=["good morning from litellm", "this is another item"]) | 
| embed-english-light-v2.0 | embedding(model="embed-english-light-v2.0", input=["good morning from litellm", "this is another item"]) | 
| embed-multilingual-v2.0 | embedding(model="embed-multilingual-v2.0", input=["good morning from litellm", "this is another item"]) | 
NVIDIA NIM Embedding Models
API keys
These can be set as environment variables or passed as params to litellm.embedding()
import os
os.environ["NVIDIA_NIM_API_KEY"] = ""  # api key
os.environ["NVIDIA_NIM_API_BASE"] = "" # nim endpoint url
Usage
from litellm import embedding
import os
os.environ['NVIDIA_NIM_API_KEY'] = ""
response = embedding(
    model='nvidia_nim/<model_name>', 
    input=["good morning from litellm"],
    input_type="query"
)
input_type Parameter for Embedding Models
Certain embedding models, such as nvidia/embed-qa-4 and the E5 family, operate in dual modes: one for indexing documents (passages) and another for querying. To maintain high retrieval accuracy, it's essential to specify how the input text is being used by setting the input_type parameter correctly.
Usage
Set the input_type parameter to one of the following values:
- "passage": for embedding content during indexing (e.g., documents).
- "query": for embedding content during retrieval (e.g., user queries).
Warning: Incorrect usage of input_type can lead to a significant drop in retrieval performance.
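The two modes meet at retrieval time: passage vectors (embedded with `input_type="passage"`) are compared against a query vector (embedded with `input_type="query"`). The sketch below ranks passages by cosine similarity, a common choice for this comparison but not one prescribed by this page. The toy vectors stand in for real embeddings, and the function names are hypothetical:

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def rank_passages(query_vec: Sequence[float], passage_vecs: List[Sequence[float]]) -> List[int]:
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy vectors: in practice these would come from embedding() calls,
# with input_type="passage" for the passages and input_type="query" for the query.
passages = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [0.0, 1.0]
print(rank_passages(query, passages))  # [1, 2, 0]
```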
All models listed here are supported:
| Model Name | Function Call | 
|---|---|
| NV-Embed-QA | embedding(model="nvidia_nim/NV-Embed-QA", input) | 
| nvidia/nv-embed-v1 | embedding(model="nvidia_nim/nvidia/nv-embed-v1", input) | 
| nvidia/nv-embedqa-mistral-7b-v2 | embedding(model="nvidia_nim/nvidia/nv-embedqa-mistral-7b-v2", input) | 
| nvidia/nv-embedqa-e5-v5 | embedding(model="nvidia_nim/nvidia/nv-embedqa-e5-v5", input) | 
| nvidia/embed-qa-4 | embedding(model="nvidia_nim/nvidia/embed-qa-4", input) | 
| nvidia/llama-3.2-nv-embedqa-1b-v1 | embedding(model="nvidia_nim/nvidia/llama-3.2-nv-embedqa-1b-v1", input) | 
| nvidia/llama-3.2-nv-embedqa-1b-v2 | embedding(model="nvidia_nim/nvidia/llama-3.2-nv-embedqa-1b-v2", input) | 
| snowflake/arctic-embed-l | embedding(model="nvidia_nim/snowflake/arctic-embed-l", input) | 
| baai/bge-m3 | embedding(model="nvidia_nim/baai/bge-m3", input) | 
HuggingFace Embedding Models
LiteLLM supports all Feature-Extraction + Sentence Similarity Embedding models: https://huggingface.co/models?pipeline_tag=feature-extraction
Usage
from litellm import embedding
import os
os.environ['HUGGINGFACE_API_KEY'] = ""
response = embedding(
    model='huggingface/microsoft/codebert-base', 
    input=["good morning from litellm"]
)
Usage - Set input_type
LiteLLM infers the input type (feature-extraction or sentence-similarity) by making a GET request to the API base.
Override this by setting input_type yourself.
from litellm import embedding
import os
os.environ['HUGGINGFACE_API_KEY'] = ""
response = embedding(
    model='huggingface/microsoft/codebert-base', 
    input=["good morning from litellm", "you are a good bot"],
    api_base = "https://p69xlsj6rpno5drq.us-east-1.aws.endpoints.huggingface.cloud", 
    input_type="sentence-similarity"
)
Usage - Custom API Base
from litellm import embedding
import os
os.environ['HUGGINGFACE_API_KEY'] = ""
response = embedding(
    model='huggingface/microsoft/codebert-base', 
    input=["good morning from litellm"],
    api_base = "https://p69xlsj6rpno5drq.us-east-1.aws.endpoints.huggingface.cloud"
)
| Model Name | Function Call | Required OS Variables | 
|---|---|---|
| microsoft/codebert-base | embedding('huggingface/microsoft/codebert-base', input=input) | os.environ['HUGGINGFACE_API_KEY'] | 
| BAAI/bge-large-zh | embedding('huggingface/BAAI/bge-large-zh', input=input) | os.environ['HUGGINGFACE_API_KEY'] | 
| any-hf-embedding-model | embedding('huggingface/hf-embedding-model', input=input) | os.environ['HUGGINGFACE_API_KEY'] | 
Mistral AI Embedding Models
All models listed at https://docs.mistral.ai/platform/endpoints are supported.
Usage
from litellm import embedding
import os
os.environ['MISTRAL_API_KEY'] = ""
response = embedding(
    model="mistral/mistral-embed",
    input=["good morning from litellm"],
)
print(response)
| Model Name | Function Call | 
|---|---|
| mistral-embed | embedding(model="mistral/mistral-embed", input) | 
Gemini AI Embedding Models
API keys
These can be set as environment variables or passed as params to litellm.embedding()
import os
os.environ["GEMINI_API_KEY"] = ""
Usage - Embedding
from litellm import embedding
response = embedding(
  model="gemini/text-embedding-004",
  input=["good morning from litellm"],
)
print(response)
All models listed here are supported:
| Model Name | Function Call | 
|---|---|
| text-embedding-004 | embedding(model="gemini/text-embedding-004", input) | 
Vertex AI Embedding Models
Usage - Embedding
import litellm
from litellm import embedding
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1"  # proj location
response = embedding(
    model="vertex_ai/textembedding-gecko",
    input=["good morning from litellm"],
)
print(response)
Supported Models
All models listed here are supported
| Model Name | Function Call | 
|---|---|
| textembedding-gecko | embedding(model="vertex_ai/textembedding-gecko", input) | 
| textembedding-gecko-multilingual | embedding(model="vertex_ai/textembedding-gecko-multilingual", input) | 
| textembedding-gecko-multilingual@001 | embedding(model="vertex_ai/textembedding-gecko-multilingual@001", input) | 
| textembedding-gecko@001 | embedding(model="vertex_ai/textembedding-gecko@001", input) | 
| textembedding-gecko@003 | embedding(model="vertex_ai/textembedding-gecko@003", input) | 
| text-embedding-preview-0409 | embedding(model="vertex_ai/text-embedding-preview-0409", input) | 
| text-multilingual-embedding-preview-0409 | embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input) | 
Voyage AI Embedding Models
Usage - Embedding
from litellm import embedding
import os
os.environ['VOYAGE_API_KEY'] = ""
response = embedding(
    model="voyage/voyage-01",
    input=["good morning from litellm"],
)
print(response)
Supported Models
All models listed at https://docs.voyageai.com/embeddings/#models-and-specifics are supported.
| Model Name | Function Call | 
|---|---|
| voyage-01 | embedding(model="voyage/voyage-01", input) | 
| voyage-lite-01 | embedding(model="voyage/voyage-lite-01", input) | 
| voyage-lite-01-instruct | embedding(model="voyage/voyage-lite-01-instruct", input) | 
Provider-specific Params
Any non-OpenAI params are treated as provider-specific params and sent in the request body as kwargs to the provider.
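The rule can be pictured as a split of the keyword arguments into recognized names and everything else. The sketch below illustrates the idea only and is not LiteLLM's actual implementation. The two name sets are assumptions taken from the fields documented on this page, and `split_provider_params` is a hypothetical helper:

```python
from typing import Any, Dict, Tuple

# Assumption: OpenAI-style embedding parameters, taken from the fields documented on this page.
OPENAI_PARAMS = {"model", "input", "user", "dimensions", "encoding_format"}
# Assumption: LiteLLM-level fields that configure the call itself.
LITELLM_FIELDS = {"timeout", "api_base", "api_version", "api_key", "api_type"}


def split_provider_params(**kwargs: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate recognized parameters from provider-specific extras."""
    known = OPENAI_PARAMS | LITELLM_FIELDS
    recognized = {key: value for key, value in kwargs.items() if key in known}
    extras = {key: value for key, value in kwargs.items() if key not in known}
    return recognized, extras


recognized, extras = split_provider_params(
    model="embed-english-v3.0",
    input=["good morning from litellm"],
    input_type="search_document",
)
print(extras)  # {'input_type': 'search_document'}
```

Under this reading, `input_type` is not an OpenAI param, so it would be forwarded to the provider unchanged.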
Example
Cohere v3 models require the input_type parameter, which takes one of the following four values:
- input_type="search_document": (default) Use this for texts (documents) you want to store in your vector database
- input_type="search_query": Use this for search queries to find the most relevant documents in your vector database
- input_type="classification": Use this if you use the embeddings as an input for a classification system
- input_type="clustering": Use this if you use the embeddings for text clustering
https://txt.cohere.com/introducing-embed-v3/
- SDK
from litellm import embedding
import os
os.environ["COHERE_API_KEY"] = "cohere key"
# cohere call
response = embedding(
    model="embed-english-v3.0", 
    input=["good morning from litellm", "this is another item"], 
    input_type="search_document" # PROVIDER-SPECIFIC PARAM
)
- PROXY
via config
model_list:
  - model_name: "cohere-embed"
    litellm_params:
      model: embed-english-v3.0
      input_type: search_document # PROVIDER-SPECIFIC PARAM
via request (input_type is the provider-specific param)
curl -X POST 'http://0.0.0.0:4000/v1/embeddings' \
-H 'Authorization: Bearer sk-54d77cd67b9febbb' \
-H 'Content-Type: application/json' \
-d '{
  "model": "cohere-embed",
  "input": ["Are you authorized to work in United States of America?"],
  "input_type": "search_document"
}'
Nebius AI Studio Embedding Models
Usage - Embedding
from litellm import embedding
import os
os.environ['NEBIUS_API_KEY'] = ""
response = embedding(
    model="nebius/BAAI/bge-en-icl",
    input=["Good morning from litellm!"],
)
print(response)
Supported Models
All supported models can be found here: https://studio.nebius.ai/models/embedding
| Model Name | Function Call | 
|---|---|
| BAAI/bge-en-icl | embedding(model="nebius/BAAI/bge-en-icl", input) | 
| BAAI/bge-multilingual-gemma2 | embedding(model="nebius/BAAI/bge-multilingual-gemma2", input) | 
| intfloat/e5-mistral-7b-instruct | embedding(model="nebius/intfloat/e5-mistral-7b-instruct", input) |