litellm
https://github.com/BerriAI/litellm/tree/main
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
LiteLLM manages:
- Translate inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
- Consistent output: text responses will always be available at `['choices'][0]['message']['content']`
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
- Set budgets & rate limits per project, API key, and model - LiteLLM Proxy Server (LLM Gateway)
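As a quick illustration of the unified interface, here is a minimal SDK sketch (model names are illustrative; litellm reads provider credentials such as `OPENAI_API_KEY` or AWS keys from the environment):

```python
from litellm import completion

messages = [{"role": "user", "content": "Hello, how are you?"}]

# Same call shape for every provider; only the model string changes.
response = completion(model="gpt-3.5-turbo", messages=messages)
# e.g. response = completion(model="bedrock/anthropic.claude-instant-v1", messages=messages)

# Consistent output location, regardless of provider:
print(response["choices"][0]["message"]["content"])
```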
Jump to LiteLLM Proxy (LLM Gateway) Docs
Jump to Supported LLM Providers

🚨 Stable Release: use Docker images with the `-stable` tag. These have undergone 12-hour load tests before being published.

Support for more providers: missing a provider or LLM platform? Raise a feature request.
Step 1: Set up the Proxy config.yaml
https://docs.litellm.ai/docs/proxy/configs#:~:text=Quick%20Start.%20Set%20a%20model%20alias%20for%20your
```yaml
model_list:
  - model_name: gpt-3.5-turbo ### RECEIVED MODEL NAME ###
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: azure/gpt-turbo-small-eu ### MODEL NAME sent to `litellm.completion()` ###
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
      rpm: 6 # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
  - model_name: bedrock-claude-v1
    litellm_params:
      model: bedrock/anthropic.claude-instant-v1
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_CA"
      rpm: 6
  - model_name: anthropic-claude
    litellm_params:
      model: bedrock/anthropic.claude-instant-v1
      ### [OPTIONAL] SET AWS REGION ###
      aws_region_name: us-east-1
  - model_name: vllm-models
    litellm_params:
      model: openai/facebook/opt-125m # the `openai/` prefix tells litellm it's openai compatible
      api_base: http://0.0.0.0:4000/v1
      api_key: none
      rpm: 1440
    model_info:
      version: 2
  # Use this if you want to make requests to `claude-3-haiku-20240307`, `claude-3-opus-20240229`, `claude-2.1` without defining them on the config.yaml
  # Default models
  # Works for ALL Providers and needs the default provider credentials in .env
  - model_name: "*"
    litellm_params:
      model: "*"

litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  success_callback: ["langfuse"] # OPTIONAL - if you want to start sending LLM Logs to Langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your env

general_settings:
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
  alerting: ["slack"] # [OPTIONAL] If you want Slack Alerts for Hanging LLM requests, Slow llm responses, Budget Alerts. Make sure to set `SLACK_WEBHOOK_URL` in your env
```
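The same deployments can also be load-balanced in-process with the SDK's Router, without running the proxy; a minimal sketch reusing the two `gpt-3.5-turbo` deployments from the config above (API keys read from the environment):

```python
import os
from litellm import Router

# Two deployments sharing the public name "gpt-3.5-turbo"; the Router
# load-balances (and retries/falls back) across them.
model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-turbo-small-eu",
            "api_base": "https://my-endpoint-europe-berri-992.openai.azure.com/",
            "api_key": os.environ["AZURE_API_KEY_EU"],
            "rpm": 6,
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-turbo-small-ca",
            "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/",
            "api_key": os.environ["AZURE_API_KEY_CA"],
            "rpm": 6,
        },
    },
]

router = Router(model_list=model_list)
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
)
print(response["choices"][0]["message"]["content"])
```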
Step 2: Start Proxy with config

```shell
$ litellm --config /path/to/config.yaml
```

tip: Run with `--detailed_debug` if you need detailed debug logs:

```shell
$ litellm --config /path/to/config.yaml --detailed_debug
```
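Once the proxy is up, you can sanity-check it before sending traffic; a small sketch assuming the default port 4000 and the proxy's documented `/health/readiness` endpoint:

```python
import requests

# Readiness probe against the running proxy (endpoint and port are
# assumptions based on the default quick-start setup).
print(requests.get("http://0.0.0.0:4000/health/readiness").json())
```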
Step 3: Test it

This sends the request to the model where `model_name=gpt-3.5-turbo` in the config.yaml. If multiple deployments share `model_name=gpt-3.5-turbo`, the proxy load-balances across them. For Langchain and OpenAI SDK usage examples, see the docs; an OpenAI SDK sketch also follows the curl example below.
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
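Because the proxy speaks the OpenAI format, the official OpenAI SDK can point at it directly; a minimal sketch (the `api_key` value only matters if you set a `master_key` in the config):

```python
import openai

# Point the OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # routed by model_name in config.yaml
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)
```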
https://docs.litellm.ai/docs/proxy/quick_start
Quick Start
Quick start CLI, Config, Docker
LiteLLM Server (LLM Gateway) manages:
- Unified Interface: call 100+ LLMs (Huggingface/Bedrock/TogetherAI/etc.) in the OpenAI `ChatCompletions` & `Completions` format
- Cost Tracking: authentication, spend tracking & budgets via Virtual Keys
- Load Balancing: between multiple models + deployments of the same model; the LiteLLM proxy can handle 1.5k+ requests/second during load tests
```shell
$ pip install 'litellm[proxy]'
```
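For the virtual-keys / budgets piece, the proxy exposes a key-management API; a hedged sketch assuming the `master_key: sk-1234` from the config above and the proxy's `/key/generate` endpoint:

```python
import requests

# Mint a project-scoped virtual key with a spend budget (USD).
resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # master key from config.yaml
    json={"models": ["gpt-3.5-turbo"], "max_budget": 10},
)
print(resp.json()["key"])  # hand this key out instead of provider credentials
```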
https://docs.litellm.ai/docs/completion/input
The linked table maps each provider against the OpenAI params LiteLLM translates for it: `temperature`, `max_completion_tokens`, `max_tokens`, `top_p`, `stream`, `stream_options`, `stop`, `n`, `presence_penalty`, `frequency_penalty`, `functions`, `function_call`, `logit_bias`, `user`, `response_format`, `seed`, `tools`, `tool_choice`, `logprobs`, `top_logprobs`, and `extra_headers`. Providers covered include Anthropic, OpenAI, Azure OpenAI, Replicate, Anyscale, Cohere, Huggingface, Openrouter, AI21, VertexAI, Bedrock, Sagemaker, TogetherAI, AlephAlpha, NLP Cloud, Petals, Ollama, Databricks, ClarifAI, and Github. OpenAI and Azure OpenAI support nearly all of these params; the others support subsets, with some params model-dependent on Bedrock and Github.
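Since param support varies by provider, litellm exposes helpers to inspect and smooth over the differences; a sketch using `get_supported_openai_params` and the `drop_params` setting (exact results vary by model and provider):

```python
import litellm

# Which OpenAI params does this provider accept for this model?
print(litellm.get_supported_openai_params(
    model="anthropic.claude-instant-v1",
    custom_llm_provider="bedrock",
))

# Drop unsupported params instead of raising an error (same behavior as
# `drop_params: True` in the proxy config's litellm_settings).
litellm.drop_params = True
response = litellm.completion(
    model="bedrock/anthropic.claude-instant-v1",
    messages=[{"role": "user", "content": "hi"}],
    frequency_penalty=0.2,  # silently dropped if the provider lacks it
)
```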
Success Stories
https://github.com/wandb/openui/tree/main
https://www.tinyash.com/blog/litellm/#:~:text=litellm
https://zhuanlan.zhihu.com/p/692686053#:~:text=LiteLLM%E7%9A%84