通过litellm + ollma 试用autolabel

ollama 当前版本对于openai api 的兼容实际上部分是有问题的(目前官方在进行修改,但是暂时还没发布),我们可以通过litelmm 的proxy 模式提供openaia 兼容的api,同时可以进行灵活的改写(比如openai 的gpt-3.5-turbo 实际使用的是michaelborck/refuled ),以下是一个简单试用

环境准备

  • litellm
    核心是配置,同时推荐配置一个db 支持
    config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:  
      model: ollama/michaelborck/refuled
      api_base: http://172.16.1.205:11434
      api_key: demo
      rpm: 600
  - model_name: text-embedding-ada-002
    litellm_params: 
      model: ollama/michaelborck/refuled
      api_base: http://localhost:11434
      api_key: demo
  - model_name: gpt-3.5-turbo
    litellm_params: 
      model: ollama/michaelborck/refuled
      api_base: http://localhost:11434
      api_key: demo
      rpm: 600
router_settings:
  routing_strategy: usage-based-routing-v2 
litellm_settings:
  drop_params: true
general_settings: 
  store_model_in_db: true
  master_key: sk-1234 
  database_url: "postgresql://postgres:postgres@localhost:5432/litellm"

启动服务

litellm --config ./config.yaml

备注: 启动之后需要创建api key,后边需要使用

autolabel 使用

基于了官方的bank 分类场景,里边使用了semantic_similarity 处理,上边对于text-embedding-ada-002 的配置就比较重要,否则会有提示找不到embedding 模型的问题

  • banking 分类处理
    示例数据下载,对于test.csv 为了测试我删除了部分数据(标签数据),同时修改名称为testv3.csv
from autolabel import get_data
get_data('banking')

app.py

from autolabel import LabelingAgent, AutolabelDataset
config = {
    "task_name": "BankingComplaintsClassification",
    "task_type": "classification",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo"
    },
    "prompt": {
        "task_guidelines": "You are an expert at understanding bank customers support complaints and queries.\nYour job is to correctly classify the provided input example into one of the following categories.\nCategories:\n{labels}",
        "output_guidelines": "You will answer with just the correct output label and nothing else.",
        "labels": [
            "activate_my_card",
            "age_limit",
            "apple_pay_or_google_pay",
            "atm_support",
            "automatic_top_up",
            "balance_not_updated_after_bank_transfer",
            "balance_not_updated_after_cheque_or_cash_deposit",
            "beneficiary_not_allowed",
            "cancel_transfer",
            "card_about_to_expire",
            "card_acceptance",
            "card_arrival",
            "card_delivery_estimate",
            "card_linking",
            "card_not_working",
            "card_payment_fee_charged",
            "card_payment_not_recognised",
            "card_payment_wrong_exchange_rate",
            "card_swallowed",
            "cash_withdrawal_charge",
            "cash_withdrawal_not_recognised",
            "change_pin",
            "compromised_card",
            "contactless_not_working",
            "country_support",
            "declined_card_payment",
            "declined_cash_withdrawal",
            "declined_transfer",
            "direct_debit_payment_not_recognised",
            "disposable_card_limits",
            "edit_personal_details",
            "exchange_charge",
            "exchange_rate",
            "exchange_via_app",
            "extra_charge_on_statement",
            "failed_transfer",
            "fiat_currency_support",
            "get_disposable_virtual_card",
            "get_physical_card",
            "getting_spare_card",
            "getting_virtual_card",
            "lost_or_stolen_card",
            "lost_or_stolen_phone",
            "order_physical_card",
            "passcode_forgotten",
            "pending_card_payment",
            "pending_cash_withdrawal",
            "pending_top_up",
            "pending_transfer",
            "pin_blocked",
            "receiving_money",
            "Refund_not_showing_up",
            "request_refund",
            "reverted_card_payment?",
            "supported_cards_and_currencies",
            "terminate_account",
            "top_up_by_bank_transfer_charge",
            "top_up_by_card_charge",
            "top_up_by_cash_or_cheque",
            "top_up_failed",
            "top_up_limits",
            "top_up_reverted",
            "topping_up_by_card",
            "transaction_charged_twice",
            "transfer_fee_charged",
            "transfer_into_account",
            "transfer_not_received_by_recipient",
            "transfer_timing",
            "unable_to_verify_identity",
            "verify_my_identity",
            "verify_source_of_funds",
            "verify_top_up",
            "virtual_card_not_working",
            "visa_or_mastercard",
            "why_verify_identity",
            "wrong_amount_of_cash_received",
            "wrong_exchange_rate_for_cash_withdrawal"
        ],
        "few_shot_examples": "seed.csv",
        "few_shot_selection": "semantic_similarity",
        "few_shot_num": 10,
        "example_template": "Input: {example}\nOutput: {label}"
    }
}
 
agent = LabelingAgent(config=config)
ds = AutolabelDataset("testv3.csv", config=config)
agent.plan(ds)
ds = agent.run(ds, max_items=100)
ds.df.to_csv("demoapp_resultv3.csv")
print(ds.df.sample(50))
  • 运行
    注意因为默认使用的是openai 的,我们需要通过环境变量,让openai sdk 使用我们服务的地址
    对于api 地址我配置了两个,似乎OPENAI_BASE_URL 不生效,OPENAI_API_BASE 才可以,为了简单配置了两个
export OPENAI_API_KEY="sk-vphS5nLiuE0Htmy3wtrhsw" # change to your api key
export OPENAI_BASE_URL="http://localhost:4000" 
export OPENAI_API_BASE="http://localhost:4000"
  • 效果

实际效果还算不错


对于删除标记的数据,目前也可以正确的生成

说明

因为默认autolabel 对于ollama 支持不是很好,所以基于litellm 进行了兼容适配(同时也可以解决一些嵌入模型的问题),基于autolabel 的自动数据标注还是很不错的,同时
refuel 团队提供的基于llama 微调版本的模型也发布了,我们可以基于gguf 制作一个ollama 模型,方式使用,目前使用的是一个老版本的

参考资料

https://github.com/refuel-ai/autolabel
https://docs.refuel.ai/
https://docs.litellm.ai/docs/
https://ollama.com/blog/embedding-models
https://ollama.com/library/nomic-embed-text

posted on 2024-08-23 08:00  荣锋亮  阅读(3)  评论(0编辑  收藏  举报

导航