Trying autolabel with litellm + ollama

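The current version of ollama has problems in parts of its OpenAI API compatibility (the maintainers are working on a fix, but it has not been released yet). We can use litellm's proxy mode to expose an OpenAI-compatible API. The proxy also allows flexible remapping: for example, requests for OpenAI's gpt-3.5-turbo are actually served by michaelborck/refuled. Below is a simple trial.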

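Environment setup

litellm
The configuration is the core part. Configuring a database is also recommended, because litellm's API key management depends on it.
config.yaml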

```yaml
model_list:
  # Two deployments share the alias gpt-3.5-turbo; the router spreads requests across them
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: ollama/michaelborck/refuled
      api_base: http://172.16.1.205:11434
      api_key: demo
      rpm: 600
  # Embedding alias used by autolabel's semantic_similarity few-shot selection
  - model_name: text-embedding-ada-002
    litellm_params:
      model: ollama/michaelborck/refuled
      api_base: http://localhost:11434
      api_key: demo
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: ollama/michaelborck/refuled
      api_base: http://localhost:11434
      api_key: demo
      rpm: 600
router_settings:
  routing_strategy: usage-based-routing-v2
litellm_settings:
  drop_params: true   # silently drop OpenAI params the backend does not support
general_settings:
  store_model_in_db: true
  master_key: sk-1234
  database_url: "postgresql://postgres:postgres@localhost:5432/litellm"
```
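To make the alias remapping and the two gpt-3.5-turbo entries easier to picture, here is a toy model of what the router does with this config. It is a simplified sketch, not litellm's implementation: the `Deployment` class and `pick_deployment` function are hypothetical, and usage is counted in requests, whereas litellm's usage-based routing tracks its own token/request usage metrics.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    model_name: str            # alias clients send, e.g. "gpt-3.5-turbo"
    backend_model: str         # model litellm actually calls
    api_base: str
    rpm: Optional[int] = None  # per-minute request cap; None means no cap
    used_this_minute: int = 0  # simplified usage counter (requests, not tokens)


# Mirrors the model_list in config.yaml
DEPLOYMENTS = [
    Deployment("gpt-3.5-turbo", "ollama/michaelborck/refuled", "http://172.16.1.205:11434", rpm=600),
    Deployment("text-embedding-ada-002", "ollama/michaelborck/refuled", "http://localhost:11434"),
    Deployment("gpt-3.5-turbo", "ollama/michaelborck/refuled", "http://localhost:11434", rpm=600),
]


def pick_deployment(requested: str, deployments: list[Deployment]) -> Deployment:
    """Resolve an alias, then choose the least-used deployment still under its rpm cap."""
    candidates = [
        d for d in deployments
        if d.model_name == requested and (d.rpm is None or d.used_this_minute < d.rpm)
    ]
    if not candidates:
        raise LookupError(f"no available deployment for {requested!r}")
    chosen = min(candidates, key=lambda d: d.used_this_minute)  # ties go to the first listed
    chosen.used_this_minute += 1
    return chosen
```

Calling `pick_deployment("gpt-3.5-turbo", DEPLOYMENTS)` twice in a row returns the 172.16.1.205 host first and the localhost host second, while `text-embedding-ada-002` always resolves to the single embedding entry.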

Start the service

```shell
litellm --config ./config.yaml
```

Note: after the proxy starts, you need to create an API key, which is used later.
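One way to create that key is to call the proxy's `/key/generate` endpoint with the `master_key` from config.yaml. The sketch below uses only the standard library; it assumes the proxy listens on `http://localhost:4000` (the address used later in this post), and the helper names are hypothetical. The request body only restricts the key to the two model aliases; other key options are left out.

```python
from __future__ import annotations

import json
import urllib.request


def build_key_request(proxy_url: str, master_key: str, models: list[str]) -> urllib.request.Request:
    """Build a POST to litellm's /key/generate, authenticated with the master key."""
    body = json.dumps({"models": models}).encode("utf-8")
    return urllib.request.Request(
        url=f"{proxy_url}/key/generate",
        data=body,
        headers={"Authorization": f"Bearer {master_key}", "Content-Type": "application/json"},
        method="POST",
    )


def generate_key(proxy_url: str, master_key: str, models: list[str]) -> str:
    """Send the request and return the generated key from the JSON response."""
    with urllib.request.urlopen(build_key_request(proxy_url, master_key, models)) as resp:
        return json.loads(resp.read())["key"]
```

With the proxy running, `generate_key("http://localhost:4000", "sk-1234", ["gpt-3.5-turbo", "text-embedding-ada-002"])` should return the new key to use as `OPENAI_API_KEY`.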

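Using autolabel

This is based on autolabel's official banking classification example, which selects few-shot examples with semantic_similarity. That is why the text-embedding-ada-002 entry in the config above matters: without it, autolabel reports that it cannot find an embedding model.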
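Conceptually, semantic_similarity selection embeds each seed example and the input, then puts the closest seed examples into the prompt. The sketch below shows that idea with hand-made two-dimensional vectors; it is not autolabel's code (which stores embeddings in a vector store), and the example texts and vectors are hypothetical stand-ins for what the embedding model would return.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_few_shot(query_vec: list[float], seed: list[tuple], k: int) -> list[tuple]:
    """seed holds (text, label, embedding) tuples; return the k most similar to the query."""
    return sorted(seed, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)[:k]


def render_examples(examples: list[tuple], template: str = "Input: {example}\nOutput: {label}") -> str:
    """Format the chosen examples with the same example_template used in app.py."""
    return "\n\n".join(template.format(example=text, label=label) for text, label, _ in examples)


seed = [
    ("How do I activate my card?", "activate_my_card", [1.0, 0.0]),
    ("My card was swallowed by the ATM", "card_swallowed", [0.0, 1.0]),
    ("Card not activated yet", "activate_my_card", [0.9, 0.1]),
]
top = select_few_shot([1.0, 0.05], seed, k=2)
```

Here the two activation questions win over the ATM one, so the prompt would show the model two `activate_my_card` examples. This is also why the embedding alias must resolve: every input needs an embedding before its examples can be chosen.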

Banking classification
Download the sample data. For testing, I removed some of the label values from test.csv and renamed the file to testv3.csv.

```python
from autolabel import get_data

# Download the banking example data into the current directory
get_data('banking')
```
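The post does not say which labels were removed. The following is a minimal sketch of one way to produce a file like testv3.csv: blank out the `label` column on a random fraction of rows. The fraction, the fixed random seed, and the helper names are hypothetical choices.

```python
from __future__ import annotations

import csv
import random


def blank_some_labels(rows: list[dict], fraction: float = 0.2,
                      label_column: str = "label", seed: int = 0) -> list[dict]:
    """Return copies of rows with the label cleared on a random `fraction` of them."""
    rows = [dict(r) for r in rows]  # do not mutate the caller's rows
    rng = random.Random(seed)       # fixed seed so the same rows are blanked each run
    for idx in rng.sample(range(len(rows)), int(len(rows) * fraction)):
        rows[idx][label_column] = ""
    return rows


def rewrite_csv(src: str, dst: str, **kwargs) -> None:
    """Read src, blank some labels, and write the result to dst with the same columns."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(blank_some_labels(rows, **kwargs))
```

For example, `rewrite_csv("test.csv", "testv3.csv", fraction=0.2)`.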

app.py

```python
from autolabel import LabelingAgent, AutolabelDataset

config = {
    "task_name": "BankingComplaintsClassification",
    "task_type": "classification",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    # "openai" provider; the requests go to the litellm proxy via the environment variables below
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo"
    },
    "prompt": {
        "task_guidelines": "You are an expert at understanding bank customers support complaints and queries.\nYour job is to correctly classify the provided input example into one of the following categories.\nCategories:\n{labels}",
        "output_guidelines": "You will answer with just the correct output label and nothing else.",
        "labels": [
            "activate_my_card",
            "age_limit",
            "apple_pay_or_google_pay",
            "atm_support",
            "automatic_top_up",
            "balance_not_updated_after_bank_transfer",
            "balance_not_updated_after_cheque_or_cash_deposit",
            "beneficiary_not_allowed",
            "cancel_transfer",
            "card_about_to_expire",
            "card_acceptance",
            "card_arrival",
            "card_delivery_estimate",
            "card_linking",
            "card_not_working",
            "card_payment_fee_charged",
            "card_payment_not_recognised",
            "card_payment_wrong_exchange_rate",
            "card_swallowed",
            "cash_withdrawal_charge",
            "cash_withdrawal_not_recognised",
            "change_pin",
            "compromised_card",
            "contactless_not_working",
            "country_support",
            "declined_card_payment",
            "declined_cash_withdrawal",
            "declined_transfer",
            "direct_debit_payment_not_recognised",
            "disposable_card_limits",
            "edit_personal_details",
            "exchange_charge",
            "exchange_rate",
            "exchange_via_app",
            "extra_charge_on_statement",
            "failed_transfer",
            "fiat_currency_support",
            "get_disposable_virtual_card",
            "get_physical_card",
            "getting_spare_card",
            "getting_virtual_card",
            "lost_or_stolen_card",
            "lost_or_stolen_phone",
            "order_physical_card",
            "passcode_forgotten",
            "pending_card_payment",
            "pending_cash_withdrawal",
            "pending_top_up",
            "pending_transfer",
            "pin_blocked",
            "receiving_money",
            "Refund_not_showing_up",
            "request_refund",
            "reverted_card_payment?",
            "supported_cards_and_currencies",
            "terminate_account",
            "top_up_by_bank_transfer_charge",
            "top_up_by_card_charge",
            "top_up_by_cash_or_cheque",
            "top_up_failed",
            "top_up_limits",
            "top_up_reverted",
            "topping_up_by_card",
            "transaction_charged_twice",
            "transfer_fee_charged",
            "transfer_into_account",
            "transfer_not_received_by_recipient",
            "transfer_timing",
            "unable_to_verify_identity",
            "verify_my_identity",
            "verify_source_of_funds",
            "verify_top_up",
            "virtual_card_not_working",
            "visa_or_mastercard",
            "why_verify_identity",
            "wrong_amount_of_cash_received",
            "wrong_exchange_rate_for_cash_withdrawal"
        ],
        # Few-shot examples come from seed.csv, chosen by embedding similarity
        "few_shot_examples": "seed.csv",
        "few_shot_selection": "semantic_similarity",
        "few_shot_num": 10,
        "example_template": "Input: {example}\nOutput: {label}"
    }
}

agent = LabelingAgent(config=config)
ds = AutolabelDataset("testv3.csv", config=config)

# Preview the prompt and estimated cost before labeling
agent.plan(ds)

# Label at most 100 rows
ds = agent.run(ds, max_items=100)

ds.df.to_csv("demoapp_resultv3.csv")
print(ds.df.sample(50))
```
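To check the saved results a bit more concretely than by eye, a small script can compare predictions against the labels that are still present and count how many blanked rows received a prediction. This is a sketch under one loud assumption: that the prediction column is named `BankingComplaintsClassification_label` (task name plus `_label`). Check the header of demoapp_resultv3.csv and adjust `PRED_COLUMN` if your autolabel version names it differently. If the file contains rows that were never processed, they count as misses here.

```python
from __future__ import annotations

import csv

# Assumption: prediction column is "<task_name>_label"; verify against your CSV header
PRED_COLUMN = "BankingComplaintsClassification_label"


def load_rows(path: str) -> list[dict]:
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarize(rows: list[dict], gold: str = "label", pred: str = PRED_COLUMN) -> dict:
    """Accuracy where a gold label exists, plus coverage of rows whose label was removed."""
    labeled = [r for r in rows if r.get(gold)]
    unlabeled = [r for r in rows if not r.get(gold)]
    correct = sum(1 for r in labeled if r.get(pred) == r[gold])
    return {
        "labeled": len(labeled),
        "accuracy": correct / len(labeled) if labeled else float("nan"),
        "unlabeled": len(unlabeled),
        "unlabeled_filled": sum(1 for r in unlabeled if r.get(pred)),
    }
```

Usage: `print(summarize(load_rows("demoapp_resultv3.csv")))`. Whether the filled-in labels for the blanked rows are correct still has to be checked against the original test.csv.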

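Run
Note: because the config uses the openai provider, we need environment variables that make the OpenAI SDK send requests to our proxy instead.
I set the API address twice: OPENAI_BASE_URL did not seem to take effect while OPENAI_API_BASE did, so for simplicity I set both.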

```shell
export OPENAI_API_KEY="sk-vphS5nLiuE0Htmy3wtrhsw" # change to your api key
export OPENAI_BASE_URL="http://localhost:4000"
export OPENAI_API_BASE="http://localhost:4000"
```
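If you prefer to keep everything inside app.py, the same variables can be set in-process before the `LabelingAgent` is created. A minimal sketch; the helper name is hypothetical and the URL is the proxy address used above.

```python
import os

PROXY_URL = "http://localhost:4000"  # litellm proxy address used in this post


def point_openai_at_proxy(api_key: str, base_url: str = PROXY_URL) -> None:
    """Set the OpenAI env vars in-process; call this before creating LabelingAgent."""
    os.environ["OPENAI_API_KEY"] = api_key
    # Set both names, as in the shell version: OPENAI_API_BASE was the one that took effect here
    os.environ["OPENAI_BASE_URL"] = base_url
    os.environ["OPENAI_API_BASE"] = base_url
```

Call `point_openai_at_proxy("<your key>")` at the top of app.py, before `agent = LabelingAgent(config=config)`.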

Results

The results were fairly good in practice.

The rows whose labels I had removed were also labeled correctly.

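Notes

Because autolabel's built-in support for ollama is not very good, I used litellm as a compatibility layer, which also works around some embedding-model issues. Automatic data labeling with autolabel works quite well. The Refuel team has also released a fine-tuned model based on Llama; we can build an ollama model from its GGUF file for convenient use. The model used in this post is an older version.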
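Building the ollama model from a GGUF file comes down to a Modelfile with a `FROM` line pointing at the file, followed by `ollama create`. The sketch below wraps those two steps. The GGUF filename and the model name are hypothetical, and depending on the model you may also need `TEMPLATE` or `PARAMETER` lines in the Modelfile, which this minimal version omits.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def render_modelfile(gguf_path: str) -> str:
    """Minimal Modelfile that imports a local GGUF file."""
    return f"FROM {gguf_path}\n"


def ollama_create_command(model_name: str, modelfile: str) -> list[str]:
    """Command line for `ollama create <name> -f <Modelfile>`."""
    return ["ollama", "create", model_name, "-f", modelfile]


def create_ollama_model(model_name: str, gguf_path: str, workdir: str = ".") -> None:
    """Write the Modelfile and run `ollama create` (requires the ollama CLI on PATH)."""
    modelfile = Path(workdir) / "Modelfile"
    modelfile.write_text(render_modelfile(gguf_path), encoding="utf-8")
    subprocess.run(ollama_create_command(model_name, str(modelfile)), check=True)
```

For example, `create_ollama_model("refuel-local", "./refuel-llm.gguf")` (both names hypothetical). After that, config.yaml could point at `ollama/refuel-local` instead of `ollama/michaelborck/refuled`.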

References

https://github.com/refuel-ai/autolabel
https://docs.refuel.ai/
https://docs.litellm.ai/docs/
https://ollama.com/blog/embedding-models
https://ollama.com/library/nomic-embed-text

posted on 2024-08-23 08:00 荣锋亮


rongfengliang-荣锋亮