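Trying out autolabel with litellm + ollama
The current ollama release has some problems with OpenAI API compatibility (the maintainers are working on a fix, but it has not been released yet). We can instead run litellm in proxy mode to expose an OpenAI-compatible API, which also allows flexible model remapping (for example, requests for OpenAI's gpt-3.5-turbo are actually served by michaelborck/refuled). Below is a simple trial.
Environment setup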
- litellm
The key part is the configuration. Configuring a database is also recommended.
config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: ollama/michaelborck/refuled
      api_base: http://172.16.1.205:11434
      api_key: demo
      rpm: 600
  - model_name: text-embedding-ada-002
    litellm_params:
      model: ollama/michaelborck/refuled
      api_base: http://localhost:11434
      api_key: demo
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: ollama/michaelborck/refuled
      api_base: http://localhost:11434
      api_key: demo
      rpm: 600
router_settings:
  routing_strategy: usage-based-routing-v2
litellm_settings:
  drop_params: true
general_settings:
  store_model_in_db: true
  master_key: sk-1234
  database_url: "postgresql://postgres:postgres@localhost:5432/litellm"
Start the service
litellm --config ./config.yaml
Note: after startup you need to create an API key, which is used later.
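One way to create that key is to call the proxy's /key/generate endpoint with the master key. The following is a minimal sketch, not the only way to do it: it assumes the proxy listens on http://localhost:4000 (the address used later in this post) and that the master key is the sk-1234 value from config.yaml. The helper names build_key_request and create_key are made up for illustration.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"  # assumption: proxy address used later in this post
MASTER_KEY = "sk-1234"               # master_key from config.yaml


def build_key_request(proxy_url, master_key, models):
    """Build a POST request for the proxy's /key/generate endpoint (hypothetical helper)."""
    body = json.dumps({"models": models}).encode("utf-8")
    return urllib.request.Request(
        url=f"{proxy_url}/key/generate",
        data=body,
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_key(proxy_url=PROXY_URL, master_key=MASTER_KEY):
    """Send the request and return the generated key (requires a running proxy)."""
    req = build_key_request(
        proxy_url, master_key, ["gpt-3.5-turbo", "text-embedding-ada-002"]
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["key"]


# Usage, with the proxy running:
# print(create_key())
```

The returned value is what goes into OPENAI_API_KEY in the run step.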
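Using autolabel
This is based on the official banking classification scenario, which uses semantic_similarity for few-shot example selection. That makes the text-embedding-ada-002 entry in the config above important: without it, you get an error saying the embedding model cannot be found.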
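Before running autolabel it can help to confirm that the proxy really answers embedding requests for text-embedding-ada-002. The sketch below builds an OpenAI-style /v1/embeddings request with the standard library only. The proxy address and the helper names build_embedding_request and embedding_length are assumptions for illustration, and the API key is whatever key you created on the proxy.

```python
import json
import urllib.request


def build_embedding_request(proxy_url, api_key, text, model="text-embedding-ada-002"):
    """Build an OpenAI-style embeddings request aimed at the proxy (hypothetical helper)."""
    body = json.dumps({"model": model, "input": [text]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{proxy_url}/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embedding_length(response_body):
    """Return the vector length of the first embedding in an OpenAI-style response."""
    payload = json.loads(response_body)
    return len(payload["data"][0]["embedding"])


# Usage, with the proxy running and a valid key:
# req = build_embedding_request("http://localhost:4000", "sk-...", "hello")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(embedding_length(resp.read()))
```

If this call fails, the semantic_similarity selection in autolabel will most likely fail for the same reason.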
- banking classification
Download the sample data. For testing, I removed some of the data (the label values) from test.csv and renamed the file to testv3.csv.
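The post does not say how the labels were removed. As one possible way to produce such a file, the sketch below copies a CSV and clears the label column on every N-th row. The function name blank_labels, the every parameter, and the choice of which rows to clear are all assumptions made for illustration; the column name label matches label_column in the config.

```python
import csv


def blank_labels(src, dst, every=2, label_column="label"):
    """Copy src to dst, clearing label_column on every `every`-th row.

    Returns the number of rows whose label was cleared.
    """
    cleared = 0
    with open(src, newline="", encoding="utf-8") as fin, \
            open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for i, row in enumerate(reader):
            if i % every == 0:
                row[label_column] = ""  # drop the label, keep the input text
                cleared += 1
            writer.writerow(row)
    return cleared


# Usage, after get_data('banking') has downloaded test.csv:
# blank_labels("test.csv", "testv3.csv", every=2)
```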
from autolabel import get_data
get_data('banking')
app.py
from autolabel import LabelingAgent, AutolabelDataset
config = {
"task_name": "BankingComplaintsClassification",
"task_type": "classification",
"dataset": {
"label_column": "label",
"delimiter": ","
},
"model": {
"provider": "openai",
"name": "gpt-3.5-turbo"
},
"prompt": {
"task_guidelines": "You are an expert at understanding bank customers support complaints and queries.\nYour job is to correctly classify the provided input example into one of the following categories.\nCategories:\n{labels}",
"output_guidelines": "You will answer with just the correct output label and nothing else.",
"labels": [
"activate_my_card",
"age_limit",
"apple_pay_or_google_pay",
"atm_support",
"automatic_top_up",
"balance_not_updated_after_bank_transfer",
"balance_not_updated_after_cheque_or_cash_deposit",
"beneficiary_not_allowed",
"cancel_transfer",
"card_about_to_expire",
"card_acceptance",
"card_arrival",
"card_delivery_estimate",
"card_linking",
"card_not_working",
"card_payment_fee_charged",
"card_payment_not_recognised",
"card_payment_wrong_exchange_rate",
"card_swallowed",
"cash_withdrawal_charge",
"cash_withdrawal_not_recognised",
"change_pin",
"compromised_card",
"contactless_not_working",
"country_support",
"declined_card_payment",
"declined_cash_withdrawal",
"declined_transfer",
"direct_debit_payment_not_recognised",
"disposable_card_limits",
"edit_personal_details",
"exchange_charge",
"exchange_rate",
"exchange_via_app",
"extra_charge_on_statement",
"failed_transfer",
"fiat_currency_support",
"get_disposable_virtual_card",
"get_physical_card",
"getting_spare_card",
"getting_virtual_card",
"lost_or_stolen_card",
"lost_or_stolen_phone",
"order_physical_card",
"passcode_forgotten",
"pending_card_payment",
"pending_cash_withdrawal",
"pending_top_up",
"pending_transfer",
"pin_blocked",
"receiving_money",
"Refund_not_showing_up",
"request_refund",
"reverted_card_payment?",
"supported_cards_and_currencies",
"terminate_account",
"top_up_by_bank_transfer_charge",
"top_up_by_card_charge",
"top_up_by_cash_or_cheque",
"top_up_failed",
"top_up_limits",
"top_up_reverted",
"topping_up_by_card",
"transaction_charged_twice",
"transfer_fee_charged",
"transfer_into_account",
"transfer_not_received_by_recipient",
"transfer_timing",
"unable_to_verify_identity",
"verify_my_identity",
"verify_source_of_funds",
"verify_top_up",
"virtual_card_not_working",
"visa_or_mastercard",
"why_verify_identity",
"wrong_amount_of_cash_received",
"wrong_exchange_rate_for_cash_withdrawal"
],
"few_shot_examples": "seed.csv",
"few_shot_selection": "semantic_similarity",
"few_shot_num": 10,
"example_template": "Input: {example}\nOutput: {label}"
}
}
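# Build the labeling agent from the config above
agent = LabelingAgent(config=config)
# Load the test set whose labels were partly removed
ds = AutolabelDataset("testv3.csv", config=config)
# Dry run: preview the prompt and the estimated cost
agent.plan(ds)
# Label at most 100 rows
ds = agent.run(ds, max_items=100)
# Save the results and print a random sample of 50 rows
ds.df.to_csv("demoapp_resultv3.csv")
print(ds.df.sample(50))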
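- Run
Note that autolabel uses OpenAI by default, so we need environment variables to point the OpenAI SDK at our service.
For the API address I set two variables: OPENAI_BASE_URL did not seem to take effect and only OPENAI_API_BASE worked, so for simplicity I set both.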
export OPENAI_API_KEY="sk-vphS5nLiuE0Htmy3wtrhsw" # change to your api key
export OPENAI_BASE_URL="http://localhost:4000"
export OPENAI_API_BASE="http://localhost:4000"
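The same settings can also be applied from Python instead of the shell, for example at the top of app.py. This is only a sketch: the helper name point_openai_at_proxy is made up, and it assumes the variables are read when the client is created, so it should be called before the LabelingAgent is constructed.

```python
import os


def point_openai_at_proxy(api_key, base_url="http://localhost:4000"):
    """Set the environment variables used above to route OpenAI calls to the proxy."""
    os.environ["OPENAI_API_KEY"] = api_key
    # Set both names, mirroring the shell exports above.
    os.environ["OPENAI_BASE_URL"] = base_url
    os.environ["OPENAI_API_BASE"] = base_url


# Usage (replace with the key created on the proxy):
# point_openai_at_proxy("sk-...")
```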
- Results
The results are reasonably good.
For the rows whose labels were removed, labels are also generated correctly.
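Notes
Because autolabel's built-in support for ollama is not very good, litellm is used here as a compatibility layer (it can also work around some embedding model issues). Automatic data labeling with autolabel works quite well. The refuel team has also released a Llama-based fine-tuned model, and we can build an ollama model from its GGUF file for convenient use. The model used here is an older version.
References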
https://github.com/refuel-ai/autolabel
https://docs.refuel.ai/
https://docs.litellm.ai/docs/
https://ollama.com/blog/embedding-models
https://ollama.com/library/nomic-embed-text