
An Introductory Guide to Fine-Tuning LLMs

https://www.datacamp.com/tutorial/fine-tuning-large-language-models

Fine-tuning Large Language Models (LLMs) has transformed Natural Language Processing (NLP), offering unprecedented capabilities in tasks like language translation, sentiment analysis, and text generation. The approach takes a pre-trained model such as GPT-2 and enhances its performance on a specific domain or task through further training.

 

Over the last year and a half, the field of natural language processing (NLP) has undergone a significant transformation due to the popularization of Large Language Models (LLMs). The natural language capabilities of these models have enabled applications that seemed impossible to achieve just a few years ago.

LLMs are pushing the boundaries of what was previously considered achievable with capabilities ranging from language translation to sentiment analysis and text generation.

However, training such models from scratch is time-consuming and expensive. This is why fine-tuning large language models matters: it tailors these advanced models to specific tasks or domains.

This process enhances the model's performance on specialized tasks and significantly broadens its applicability across various fields. In other words, we can take advantage of the language-processing capacity of pre-trained LLMs and train them further to perform our specific tasks.

Today, we will explore the essence of pre-trained language models, delve into the fine-tuning process, and walk through practical steps for fine-tuning a model like GPT-2 using Hugging Face.
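As a rough illustration of what those steps look like, here is a minimal sketch that fine-tunes GPT-2 with the Hugging Face Trainer; the dataset and hyperparameters are placeholders rather than the tutorial's exact setup.

# Minimal GPT-2 fine-tuning sketch (illustrative dataset and hyperparameters).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)   # causal LM objective

args = TrainingArguments(
    output_dir="gpt2-finetuned",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=5e-5,
    logging_steps=50,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()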

 

Fine-tuning vs. RAG

RAG combines the strengths of retrieval-based models and generative models. In RAG, a retriever component searches a large database or knowledge base to find relevant information based on the input query. This retrieved information is then used by a generative model to produce a more accurate and contextually relevant response. Key benefits of RAG include:

  • Dynamic knowledge integration: Incorporates real-time information from external sources, making it suitable for tasks requiring up-to-date or specific knowledge.
  • Contextual relevance: Enhances the generative model’s responses by providing additional context from the retrieved documents.
  • Versatility: Can handle a wider range of queries, including those requiring specific or rare information that the model may not have been trained on.
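To make the retrieve-then-generate flow described above concrete, the sketch below uses scikit-learn TF-IDF similarity as a stand-in retriever and simply assembles an augmented prompt; the documents, query, and the commented-out generator call are illustrative placeholders, not a production RAG stack.

# Toy retrieve-then-generate flow: TF-IDF retrieval + prompt assembly.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Qwen is a family of LLMs developed by Alibaba Cloud.",
    "LoRA fine-tunes a model by training small low-rank adapter matrices.",
    "COCO2014 is an image dataset commonly used for vision-language training.",
]

def retrieve(query, k=1):
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(documents)
    scores = cosine_similarity(vectorizer.transform([query]), doc_vecs)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

query = "Who develops Qwen?"
context = "\n".join(retrieve(query))
prompt = f"Answer using the context.\n\nContext:\n{context}\n\nQuestion: {query}"
# response = generative_model(prompt)   # hand the augmented prompt to any generative model
print(prompt)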

Choosing between fine-tuning and RAG

When deciding whether to use fine-tuning or RAG, consider the following factors:

  • Nature of the task: For tasks that benefit from highly specialized models (e.g., domain-specific applications), fine-tuning is often the preferred approach. RAG is ideal for tasks that require integration of external knowledge or real-time information retrieval.
  • Data availability: Fine-tuning requires a substantial amount of labeled data specific to the task. If such data is scarce, RAG’s retrieval component can compensate by providing relevant information from external sources.
  • Resource constraints: Fine-tuning can be computationally intensive, whereas RAG leverages existing databases to supplement the generative model, potentially reducing the need for extensive training.

 

Fine-Tuning LLaMA 2: A Step-by-Step Guide to Customizing the Large Language Model

 https://www.datacamp.com/tutorial/fine-tuning-llama-2

Fine-tuning frameworks

moreh

https://docs.moreh.io/tutorials/

Fine-tuning Tutorials

This tutorial is for anyone who wants to fine-tune powerful large language models such as Llama 2 or Mistral for their own projects. We will walk you through the steps to fine-tune these large language models (LLMs) with the MoAI Platform.

Fine-tuning in machine learning involves adjusting a pre-trained model's weights on new data to enhance task-specific performance. Essentially, when you want to apply an AI model to a new task, you take an existing model and optimize it with new datasets. This allows you to customize the model to meet your specific needs and domain requirements.

A pre-trained model has a large number of parameters designed for general-purpose use, and effectively fine-tuning such a large model requires a sufficient amount of training data.

With the MoAI Platform, you can easily apply optimized parallelization techniques that consider the GPU's memory size, significantly reducing the time and effort needed before starting training.

What you will learn here:

  1. Loading datasets, models, and tokenizers (see the sketch after this list)
  2. Running training and checking results
  3. Applying automatic parallelization
  4. Choosing the right training environment and AI accelerators
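The MoAI-specific parallelization is handled by the platform itself, but step 1 above looks much like any Hugging Face workflow. Below is a minimal sketch; the model id and dataset name are illustrative placeholders, and gated models such as Llama 2 additionally require accepting their license on the Hub.

# Step 1 sketch: load a dataset, a causal LM, and its tokenizer (placeholders).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"        # illustrative; any causal LM id works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

train_data = load_dataset("tatsu-lab/alpaca", split="train")   # illustrative dataset
print(train_data[0])                          # inspect one instruction/response sample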

 

LLaMA-Factory

https://github.com/hiyouga/LLaMA-Factory

Features

  • Various models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, etc.
  • Integrated methods: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
  • Scalable resources: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
  • Advanced algorithms: GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, PiSSA and Agent tuning.
  • Practical tricks: FlashAttention-2, Unsloth, RoPE scaling, NEFTune and rsLoRA.
  • Experiment monitors: LlamaBoard, TensorBoard, Wandb, MLflow, etc.
  • Faster inference: OpenAI-style API, Gradio UI and CLI with vLLM worker.

 

swift

https://github.com/modelscope/swift

SWIFT supports training (pre-training, fine-tuning, RLHF), inference, evaluation, and deployment of 300+ LLMs and 50+ MLLMs (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by PEFT, we also provide a complete adapters library to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts.
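As a concrete picture of the PEFT-style lightweight tuning that SWIFT and the other frameworks above wrap, here is a minimal LoRA sketch using the Hugging Face peft library directly; the model id and the target module names (typical for Llama-style attention layers) are assumptions, and this is not SWIFT's own API.

# Minimal LoRA setup with Hugging Face PEFT (framework-agnostic sketch).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")   # placeholder id

lora_config = LoraConfig(
    r=8,                                    # rank of the low-rank update matrices
    lora_alpha=16,                          # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # assumed Llama-style attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the adapter weights are trainable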

To facilitate use by users unfamiliar with deep learning, we provide a Gradio web UI for controlling training and inference, as well as accompanying deep learning courses and best practices for beginners. The SWIFT web UI is available on both Hugging Face Spaces and ModelScope Studio; please feel free to try it!

SWIFT has rich documentation for users; please feel free to check our documentation website.

xtuner

https://github.com/InternLM/xtuner

https://xtuner.readthedocs.io/zh-cn/latest/training/multi_modal_dataset.html

XTuner is an efficient, flexible and full-featured toolkit for fine-tuning large models.

Efficient

  • Support LLM, VLM pre-training / fine-tuning on almost all GPUs. XTuner is capable of fine-tuning 7B LLM on a single 8GB GPU, as well as multi-node fine-tuning of models exceeding 70B.
  • Automatically dispatch high-performance operators such as FlashAttention and Triton kernels to increase training throughput.
  • Compatible with DeepSpeed 🚀, easily utilizing a variety of ZeRO optimization techniques.

Flexible

  • Support various LLMs (InternLM, Mixtral-8x7B, Llama 2, ChatGLM, Qwen, Baichuan, ...).
  • Support VLM (LLaVA). The performance of LLaVA-InternLM2-20B is outstanding.
  • Well-designed data pipeline, accommodating datasets in any format, including but not limited to open-source and custom formats.
  • Support various training algorithms (QLoRA, LoRA, full-parameter fine-tuning), allowing users to choose the most suitable solution for their requirements.

Full-featured

  • Support continuous pre-training, instruction fine-tuning, and agent fine-tuning.
  • Support chatting with large models with pre-defined templates.
  • The output models can seamlessly integrate with deployment and server toolkit (LMDeploy), and large-scale evaluation toolkit (OpenCompass, VLMEvalKit).

 

 mindformers

https://mindformers.readthedocs.io/zh-cn/latest/Introduction.html

The goal of the MindSpore Transformers suite is to provide a full-workflow development toolkit for large-model training, fine-tuning, evaluation, inference, and deployment, offering mainstream Transformer pre-trained models and SOTA downstream-task applications, with rich parallelism features. It aims to help users carry out large-model training and innovative research with ease.

Built on MindSpore's built-in parallelism technology and componentized design, the MindSpore Transformers suite has the following characteristics:

  • Seamless switching from single-GPU to large-scale cluster training with one line of code;
  • Flexible and easy-to-use personalized parallelism configuration;
  • Automatic topology awareness to efficiently combine data-parallel and model-parallel strategies;
  • One-click launch of single-/multi-GPU training, fine-tuning, evaluation, and inference workflows for any task;
  • Component-level configuration of any module, such as optimizers, learning-rate schedules, and network assembly;
  • High-level, easy-to-use interfaces such as Trainer, pipeline, and AutoClass;
  • Automatic download and loading of preset SOTA weights;
  • Seamless migration and deployment to AI computing centers.

 

Qwen (Alibaba Cloud) Tutorial: Introduction and Fine-Tuning

https://www.datacamp.com/tutorial/qwen-alibaba-cloud

Qwen is a family of large language and multimodal models developed by Alibaba Cloud, designed for various tasks like text generation, image understanding, and conversation.

 

Fine-tuning Qwen Models

Fine-tuning Qwen models allows you to adapt them to specific tasks, potentially improving their performance for your particular use case. This process involves training the pre-trained model on a custom dataset, allowing it to learn task-specific knowledge while retaining its general language understanding capabilities.

In this section, we'll walk through the process of fine-tuning the Qwen-7B model on a custom dataset, using efficient fine-tuning techniques to keep the process manageable even for large models. In our example, we fine-tune the model to improve its performance on translation tasks and factual question answering while retaining its general language understanding capabilities.
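The tutorial's exact code is not reproduced here, but the custom dataset it describes (translation pairs plus factual Q&A) can be sketched as simple instruction/response records prepared before tokenization; the field names below are a common convention, not a Qwen requirement.

# Sketch of a small instruction-style dataset mixing translation and factual QA.
import json

samples = [
    {"instruction": "Translate to French: The weather is nice today.",
     "response": "Il fait beau aujourd'hui."},
    {"instruction": "What is the capital of Australia?",
     "response": "Canberra."},
]

with open("custom_dataset.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)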

Converting COCO to the Qwen-VL open-source dataset format

https://mindformers.readthedocs.io/zh-cn/latest/research/qwenvl/qwenvl.html#id3

Dataset preparation

Currently, the fine-tuning dataset format used for Qwen-VL in this repository is identical to the dataset format used by the open-source Qwen-VL, as in the following example:

[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "Picture 1: <img>assets/demo.jpeg</img>\n图中的狗是什么品种?"
      },
      {
        "from": "assistant",
        "value": "图中是一只拉布拉多犬。"
      },
      {
        "from": "user",
        "value": "框出图中的格子衬衫"
      },
      {
        "from": "assistant",
        "value": "<ref>格子衬衫</ref><box>(588,499),(725,789)</box>"
      }
    ]
  }
]

The Qwen-VL open-source release does not include the related datasets. Below is an example of converting public datasets into the above format for use in fine-tuning.

Dataset name | Applicable model | Stage | Download link
LLaVA-Instruct-150K detail_23k.json (conversation data) | Qwen-VL-9.6B | finetune | Link
COCO2014 Train (image data) | Qwen-VL-9.6B | finetune | Link

After downloading the datasets, run the data_convert.py script to preprocess them, converting the raw data into the conversation format above.

cd research/qwenvl
python data_convert.py --data_path /path/to/detail_23k.json --image_location /location/of/coco/train2014 --output_path /path/to/converted/json --user_role_name user --assistant_role_name assistant

Here --data_path is the path to the raw conversation data, --image_location is the directory containing the COCO train2014 folder (the path does not include train2014 itself), --output_path is where the converted conversation data is saved, --user_role_name is the user role name in the converted conversations, and --assistant_role_name is the assistant role name.
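For orientation, the conversion essentially reshapes each LLaVA-Instruct record into the conversation format shown above. The sketch below is an illustrative approximation, not the actual data_convert.py; the assumed LLaVA fields ("image", "from": "human"/"gpt", the "<image>" placeholder) and the COCO_train2014_ filename prefix may differ from the real script.

# Illustrative reshaping of LLaVA-Instruct records into the Qwen-VL conversation format.
import json, os

def convert(llava_path, image_dir, output_path):
    with open(llava_path, encoding="utf-8") as f:
        records = json.load(f)

    converted = []
    for i, rec in enumerate(records):
        img = os.path.join(image_dir, "COCO_train2014_" + rec["image"])  # assumed filename mapping
        conversations = []
        for turn in rec["conversations"]:
            role = "user" if turn["from"] == "human" else "assistant"
            value = turn["value"].replace("<image>\n", "").replace("\n<image>", "")
            if role == "user" and not conversations:        # attach the image to the first user turn
                value = f"Picture 1: <img>{img}</img>\n{value}"
            conversations.append({"from": role, "value": value})
        converted.append({"id": f"identity_{i}", "conversations": conversations})

    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(converted, f, ensure_ascii=False, indent=2)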

 

COCO dataset

https://cocodataset.org/#download

 

 

Qwen-VL's built-in fine-tuning script

https://github.com/QwenLM/Qwen-VL/blob/master/finetune.py

https://www.eula.club/blogs/Qwen-VL%E5%A4%9A%E6%A8%A1%E6%80%81%E5%A4%A7%E6%A8%A1%E5%9E%8B%E7%9A%84%E9%83%A8%E7%BD%B2%E4%B8%8E%E5%BE%AE%E8%B0%83.html#_5-2-%E5%AF%B9%E6%A8%A1%E5%9E%8B%E8%BF%9B%E8%A1%8Clora%E5%BE%AE%E8%B0%83

Preparing the fine-tuning dataset

All sample data needs to be placed in a single list and stored in a JSON file. Each sample is a dictionary containing an id and a conversations field, where the latter is a list.

data.json

[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "我是Qwen-VL,一个支持视觉输入的大模型。"
      }
    ]
  },
  {
    "id": "identity_1",
    "conversations": [
      {
        "from": "user",
        "value": "Picture 1: <img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>\n图中的狗是什么品种?"
      },
      {
        "from": "assistant",
        "value": "图中是一只拉布拉多犬。"
      },
      {
        "from": "user",
        "value": "框出图中的格子衬衫"
      },
      {
        "from": "assistant",
        "value": "<ref>格子衬衫</ref><box>(588,499),(725,789)</box>"
      }
    ]
  },
  { 
    "id": "identity_2",
    "conversations": [
      {
        "from": "user",
        "value": "Picture 1: <img>assets/mm_tutorial/Chongqing.jpeg</img>\nPicture 2: <img>assets/mm_tutorial/Beijing.jpeg</img>\n图中都是哪"
      },
      {
        "from": "assistant",
        "value": "第一张图片是重庆的城市天际线,第二张图片是北京的天际线。"
      }
    ]
  }
]
 

Explanation of the data format:

  • To handle diverse VL tasks, the following special tokens are added: <img> </img> <ref> </ref> <box> </box>
  • Content with image input can be expressed as Picture id: <img>img_path</img>\n{your prompt}, where id indicates which image in the conversation it is. "img_path" can be a local image or a URL.
  • Bounding boxes in the conversation can be expressed as <box>(x1,y1),(x2,y2)</box>, where (x1, y1) and (x2, y2) correspond to the top-left and bottom-right coordinates, normalized to the range [0, 1000). The text caption associated with a bounding box can be expressed with <ref>text_caption</ref>.
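As a quick worked example, converting pixel-space box coordinates into the [0, 1000) range used by <box> is a simple linear rescale by image width and height (assuming plain truncation; the official tooling's rounding may differ).

# Rescale pixel box coordinates into <box>(x1,y1),(x2,y2)</box> notation.
def normalize_box(x1, y1, x2, y2, img_w, img_h):
    nx1, ny1 = int(x1 / img_w * 1000), int(y1 / img_h * 1000)
    nx2, ny2 = int(x2 / img_w * 1000), int(y2 / img_h * 1000)
    return f"<box>({nx1},{ny1}),({nx2},{ny2})</box>"

# e.g. a 640x480 image with a box from pixel (320, 120) to (480, 360):
print(normalize_box(320, 120, 480, 360, 640, 480))   # <box>(500,250),(750,750)</box>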

# 5.2 LoRA fine-tuning of the model

Here we run a LoRA fine-tuning test with the fine-tuning script provided in the official project, using the full-precision model downloaded from Hugging Face and the example data above.

finetune_lora_single_gpu.sh

#!/bin/bash

export CUDA_DEVICE_MAX_CONNECTIONS=1
DIR=`pwd`

MODEL="/root/autodl-tmp/Qwen-VL-Chat"
DATA="/root/autodl-tmp/data.json"

export CUDA_VISIBLE_DEVICES=0

python3 finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
    --bf16 True \
    --fix_vit True \
    --output_dir output_qwen \
    --num_train_epochs 5 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 10 \
    --learning_rate 1e-5 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "none" \
    --model_max_length 600 \
    --lazy_preprocess True \
    --gradient_checkpointing \
    --use_lora
 

Notes:

  • You need to change the MODEL and DATA variables in the script to the actual model and data paths.
  • You also need to adjust the model_max_length parameter in the script. The default is 2048, which requires 27.3 GB of GPU memory; since the rented server did not have enough, it was set to 600 here, and fine-tuning still succeeds.

LoRA fine-tuning of the Qwen-VL-Chat model


 

Tongyi Qianwen (Qwen-VL) local fine-tuning

https://blog.csdn.net/Guet142021/article/details/136623750?utm_medium=distribute.pc_relevant.none-task-blog-2~default~baidujs_baidulandingword~default-0-136623750-blog-136251662.235^v43^pc_blog_bottom_relevance_base7&spm=1001.2101.3001.4242.1&utm_relevant_index=1

 


https://blog.csdn.net/python1234_/article/details/139773552

 

https://www.modelscope.cn/models/qwen/Qwen-VL-Chat/

 

https://github.com/QwenLM/Qwen-VL

 

https://xtuner.readthedocs.io/zh-cn/latest/training/multi_modal_dataset.html

 

QWEN VL info

https://www.infoq.cn/article/94cdVSu567CBjOEaI9mV

 

CogVLM zhipu

https://www.jiqizhixin.com/articles/2023-10-12

 
