【Coursera GenAI with LLMs】 Week 2: Fine-tuning LLMs with Instruction (Class Notes)

GenAI Project Lifecycle: After picking pre-trained models, we can fine-tune!

In-context learning (ICL): zero / one / few-shot inference. Include a few worked examples in the prompt for the model to learn from and generate a better completion (aka output); a prompt-building sketch follows this list. Its drawbacks are:

  • for smaller models, it doesn't work well even when a lot of examples are included
  • the examples take up room in the context window
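For concreteness, here is a minimal Python sketch of how zero-/few-shot prompts are assembled; the sentiment-classification task, example reviews, and labels are all made up for illustration:

```python
# A minimal sketch of zero-/few-shot prompt construction.
# The task, example reviews, and labels are hypothetical.
examples = [
    ("I loved this movie, the acting was superb.", "positive"),
    ("The plot was dull and the pacing glacial.", "negative"),
]

def build_prompt(query: str, shots: int = 0) -> str:
    """Prepend `shots` solved examples so the model can infer the task."""
    parts = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples[:shots]]
    parts.append(f"Review: {query}\nSentiment:")  # the model completes the label
    return "\n".join(parts)

print(build_prompt("A forgettable, by-the-numbers sequel."))           # zero shot
print(build_prompt("A forgettable, by-the-numbers sequel.", shots=2))  # few shot
```

Note how every extra shot consumes context-window tokens, which is exactly the second drawback above.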

Pre-training: you train the LLM using vast amounts of unstructured textual data via self-supervised learning

Fine-tuning: supervised learning process where you use a data set of labeled examples to update the weights of the LLM.

Two types of fine-tuning

  1. Instruction fine-tuning (full fine-tuning: very costly!)
    It trains the model using examples that demonstrate how it should respond to a specific instruction.
    Prepare the instruction dataset --> split the dataset into training, validation, and test sets --> calculate the loss between the model's completion and the provided label --> use the loss to update the model weights in standard backpropagation (see the first sketch after this list)
  2. PEFT (Parameter-Efficient Fine-Tuning: cheaper!)
    PEFT is a set of techniques that preserves the weights of the original LLM and trains only a small number of task-specific adapter layers and parameters.
    ex. LoRA (see the second sketch below)
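To make item 1 concrete, the sketch below runs a single full fine-tuning step with Hugging Face Transformers. This is only a sketch under assumptions: `gpt2` stands in for any causal LLM, and the instruction/summary strings are placeholders. The prompt tokens are masked with `-100` so the cross-entropy loss is computed only against the labeled completion:

```python
# One full fine-tuning step on a single instruction example (illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Summarize the following text:\n<text>\n\nSummary:"  # instruction
completion = " <human-written reference summary>"             # the label

# Tokenize prompt + completion together; mask the prompt positions with -100
# so the loss is calculated only between the completion and the provided label.
prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
input_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
labels = input_ids.clone()
labels[:, :prompt_len] = -100

loss = model(input_ids=input_ids, labels=labels).loss  # cross-entropy loss
loss.backward()    # standard backpropagation...
optimizer.step()   # ...updates *all* weights -- this is why it's costly
optimizer.zero_grad()
```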
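And for item 2, a from-scratch sketch of the core LoRA idea — note this shows the low-rank math only, not the `peft` library API, and the rank `r = 8` and scaling `alpha = 16` are illustrative defaults:

```python
# LoRA sketch: freeze the original weight W and learn a low-rank update B @ A,
# so only r * (d_in + d_out) parameters are trained instead of d_in * d_out.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base.requires_grad_(False)  # original LLM weights stay frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # small random init
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zeros: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path plus the scaled low-rank correction x @ (B @ A)^T
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288 trainable parameters vs. ~590k frozen in the base layer
```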

Catastrophic forgetting: full fine-tuning process modifies the weights of the original LLM, which can degrade performance on other tasks
--> To solve catastrophic forgetting, we can use PEFT!

Multi-task instruction fine-tuning: fine-tunes the model on instructions spanning many tasks at once, but it requires a lot of data and examples

FLAN (Fine-tuned LAnguage Net): a specific set of instructions used to fine-tune different models (e.g., FLAN-T5). Like the yummy dessert: the course calls FLAN the metaphorical dessert to the main course of pre-training.

Terms

  1. Unigram: a single word
  2. Bigram: two words
  3. n-gram: n words
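A tiny helper makes these definitions concrete (the example sentence is made up):

```python
# Extract all consecutive n-word sequences (n-grams) from a token list.
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

words = "the cat sat on the mat".split()
print(ngrams(words, 1))  # unigrams: ('the',), ('cat',), ('sat',), ...
print(ngrams(words, 2))  # bigrams: ('the', 'cat'), ('cat', 'sat'), ...
```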

Model Evaluation Metrics

  1. **Accuracy** = Correct Predictions / Total Predictions
  2. ROUGE (Recall-Oriented Understudy for Gisting Evaluation): assesses the quality of automatically generated **summaries** by comparing them to human-generated reference summaries (see the sketch after this list).
  3. BLEU (BiLingual Evaluation Understudy): an algorithm designed to evaluate the quality of machine-**translated** text by comparing it to human-generated translations.
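Tying the n-gram terms to the metrics, here is a hand-rolled sketch of ROUGE-1 (unigram overlap between a generated summary and a reference). A real evaluation would use a maintained implementation such as the `rouge-score` package, and the example sentences below are invented:

```python
# ROUGE-1 sketch: clipped unigram overlap between candidate and reference.
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())       # matched unigrams, clipped
    recall = overlap / sum(ref.values())       # matches / words in reference
    precision = overlap / sum(cand.values())   # matches / words in candidate
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}

print(rouge_1("the cat sat on the mat", "the cat lay on the mat"))
# five of six unigrams match, so recall = precision = 5/6 here
```

ROUGE-2 does the same with bigrams; BLEU combines clipped n-gram precisions over several n with a brevity penalty.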



Benchmarks:
tests that evaluate the capabilities of models. ex. GLUE, SuperGLUE, MMLU (Massive Multitask Language Understanding), BIG-bench Hard, HELM (Holistic Evaluation of Language Models)
