【Coursera GenAI with LLM】 Week 3: LLM-Powered Applications Class Notes

Model optimizations to improve application performance

  1. Distillation: uses a larger model, the teacher, to train a smaller model, the student. The teacher's weights are frozen; both models generate completions for the same inputs, and the difference between the two output distributions is the distillation loss. The student minimizes this loss (typically combined with a standard cross-entropy loss against the ground-truth labels) by adjusting its weights, e.g., in its final prediction layer or hidden layers. You then use the smaller model for inference to lower your storage and compute budget. A loss sketch follows this list.

  2. Quantization: post-training quantization transforms a trained model's weights to a lower-precision representation, such as 16-bit floating point or 8-bit integer. This reduces the memory footprint of your model (int8 weights take a quarter of the space of float32). See the quantization sketch after this list.

  3. Pruning: removes redundant model parameters that contribute little to the model's performance, for example weights with values at or near zero. A magnitude-pruning sketch follows below.
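A minimal sketch of the distillation loss, assuming PyTorch; the temperature T, mixing weight alpha, and toy logits are illustrative choices, not the course's exact recipe:

```python
# Knowledge-distillation loss: KL divergence between temperature-softened
# teacher and student distributions, mixed with the ordinary student loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)      # teacher "soft labels"
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    distill = F.kl_div(soft_student, soft_targets, reduction="batchmean") * T**2
    hard = F.cross_entropy(student_logits, labels)            # standard student loss
    return alpha * distill + (1 - alpha) * hard

# Toy usage: batch of 4 examples, 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)           # teacher is frozen: no gradient
labels = torch.randint(0, 10, (4,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```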
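A minimal post-training quantization sketch in plain NumPy, showing symmetric int8 quantization of a single weight tensor (real frameworks add calibration data, per-channel scales, and quantized kernels):

```python
# Map float32 weights onto int8: store one scale factor plus 1-byte integers.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                  # largest weight maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes, "->", q.nbytes)                      # 1048576 -> 262144, 4x smaller
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```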
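And a minimal magnitude-pruning sketch, one common pruning criterion; real pipelines usually fine-tune after pruning to recover any lost accuracy:

```python
# Zero out the smallest-magnitude weights, keeping a binary mask of survivors.
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    k = int(w.size * sparsity)                       # how many weights to drop
    threshold = np.sort(np.abs(w), axis=None)[k]     # k-th smallest magnitude
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(256, 256).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.5)
print("fraction of weights kept:", mask.mean())      # ~0.5
```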

Cheat Sheet

RAG (Retrieval Augmented Generation): retrieve documents relevant to the user's query from an external data source and add them to the prompt, so the model can ground its answer in information that was never in its training data.
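A minimal RAG sketch; TF-IDF retrieval from scikit-learn stands in for an embedding model plus vector store, and the toy documents and the call_llm placeholder are assumptions for illustration:

```python
# Retrieve the most relevant documents, then stuff them into the prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The distillation loss compares teacher and student output distributions.",
    "Post-training quantization stores weights as int8 to cut memory 4x.",
    "ReAct interleaves Thought, Action, and Observation steps.",
]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=1):
    # Rank documents by cosine similarity to the query; return the top k.
    sims = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in sims.argsort()[::-1][:k]]

query = "How does quantization save memory?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)  # in a real app you would pass this to your LLM, e.g. call_llm(prompt)
```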

Chain of thought prompting: include worked examples with intermediate reasoning steps in the prompt, so the model reasons step by step before committing to a final answer.
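For example, a one-shot chain-of-thought prompt; the worked example is the classic one from the chain-of-thought paper (Wei et al., 2022):

```python
# The hand-written reasoning in the first Q/A pair nudges the model to reason
# out loud before answering the second question.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. \
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. \
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, \
how many apples do they have?
A:"""
```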

Program-Aided Language Model (PAL)

  • LLM + code interpreter: the model writes code (e.g., Python) for its calculation steps, and an external interpreter executes that code, working around LLMs' unreliable arithmetic. A sketch follows.
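A minimal sketch of the PAL execution step; the "generated" snippet is hard-coded here to stand in for a hypothetical model completion:

```python
# The LLM emits Python instead of computing the answer itself; the
# interpreter, not the LLM, does the arithmetic.
generated_code = """
apples = 23
apples = apples - 20 + 6
answer = apples
"""

namespace = {}
exec(generated_code, namespace)   # in production, sandbox untrusted model output
print(namespace["answer"])        # 9
```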

Orchestrator: manages the flow of information between the LLM, external applications, and external data sources, e.g., LangChain.

ReAct: a prompting framework that synergizes reasoning and acting in LLMs. Each step of the prompt interleaves three elements:

  • Thought: a reasoning step about the current situation
  • Action: an external task the model can trigger, chosen from an allowed set of actions: search, lookup, finish
  • Observation: the result of the action, fed back to the model as new information for the next thought
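A sketch of what one ReAct episode looks like in the prompt; the question and the tool result are invented for illustration:

```python
# The orchestrator parses each Action, runs the matching tool (here, a search),
# and appends the tool's output as the Observation before the next Thought.
react_trace = """Question: Which country is Mount Everest in?
Thought 1: I should search for Mount Everest to find where it is located.
Action 1: Search[Mount Everest]
Observation 1: Mount Everest is Earth's highest mountain, lying in the \
Mahalangur Himal sub-range of the Himalayas on the China-Nepal border.
Thought 2: The observation answers the question directly.
Action 2: Finish[It lies on the border of China and Nepal]"""
print(react_trace)
```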
