CS Course Learning

【李宏毅】2024 Large Language Model Course

Course Notes

Course link: https://speech.ee.ntu.edu.tw/~hylee/genai/2024-spring.php

Bilibili video link: https://www.bilibili.com/video/BV1XS411w7qr

GPT: Autoregressive model

In-context Learning

  1. Chain of Thought (CoT) (see the prompt sketch after this list)
  2. Tree of Thoughts (ToT)
  3. Algorithm of Thoughts (AoT)
  4. ....
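
As a concrete illustration of CoT prompting, here is a minimal sketch of a zero-shot cue and a few-shot prompt with a worked reasoning step; the questions and wording are made up for illustration, not taken from the course:

# Zero-shot CoT: append a "think step by step" cue to the question.
zero_shot_cot = (
    "Q: A shop sells pens at 3 dollars each. How much do 7 pens cost?\n"
    "A: Let's think step by step."
)

# Few-shot CoT: show one worked example with intermediate reasoning,
# then ask the new question in the same format.
few_shot_cot = (
    "Q: There are 4 boxes with 6 apples each. How many apples are there?\n"
    "A: Each box has 6 apples and there are 4 boxes, so 4 * 6 = 24. The answer is 24.\n"
    "Q: A shop sells pens at 3 dollars each. How much do 7 pens cost?\n"
    "A:"
)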

Using tools:

  1. Search engine: Retrieval-Augmented Generation (RAG); a toy retrieval sketch follows this list
  2. Program writing: Program of Thought (PoT)
  3. Text-to-image generation: DALL-E
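
A toy sketch of the RAG idea from item 1: retrieve the document most relevant to a query and prepend it to the prompt. Retrieval here is naive keyword overlap; real systems use a search engine or vector embeddings, and the documents and query below are invented for illustration:

documents = [
    "Hung-yi Lee teaches the 2024 generative AI course at NTU.",
    "BLEU measures n-gram overlap between a translation and its references.",
    "DRAM must be refreshed periodically to retain data.",
]

def retrieve(query, docs, k=1):
    # toy retrieval: rank documents by the number of shared lowercase words
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

query = "What does BLEU measure?"
context = "\n".join(retrieve(query, documents))

# Retrieval-augmented prompt: the retrieved text is handed to the model as context.
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)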

Explainable ML:

  1. Local Explanation
    • Saliency Map (a minimal PyTorch sketch follows this list)
    • SmoothGrad (improved Saliency Map)
    • Integrated Gradients (IG)
  2. Global Explanation
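
A minimal saliency-map sketch in PyTorch, assuming a classifier `model` that returns logits of shape (1, num_classes) and an input `image` of shape (1, C, H, W); both names are placeholders, not from the homework:

import torch

def saliency_map(model, image, target_class):
    """Gradient of the target-class score w.r.t. the input pixels."""
    model.eval()
    image = image.clone().requires_grad_(True)   # track gradients on the input
    score = model(image)[0, target_class]        # scalar score of the target class
    score.backward()
    # absolute gradient, max over channels -> one saliency value per pixel
    return image.grad.abs().max(dim=1)[0].squeeze(0)

# SmoothGrad variant: average the saliency over several noisy copies of the input
def smoothgrad(model, image, target_class, n=20, sigma=0.1):
    maps = [saliency_map(model, image + sigma * torch.randn_like(image), target_class)
            for _ in range(n)]
    return torch.stack(maps).mean(dim=0)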

Three steps of LLM training:

  1. Pre-train -> Foundation model
  2. Instruction Fine-tuning (Supervised Learning)
  3. Reinforcement Learning from Human Feedback (RLHF)

Seq2seq:

  • Syntactic Parsing (grammar analysis)

  • Multi-label Classification (as opposed to Multi-class Classification)

    An object can belong to multiple classes

  • Object Detection


Transformer:

  • Self-attention
  • Cross-attention

Copy Mechanism => Summarization

  • Pointer Network

Attention Decoder

  • Greedy Decoding (always pick the token with the highest output probability at each step)
  • Beam Search
  • Sampling (more creative; randomness is needed in the decoder when generating; see the sketch after this list)
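
A minimal sketch contrasting greedy decoding with temperature sampling for a single step, given made-up next-token logits:

import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])   # toy next-token logits over a 4-token vocabulary

# Greedy decoding: always take the most probable token.
greedy_token = torch.argmax(logits).item()

# Sampling with temperature: lower temperature -> closer to greedy, higher -> more random.
temperature = 0.8
probs = torch.softmax(logits / temperature, dim=-1)
sampled_token = torch.multinomial(probs, num_samples=1).item()

print(greedy_token, sampled_token)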

Prompt Hacking

  • Jailbreaking
  • Prompt Injection

Generative model:

  • Autoregressive (AR)
    generates tokens one by one, step by step; slower generation
  • Non-autoregressive (NAR)
    generates all tokens at once; faster generation

Speculative Decoding (a fast draft model proposes several tokens and the target model verifies them; see the toy sketch below)
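
A toy sketch of the speculative-decoding idea with invented draft and target distributions; the accept/reject rule follows the standard speculative-sampling scheme, and the extra token that real implementations draw from the target model when every draft is accepted is omitted for brevity:

import numpy as np

rng = np.random.default_rng(0)
VOCAB = 5

def draft_dist(context):
    # toy "small/fast" model: next-token distribution from the last token only
    logits = np.cos(np.arange(VOCAB) + (context[-1] if context else 0))
    p = np.exp(logits)
    return p / p.sum()

def target_dist(context):
    # toy "large/slow" model: a slightly different distribution
    logits = np.cos(0.9 * np.arange(VOCAB) + (context[-1] if context else 0) + 0.1)
    p = np.exp(logits)
    return p / p.sum()

def speculative_step(context, k=4):
    # 1) the draft model proposes k tokens autoregressively (cheap)
    ctx, drafted, draft_probs = list(context), [], []
    for _ in range(k):
        q = draft_dist(ctx)
        t = int(rng.choice(VOCAB, p=q))
        drafted.append(t)
        draft_probs.append(q)
        ctx.append(t)
    # 2) the target model verifies the drafted tokens
    #    (in a real system all k positions are scored in one batched forward pass)
    accepted = []
    for t, q in zip(drafted, draft_probs):
        p = target_dist(list(context) + accepted)
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(t)                       # accept the drafted token
        else:
            residual = np.maximum(p - q, 0)          # reject: resample from max(p - q, 0)
            residual = residual / residual.sum() if residual.sum() > 0 else p
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            break
    return accepted

print(speculative_step([1]))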

Homework Summary

seed

import random
import numpy as np
import torch

def set_random_seed(seed):
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    random.seed(seed)
    np.random.seed(seed)

transformers

pipeline

from transformers import pipeline
# 1. task
pipe = pipeline(task="automatic-speech-recognition") # ASR
output = pipe("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
print(output)
# 2. model
pipe = pipeline(model="FacebookAI/roberta-large-mnli")
pipe("This restaurant is awesome")
print(output)
# 3. multi-input
pipe = pipeline(model="FacebookAI/roberta-large-mnli")
output = pipe(["This restaurant is awesome", "It is ugly"])
print(output)
# 4. with gradio
import gradio as gr
pipe = pipeline(task="sentiment-analysis", model="FacebookAI/roberta-large-mnli")
gr.Interface.from_pipeline(pipe).launch()
"""
task: str = None
`image-classification`
`image-segmentation`
`object-detection`
`text-generation`
...
"""

AutoClass

# load model and tokenizer
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained(<model_path>)
model = AutoModelForCausalLM.from_pretrained(<model_path>)
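
A minimal follow-up sketch of how the loaded tokenizer and model are typically used for generation; the prompt and generation settings are illustrative:

# tokenize a prompt, generate a continuation, and decode it back to text
inputs = tokenizer("Once upon a time", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))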

openai

from openai import OpenAI

# use the DeepSeek API as an example
client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    stream=False,
)
print(response.choices[0].message.content)

【李宏毅】2023 Machine Learning Course Series

Course link: https://speech.ee.ntu.edu.tw/~hylee/ml/2023-spring.php

Course Notes

AI that can use tools:

  • WebGPT
  • Toolformer

Homework Summary

PyTorch

trainer

# trainer (assumes config, model, train_loader, valid_loader and device are already defined)
import torch
import torch.nn as nn

n_epochs = config['n_epochs']
criterion = nn.MSELoss(reduction='mean')  # define loss function
optimizer = torch.optim.SGD(model.parameters(), lr=config['learning_rate'], momentum=0.7)  # define optimizer
for epoch in range(n_epochs):
    # train
    model.train()
    loss_record = []
    for X, y in train_loader:
        optimizer.zero_grad()
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = criterion(pred, y)
        loss.backward()
        optimizer.step()
        loss_record.append(loss.detach().item())  # loss value of a batch
    mean_train_loss = sum(loss_record) / len(loss_record)

    # evaluate
    model.eval()
    loss_record = []
    with torch.no_grad():
        for X, y in valid_loader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            loss = criterion(pred, y)
            loss_record.append(loss.detach().item())  # loss value of a batch
    mean_eval_loss = sum(loss_record) / len(loss_record)

tensorboard

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # TensorBoard writer
writer.add_scalar('Loss/train', mean_train_loss, step)
"""
def add_scalar(
    tag: Any,                        # name of the chart
    scalar_value: Any,               # y-axis value
    global_step: Any | None = None,  # x-axis value (step)
    walltime: Any | None = None,
    new_style: bool = False,
    double_precision: bool = False
)
"""

BLEU

Bilingual Evaluation Understudy, an automatic metric for evaluating machine-translation quality. Its core idea is to compute the n-gram overlap between a candidate translation and the reference translations and combine it with adjustment factors (such as a brevity penalty) into a single score; a small hand-computed sketch follows the variable list below.

$$\text{BLEU} = \text{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$

  • p_n: the modified n-gram precision of order n.
  • w_n: the weight of each n-gram order, usually uniform (e.g., 1/4 when N = 4).
  • BP: the brevity penalty factor.
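
A small sketch that computes the formula above against a single reference, without smoothing; real evaluations usually rely on sacreBLEU or NLTK, and the example sentences are made up:

import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, N=4):
    # geometric mean of modified n-gram precisions p_n, with uniform weights w_n = 1/N
    log_p_sum = 0.0
    for n in range(1, N + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        p_n = overlap / total
        if p_n == 0:
            return 0.0          # a zero precision would make log undefined; real BLEU uses smoothing
        log_p_sum += (1.0 / N) * math.log(p_n)
    # brevity penalty BP: penalize candidates shorter than the reference
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_p_sum)

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(bleu(candidate, reference, N=2))   # N=2 because the toy sentences are very short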

【ETH】2020 Digital Design and Computer Architecture

Course link: https://safari.ethz.ch/digitaltechnik/spring2020/doku.php?id=start

Course video link: https://www.youtube.com/playlist?list=PL5Q2soXY2Zi_FRrloMa2fUYWPGiZUBQo2

Course Notes

DRAM: dynamic RAM (data must be refreshed periodically to be retained); contents are lost when power is removed.

SRAM: static RAM (no refresh circuitry needed); contents are also lost when power is removed.

Homework Summary

【UCB】2020 Structure and Interpretation of Computer Programs

Course link: https://web.archive.org/web/20210104105406/https://cs61a.org/

Course video link: https://www.bilibili.com/video/BV1s3411G7yM/

Course Notes

Homework Summary

【陈天奇】Machine Learning Compilation

Course Notes

Homework Summary
