Large Models - Post Category - bregman

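Evaluation datasets and large-model reports

Summary: Reference: https://www.minimax.io/news/minimax-m25 Below is a comparison of four AI evaluation benchmarks. Columns: benchmark, domain, data source, sample size, task format, core metric, anti-contamination strategy, significance. SWE-Bench Verified: software engineering; real GitHub Issues + PRs (Djan ... Read more

posted @ 2026-02-13 14:38 bregman Views (8) Comments (0) Recommendations (0)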

Introduction to using litellm

Summary: Purpose: act as a proxy for Claude Code so that it can use other models. litellm_config.yaml model_list: - model_name: kimik2.5 litellm_params: model: anthropic/aisearch_s3_dsv3_1basexxx api_key: ... Read more

posted @ 2026-01-28 20:43 bregman Views (19) Comments (0) Recommendations (0)
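
The litellm summary above describes putting a proxy in front of Claude Code. A minimal sketch of the client side, under stated assumptions: the proxy listens on http://localhost:4000, the token value is a placeholder, and the helper name `claude_code_env` is hypothetical. Only the model alias `kimik2.5` comes from the post; `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` are the environment variables Claude Code is assumed to read here.

```python
import os


def claude_code_env(proxy_url="http://localhost:4000",
                    token="sk-placeholder",
                    model="kimik2.5"):
    """Return a copy of the current environment that points Claude Code at a proxy.

    proxy_url and token are assumptions for illustration, not values from the post.
    model should match a model_name entry in litellm_config.yaml.
    """
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = proxy_url   # where the proxy is assumed to listen
    env["ANTHROPIC_AUTH_TOKEN"] = token     # placeholder credential for the proxy
    env["ANTHROPIC_MODEL"] = model          # alias defined under model_list
    return env


if __name__ == "__main__":
    env = claude_code_env()
    # Show only the variables this sketch sets.
    for key in ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL"):
        print(f"{key}={env[key]}")
```

The returned dictionary could be passed as `env=` to `subprocess.run` when launching the client, so the settings stay scoped to that one process instead of the whole shell.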

Installing flash-attention

Summary: fa2 # flash-attention export CMAKE_CXX_STANDARD=17 export CMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=1" export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8. ... Read more

posted @ 2025-08-20 05:48 bregman Views (60) Comments (0) Recommendations (0)

Large-model course materials: CSE 234: Data Systems for Machine Learning

Summary: https://hao-ai-lab.github.io/cse234-w25/ Read more

posted @ 2025-06-30 14:28 bregman Views (51) Comments (0) Recommendations (0)

Large-model training: ultrascale-playbook

Summary: Large-model training: https://huggingface.co/spaces/nanotron/ultrascale-playbook?section=high_level_overview Large-model inference acceleration: https://www.53ai.com/news/finetuning/202407110928 ... Read more

posted @ 2025-04-27 16:54 bregman Views (63) Comments (0) Recommendations (0)

Downloading Hugging Face data through a proxychains proxy

Summary: Using a SOCKS proxy. Install: brew install proxychains-ng Configure: $ tail -n 3 /opt/homebrew/etc/proxychains.conf #socks4 127.0.0.1 9050 socks5 127.0.0.1 <port> Usage: # pkill p ... Read more

posted @ 2025-02-14 15:46 bregman Views (363) Comments (1) Recommendations (0)
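
The proxychains summary above has two parts: a `socks5` line in the config file and a command run through the proxy. A rough sketch of both, assuming the `proxychains4` binary installed by proxychains-ng and the `huggingface-cli download` command; the helper names, the port 1080, and the repository id are hypothetical and not taken from the post.

```python
import shlex


def proxychains_conf_line(port, host="127.0.0.1"):
    """Build the socks5 entry that goes at the end of proxychains.conf."""
    return f"socks5 {host} {port}"


def proxied_download_command(repo_id,
                             repo_type="dataset",
                             conf_path="/opt/homebrew/etc/proxychains.conf"):
    """Build an argv list that runs a Hugging Face download through proxychains4.

    conf_path is the Homebrew location shown in the post; repo_id is supplied
    by the caller.
    """
    return [
        "proxychains4", "-f", conf_path,      # -f selects the config file
        "huggingface-cli", "download", repo_id,
        "--repo-type", repo_type,             # "dataset" or "model"
    ]


if __name__ == "__main__":
    # Port 1080 and the repo id are placeholders for illustration only.
    print(proxychains_conf_line(1080))
    print(shlex.join(proxied_download_command("some-org/some-dataset")))
```

Building the command as a list rather than a single string means it can be handed to `subprocess.run` without shell quoting concerns.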

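Using large models

Summary: https://github.com/deepseek-ai/awesome-deepseek-integration/blob/main/docs/zotero/README_cn.md At first, calling the API with curl got no response; it worked after adding -k, and then it also worked with -k removed. curl https://api ... Read more

posted @ 2025-02-06 11:03 bregman Views (147) Comments (0) Recommendations (0)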

Running a local large model

Summary: https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF/summary Download llama-cli: https://github.com/ggerganov/llama.cpp/releases Use model ... Read more

posted @ 2025-02-05 21:21 bregman Views (647) Comments (0) Recommendations (1)

She says she is an immortal, not a god