First Experience with ollama

References

https://github.com/ollama/ollama
https://zhuanlan.zhihu.com/p/689555159
https://zhuanlan.zhihu.com/p/687099148
https://zhuanlan.zhihu.com/p/685166253
https://babyno.top/posts/2024/03/run-a-large-language-model-locally-2/ (includes a RAG example)
https://sspai.com/post/85193#!
https://github.com/sugarforever/chat-ollama (a web UI with RAG support)

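Setting environment variables on Windows

OLLAMA_HOST: set to 0.0.0.0 (the address the server binds to; 0.0.0.0 means all network interfaces)
OLLAMA_MODELS: set to D:\my_workspace\OLLAMA_MODELS\ (the directory where downloaded models are stored)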
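
A client script can read the same variable to locate the server. The following is a minimal Python sketch, not ollama's own logic: the helper name ollama_base_url is hypothetical, and it assumes OLLAMA_HOST holds either a bare host or host:port without a scheme. When the variable is unset it falls back to 127.0.0.1 and port 11434.

```python
import os

DEFAULT_PORT = 11434  # port used by `ollama serve` in this post


def ollama_base_url(env=None):
    """Build an http:// base URL from OLLAMA_HOST (hypothetical helper)."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or "127.0.0.1"
    # Assumption: the value is "host" or "host:port", without a scheme.
    if ":" in host:
        name, port = host.rsplit(":", 1)
        return f"http://{name}:{int(port)}"
    return f"http://{host}:{DEFAULT_PORT}"
```

Calling ollama_base_url() with no argument reads the real environment; passing a dict makes it easy to try out different values.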

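Download and install the Windows version of ollama

Installing ollama provides a command-line tool, ollama.exe, which can download models and start a local REST service.
ollama detects the GPU automatically: it checks for CUDA first, then AMD ROCm, and falls back to CPU inference if no suitable GPU is found.

ollama pull qwen:0.5b   # file: 395MB, a small Qwen model
ollama serve            # start the local ollama service on port 11434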

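Downloading models

Even within one model family there are variants with different parameter counts and quantization schemes, so pick one that fits your hardware. As a rule of thumb, a 16-bit floating point (FP16) model needs about 2 bytes per parameter for inference, so the memory in GB is roughly twice the parameter count in billions (B). A 4-bit quantized model needs about half a byte per parameter, so the memory in GB is roughly half the parameter count in B.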
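
The rule of thumb above is simple arithmetic, sketched here in Python. The function name estimate_memory_gb is made up for illustration, and the figure covers the weights only; the KV cache and runtime overhead come on top, so treat it as a lower bound.

```python
def estimate_memory_gb(params_billion, bits_per_weight):
    """Rough weight memory in GB: parameters (in billions) x bytes per weight."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight


# Parameter counts are taken from the model tags used in this post.
for name, size_b in [("qwen:0.5b", 0.5), ("qwen:1.8b", 1.8), ("qwen:7b", 7.0)]:
    fp16 = estimate_memory_gb(size_b, 16)
    q4 = estimate_memory_gb(size_b, 4)
    print(f"{name}: FP16 ~{fp16:.1f} GB, 4-bit ~{q4:.1f} GB")
```

For a 7B model this gives about 14 GB at FP16 and about 3.5 GB at 4 bits.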

ollama pull phi3               # Microsoft's phi3 model; small, yet it outperforms many larger models; trained on textbook-quality data
ollama pull qwen:0.5b          # file: 395MB, a small Qwen model
ollama pull tinyllama          # file: 637MB, a well-known mini llama model
ollama pull qwen:1.8b          # file: about 1.1GB
ollama pull nomic-embed-text   # file: 275MB, an embedding model
ollama pull qwen:7b            # file: about 4.5GB
ollama pull mistral            # the mistral model
ollama pull llama2             # the llama2 model
ollama pull llama2-chinese     # llama2 fine-tuned for Chinese
ollama pull ollam/unichat-llama3-chinese-8b   # Chinese llama3, https://ollama.com/ollam/unichat-llama3-chinese-8b

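ollama API examples

The requests below use the syntax of the VS Code REST Client extension; when several requests share one .http file, separate them with a line containing ###. For reasons I could not determine, REST Client could not reach the server through localhost or 127.0.0.1. With Postman you may need to use localhost instead.

Reachable: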
GET http://0.0.0.0:11434/ HTTP/1.1

Not reachable:
GET http://localhost:11434/ HTTP/1.1
GET http://127.0.0.1:11434/ HTTP/1.1
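
To check which of these addresses works on a given machine without going through REST Client, a small Python probe can help. This is only a sketch: the function names are made up, and it treats any connection error or timeout as "not reachable".

```python
import urllib.error
import urllib.request

PORT = 11434  # default port of `ollama serve`


def candidate_urls(port=PORT):
    """Root URLs for the three host spellings compared above."""
    return [f"http://{host}:{port}/" for host in ("0.0.0.0", "localhost", "127.0.0.1")]


def is_reachable(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def probe_all(check=is_reachable):
    """Map each candidate URL to True/False; `check` is injectable for testing."""
    return {url: check(url) for url in candidate_urls()}
```

With ollama serve running, print(probe_all()) shows which spellings respond from Python on that machine.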


POST http://0.0.0.0:11434/api/embeddings HTTP/1.1
content-type: application/json

{
 "model": "qwen:0.5b",
 "prompt": "Here is an article about llamas..."
}

POST http://0.0.0.0:11434/api/embeddings HTTP/1.1
content-type: application/json

{
 "model": "nomic-embed-text",
 "prompt": "Here is an article about llamas..."
}
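
Embedding vectors are usually compared with cosine similarity, for example to rank text chunks in a RAG setup. Below is a minimal Python sketch of that idea using only the standard library. The function names are illustrative, and the code assumes the reply from /api/embeddings is a JSON object with an "embedding" array.

```python
import json
import math
import urllib.request

BASE_URL = "http://0.0.0.0:11434"  # same address as the requests above


def embed(text, model="nomic-embed-text", base_url=BASE_URL):
    """POST to /api/embeddings and return the vector as a list of floats."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Assumption: the reply is a JSON object with an "embedding" array.
        return json.load(resp)["embedding"]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With the server running, cosine_similarity(embed("llamas"), embed("alpacas")) returns a score between -1 and 1, where higher means the two texts are closer.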

POST http://0.0.0.0:11434/api/show HTTP/1.1
content-type: application/json

{
 "name": "qwen:0.5b"
}


POST http://0.0.0.0:11434/api/generate HTTP/1.1
content-type: application/json

{
  "model": "qwen:0.5b",
  "prompt": "Here is an article about llamas...",
  "context": [
  ],
  "stream": false,
  "format":"json",
  "options": {
    "seed": 123,
    "temperature": 0
  }  
}
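
The same request can be sent from a script. The following Python sketch mirrors the body above, minus the empty context field. The helper names are made up, and it assumes the non-streaming reply carries the generated text in a "response" field, which should itself be a JSON string because format is set to "json".

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:11434"  # same address as the requests above


def build_generate_payload(prompt, model="qwen:0.5b", seed=123):
    """Mirror the request body above: no streaming, JSON output, fixed seed."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "format": "json",
        "options": {"seed": seed, "temperature": 0},
    }


def extract_json_answer(reply):
    """Parse the generated text, which should be JSON when format is "json"."""
    # Assumption: the non-streaming reply carries the text in "response".
    return json.loads(reply["response"])


def generate(prompt, base_url=BASE_URL):
    """Send the request and return the model's answer as a Python object."""
    data = json.dumps(build_generate_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_json_answer(json.load(resp))
```

A fixed seed together with temperature 0 is meant to make repeated runs return the same text, which helps when comparing prompts.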

POST http://0.0.0.0:11434/api/chat HTTP/1.1
content-type: application/json

{
  "model": "qwen:0.5b",  
  "stream": false,
  "format":"json",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]  
}


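## Send a chat message with a conversation history, and add a system role message to set the system prompt.
## The system prompt says "answer briefly in a pirate's voice, reply in Chinese"; the final user message asks for an explanation of the refraction of light.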
POST http://0.0.0.0:11434/api/chat HTTP/1.1
content-type: application/json

{
  "model": "qwen:1.8b",  
  "stream": false,
  "format":"json",
  "messages": [
    {
      "role": "system",
      "content": "以海盗的口吻简单作答, 以中文回复"
    },    
    {
      "role": "user",
      "content": "why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "due to rayleigh scattering."
    },
    {
      "role": "user",
      "content": "请解释一下光的折射?"
    }
  ]  
}
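
The request above resends the whole history on every call, so a client has to keep that list itself. Here is a minimal Python sketch of such bookkeeping. The Conversation class and post_chat helper are hypothetical names, the "format" field is left out so replies come back as plain text, and the code assumes the non-streaming reply holds the answer under "message".

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:11434"  # same address as the requests above


def post_chat(messages, model="qwen:1.8b", base_url=BASE_URL):
    """POST the full message list to /api/chat and return the decoded reply."""
    body = {"model": model, "stream": False, "messages": messages}
    req = urllib.request.Request(
        base_url + "/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


class Conversation:
    """Keeps the history so every request carries the earlier turns."""

    def __init__(self, system_prompt, send=post_chat):
        self.messages = [{"role": "system", "content": system_prompt}]
        self.send = send  # injectable so the class can be tried without a server

    def ask(self, question):
        self.messages.append({"role": "user", "content": question})
        reply = self.send(self.messages)
        # Assumption: the reply holds the answer under "message".
        answer = reply["message"]["content"]
        self.messages.append({"role": "assistant", "content": answer})
        return answer
```

Usage would look like chat = Conversation("Answer briefly in a pirate's voice") followed by chat.ask("why is the sky blue?"); each later ask() call then includes the earlier questions and answers.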

posted @ 2024-04-04 20:31 harrychinese
