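A brief look at some ollama parameters
ollama exposes a number of settings for tuning the ollama service, such as the listen address (default 127.0.0.1:11434) and model memory management.
The notes below briefly cover network access and model memory management.
Full configuration list
The full list can be read from the Go source; it is defined mainly in envconfig/config.go
- Default configuration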
func AsMap() map[string]EnvVar {
    return map[string]EnvVar{
        "OLLAMA_DEBUG": {"OLLAMA_DEBUG", Debug, "Show additional debug information (e.g. OLLAMA_DEBUG=1)"},
        "OLLAMA_FLASH_ATTENTION": {"OLLAMA_FLASH_ATTENTION", FlashAttention, "Enabled flash attention"},
        "OLLAMA_HOST": {"OLLAMA_HOST", "", "IP Address for the ollama server (default 127.0.0.1:11434)"},
        "OLLAMA_KEEP_ALIVE": {"OLLAMA_KEEP_ALIVE", KeepAlive, "The duration that models stay loaded in memory (default \"5m\")"},
        "OLLAMA_LLM_LIBRARY": {"OLLAMA_LLM_LIBRARY", LLMLibrary, "Set LLM library to bypass autodetection"},
        "OLLAMA_MAX_LOADED_MODELS": {"OLLAMA_MAX_LOADED_MODELS", MaxRunners, "Maximum number of loaded models (default 1)"},
        "OLLAMA_MAX_QUEUE": {"OLLAMA_MAX_QUEUE", MaxQueuedRequests, "Maximum number of queued requests"},
        "OLLAMA_MAX_VRAM": {"OLLAMA_MAX_VRAM", MaxVRAM, "Maximum VRAM"},
        "OLLAMA_MODELS": {"OLLAMA_MODELS", "", "The path to the models directory"},
        "OLLAMA_NOHISTORY": {"OLLAMA_NOHISTORY", NoHistory, "Do not preserve readline history"},
        "OLLAMA_NOPRUNE": {"OLLAMA_NOPRUNE", NoPrune, "Do not prune model blobs on startup"},
        "OLLAMA_NUM_PARALLEL": {"OLLAMA_NUM_PARALLEL", NumParallel, "Maximum number of parallel requests (default 1)"},
        "OLLAMA_ORIGINS": {"OLLAMA_ORIGINS", AllowOrigins, "A comma separated list of allowed origins"},
        "OLLAMA_RUNNERS_DIR": {"OLLAMA_RUNNERS_DIR", RunnersDir, "Location for runners"},
        "OLLAMA_TMPDIR": {"OLLAMA_TMPDIR", TmpDir, "Location for temporary files"},
    }
}
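To see which of these variables are actually set on a given machine, a small standalone program can look them up. The following is a minimal sketch and is not part of ollama: `setVars` and the `ollamaVars` slice are hypothetical names introduced here, and only a handful of the variable names from the map above are included.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// ollamaVars is a hand-picked subset of the names listed in AsMap above.
var ollamaVars = []string{
	"OLLAMA_HOST",
	"OLLAMA_KEEP_ALIVE",
	"OLLAMA_MAX_LOADED_MODELS",
	"OLLAMA_MAX_QUEUE",
	"OLLAMA_NUM_PARALLEL",
}

// setVars returns the variables from names that lookup reports as set.
// lookup has the same signature as os.LookupEnv so it can be swapped out.
func setVars(names []string, lookup func(string) (string, bool)) map[string]string {
	out := make(map[string]string)
	for _, name := range names {
		if value, ok := lookup(name); ok {
			out[name] = value
		}
	}
	return out
}

func main() {
	found := setVars(ollamaVars, os.LookupEnv)

	// Sort the keys so the output order is stable between runs.
	keys := make([]string, 0, len(found))
	for k := range found {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, found[k])
	}
}
```

Variables that are not set do not appear in the output; for those, the defaults described in the map apply.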
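Some configuration adjustments
By default the ollama API service listens only on the local machine, so other hosts cannot reach it. There are several ways around this, including changing the configuration directly or putting an nginx proxy in front of it.
- Changing the default listen address (systemd service unit)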
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
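The value above carries no port, so the default port from the description of OLLAMA_HOST (11434) still applies. The sketch below illustrates that kind of defaulting. It is an assumption-laden illustration, not ollama's own parsing code: `resolveHost` is a hypothetical helper, and details such as URL schemes are deliberately left out.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

const (
	defaultHost = "127.0.0.1" // default from the OLLAMA_HOST description
	defaultPort = "11434"
)

// resolveHost fills in whichever part of "host:port" is missing.
// Hypothetical helper for illustration only.
func resolveHost(value string) string {
	value = strings.TrimSpace(value)

	host, port, err := net.SplitHostPort(value)
	if err != nil {
		// No port present: treat the whole value as the host.
		// Brackets around a bare IPv6 literal are stripped because
		// net.JoinHostPort adds them back when needed.
		host, port = strings.Trim(value, "[]"), defaultPort
	}
	if host == "" {
		host = defaultHost
	}
	if port == "" {
		port = defaultPort
	}
	return net.JoinHostPort(host, port)
}

func main() {
	for _, v := range []string{"", "0.0.0.0", "0.0.0.0:8080"} {
		fmt.Printf("%q -> %s\n", v, resolveHost(v))
	}
}
```

With OLLAMA_HOST=0.0.0.0 the service is reachable from other machines on port 11434, so access should be restricted by a firewall or a proxy where that matters.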
- Model memory
Keeping a model loaded in memory speeds up inference. This can be set per request through the API:
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'
The same setting is also available server-wide through the OLLAMA_KEEP_ALIVE environment variable:
[Service]
Environment="OLLAMA_KEEP_ALIVE=-1"
- Queue configuration
The OLLAMA_MAX_QUEUE environment variable sets the maximum number of queued requests:
[Service]
Environment="OLLAMA_MAX_QUEUE=1000"
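The idea behind a queue limit can be sketched with a bounded queue that turns new work away once it is full. This is only an illustration of the concept, not ollama's scheduler: `boundedQueue`, `queueLimit`, and the fallback of 1000 (which simply mirrors the example value above) are all assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// queueLimit parses value as a positive integer and returns fallback
// when the value is empty, malformed, or not positive.
func queueLimit(value string, fallback int) int {
	n, err := strconv.Atoi(value)
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

// boundedQueue admits at most cap(slots) pending requests.
type boundedQueue struct {
	slots chan struct{}
}

func newBoundedQueue(max int) *boundedQueue {
	return &boundedQueue{slots: make(chan struct{}, max)}
}

// tryEnqueue reserves a slot and reports false when the queue is full.
func (q *boundedQueue) tryEnqueue() bool {
	select {
	case q.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees one slot; it does nothing if the queue is empty.
func (q *boundedQueue) release() {
	select {
	case <-q.slots:
	default:
	}
}

func main() {
	limit := queueLimit(os.Getenv("OLLAMA_MAX_QUEUE"), 1000)
	q := newBoundedQueue(limit)
	fmt.Println("queue limit:", limit, "first request accepted:", q.tryEnqueue())
}
```

A larger limit lets bursts wait instead of being turned away, at the cost of longer waits for requests at the back of the queue.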
Notes
Knowing these settings is useful: they make it easier to use resources well and to tune the service.
References
https://github.com/ollama/ollama/blob/main/docs/api.md
https://github.com/ollama/ollama/blob/main/docs/faq.md
https://github.com/ollama/ollama/blob/main/envconfig/config.go