Chroma Usage Guide: Official Documentation and Translation
2024-03-24 19:41 l_v_y_forever
🧪 Usage Guide
- Python
- JavaScript
Initiating a persistent Chroma client
import chromadb
You can configure Chroma to save and load from your local machine. Data will be persisted automatically and loaded on start (if it exists).
client = chromadb.PersistentClient(path="/path/to/save/to")
The path is where Chroma will store its database files on disk, and load them on start.
// CJS
const { ChromaClient } = require("chromadb");
// ESM
import { ChromaClient } from 'chromadb'
To connect with the JS client, you must connect to a backend running Chroma. See Running Chroma in client/server mode for how to do this.
const client = new ChromaClient();
The client object has a few useful convenience methods.
client.heartbeat()  # Returns a nanosecond heartbeat. Useful for making sure the client remains connected.
client.reset()  # Empties and completely resets the database. ⚠️ This is destructive and not reversible.
await client.reset(); // Empties and completely resets the database. ⚠️ This is destructive and not reversible.
Running Chroma in client/server mode
Chroma can also be configured to run in client/server mode. In this mode, the Chroma client connects to a Chroma server running in a separate process.
To start the Chroma server, run the following command:
chroma run --path /db_path
Then use the Chroma HTTP client to connect to the server:
import chromadb
chroma_client = chromadb.HttpClient(host='localhost', port=8000)
That's it! Chroma's API will run in client-server mode with just this change.
Using the Python HTTP-only client
If you are running Chroma in client-server mode, you may not need the full Chroma library. Instead, you can use the lightweight client-only library. In this case, you can install the chromadb-client package. This package is a lightweight HTTP client for the server with a minimal dependency footprint.
pip install chromadb-client
import chromadb
# Example setup of the client to connect to your chroma server
client = chromadb.HttpClient(host='localhost', port=8000)
Note that the chromadb-client package is a subset of the full Chroma library and does not include all the dependencies. If you want to use the full Chroma library, you can install the chromadb package instead. Most importantly, there is no default embedding function. If you add() documents without embeddings, you must have manually specified an embedding function and installed the dependencies for it.
To run Chroma in client/server mode, first install the Chroma library and CLI via PyPI:
pip install chromadb
Then start the Chroma server:
chroma run --path /db_path
The JS client then talks to the chroma server backend.
// CJS
const { ChromaClient } = require("chromadb");
// ESM
import { ChromaClient } from 'chromadb'
const client = new ChromaClient();
You can also run the Chroma server in a Docker container, or deploy it to a cloud provider. See the deployment docs for more information.
Using collections
Chroma lets you manage collections of embeddings, using the collection primitive.
Creating, inspecting, and deleting Collections
Chroma uses collection names in the url, so there are a few restrictions on naming them:
- The length of the name must be between 3 and 63 characters.
- The name must start and end with a lowercase letter or a digit, and it can contain dots, dashes, and underscores in between.
- The name must not contain two consecutive dots.
- The name must not be a valid IP address.
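The naming rules above can be checked before calling create_collection. The following is a minimal sketch in plain Python, not Chroma's own validator: the function name is_valid_collection_name is hypothetical, and the sketch assumes that characters between the first and last may only be lowercase letters, digits, dots, dashes, or underscores, which is one reading of the second rule.

```python
import ipaddress
import re

# Assumption: only lowercase letters, digits, dots, dashes and underscores
# are accepted anywhere in the name. The first and last characters must be
# a lowercase letter or digit, and the total length is 3 to 63 characters.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_valid_collection_name(name: str) -> bool:
    """Check a name against the four naming rules listed above."""
    if not _NAME_PATTERN.fullmatch(name):
        return False  # wrong length, or bad first/last/inner character
    if ".." in name:
        return False  # two consecutive dots
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return True  # not an IP address, so the name is acceptable
    return False  # parses as an IP address


print(is_valid_collection_name("my_collection"))
print(is_valid_collection_name("192.168.0.1"))
```

Running the check on the client side only gives earlier feedback; the server remains the authority on which names it accepts.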
Chroma collections are created with a name and an optional embedding function. If you supply an embedding function, you must supply it every time you get the collection.
collection = client.create_collection(name="my_collection", embedding_function=emb_fn)
collection = client.get_collection(name="my_collection", embedding_function=emb_fn)
If you later wish to get_collection, you MUST do so with the embedding function you supplied while creating the collection.
The embedding function takes text as input, and performs tokenization and embedding. If no embedding function is supplied, Chroma will use sentence transformer as a default.
// CJS
const { ChromaClient } = require("chromadb");
// ESM
import { ChromaClient } from 'chromadb'
The JS client talks to a Chroma server backend. This can run on your local computer or be easily deployed to AWS.
let collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: emb_fn,
});
let collection2 = await client.getCollection({
  name: "my_collection",
  embeddingFunction: emb_fn,
});
If you later wish to getCollection, you MUST do so with the embedding function you supplied while creating the collection.
You can learn more about 🧬 embedding functions, and how to create your own.
Existing collections can be retrieved by name with .get_collection, and deleted with .delete_collection. You can also use .get_or_create_collection to get a collection if it exists, or create it if it doesn't.
collection = client.get_collection(name="test")  # Get a collection object from an existing collection, by name. Will raise an exception if it's not found.
collection = client.get_or_create_collection(name="test")  # Get a collection object from an existing collection, by name. If it doesn't exist, create it.
client.delete_collection(name="my_collection")  # Delete a collection and all associated embeddings, documents, and metadata. ⚠️ This is destructive and not reversible
Existing collections can be retrieved by name with .getCollection, and deleted with .deleteCollection.
const collection = await client.getCollection({ name: "test" }); // Get a collection object from an existing collection, by name. Will raise an exception if it's not found.
await client.deleteCollection({ name: "my_collection" }); // Delete a collection and all associated embeddings, documents, and metadata. ⚠️ This is destructive and not reversible
Collections have a few useful convenience methods.
collection.peek()  # returns a list of the first 10 items in the collection
collection.count()  # returns the number of items in the collection
collection.modify(name="new_name")  # Rename the collection
await collection.peek(); // returns a list of the first 10 items in the collection
await collection.count(); // returns the number of items in the collection
Changing the distance function
create_collection also takes an optional metadata argument which can be used to customize the distance method of the embedding space by setting the value of hnsw:space.
collection = client.create_collection(
    name="collection_name",
    metadata={"hnsw:space": "cosine"}  # l2 is the default
)
createCollection also takes an optional metadata argument which can be used to customize the distance method of the embedding space by setting the value of hnsw:space.
let collection = await client.createCollection({
  name: "collection_name",
  metadata: { "hnsw:space": "cosine" },
});
Valid options for hnsw:space are "l2", "ip", or "cosine". The default is "l2", which is the squared L2 norm.
Distance | parameter | Equation |
---|---|---|
Squared L2 | 'l2' | $d = \sum\left(A_i-B_i\right)^2$ |
Inner product | 'ip' | $d = 1.0 - \sum\left(A_i \times B_i\right)$ |
Cosine similarity | 'cosine' | $d = 1.0 - \frac{\sum\left(A_i \times B_i\right)}{\sqrt{\sum\left(A_i^2\right)} \cdot \sqrt{\sum\left(B_i^2\right)}}$ |
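To make the three formulas in the table concrete, here is a small sketch that evaluates them in plain Python. It only illustrates the equations as written above and is not how Chroma computes distances internally; the function names are made up for this example.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """'l2': sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """'ip': one minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """'cosine': one minus the cosine similarity.

    Raises ZeroDivisionError if either vector has zero length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(squared_l2([1.0, 2.0], [4.0, 6.0]))
print(inner_product_distance([1.0, 2.0], [3.0, 4.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Note that the inner product distance can be negative for vectors that are not normalized, while the cosine distance always falls between 0 and 2.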
Adding data to a Collection
Add data to Chroma with .add.
Raw documents:
collection.add(
    documents=["lorem ipsum...", "doc2", "doc3", ...],
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
    ids=["id1", "id2", "id3", ...]
)
await collection.add({
  ids: ["id1", "id2", "id3", ...],
  metadatas: [{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
  documents: ["lorem ipsum...", "doc2", "doc3", ...],
});
// input order
// ids - required
// embeddings - optional
// metadata - optional
// documents - optional
If Chroma is passed a list of documents, it will automatically tokenize and embed them with the collection's embedding function (the default will be used if none was supplied at collection creation). Chroma will also store the documents themselves. If the documents are too large to embed using the chosen embedding function, an exception will be raised.
Each document must have a unique associated id. Trying to .add the same ID twice will result in only the initial value being stored. An optional list of metadata dictionaries can be supplied for each document, to store additional information and enable filtering.
Alternatively, you can supply a list of document-associated embeddings directly, and Chroma will store the associated documents without embedding them itself.
collection.add(
    documents=["doc1", "doc2", "doc3", ...],
    embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
    ids=["id1", "id2", "id3", ...]
)
await collection.add({
  ids: ["id1", "id2", "id3", ...],
  embeddings: [[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
  metadatas: [{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
  documents: ["lorem ipsum...", "doc2", "doc3", ...],
});
If the supplied embeddings are not the same dimension as the collection, an exception will be raised.
You can also store documents elsewhere, and just supply a list of embeddings and metadata to Chroma. You can use the ids to associate the embeddings with your documents stored elsewhere.
collection.add(
    embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
    ids=["id1", "id2", "id3", ...]
)
await collection.add({
  ids: ["id1", "id2", "id3", ...],
  embeddings: [[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
  metadatas: [{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
});
Querying a Collection
Chroma collections can be queried in a variety of ways, using the .query method.
You can query by a set of query_embeddings.
collection.query(
    query_embeddings=[[11.1, 12.1, 13.1], [1.1, 2.3, 3.2], ...],
    n_results=10,
    where={"metadata_field": "is_equal_to_this"},
    where_document={"$contains": "search_string"}
)
const result = await collection.query({
  queryEmbeddings: [[11.1, 12.1, 13.1], [1.1, 2.3, 3.2], ...],
  nResults: 10,
  where: {"metadata_field": "is_equal_to_this"},
});
// input order
// query_embeddings - optional
// n_results - required
// where - optional
// query_texts - optional
The query will return the n_results closest matches to each query_embedding, in order. An optional where filter dictionary can be supplied to filter by the metadata associated with each document. Additionally, an optional where_document filter dictionary can be supplied to filter by contents of the document.
If the supplied query_embeddings are not the same dimension as the collection, an exception will be raised.
You can also query by a set of query_texts. Chroma will first embed each query_text with the collection's embedding function, and then perform the query with the generated embedding.
collection.query(
    query_texts=["doc10", "thus spake zarathustra", ...],
    n_results=10,
    where={"metadata_field": "is_equal_to_this"},
    where_document={"$contains": "search_string"}
)
You can also retrieve items from a collection by id using .get.
collection.get(
    ids=["id1", "id2", "id3", ...],
    where={"style": "style1"}
)
await collection.query({
  nResults: 10, // n_results
  where: {"metadata_field": "is_equal_to_this"}, // where
  queryTexts: ["doc10", "thus spake zarathustra", ...], // query_text
});
await collection.get({
  ids: ["id1", "id2", "id3", ...], // ids
  where: {"style": "style1"}, // where
});
.get also supports the where and where_document filters. If no ids are supplied, it will return all items in the collection that match the where and where_document filters.
Choosing which data is returned
When using get or query you can use the include parameter to specify which data you want returned: any of embeddings, documents, metadatas, and for query, distances. By default, Chroma will return the documents, metadatas and, in the case of query, the distances of the results. embeddings are excluded by default for performance and the ids are always returned. You can specify which of these you want returned by passing an array of included field names to the include parameter of the query or get method.
# Only get documents and ids
collection.get(
    include=["documents"]
)
collection.query(
    query_embeddings=[[11.1, 12.1, 13.1], [1.1, 2.3, 3.2], ...],
    include=["documents"]
)
Using Where filters
Chroma supports filtering queries by metadata and document contents. The where filter is used to filter by metadata, and the where_document filter is used to filter by document contents.
Filtering by metadata
In order to filter on metadata, you must supply a where filter dictionary to the query. The dictionary must have the following structure:
{
    "metadata_field": {
        <Operator>: <Value>
    }
}
Filtering metadata supports the following operators:
- $eq - equal to (string, int, float)
- $ne - not equal to (string, int, float)
- $gt - greater than (int, float)
- $gte - greater than or equal to (int, float)
- $lt - less than (int, float)
- $lte - less than or equal to (int, float)
Using the $eq operator is equivalent to passing the value directly in the where filter.
{
    "metadata_field": "search_string"
}
# is equivalent to
{
    "metadata_field": {
        "$eq": "search_string"
    }
}
Where filters only search embeddings where the key exists. If you search collection.get(where={"version": {"$ne": 1}}), items whose metadata does not have the key version will not be returned.
Filtering by document contents
In order to filter on document contents, you must supply a where_document filter dictionary to the query. We support two filtering keys: $contains and $not_contains. The dictionary must have the following structure:
# Filtering for a search_string
{
    "$contains": "search_string"
}
# Filtering for not contains
{
    "$not_contains": "search_string"
}
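As a rough illustration of what the two keys mean, the sketch below applies a where_document dictionary to a plain string. It is a toy model rather than Chroma's implementation: the function name matches_document is hypothetical, and case-sensitive substring matching is an assumption.

```python
def matches_document(document: str, where_document: dict) -> bool:
    """Toy evaluation of a where_document filter against one document."""
    for operator, needle in where_document.items():
        if operator == "$contains":
            if needle not in document:  # assumed: case-sensitive substring test
                return False
        elif operator == "$not_contains":
            if needle in document:
                return False
        else:
            raise ValueError(f"unsupported operator: {operator}")
    return True


docs = ["thus spake zarathustra", "lorem ipsum"]
print([d for d in docs if matches_document(d, {"$contains": "spake"})])
print([d for d in docs if matches_document(d, {"$not_contains": "spake"})])
```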
Using logical operators
You can also use the logical operators $and and $or to combine multiple filters.
An $and operator will return results that match all of the filters in the list.
{
    "$and": [
        {
            "metadata_field": {
                <Operator>: <Value>
            }
        },
        {
            "metadata_field": {
                <Operator>: <Value>
            }
        }
    ]
}
An $or operator will return results that match any of the filters in the list.
{
    "$or": [
        {
            "metadata_field": {
                <Operator>: <Value>
            }
        },
        {
            "metadata_field": {
                <Operator>: <Value>
            }
        }
    ]
}
Using inclusion operators ($in and $nin)
The following inclusion operators are supported:
- $in - a value is in predefined list (string, int, float, bool)
- $nin - a value is not in predefined list (string, int, float, bool)
An $in operator will return results where the metadata attribute is part of a provided list:
{
    "metadata_field": {
        "$in": ["value1", "value2", "value3"]
    }
}
An $nin operator will return results where the metadata attribute is not part of a provided list:
{
    "metadata_field": {
        "$nin": ["value1", "value2", "value3"]
    }
}
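The metadata operators described in this section can be tied together with a small evaluator. This is a sketch of the semantics as the text describes them, written in plain Python and not taken from Chroma's code: the function name matches is hypothetical, and treating several top-level keys as an implicit AND is an assumption of the sketch.

```python
from typing import Any, Callable, Dict

# One comparison function per operator described above.
_COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda actual, expected: actual == expected,
    "$ne": lambda actual, expected: actual != expected,
    "$gt": lambda actual, expected: actual > expected,
    "$gte": lambda actual, expected: actual >= expected,
    "$lt": lambda actual, expected: actual < expected,
    "$lte": lambda actual, expected: actual <= expected,
    "$in": lambda actual, expected: actual in expected,
    "$nin": lambda actual, expected: actual not in expected,
}


def matches(metadata: dict, where: dict) -> bool:
    """Toy evaluation of a where filter against one item's metadata."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # filters only consider items where the key exists
            if not isinstance(condition, dict):
                condition = {"$eq": condition}  # bare value is shorthand for $eq
            for operator, expected in condition.items():
                if not _COMPARISONS[operator](metadata[key], expected):
                    return False
    return True


item = {"chapter": "3", "verse": 16}
print(matches(item, {"chapter": "3"}))
print(matches(item, {"version": {"$ne": 1}}))
print(matches(item, {"$or": [{"chapter": "9"}, {"verse": {"$lte": 16}}]}))
```

The second call shows the key-existence rule: the item has no version key, so it is not matched even though the operator is $ne.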
For additional examples and a demo of how to use the inclusion operators, please see the provided notebook.
Updating data in a collection
Any property of items in a collection can be updated using .update.
collection.update(
    ids=["id1", "id2", "id3", ...],
    embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
    documents=["doc1", "doc2", "doc3", ...],
)
If an id is not found in the collection, an error will be logged and the update will be ignored. If documents are supplied without corresponding embeddings, the embeddings will be recomputed with the collection's embedding function.
If the supplied embeddings are not the same dimension as the collection, an exception will be raised.
Chroma also supports an upsert operation, which updates existing items, or adds them if they don't yet exist.
collection.upsert(
    ids=["id1", "id2", "id3", ...],
    embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
    documents=["doc1", "doc2", "doc3", ...],
)
await collection.upsert({
  ids: ["id1", "id2", "id3"],
  embeddings: [
    [1.1, 2.3, 3.2],
    [4.5, 6.9, 4.4],
    [1.1, 2.3, 3.2],
  ],
  metadatas: [
    { chapter: "3", verse: "16" },
    { chapter: "3", verse: "5" },
    { chapter: "29", verse: "11" },
  ],
  documents: ["doc1", "doc2", "doc3"],
});
If an id is not present in the collection, the corresponding items will be created as per add. Items with existing ids will be updated as per update.
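The difference between add, update, and upsert is easiest to see side by side. The class below is a toy in-memory model of the behavior described in this guide, storing only documents keyed by id. ToyCollection is a made-up name and is not part of chromadb; the real methods also handle embeddings and metadata.

```python
from typing import Dict, List


class ToyCollection:
    """In-memory stand-in that mimics the id handling described above."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}

    def add(self, ids: List[str], documents: List[str]) -> None:
        for item_id, document in zip(ids, documents):
            # Adding an existing id keeps the initial value.
            self.items.setdefault(item_id, document)

    def update(self, ids: List[str], documents: List[str]) -> None:
        for item_id, document in zip(ids, documents):
            if item_id in self.items:
                self.items[item_id] = document
            # Unknown ids are ignored (the guide says an error is logged).

    def upsert(self, ids: List[str], documents: List[str]) -> None:
        for item_id, document in zip(ids, documents):
            # Create the item if missing, otherwise overwrite it.
            self.items[item_id] = document


toy = ToyCollection()
toy.add(["id1"], ["first"])
toy.add(["id1"], ["second"])             # ignored: id1 already exists
toy.update(["id1", "id2"], ["changed", "new"])  # id2 is skipped
toy.upsert(["id1", "id2"], ["again", "new"])    # id2 is created
print(toy.items)
```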
Deleting data from a collection
Chroma supports deleting items from a collection by id using .delete. The embeddings, documents, and metadata associated with each item will be deleted.
⚠️ Naturally, this is a destructive operation, and cannot be undone.
collection.delete(
    ids=["id1", "id2", "id3", ...],
    where={"chapter": "20"}
)
await collection.delete({
  ids: ["id1", "id2", "id3", ...], // ids
  where: {"chapter": "20"}, // where
});
.delete also supports the where filter. If no ids are supplied, it will delete all items in the collection that match the where filter.
Authentication
You can configure Chroma to use authentication when in server/client mode only.
Supported authentication methods:
Authentication Method | Basic Auth (Pre-emptive) | Static API Token |
---|---|---|
Description | RFC 7617 Basic Auth with user:password base64-encoded Authorization header. | Static auth token in Authorization: Bearer <token> or in X-Chroma-Token: <token> headers. |
Status | Alpha | Alpha |
Server-Side Support | ✅ Alpha | ✅ Alpha |
Client/Python | ✅ Alpha | ✅ Alpha |
Client/JS | ✅ Alpha | ✅ Alpha |
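For readers unfamiliar with RFC 7617, the sketch below shows how a Basic Auth header value is built from a user name and password using only the Python standard library. The Chroma clients construct this header for you; the helper name basic_auth_header is hypothetical and appears here only to show what travels over the wire.

```python
import base64
from typing import Dict


def basic_auth_header(user: str, password: str) -> Dict[str, str]:
    """Build an RFC 7617 Authorization header from user:password."""
    raw = f"{user}:{password}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


print(basic_auth_header("admin", "admin"))
# {'Authorization': 'Basic YWRtaW46YWRtaW4='}
```

Base64 is an encoding, not encryption, so the credentials are only as private as the connection that carries them.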
Basic Authentication
Server Setup
Generate Server-Side Credentials
A good security practice is to store the password securely. In the example below we use bcrypt (currently the only supported hash in Chroma server side auth) to hash the plaintext password.
To generate the password hash, run the following command. Note that you will need to have htpasswd installed on your system.
htpasswd -Bbn admin admin > server.htpasswd
Running the Server
Set the following environment variables:
export CHROMA_SERVER_AUTH_CREDENTIALS_FILE="server.htpasswd"
export CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.providers.HtpasswdFileServerAuthCredentialsProvider"
export CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.basic.BasicAuthServerProvider"
And run the server as normal:
chroma run --path /db_path
Client Setup
import chromadb
from chromadb.config import Settings

client = chromadb.HttpClient(
    settings=Settings(
        chroma_client_auth_provider="chromadb.auth.basic.BasicAuthClientProvider",
        chroma_client_auth_credentials="admin:admin",
    )
)
client.heartbeat()  # this should work with or without authentication - it is a public endpoint
client.get_version()  # this should work with or without authentication - it is a public endpoint
client.list_collections()  # this is a protected endpoint and requires authentication
import { ChromaClient } from "chromadb";
const client = new ChromaClient({
  auth: { provider: "basic", credentials: "admin:admin" },
});
Static API Token Authentication
Tokens must be alphanumeric ASCII strings. Tokens are case-sensitive.
Server Setup
Current implementation of static API token auth supports only ENV based tokens.
Running the Server
Set the following environment variables to use Authorization: Bearer test-token as your authentication header.
export CHROMA_SERVER_AUTH_CREDENTIALS="test-token"
export CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.token.TokenConfigServerAuthCredentialsProvider"
export CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.token.TokenAuthServerProvider"
To use the X-Chroma-Token: test-token type of authentication header, you can set an additional environment variable.
export CHROMA_SERVER_AUTH_CREDENTIALS="test-token"
export CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.token.TokenConfigServerAuthCredentialsProvider"
export CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.token.TokenAuthServerProvider"
export CHROMA_SERVER_AUTH_TOKEN_TRANSPORT_HEADER="X_CHROMA_TOKEN"
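The two transport options differ only in which HTTP header carries the token. The helper below is a sketch of that mapping in plain Python. The function name token_auth_header is hypothetical, and the header_type values simply reuse the AUTHORIZATION and X_CHROMA_TOKEN strings that appear in this guide's configuration examples.

```python
from typing import Dict


def token_auth_header(token: str, header_type: str = "AUTHORIZATION") -> Dict[str, str]:
    """Return the HTTP header that would carry a static API token."""
    if header_type == "AUTHORIZATION":
        return {"Authorization": f"Bearer {token}"}
    if header_type == "X_CHROMA_TOKEN":
        return {"X-Chroma-Token": token}
    raise ValueError(f"unknown header type: {header_type}")


print(token_auth_header("test-token"))
print(token_auth_header("test-token", "X_CHROMA_TOKEN"))
```

Whichever header is used, the server and client must agree on it, which is why the server variable above and the client option in the examples below use the same value.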
Client Setup
import chromadb
from chromadb.config import Settings

client = chromadb.HttpClient(
    settings=Settings(
        chroma_client_auth_provider="chromadb.auth.token.TokenAuthClientProvider",
        chroma_client_auth_credentials="test-token",
    )
)
client.heartbeat()  # this should work with or without authentication - it is a public endpoint
client.get_version()  # this should work with or without authentication - it is a public endpoint
client.list_collections()  # this is a protected endpoint and requires authentication
Using the default Authorization: Bearer <token> header:
import { ChromaClient } from "chromadb";
const client = new ChromaClient({
  auth: { provider: "token", credentials: "test-token" },
});
// or explicitly specifying the auth header type
const client = new ChromaClient({
  auth: {
    provider: "token",
    credentials: "test-token",
    providerOptions: { headerType: "AUTHORIZATION" },
  },
});
Using custom Chroma auth token X-Chroma-Token: <token> header:
import { ChromaClient } from "chromadb";
const client = new ChromaClient({
  auth: {
    provider: "token",
    credentials: "test-token",
    providerOptions: { headerType: "X_CHROMA_TOKEN" },
  },
});