Chroma Usage Guide: Official Documentation & Translation

2024-03-24 19:41 l_v_y_forever

🧪 Usage Guide

Initiating a persistent Chroma client

Python
JavaScript

import chromadb

You can configure Chroma to save and load from your local machine. Data will be persisted automatically and loaded on start (if it exists).

client = chromadb.PersistentClient(path="/path/to/save/to")
 

The path is where Chroma will store its database files on disk, and load them on start.

// CJS
const { ChromaClient } = require("chromadb");

// ESM
import { ChromaClient } from "chromadb";
 

Connecting to the backend

To connect with the JS client, you must connect to a backend running Chroma. See Running Chroma in client/server mode for how to do this.

const client = new ChromaClient();
 

Python
JavaScript

The client object has a few useful convenience methods.

client.heartbeat()  # Returns a nanosecond heartbeat. Useful for making sure the client remains connected.
client.reset()  # Empties and completely resets the database. ⚠️ This is destructive and not reversible.
 

The client object has a few useful convenience methods.

await client.reset(); // Empties and completely resets the database. ⚠️ This is destructive and not reversible.
 

Running Chroma in client/server mode

Python
JavaScript

Chroma can also be configured to run in client/server mode. In this mode, the Chroma client connects to a Chroma server running in a separate process.

To start the Chroma server, run the following command:

chroma run --path /db_path

Then use the Chroma HTTP client to connect to the server:

import chromadb
chroma_client = chromadb.HttpClient(host='localhost', port=8000)
 

That's it! Chroma's API will run in client-server mode with just this change.

Using the Python HTTP-only client

If you are running Chroma in client-server mode, you may not need the full Chroma library. Instead, you can install the chromadb-client package, a lightweight HTTP client for the server with a minimal dependency footprint.

pip install chromadb-client
 

import chromadb
# Example setup of the client to connect to your chroma server
client = chromadb.HttpClient(host='localhost', port=8000)
 

Note that the chromadb-client package is a subset of the full Chroma library and does not include all the dependencies. If you want to use the full Chroma library, you can install the chromadb package instead. Most importantly, chromadb-client has no default embedding function: if you add() documents without embeddings, you must have manually specified an embedding function and installed the dependencies for it.

To run Chroma in client/server mode, first install the Chroma library and CLI from PyPI:

pip install chromadb
 

Then start the Chroma server:

chroma run --path /db_path

The JS client then talks to the chroma server backend.

// CJS
const { ChromaClient } = require("chromadb");

// ESM
import { ChromaClient } from "chromadb";

const client = new ChromaClient();
 

You can also run the Chroma server in a Docker container, or deploy it to a cloud provider. See the deployment docs for more information.

Using collections

Chroma lets you manage collections of embeddings, using the collection primitive.

Creating, inspecting, and deleting Collections

Chroma uses collection names in the URL, so there are a few restrictions on naming them:

The length of the name must be between 3 and 63 characters.
The name must start and end with a lowercase letter or a digit, and it can contain dots, dashes, and underscores in between.
The name must not contain two consecutive dots.
The name must not be a valid IP address.
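The naming rules above can be checked before a name is ever sent to the server. The following is a minimal sketch in plain Python; is_valid_collection_name is a hypothetical helper, not part of chromadb, and it assumes that the characters between the first and last may only be lowercase letters, digits, dots, dashes, and underscores. The server's own validation remains authoritative.

```python
import ipaddress
import re

# Hypothetical helper, not a chromadb API: encodes the naming rules from this guide.
# One leading char + 1..61 middle chars + one trailing char = 3..63 characters.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_valid_collection_name(name: str) -> bool:
    """Return True if `name` satisfies the four naming rules listed in this guide."""
    # Length, allowed characters, and lowercase letter/digit at both ends.
    if not _NAME_PATTERN.fullmatch(name):
        return False
    # No two consecutive dots.
    if ".." in name:
        return False
    # Must not parse as an IPv4 or IPv6 address.
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return True
    return False
```

Running such a check on the client side gives an early, readable error instead of a failed request.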

Chroma collections are created with a name and an optional embedding function. If you supply an embedding function, you must supply it every time you get the collection.

Python
JavaScript

collection = client.create_collection(name="my_collection", embedding_function=emb_fn)
collection = client.get_collection(name="my_collection", embedding_function=emb_fn)
 

Caution

If you later wish to call get_collection, you MUST do so with the embedding function you supplied while creating the collection.

The embedding function takes text as input, and performs tokenization and embedding. If no embedding function is supplied, Chroma will use sentence transformer as a default.

// CJS
const { ChromaClient } = require("chromadb");

// ESM
import { ChromaClient } from "chromadb";
 

The JS client talks to a chroma server backend. This can run on your local computer or be easily deployed to AWS.

let collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: emb_fn,
});
let collection2 = await client.getCollection({
  name: "my_collection",
  embeddingFunction: emb_fn,
});
 

Caution

If you later wish to call getCollection, you MUST do so with the embedding function you supplied while creating the collection.

The embedding function takes text as input, and performs tokenization and embedding.

You can learn more about 🧬 embedding functions, and how to create your own.

Python
JavaScript

Existing collections can be retrieved by name with .get_collection, and deleted with .delete_collection. You can also use .get_or_create_collection to get a collection if it exists, or create it if it doesn't.

collection = client.get_collection(name="test")  # Get a collection object from an existing collection, by name. Will raise an exception if it's not found.
collection = client.get_or_create_collection(name="test")  # Get a collection object from an existing collection, by name. If it doesn't exist, create it.
client.delete_collection(name="my_collection")  # Delete a collection and all associated embeddings, documents, and metadata. ⚠️ This is destructive and not reversible.
 

Existing collections can be retrieved by name with .getCollection, and deleted with .deleteCollection.

const collection = await client.getCollection({ name: "test" }); // Get a collection object from an existing collection, by name. Will raise an exception if it's not found.
await client.deleteCollection({ name: "my_collection" }); // Delete a collection and all associated embeddings, documents, and metadata. ⚠️ This is destructive and not reversible.
 

Collections have a few useful convenience methods.

Python
JavaScript

collection.peek()  # returns a list of the first 10 items in the collection
collection.count()  # returns the number of items in the collection
collection.modify(name="new_name")  # Rename the collection
 

await collection.peek(); // returns a list of the first 10 items in the collection
await collection.count(); // returns the number of items in the collection
 

Changing the distance function

Python
JavaScript

create_collection also takes an optional metadata argument which can be used to customize the distance method of the embedding space by setting the value of hnsw:space.

collection = client.create_collection(
    name="collection_name",
    metadata={"hnsw:space": "cosine"}  # l2 is the default
)
 

createCollection also takes an optional metadata argument which can be used to customize the distance method of the embedding space by setting the value of hnsw:space.

let collection = await client.createCollection({
  name: "collection_name",
  metadata: { "hnsw:space": "cosine" },
});
 

Valid options for hnsw:space are "l2", "ip", or "cosine". The default is "l2", which is the squared L2 norm.

Distance	Parameter	Equation
Squared L2	'l2'	$d = \sum\left(A_i-B_i\right)^2$
Inner product	'ip'	$d = 1.0 - \sum\left(A_i \times B_i\right)$
Cosine similarity	'cosine'	$d = 1.0 - \frac{\sum\left(A_i \times B_i\right)}{\sqrt{\sum\left(A_i^2\right)} \cdot \sqrt{\sum\left(B_i^2\right)}}$
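To make the three equations in the table concrete, here is a small sketch that computes each distance in plain Python. The function names are hypothetical and chosen for illustration; Chroma computes these inside its index, so this is a reading aid rather than a picture of its implementation.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """'l2': sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """'ip': one minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """'cosine': one minus the cosine of the angle between a and b.

    Raises ZeroDivisionError if either vector has zero length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

For all three, a smaller value means a closer match. Note that for unit-length vectors the 'ip' and 'cosine' distances coincide.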

Adding data to a Collection

Add data to Chroma with .add.

Raw documents:

Python
JavaScript

collection.add(
    documents=["lorem ipsum...","doc2","doc3",...],
    metadatas=[{"chapter":"3","verse":"16"},{"chapter":"3","verse":"5"},{"chapter":"29","verse":"11"},...],
    ids=["id1","id2","id3",...]
)
 

await collection.add({
  ids: ["id1", "id2", "id3", ...],
  metadatas: [{ "chapter": "3", "verse": "16" }, { "chapter": "3", "verse": "5" }, { "chapter": "29", "verse": "11" }, ...],
  documents: ["lorem ipsum...", "doc2", "doc3", ...],
})
// input order
// ids - required
// embeddings - optional
// metadatas - optional
// documents - optional
 

If Chroma is passed a list of documents, it will automatically tokenize and embed them with the collection's embedding function (the default will be used if none was supplied at collection creation). Chroma will also store the documents themselves. If the documents are too large to embed using the chosen embedding function, an exception will be raised.

Each document must have a unique associated id. Trying to .add the same ID twice will result in only the initial value being stored. An optional list of metadata dictionaries can be supplied for each document, to store additional information and enable filtering.

Alternatively, you can supply a list of document-associated embeddings directly, and Chroma will store the associated documents without embedding them itself.

Python
JavaScript

collection.add(
    documents=["doc1","doc2","doc3",...],
    embeddings=[[1.1,2.3,3.2],[4.5,6.9,4.4],[1.1,2.3,3.2],...],
    metadatas=[{"chapter":"3","verse":"16"},{"chapter":"3","verse":"5"},{"chapter":"29","verse":"11"},...],
    ids=["id1","id2","id3",...]
)
 

await collection.add({
  ids: ["id1", "id2", "id3", ...],
  embeddings: [[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
  metadatas: [{ "chapter": "3", "verse": "16" }, { "chapter": "3", "verse": "5" }, { "chapter": "29", "verse": "11" }, ...],
  documents: ["lorem ipsum...", "doc2", "doc3", ...],
})
 

If the supplied embeddings are not the same dimension as the collection, an exception will be raised.

You can also store documents elsewhere, and just supply a list of embeddings and metadata to Chroma. You can use the ids to associate the embeddings with your documents stored elsewhere.

Python
JavaScript

collection.add(
    embeddings=[[1.1,2.3,3.2],[4.5,6.9,4.4],[1.1,2.3,3.2],...],
    metadatas=[{"chapter":"3","verse":"16"},{"chapter":"3","verse":"5"},{"chapter":"29","verse":"11"},...],
    ids=["id1","id2","id3",...]
)
 

await collection.add({
  ids: ["id1", "id2", "id3", ...],
  embeddings: [[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
  metadatas: [{ "chapter": "3", "verse": "16" }, { "chapter": "3", "verse": "5" }, { "chapter": "29", "verse": "11" }, ...],
})
 

Querying a Collection

Chroma collections can be queried in a variety of ways, using the .query method.

You can query by a set of query_embeddings.

Python
JavaScript

collection.query(
    query_embeddings=[[11.1,12.1,13.1],[1.1,2.3,3.2],...],
    n_results=10,
    where={"metadata_field":"is_equal_to_this"},
    where_document={"$contains":"search_string"}
)
 

const result = await collection.query({
  queryEmbeddings: [[11.1, 12.1, 13.1], [1.1, 2.3, 3.2], ...],
  nResults: 10,
  where: { "metadata_field": "is_equal_to_this" },
})
// input order
// queryEmbeddings - optional
// nResults - required
// where - optional
// queryTexts - optional
 

The query will return the n_results closest matches to each query_embedding, in order. An optional where filter dictionary can be supplied to filter by the metadata associated with each document. Additionally, an optional where_document filter dictionary can be supplied to filter by contents of the document.

If the supplied query_embeddings are not the same dimension as the collection, an exception will be raised.
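The query behavior described above (the n_results closest matches per query embedding, nearest first, with an error on a dimension mismatch) can be illustrated with a brute-force sketch in plain Python. This is not how Chroma searches internally; brute_force_query and the in-memory dictionary are hypothetical stand-ins, and squared L2 is used because this guide names it as the default distance.

```python
from typing import Dict, List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_query(
    stored: Dict[str, Sequence[float]],
    query_embeddings: List[Sequence[float]],
    n_results: int = 10,
) -> List[List[str]]:
    """Return, for each query embedding, the ids of its closest stored items."""
    if not stored:
        return [[] for _ in query_embeddings]
    # All stored embeddings are assumed to share one dimension.
    dim = len(next(iter(stored.values())))
    results = []
    for query in query_embeddings:
        if len(query) != dim:
            raise ValueError(f"expected dimension {dim}, got {len(query)}")
        # Rank every stored id by its distance to this query, nearest first.
        ranked = sorted(stored, key=lambda item_id: squared_l2(stored[item_id], query))
        results.append(ranked[:n_results])
    return results
```

The result has one list per query embedding, which mirrors how a query with several embeddings yields several result sets.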

You can also query by a set of query_texts. Chroma will first embed each query_text with the collection's embedding function, and then perform the query with the generated embedding.

Python
JavaScript

collection.query(
    query_texts=["doc10","thus spake zarathustra",...],
    n_results=10,
    where={"metadata_field":"is_equal_to_this"},
    where_document={"$contains":"search_string"}
)
 

You can also retrieve items from a collection by id using .get.

collection.get(
    ids=["id1","id2","id3",...],
    where={"style":"style1"}
)
 

await collection.query({
  nResults: 10, // n_results
  where: { "metadata_field": "is_equal_to_this" }, // where
  queryTexts: ["doc10", "thus spake zarathustra", ...], // query_text
})
 

You can also retrieve items from a collection by id using .get.

await collection.get({
  ids: ["id1", "id2", "id3", ...], // ids
  where: { "style": "style1" }, // where
})
 

.get also supports the where and where_document filters. If no ids are supplied, it will return all items in the collection that match the where and where_document filters.

Choosing which data is returned

When using get or query you can use the include parameter to specify which data you want returned: any of embeddings, documents, metadatas, and, for query, distances. By default, Chroma will return the documents, the metadatas, and, in the case of query, the distances of the results. embeddings are excluded by default for performance, and the ids are always returned. You can specify which of these you want returned by passing an array of field names to the include parameter of the query or get method.

# Only get documents and ids
collection.get(
    include=["documents"]
)

collection.query(
    query_embeddings=[[11.1,12.1,13.1],[1.1,2.3,3.2],...],
    include=["documents"]
)
 

Using Where filters

Chroma supports filtering queries by metadata and document contents. The where filter is used to filter by metadata, and the where_document filter is used to filter by document contents.

Filtering by metadata

In order to filter on metadata, you must supply a where filter dictionary to the query. The dictionary must have the following structure:

{
    "metadata_field": {
        <Operator>: <Value>
    }
}
 

Filtering metadata supports the following operators:

$eq - equal to (string, int, float)
$ne - not equal to (string, int, float)
$gt - greater than (int, float)
$gte - greater than or equal to (int, float)
$lt - less than (int, float)
$lte - less than or equal to (int, float)

Using the $eq operator is equivalent to passing the value directly in the where filter.

{
    "metadata_field": "search_string"
}

# is equivalent to

{
    "metadata_field": {
        "$eq": "search_string"
    }
}

 

Note

Where filters only search embeddings where the key exists. If you search collection.get(where={"version": {"$ne": 1}}), metadata that does not have the key version will not be returned.

Filtering by document contents

In order to filter on document contents, you must supply a where_document filter dictionary to the query. We support two filtering keys: $contains and $not_contains. The dictionary must have the following structure:

# Filtering for a search_string
{
"$contains":"search_string"
}
 

# Filtering for not contains
{
"$not_contains":"search_string"
}
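As a rough illustration of the two keys, the sketch below evaluates a where_document filter against a single document string in plain Python. matches_where_document is a hypothetical name, and treating the match as a case-sensitive substring test is an assumption made for this example rather than something this guide states.

```python
def matches_where_document(document: str, where_document: dict) -> bool:
    """Evaluate a single-key {"$contains": ...} or {"$not_contains": ...} filter."""
    if len(where_document) != 1:
        raise ValueError("this sketch expects exactly one filtering key")
    (operator, search_string), = where_document.items()
    if operator == "$contains":
        # Assumption: plain, case-sensitive substring match.
        return search_string in document
    if operator == "$not_contains":
        return search_string not in document
    raise ValueError(f"unsupported operator: {operator}")
```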
 

Using logical operators

You can also use the logical operators $and and $or to combine multiple filters.

An $and operator will return results that match all of the filters in the list.

{
    "$and": [
        {
            "metadata_field": {
                <Operator>: <Value>
            }
        },
        {
            "metadata_field": {
                <Operator>: <Value>
            }
        }
    ]
}

An $or operator will return results that match any of the filters in the list.

{
    "$or": [
        {
            "metadata_field": {
                <Operator>: <Value>
            }
        },
        {
            "metadata_field": {
                <Operator>: <Value>
            }
        }
    ]
}
 

Using inclusion operators (`$in` and `$nin`)

The following inclusion operators are supported:

$in - a value is in a predefined list (string, int, float, bool)
$nin - a value is not in a predefined list (string, int, float, bool)

An $in operator will return results where the metadata attribute is part of a provided list:

{
    "metadata_field": {
        "$in": ["value1", "value2", "value3"]
    }
}

An $nin operator will return results where the metadata attribute is not part of a provided list:

{
    "metadata_field": {
        "$nin": ["value1", "value2", "value3"]
    }
}
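The where operators in this section compose in a regular way, which a small evaluator can make visible. The following is a minimal sketch in plain Python, assuming one metadata dictionary per item; matches_where and _COMPARATORS are hypothetical names, not Chroma APIs, and the server's real evaluation may differ. Treating a missing key as a non-match for every operator follows the note earlier in this guide, and treating several top-level keys as an implicit AND is an assumption of this sketch.

```python
import operator
from typing import Any, Dict

# Hypothetical lookup table: operator name -> comparison function.
_COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches_where(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the `where` filter dictionary."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches_where(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches_where(metadata, sub) for sub in condition):
                return False
        else:
            # Per the note in this guide: items lacking the key never match.
            if key not in metadata:
                return False
            # {"field": value} is shorthand for {"field": {"$eq": value}}.
            if not isinstance(condition, dict):
                condition = {"$eq": condition}
            for op_name, expected in condition.items():
                if not _COMPARATORS[op_name](metadata[key], expected):
                    return False
    return True
```

For instance, matches_where({"chapter": "3"}, {"version": {"$ne": 1}}) is False here, because the item has no version key at all.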
 

Practical examples

For additional examples and a demo of how to use the inclusion operators, please see the provided notebook.

Python
JavaScript

Updating data in a collection

Any property of items in a collection can be updated using .update.

collection.update(
    ids=["id1","id2","id3",...],
    embeddings=[[1.1,2.3,3.2],[4.5,6.9,4.4],[1.1,2.3,3.2],...],
    metadatas=[{"chapter":"3","verse":"16"},{"chapter":"3","verse":"5"},{"chapter":"29","verse":"11"},...],
    documents=["doc1","doc2","doc3",...],
)
 

If an id is not found in the collection, an error will be logged and the update will be ignored. If documents are supplied without corresponding embeddings, the embeddings will be recomputed with the collection's embedding function.

If the supplied embeddings are not the same dimension as the collection, an exception will be raised.

Chroma also supports an upsert operation, which updates existing items, or adds them if they don't yet exist.

Python
JavaScript

collection.upsert(
    ids=["id1","id2","id3",...],
    embeddings=[[1.1,2.3,3.2],[4.5,6.9,4.4],[1.1,2.3,3.2],...],
    metadatas=[{"chapter":"3","verse":"16"},{"chapter":"3","verse":"5"},{"chapter":"29","verse":"11"},...],
    documents=["doc1","doc2","doc3",...],
)
 

await collection.upsert({
  ids: ["id1", "id2", "id3"],
  embeddings: [
    [1.1, 2.3, 3.2],
    [4.5, 6.9, 4.4],
    [1.1, 2.3, 3.2],
  ],
  metadatas: [
    { chapter: "3", verse: "16" },
    { chapter: "3", verse: "5" },
    { chapter: "29", verse: "11" },
  ],
  documents: ["doc1", "doc2", "doc3"],
});
 

If an id is not present in the collection, the corresponding items will be created as per add. Items with existing ids will be updated as per update.
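The difference between add, update, and upsert, as this guide describes them, is easiest to see side by side. The toy class below is a sketch in plain Python that stores documents in a dictionary; ToyCollection is hypothetical and stands in for a real collection only to show the id-handling rules (it skips unknown ids silently, whereas the guide says Chroma logs an error).

```python
from typing import Dict, List


class ToyCollection:
    """Hypothetical in-memory stand-in that mimics only the id semantics."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}

    def add(self, ids: List[str], documents: List[str]) -> None:
        # Adding an existing id keeps the initially stored value.
        for item_id, document in zip(ids, documents):
            self.items.setdefault(item_id, document)

    def update(self, ids: List[str], documents: List[str]) -> None:
        # Unknown ids are ignored; known ids are overwritten.
        for item_id, document in zip(ids, documents):
            if item_id in self.items:
                self.items[item_id] = document

    def upsert(self, ids: List[str], documents: List[str]) -> None:
        # Known ids are updated, unknown ids are created.
        for item_id, document in zip(ids, documents):
            self.items[item_id] = document
```

In short, upsert is the only one of the three that never leaves a supplied id untouched.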

Deleting data from a collection

Chroma supports deleting items from a collection by id using .delete. The embeddings, documents, and metadata associated with each item will be deleted. ⚠️ Naturally, this is a destructive operation, and cannot be undone.

Python
JavaScript

collection.delete(
    ids=["id1","id2","id3",...],
    where={"chapter":"20"}
)
 

await collection.delete({
  ids: ["id1", "id2", "id3", ...], // ids
  where: { "chapter": "20" }, // where
})
 

.delete also supports the where filter. If no ids are supplied, it will delete all items in the collection that match the where filter.

Authentication

Authentication can be configured only when Chroma runs in client/server mode.

Supported authentication methods:

Authentication Method	Basic Auth (Pre-emptive)	Static API Token
Description	RFC 7617 Basic Auth with `user:password` base64-encoded `Authorization` header.	Static auth token in `Authorization: Bearer <token>` or in `X-Chroma-Token: <token>` headers.
Status	`Alpha`	`Alpha`
Server-Side Support	✅ `Alpha`	✅ `Alpha`
Client/Python	✅ `Alpha`	✅ `Alpha`
Client/JS	✅ `Alpha`	✅ `Alpha`
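For readers who want to see what these headers look like on the wire, the sketch below builds them with the Python standard library. The helper names are hypothetical; the Chroma clients construct these headers for you, so this is only useful for debugging or for calling the server from another HTTP tool.

```python
import base64
from typing import Dict


def basic_auth_header(user: str, password: str) -> Dict[str, str]:
    """RFC 7617: base64-encode 'user:password' and prefix it with 'Basic '."""
    raw = f"{user}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def token_auth_header(token: str, use_chroma_header: bool = False) -> Dict[str, str]:
    """Static token, sent as a Bearer token or in the X-Chroma-Token header."""
    if use_chroma_header:
        return {"X-Chroma-Token": token}
    return {"Authorization": f"Bearer {token}"}
```

Base64 is an encoding, not encryption, so Basic Auth credentials are only as private as the connection that carries them.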

Basic Authentication

Server Setup

Generate Server-Side Credentials

Security Practices

A good security practice is to store the password securely. In the example below we use bcrypt (currently the only supported hash in Chroma server-side auth) to hash the plaintext password.

To generate the password hash, run the following command. Note that you will need to have htpasswd installed on your system.

htpasswd -Bbn admin admin > server.htpasswd
 

Running the Server

Set the following environment variables:

export CHROMA_SERVER_AUTH_CREDENTIALS_FILE="server.htpasswd"
export CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.providers.HtpasswdFileServerAuthCredentialsProvider"
export CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.basic.BasicAuthServerProvider"
 

And run the server as normal:

chroma run --path /db_path

Python
JavaScript

Client Setup

import chromadb
from chromadb.config import Settings

client = chromadb.HttpClient(
    settings=Settings(
        chroma_client_auth_provider="chromadb.auth.basic.BasicAuthClientProvider",
        chroma_client_auth_credentials="admin:admin",
    )
)
client.heartbeat()  # this should work with or without authentication - it is a public endpoint

client.get_version()  # this should work with or without authentication - it is a public endpoint

client.list_collections()  # this is a protected endpoint and requires authentication
 

Client Setup

import { ChromaClient } from "chromadb";

const client = new ChromaClient({
  auth: { provider: "basic", credentials: "admin:admin" },
});
 

Static API Token Authentication

Tokens

Tokens must be alphanumeric ASCII strings. Tokens are case-sensitive.

Server Setup

Security Note

The current implementation of static API token auth supports only ENV-based tokens.

Running the Server

Set the following environment variables to use Authorization: Bearer test-token as your authentication header.

export CHROMA_SERVER_AUTH_CREDENTIALS="test-token"
export CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.token.TokenConfigServerAuthCredentialsProvider"
export CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.token.TokenAuthServerProvider"
 

To use the X-Chroma-Token: test-token type of authentication header, set an additional environment variable:

export CHROMA_SERVER_AUTH_CREDENTIALS="test-token"
export CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.token.TokenConfigServerAuthCredentialsProvider"
export CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.token.TokenAuthServerProvider"
export CHROMA_SERVER_AUTH_TOKEN_TRANSPORT_HEADER="X_CHROMA_TOKEN"
 

Python
JavaScript

Client Setup

import chromadb
from chromadb.config import Settings

client = chromadb.HttpClient(
    settings=Settings(
        chroma_client_auth_provider="chromadb.auth.token.TokenAuthClientProvider",
        chroma_client_auth_credentials="test-token",
    )
)
client.heartbeat()  # this should work with or without authentication - it is a public endpoint

client.get_version()  # this should work with or without authentication - it is a public endpoint

client.list_collections()  # this is a protected endpoint and requires authentication
 

Client Setup

Using the default Authorization: Bearer <token> header:

import { ChromaClient } from "chromadb";

const client = new ChromaClient({
  auth: { provider: "token", credentials: "test-token" },
});
// or explicitly specifying the auth header type
// (named differently here only so both declarations can coexist in one file)
const explicitClient = new ChromaClient({
  auth: {
    provider: "token",
    credentials: "test-token",
    providerOptions: { headerType: "AUTHORIZATION" },
  },
});
 

Using custom Chroma auth token X-Chroma-Token: <token> header:

import { ChromaClient } from "chromadb";

const client = new ChromaClient({
  auth: {
    provider: "token",
    credentials: "test-token",
    providerOptions: { headerType: "X_CHROMA_TOKEN" },
  },
});
 

References:

1. https://docs.trychroma.com/usage-guide
2. https://www.cnblogs.com/wanghengbin/p/18092869
3. https://blog.csdn.net/xzq_qzx_/article/details/136535125
