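In Vector Search 2.0, a collection stores data as individual JSON objects called data objects. This page describes how to create data objects or import them from a Cloud Storage bucket, and how to get, update, and delete them.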
Create a data object
The following example adds a data object with the ID the-shawshank-redemption to a collection named movies.
curl -X POST \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/dataObjects?dataObjectId=the-shawshank-redemption' \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H 'Content-Type: application/json' \
-d '{
  "data": {
    "title": "The Shawshank Redemption",
    "genre": "Drama",
    "year": 1994,
    "director": "Frank Darabont"
  },
  "vectors": {
    "plot_embedding": {
      "dense": {
        "values": [0.4752082440607731, 0.09026746166854707, 0.8752307753619009]
      }
    },
    "genre_embedding": {
      "dense": {
        "values": [0.38638010860523064, 0.739343471733759, 0.16189056837017107, 0.5271366865924485]
      }
    },
    "soundtrack_embedding": {
      "dense": {
        "values": [0.5920451749052875, 0.08301644173787519, 0.1264733498775969, 0.6196429624200321, 0.4925828581737443]
      }
    },
    "sparse_embedding": {
      "sparse": {
        "values": [1, 6, 3, 2, 8, 5, 2],
        "indices": [4065, 13326, 17377, 25918, 28105, 32683, 42998]
      }
    }
  }
}'
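Vector fields for which the collection schema specifies auto-embedding are populated automatically. For vector fields that are not populated automatically, you can bring your own embeddings (BYOE) and set their values in the request.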
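To make the auto-embedding note concrete, here is a minimal bash sketch that sends the data fields plus a single bring-your-own vector. It assumes, hypothetically, that the movies schema auto-embeds the dense fields (plot_embedding, genre_embedding, soundtrack_embedding), so the body omits them and supplies only sparse_embedding. Writing the body with a quoted heredoc and sending it with `curl --data-binary @FILE` also sidesteps shell-quoting mistakes in inline JSON. The function names and the CURL and ACCESS_TOKEN override variables are illustrative, not part of the API; PROJECT_ID and LOCATION are shell variables standing in for the placeholders above.

```shell
#!/usr/bin/env bash
# Sketch: create a data object from a request body stored in a file.
# Assumption (hypothetical): the movies schema auto-embeds the dense vector
# fields, so only the bring-your-own sparse_embedding is sent here.
set -euo pipefail

# Write the request body to the file named by $1. The quoted 'EOF'
# prevents the shell from expanding anything inside the JSON.
write_create_body() {
  cat > "$1" <<'EOF'
{
  "data": {
    "title": "The Shawshank Redemption",
    "genre": "Drama",
    "year": 1994,
    "director": "Frank Darabont"
  },
  "vectors": {
    "sparse_embedding": {
      "sparse": {
        "values": [1, 6, 3, 2, 8, 5, 2],
        "indices": [4065, 13326, 17377, 25918, 28105, 32683, 42998]
      }
    }
  }
}
EOF
}

# POST the body file ($2) as a new data object with the ID $1.
# CURL and ACCESS_TOKEN may be overridden, for example for a dry run.
create_data_object() {
  local id="$1" body_file="$2"
  local token="${ACCESS_TOKEN:-$(gcloud auth print-access-token)}"
  "${CURL:-curl}" -sS -X POST \
    "https://vectorsearch.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${LOCATION}/collections/movies/dataObjects?dataObjectId=${id}" \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    --data-binary "@${body_file}"
}

# Example (PROJECT_ID and LOCATION must be set):
#   write_create_body body.json
#   create_data_object the-shawshank-redemption body.json
```

`--data-binary` is used rather than `-d` because `-d @FILE` strips newlines from the file; for JSON that is harmless, but `--data-binary` sends the file exactly as written.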
Import data objects
The following example imports data objects from Cloud Storage into a collection named movies.
curl -X POST \
"https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies:importDataObjects" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
  "gcs_import": {
    "contents_uri": "gs://your-bucket/path/to/your-data.jsonl",
    "error_uri": "gs://your-bucket/path/to/import-errors/"
  }
}'
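The import reads its input from a gs:// URI, so a local JSONL file has to be copied to the bucket first. The following bash sketch chains the two steps: `gcloud storage cp` for the upload, then the importDataObjects request shown above. import_body and import_jsonl are hypothetical helper names, and the GCLOUD, CURL and ACCESS_TOKEN overrides exist only so the commands can be dry-run.

```shell
#!/usr/bin/env bash
# Sketch: upload a local JSONL file to Cloud Storage, then start an import
# into the movies collection. Helper names are hypothetical.
set -euo pipefail

# Print the import request body for a contents URI ($1) and an error URI ($2).
import_body() {
  printf '{"gcs_import": {"contents_uri": "%s", "error_uri": "%s"}}\n' "$1" "$2"
}

# import_jsonl LOCAL_FILE CONTENTS_URI ERROR_URI
# GCLOUD, CURL and ACCESS_TOKEN may be overridden, for example for a dry run.
import_jsonl() {
  local local_file="$1" contents_uri="$2" error_uri="$3"
  # Step 1: copy the file to the bucket path the import will read.
  "${GCLOUD:-gcloud}" storage cp "$local_file" "$contents_uri"
  # Step 2: ask Vector Search to import from that path.
  local token="${ACCESS_TOKEN:-$(gcloud auth print-access-token)}"
  "${CURL:-curl}" -sS -X POST \
    "https://vectorsearch.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${LOCATION}/collections/movies:importDataObjects" \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    --data-binary "$(import_body "$contents_uri" "$error_uri")"
}

# Example (PROJECT_ID and LOCATION must be set):
#   import_jsonl your-data.jsonl \
#     gs://your-bucket/path/to/your-data.jsonl \
#     gs://your-bucket/path/to/import-errors/
```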
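For very large datasets, you can bulk-import data from a Cloud Storage bucket. Vector Search 2.0 import files use the JSONL format: each line is one JSON object with three top-level properties, data_object_id, data, and vectors.
The following example shows a record with the required properties. It is spread over several lines for readability; in the actual file, each record must occupy a single line.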
{
  "data_object_id": "movie-789",
  "data": {
    "title": "The Shawshank Redemption",
    "plot": "...",
    "year": 1994,
    "avg_rating": 8.5,
    "movie_runtime_info": {
      "hours": 2,
      "minutes": 5
    }
  },
  "vectors": {
    "title_embedding": [-0.23, 0.88, 0.11, ...],
    "sparse_embedding": {
      "values": [0.01, -0.93, 0.27, ...],
      "indices": [23, 83, 131, ...]
    }
  }
}
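Because every record must sit on its own line, generating the import file from a script is less error-prone than hand-editing it. A minimal bash sketch: append_record is a hypothetical helper that writes one compact record per call, in the same shape as the example above (a plain array for title_embedding). It does no JSON escaping, so titles containing double quotes or backslashes would need a proper JSON tool instead, and the embedding values are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: build an import file with exactly one JSON object per line.
# append_record is hypothetical and performs no JSON escaping, so TITLE
# must not contain double quotes or backslashes.
set -euo pipefail

# append_record FILE ID TITLE YEAR EMBEDDING
# EMBEDDING is a comma-separated list of numbers, e.g. "-0.23,0.88,0.11".
append_record() {
  local file="$1" id="$2" title="$3" year="$4" embedding="$5"
  # The trailing \n ends the record, so each call adds exactly one line.
  printf '{"data_object_id":"%s","data":{"title":"%s","year":%s},"vectors":{"title_embedding":[%s]}}\n' \
    "$id" "$title" "$year" "$embedding" >> "$file"
}

# Example:
#   : > your-data.jsonl    # start with an empty file
#   append_record your-data.jsonl movie-789 "The Shawshank Redemption" 1994 "-0.23,0.88,0.11"
```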
Get a data object
The following example gets the data object the-shawshank-redemption from the movies collection.
curl -X GET \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/dataObjects/the-shawshank-redemption' \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H 'Content-Type: application/json'
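In scripts it is often enough to know whether a data object exists. A hedged bash sketch: `curl --fail` makes curl exit with a non-zero status (22) when the server answers with an HTTP error of 400 or above, such as a missing object, and `-o /dev/null` discards the response body. data_object_exists is a hypothetical helper name, and the CURL and ACCESS_TOKEN overrides are there only for dry runs.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if GET on the data object returns a non-error status.
set -euo pipefail

# data_object_exists ID
# CURL and ACCESS_TOKEN may be overridden, for example for a dry run.
data_object_exists() {
  local token="${ACCESS_TOKEN:-$(gcloud auth print-access-token)}"
  # --fail: non-zero exit on HTTP status >= 400; -o /dev/null: drop the body.
  "${CURL:-curl}" -sS --fail -o /dev/null \
    "https://vectorsearch.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${LOCATION}/collections/movies/dataObjects/$1" \
    -H "Authorization: Bearer ${token}"
}

# Example (PROJECT_ID and LOCATION must be set):
#   if data_object_exists the-shawshank-redemption; then echo "found"; fi
```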
Update a data object
The following example updates the title field and the plot_embedding vector of the data object the-shawshank-redemption in the movies collection.
curl -X PATCH \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/dataObjects/the-shawshank-redemption' \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H 'Content-Type: application/json' \
-d '{
  "data": {
    "title": "The Shawshank Redemption (updated)"
  },
  "vectors": {
    "plot_embedding": {
      "dense": {
        "values": [1.0, 1.0, 1.0]
      }
    }
  }
}'
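Delete data objects
You can delete a single data object by name, or delete in bulk all data objects that match a filter.
The following example deletes the data object the-shawshank-redemption from the movies collection.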
curl -X DELETE \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/dataObjects/the-shawshank-redemption' \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H 'Content-Type: application/json'
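When you already have a list of IDs rather than a filter, one option is to loop over them and call the single-object DELETE endpoint above once per ID, as in this bash sketch. It is not the filter-based bulk delete, just repeated single deletes. data_object_url and delete_data_objects are hypothetical helper names, and the CURL and ACCESS_TOKEN overrides exist only for dry runs.

```shell
#!/usr/bin/env bash
# Sketch: delete several data objects by ID, one DELETE request per object.
# Uses only the single-object endpoint; this is not the filter-based delete.
set -euo pipefail

# data_object_url COLLECTION ID prints the REST URL of one data object.
data_object_url() {
  printf 'https://vectorsearch.googleapis.com/v1beta/projects/%s/locations/%s/collections/%s/dataObjects/%s\n' \
    "$PROJECT_ID" "$LOCATION" "$1" "$2"
}

# delete_data_objects COLLECTION ID... sends one DELETE per ID.
# CURL and ACCESS_TOKEN may be overridden, for example for a dry run.
delete_data_objects() {
  local collection="$1"
  shift
  local token="${ACCESS_TOKEN:-$(gcloud auth print-access-token)}"
  local id
  for id in "$@"; do
    "${CURL:-curl}" -sS -X DELETE "$(data_object_url "$collection" "$id")" \
      -H "Authorization: Bearer ${token}"
  done
}

# Example (PROJECT_ID and LOCATION must be set):
#   delete_data_objects movies the-shawshank-redemption movie-789
```

With `set -e`, the loop stops at the first curl invocation that exits non-zero (for example, a network failure), so later IDs are not attempted.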