Data objects

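In Vector Search 2.0, a collection stores its data as individual JSON objects called data objects. This page describes how to create data objects or import them from a Cloud Storage bucket, and how to get, update, and delete them.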

Create a data object

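The following example shows how to add a data object to a collection named movies. The dataObjectId query parameter sets the object's ID, the-shawshank-redemption, which the later examples use to get, update, and delete it.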

curl -X POST \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/dataObjects?dataObjectId=the-shawshank-redemption' \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H 'Content-Type: application/json' \
  -d '{
    "data": {
      "title": "The Shawshank Redemption",
      "genre": "Drama",
      "year": 1994,
      "director": "Frank Darabont"
    },
    "vectors": {
      "plot_embedding": {
        "dense": {
          "values": [
            0.4752082440607731,
            0.09026746166854707,
            0.8752307753619009
          ]
        }
      },
      "genre_embedding": {
        "dense": {
          "values": [
            0.38638010860523064,
            0.739343471733759,
            0.16189056837017107,
            0.5271366865924485
          ]
        }
      },
      "soundtrack_embedding": {
        "dense": {
          "values": [
            0.5920451749052875,
            0.08301644173787519,
            0.1264733498775969,
            0.6196429624200321,
            0.4925828581737443
          ]
        }
      },
      "sparse_embedding": {
        "sparse": {
          "values": [1, 6, 3, 2, 8, 5, 2],
          "indices": [4065, 13326, 17377, 25918, 28105, 32683, 42998]
        }
      }
    }
  }'

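Embedding fields for which the collection schema specifies automatic embedding are populated automatically. You can also bring your own embeddings (BYOE) to set the values of vector fields that are not populated automatically.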
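Long JSON bodies are easy to break with shell quoting. One way to avoid that is to write the body to a temporary file with a quoted heredoc and pass it to curl with --data-binary @file. This is a minimal sketch, not the official workflow: PROJECT_ID=my-project, LOCATION=us-central1, and the helper name create_data_object are hypothetical, and the body assumes that every vector field in the movies collection is auto-embedded, so it sends only data. For BYOE fields you would add a vectors object like the one in the request above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: replace with your own project and location.
PROJECT_ID="my-project"
LOCATION="us-central1"
BASE_URL="https://vectorsearch.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${LOCATION}/collections/movies"

# A quoted heredoc ('EOF') stops the shell from expanding anything,
# so the JSON reaches curl exactly as written.
BODY_FILE="$(mktemp)"
cat > "$BODY_FILE" <<'EOF'
{
  "data": {
    "title": "The Shawshank Redemption",
    "genre": "Drama",
    "year": 1994,
    "director": "Frank Darabont"
  }
}
EOF

# Assumption: all vector fields are auto-embedded, so no "vectors" object
# is sent. This function is defined here but not called.
create_data_object() {
  local id="$1"
  curl -X POST "${BASE_URL}/dataObjects?dataObjectId=${id}" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    --data-binary @"${BODY_FILE}"
}
```

To send the request, run create_data_object the-shawshank-redemption. --data-binary is used instead of -d so that curl sends the file unchanged rather than stripping its newlines.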

Import data objects

The following example shows how to import data objects from Cloud Storage into a collection named movies.

curl -X POST \
"https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies:importDataObjects" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "gcs_import": {
      "contents_uri": "gs://your-bucket/path/to/your-data.jsonl",
      "error_uri": "gs://your-bucket/path/to/import-errors/"
    }
  }'

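For very large datasets, you can batch-import data from a Cloud Storage bucket. Vector Search 2.0 uses the JSONL file format for imports: each line is a JSON object with three top-level properties, data_object_id, data, and vectors.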

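The following example shows a JSONL record with the required properties. It is spread over several lines here for readability; in the file itself, each object must be on a single line.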

{
  "data_object_id": "movie-789",
  "data": {
    "title": "The Shawshank Redemption",
    "plot": "...",
    "year": 1994,
    "avg_rating": 8.5,
    "movie_runtime_info": {
      "hours": 2,
      "minutes": 5
    }
  },
  "vectors": {
    "title_embedding": [-0.23, 0.88, 0.11, ...],
    "sparse_embedding": {
      "values": [0.01, -0.93, 0.27, ...],
      "indices": [23, 83, 131, ...]
    }
  }
}
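Because each line of a JSONL file must hold one complete JSON object, a record like the one above has to be collapsed onto a single line before upload. The sketch below writes two hypothetical movies, one per line, runs a rough one-object-per-line check, and leaves the upload commented out. The IDs, the second movie, and the 3-dimensional title_embedding values are placeholders; real vectors must match your collection schema, and the vectors layout simply follows the example above. The bucket path is the placeholder from the import request.

```shell
#!/usr/bin/env bash
set -euo pipefail

JSONL_FILE="$(mktemp)"

# One argument per data object; printf '%s\n' puts each on its own line.
# The embedding values are placeholders, not real embeddings.
printf '%s\n' \
  '{"data_object_id": "movie-789", "data": {"title": "The Shawshank Redemption", "year": 1994}, "vectors": {"title_embedding": [-0.23, 0.88, 0.11]}}' \
  '{"data_object_id": "movie-790", "data": {"title": "The Godfather", "year": 1972}, "vectors": {"title_embedding": [0.41, -0.07, 0.65]}}' \
  > "$JSONL_FILE"

# Rough check only: every line must start with "{" and end with "}".
# This does not validate the JSON itself.
while IFS= read -r line; do
  case "$line" in
    '{'*'}') ;;
    *) echo "not a single-line JSON object: $line" >&2; exit 1 ;;
  esac
done < "$JSONL_FILE"

# Upload to the path used by the import request (placeholder bucket):
# gcloud storage cp "$JSONL_FILE" gs://your-bucket/path/to/your-data.jsonl
```

For anything beyond a handful of records, generating the file with a JSON-aware tool is safer than hand-written strings, since the check above cannot catch malformed JSON inside a line.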

Get a data object

The following example shows how to get the data object named the-shawshank-redemption from the movies collection.

curl -X GET \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/dataObjects/the-shawshank-redemption'  \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H 'Content-Type: application/json'
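The same GET request can double as an existence check. In this sketch, the helper name and the PROJECT_ID and LOCATION values are hypothetical. It discards the response body (-o /dev/null) and uses curl's -w '%{http_code}' option to read only the HTTP status. A 200 counts as "exists"; any other status, for example a 404 for a missing object or an authorization error, counts as "not found or not accessible".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: replace with your own project and location.
PROJECT_ID="my-project"
LOCATION="us-central1"
BASE_URL="https://vectorsearch.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${LOCATION}/collections/movies"

# Returns 0 when GET on the data object answers HTTP 200, 1 otherwise.
data_object_exists() {
  local id="$1" status
  status="$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "${BASE_URL}/dataObjects/${id}")"
  [ "$status" = "200" ]
}
```

A typical call is if data_object_exists the-shawshank-redemption; then echo "found"; fi. Because every non-200 status is treated the same way, inspect the full response when you need to tell a missing object apart from a permissions problem.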

Update a data object

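The following example shows how to update the title field of the data object named the-shawshank-redemption in the movies collection. The same request also sets new values for the object's plot_embedding vector.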

curl -X PATCH \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/dataObjects/the-shawshank-redemption' \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H 'Content-Type: application/json' \
  -d '{
    "data": {
      "title": "The Shawshank Redemption (updated)"
    },
    "vectors": {
      "plot_embedding": {
        "dense": {
          "values": [1.0, 1.0, 1.0]
        }
      }
    }
  }'
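If you only need to change data.title, a small helper can build the partial body for you. This is a sketch under several assumptions. The helper names and placeholder values are hypothetical. It assumes, as the example above suggests, that fields omitted from a PATCH body are left unchanged; confirm this in the API reference. printf also inserts the title verbatim, so a title containing double quotes or backslashes would need JSON escaping first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: replace with your own project and location.
PROJECT_ID="my-project"
LOCATION="us-central1"
BASE_URL="https://vectorsearch.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${LOCATION}/collections/movies"

# Prints a PATCH body that sets only data.title.
# No JSON escaping is done: the title must not contain " or \.
title_patch_body() {
  printf '{"data": {"title": "%s"}}\n' "$1"
}

# Sends the PATCH request for one data object.
update_title() {
  local id="$1" title="$2"
  curl -X PATCH "${BASE_URL}/dataObjects/${id}" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$(title_patch_body "$title")"
}
```

For example, update_title the-shawshank-redemption "The Shawshank Redemption (updated)" sends the title change from the request above without touching plot_embedding.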

Delete data objects

You can delete a single data object by name, or batch-delete all data objects that match a filter.

The following example shows how to delete the data object the-shawshank-redemption from the movies collection.

curl -X DELETE \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/dataObjects/the-shawshank-redemption' \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H 'Content-Type: application/json'
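If you already know the IDs to remove, a simple alternative to a filter-based batch delete is a loop that sends one DELETE request per ID to the endpoint above. The helper name, the DRY_RUN switch, and the placeholder values are hypothetical. With DRY_RUN=1, the function prints the requests it would send instead of sending them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: replace with your own project and location.
PROJECT_ID="my-project"
LOCATION="us-central1"
BASE_URL="https://vectorsearch.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${LOCATION}/collections/movies"

# Deletes each data object named in the arguments, one DELETE request per ID.
# With DRY_RUN=1 it prints the requests instead of sending them.
delete_data_objects() {
  local id url
  for id in "$@"; do
    url="${BASE_URL}/dataObjects/${id}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "DELETE ${url}"
    else
      curl -X DELETE "$url" \
        -H "Authorization: Bearer $(gcloud auth print-access-token)"
    fi
  done
}
```

Running DRY_RUN=1 delete_data_objects the-shawshank-redemption first lets you check the URLs before anything is deleted. One request per ID keeps each deletion independent, but it costs a round trip per object, so for large deletions the filter-based batch delete is the better fit.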

What's next