Contributor

@SilasMarvin SilasMarvin commented Jun 3, 2024

This PR adds reranking to PostgresML through a new pgml.rank function.

pgml=# SELECT pgml.rank('mixedbread-ai/mxbai-rerank-large-v1', 'test', array_agg(md5(random()::text)), '{"return_documents": false, "top_k": 10}') FROM generate_series(1, 100);
           rank            
---------------------------
 (58,0.20096051692962646,)
 (91,0.2007983922958374,)
 (84,0.1950932741165161,)
 (83,0.1925133764743805,)
 (7,0.1918289214372635,)
 (15,0.1851881593465805,)
 (67,0.18225009739398956,)
 (94,0.1795625537633896,)
 (40,0.17863182723522186,)
 (97,0.177006796002388,)
(10 rows)

@SilasMarvin SilasMarvin requested review from kczimm and montanalow June 3, 2024 18:07

if transformer not in __cache_sentence_transformer_by_name:
    __cache_sentence_transformer_by_name[transformer] = create_cross_encoder(
        transformer
Contributor

If you pass kwargs through to create_cross_encoder, we can specify the device. See https://www.sbert.net/docs/package_reference/cross_encoder/cross_encoder.html?highlight=crossencoder#sentence_transformers.cross_encoder.CrossEncoder

We should do this for the SentenceTransformer constructor, too.

Contributor Author

We will have to create a separate argument for this, or pop specific arguments from kwargs. If we just pass kwargs straight through, we will get an unexpected keyword argument error.

Contributor

@kczimm kczimm left a comment

We should bump the version (probably to 2.9.0 since it's an API addition) and add the following migration file sql/pgml--2.8.5--2.9.0.sql:

-- src/api.rs:613
-- pgml::api::rank
CREATE  FUNCTION pgml."rank"(
	"transformer" TEXT, /* &str */
	"query" TEXT, /* &str */
	"documents" TEXT[], /* alloc::vec::Vec<&str> */
	"kwargs" jsonb DEFAULT '{}' /* pgrx::datum::json::JsonB */
) RETURNS TABLE (
	"corpus_id" bigint,  /* i64 */
	"score" double precision,  /* f64 */
	"text" TEXT  /* core::option::Option<alloc::string::String> */
)
IMMUTABLE STRICT PARALLEL SAFE 
LANGUAGE c /* Rust */
AS 'MODULE_PATHNAME', 'rank_wrapper';

@SilasMarvin
Contributor Author

Done! 745b190

@SilasMarvin SilasMarvin merged commit fb2426f into master Jun 5, 2024
@SilasMarvin SilasMarvin deleted the silas-add-rank branch June 5, 2024 16:32