Merged
21 commits
18f8f44
Preliminary draft of semantic search in postgres in 15 minutes
SilasMarvin Jun 11, 2024
00bd75d
Cleanups
SilasMarvin Jun 12, 2024
068af92
Ready for review
SilasMarvin Jun 14, 2024
a9148da
Cleanup first paragraph
SilasMarvin Jun 17, 2024
3e0fa33
A few suggestions (#1536)
levkk Jun 17, 2024
c71fcd2
Add reason on why to use semantic search
SilasMarvin Jun 17, 2024
9b6e75f
Clean up spelling errors
SilasMarvin Jun 17, 2024
b451c9b
Fix more small spelling errors
SilasMarvin Jun 17, 2024
d418deb
Finish timings
SilasMarvin Jun 18, 2024
84872ac
Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
SilasMarvin Jun 18, 2024
1686f93
Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
SilasMarvin Jun 18, 2024
b2b9d88
Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
SilasMarvin Jun 18, 2024
b8766bd
Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
SilasMarvin Jun 18, 2024
4574183
Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
SilasMarvin Jun 18, 2024
4db2149
Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
SilasMarvin Jun 18, 2024
68368e2
Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
SilasMarvin Jun 18, 2024
af8dd3e
Convert italics back to backticks
SilasMarvin Jun 18, 2024
2c156ae
Remove hnsw link out
SilasMarvin Jun 18, 2024
faf0be1
Allude to arrays
SilasMarvin Jun 18, 2024
27445f5
Finalize post
SilasMarvin Jun 18, 2024
427f77f
Merge branch 'master' into silas-semantic-search-in-postgres-in-15-mi…
SilasMarvin Jun 18, 2024
Finish timings
SilasMarvin committed Jun 18, 2024
commit d418debfdb243b1059130a709b0801c76d6331b6
33 changes: 23 additions & 10 deletions pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
@@ -35,7 +35,7 @@ Embeddings are vectors. Given some text and some embedding model, we can convert

!!! generic

-!!! code_block time="14.125 ms"
+!!! code_block

```postgresql
SELECT pgml.embed('mixedbread-ai/mxbai-embed-large-v1', 'Generating embeddings in Postgres is fun!');
@@ -130,7 +130,7 @@ This is a somewhat confusing formula but luckily _pgvector_ provides an operato

!!! generic

-!!! code_block time="64.643 ms"
+!!! code_block

```postgresql
SELECT '[1,2,3]'::vector <=> '[2,3,4]'::vector;
@@ -206,17 +206,30 @@ It is inefficient to compute embeddings for all the documents every time we sear

_pgvector_ provides us with the `vector` data type for storing embeddings in regular PostgreSQL tables:


!!! generic

-!!! code_block
+!!! code_block time="12.547 ms"

```postgresql
CREATE TABLE text_and_embeddings (
  id SERIAL PRIMARY KEY,
  text text,
  embedding vector (1024)
);
```

!!!

!!!

Let's add some data to our table:

!!! generic

!!! code_block time="72.156 ms"

```postgresql
INSERT INTO text_and_embeddings (text, embedding)
VALUES
(
@@ -240,11 +253,11 @@ VALUES

!!!

-Once our table has some data, we can search it using the following query:
+Now that our table has some data, we can search over it using the following query:

!!! generic

-!!! code_block time="19.864 ms"
+!!! code_block time="35.016 ms"

```postgresql
WITH query_embedding AS (
@@ -288,7 +301,7 @@ Let's demonstrate this by inserting 100,000 additional embeddings:

!!! generic

-!!! code_block
+!!! code_block time="3114242.499 ms"

```postgresql
INSERT INTO text_and_embeddings (text, embedding)
@@ -309,7 +322,7 @@ Now trying our search engine again:

!!! generic

-!!! code_block time="105.917 ms"
+!!! code_block time="138.252 ms"

```postgresql
WITH embedded_query AS (
@@ -364,7 +377,7 @@ and search again, we would get much better performance:

!!! generic

-!!! code_block time="29.191 ms"
+!!! code_block time="44.508 ms"

```postgresql
WITH embedded_query AS (
@@ -405,7 +418,7 @@ HNSW indexes typically have better and faster recall but require more compute wh

!!! generic

-!!! code_block
+!!! code_block time="115564.303 ms"

```postgresql
DROP index text_and_embeddings_embedding_idx;
@@ -422,7 +435,7 @@ Now let's try searching again:

!!! generic

-!!! code_block time="20.270 ms"
+!!! code_block time="35.716 ms"

```postgresql
WITH embedded_query AS (