Merged
Changes from 1 commit
72 commits
be03897
New site search
SilasMarvin Jan 10, 2024
9df3528
Working fast site search and vector search
SilasMarvin Jan 13, 2024
f9cb8a1
Cleaned tests and remote fallback working for search and vector_search
SilasMarvin Jan 17, 2024
b04ead6
Clean up vector search
SilasMarvin Jan 17, 2024
44ab0ed
Switched to a transactional version of upsert documents and syncing p…
SilasMarvin Jan 17, 2024
9aaa31b
Working conditional pipeline running on document upsert
SilasMarvin Jan 18, 2024
6979f69
Really good upsert documents
SilasMarvin Jan 18, 2024
c8e1af8
Cleaned up some tests
SilasMarvin Jan 18, 2024
9df12b5
Switching old pipeline to be a pass through for the new multi field p…
SilasMarvin Jan 19, 2024
f75a2ec
Finished pipeline as a pass through and more tests
SilasMarvin Jan 22, 2024
59f4419
Working site search with doc type filtering
SilasMarvin Jan 22, 2024
ec351ff
Working site search with doc type filtering
SilasMarvin Jan 23, 2024
027080f
collection query_builder now a wrapper around collection.vector_search
SilasMarvin Jan 23, 2024
44cc8a0
Verifying on Python and JavaScript
SilasMarvin Jan 24, 2024
6a9fd14
Working with JavaScript and Python
SilasMarvin Jan 25, 2024
099ea60
Cleaned up
SilasMarvin Jan 25, 2024
412fb57
Move MultiFieldPipeline to Pipeline and added batch uploads for docum…
SilasMarvin Jan 25, 2024
9781766
Added SingleFieldPipeline function shoutout to Lev
SilasMarvin Jan 25, 2024
b87a654
Working on fixing query
SilasMarvin Jan 27, 2024
17b81e7
Working recursive query
SilasMarvin Feb 5, 2024
7339cd5
Added smarter chunking and search results table
SilasMarvin Feb 5, 2024
84e621a
Updated deps, added debugger for queries
SilasMarvin Feb 9, 2024
d745fc6
Logging search results done
SilasMarvin Feb 9, 2024
2d75d98
Correct return type with search inserts
SilasMarvin Feb 9, 2024
bed7144
Updated tests to pass with new sqlx version
SilasMarvin Feb 9, 2024
0e06ce1
Added a way for users to provide search_events
SilasMarvin Feb 12, 2024
1677a51
Quick fix on remote embeddings search
SilasMarvin Feb 12, 2024
a5599e5
Quick fix and change the upsert query to be more efficient
SilasMarvin Feb 13, 2024
f47002e
Fix for JS after updating tokio
SilasMarvin Feb 13, 2024
f39b94c
Updated extractive_question_answering example for Python
SilasMarvin Feb 13, 2024
f2c5f61
Updated question_answering for Python
SilasMarvin Feb 13, 2024
6ec6df5
Updated question_answering_instructor for Python
SilasMarvin Feb 13, 2024
c9a24e6
Updated semantic_search for Python
SilasMarvin Feb 14, 2024
6c7f05a
Updated summarizing_question_answering for Python
SilasMarvin Feb 14, 2024
119807f
Updated table question answering for Python
SilasMarvin Feb 14, 2024
71d4915
Updated table question answering for Python
SilasMarvin Feb 14, 2024
6dfd0d7
Updated rag question answering for Python
SilasMarvin Feb 14, 2024
70f1ac0
Updated question_answering for JavaScript
SilasMarvin Feb 14, 2024
67fae04
Updated question_answering_instructor for JavaScript
SilasMarvin Feb 14, 2024
0dd0027
Updated question_answering_instructor for JavaScript
SilasMarvin Feb 14, 2024
7afea01
Updated extractive_question_answering example for JavaScript
SilasMarvin Feb 14, 2024
95188a4
Updated summarizing_question_answering for JavaScript
SilasMarvin Feb 14, 2024
8807489
Updated semantic_search for JavaScript
SilasMarvin Feb 14, 2024
c9e5d04
Updated versions and removed unused clone
SilasMarvin Feb 14, 2024
c71143f
Cleaned up search query
SilasMarvin Feb 14, 2024
f4d261e
Edit test
SilasMarvin Feb 14, 2024
3d1a6ce
Added the stress test
SilasMarvin Feb 14, 2024
692c252
Updated to use new sdk
SilasMarvin Feb 14, 2024
fc5658f
Updated test
SilasMarvin Feb 15, 2024
4c38aca
Removed document_id
SilasMarvin Feb 16, 2024
4167e32
Removed document_id and updated all searches to work without it
SilasMarvin Feb 16, 2024
0cadd8c
Fixed python test
SilasMarvin Feb 16, 2024
077ce1b
Updated stress test
SilasMarvin Feb 16, 2024
7f53b93
Updated to clean up pool access
SilasMarvin Feb 16, 2024
144da42
Added test for bad collection names
SilasMarvin Feb 16, 2024
039c9cc
Cleaned up tests
SilasMarvin Feb 16, 2024
bd983cf
Add migration error
SilasMarvin Feb 26, 2024
4fb0149
Updated text
SilasMarvin Feb 26, 2024
b4f1edd
Add dockerfile to build javascript
SilasMarvin Feb 26, 2024
c41597a
Working dockerfile for build
SilasMarvin Feb 26, 2024
3f53e9c
Test github docker build
SilasMarvin Feb 26, 2024
679b995
Iterating on gh action
SilasMarvin Feb 26, 2024
c614e4e
Iterating on gh action
SilasMarvin Feb 26, 2024
7169596
Iterating on gh action
SilasMarvin Feb 26, 2024
8de7727
Iterating on gh action
SilasMarvin Feb 26, 2024
25fe41c
Iterating on gh action
SilasMarvin Feb 26, 2024
271e1e4
Updated collection test
SilasMarvin Feb 26, 2024
9e4c2a1
Finished boosting and working with the new sdk
SilasMarvin Feb 27, 2024
c46957c
Made document search just use semantic search and boosted title
SilasMarvin Feb 27, 2024
0d963a8
Updated the chatbot to use the new chat history
SilasMarvin Feb 27, 2024
d9b241d
Small cleanups
SilasMarvin Feb 27, 2024
a34619b
Adjust boosting
SilasMarvin Feb 27, 2024
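The later commits ("Finished boosting", "Made document search just use semantic search and boosted title", "Adjust boosting") refer to field-level score boosting. As a hypothetical sketch only — the function name and the multiplicative form are illustrative, not the SDK's actual API — a boost scales one field's similarity score before results from all fields are merged and ranked:

```rust
// Hypothetical sketch of field-level boosting: a per-field multiplier
// scales that field's similarity score before merged ranking. The
// names and the multiplicative form are illustrative assumptions.
fn boosted_score(similarity: f64, boost: f64) -> f64 {
    similarity * boost
}

fn main() {
    // With a title boost of 2.0, a title match outranks a body match
    // that has the same raw similarity.
    let title_score = boosted_score(0.8, 2.0);
    let body_score = boosted_score(0.8, 1.0);
    assert!(title_score > body_score);
}
```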
Small cleanups
SilasMarvin committed Feb 28, 2024
commit d9b241d6715126922506d9e0fafdb226294fc4db
6 changes: 2 additions & 4 deletions pgml-sdks/pgml/build.rs
@@ -4,7 +4,7 @@ use std::io::Write;

const ADDITIONAL_DEFAULTS_FOR_PYTHON: &[u8] = br#"
def init_logger(level: Optional[str] = "", format: Optional[str] = "") -> None
- def SingleFieldPipeline(name: str, model: Optional[Model] = None, splitter: Optional[Splitter] = None, parameters: Optional[Json] = Any) -> MultiFieldPipeline
+ def SingleFieldPipeline(name: str, model: Optional[Model] = None, splitter: Optional[Splitter] = None, parameters: Optional[Json] = Any) -> Pipeline
async def migrate() -> None

Json = Any
@@ -15,7 +15,7 @@ GeneralJsonAsyncIterator = Any

const ADDITIONAL_DEFAULTS_FOR_JAVASCRIPT: &[u8] = br#"
export function init_logger(level?: string, format?: string): void;
- export function newSingleFieldPipeline(name: string, model?: Model, splitter?: Splitter, parameters?: Json): MultiFieldPipeline;
+ export function newSingleFieldPipeline(name: string, model?: Model, splitter?: Splitter, parameters?: Json): Pipeline;
export function migrate(): Promise<void>;

export type Json = any;
@@ -39,7 +39,6 @@ fn main() {
remove_file(&path).ok();
let mut file = OpenOptions::new()
.create(true)
- .write(true)
.append(true)
.open(path)
.unwrap();
@@ -53,7 +52,6 @@ fn main() {
remove_file(&path).ok();
let mut file = OpenOptions::new()
.create(true)
- .write(true)
.append(true)
.open(path)
.unwrap();
18 changes: 16 additions & 2 deletions pgml-sdks/pgml/javascript/package-lock.json

Some generated files are not rendered by default.

188 changes: 6 additions & 182 deletions pgml-sdks/pgml/src/collection.rs
@@ -129,7 +129,9 @@ pub struct Collection {
exists,
archive,
upsert_directory,
- upsert_file
+ upsert_file,
+ generate_er_diagram,
+ get_pipeline_status
)]
impl Collection {
/// Creates a new [Collection]
@@ -259,25 +261,6 @@ impl Collection {
}

/// Adds a new [Pipeline] to the [Collection]
- ///
- /// # Arguments
- ///
- /// * `pipeline` - The [Pipeline] to add.
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::{Collection, Pipeline, Model, Splitter};
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let model = Model::new(None, None, None);
- /// let splitter = Splitter::new(None, None);
- /// let mut pipeline = Pipeline::new("my_pipeline", None, None, None);
- /// let mut collection = Collection::new("my_collection", None);
- /// collection.add_pipeline(&mut pipeline).await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self))]
pub async fn add_pipeline(&mut self, pipeline: &mut Pipeline) -> anyhow::Result<()> {
// The flow for this function:
@@ -322,23 +305,6 @@ impl Collection {
}

/// Removes a [Pipeline] from the [Collection]
- ///
- /// # Arguments
- ///
- /// * `pipeline` - The [Pipeline] to remove.
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::{Collection, Pipeline};
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let mut pipeline = Pipeline::new("my_pipeline", None, None, None);
- /// let mut collection = Collection::new("my_collection", None);
- /// collection.remove_pipeline(&mut pipeline).await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self))]
pub async fn remove_pipeline(&mut self, pipeline: &Pipeline) -> anyhow::Result<()> {
// The flow for this function:
@@ -368,29 +334,12 @@ impl Collection {
}

/// Enables a [Pipeline] on the [Collection]
- ///
- /// # Arguments
- ///
- /// * `pipeline` - The [Pipeline] to enable
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::{Collection, Pipeline};
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let pipeline = Pipeline::new("my_pipeline", None, None, None);
- /// let collection = Collection::new("my_collection", None);
- /// collection.enable_pipeline(&pipeline).await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self))]
pub async fn enable_pipeline(&mut self, pipeline: &mut Pipeline) -> anyhow::Result<()> {
// The flow for this function:
// 1. Set ACTIVE = TRUE for the pipeline in collection.pipelines
// 2. Resync the pipeline
- // TOOD: Review this pattern
+ // TODO: Review this pattern
self.verify_in_database(false).await?;
let project_info = &self.database_data.as_ref().unwrap().project_info;
let pool = get_or_initialize_pool(&self.database_url).await?;
@@ -407,23 +356,6 @@ impl Collection {
}

/// Disables a [Pipeline] on the [Collection]
- ///
- /// # Arguments
- ///
- /// * `pipeline` - The [Pipeline] to disable
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::{Collection, Pipeline};
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let pipeline = Pipeline::new("my_pipeline", None, None, None);
- /// let collection = Collection::new("my_collection", None);
- /// collection.disable_pipeline(&pipeline).await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self))]
pub async fn disable_pipeline(&self, pipeline: &Pipeline) -> anyhow::Result<()> {
// The flow for this function:
@@ -459,27 +391,6 @@ impl Collection {
}

/// Upserts documents into the database
- ///
- /// # Arguments
- ///
- /// * `documents` - A vector of documents to upsert
- /// * `strict` - Whether to throw an error if keys: `id` or `text` are missing from any documents
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::Collection;
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let mut collection = Collection::new("my_collection", None);
- /// let documents = vec![
- /// serde_json::json!({"id": 1, "text": "hello world"}).into(),
- /// serde_json::json!({"id": 2, "text": "hello world"}).into(),
- /// ];
- /// collection.upsert_documents(documents, None).await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self, documents))]
pub async fn upsert_documents(
&mut self,
@@ -647,21 +558,6 @@ impl Collection {
}

/// Gets the documents on a [Collection]
- ///
- /// # Arguments
- ///
- /// * `args` - The filters and options to apply to the query
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::Collection;
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let mut collection = Collection::new("my_collection", None);
- /// let documents = collection.get_documents(None).await?;
- /// Ok(())
- /// }
#[instrument(skip(self))]
pub async fn get_documents(&self, args: Option<Json>) -> anyhow::Result<Vec<Json>> {
let pool = get_or_initialize_pool(&self.database_url).await?;
@@ -721,25 +617,6 @@ impl Collection {
}

/// Deletes documents in a [Collection]
- ///
- /// # Arguments
- ///
- /// * `filter` - The filters to apply
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::Collection;
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let mut collection = Collection::new("my_collection", None);
- /// let documents = collection.delete_documents(serde_json::json!({
- /// "id": {
- /// "eq": 1
- /// }
- /// }).into()).await?;
- /// Ok(())
- /// }
#[instrument(skip(self))]
pub async fn delete_documents(&self, filter: Json) -> anyhow::Result<()> {
let pool = get_or_initialize_pool(&self.database_url).await?;
@@ -832,25 +709,6 @@ impl Collection {
}

/// Performs vector search on the [Collection]
- ///
- /// # Arguments
- ///
- /// * `query` - The query to search for
- /// * `pipeline` - The [Pipeline] used for the search
- /// * `query_paramaters` - The query parameters passed to the model for search
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::{Collection, Pipeline};
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let mut collection = Collection::new("my_collection", None);
- /// let mut pipeline = Pipeline::new("my_pipeline", None, None, None);
- /// let results = collection.vector_search("Query", &mut pipeline, None, None).await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self))]
#[allow(clippy::type_complexity)]
pub async fn vector_search(
@@ -956,18 +814,6 @@ impl Collection {
}

/// Gets all pipelines for the [Collection]
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::Collection;
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let mut collection = Collection::new("my_collection", None);
- /// let pipelines = collection.get_pipelines().await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self))]
pub async fn get_pipelines(&mut self) -> anyhow::Result<Vec<Pipeline>> {
self.verify_in_database(false).await?;
@@ -982,18 +828,6 @@ impl Collection {
}

/// Gets a [Pipeline] by name
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::Collection;
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let mut collection = Collection::new("my_collection", None);
- /// let pipeline = collection.get_pipeline("my_pipeline").await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self))]
pub async fn get_pipeline(&mut self, name: &str) -> anyhow::Result<Pipeline> {
self.verify_in_database(false).await?;
@@ -1009,18 +843,6 @@ impl Collection {
}

/// Check if the [Collection] exists in the database
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::Collection;
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let collection = Collection::new("my_collection", None);
- /// let exists = collection.exists().await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self))]
pub async fn exists(&self) -> anyhow::Result<bool> {
let pool = get_or_initialize_pool(&self.database_url).await?;
@@ -1108,13 +930,15 @@ impl Collection {
Ok(())
}

+ #[instrument(skip(self))]
pub async fn get_pipeline_status(&mut self, pipeline: &mut Pipeline) -> anyhow::Result<Json> {
self.verify_in_database(false).await?;
let project_info = &self.database_data.as_ref().unwrap().project_info;
let pool = get_or_initialize_pool(&self.database_url).await?;
pipeline.get_status(project_info, &pool).await
}

+ #[instrument(skip(self))]
pub async fn generate_er_diagram(&mut self, pipeline: &mut Pipeline) -> anyhow::Result<String> {
self.verify_in_database(false).await?;
let project_info = &self.database_data.as_ref().unwrap().project_info;
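The newly exported `get_pipeline_status` and `generate_er_diagram` follow the same shape as the other wrappers in collection.rs: verify database state, then delegate to the `Pipeline`. A simplified sketch of that delegation pattern, using mock types that stand in for the real pgml structs (the real methods also call `verify_in_database` and pass a connection pool):

```rust
// Simplified sketch of the Collection -> Pipeline delegation pattern.
// These mock types are illustrative stand-ins, not the real pgml structs.
struct Pipeline {
    name: String,
}

impl Pipeline {
    fn get_status(&self) -> String {
        format!("{}: synced", self.name)
    }
}

struct Collection;

impl Collection {
    // Mirrors Collection::get_pipeline_status: forward to the pipeline.
    fn get_pipeline_status(&self, pipeline: &Pipeline) -> String {
        pipeline.get_status()
    }
}

fn main() {
    let pipeline = Pipeline { name: "my_pipeline".into() };
    let collection = Collection;
    assert_eq!(
        collection.get_pipeline_status(&pipeline),
        "my_pipeline: synced"
    );
}
```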
15 changes: 0 additions & 15 deletions pgml-sdks/pgml/src/pipeline.rs
@@ -214,20 +214,6 @@ impl Pipeline {
}

/// Gets the status of the [Pipeline]
- /// This includes the status of the chunks, embeddings, and tsvectors
- ///
- /// # Example
- ///
- /// ```
- /// use pgml::Collection;
- ///
- /// async fn example() -> anyhow::Result<()> {
- /// let mut collection = Collection::new("my_collection", None);
- /// let mut pipeline = collection.get_pipeline("my_pipeline").await?;
- /// let status = pipeline.get_status().await?;
- /// Ok(())
- /// }
- /// ```
#[instrument(skip(self))]
pub async fn get_status(
&mut self,
@@ -778,7 +764,6 @@ impl Pipeline {
pub(crate) async fn resync(
&mut self,
project_info: &ProjectInfo,
- // pool: &Pool<Postgres>,
connection: &mut PgConnection,
) -> anyhow::Result<()> {
// We are assuming we have manually verified the pipeline before doing this