Merged
save
levkk committed Apr 28, 2024
commit 75792235e6d216e97514d6eeedda547c048b4a69
Binary file added pgml-cms/docs/.gitbook/assets/architecture_1.png
Binary file added pgml-cms/docs/.gitbook/assets/architecture_2.png
Binary file added pgml-cms/docs/.gitbook/assets/performance_1.png
2 changes: 2 additions & 0 deletions pgml-cms/docs/SUMMARY.md
@@ -11,6 +11,8 @@
* [Foreign Data Wrappers](introduction/getting-started/import-your-data/foreign-data-wrappers.md)
* [Move data with COPY](introduction/getting-started/import-your-data/copy.md)
* [Migrate with pg_dump](introduction/getting-started/import-your-data/pg-dump.md)
* [Architecture](introduction/architecture/README.md)
* [Why PostgresML?](introduction/architecture/why-postgresml.md)

## API

31 changes: 31 additions & 0 deletions pgml-cms/docs/introduction/architecture/README.md
@@ -0,0 +1,31 @@
# PostgresML architecture

PostgresML is an extension for the PostgreSQL database server. It operates inside the database, using the same hardware to perform machine learning tasks.

## PostgreSQL

PostgreSQL is a process-based database server. It handles multiple connections by forking the main process, which creates OS-level isolation between clients.

<figure>
<img src="/docs/.gitbook/assets/architecture_1.png" alt="Architecture" width="100%">
<figcaption class="mt-4"><i>PostgreSQL architecture</i></figcaption>
</figure>

The main process allocates a block of shared memory, and grants all client processes direct access to it. The shared memory is used to store data retrieved from disk, so clients can re-use the same data for different queries.

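This architecture suits machine learning well: process isolation keeps each client's libraries and models separate, while shared memory lets every client read the same cached data.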
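
The fork-plus-shared-memory model described above can be sketched in a few lines of Python. This is only an illustration of the operating system mechanism, not PostgreSQL's implementation: `run_demo` and its variables are hypothetical names, and the sketch assumes a Unix-like system where `os.fork` is available.

```python
import mmap
import os


def run_demo(n_clients: int = 3):
    """Fork client processes that share one memory block with the parent."""
    # Anonymous mapping; on Unix it is shared with forked children by default.
    shared = mmap.mmap(-1, n_clients + 1)
    shared[0] = 42  # stands in for data the main process loaded from disk

    private = [0]  # ordinary object: each child gets its own copy on fork

    pids = []
    for i in range(1, n_clients + 1):
        pid = os.fork()
        if pid == 0:
            # Child ("client") process.
            try:
                shared[i] = shared[0] + i  # visible to the parent
                private[0] = i             # NOT visible to the parent
            finally:
                os._exit(0)  # leave immediately, skipping parent cleanup code
        pids.append(pid)

    for pid in pids:
        os.waitpid(pid, 0)

    result = list(shared[: n_clients + 1])
    shared.close()
    return result, private[0]


if __name__ == "__main__":
    print(run_demo())
```

Every child reads the value stored by the parent and writes into its own slot of the shared block, while the change each child makes to `private` disappears with the process. That is the combination the text describes: shared data, isolated state.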

## PostgresML extension

A process-based architecture is perfect for multi-tenant machine learning applications. Each client connection loads its own libraries and models, serves them to the client, and removes all traces of them when the connection is closed.

<figure>
<img src="/docs/.gitbook/assets/architecture_2.png" alt="Architecture" width="60%">
<figcaption class="mt-4"><i>Per-connection models</i></figcaption>
</figure>

Since PostgreSQL shares data between clients, the expensive part of retrieving data is optimized, while the relatively inexpensive part of loading models into memory is automated and isolated.
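
The per-connection lifecycle can be illustrated with a small Python sketch. It is a simplification under stated assumptions: `Connection`, `load_model`, and the toy "model" are hypothetical names invented for this example, and plain objects in one process stand in for the separate operating system processes PostgreSQL actually uses.

```python
from typing import Callable, Dict

Model = Callable[[float], float]


def load_model(name: str) -> Model:
    """Hypothetical loader; stands in for reading a stored model."""
    scale = float(len(name))
    return lambda x: scale * x


class Connection:
    """Toy stand-in for one client connection with its own model cache."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}
        self.loads = 0  # how many times this connection loaded a model

    def predict(self, name: str, x: float) -> float:
        # Load the model on first use, then reuse it for later queries.
        if name not in self._models:
            self._models[name] = load_model(name)
            self.loads += 1
        return self._models[name](x)

    def close(self) -> None:
        # Closing the connection removes every model it loaded.
        self._models.clear()

    def __enter__(self) -> "Connection":
        return self

    def __exit__(self, *exc) -> None:
        self.close()


if __name__ == "__main__":
    with Connection() as a, Connection() as b:
        a.predict("abc", 2.0)
        a.predict("abc", 1.0)
        print(a.loads, b.loads)
```

Each connection loads a model once and reuses it, one connection's cache never affects another's, and nothing survives `close()`.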

## Conclusion

By running on the same machine and relying on PostgreSQL for scaling, stability, and performance, PostgresML eliminates the network latency and brittleness of a service-based architecture.
32 changes: 32 additions & 0 deletions pgml-cms/docs/introduction/architecture/why-postgresml.md
@@ -0,0 +1,32 @@
# Why PostgresML?

PostgresML offers a unique and modern architecture which replaces service-based machine learning applications with a single database. The benefits of this approach are measurable in performance, ease of use, and data integrity.

## Service-based architecture

Most applications today are built using services. In the extreme case, microservices with a single purpose are employed to achieve additional separation of concerns.

For an application to use a machine learning model, it is typical to build and maintain separate services and data synchronization pipelines. This allows machine learning engineers who work in Python to build and deploy their models separately and independently from application engineering.

<figure>
<img src="/docs/.gitbook/assets/performance_1.png" alt="Before PostgresML" width="80%">
<figcaption class="mt-4"><i>Service-based machine learning architecture</i></figcaption>
</figure>

### Impact

Building on top of a service-based architecture has major performance disadvantages. Any task that falls outside the domain of the engineering team that built a service, like machine learning, requires additional communication between teams, and additional services to be built and maintained.

Services communicate using protocols like gRPC or HTTP which, being stateless, require additional context to be fetched from a database or a cache. Since communication happens over the network, data must be serialized and deserialized, costing additional time and resources.

The diagram above illustrates the work required to service a single user request. With sub-linear scaling characteristics and increasing brittleness, this architecture eventually breaks down, costing teams time and the organization resources.
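
The extra work a service boundary adds can be sketched in Python. This is an illustrative comparison only: `score`, `in_process`, and `over_the_wire` are hypothetical names, JSON stands in for whatever wire format a real service would use, and no actual network call is made, so the network latency itself is not modeled.

```python
import json
from typing import List, Tuple


def score(features: List[float]) -> float:
    """Toy model: the mean of the input features."""
    return sum(features) / len(features)


def in_process(features: List[float]) -> float:
    # Model and data live in the same process: a plain function call.
    return score(features)


def over_the_wire(features: List[float]) -> Tuple[float, int]:
    # Client side: serialize the request.
    request = json.dumps({"features": features}).encode("utf-8")
    # Service side: deserialize, compute, serialize the response.
    decoded = json.loads(request.decode("utf-8"))
    response = json.dumps({"score": score(decoded["features"])}).encode("utf-8")
    # Client side: deserialize the response.
    result = json.loads(response.decode("utf-8"))["score"]
    return result, len(request) + len(response)


if __name__ == "__main__":
    features = [1.0, 2.0, 3.0]
    print(in_process(features))
    print(over_the_wire(features))
```

Both paths return the same score, but the service path performs two serializations and two deserializations and moves every byte of the request and response, all before any network time is counted.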


## PostgresML architecture

PostgresML simplifies things. By moving machine learning models into the database, we eliminate the need for an additional feature store, for data synchronization and inference services, and for RPC calls, along with their (de)serialization, network latency, and reliability costs.

<figure>
<img src="/docs/.gitbook/assets/performance_2.png" alt="After PostgresML" width="80%">
<figcaption class="mt-4"><i>PostgresML architecture</i></figcaption>
</figure>