Commit fab7311

Merge pull request #2 from postgresml/levkk-mvp
some notes
2 parents 016975f + cf4fc96

1 file changed: +78 −0 lines changed

PRODUCT.md

# Product description

This document describes the value proposition of this product.

## The problem

Machine learning is hard for most startups to take advantage of. They either don't have the time or the know-how
to deploy ML models into production. This problem exists for multi-billion dollar enterprises; it's 10x as true
for small startups.

The Python ecosystem is also hard to manage. Common problems are dependency hell and Python version conflicts.
Most of the time, engineers just want to train and deploy an algorithm; everything else is a distraction.

Data is kept in databases that are hard for ML algorithms to access: MySQL, Postgres, Dynamo, etc.
The typical ML workflow is:

1. export data to a warehouse (e.g. Snowflake) or S3 (CSVs),
2. run a Python script that will train the model (while fighting through dependency hell),
3. pickle the model and upload it to object storage,
4. download and unpickle the model in production, behind an HTTP API,
5. serve predictions in a microservice.

By the time this workflow completes, the data is obsolete, the algorithm is wrong, and the ML engineer
is polishing their CV or considering farming as an alternative career path.

## The solution

Colocate data and machine learning in one system, train the models online, and run predictions
from the same system with a simple command. That system, in our case, is Postgres, because that's where most
startups keep their data. Postgres happens to be highly extensible as well, which makes our job easier.

The new workflow is now:

1. define the data with a SQL query (i.e. a view),
2. train an algorithm with a single command,
3. serve predictions with a SQL query.

No Python, no code of any kind really, no dependencies, no exports, imports, transforms,
S3 permission issues, deploys, or JSON/GraphQL; from prototype to production in about 5 minutes.

Here is an example:

#### Define the data with a SQL query

```sql
-- One row per user: tenure and profile features, plus lifetime purchase total.
CREATE VIEW my_data AS
SELECT NOW() - created_at AS user_tenure,
       age,
       location,
       total_purchases
FROM users
CROSS JOIN LATERAL (
    -- Sum all of this user's orders; Postgres requires an alias
    -- on a LATERAL subquery.
    SELECT SUM(purchase_price) AS total_purchases
    FROM orders
    WHERE orders.user_id = users.id
) AS purchases;
```
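
The view behaves like any other relation, so the training data can be sanity-checked before fitting anything. A quick look, assuming the `users` and `orders` tables above exist:

```sql
-- Preview a few rows of the training data: features plus the label column.
SELECT * FROM my_data LIMIT 10;
```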

#### Train the model

The function `pgml.train` accepts three arguments:

- the model name,
- the `y` (label) column for the algorithm,
- the algorithm to use (defaults to linear regression).

```sql
-- Train on the view above, predicting total_purchases
-- with the default algorithm (linear regression).
SELECT pgml.train('my_data', 'total_purchases');
```
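
Since the third argument selects the algorithm, overriding the default would look something like the sketch below; the `'random_forest'` identifier is purely illustrative, as this document doesn't enumerate supported algorithm names:

```sql
-- Hypothetical: explicitly choose an algorithm instead of the
-- linear regression default ('random_forest' is an assumed name).
SELECT pgml.train('my_data', 'total_purchases', 'random_forest');
```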

#### Serve the model

The model is ready for serving! Let's serve it via SQL again:

```sql
SELECT pgml.score('my_data', '2 years'::interval) AS likely_purchase_amount_based_on_tenure;
```

You can call this directly from your app; no special infrastructure required.
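
Because `pgml.score` is just a SQL function, it should also compose with ordinary queries. A sketch, assuming the function can be applied row-wise like any other Postgres function:

```sql
-- Hypothetical batch scoring: predict a purchase amount for every user
-- from their current tenure, in a single query.
SELECT id,
       pgml.score('my_data', NOW() - created_at) AS likely_purchase_amount
FROM users;
```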
