Merged
Changes from 1 commit
53 commits
0a33546
Readme update in progress
santiatpml Apr 5, 2023
c006523
Updated hugs emoji
santiatpml Apr 5, 2023
ba8d050
Readme added dashboard image
santiatpml Apr 5, 2023
ae520e6
Getting started in progress
santiatpml Apr 5, 2023
d822e43
Getting started in progress
santiatpml Apr 5, 2023
a7e9ce4
Added notebooks image
santiatpml Apr 5, 2023
7c8b982
Updated dashboard image and some edits
santiatpml Apr 5, 2023
3ae4024
Added protobuf for finbert support and text-classification readme in …
santiatpml Apr 5, 2023
0daba37
Using sql instead of json for highlighting
santiatpml Apr 5, 2023
0e51c29
update dependencies (#588)
montanalow Apr 5, 2023
3e06339
Updates to text-classification
santiatpml Apr 6, 2023
755580a
First version of text classification
santiatpml Apr 6, 2023
345eb79
Added grammatical correctness
santiatpml Apr 6, 2023
b6cfcdd
Added zero-shot classification
santiatpml Apr 6, 2023
d025f12
readme for token classification
santiatpml Apr 7, 2023
91557e3
Moved results from sql to json
santiatpml Apr 7, 2023
4ffae4e
Images for different tasks
santiatpml Apr 7, 2023
4f21192
Updated table of contents
santiatpml Apr 7, 2023
db9523c
Update to 0.7.4 (#591)
Apr 7, 2023
e02eaff
fix for np.float32 serialization (#589)
santiatpml Apr 7, 2023
8c3ee5e
Readme update in progress
santiatpml Apr 5, 2023
b6476eb
Updated hugs emoji
santiatpml Apr 5, 2023
5a03402
Readme added dashboard image
santiatpml Apr 5, 2023
970b7be
Getting started in progress
santiatpml Apr 5, 2023
3938ba5
Getting started in progress
santiatpml Apr 5, 2023
7edfbf4
Added notebooks image
santiatpml Apr 5, 2023
cb9b2d4
Updated dashboard image and some edits
santiatpml Apr 5, 2023
2f33c43
Added protobuf for finbert support and text-classification readme in …
santiatpml Apr 5, 2023
47e0cea
Using sql instead of json for highlighting
santiatpml Apr 5, 2023
ad16887
Updates to text-classification
santiatpml Apr 6, 2023
8721ce8
First version of text classification
santiatpml Apr 6, 2023
daf045c
Added grammatical correctness
santiatpml Apr 6, 2023
5749330
Added zero-shot classification
santiatpml Apr 6, 2023
a2bcd1d
readme for token classification
santiatpml Apr 7, 2023
6c3a98c
Moved results from sql to json
santiatpml Apr 7, 2023
760b520
Images for different tasks
santiatpml Apr 7, 2023
fca5ef2
Updated table of contents
santiatpml Apr 7, 2023
c347f9b
Documentation for more tasks
santiatpml Apr 7, 2023
a1ef779
Updated with more tasks
santiatpml Apr 7, 2023
f94cc3c
Expanded text generation section
santiatpml Apr 7, 2023
f8891c2
Removed Table QA from toc
santiatpml Apr 7, 2023
8381fe8
Text2text generation
santiatpml Apr 10, 2023
592fc59
Added fill mask section
santiatpml Apr 10, 2023
42a6541
Started Vector DB section
santiatpml Apr 11, 2023
c728d7e
First version of vector databases
santiatpml Apr 11, 2023
3ee5b8c
Reset docker compose and docker local to original
santiatpml Apr 11, 2023
c9596a7
Update README.md
santiatpml Apr 12, 2023
bd197a6
Update README.md
santiatpml Apr 12, 2023
629ffe0
Update README.md
santiatpml Apr 12, 2023
a3f45c9
Update README.md
santiatpml Apr 12, 2023
d2bd901
Update README.md
santiatpml Apr 12, 2023
0016d07
Update README.md
santiatpml Apr 12, 2023
27e1029
Updated tagline
santiatpml Apr 12, 2023
Added protobuf for finbert support and text-classification readme in progress
santiatpml committed Apr 7, 2023
commit 2f33c4394a1d16828a4646ff2d5dfb5715e3fd07
129 changes: 107 additions & 22 deletions README.md
@@ -49,7 +49,7 @@ PostgresML is a PostgreSQL extension that enables you to perform ML training and

**Translation**

-*SQL Query*
+*SQL query*

```sql
SELECT pgml.transform(
@@ -62,7 +62,7 @@ SELECT pgml.transform(
```
*Result*

-```bash
+```json
french
------------------------------------------------------------

@@ -75,27 +75,24 @@ SELECT pgml.transform(


**Sentiment Analysis**
-*SQL Query*
+*SQL query*

```sql
SELECT pgml.transform(
-
-'{"model": "roberta-large-mnli"}'::JSONB,
-inputs => ARRAY
-[
+task => 'text-classification',
+inputs => ARRAY[
'I love how amazingly simple ML has become!',
'I hate doing mundane and thankless tasks. ☹️'
]
-
) AS positivity;
```
*Result*
-```bash
+```json
positivity
------------------------------------------------------
[
-{"label": "NEUTRAL", "score": 0.8143417835235596},
-{"label": "NEUTRAL", "score": 0.7637073993682861}
+{"label": "POSITIVE", "score": 0.9995759129524232},
+{"label": "NEGATIVE", "score": 0.9903519749641418}
]
```

@@ -144,7 +141,7 @@ cd postgresml
docker-compose up
```

-Step 3: Connect to PostgresDB with PostgresML enabled using a SQL IDE or [`psql`](https://www.postgresql.org/docs/current/app-psql.html)
+Step 3: Connect to PostgresDB with PostgresML enabled using a SQL IDE or <a href="https://www.postgresql.org/docs/current/app-psql.html" target="_blank">psql</a>
```bash
postgres://postgres@localhost:5433/pgml_development
```
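The connection URI in Step 3 can also be written as libpq keyword/value parameters, which some SQL IDEs ask for field by field. The Python sketch below splits the URI with the standard library's `urllib.parse`; `uri_to_conninfo` is a hypothetical helper, and it does no quoting or escaping, so it only suits simple values like the ones here. Whichever form you use, the port has to match the host side of the `ports` mapping in `docker-compose.yml`.

```python
from urllib.parse import urlparse


def uri_to_conninfo(uri: str) -> str:
    """Convert a postgres:// URI into a libpq keyword/value string (hypothetical helper).

    Values are not quoted or escaped, so this is only a sketch for simple URIs.
    """
    parts = urlparse(uri)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    fields = {
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "password": parts.password,
        "dbname": parts.path.lstrip("/") or None,
    }
    # Leave out anything the URI does not specify (this URI has no password).
    return " ".join(f"{key}={value}" for key, value in fields.items() if value is not None)


print(uri_to_conninfo("postgres://postgres@localhost:5433/pgml_development"))
```

The same host, port, user and database name map onto `psql`'s `-h`, `-p`, `-U` and `-d` options.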
@@ -165,18 +162,106 @@ If you want to check out the functionality without the hassle of Docker please g

### Option 2
- Use any of these popular tools to connect to PostgresML and write SQL queries
- [Apache Superset](https://superset.apache.org/)
- [DBeaver](https://dbeaver.io/)
- [Data Grip](https://www.jetbrains.com/datagrip/)
- [Postico 2](https://eggerapps.at/postico2/)
- [Popsql](https://popsql.com/)
- [Tableau](https://www.tableau.com/)
- [Power BI](https://powerbi.microsoft.com/en-us/)
- [Jupyter](https://jupyter.org/)
- [VSCode](https://code.visualstudio.com/)
- <a href="https://superset.apache.org/" target="_blank">Apache Superset</a>
- <a href="https://dbeaver.io/" target="_blank">DBeaver</a>
- <a href="https://www.jetbrains.com/datagrip/" target="_blank">Data Grip</a>
- <a href="https://eggerapps.at/postico2/" target="_blank">Postico 2</a>
- <a href="https://popsql.com/" target="_blank">Popsql</a>
- <a href="https://www.tableau.com/" target="_blank">Tableau</a>
- <a href="https://powerbi.microsoft.com/en-us/" target="_blank">PowerBI</a>
- <a href="https://jupyter.org/" target="_blank">Jupyter</a>
- <a href="https://code.visualstudio.com/" target="_blank">VSCode</a>

## NLP Tasks
- Text Classification
PostgresML integrates 🤗 Hugging Face Transformers to bring state-of-the-art NLP models into the data layer. There are tens of thousands of pre-trained models with pipelines to turn raw text in your database into useful results. Many state-of-the-art deep learning architectures have been published and made available on the Hugging Face <a href="https://huggingface.co/models" target="_blank">model hub</a>.

You can call different NLP tasks and customize them using the following SQL query.

```sql
SELECT pgml.transform(
task => TEXT OR JSONB, -- Pipeline initializer arguments
inputs => TEXT[] OR BYTEA[], -- inputs for inference
args => JSONB -- (optional) arguments to the pipeline.
)
```
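When calling `pgml.transform` from application code, it is safer to bind `task`, `inputs` and `args` as query parameters than to splice them into the SQL text. The Python sketch below only builds the query string and the parameter tuple. It assumes a psycopg-style driver that uses `%s` placeholders and adapts a Python list to a Postgres array; `build_transform_query` is a hypothetical helper, not part of PostgresML. The explicit `::TEXT` and `::JSONB` casts select between the two `task` forms listed above.

```python
import json


def build_transform_query(inputs, task="text-classification", model=None, args=None):
    """Build (sql, params) for a pgml.transform call; hypothetical helper for illustration."""
    if model is None:
        # A bare task name, bound as TEXT.
        task_sql, task_param = "%s::TEXT", task
    else:
        # Task and model together, bound as a JSONB object like the README examples.
        task_sql, task_param = "%s::JSONB", json.dumps({"task": task, "model": model})
    sql = f"SELECT pgml.transform(task => {task_sql}, inputs => %s::TEXT[]"
    params = [task_param, list(inputs)]
    if args is not None:
        # Optional pipeline arguments, also bound as JSONB.
        sql += ", args => %s::JSONB"
        params.append(json.dumps(args))
    sql += ") AS result"
    return sql, tuple(params)


sql, params = build_transform_query(
    ["Stocks rallied and the British pound gained."],
    model="ProsusAI/finbert",
)
print(sql)
print(params)
```

With an open connection, the pair can be passed straight to `cursor.execute(sql, params)`.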
### Text Classification

Text classification involves assigning a label or category to a given text. Common use cases include sentiment analysis, natural language inference, and the assessment of grammatical correctness.
![text classification](pgml-docs/docs/images/text-classification.png)

*Basic SQL query*
```sql
SELECT pgml.transform(
task => 'text-classification',
inputs => ARRAY[
'I love how amazingly simple ML has become!',
'I hate doing mundane and thankless tasks. ☹️'
]
) AS positivity;
```
*Result*
```json
positivity
------------------------------------------------------
[
{"label": "POSITIVE", "score": 0.9995759129524232},
{"label": "NEGATIVE", "score": 0.9903519749641418}
]
```
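In application code, the `positivity` array usually has to be matched back to the rows it came from. The Python sketch below assumes the predictions come back in the same order as `inputs` (the result above suggests so, but treat it as an assumption) and that the JSONB value has already been decoded into a list of dicts, which psycopg does for `jsonb` values by default. `label_inputs` and its `min_score` threshold are hypothetical names.

```python
import json


def label_inputs(inputs, predictions, min_score=0.0):
    """Pair each input with its predicted label, or None when the score is below min_score."""
    if len(inputs) != len(predictions):
        raise ValueError("expected one prediction per input")
    labelled = []
    for text, pred in zip(inputs, predictions):
        label = pred["label"] if pred["score"] >= min_score else None
        labelled.append((text, label, pred["score"]))
    return labelled


inputs = [
    "I love how amazingly simple ML has become!",
    "I hate doing mundane and thankless tasks. ☹️",
]
# The `positivity` value from the result above, decoded from JSON text.
positivity = json.loads(
    '[{"label": "POSITIVE", "score": 0.9995759129524232},'
    ' {"label": "NEGATIVE", "score": 0.9903519749641418}]'
)
for text, label, score in label_inputs(inputs, positivity, min_score=0.95):
    print(text, "->", label, round(score, 4))
```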

A DistilBERT-base-uncased checkpoint fine-tuned on the Stanford Sentiment Treebank (SST-2) is used as the default <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">model</a> for text classification.

*SQL query using a specific model*

To use one of the over 19,000 models available on Hugging Face, include the name of the desired model and its associated task as a JSONB object in the SQL query. For example, to use a RoBERTa <a href="https://huggingface.co/models?pipeline_tag=text-classification" target="_blank">model</a> trained on around 40,000 English tweets, whose class labels are POS (positive), NEG (negative), and NEU (neutral), pass the task and the model name in the JSONB object when making your query.

```sql
SELECT pgml.transform(
inputs => ARRAY[
'I love how amazingly simple ML has become!',
'I hate doing mundane and thankless tasks. ☹️'
],
task => '{"task": "text-classification",
"model": "finiteautomata/bertweet-base-sentiment-analysis"
}'::JSONB
) AS positivity;
```
*Result*
```json
positivity
-----------------------------------------------
[
{"label": "POS", "score": 0.992932200431826},
{"label": "NEG", "score": 0.975599765777588}
]
```
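The models in this section do not share a label vocabulary: the default model returns `POSITIVE`/`NEGATIVE`, this model returns `POS`/`NEG`/`NEU`, and FinBERT in the next example uses lowercase `positive`/`negative`/`neutral`. If you switch models, a small normalization step keeps downstream code unchanged. The Python sketch below covers only the label spellings that appear in this section; `CANONICAL_LABELS` and `normalize` are hypothetical names, and other models will need their own entries.

```python
# Label spellings seen in this section's results, mapped to one vocabulary.
# Assumed complete only for these models; extend it for others.
CANONICAL_LABELS = {
    "POSITIVE": "positive", "POS": "positive",
    "NEGATIVE": "negative", "NEG": "negative",
    "NEUTRAL": "neutral", "NEU": "neutral",
}


def normalize_label(label: str) -> str:
    """Map a model-specific label to positive/negative/neutral; raise on unknown labels."""
    try:
        return CANONICAL_LABELS[label.upper()]
    except KeyError:
        raise ValueError(f"unknown sentiment label: {label!r}") from None


def normalize(predictions):
    """Return a copy of pgml.transform-style predictions with normalized labels."""
    return [{"label": normalize_label(p["label"]), "score": p["score"]} for p in predictions]


print(normalize([
    {"label": "POS", "score": 0.992932200431826},
    {"label": "NEG", "score": 0.975599765777588},
]))
```

Raising on an unknown label, rather than passing it through, makes a model switch fail loudly instead of silently mixing vocabularies.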

*SQL query using industry-specific models*

By selecting a model that has been specifically designed for a particular industry, you can achieve more accurate and relevant text classification. An example of such a model is <a href="https://huggingface.co/ProsusAI/finbert" target="_blank">FinBERT</a>, a pre-trained NLP model optimized for analyzing sentiment in financial text. FinBERT was created by training the BERT language model on a large financial corpus and then fine-tuning it to classify financial sentiment. FinBERT produces softmax outputs for three labels: positive, negative, and neutral.

```sql
SELECT pgml.transform(
inputs => ARRAY[
'Stocks rallied and the British pound gained.',
'Stocks making the biggest moves midday: Nvidia, Palantir and more'
],
task => '{"task": "text-classification",
"model": "ProsusAI/finbert"
}'::JSONB
) AS market_sentiment;
```

*Result*
```json

market_sentiment
------------------------------------------------------
[
{"label": "positive", "score": 0.8983612656593323},
{"label": "neutral", "score": 0.8062630891799927}
]
```
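Each `score` above is presumably the largest of those softmax probabilities, since the result holds one label per input. The Python sketch below shows that last step. The logits and the label order are made up for illustration and are not FinBERT's real outputs; check `id2label` in the model's `config.json` for the actual order. `softmax` and `top_label` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, labels):
    """Return the most probable label and its probability, shaped like one result element."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


# Hypothetical logits and label order, for illustration only.
labels = ["positive", "negative", "neutral"]
print(top_label([2.0, -1.0, 0.5], labels))
```

Because the three probabilities sum to 1, a top score near 1/3 means the model is close to undecided between the labels.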
- Token Classification
- Table Question Answering
- Question Answering
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -10,7 +10,7 @@ services:
context: ./pgml-extension/
dockerfile: Dockerfile.local
ports:
-- "5433:5432"
+- "6453:5432"
command:
- sleep
- infinity
Binary file added pgml-docs/docs/images/text-classification.png
2 changes: 1 addition & 1 deletion pgml-extension/Dockerfile.local
@@ -11,7 +11,7 @@ RUN cat /etc/apt/sources.list
RUN apt-get update && apt-get install -y postgresql-pgml-14

# Cache this, quicker
-RUN pip3 install xgboost scikit-learn diptest torch lightgbm transformers datasets sentencepiece sentence_transformers sacremoses sacrebleu rouge
+RUN pip3 install xgboost scikit-learn diptest torch lightgbm transformers datasets sentencepiece sentence_transformers sacremoses sacrebleu rouge protobuf

COPY --chown=postgres:postgres . /app
WORKDIR /app