A modular FastAPI backend that extracts text, tables, and images from uploaded PDF files.
The extracted content is returned as a ZIP file containing:
- text.txt: extracted text
- table_X.csv: tables saved as CSV (written with a lightweight CSV writer, without pandas)
- extracted image files (image_1.png, image_2.png, …)
This project has a clean modular structure, is Dockerized for deployment, and can be consumed easily by any frontend (e.g., Streamlit).
- Upload a PDF via REST API
- Extract:
  - Text (saved as .txt)
  - Tables (saved as .csv without pandas, using Python's built-in csv module)
  - Images (saved as .png)
- Get everything in a single downloadable ZIP file
- Modular project structure (services, utils, routes)
- Dockerized for easy deployment
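The pandas-free CSV output and the single-ZIP packaging described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names `table_to_csv` and `build_zip` are hypothetical, and it assumes each table arrives as a list of rows and each image as raw PNG bytes.

```python
import csv
import io
import zipfile


def table_to_csv(rows):
    """Serialize one table (a list of rows) to CSV text with the built-in csv module."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for row in rows:
        # Assumption: empty cells may arrive as None; write them as empty strings.
        writer.writerow(["" if cell is None else str(cell) for cell in row])
    return buffer.getvalue()


def build_zip(text, tables, images):
    """Pack text, tables, and PNG bytes into one in-memory ZIP archive."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("text.txt", text)
        # File names follow the table_X.csv / image_X.png pattern, numbered from 1.
        for index, rows in enumerate(tables, start=1):
            zf.writestr(f"table_{index}.csv", table_to_csv(rows))
        for index, png_bytes in enumerate(images, start=1):
            zf.writestr(f"image_{index}.png", png_bytes)
    return archive.getvalue()
```

Building the archive in memory avoids temporary files on disk. For very large PDFs, writing to a temporary directory first may be the safer choice.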
pdf-extractor-backend/
├── app/
│   ├── main.py              # FastAPI entrypoint
│   ├── routes/
│   │   └── extract.py       # API endpoint
│   ├── services/
│   │   └── extractor.py     # PDF extraction logic
│   └── utils/
│       └── file_ops.py      # File saving helpers
├── requirements.txt         # Python dependencies
├── Dockerfile               # Container build file
└── README.md                # Documentation
git clone https://github.com/Dipesh-Ydv/pdf-extractor-backend-api.git
cd pdf-extractor-backend-api
pip install -r requirements.txt
uvicorn app.main:app --reload
Go to: http://127.0.0.1:8000/docs
POST /extract/pdf
Upload a PDF file as multipart form data under the key file.
Example using curl:
curl -X POST "http://127.0.0.1:8000/extract/pdf" \
-F "file=@sample.pdf" \
-o output.zip
- Returns a ZIP file containing:
  - text.txt
  - table_1.csv, table_2.csv, …
  - image_1.png, image_2.png, …
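Once output.zip has been downloaded, a client can inspect it with Python's zipfile module. The sketch below groups entries by the file names documented above. The helper name `summarize_archive` is hypothetical, and it assumes text.txt is UTF-8 encoded.

```python
import io
import zipfile


def summarize_archive(zip_bytes):
    """Group the archive's entries by kind, using the documented file names."""
    summary = {"text": None, "tables": [], "images": []}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name == "text.txt":
                # Assumption: the extracted text is stored as UTF-8.
                summary["text"] = zf.read(name).decode("utf-8")
            elif name.startswith("table_") and name.endswith(".csv"):
                summary["tables"].append(name)
            elif name.startswith("image_") and name.endswith(".png"):
                summary["images"].append(name)
    return summary
```

To use it, read the downloaded file in binary mode and pass the bytes in, for example `summarize_archive(open("output.zip", "rb").read())`.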
docker build -t pdf-extractor-backend .
docker run -d -p 8000:8000 pdf-extractor-backend
The API is now available at: http://localhost:8000/docs
docker tag pdf-extractor-backend:latest dipeshydv/pdf-extractor-backend:latest
docker push dipeshydv/pdf-extractor-backend:latest
docker pull dipeshydv/pdf-extractor-backend:latest
docker run -d -p 8000:8000 dipeshydv/pdf-extractor-backend:latest
See requirements.txt:
fastapi
uvicorn[standard]
python-multipart
pdfplumber
pillow
pandas
zipfile36
pyMuPdf
- Fork the project
- Create a feature branch (git checkout -b feature/xyz)
- Commit changes (git commit -m 'Add xyz')
- Push to branch (git push origin feature/xyz)
- Create a Pull Request
MIT License: free to use & modify.