This repository provides an example of a modular, containerized system for performing summarization tasks on text, audio, and video content using LangChain. The application is designed to process documents through a microservice architecture managed with Podman Compose, leveraging GPU-accelerated language models and supporting caching and storage.
- Multi-format support: Processes text, audio, and video files with dynamic Loader selection based on file type.
- Streaming & Batching: Supports both streaming summaries and batch processing based on user preferences.
- GPU-accelerated services: Separates heavy GPU dependencies into their own containers for efficient resource utilization.
- Modular architecture: Enables easy swapping or scaling of services, such as replacing the LLM or transcription backends.
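The dynamic loader selection mentioned above can be sketched roughly as follows. Note that the extension-to-loader mapping and the `select_loader` function are illustrative assumptions, not the project's actual implementation, which wires file types to LangChain document loaders and transcription backends:

```python
from pathlib import Path

# Hypothetical mapping from file extension to a loader kind; the real
# project dispatches to concrete loader/transcription services instead
# of plain strings.
LOADER_BY_EXTENSION = {
    ".txt": "text",
    ".pdf": "text",
    ".mp3": "audio",
    ".wav": "audio",
    ".mp4": "video",
}


def select_loader(filename: str) -> str:
    """Pick a loader kind from the file extension, defaulting to 'text'."""
    extension = Path(filename).suffix.lower()
    return LOADER_BY_EXTENSION.get(extension, "text")
```

Keeping this dispatch table in one place is what makes it cheap to add a new format or swap a backend without touching the rest of the pipeline.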
For more detailed information, read this article.
- Podman: Install Podman and Podman Compose.
- NVIDIA Container Toolkit: Required for GPU-enabled containers. Refer to the NVIDIA documentation.
- Clone the repository:

  ```shell
  git clone https://github.com/lfenzo/langchain-summarization.git
  cd langchain-summarization
  ```
- Start the containers with Podman Compose:

  ```shell
  podman compose up
  ```
  Ensure that your system supports GPU passthrough for the LLM and transcription services.
- Make sure that the LLM of your choice is pulled in the Ollama server:

  ```shell
  podman exec langchain-summarization_ollama-server_1 ollama pull <your_llm_here>
  ```
- Test the summarization endpoint in the FastAPI service available at `http://0.0.0.0:8000`:

  ```shell
  curl -X POST "http://0.0.0.0:8000/summarize" \
      -F "file=@path/to/your/file.pdf;type=application/octet-stream" \
      --no-buffer
  ```