Tencent/WeKnora

| English | 简体中文 | 日本語 |

Overview • Architecture • Key Features • Getting Started • API Reference • Developer Guide

💡 WeKnora - LLM-Powered Document Understanding & Retrieval Framework

📌 Overview

WeKnora is an LLM-powered framework designed for deep document understanding and semantic retrieval, especially for handling complex, heterogeneous documents.

It adopts a modular architecture that combines multimodal preprocessing, semantic vector indexing, intelligent retrieval, and large language model inference. At its core, WeKnora follows the RAG (Retrieval-Augmented Generation) paradigm, enabling high-quality, context-aware answers by combining relevant document chunks with model reasoning.
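The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration and not WeKnora's implementation: the bag-of-words `embed` function, the `retrieve` and `build_prompt` helpers, and the sample chunks are all hypothetical stand-ins for a real embedding model, vector index, and LLM call.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved chunks with the question; an LLM would complete this prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical document chunks, for illustration only.
CHUNKS = [
    "the refund policy allows returns within 30 days",
    "our office is closed on public holidays",
    "refund requests need an order number",
]
```

Calling `build_prompt(q, retrieve(q, CHUNKS))` yields the text that would be sent to the language model; in a real deployment the similarity search runs against a vector database rather than an in-memory list.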

Website: https://weknora.weixin.qq.com

✨ Latest Updates

v0.2.0 Highlights:

  • 🤖 Agent Mode: New ReACT Agent mode that can call built-in tools, MCP tools, and web search, providing comprehensive summary reports through multiple iterations and reflection
  • 📚 Multi-Type Knowledge Bases: Support for FAQ and document knowledge base types, with new features including folder import, URL import, tag management, and online entry
  • ⚙️ Conversation Strategy: Support for configuring Agent models, normal mode models, retrieval thresholds, and Prompts, with precise control over multi-turn conversation behavior
  • 🌐 Web Search: Support for extensible web search engines with built-in DuckDuckGo search engine
  • 🔌 MCP Tool Integration: Support for extending Agent capabilities through MCP, with built-in uvx and npx launchers, supporting multiple transport methods
  • 🎨 New UI: Optimized conversation interface with Agent mode/normal mode switching, tool call process display, and comprehensive knowledge base management interface upgrade
  • ⚡ Infrastructure Upgrade: Introduced MQ async task management, support for automatic database migration, and fast development mode

🔒 Security Notice

Important: Starting from v0.1.3, WeKnora includes login authentication to enhance system security. For production deployments, we strongly recommend that you:

  • Deploy WeKnora services in an internal/private network environment rather than on the public internet
  • Avoid exposing the service directly to public networks, to prevent potential information leakage
  • Configure proper firewall rules and access controls for your deployment environment
  • Regularly update to the latest version for security patches and improvements

🏗️ Architecture

weknora-architecture.png

WeKnora uses a modular design to build a complete document understanding and retrieval pipeline. Its core modules are document parsing, vector processing, the retrieval engine, and large model inference, and each component is flexibly configurable and extensible.

🎯 Key Features

  • 🤖 Agent Mode: Support for ReACT Agent mode that can use built-in tools to retrieve knowledge bases, MCP tools, and web search tools to access external services, providing comprehensive summary reports through multiple iterations and reflection
  • 🔍 Precise Understanding: Structured content extraction from PDFs, Word documents, images and more into unified semantic views
  • 🧠 Intelligent Reasoning: Leverages LLMs to understand document context and user intent for accurate Q&A and multi-turn conversations
  • 📚 Multi-Type Knowledge Bases: Support for FAQ and document knowledge base types, with folder import, URL import, tag management, and online entry capabilities
  • 🔧 Flexible Extension: All components from parsing and embedding to retrieval and generation are decoupled for easy customization
  • ⚡ Efficient Retrieval: Hybrid retrieval strategies combining keywords, vectors, and knowledge graphs, with cross-knowledge base retrieval support
  • 🌐 Web Search: Support for extensible web search engines with built-in DuckDuckGo search engine
  • 🔌 MCP Tool Integration: Support for extending Agent capabilities through MCP, with built-in uvx and npx launchers, supporting multiple transport methods
  • ⚙️ Conversation Strategy: Support for configuring Agent models, normal mode models, retrieval thresholds, and Prompts, with precise control over multi-turn conversation behavior
  • 🎯 User-Friendly: Intuitive web interface and standardized APIs for zero technical barriers
  • 🔒 Secure & Controlled: Support for local deployment and private cloud, ensuring complete data sovereignty
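One common way to combine keyword and vector results into a single hybrid ranking is Reciprocal Rank Fusion (RRF). The sketch below illustrates that general technique only; the README does not say which fusion method WeKnora uses, and the function name, the constant `k = 60`, and the document IDs are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list where it appears;
    the scores are summed and documents are sorted by total score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword search and a vector search.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that rank well in both lists ("a" and "b" here) rise to the top, while documents found by only one retriever are still kept further down.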

📊 Application Scenarios

| Scenario | Applications | Core Value |
| --- | --- | --- |
| Enterprise Knowledge Management | Internal document retrieval, policy Q&A, operation manual search | Improve knowledge discovery efficiency, reduce training costs |
| Academic Research Analysis | Paper retrieval, research report analysis, scholarly material organization | Accelerate literature review, assist research decisions |
| Product Technical Support | Product manual Q&A, technical documentation search, troubleshooting | Enhance customer service quality, reduce support burden |
| Legal & Compliance Review | Contract clause retrieval, regulatory policy search, case analysis | Improve compliance efficiency, reduce legal risks |
| Medical Knowledge Assistance | Medical literature retrieval, treatment guideline search, case analysis | Support clinical decisions, improve diagnosis quality |

🧩 Feature Matrix

| Module | Support | Description |
| --- | --- | --- |
| Agent Mode | ✅ ReACT Agent Mode | Support for using built-in tools to retrieve knowledge bases, MCP tools, and web search, with cross-knowledge base retrieval and multiple iterations |
| Knowledge Base Types | ✅ FAQ / Document | Support for creating FAQ and document knowledge base types, with folder import, URL import, tag management, and online entry |
| Document Formats | ✅ PDF / Word / Txt / Markdown / Images (with OCR / Caption) | Support for structured and unstructured documents with text extraction from images |
| Model Management | ✅ Centralized configuration, built-in model sharing | Centralized model configuration with model selection in knowledge base settings, support for multi-tenant shared built-in models |
| Embedding Models | ✅ Local models, BGE / GTE APIs, etc. | Customizable embedding models, compatible with local deployment and cloud vector generation APIs |
| Vector DB Integration | ✅ PostgreSQL (pgvector), Elasticsearch | Support for mainstream vector index backends, flexible switching for different retrieval scenarios |
| Retrieval Strategies | ✅ BM25 / Dense Retrieval / GraphRAG | Support for sparse/dense recall and knowledge graph-enhanced retrieval with customizable retrieve-rerank-generate pipelines |
| LLM Integration | ✅ Support for Qwen, DeepSeek, etc., with thinking/non-thinking mode switching | Compatible with local models (e.g., via Ollama) or external API services with flexible inference configuration |
| Conversation Strategy | ✅ Agent models, normal mode models, retrieval thresholds, Prompt configuration | Support for configuring Agent models, normal mode models, retrieval thresholds, online Prompt configuration, precise control over multi-turn conversation behavior |
| Web Search | ✅ Extensible search engines, DuckDuckGo | Support for extensible web search engines with built-in DuckDuckGo search engine |
| MCP Tools | ✅ uvx, npx launchers, Stdio/HTTP Streamable/SSE | Support for extending Agent capabilities through MCP, with built-in uvx and npx launchers, supporting three transport methods |
| QA Capabilities | ✅ Context-aware, multi-turn dialogue, prompt templates | Support for complex semantic modeling, instruction control and chain-of-thought Q&A with configurable prompts and context windows |
| E2E Testing | ✅ Retrieval+generation process visualization and metric evaluation | End-to-end testing tools for evaluating recall hit rates, answer coverage, BLEU/ROUGE and other metrics |
| Deployment Modes | ✅ Support for local deployment / Docker images | Meets private, offline deployment and flexible operation requirements, with fast development mode support |
| User Interfaces | ✅ Web UI + RESTful API | Interactive interface and standard API endpoints, with Agent mode/normal mode switching and tool call process display |
| Task Management | ✅ MQ async tasks, automatic database migration | MQ-based async task state maintenance, support for automatic database schema and data migration during version upgrades |
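To make the "recall hit rate" metric from the E2E Testing row concrete, here is a minimal sketch of a hit-rate-at-k calculation. It is an illustration of the general metric under stated assumptions, not WeKnora's evaluation code; the function name and the sample data are hypothetical.

```python
def hit_rate_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Fraction of queries whose top-k retrieved IDs contain at least one relevant ID."""
    if not retrieved:
        return 0.0
    hits = sum(
        1 for docs, gold in zip(retrieved, relevant) if gold & set(docs[:k])
    )
    return hits / len(retrieved)


# Hypothetical evaluation data: retrieved chunk IDs and gold labels per query.
retrieved_ids = [["c1", "c7", "c3"], ["c9", "c2", "c4"]]
gold_ids = [{"c3"}, {"c5"}]
```

With this data, the first query is a hit only when k is at least 3, and the second query never hits.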

🚀 Getting Started

🛠 Prerequisites

Make sure the tools used in the steps below are installed on your system: Git, Docker, and Docker Compose.

📦 Installation

① Clone the repository

# Clone the main repository
git clone https://github.com/Tencent/WeKnora.git
cd WeKnora

② Configure environment variables

# Copy example env file
cp .env.example .env

# Edit .env and set required values
# All variables are documented in the .env.example comments
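Before starting the services, it can help to check which variables from `.env.example` are still empty in your `.env`. The helper below is a minimal sketch written for this purpose; it is not part of WeKnora, it handles only simple `KEY=value` lines, and the variable names in the usage example are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Keys listed in the example file that are absent or empty in the real file."""
    example = parse_env(example_text)
    env = parse_env(env_text)
    return sorted(key for key in example if not env.get(key))


# Hypothetical file contents; real variable names are documented in .env.example.
EXAMPLE = "DB_USER=\nDB_PASSWORD=\n# storage backend\nSTORAGE_TYPE=local\n"
ACTUAL = "DB_USER=admin\nDB_PASSWORD=\nSTORAGE_TYPE=local\n"
```

In practice you would read both files from disk, for example `missing_keys(open(".env.example").read(), open(".env").read())`, and fill in whatever the function reports.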

③ Start the services (including Ollama)

Check the .env file for the images that need to be started.

./scripts/start_all.sh

or

make start-all

③.0 Start the Ollama service (optional)

ollama serve > /dev/null 2>&1 &

③.1 Activate different combinations of features

  • Minimum core services
docker compose up -d
  • All features enabled
docker-compose --profile full up -d
  • Tracing logs required
docker-compose --profile jaeger up -d
  • Neo4j knowledge graph required
docker-compose --profile neo4j up -d
  • Minio file storage service required
docker-compose --profile minio up -d
  • Multiple options combination
docker-compose --profile neo4j --profile minio up -d
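The profile flags above compose mechanically, which makes them easy to script. The sketch below builds the argument list for a chosen combination; the helper name is hypothetical, and the profile names are the ones listed above.

```python
KNOWN_PROFILES = ("full", "jaeger", "neo4j", "minio")


def compose_up_command(profiles: tuple[str, ...] = ()) -> list[str]:
    """Build a `docker-compose ... up -d` command for the given profiles.

    With no profiles, this starts only the minimum core services.
    """
    unknown = [p for p in profiles if p not in KNOWN_PROFILES]
    if unknown:
        raise ValueError(f"unknown profile(s): {', '.join(unknown)}")
    cmd = ["docker-compose"]
    for profile in profiles:
        cmd += ["--profile", profile]
    return cmd + ["up", "-d"]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start the selected services.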

④ Stop the services

./scripts/start_all.sh --stop
# Or
make stop-all

🌐 Access Services

Once started, services will be available at:

  • Web UI: http://localhost
  • Backend API: http://localhost:8080
  • Jaeger Tracing: http://localhost:16686
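A quick way to confirm the services are up is to probe each address. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and it treats any HTTP response (even an error status) as proof that the service is listening.

```python
import urllib.error
import urllib.request

SERVICES = {
    "Web UI": "http://localhost",
    "Backend API": "http://localhost:8080",
    "Jaeger Tracing": "http://localhost:16686",
}

# Bypass any HTTP proxy configured in the environment, since these are local URLs.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the given URL."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # The server answered, just not with a success status.
    except (urllib.error.URLError, OSError):
        return False  # Connection refused, timeout, DNS failure, etc.


def report() -> dict[str, bool]:
    """Probe every known service and map its name to reachability."""
    return {name: is_reachable(url) for name, url in SERVICES.items()}
```

Calling `report()` after startup returns one boolean per service; Jaeger will only be reachable if the jaeger profile was enabled.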

🔌 Using WeChat Dialog Open Platform

WeKnora is the core technology framework behind the WeChat Dialog Open Platform, which offers a more convenient way to use it:

  • Zero-code Deployment: Simply upload knowledge to quickly deploy intelligent Q&A services within the WeChat ecosystem, achieving an "ask and answer" experience
  • Efficient Question Management: Support for categorized management of high-frequency questions, with rich data tools to keep answers accurate, reliable, and easy to maintain
  • WeChat Ecosystem Integration: Through the WeChat Dialog Open Platform, WeKnora's intelligent Q&A capabilities can be integrated into WeChat Official Accounts, Mini Programs, and other WeChat scenarios, enhancing user interaction experiences

🔗 Access WeKnora via MCP Server

1️⃣ Clone the repository

git clone https://github.com/Tencent/WeKnora

2️⃣ Configure MCP Server

We recommend following the MCP Configuration Guide for configuration.

Configure the MCP client to connect to the server:

{
  "mcpServers": {
    "weknora": {
      "args": [
        "path/to/WeKnora/mcp-server/run_server.py"
      ],
      "command": "python",
      "env": {
        "WEKNORA_API_KEY": "Open your WeKnora instance, open developer tools, and copy the x-api-key request header value (it starts with sk)",
        "WEKNORA_BASE_URL": "http(s)://your-weknora-address/api/v1"
      }
    }
  }
}
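The client configuration above can also be generated programmatically, which avoids hand-editing JSON. This is a minimal sketch: the function name is hypothetical, and the API key and base URL in the usage example are placeholder assumptions that you would replace with your own values.

```python
import json


def mcp_client_config(server_script: str, api_key: str, base_url: str) -> str:
    """Render the MCP client configuration for a WeKnora server as JSON text."""
    config = {
        "mcpServers": {
            "weknora": {
                "command": "python",
                "args": [server_script],
                "env": {
                    "WEKNORA_API_KEY": api_key,
                    "WEKNORA_BASE_URL": base_url,
                },
            }
        }
    }
    return json.dumps(config, indent=2)


# Placeholder values (assumptions): substitute your own key and address.
example = mcp_client_config(
    "path/to/WeKnora/mcp-server/run_server.py",
    "sk-your-key",
    "http://localhost:8080/api/v1",
)
```

Write the resulting string to the configuration file your MCP client reads; the file location depends on the client.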

Alternatively, install the server with pip and run it directly over stdio:

pip install weknora-mcp-server
python -m weknora-mcp-server

🔧 Initialization Configuration Guide

To help users configure models quickly and reduce trial-and-error costs, we've improved the original configuration-file initialization method by adding a Web UI for model configuration. Before using it, make sure the code is updated to the latest version, then follow the steps below. If this is your first time using this project, skip steps ① and ② and go directly to steps ③ and ④.

① Stop the services

./scripts/start_all.sh --stop

② Clear existing data tables (recommended when no important data exists)

make clean-db

③ Compile and start services

./scripts/start_all.sh

④ Access Web UI

http://localhost

On your first visit, you will be automatically redirected to the registration/login page. After completing registration, please create a new knowledge base and finish the relevant settings on its configuration page.

📱 Interface Showcase

Web UI Interface

Knowledge Base Management
Conversation Settings
Agent Mode Tool Call Process

Knowledge Base Management: Create FAQ and document knowledge bases and import content by drag-and-drop, folder import, or URL import. The system automatically identifies document structure, extracts core knowledge, and builds indexes. Tag management and online entry are supported, and processing progress and document status are displayed clearly.

Agent Mode: The ReACT Agent can use built-in tools to retrieve from knowledge bases, and call user-configured MCP tools and web search tools to access external services, producing comprehensive summary reports through multiple rounds of iteration and reflection. Cross-knowledge base retrieval is supported, so multiple knowledge bases can be selected and searched simultaneously.
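The iterate-and-observe pattern behind an agent of this kind can be sketched as a short loop. This is a generic illustration and not WeKnora's agent: the `run_agent` function, the `decide` callback (which stands in for the LLM choosing the next action), and the tool names are all hypothetical.

```python
from typing import Callable

Tool = Callable[[str], str]
# decide() returns ("call", tool_name, tool_input) or ("finish", answer, "").
Decision = tuple[str, str, str]


def run_agent(
    question: str,
    decide: Callable[[str, list[str]], Decision],
    tools: dict[str, Tool],
    max_steps: int = 5,
) -> str:
    """Alternate between choosing an action and observing its result."""
    observations: list[str] = []
    for _ in range(max_steps):
        action, name, arg = decide(question, observations)
        if action == "finish":
            return name  # For "finish", the second field carries the answer.
        tool = tools.get(name)
        result = tool(arg) if tool else f"unknown tool: {name}"
        observations.append(f"{name}({arg}) -> {result}")
    return "No answer within the step limit."


def scripted_policy(question: str, observations: list[str]) -> Decision:
    """Stand-in for the model: search once, then answer from what was observed."""
    if not observations:
        return ("call", "search_kb", question)
    return ("finish", f"Answer based on {len(observations)} observation(s)", "")


demo_tools: dict[str, Tool] = {"search_kb": lambda q: "3 chunks found"}
```

The step limit bounds the number of iterations, and unknown tool names are reported back as observations so the policy can recover instead of crashing.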

Conversation Strategy: Configure Agent models, normal mode models, retrieval thresholds, and Prompts online, with precise control over multi-turn conversation behavior and how retrieval is executed. The conversation input box supports switching between Agent mode and normal mode, enabling or disabling web search, and selecting the conversation model.
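As an illustration of what a retrieval threshold does, the sketch below drops low-scoring chunks and keeps the best few. The `RetrievalSettings` fields, their default values, and the sample scores are assumptions made for the example and do not reflect WeKnora's actual configuration schema.

```python
from dataclasses import dataclass


@dataclass
class RetrievalSettings:
    """Hypothetical settings: minimum similarity score and maximum chunk count."""
    score_threshold: float = 0.5
    top_k: int = 3


def select_chunks(scored: list[tuple[str, float]], settings: RetrievalSettings) -> list[str]:
    """Keep chunks at or above the threshold, best first, limited to top_k."""
    kept = [(chunk, score) for chunk, score in scored if score >= settings.score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in kept[: settings.top_k]]


# Hypothetical (chunk, score) pairs as returned by a retriever.
candidates = [("intro", 0.42), ("pricing", 0.91), ("faq", 0.67), ("legal", 0.55)]
```

Raising the threshold trades recall for precision: fewer chunks reach the model, but those that do are more likely to be relevant.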

Document Knowledge Graph

WeKnora supports transforming documents into knowledge graphs, displaying the relationships between different sections of the documents. Once the knowledge graph feature is enabled, the system analyzes and constructs an internal semantic association network that not only helps users understand document content but also provides structured support for indexing and retrieval, enhancing the relevance and breadth of search results.
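The idea of using section relationships to broaden search results can be sketched as neighbor expansion over a small graph. This is a simplified illustration, not WeKnora's knowledge graph implementation: the function names and the section names are hypothetical, and relations are treated as undirected.

```python
from collections import defaultdict


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from (section, related section) pairs."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def expand_hits(hits: list[str], graph: dict[str, set[str]], hops: int = 1) -> list[str]:
    """Extend retrieval hits with sections reachable within the given number of hops."""
    seen = list(hits)
    frontier = list(hits)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in sorted(graph.get(node, ())):
                if neighbor not in seen:
                    seen.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


# Hypothetical relations between document sections.
section_graph = build_graph([("intro", "setup"), ("setup", "config"), ("config", "faq")])
```

Original hits stay first in the result, followed by related sections in order of distance, so the extra context never displaces the direct matches.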

For detailed configuration, please refer to the Knowledge Graph Configuration Guide.

MCP Server

Please refer to the MCP Configuration Guide for the necessary setup.

📘 API Reference

Troubleshooting FAQ: Troubleshooting FAQ

Detailed API documentation is available at: API Docs

Developer Guide

⚡ Fast Development Mode (Recommended)

If you need to frequently modify code, you don't need to rebuild Docker images every time! Use fast development mode:

# Method 1: Using Make commands (Recommended)
make dev-start      # Start infrastructure
make dev-app        # Start backend (new terminal)
make dev-frontend   # Start frontend (new terminal)

# Method 2: One-click start
./scripts/quick-dev.sh

# Method 3: Using scripts
./scripts/dev.sh start     # Start infrastructure
./scripts/dev.sh app       # Start backend (new terminal)
./scripts/dev.sh frontend  # Start frontend (new terminal)

Development Advantages:

  • ✅ Frontend changes hot-reload automatically (no restart needed)
  • ✅ Backend changes restart quickly (5-10 seconds; Air hot-reload is supported)
  • ✅ No need to rebuild Docker images
  • ✅ Supports IDE breakpoint debugging

Detailed Documentation: Development Environment Quick Start

📁 Directory Structure

WeKnora/
├── client/      # Go client
├── cmd/         # Main entry point
├── config/      # Configuration files
├── docker/      # Docker image files
├── docreader/   # Document parsing app
├── docs/        # Project documentation
├── frontend/    # Frontend app
├── internal/    # Core business logic
├── mcp-server/  # MCP server
├── migrations/  # DB migration scripts
└── scripts/     # Shell scripts

🤝 Contributing

We welcome community contributions! For suggestions, bugs, or feature requests, please submit an Issue or directly create a Pull Request.

🎯 How to Contribute

  • 🐛 Bug Fixes: Discover and fix system defects
  • ✨ New Features: Propose and implement new capabilities
  • 📚 Documentation: Improve project documentation
  • 🧪 Test Cases: Write unit and integration tests
  • 🎨 UI/UX Enhancements: Improve user interface and experience

📋 Contribution Process

  1. Fork the project to your GitHub account
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add amazing feature'
  4. Push the branch: git push origin feature/amazing-feature
  5. Create a Pull Request with a detailed description of the changes

🎨 Code Standards

📝 Commit Guidelines

Follow the Conventional Commits standard:

feat: Add document batch upload functionality
fix: Resolve vector retrieval precision issue
docs: Update API documentation
test: Add retrieval engine test cases
refactor: Restructure document parsing module
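Commit subjects in this style are easy to check automatically, for example in a local commit hook. The validator below is a minimal sketch: the function name is hypothetical, and the allowed types are limited to the ones shown in the examples above, although Conventional Commits permits others.

```python
import re

# Types taken from the examples above; extend this tuple as your project requires.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor")

# type, optional (scope), optional "!" for breaking changes, then ": subject".
COMMIT_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def is_valid_commit(message: str) -> bool:
    """Check that the first line of a commit message follows the convention."""
    if not message:
        return False
    match = COMMIT_PATTERN.match(message.splitlines()[0])
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

Only the first line is checked, so a longer body and footers can follow freely.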

👥 Contributors

Thanks to these excellent contributors:

Contributors

📄 License

This project is licensed under the MIT License. You are free to use, modify, and distribute the code with proper attribution.
