ClickGraph - A high-performance, stateless, read-only graph query translator for ClickHouseยฎ, written in Rust, with Neo4j ecosystem compatibility - Cypher and Bolt Protocol 5.8 support.
Note: ClickGraph is under active development and is development-ready for view-based graph analytics applications.
- There are huge volumes of data in ClickHouse databases, viewing them as graph data with graph analytics capability brings another level of abstraction and boosts productivity with graph tools, in ways beyond relational analytics alone.
- Research shows relational analytics with columnar stores and vectorized execution engines like ClickHouse provide superior analytical performance and scalability to graph-native technologies, which usually leverage explicit adjacency representations and are more suitable for local-area graph traversals.
- View-based graph analytics offer the benefits of zero-ETL without the hassle of data migration and duplicate cost, yet better performance and scalability than most of the native graph analytics options.
- Neo4j Bolt protocol support gives instant access to the tools available, including graph visualization and the MCP server.
v0.6.0 brings elegant semantic validation for variable-length paths and robust multi-table label support.
- VLP Transitivity Check - Semantic validation prevents invalid recursive patterns
- Multi-table label fixes - Denormalization metadata, type inference improvements
- Performance optimization - Skip unnecessary CTE generation for non-transitive relationships
- Test status - 2446/3359 passing (72.8%), 98.6% unit test coverage
- Semantic VLP Validation - Automatically detects non-transitive relationships (e.g., IPโDomain) and converts to single-hop patterns
- Architecture - Analyzer-level validation instead of tactical SQL fixes
- Example:
(IP)-[DNS_REQUESTED*]->(Domain)โ Simple single-hop query (Domain nodes can't start DNS_REQUESTED edges) - ClickHouse function passthrough - Make all ClickHouse functions available in Cypher through the pass-thru feature
- Type inference - Bottom-up processing for multi-hop pattern label resolution
- Denormalization metadata - Copy
is_denormalized,from_node_properties,to_node_propertiesfrom schema - VLP ID columns - Use relationship schema columns (
from_id/to_id) - Cycle prevention - Skip for single-hop patterns (can't have cycles)
-
Composite Node IDs - Use property combinations instead of single columns for node identity
- Example:
node_id: [tenant_id, user_id]for multi-tenant users,[country, city]for locations - Use case: Multi-tenant systems, hierarchical data, time-series events, complex entity relationships
- Benefit: Model real-world data without artificial ID columns
- Example:
-
ClickHouse Function Pass-through - Use ClickHouse functions directly in Cypher expressions
- Syntax:
RETURN ch.function_name(args)orRETURN chagg.aggregate_function(args) - Use cases:
ch.cityHash64(),ch.murmurHash3_64(),chagg.uniq(), specialized aggregations - Example:
RETURN ch.cityHash64(u.email) AS hash
- Syntax:
-
LDBC SNB (Work In Progress) - Social Network Benchmark implementation
- Status: Schema mapping complete, query adaptation in progress
- Scale: Designed for scale factors 1-100 (1K-100M edges)
- Purpose: Industry-standard graph database benchmarking
-
MCP Server Validation - Model Context Protocol integration testing
- Status: Active development and validation
- Purpose: LLM tool integration for graph queries
- Error handling - Removed
.unwrap()landmines, proper Result/Option propagation - Schema validation - Comprehensive checks for empty ID columns, missing properties
- Test infrastructure - Unified test data setup scripts, reproducible fixtures
- Code quality - Modular architecture, reduced duplication, cleaner abstractions
- Cross-table queries - Zeek log correlation, multi-table JOINs (v0.5.4)
- Smart type inference - Automatic node/relationship type inference (v0.5.4)
- FK-Edge patterns - File systems, org charts with VLP (v0.5.4)
- Polymorphic edges - Single table with multiple edge types (v0.5.2)
- Multi-tenancy - Parameterized views, 99% cache reduction (v0.5.0)
- Query cache - LRU caching, 10-100x speedup (v0.5.0)
- Bolt Protocol 5.8 - Full Neo4j compatibility (v0.5.0)
v0.5.4 (December 7, 2025) - Cross-Table Queries & Smart Inference
- Cross-table query support - Zeek log correlation and multi-table JOINs
- Smart type inference - Automatic node and relationship type inference
- FK-Edge patterns - File systems, org charts with variable-length paths
- OnTime Flights benchmark - 20M row real-world dataset validation
- See CHANGELOG.md for details
v0.5.3 (December 2, 2025) - Cypher Functions Release
label()function: Get scalar label for nodes- EXISTS subqueries: Filter based on pattern existence
- WITH + MATCH chaining: Multi-stage query pipelines
- Regex matching (
=~): Pattern matching via ClickHousematch() collect()function: Aggregate to arrays viagroupArray()
v0.5.2 (November 30, 2025) - Schema Variations Release
Comprehensive support for advanced schema patterns including polymorphic edges, coupled edges, and denormalized tables.
- Polymorphic Edge Tables: Single table with multiple edge types
- Coupled Edge Optimization: Automatic JOIN elimination for same-table edges
- VLP + UNWIND Support: Path decomposition with ARRAY JOIN
- OPTIONAL MATCH + VLP Fix: Anchor nodes preserved when no path exists
v0.5.1 (November 21, 2025) - Docker Hub Release ๐ณ
- Official Docker Hub availability:
docker pull genezhang/clickgraph:latest - Multi-platform support (linux/amd64, linux/arm64)
- RETURN DISTINCT support
- Docker image validation suite
v0.5.0 (November 2025) - Phase 2 Complete
- Multi-tenancy with parameterized views (99% cache memory reduction)
- SET ROLE RBAC support for column-level security
- Query cache with LRU eviction (10-100x speedup)
- Parameterized queries with Neo4j-compatible $param syntax
- Auto-schema discovery from ClickHouse metadata
- Complete Bolt Protocol 5.8 implementation
- 22% code reduction in core modules
See CHANGELOG.md for complete release history.
- Read-Only Graph Analytics: Translates Cypher graph queries into optimized ClickHouse SQL for analytical workloads
- ClickHouse-native: Leverages ClickHouse for graph queries on shared data with SQL, merging OLAP speed with graph-analysis power
- Stateless Architecture: Offloads all query execution to ClickHouseโno extra datastore required
- Cypher Query Language: Industry-standard Cypher read syntax for intuitive, expressive property-graph querying
- Parameterized Queries: Neo4j-compatible parameter support (
$paramsyntax) for SQL injection prevention and query plan caching - Query Cache: Development-ready LRU caching with 10-100x speedup for repeated query translations, SQL template reuse with parameter substitution, and Neo4j-compatible CYPHER replan options
- Variable-Length Paths: Recursive traversals with
*1..3syntax using ClickHouse WITH RECURSIVE CTEs - Path Variables & Functions: Capture and analyze path data with
length(p),nodes(p),relationships(p)functions - Analytical-scale Performance: Optimized for very large datasets and complex multi-hop traversals
- Query Performance Metrics: Phase-by-phase timing with HTTP headers and structured logging for monitoring and optimization
- Bolt Protocol v5.8: โ Fully functional - Complete query execution, authentication, and multi-database support. Compatible with Neo4j drivers, cypher-shell, and Neo4j Browser.
- HTTP REST API: โ Fully functional - Complete query execution with parameters, aggregations, and all Cypher features
- Multi-Schema Support: โ
Fully working - Complete schema isolation with per-request schema selection:
- USE clause: Cypher
USE database_namesyntax (highest priority) - Session/request parameter: Bolt session database or HTTP
schema_nameparameter - Default schema: Fallback to "default" schema
- Schema isolation: Different schemas map same labels to different ClickHouse tables
- USE clause: Cypher
- Dual Server Architecture: HTTP and Bolt servers running simultaneously (both production-ready)
- Authentication Support: Multiple authentication schemes including basic auth
- Zero Migration: Transform existing relational data into graph format through YAML configuration
- Auto-Discovery: Automatically query ClickHouse
system.columnsfor property mappings withauto_discover_columns: true- no manual mapping needed! - Dynamic Schema Loading: Runtime schema registration via HTTP API (
POST /schemas/load) with full YAML content support - Native Performance: Leverages ClickHouse's columnar storage and query optimization
- Robust Implementation: Comprehensive validation, error handling, and optimization passes
ClickGraph runs as a lightweight stateless query translator alongside ClickHouse:
flowchart LR
Clients["๐ฑ Graph Clients<br/><br/>HTTP/REST<br/>Bolt Protocol<br/>(Neo4j tools)"]
ClickGraph["โก ClickGraph<br/><br/>Cypher โ SQL<br/>Translator<br/><br/>:8080 (HTTP)<br/>:7687 (Bolt)"]
ClickHouse["๐พ ClickHouseยฎ<br/><br/>Columnar Storage<br/>Query Engine"]
Clients -->|Cypher| ClickGraph
ClickGraph -->|SQL| ClickHouse
ClickHouse -->|Results| ClickGraph
ClickGraph -->|Results| Clients
style ClickGraph fill:#e1f5ff,stroke:#0288d1,stroke-width:3px
style ClickHouse fill:#fff3e0,stroke:#f57c00,stroke-width:3px
style Clients fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
Three-tier architecture: Graph clients โ ClickGraph translator โ ClickHouse database
Both protocols share the same underlying query engine and ClickHouse backend. Both are production-ready.
New to ClickGraph? See the Getting Started Guide for a complete walkthrough.
Pull and run the pre-built image:
# Pull the latest image
docker pull genezhang/clickgraph:latest
# Start ClickHouse only
docker-compose up -d clickhouse-service
# Run ClickGraph from Docker Hub image
docker run -d \
--name clickgraph \
--network clickgraph_default \
-p 8080:8080 \
-p 7687:7687 \
-e CLICKHOUSE_URL="http://clickhouse-service:8123" \
-e CLICKHOUSE_USER="test_user" \
-e CLICKHOUSE_PASSWORD="test_pass" \
-e GRAPH_CONFIG_PATH="/app/schemas/social_benchmark.yaml" \
-v $(pwd)/benchmarks/social_network/schemas:/app/schemas:ro \
genezhang/clickgraph:latestNote: CLICKHOUSE_DATABASE is optional (defaults to "default"). All queries use fully-qualified table names from your schema config.
Or use docker-compose (uses published image by default):
```bash
docker-compose up -d
Build and run locally with Rust:
# Prerequisites: Rust toolchain (1.85+) and Docker for ClickHouse
# 1. Clone and start ClickHouse
git clone https://github.com/genezhang/clickgraph
cd clickgraph
docker-compose up -d clickhouse-service
# 2. Build ClickGraph
cargo build --release
# 3. Set required environment variables and run
export CLICKHOUSE_URL="http://localhost:8123"
export CLICKHOUSE_USER="test_user"
export CLICKHOUSE_PASSWORD="test_pass"
export GRAPH_CONFIG_PATH="./benchmarks/social_network/schemas/social_benchmark.yaml"
cargo run --bin clickgraph
# Or inline (all on one command):
CLICKHOUSE_URL="http://localhost:8123" \
CLICKHOUSE_USER="test_user" \
CLICKHOUSE_PASSWORD="test_pass" \
GRAPH_CONFIG_PATH="./benchmarks/social_network/schemas/social_benchmark.yaml" \
cargo run --bin clickgraph -- --http-port 8080 --bolt-port 7687
โ ๏ธ Required Environment Variables:GRAPH_CONFIG_PATHis required to start the server. Without it, ClickGraph won't know how to map your ClickHouse tables to graph nodes and edges.
Query via HTTP API:
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{"query": "MATCH (u:User) RETURN u.full_name LIMIT 5"}'Or connect with Neo4j tools (cypher-shell, Neo4j Browser):
cypher-shell -a bolt://localhost:7687 -u neo4j -p password# Simple query
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{"query": "RETURN 1 as test"}'
# Query with parameters
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{"query": "MATCH (u:User) WHERE u.age >= $minAge RETURN u.full_name, u.age", "parameters": {"minAge": 25}}'-
Bolt Protocol (Neo4j driver compatibility):
from neo4j import GraphDatabase driver = GraphDatabase.driver("bolt://localhost:7687") with driver.session() as session: result = session.run("RETURN 1 as test") for record in result: print(record["test"]) # Outputs: 1 driver.close()
-
Use the USE clause for multi-database queries:
-- Query specific database using Neo4j-compatible USE clause USE social_network MATCH (u:User)-[:FOLLOWS]->(friend) RETURN u.name, collect(friend.name) AS friends -- USE overrides session/request parameters USE ecommerce MATCH (p:Product) WHERE p.price > 100 RETURN p.name
ClickGraph includes an interactive command-line client for easy querying (inherited from Brahmand):
# Build the client
cargo build --release -p clickgraph-client
# Run with default settings (connects to http://localhost:8080)
./target/release/clickgraph-client
# Or connect to a custom server
./target/release/clickgraph-client --url http://your-server:8080clickgraph-client :) MATCH (u:User) RETURN u.name LIMIT 5
Tim Duncan
Tony Parker
LaMarcus Aldridge
Manu Ginobili
Boris Diaw
clickgraph-client :) MATCH (u:User)-[:FOLLOWS]->(friend) WHERE u.user_id = 101 RETURN friend.name
Tim Duncan
LaMarcus Aldridge
clickgraph-client :) <Ctrl+C or Ctrl+D to exit>
I'll be back:)
Features:
- Interactive REPL with history support
- Automatic result formatting (JSON or text)
- Command history (up/down arrows)
- Simple connection to any ClickGraph server
Use ClickGraph with AI assistants like Claude through the Model Context Protocol (MCP).
Zero Configuration: ClickGraph's Bolt protocol is compatible with Neo4j's MCP server out-of-the-box!
-
Start ClickGraph with Bolt protocol enabled (default):
docker run -d -p 8080:8080 -p 7687:7687 genezhang/clickgraph:latest
-
Configure Claude Desktop to connect via Neo4j's MCP server:
Add to Claude Desktop's MCP configuration (
claude_desktop_config.json):{ "mcpServers": { "clickgraph": { "command": "npx", "args": [ "@modelcontextprotocol/server-neo4j", "bolt://localhost:7687" ], "env": { "NEO4J_USERNAME": "neo4j", "NEO4J_PASSWORD": "password" } } } } -
Query with natural language:
- "Show me all users in the graph"
- "Find users who follow each other"
- "What's the average age of users by country?"
Features Available:
- โ Natural language to Cypher query translation
- โ Schema introspection and discovery
- โ Complex graph pattern queries
- โ Aggregations and analytics
๐ Complete MCP Setup Guide โ
Transform existing relational data into graph format through YAML configuration:
Example: Map your users and user_follows tables to a social network graph:
views:
- name: social_network
nodes:
user: # Node label in Cypher queries
source_table: users
node_id: user_id
property_mappings:
name: full_name
relationships:
follows: # Relationship type in Cypher queries
source_table: user_follows
from_node: user # Source node label
to_node: user # Target node label
from_id: follower_id
to_id: followed_idThen query with standard Cypher:
MATCH (u:user)-[:follows]->(friend:user)
WHERE u.name = 'Alice'
RETURN friend.nameOPTIONAL MATCH for handling optional patterns:
-- Find all users and their friends (if any)
MATCH (u:user)
OPTIONAL MATCH (u)-[:follows]->(friend:user)
RETURN u.name, friend.name
-- Mixed required and optional patterns
MATCH (u:user)-[:authored]->(p:post)
OPTIONAL MATCH (p)-[:liked_by]->(liker:user)
RETURN u.name, p.title, COUNT(liker) as likesโ Generates efficient LEFT JOIN SQL with NULL handling for unmatched patterns
โก Quick Start - 5 Minutes to Graph Analytics
Perfect for first-time users! Simple social network demo with:
- 3 users, friendships - minimal setup with Memory tables
- Basic Cypher queries - find friends, mutual connections
- HTTP & Neo4j drivers - both integration methods
- 5-minute setup - zero to working graph analytics
๐ E-commerce Analytics - Comprehensive Demo
Complete end-to-end demonstration with:
- Complete data setup with realistic e-commerce schema (customers, products, orders, reviews)
- Advanced graph queries for customer segmentation, product recommendations, and market basket analysis
- Real-world workflows with both HTTP REST API and Neo4j driver examples
- Performance optimization techniques and expected benchmarks
- Business insights from customer journeys, seasonal patterns, and cross-selling opportunities
Start with Quick Start, then explore E-commerce Analytics for advanced usage! ๐ฏ
ClickGraph supports flexible configuration via command-line arguments and environment variables:
# View all options
cargo run --bin clickgraph -- --help
# Custom ports
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688
# Disable Bolt protocol (HTTP only)
cargo run --bin clickgraph -- --disable-bolt
# Custom host binding
cargo run --bin clickgraph -- --http-host 127.0.0.1 --bolt-host 127.0.0.1
# Configure CTE depth limit for variable-length paths (default: 100)
cargo run --bin clickgraph -- --max-cte-depth 150
export CLICKGRAPH_MAX_CTE_DEPTH=150 # Or via environment variableSee docs/configuration.md for complete configuration documentation.
- Getting Started - Setup walkthrough and first queries
- Features Overview - Comprehensive feature list
- API Documentation - HTTP REST API and Bolt protocol
- Configuration Guide - Server configuration and CLI options
- Development Process - โญ 5-phase feature workflow (START HERE for contributors!)
- Current Status - What works now, what's in progress
- Known Issues - Active bugs and limitations
Current Version: v0.5.3 (December 2, 2025)
- โ Rust Unit Tests: 534/534 passing (100%)
- โ Schema Variation Tests: 73 tests across 4 schema types
- โ Benchmarks: 14/14 passing (100%)
- โ E2E Tests: Bolt 4/4, Cache 5/5 (100%)
- โ Polymorphic & Coupled Edge Tables: Advanced schema patterns
- โ Multi-Tenancy: Parameterized views with row-level security
- โ Bolt Protocol 5.8: Full Neo4j driver compatibility
- โ Query Cache: 10-100x speedup with LRU eviction
- โ
Parameterized Queries: Neo4j-compatible
$paramsyntax - โ Variable-Length Paths: Recursive CTEs with configurable depth
โ ๏ธ Read-Only Engine: Write operations not supported by designโ ๏ธ Anonymous Nodes: Use named nodes for better SQL generation
See STATUS.md and KNOWN_ISSUES.md for details.
Phase 1 (v0.4.0) โ - Query cache, parameters, Bolt protocol, Neo4j functions Phase 2 (v0.5.0) โ - Multi-tenancy, RBAC, auto-schema discovery Phase 2.5 (v0.5.2) โ - Schema variations (polymorphic, coupled edges) Phase 2.6 (v0.5.3) โ - Cypher functions (label, EXISTS, regex, collect) Phase 3 (v0.6.0) ๐ - Additional graph algorithms, query optimization
See ROADMAP.md for detailed feature tracking.
ClickGraph welcomes contributions! Key areas for development:
- Additional Cypher language features (Phase 3)
- Query optimization improvements
- Neo4j compatibility enhancements
- Performance benchmarking
- Documentation improvements
Development Resources:
- DEVELOPMENT_PROCESS.md - โญ 5-phase feature development workflow (START HERE!)
- STATUS.md - Current project status and test results
- KNOWN_ISSUES.md - Active bugs and limitations
- .github/copilot-instructions.md - Architecture and conventions
ClickGraph is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
This project is developed on a forked repo of Brahmand with zero-ETL view-based graph querying, Neo4j ecosystem compatibility and enterprise deployment capabilities.
