This project is a modified version of YCSB (Yahoo! Cloud Serving Benchmark) and is designed to benchmark ArangoDB, MongoDB, and Redis only.
- Getting Started
- Database Setup
- Running the Benchmarks
- Methodology and Experiment
- ArangoDB Query Translator
- Building from Source
- Running Multiple Instances and Latency Percentiles
- License and Notices
Clone this repository to your local machine:
git clone https://github.com/<your-username>/COMSE6156-midterm-experiment.git
cd COMSE6156-midterm-experiment
Ensure that the following databases are installed or available via Docker:
- ArangoDB (version 3.10.5)
- MongoDB (version 6.0)
- Redis (version 7.0)
You can run them using Docker containers on a single VM for local testing.
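As a rough illustration of the single-VM setup described above, the sketch below composes one `docker run` command per database and prints it instead of executing it. The image tags (`arangodb:3.10.5`, `mongo:6.0`, `redis:7.0`), the `ycsb-` container names, and the use of `ARANGO_NO_AUTH=1` are assumptions for local testing, not settings taken from this repository; the ports are each database's default.

```python
import shlex

# Assumed image tags and default ports; adjust to match your environment.
SERVICES = {
    "arangodb": {"image": "arangodb:3.10.5", "port": 8529,
                 "env": {"ARANGO_NO_AUTH": "1"}},  # assumption: auth disabled for local tests
    "mongodb": {"image": "mongo:6.0", "port": 27017, "env": {}},
    "redis": {"image": "redis:7.0", "port": 6379, "env": {}},
}


def docker_run_command(name, spec):
    """Build the argv list for starting one database container in the background."""
    argv = ["docker", "run", "-d", "--name", f"ycsb-{name}",
            "-p", f"{spec['port']}:{spec['port']}"]
    for key, value in spec["env"].items():
        argv += ["-e", f"{key}={value}"]
    argv.append(spec["image"])
    return argv


# Print the commands so they can be reviewed before being run by hand.
for service_name, service_spec in SERVICES.items():
    print(shlex.join(docker_run_command(service_name, service_spec)))
```

Printing rather than executing keeps the sketch safe to run on a machine without Docker.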
YCSB divides the benchmarking process into two phases: load and run.
The load phase inserts a fixed number of records into each database using a synthetic dataset generated by YCSB.
- ArangoDB:
  bin/ycsb.sh load arangodb -P workloads/workloada
- MongoDB:
  bin/ycsb.sh load mongodb -P workloads/workloada
- Redis:
  bin/ycsb.sh load redis -P workloads/workloada
After loading the data, run the workload tests:
- ArangoDB:
  bin/ycsb.sh run arangodb -P workloads/workloada
- MongoDB:
  bin/ycsb.sh run mongodb -P workloads/workloada
- Redis:
  bin/ycsb.sh run redis -P workloads/workloada
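The load and run commands above follow one pattern, so they can be driven from a small script. The following is a minimal sketch, not part of this repository: the helper names (`ycsb_command`, `benchmark_plan`, `execute`) are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

BINDINGS = ["arangodb", "mongodb", "redis"]  # the three bindings this project targets
WORKLOAD = "workloads/workloada"


def ycsb_command(phase, binding, workload=WORKLOAD):
    """Return the argv list for a single YCSB invocation."""
    if phase not in ("load", "run"):
        raise ValueError(f"unknown phase: {phase}")
    return ["bin/ycsb.sh", phase, binding, "-P", workload]


def benchmark_plan(bindings=BINDINGS, workload=WORKLOAD):
    """For each binding, schedule the load phase followed by the run phase."""
    plan = []
    for binding in bindings:
        plan.append(ycsb_command("load", binding, workload))
        plan.append(ycsb_command("run", binding, workload))
    return plan


def execute(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


execute(benchmark_plan())  # dry run: prints the six commands without running them
```

Calling `execute(benchmark_plan(), dry_run=False)` from the repository root would run the commands in order and stop at the first failure.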
For additional parameters and workload customization, refer to the YCSB Wiki.
This section describes how YCSB is used to compare ArangoDB (a multi-model database) against polyglot persistence solutions—MongoDB and Redis—to evaluate performance under various workloads.
- Objective:
  - Compare the performance of a multi-model database (ArangoDB) with a polyglot approach (MongoDB + Redis).
  - Focus on throughput (ops/sec), average and high-percentile latencies (e.g., P95, P99), and overall system resource utilization.
- Approach:
  - ArangoDB supports multiple data models (document, key-value, etc.) in a single system.
  - MongoDB + Redis represents a polyglot persistence strategy, where each database is used for the data model it handles best.
- Data Source: A synthetic dataset generated by YCSB, consisting of key-value or document-style records with multiple fields of small to moderate size.
- Loading: YCSB’s “load” phase inserts a fixed number of records into each database.
- System Configuration:
  - Hardware: A single VM running Docker containers for ArangoDB, MongoDB, and Redis.
  - Software:
    - ArangoDB 3.10.5
    - MongoDB 6.0
    - Redis 7.0
    - YCSB 0.17.0
  - Network: Localhost-based testing with default ports exposed, ensuring minimal network overhead.
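To compare throughput and latency across the three databases, the summary that YCSB prints at the end of a run has to be collected somehow. The sketch below assumes the summary uses lines of the form `[SECTION], Metric, Value`; the exact metric labels may differ between YCSB versions, and the numbers in `SAMPLE` are made up for illustration, not measured results. `parse_ycsb_summary` is a hypothetical helper.

```python
def parse_ycsb_summary(text):
    """Parse '[SECTION], Metric, Value' lines into {section: {metric: value}}."""
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue  # skip log lines that are not part of the summary
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue
        section, metric, raw_value = parts
        try:
            value = float(raw_value)
        except ValueError:
            continue  # ignore non-numeric values
        results.setdefault(section.strip("[]"), {})[metric] = value
    return results


# Illustrative input only: these figures are invented, not benchmark output.
SAMPLE = """\
[OVERALL], RunTime(ms), 2000
[OVERALL], Throughput(ops/sec), 500.0
[READ], Operations, 500
[READ], AverageLatency(us), 310.5
[READ], 95thPercentileLatency(us), 620
[READ], 99thPercentileLatency(us), 1150
"""

summary = parse_ycsb_summary(SAMPLE)
print(summary["OVERALL"]["Throughput(ops/sec)"])
print(summary["READ"]["99thPercentileLatency(us)"])
```

Keeping the result as a nested dictionary makes it straightforward to build one row per database for a comparison table.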
In addition to the benchmarking functionality, this project includes a custom query translator written in Python, with automated tests that measure the average translation time. The translator converts ArangoDB AQL (ArangoDB Query Language) statements into equivalent commands for MongoDB and Redis. Key features include:
- Basic CRUD Conversion:
  - INSERT, UPDATE, and REMOVE operations are parsed and translated directly into MongoDB commands (using methods like insertOne, updateOne, and deleteOne) or mapped to Redis key–value operations (SET and DEL).
- Extended Query Support (for MongoDB):
  - When detecting extended query clauses such as LET, SORT, LIMIT, and simple conditional expressions with AND/OR, the translator attempts to convert the ArangoDB query into a MongoDB aggregation pipeline.
  - Note that the translator currently does not support the COLLECT clause and returns an appropriate unsupported message if encountered.
- Redis Compatibility:
  - For Redis, only basic operations are supported since Redis is a key–value store and does not handle complex queries natively.
  - If extended clauses appear in a query intended for Redis, the translator will respond by indicating that these extended features are not supported.
This translator is intended to serve as a prototype for simplifying the migration or testing of queries between different database systems. While it covers many common query patterns, advanced AQL constructs might require further enhancement or a more robust parsing approach.
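The behavior described above can be illustrated with a small regex-based sketch. It is not the project's translator: the function names, the `collection:key` Redis key format, the mapping of `_key` to MongoDB's `_id`, and the wording of the unsupported messages are all assumptions. It handles only JSON-style object literals, `REMOVE "key" IN coll`, and `FOR ... SORT ... LIMIT n ... RETURN` queries.

```python
import json
import re
import time

INSERT_RE = re.compile(r'^\s*INSERT\s+(\{.*\})\s+(?:INTO|IN)\s+(\w+)\s*$', re.I | re.S)
REMOVE_RE = re.compile(r'^\s*REMOVE\s+"([^"]+)"\s+IN\s+(\w+)\s*$', re.I)
FOR_RE = re.compile(r'^\s*FOR\s+(\w+)\s+IN\s+(\w+)\s+(.*?)\s*RETURN\s+\1\s*$', re.I | re.S)


def translate(aql, target):
    """Translate a small AQL subset to a MongoDB or Redis command string."""
    if target not in ("mongodb", "redis"):
        raise ValueError(f"unknown target: {target}")

    match = INSERT_RE.match(aql)
    if match:
        doc, coll = json.loads(match.group(1)), match.group(2)
        key = doc.get("_key")
        if target == "redis":
            if key is None:
                return "unsupported: INSERT needs a _key to form a Redis key"
            return f"SET {coll}:{key} '{json.dumps(doc)}'"
        if key is not None:  # assumption: AQL _key becomes MongoDB _id
            doc = {"_id": key, **{k: v for k, v in doc.items() if k != "_key"}}
        return f"db.{coll}.insertOne({json.dumps(doc)})"

    match = REMOVE_RE.match(aql)
    if match:
        key, coll = match.groups()
        if target == "redis":
            return f"DEL {coll}:{key}"
        return f"db.{coll}.deleteOne({json.dumps({'_id': key})})"

    match = FOR_RE.match(aql)
    if match:
        var, coll, body = match.groups()
        if target == "redis":
            return "unsupported: extended clauses are not available for Redis"
        if re.search(r'\bCOLLECT\b', body, re.I):
            return "unsupported: COLLECT is not handled"
        pipeline = []
        sort = re.search(rf'\bSORT\s+{re.escape(var)}\.(\w+)(?:\s+(ASC|DESC))?', body, re.I)
        if sort:
            direction = -1 if (sort.group(2) or "ASC").upper() == "DESC" else 1
            pipeline.append({"$sort": {sort.group(1): direction}})
        limit = re.search(r'\bLIMIT\s+(\d+)', body, re.I)
        if limit:
            pipeline.append({"$limit": int(limit.group(1))})
        return f"db.{coll}.aggregate({json.dumps(pipeline)})"

    return "unsupported: statement not recognized"


def average_translation_time(queries, target, repeats=1000):
    """Return the mean wall-clock seconds per translate() call."""
    start = time.perf_counter()
    for _ in range(repeats):
        for query in queries:
            translate(query, target)
    return (time.perf_counter() - start) / (repeats * len(queries))


print(translate('INSERT {"_key": "user1", "name": "Ann"} INTO users', "mongodb"))
print(translate('REMOVE "user1" IN users', "redis"))
print(translate('FOR u IN users SORT u.age DESC LIMIT 5 RETURN u', "mongodb"))
```

Regular expressions keep the sketch short, but they break down quickly on nested or multi-clause queries, which is where a proper AQL parser would be needed.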
This project uses Maven 3 (or later) for building.
- To build the full distribution (only including the three bindings):
  mvn clean package
- To build a single binding (e.g., MongoDB):
  mvn -pl site.ycsb:mongodb-binding -am clean package
For high-load tests and to analyze the latency distribution (e.g., P95, P99, P99.9), you may need to run multiple YCSB client instances. Note that:
- Latency percentiles from separate instances cannot be averaged; compute them from the merged latency data instead.
- High-percentile latency (e.g., P99) can reveal system bottlenecks.
- HdrHistogram can serialize latency data. To output histograms, use:
  -p hdrhistogram.fileoutput=true -p hdrhistogram.output.path=your_output_file.hdr
  Then use tools such as HdrLogProcessing to merge and analyze the results.
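A small worked example shows why percentiles from separate instances should be merged rather than averaged. The latency values below are invented for illustration, and `percentile` is a simple nearest-rank helper written for this sketch, not HdrHistogram's algorithm.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies (ms) from two client instances of 100 operations each.
instance_a = [1] * 100               # every operation is fast
instance_b = [1] * 90 + [100] * 10   # 10% of operations are slow

p99_a = percentile(instance_a, 99)   # 1
p99_b = percentile(instance_b, 99)   # 100
averaged_p99 = (p99_a + p99_b) / 2   # 50.5, a latency no request experienced
merged_p99 = percentile(instance_a + instance_b, 99)  # 100

print(f"P99 A={p99_a}, P99 B={p99_b}")
print(f"average of P99s={averaged_p99}, P99 of merged samples={merged_p99}")
```

Averaging reports 50.5 ms, yet 5% of all requests took 100 ms, so the true P99 of the combined load is 100 ms. Merging the raw distributions, as the HdrHistogram output allows, avoids this error.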
This project is derived from YCSB and is released under the same open-source license. Please refer to the LICENSE.txt and NOTICE.txt files for further details.