
Tongs2000/COMSE6156-midterm-experiment

COMSE6156-midterm-experiment

This project is a modified version of YCSB (Yahoo! Cloud Serving Benchmark) and is designed to benchmark ArangoDB, MongoDB, and Redis only.


Getting Started

Clone this repository to your local machine:

git clone https://github.com/<your-username>/COMSE6156-midterm-experiment.git
cd COMSE6156-midterm-experiment

Database Setup

Ensure that the following databases are installed or available via Docker:

  • ArangoDB (version 3.10.5)
  • MongoDB (version 6.0)
  • Redis (version 7.0)

You can run them using Docker containers on a single VM for local testing.
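
As a rough illustration of the single-VM Docker setup, the sketch below builds one `docker run` command per database. The image tags mirror the versions listed above; the container names, the published default ports, and the `ARANGO_NO_AUTH=1` setting are assumptions for a throwaway local test, not settings taken from this repository.

```python
import shlex

# Image tags follow the versions listed above. Container names, published
# ports, and ARANGO_NO_AUTH=1 are assumptions for a local test setup.
CONTAINERS = {
    "arangodb": {"image": "arangodb:3.10.5", "port": 8529,
                 "env": {"ARANGO_NO_AUTH": "1"}},
    "mongodb": {"image": "mongo:6.0", "port": 27017, "env": {}},
    "redis": {"image": "redis:7.0", "port": 6379, "env": {}},
}


def docker_run_command(name):
    """Build the `docker run` argument list for one database container."""
    spec = CONTAINERS[name]
    cmd = ["docker", "run", "-d", "--name", f"ycsb-{name}",
           "-p", f"{spec['port']}:{spec['port']}"]
    for key, value in spec["env"].items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(spec["image"])
    return cmd


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed.
    for name in CONTAINERS:
        print(shlex.join(docker_run_command(name)))
```

The script only prints the commands; pass each list to `subprocess.run` once the settings match your environment.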


Running the Benchmarks

YCSB divides the benchmarking process into two phases: load and run.

1. Loading Data

This phase inserts a fixed number of records into each database using a synthetic dataset generated by YCSB.

  • ArangoDB:

    bin/ycsb.sh load arangodb -P workloads/workloada
  • MongoDB:

    bin/ycsb.sh load mongodb -P workloads/workloada
  • Redis:

    bin/ycsb.sh load redis -P workloads/workloada

2. Running the Tests

After loading the data, run the workload tests:

  • ArangoDB:

    bin/ycsb.sh run arangodb -P workloads/workloada
  • MongoDB:

    bin/ycsb.sh run mongodb -P workloads/workloada
  • Redis:

    bin/ycsb.sh run redis -P workloads/workloada

For additional parameters and workload customization, refer to the YCSB Wiki.
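
The six commands above differ only in phase and binding name, so they can be driven from a small script. The following is a minimal sketch, not part of this repository: `ycsb_command` and `run_all` are hypothetical helpers, and the only YCSB flags used are `-P` (workload file) and `-p key=value` (property override).

```python
import subprocess

DATABASES = ("arangodb", "mongodb", "redis")
PHASES = ("load", "run")  # load must come before run for each database


def ycsb_command(phase, database, workload="workloads/workloada", props=None):
    """Build the argument list for one YCSB invocation."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    cmd = ["bin/ycsb.sh", phase, database, "-P", workload]
    for key, value in (props or {}).items():
        cmd += ["-p", f"{key}={value}"]  # per-run property override
    return cmd


def run_all(workload="workloads/workloada", dry_run=True):
    """Load and then run the workload against each database in turn."""
    for database in DATABASES:
        for phase in PHASES:
            cmd = ycsb_command(phase, database, workload)
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)  # stop on the first failure


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script prints the same commands listed above. Connection settings for a binding can be passed through `props`, for example `{"redis.host": "127.0.0.1"}` if the Redis binding needs an explicit host.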


Methodology and Experiment

This section describes how YCSB is used to compare ArangoDB, a multi-model database, against a polyglot persistence setup that combines MongoDB and Redis, and to evaluate the performance of each under various workloads.

Experiment Design

  • Objective:

    1. Compare the performance of a multi-model database (ArangoDB) with a polyglot approach (MongoDB + Redis).
    2. Focus on throughput (ops/sec), average and high-percentile latencies (e.g., P95, P99), and overall system resource utilization.
  • Approach:

    • ArangoDB supports multiple data models (document, key-value, etc.) in a single system.
    • MongoDB + Redis represents a polyglot persistence strategy, where each database is used for the data model it handles best.
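
YCSB prints its summary as comma-separated lines such as `[OVERALL], Throughput(ops/sec), ...` and `[READ], 95thPercentileLatency(us), ...`, so the throughput and latency metrics named above can be collected with a short parser. This is a minimal sketch: `parse_ycsb_output` is a hypothetical helper, and the numbers in `SAMPLE` are placeholders chosen for illustration, not measured results.

```python
def parse_ycsb_output(text):
    """Map (section, metric) pairs to float values from YCSB summary lines."""
    metrics = {}
    for line in text.splitlines():
        if not line.startswith("["):
            continue  # skip log lines that are not summary rows
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue
        section, name, value = parts
        try:
            metrics[(section.strip("[]"), name)] = float(value)
        except ValueError:
            continue  # non-numeric value, ignore
    return metrics


# Placeholder values in YCSB's summary format; not real benchmark output.
SAMPLE = """\
[OVERALL], RunTime(ms), 2000
[OVERALL], Throughput(ops/sec), 500.0
[READ], AverageLatency(us), 310.5
[READ], 95thPercentileLatency(us), 450
[READ], 99thPercentileLatency(us), 900
"""

if __name__ == "__main__":
    parsed = parse_ycsb_output(SAMPLE)
    print(parsed[("OVERALL", "Throughput(ops/sec)")])
    print(parsed[("READ", "99thPercentileLatency(us)")])
```

Resource utilization is not part of YCSB's output and would have to be collected separately, for example from the container runtime.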

Data and Setup

  • Data Source: A synthetic dataset generated by YCSB, consisting of key-value or document-style records with multiple fields of small to moderate size.
  • Loading: YCSB’s “load” phase inserts a fixed number of records into each database.
  • System Configuration:
    • Hardware: A single VM running Docker containers for ArangoDB, MongoDB, and Redis.
    • Software:
      • ArangoDB 3.10.5
      • MongoDB 6.0
      • Redis 7.0
      • YCSB 0.17.0
    • Network: Localhost-based testing with default ports exposed, ensuring minimal network overhead.
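
To make the shape of the synthetic data concrete, the sketch below generates a record resembling what YCSB's core workload produces with its default settings (10 fields of 100 bytes, keys prefixed with `user`). It is an approximation for illustration only: `make_record` is a hypothetical helper, the character set is an assumption, and YCSB itself hashes the key number by default, which this sketch does not do.

```python
import random
import string


def make_record(index, field_count=10, field_length=100):
    """Return (key, fields) for one synthetic record.

    Defaults mirror YCSB's core workload defaults (fieldcount=10,
    fieldlength=100). Seeding by index keeps the output reproducible.
    """
    rng = random.Random(index)
    alphabet = string.ascii_letters + string.digits  # assumed character set
    fields = {
        f"field{i}": "".join(rng.choice(alphabet) for _ in range(field_length))
        for i in range(field_count)
    }
    return f"user{index}", fields


if __name__ == "__main__":
    key, fields = make_record(7)
    print(key, len(fields), len(fields["field0"]))
```

The same record maps naturally onto a document (ArangoDB, MongoDB) or a keyed value (Redis), which is why one dataset can serve all three systems.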

ArangoDB Query Translator

In addition to the benchmarking functionality, this project includes a custom query translator written in Python, with automated tests that measure the average translation time. The translator converts ArangoDB Query Language (AQL) statements into equivalent commands for MongoDB and Redis. Key features include:

  • Basic CRUD Conversion:

    • INSERT, UPDATE, and REMOVE operations are parsed and translated directly into MongoDB commands (using methods like insertOne, updateOne, and deleteOne) or mapped to Redis key–value operations (SET and DEL).
  • Extended Query Support (for MongoDB):

    • When detecting extended query clauses such as LET, SORT, LIMIT, and simple conditional expressions with AND/OR, the translator attempts to convert the ArangoDB query into a MongoDB aggregation pipeline.
    • The translator does not currently support the COLLECT clause; if a query contains it, the translator returns a message stating that the clause is unsupported.
  • Redis Compatibility:

    • For Redis, only basic operations are supported since Redis is a key–value store and does not handle complex queries natively.
    • If extended clauses appear in a query intended for Redis, the translator will respond by indicating that these extended features are not supported.
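
The basic CRUD mapping described above can be sketched with a few regular expressions. This is not the repository's translator: `translate_to_mongo` and `translate_to_redis` are hypothetical names, the sketch assumes JSON-compatible object literals and double-quoted keys, it filters MongoDB documents on a `_key` field, and it names Redis keys `collection:key`. All of these are assumptions made for illustration.

```python
import json
import re

FLAGS = re.IGNORECASE | re.DOTALL
INSERT_RE = re.compile(r'^\s*INSERT\s+(\{.*\})\s+INTO\s+(\w+)\s*$', FLAGS)
UPDATE_RE = re.compile(
    r'^\s*UPDATE\s+"([^"]+)"\s+WITH\s+(\{.*\})\s+IN\s+(\w+)\s*$', FLAGS)
REMOVE_RE = re.compile(r'^\s*REMOVE\s+"([^"]+)"\s+IN\s+(\w+)\s*$', FLAGS)
EXTENDED_RE = re.compile(r'\b(LET|SORT|LIMIT|COLLECT)\b', re.IGNORECASE)
UNSUPPORTED = "UNSUPPORTED: query is outside the scope of this sketch"


def translate_to_mongo(aql):
    """Translate a basic AQL write into a MongoDB shell command string."""
    m = INSERT_RE.match(aql)
    if m:
        doc = json.loads(m.group(1))  # assumes a JSON-compatible literal
        return f"db.{m.group(2)}.insertOne({json.dumps(doc)})"
    m = UPDATE_RE.match(aql)
    if m:
        key, changes, coll = m.group(1), json.loads(m.group(2)), m.group(3)
        selector = json.dumps({"_key": key})
        return f"db.{coll}.updateOne({selector}, {json.dumps({'$set': changes})})"
    m = REMOVE_RE.match(aql)
    if m:
        return f"db.{m.group(2)}.deleteOne({json.dumps({'_key': m.group(1)})})"
    return UNSUPPORTED


def translate_to_redis(aql):
    """Translate INSERT/REMOVE into SET/DEL; reject extended clauses."""
    if EXTENDED_RE.search(aql):
        return "UNSUPPORTED: extended clauses are not available for Redis"
    m = INSERT_RE.match(aql)
    if m:
        doc = json.loads(m.group(1))
        if "_key" not in doc:
            return UNSUPPORTED  # no key to store the value under
        return f"SET {m.group(2)}:{doc['_key']} '{json.dumps(doc)}'"
    m = REMOVE_RE.match(aql)
    if m:
        return f"DEL {m.group(2)}:{m.group(1)}"
    return UNSUPPORTED


if __name__ == "__main__":
    print(translate_to_mongo('INSERT {"_key": "u1", "name": "Ann"} INTO users'))
    print(translate_to_redis('REMOVE "u1" IN users'))
```

Regular expressions keep the sketch short but break down on nested or unusual syntax, which is the reason a real parser is the more robust choice for anything beyond simple statements.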

This translator is intended to serve as a prototype for simplifying the migration or testing of queries between different database systems. While it covers many common query patterns, advanced AQL constructs might require further enhancement or a more robust parsing approach.
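
For the extended clauses, one way to picture the conversion is a function that turns a simple `FOR ... FILTER ... SORT ... LIMIT ... RETURN` query into a MongoDB aggregation pipeline, plus a timing loop in the spirit of the average-translation-time tests. This is a hedged sketch rather than the project's code: `to_pipeline` and `average_translation_time` are hypothetical names, only a single numeric comparison and the `LIMIT n` form are handled, AND/OR and LET are left out, and COLLECT raises an exception instead of returning a message.

```python
import re
import time

OPS = {"==": "$eq", "!=": "$ne", ">=": "$gte", "<=": "$lte", ">": "$gt", "<": "$lt"}

FOR_RE = re.compile(r"\bFOR\s+(\w+)\s+IN\s+(\w+)", re.IGNORECASE)
FILTER_RE = re.compile(
    r"\bFILTER\s+\w+\.(\w+)\s*(==|!=|>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE)
SORT_RE = re.compile(r"\bSORT\s+\w+\.(\w+)(?:\s+(ASC|DESC))?", re.IGNORECASE)
LIMIT_RE = re.compile(r"\bLIMIT\s+(\d+)", re.IGNORECASE)


def to_pipeline(aql):
    """Return (collection, pipeline) for a simple AQL read query."""
    if re.search(r"\bCOLLECT\b", aql, re.IGNORECASE):
        raise NotImplementedError("COLLECT is not supported")
    for_match = FOR_RE.search(aql)
    if not for_match:
        raise ValueError("expected a FOR ... IN ... clause")
    pipeline = []
    m = FILTER_RE.search(aql)
    if m:
        field, op, raw = m.groups()
        value = float(raw) if "." in raw else int(raw)
        pipeline.append({"$match": {field: {OPS[op]: value}}})
    m = SORT_RE.search(aql)
    if m:
        descending = (m.group(2) or "ASC").upper() == "DESC"
        pipeline.append({"$sort": {m.group(1): -1 if descending else 1}})
    m = LIMIT_RE.search(aql)
    if m:
        pipeline.append({"$limit": int(m.group(1))})
    return for_match.group(2), pipeline


def average_translation_time(queries, repeats=1000):
    """Average seconds per translated query over `repeats` passes."""
    start = time.perf_counter()
    for _ in range(repeats):
        for query in queries:
            to_pipeline(query)
    return (time.perf_counter() - start) / (repeats * len(queries))


if __name__ == "__main__":
    query = "FOR d IN users FILTER d.age >= 30 SORT d.age DESC LIMIT 10 RETURN d"
    print(to_pipeline(query))
    print(average_translation_time([query]))
```

The stages are emitted in the order `$match`, `$sort`, `$limit`, which matches the clause order of the query shape assumed here.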


Building from Source

This project uses Maven 3 (or later) for building.

  • To build the full distribution (which includes only the three bindings):

    mvn clean package
  • To build a single binding (e.g., MongoDB):

    mvn -pl site.ycsb:mongodb-binding -am clean package

Running Multiple Instances and Latency Percentiles

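For high-load tests, you may need to run multiple YCSB client instances and then analyze the combined latency distribution (e.g., P95, P99, P99.9). Note that:

  1. Latency percentiles from separate instances cannot be averaged: the P99 of the combined data is generally not the mean of the per-instance P99 values. Merge the underlying histograms and compute percentiles from the merged data.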

  2. High-percentile latency (e.g., P99) can reveal system bottlenecks.

  3. HdrHistogram can serialize latency data. To output histograms, use:

    -p hdrhistogram.fileoutput=true
    -p hdrhistogram.output.path=your_output_file.hdr

    Then use tools such as HdrLogProcessing to merge and analyze the results.
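
The point about averaging percentiles can be checked with a small worked example. The sketch below uses made-up latency samples for two client instances and a nearest-rank percentile function; it operates on raw samples purely to illustrate the principle, whereas in practice the merge would happen on the HdrHistogram output files.

```python
def percentile(samples, p):
    """Nearest-rank percentile; `p` is an integer between 1 and 100."""
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


# Made-up latencies in microseconds for two YCSB client instances.
instance_a = [100] * 99 + [200]        # one slightly slow operation
instance_b = [100] * 90 + [5000] * 10  # ten very slow operations

p99_a = percentile(instance_a, 99)
p99_b = percentile(instance_b, 99)
averaged = (p99_a + p99_b) / 2
merged = percentile(instance_a + instance_b, 99)

if __name__ == "__main__":
    print("P99 instance A:", p99_a)
    print("P99 instance B:", p99_b)
    print("Average of P99s:", averaged)
    print("P99 of merged samples:", merged)
```

Here the averaged value falls well below the P99 of the merged samples, so averaging would understate the tail latency that the slow instance contributes.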


License and Notices

This project is derived from YCSB and is released under the same open-source license. Please refer to the LICENSE.txt and NOTICE.txt files for further details.
