<img src="./logo.svg">

A distributed thread-per-core document database
## Introduction
`dbeel` is an attempt to learn modern database architecture.
The best one-liner to describe the DB is: <em>A distributed thread-per-core document database written in Rust.</em>
In other words, it has a document API like `MongoDB`, leaderless replication like `Cassandra`, and a thread-per-core architecture like `ScyllaDB`.
It's not production-ready at all, but that doesn't mean the project has no value.
If you have ever wanted to read database code without getting buried in a massive codebase, dbeel is for you.
You can try it out by running `cargo install dbeel`.
I've also written a <a href="https://tontinton.com/posts/database-fundementals/">blog post</a> summarizing what I learned while working on this project.
## Traits
* Documents + API in [msgpack](https://msgpack.org) format
* [LSM Tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)
  * Memtable is a red-black tree
* [Thread-per-core](https://seastar.io/shared-nothing) (thanks `glommio`)
* [io_uring](https://unixism.net/loti/what_is_io_uring.html) (thanks again `glommio`)
* Direct I/O
  * Page cache implemented using the [WTiny-LFU](https://arxiv.org/pdf/1512.00727.pdf) eviction algorithm
* Load balanced via [consistent hashing](https://en.wikipedia.org/wiki/Consistent_hashing)
  * Each shard (core) is placed on the ring
* Metadata events sent using [gossip dissemination](https://en.wikipedia.org/wiki/Gossip_protocol)
* Leaderless replication with tunable consistency
  * `replication_factor` (parameter of the `create_collection` command): number of nodes that store a copy of the data
  * Write `consistency` (parameter of the `set` command): number of nodes that must acknowledge a write for it to succeed
  * Read `consistency` (parameter of the `get` command): number of nodes that must respond to a read for it to succeed
  * Max timestamp (last-write-wins) conflict resolution
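To make the consistent hashing bullets concrete, here is a minimal Rust sketch of a ring where every shard (core) owns one position and a key's replicas are found by walking clockwise. It is not dbeel's code: `ShardId` and `Ring` are hypothetical types, `DefaultHasher` stands in for whatever hash dbeel really uses (its output is not guaranteed stable across Rust releases, so a real cluster needs a fixed hash function), and we assume the replicas of a key must land on distinct nodes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical shard identifier: a node name plus the core index on that node.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ShardId {
    pub node: String,
    pub core: usize,
}

/// Hashes a value with the std hasher (stand-in for a stable hash function).
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Consistent-hash ring on which every shard (core) owns one position.
pub struct Ring {
    positions: BTreeMap<u64, ShardId>,
}

impl Ring {
    pub fn new(shards: &[ShardId]) -> Self {
        let positions = shards
            .iter()
            .map(|s| (hash_of(&(s.node.as_str(), s.core)), s.clone()))
            .collect();
        Ring { positions }
    }

    /// Walks the ring clockwise from the key's hash, wrapping around once, and
    /// returns up to `replication_factor` shards, at most one per node.
    pub fn owners(&self, key: &str, replication_factor: usize) -> Vec<ShardId> {
        let start = hash_of(key);
        let clockwise = self
            .positions
            .range(start..)
            .chain(self.positions.range(..start))
            .map(|(_, shard)| shard);
        let mut owners: Vec<ShardId> = Vec::new();
        for shard in clockwise {
            if owners.len() == replication_factor {
                break;
            }
            // Skip shards whose node already holds a replica of this key.
            if !owners.iter().any(|o| o.node == shard.node) {
                owners.push(shard.clone());
            }
        }
        owners
    }
}
```

Placing every core on the ring, rather than one point per node, spreads keys across cores as well as machines, and adding a node only moves the keys that fall between its new positions and their predecessors.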
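The replication bullets fit together in the standard leaderless way: with `replication_factor` N, write consistency W and read consistency R, every read set overlaps every successful write set whenever R + W > N, and max timestamp resolution then picks the newest reply. The sketch below is illustrative only: `Versioned` is a hypothetical reply type, and "quorum" is assumed to mean a majority (N / 2 + 1), the usual definition; we have not checked how dbeel computes it.

```rust
/// Hypothetical replica reply: the stored value plus the timestamp of the
/// write that produced it.
#[derive(Clone, Debug, PartialEq)]
pub struct Versioned {
    pub value: String,
    pub timestamp: u64,
}

/// Majority of `replication_factor` replicas (assumed meaning of "quorum").
pub fn quorum(replication_factor: usize) -> usize {
    replication_factor / 2 + 1
}

/// With N replicas, W write acks and R read replies, every read overlaps
/// the latest successful write when R + W > N.
pub fn read_sees_latest_write(n: usize, w: usize, r: usize) -> bool {
    r + w > n
}

/// Max timestamp conflict resolution: the reply with the highest timestamp
/// wins. Returns None if fewer than `consistency` replies arrived.
pub fn resolve(replies: &[Versioned], consistency: usize) -> Option<&Versioned> {
    if replies.len() < consistency {
        return None;
    }
    replies.iter().max_by_key(|v| v.timestamp)
}
```

For example, N = 3 with quorum reads and writes gives 2 + 2 > 3, while consistency 1 on both sides (1 + 1 is not greater than 3) can return a stale document.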
## Performance
Running the benchmark on my machine ([System76 lemp11](https://tech-docs.system76.com/models/lemp11/README.html)) with no `fdatasync` results in the following output:
```
Set:
total: 54.424290449s, min: 80.219µs, p50: 446.851µs, p90: 905.422µs, p99: 1.806261ms, p999: 7.463916ms, max: 35.385961ms
Get:
total: 29.281556369s, min: 36.577µs, p50: 231.464µs, p90: 479.929µs, p99: 1.222589ms, p999: 3.269881ms, max: 6.242454ms
```
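The p50/p90/p99/p999 columns are latency percentiles: p999 is the latency that 99.9% of requests stayed at or under, so it exposes the rare slow request that an average hides. The benchmark's own code isn't shown here, so which percentile definition `blackbox-bench` uses is an assumption; this sketch uses the common nearest-rank definition, with integer math to avoid float rounding at p999.

```rust
use std::time::Duration;

/// Nearest-rank percentile of an already sorted slice: the smallest sample
/// such that at least `numerator / denominator` of all samples are <= it.
/// p50 is (50, 100), p999 is (999, 1000).
pub fn percentile(sorted: &[Duration], numerator: usize, denominator: usize) -> Option<Duration> {
    if sorted.is_empty() || numerator == 0 || numerator > denominator {
        return None;
    }
    // rank = ceil(numerator * len / denominator), always in 1..=len here.
    let rank = (numerator * sorted.len() + denominator - 1) / denominator;
    Some(sorted[rank - 1])
}
```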
Running with `--wal-sync` (calls `fdatasync` after each write to the WAL file) results in the following output for Set (note that `fdatasync` on my machine takes 6-10ms):
```
Set:
total: 1253.611595658s, min: 6.625024ms, p50: 12.57609ms, p90: 12.858347ms, p99: 13.4931ms, p999: 19.062725ms, max: 31.880792ms
```
When running with `--wal-sync`, you can trade worse tail latencies for better throughput by also setting `--wal-sync-delay` (a good starting point is half the average time `fdatasync` takes on your setup).
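One common way a sync delay buys throughput is group commit: writes that arrive while a sync is pending share a single `fdatasync`. We haven't checked how dbeel implements `--wal-sync-delay`, so the following is a hypothetical model of that textbook technique, not dbeel's code. Times are in microseconds, and it ignores that writes arriving during an in-flight sync would also have to wait.

```rust
/// Hypothetical group-commit model. A write that arrives while no batch is
/// open starts one; the batch stays open for `delay`, and one fdatasync then
/// makes every write in it durable. `arrivals` must be sorted.
/// Returns the arrival times covered by each fdatasync call.
pub fn batch_writes(arrivals: &[u64], delay: u64) -> Vec<Vec<u64>> {
    let mut batches: Vec<Vec<u64>> = Vec::new();
    let mut closes_at: Option<u64> = None;
    for &t in arrivals {
        let joins_open_batch = matches!(closes_at, Some(close) if t <= close);
        if joins_open_batch {
            // Joins the open batch: no extra fdatasync.
            batches.last_mut().expect("an open batch exists").push(t);
        } else {
            // Opens a new batch that will be synced `delay` from now.
            closes_at = Some(t + delay);
            batches.push(vec![t]);
        }
    }
    batches
}

/// Latency of each write: time until its batch closes plus the sync itself.
pub fn write_latencies(batches: &[Vec<u64>], delay: u64, sync_time: u64) -> Vec<u64> {
    batches
        .iter()
        .flat_map(|batch| {
            let close = batch[0] + delay;
            batch.iter().map(move |&t| close - t + sync_time)
        })
        .collect()
}
```

With writes at 0, 100, 200, 5000 and 5100 µs and a 3000 µs delay, the five writes share two `fdatasync` calls instead of five, but the first write of each batch waits the full delay on top of the sync. That is the throughput versus tail-latency trade described above.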
## How to use
The only implemented client is written in async Rust and can run on either `glommio` or `tokio` (selected via cargo features).
Documents are formatted in `msgpack` and the best crate I found for it is `rmpv`, so the client makes heavy use of it.
Example (mostly copied from `tokio_example/`):
```rust
use rmpv::Value;
// `DbeelClient`, `Consistency` and `COLLECTION_NAME` are brought into scope as in `tokio_example/`.

// When connecting to a cluster, you provide nodes to request cluster metadata from.
let seed_nodes = [("127.0.0.1", 10000)];
let client = DbeelClient::from_seed_nodes(&seed_nodes).await?;

// Create a collection with replication of 3 (meaning 3 copies of each document).
let collection = client.create_collection_with_replication(COLLECTION_NAME, 3).await?;

// Create key and document using rmpv.
let key = Value::String("key".into());
let document = Value::Map(vec![
    (Value::String("is_best_db".into()), Value::Boolean(true)),
    (Value::String("owner".into()), Value::String("tontinton".into())),
]);

// Write document using quorum consistency.
collection.set_consistent(key.clone(), document.clone(), Consistency::Quorum).await?;

// Read document using quorum consistency.
let response = collection.get_consistent(key, Consistency::Quorum).await?;
assert_eq!(response, document);

// Drop collection.
collection.drop().await?;
```
## Try out the benchmarks yourself
To compile the DB (or skip building it by running `cargo install dbeel` instead):
``` sh
cargo build --release
./target/release/dbeel --help
```
To compile the blackbox benchmarks:
``` sh
cd blackbox_bench
cargo build --release
```
To run the benchmarks:
``` sh
# If you installed using cargo instead of building, dbeel should be in your PATH.
./target/release/dbeel # On first terminal
./target/release/blackbox-bench # On second terminal
```