construct

mirror of https://github.com/matrix-construct/construct synced 2024-11-16 23:10:54 +01:00

Jason Volk 605cce9ed1 ircd::db: Propagate the FlushOptions.allow_write_stall option; improve sort cmd.		2018-12-19 13:58:09 -08:00
database	ircd::db: Propagate the FlushOptions.allow_write_stall option; improve sort cmd.	2018-12-19 13:58:09 -08:00
cache.h	ircd::db: Add per-cache statistics.	2018-09-26 18:00:18 -07:00
cell.h	ircd::db: Removed unused cell features.	2018-05-19 18:49:06 -07:00
column.h	ircd::db: Propagate the FlushOptions.allow_write_stall option; improve sort cmd.	2018-12-19 13:58:09 -08:00
compactor.h	ircd::db: Minor cleanup; comments.	2018-12-13 13:44:37 -08:00
comparator.h	ircd::db: Support future CanKeysWithDifferentByteContentsBeEqual feature.	2018-10-31 11:25:07 -07:00
db.h	ircd::db: Add interface to get column and database options.	2018-12-12 10:17:47 -08:00
delta.h	ircd::db: minor cleanup: move this here.	2018-02-07 23:15:17 -08:00
descriptor.h	ircd::db: Promote LZ4 as default compression above Snappy.	2018-11-30 15:20:35 -08:00
index.h	ircd::db: Properly maintain db::gopts as iterator state.	2018-05-25 03:07:30 -07:00
json.h	ircd::json: Add specific extern undefined number.	2018-05-19 22:55:03 -07:00
merge.h	Update Copyrastafaris.	2018-02-05 21:24:34 -08:00
opts.h	ircd::db: Move database::options to db::options.	2018-12-12 10:17:47 -08:00
pos.h	ircd::db: Split out header for pos.h from main db.h	2018-12-03 12:20:55 -08:00
prefix.h	Update Copyrastafaris.	2018-02-05 21:24:34 -08:00
README.md	ircd::db: Update and add various README's.	2018-09-19 16:11:21 -07:00
row.h	ircd::db: Dressing for C99 array on stack here.	2018-12-01 17:07:15 -08:00
stats.h	ircd::db: Enable histogram interface; partial data tally.	2018-09-25 22:18:37 -07:00
txn.h	ircd::db: Comment to clarify txn iface.	2018-09-26 15:28:36 -07:00

README.md

IRCd Database

The database here is a strictly schematized key-value grid built from the primitives of column, cell and row. We use the database both as a flat-object store (matrix event db) using cells, columns and rows, and as a binary data block store (matrix media db) using just a single column.

Columns

A column is a key-value store. We specify columns in the database descriptors when opening the database and (though technically somewhat possible) don't change them during runtime. The database must be opened with the same descriptors in properly aligned positions every single time. All keys and values in a column can be iterated. Columns can also be split into "domains" of keys (see: db/index.h) based on the key's prefix, allowing a seek and iteration isolated to just one domain.
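The prefix-based "domain" iteration described above can be sketched with an ordered map standing in for a column. This is only an illustration under stated assumptions: `column`, `keys_in_domain` and the example keys are hypothetical names, not the interface in db/index.h, and the real column is backed by the database rather than a `std::map`.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a column: an ordered key-value store.
// std::less<> enables lookups by std::string_view without copying.
using column = std::map<std::string, std::string, std::less<>>;

// Collect every key of `col` that falls inside the domain named by `prefix`.
// Seeks to the first key >= prefix, then iterates until a key no longer
// starts with the prefix, so keys outside the domain are never visited.
std::vector<std::string>
keys_in_domain(const column &col, const std::string_view prefix)
{
	std::vector<std::string> ret;
	for(auto it(col.lower_bound(prefix)); it != col.end(); ++it)
	{
		const std::string_view key(it->first);
		if(key.substr(0, prefix.size()) != prefix)
			break;

		ret.emplace_back(it->first);
	}

	// The column is ordered, so the domain's keys come out ordered too.
	assert(std::is_sorted(ret.begin(), ret.end()));
	return ret;
}
```

With hypothetical keys `a:1`, `a:2` and `b:1` in the column, asking for the domain `a:` yields only the first two keys and stops at `b:1`.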

In practice we create a column to represent a single property in a JSON object. There is no recursion, and we must know the name of the property in advance to specify a descriptor for it. For example, we create a column for each property in the Matrix event object and then optimize that column specifically for that property.

{
	"origin_server_ts": 15373977384823,
	"content": {"msgtype": "m.text", "body": "hello"}
}

For the above example consider two columns: first, origin_server_ts is optimized for timestamps by having values which are fixed 8-byte signed integers; next, the content object is stored whole as JSON text in the content column. Recursion is not yet supported, but theoretically we can create more columns to hold nested properties if we want further optimizations.
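As a rough illustration of what "fixed 8-byte signed integer" values mean for the origin_server_ts column, the sketch below packs an `int64_t` into an 8-byte value and reads it back. The names `encode_ts` and `decode_ts` are hypothetical, and the use of host byte order is an assumption made for brevity; the README does not state the on-disk encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Pack a timestamp into a fixed-width 8-byte value.
// Assumption: host byte order; the actual storage format is not specified here.
std::string
encode_ts(const int64_t ts)
{
	std::string ret(sizeof(ts), '\0');
	std::memcpy(ret.data(), &ts, sizeof(ts));
	return ret;
}

// Read a timestamp back out of a value produced by encode_ts().
int64_t
decode_ts(const std::string &val)
{
	int64_t ret;
	assert(val.size() == sizeof(ret)); // every value in the column has one width
	std::memcpy(&ret, val.data(), sizeof(ret));
	return ret;
}
```

Because every value has the same width, a reader never has to parse or length-check JSON text to obtain the timestamp, whereas the content column keeps its value as an opaque JSON string.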

Rows

Since columns are technically independent key-value stores (they have their own index), when an index key is the same between columns we call this a row. In the Matrix event example, each property of the same event is sought together in a row. A row seek is optimized and the individual cells are queried concurrently and iterated in lock-step together.
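A minimal sketch of the row idea, assuming each column is modeled by an ordered map: the same key is looked up in every column and the results are gathered as one cell per column. `column`, `row` and `seek_row` are hypothetical names for illustration, and this version queries the columns one after another rather than concurrently as the real row seek does.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins: a column is an independent key-value store and a
// row is one optional cell value per column, in the order the columns are given.
using column = std::map<std::string, std::string>;
using row = std::vector<std::optional<std::string>>;

// Seek the same key in every column. A column without the key contributes
// an empty cell so that positions still line up with the column list.
row
seek_row(const std::vector<const column *> &columns, const std::string &key)
{
	row ret;
	ret.reserve(columns.size());
	for(const column *const col : columns)
	{
		assert(col != nullptr);
		const auto it(col->find(key));
		if(it != col->end())
			ret.emplace_back(it->second);
		else
			ret.emplace_back(std::nullopt);
	}

	return ret;
}
```

Keeping an empty cell for a missing key, instead of dropping it, preserves the one-cell-per-column shape that makes the result usable as a row.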

Cells

A cell is a gratuitous interface representing a single value in a column, with a key common to other columns so that it can form a row between them. A row is composed of cells.

Important notes

!!! The database system is plugged into the userspace context system to facilitate IO. This means that an expensive database call (mostly on the read side) that has to do disk IO will suspend your userspace context. Remember that when your userspace context resumes on the other side of the call, the state of IRCd and even the database itself may have changed. We have a suite of tools to mitigate this. !!!

While the database schema is modifiable at runtime (we can add and remove columns on the fly), the database is very picky about being opened exactly the way it was last closed. This means, for now, we have the full object schema explicitly specified when the DB is first opened. All columns exist for the lifetime of the DB, whether or not you have a handle to them.
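The requirement that descriptors line up with how the database was last closed can be pictured as a position-by-position comparison of column names. The sketch below is an assumption-laden illustration only: `first_mismatch` is a hypothetical helper, and it compares names alone, whereas the real descriptors carry more than a name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Compare the column names recorded at last close (`stored`) with the names
// in the descriptors given at open (`requested`). Returns the first position
// at which they disagree, or std::nullopt when both lists are identical.
std::optional<std::size_t>
first_mismatch(const std::vector<std::string> &stored,
               const std::vector<std::string> &requested)
{
	const std::size_t common(std::min(stored.size(), requested.size()));
	for(std::size_t i(0); i < common; ++i)
		if(stored[i] != requested[i])
			return i;

	// One list is a strict prefix of the other: a column is missing or extra.
	if(stored.size() != requested.size())
		return common;

	assert(stored == requested);
	return std::nullopt;
}
```

A check of this shape would reject a reordered list such as `{"content", "origin_server_ts"}` against a stored `{"origin_server_ts", "content"}`, even though both name the same set of columns.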