Move contents of doc/ into this wiki submodule.

2024-11-21 17:31:21 +01:00 · 2020-05-22 19:38:17 -07:00 · 2020-05-22 19:38:17 -07:00 · dfc7008e07
commit dfc7008e07
parent b1b54c05fb
8 changed files with 987 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,5 @@
 doxygen
 html
 TAGS
 latex
 xml
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -0,0 +1,65 @@
 # Architectural Philosophy
 ### libircd
 ##### Single-threaded✝
 The design of `libircd` is fully-asynchronous, oriented around a single-thread
 event-loop. No code in the library _blocks_ the process. All operations are
 conducted on top of a single `boost::asio::io_service` which must be supplied
 by the executable linking to `libircd`. That `io_service` must be run by the
 executable at its discretion; typically the embedder's call to `ios.run()` is
 the only place the process will _block_.
 The single-threaded approach ensures there is an _uninterrupted_, _uncontended_,
 _predictable_ execution which is easy for developers to reason about intuitively
 with sequential-consistency. This is ideal for the I/O-bound application being
 facilitated. If there are periods of execution which are computationally intense
 like parsing, hashing, cryptography, etc: this is absorbed in lieu of thread
 synchronization and bus contention.
 This system achieves scale through running multiple independent instances which
 synchronize at the application-logic level through passing the application's own
 messages.
 ✝ However, do not assume a truly threadless execution for the entire address
 space. If there is ever a long-running background computation or a call to a
 3rd party library which will block the event loop, we may use an additional
 `std::thread` to "offload" such an operation. Thus we do have a threading model,
 but it is heterogeneous.
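The embedder-driven loop described above can be sketched without Boost. This is only an illustration of the control flow: `toy_loop` and `library_init` are hypothetical stand-ins for the `io_service` and for a library entry point, not anything found in `libircd`.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for an io_service: a queue of ready handlers
// drained by exactly one thread.
struct toy_loop
{
	std::deque<std::function<void ()>> ready;

	// Queue a handler; it never runs inside post().
	void post(std::function<void ()> handler)
	{
		ready.emplace_back(std::move(handler));
	}

	// The only place the caller blocks. Returns how many handlers ran.
	std::size_t run()
	{
		std::size_t count{0};
		while(!ready.empty())
		{
			auto handler{std::move(ready.front())};
			ready.pop_front();
			handler(); // a handler may post() more work
			++count;
		}

		return count;
	}
};

// Hypothetical library entry point: it only queues work on the loop
// supplied by the executable and returns immediately.
inline void library_init(toy_loop &loop, int &counter)
{
	loop.post([&loop, &counter]
	{
		++counter;
		loop.post([&counter] { counter += 10; });
	});
}
```

Nothing happens to `counter` until the executable decides to call `loop.run()`; the handlers then execute one after another on that single thread, with no locking required between them.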
 ##### Introduces userspace threading
 IRCd presents an interface introducing stackful coroutines, a.k.a. userspace
 context switching, a.k.a. green threads, a.k.a. fibers. The library avoids
 callbacks as the way to break up execution when waiting for events. Instead, we
 harken back to the simple old ways of synchronous programming where control
 flow and data are easy to follow. If there are certain cases where we don't
 want a stack to linger which may jeopardize the c10k'ness of the daemon the
 asynchronous pattern is still used (this is a hybrid system).
 Consider coroutines like "macro-ops" and asynchronous callbacks like
 "micro-ops." The pattern tends to use a coroutine to perform a large and
 complex operation which may involve many micro-ops behind the scenes. This
 approach relegates the asynchronous callback pattern to simple tasks contained
 within specific units which require scale, encapsulating the complexity away
 from the rest of the project.
 ##### Runs only one server at a time
 Keeping with the spirit of simplicity of the original architecture, `libircd`
 continues to be a "singleton" object which uses globals and keeps actual server
 state in the library itself. In other words, **only one IRC daemon can exist
 within a process's address space at a time.** Whether or not this was a pitfall
 of the original design, it has emerged over the decades as a very profitable
 decision for making IRCd an accessible open source internet project.
 ##### Formal grammars, RTTI, exceptions
 We utilize the `boost::spirit` system of parsing and printing through
 compile-time formal grammars, rather than writing our own parsers manually.
 In addition, we build several tools on top of such formal devices like a
 type-safe format string library acting as a drop-in for `::sprintf()`, but
 accepting objects like `std::string` without `.c_str()` and prevention of
 outputting unprintable/unwanted characters that may have been injected into
 the system somewhere prior.
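To give a feel for what such a format tool does, here is a minimal sketch using variadic templates in place of a `boost::spirit` grammar. The names `compose`, `compose_to` and `append_arg` are hypothetical, and a bare `%` placeholder is an assumption made to keep the sketch short; the project's real library follows `::sprintf()` specifiers.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration; not the project's fmt library.

// Strings are appended with control characters dropped.
inline void append_arg(std::string &out, const std::string_view &arg)
{
	for(const char c : arg)
		if(static_cast<unsigned char>(c) >= 0x20 && c != 0x7f)
			out += c;
}

// Integers are appended in decimal.
inline void append_arg(std::string &out, const long &arg)
{
	out += std::to_string(arg);
}

// No arguments remain: a leftover placeholder is a mismatch.
inline void compose_to(std::string &out, std::string_view fmt)
{
	if(fmt.find('%') != fmt.npos)
		throw std::invalid_argument{"more placeholders than arguments"};

	out += fmt;
}

// Consume one `%` placeholder per argument; the argument's type, not
// the format string, selects how it is printed.
template<class T, class... Ts>
void compose_to(std::string &out, std::string_view fmt, const T &arg, const Ts &... rest)
{
	const auto pos{fmt.find('%')};
	if(pos == fmt.npos)
		throw std::invalid_argument{"more arguments than placeholders"};

	out += fmt.substr(0, pos);
	append_arg(out, arg);
	compose_to(out, fmt.substr(pos + 1), rest...);
}

template<class... Ts>
std::string compose(const std::string_view &fmt, const Ts &... args)
{
	std::string out;
	compose_to(out, fmt, args...);
	return out;
}
```

With this sketch, `compose("% has % users", std::string{"#chan\r\n"}, 3)` yields `#chan has 3 users`: the `std::string` is accepted without `.c_str()` and the injected `\r\n` never reaches the output.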
--- a/BUILD.md
+++ b/BUILD.md
@@ -0,0 +1,176 @@
 ## BUILD (standalone)
 ##### Compatibility Primer
 This section is intended to allow building with dependencies that have not
 made their way to mainstream systems. Important notes that may affect you:
 - Boost: The required version is available through `apt` as `libboost-all-dev` on
 Ubuntu Cosmic (18.10). All earlier releases (including 18.04 LTS) can configure
 with `--with-included-boost` as instructed below.
 - RocksDB: THE COMPLETE SOURCE-CODE OF ROCKSDB MUST BE AVAILABLE TO BUILD CONSTRUCT.
 This is different from the `include/` and `lib/` files installed by your
 distribution's package system. You do not have to build the source, but it must
 be available. ALL UBUNTU USERS MUST BUILD THE SOURCE AS WELL (SKIP TO NEXT BULLET).
 ```
 git submodule update --init deps/rocksdb
 cd deps/rocksdb
 git fetch --tags --force
 git checkout v5.17.2
 ```
 > For best performance and stability, please check for the version available on
 your system for the above `git checkout`.
 - RocksDB: All Ubuntu users on all releases must configure Construct with the
 option `--with-included-rocksdb`. This will fetch and properly build rocksdb.
 > Ubuntu builds their library with `-Bsymbolic-functions`. This conflicts with
 the requirements of Construct's embedding.
 ##### Installation Primer
 A general overview of what construct will build and install is given here. At
 this time it is suggested to supply `./configure` with a `--prefix` path,
 especially for development. Example `--prefix=~/.local/`.
 - Binary executable `$prefix/bin/construct`
 - Shared library `$prefix/lib/libircd.so`
 - Shared library modules `$prefix/lib/modules/construct/*.so`
 - Header files `$prefix/include/ircd/*`
 - Read-only shared assets `$prefix/share/construct/*`
 - Database directory may be established at `$prefix/var/db/construct/`
 ```
 Do not set your `--prefix` path to a directory inside your git repository or
 an invocation of `git clean` will erase your database in $prefix/var/db/.
 ```
 #### STANDALONE BUILD PROCEDURE
 ```
 ./autogen.sh
 ./configure --prefix=$PWD/build
 make install
 ```
 > The `--with-included-*` options will fetch, configure **and build** the dependencies included
 as submodules. The result will not be installable on the system without this repository
 remaining intact. Please read the compatibility primer first to understand which options
 you need or don't need on your system.
 ### Additional build options
 #### Debug mode
 ```
 --enable-debug
 ```
 Full debug mode. Includes additional code within `#ifdef RB_DEBUG` sections.
 Optimization level is `-Og`, which is still valgrind-worthy. Debugger support
 is `-ggdb`. Log level is `DEBUG` (maximum). Assertions are enabled. No
 sanitizer instrumentation is generated by default in this mode.
 #### Generic mode binary (for distribution packages)
 Construct developers have set the default compilation to generate native
 hardware operations which may only be supported on very specific targets. For
 a generic mode binary, package maintainers may require this option.
 ```
 --enable-generic
 ```
 Sets `-mtune=generic` as `native` is otherwise the default.
 #### Compact mode (experimental)
 ```
 --enable-compact
 ```
 Create the smallest possible resulting output. This will optimize for size
 (if optimization is enabled), remove all debugging, strip symbols, and apply
 any toolchain-feature or #ifdef in code that optimizes the output size.
 _This feature is experimental. It may not build or execute on all platforms
 reliably. Please report bugs._
 #### Manually enable assertions
 ```
 --enable-assert
 ```
 Implied by `--enable-debug`. This is useful to specifically enable `assert()`
 statements when `--enable-debug` is not used.
 ```
 --with-assert=trap
 ```
 Recommended when using `--enable-assert` for debugging. This replaces the
 default mechanism of assertion with traps rather than aborts; allowing
 developers to explore an unterminated program.
 #### Manually enable optimization
 ```
 --enable-optimize
 ```
 This manually applies full release-mode optimizations even when using
 `--enable-debug`. Implied when not in debug mode.
 #### Disable third-party dynamic allocator libraries
 ```
 --disable-malloc-libs
 ```
 `./configure` will detect alternative `malloc()` implementations found in
 libraries installed on the system (jemalloc/tcmalloc/etc). Construct developers
 may enable these to be configured by default, if detected. To always prevent
 any alternative to the default standard library allocator specify this option.
 #### Enable third-party dynamic allocator libraries
 Currently:
 ```
 --enable-jemalloc
 ```
 `./configure` will detect alternative `malloc()` implementations found in
 libraries installed on the system (jemalloc/tcmalloc/etc). Construct developers
 may not enable these to be configured by default, falling back on the default
 allocator. To always use one of the alternative allocators use one option here.
 #### Logging level
 ```
 --with-log-level=
 ```
 This manually sets the level of logging. All log levels at or below this level
 will be available. When a log level is not available, all code used to generate
 its messages will be entirely eliminated via *dead-code-elimination* at compile
 time.
 The log levels are (from logger.h):
 ```
 7  DEBUG      Maximum verbosity for developers.
 6  DWARNING   A warning but only for developers (more frequent than WARNING).
 5  DERROR     An error but only worthy of developers (more frequent than ERROR).
 4  INFO       A more frequent message with good news.
 3  NOTICE     An infrequent important message with neutral or positive news.
 2  WARNING    Non-impacting undesirable behavior user should know about.
 1  ERROR      Things that shouldn't happen; user impacted and should know.
 0  CRITICAL   Catastrophic/unrecoverable; program is in a compromised state.
 ```
 When `--enable-debug` is used `--with-log-level=DEBUG` is implied. Otherwise
 for release mode `--with-log-level=INFO` is implied. Large deployments with
 many users may consider lower than `INFO` to maximize optimization and reduce
 noise.
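The dead-code-elimination described above can be approximated with `if constexpr`. This is a sketch under assumptions: `max_level` stands in for whatever `--with-log-level=` selects, and `log_at` and `sink` are hypothetical names which do not come from logger.h.

```cpp
#include <string>
#include <vector>

// Levels as listed in the table above.
enum class level : int
{
	CRITICAL = 0, ERROR = 1, WARNING = 2, NOTICE = 3,
	INFO = 4, DERROR = 5, DWARNING = 6, DEBUG = 7,
};

// Assumption: stand-in for the level chosen at configure time.
constexpr level max_level{level::INFO};

// Hypothetical destination for messages.
inline std::vector<std::string> sink;

// The message is produced by a closure so that, for an unavailable
// level, neither the call nor the formatting is instantiated at all.
template<level lev, class F>
void log_at([[maybe_unused]] F &&make_message)
{
	if constexpr(lev <= max_level)
		sink.emplace_back(make_message());
}
```

With `max_level` at `INFO`, a call such as `log_at<level::DEBUG>(...)` compiles down to nothing and its closure is never invoked, while `log_at<level::WARNING>(...)` appends to `sink`.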
--- a/FAQ.md
+++ b/FAQ.md
@@ -0,0 +1,46 @@
 # FREQUENTLY ASKED QUESTIONS
 ##### Why does it say IRCd everywhere?
 This is a long story which is not covered in full here. The short version
 is that this project was originally intended to implement an IRC federation
 using an extended superset of the rfc1459/rfc2812 protocol. This concept went
 through several iterations. The Atheme Services codebase was first considered
 for development into a "gateway" for IRC networks to connect to each other.
 That was succeeded by the notion of eliminating separate services-daemons in
 favor of IRCd-meshing for redundancy and scale. At that point Charybdis/4 was
 chosen as a basis for the project.
 Around this time, the Matrix protocol was emerging as a potential candidate
 for federating synchronous-messaging. Though far from perfect, it had enough
 potential to outweigh the troubles of inventing and promoting yet another
 messaging protocol in a wildly diverse and already saturated space.
 Somewhile after, the original collaborators of this endeavor became
 disillusioned by many of the finer details of Matrix. Many red-flags observed
 about its stewards, community, and the overall engineering requirements placed
 on implementations made it clear this project's goals would never be reached in
 a timely or cost-effective way. Coupled with the political situation and
 death-spiral of IRC itself, the original collaborators disbanded.
 One developer decided to continue by simplifying the mission down to just
 creating a Matrix server first, and worrying about IRC later, maybe through
 TS6, or maybe never. This reasoning was bolstered by the ongoing poor
 performance of Matrix's principal reference implementation in python+pgsql.
 Today there is virtually nothing left of any original IRCd. The project
 namespaces like "ircd::" and IRCD_ remain but they too might be replaced by
 "ctor" etc at some time in the future.
 ##### Why is there a SpiderMonkey JavaScript embedding?
 One of the goals of this project is realtime team collaboration and
 development inside chat rooms. The embedding is intended to replace the
 old notion of running a "bot" which is just a single instance of a program
 that some user connects. The embedding facilitates a cloud-esque or so-called
 "lambda" ecosystem of many untrusted user-written modules that are stored
 and managed by the server.
 *The SpiderMonkey embedding is defunct and no longer developed. It is planned
 to be succeeded by WASM.*
--- a/SETUP.md
+++ b/SETUP.md
@@ -0,0 +1,73 @@
 ## SETUP
 This guide will help you execute Construct for the first time. If you are
 building from source code and have not already done so please follow the
 instructions in [BUILD](BUILD.md) before continuing here.
 #### NOTES
 - We will refer to your server as `host.tld`. For those familiar with matrix:
 this is your _origin_ and mxid `@user:host.tld` hostpart. If you delegate
 your server's location to something like `matrix.host.tld:1234` we refer to
 this as your _servername_.
 > Construct clusters all share the same _origin_ but each individual instance
 of the daemon has a unique _servername_.
 - If you built construct yourself as a standalone build you will need to add
 the included library directories before executing:
 `export LD_LIBRARY_PATH=/path/to/src/deps/boost/lib:$LD_LIBRARY_PATH`
 `export LD_LIBRARY_PATH=/path/to/src/deps/rocksdb:$LD_LIBRARY_PATH`
 ### PROCEDURE
 1. Execute
 	There are two arguments: `<origin> [servername]`. If the _servername_
 	argument is missing, the _origin_ will be used for it instead.
 	```
 	bin/construct host.tld
 	```
 	> There is no configuration file.
 	> Log messages will appear in terminal concluding with notice `IRCd RUN`.
 2. Strike ctrl-c on keyboard
 	> The command-line console will appear.
 3. Create a general listener socket by entering the following command:
 	```
 	net listen matrix * 8448 privkey.pem cert.pem chain.pem
 	```
 	- `matrix` is your name for this listener; you can use any name.
 	- `*` and `8448` are the local address and port to bind.
 	- `privkey.pem` and `cert.pem` and `chain.pem` are paths (ideally
 	absolute paths) to PEM-format files for the listener's TLS.
 	> The Matrix Federation Tester should now pass. Browse to
 	https://matrix.org/federationtester/api/report?server_name=host.tld and
 	verify `"AllChecksOK": true`
 4. To use a web-based client like Riot, configure the "web root" directory
 to point at Riot's `webapp/` directory by entering the following:
 	```
 	conf set ircd.web.root.path /path/to/riot-web/webapp/
 	mod reload web_root
 	```
 5. Browse to `https://host.tld:8448/` and register a user.
 ### ADDENDUM
 * If you are employing a reverse-proxy you must review the apropos section in
 the [TROUBLESHOOTING](TROUBLESHOOTING.md#trouble-with-reverse-proxies-and-middlewares)
 guide or the server may not operate correctly.
 * Logging to files is only enabled by default for CRITICAL, ERROR, and WARNING.
 It is not enabled by default for the INFO level. To enable, use `conf set
 ircd.log.info.file.enable true`.
--- a/STYLE.md
+++ b/STYLE.md
@@ -0,0 +1,455 @@
 # How to CPP for IRCd
 In the post-C++11 world it is time to leave C99+ behind and seriously consider
 C++ as C proper. It has been a hard 30 year journey to finally earn that, but
 now it is time. This document is the effective style guide for how Charybdis
 will integrate -std=gnu++17 and how developers should approach it.
 ### C++ With Respect For C People
 Remember your C heritage. There is nothing wrong with C, it is just incomplete.
 There is also no overhead with C++, that is a myth. If you write C code in C++
 it will be the same C code. Think about it like this: if C is like a bunch of
 macros on assembly, C++ is a bunch of macros on C. This guide will not address
 any more myths and for that we refer you [here](https://isocpp.org/blog/2014/12/myths-3).
 #### Direct initialization
 Use `=` only for assignment to an existing object. *Break your C habit right now.*
 Use bracket initialization `{}` of all variables and objects. Fall back to parens `()`
 if brackets conflict with an initializer_list constructor (such as with STL containers)
 or if absolutely necessary to quash warnings about conversions.
 > Quick note to preempt a confusion for C people:
 > Initialization in C++ is like C but you don't have to use the `=`.
 >
 > ```C++
 > struct user { const char *nick; };
 > struct user you = {"you"};
 > user me {"me"};
 > ```
 >
 * Use Allman style for complex/long initialization statements. It's like a function
  returning the value to your new object; it is easier to read than one giant line.
 > ```C++
 > const auto sum
 > {
 >     1 + (2 + (3 * 4) + 5) + 6
 > };
 > ```
 * Do not put uninitialized variables at the top of a function and assign them
 later.
 * Even though C++17 mandates [copy elision](https://en.cppreference.com/w/cpp/language/copy_elision)
 this project does not relax its comprehensive use of direct initialization.
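The initializer_list conflict mentioned above is easiest to see with `std::vector`. A small sketch (the function names are only for illustration):

```cpp
#include <cstddef>
#include <vector>

// Brackets select the initializer_list constructor.
inline std::size_t with_brackets()
{
	const std::vector<int> v{3, 7}; // two elements: 3 and 7
	return v.size();
}

// Parens select the (count, value) constructor; this is the fallback.
inline std::size_t with_parens()
{
	const std::vector<int> v(3, 7); // three elements, each 7
	return v.size();
}
```

The two spellings differ by one character and produce different containers, which is why the fallback to `()` is reserved for exactly this situation.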
 #### Use full const correctness
 `const` correctness should extend to all variables, pointers, arguments, and
 functions- not just "pointed-to" data. If it *can* be `const` then make it
 `const` and relax it later if necessary.
 #### Use auto
 Use `auto` whenever it is possible to use it; specify a type when you must.
 If the compiler can't figure out the auto, that's when you indicate the type.
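A short sketch of both rules together; `is_op` and its `users` map are hypothetical:

```cpp
#include <map>
#include <string>

// Every argument and local is const; the iterator type is left to auto.
inline bool is_op(const std::map<std::string, bool> &users, const std::string &nick)
{
	const auto it{users.find(nick)}; // a const_iterator is deduced
	return it != end(users) && it->second;
}
```

Spelling out `std::map<std::string, bool>::const_iterator` here would add nothing the compiler does not already know.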
 #### RAII will be in full force
 All variables, whether they're function-local, class-members, even globals,
 must always be under some protection at all times. There must be the
 expectation at *absolutely any point* including *between those points*
 everything will blow up randomly and the protection will be invoked to back-out
 the way you came. That is, essentially, **the juice of why we are here.**
 **This is really serious business.** You have to do one thing at a time. When you
 move on to the next thing the last thing has to have already fully succeeded
 or fully failed. Everything is a **transaction**. Nothing in the future exists.
 There is nothing you need from the future to give things a consistent state.
 * The program should be effectively reversible -- should be able to "go backwards"
 or "unwind" from any point. Think in terms of stacks, not linear procedures.
 This means when a variable, or member (a **resource**) first comes into scope,
 i.e. it is declared or accessible (**acquired**), it must be **initialized**
 to a completely consistent state at that point.
 >
 > Imagine pulling down a window shade to hide the sun. As you pull down, the canvas
 > unrolls from its spool at the top. Your goal is to hook the shade on to the nail
 > at the bottom of the window: that is reaching the return statement. If you slip
 > and let go, the shade will roll back up into the spool at the top: that is an
 > exception.
 >
 > What you can't do is prepare work on the way down which needs _any_ further pulling
 > to be in a consistent state and not leak. You might slip and let go at any time for
 > any reason. A `malloc()` on one line and a `free()` following it is an example of
 > requiring more pulling.
 >
 > Indeed slipping and letting go is an accident -- but the point is that *accidents
 > happen*. They're not always your fault, and many times are in other parts of the
 > code which are outside of your control. This is a good approach for robust and
 > durable code over long-lived large-scale projects.
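The window shade can be sketched in a few lines. `lease` is a hypothetical resource which counts its live instances so the unwind becomes observable; nothing here is taken from the codebase.

```cpp
#include <stdexcept>

// Hypothetical resource; fully acquired once its constructor returns
// and released by its destructor no matter how the scope is left.
struct lease
{
	static inline int live{0};

	lease() { ++live; }
	~lease() noexcept { --live; }
	lease(const lease &) = delete;
	lease &operator=(const lease &) = delete;
};

inline void work(const bool slip)
{
	const lease a; // consistent from this line on
	if(slip)
		throw std::runtime_error{"let go of the shade"};

	const lease b; // never acquired if we slipped above
}

// Reports how many leases are still held after work() is over.
inline int live_after(const bool slip) try
{
	work(slip);
	return lease::live;
}
catch(const std::exception &)
{
	return lease::live;
}
```

Whether `work()` reaches its end or slips halfway, `live_after()` reports zero: no line of `work()` depends on a later line to give anything back.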
 #### Exceptions will be used
 Wait, you were trolling "respect for C people" right? **No.** If you viewed
 the above section merely through the prism of avoiding classic memory leaks, and
 can foresee how to now write stackful, reversible, protected programs without
 even calling free() or delete: you not only have earned the right, but you
 **have** to use exceptions. This is no longer a matter of arguing for or
 against `if()` statement clutter and checking return types and passing errors
 down the stack.
 * Object construction (logic in the initialization list, constructor body, etc)
 is actual real program logic. Object construction is not something to just
 prepare some memory, like initializing it to zero, leaving an instance
 somewhere for further functions to conduct operations on. Your whole program
 could be running - the entire universe could be running - in some member
 initializer somewhere. The only way to error out of this is to throw, and it
 is perfectly legitimate to do so.
 * Function bodies and return types should not be concerned with error
 handling and passing of such. They only cause and generate the errors.
 * Try/catch style note: We specifically discourage naked try/catch blocks.
 In other words, **most try-catch blocks are of the
 [function-try-catch](http://en.cppreference.com/w/cpp/language/function-try-block)
 variety.** The style is simply to piggyback the try/catch where another block
 would have been.
 > ```C++
 > while(foo) try
 > {
 >     ...
 > }
 > catch(exception)
 > {
 > }
 > ```
 * We extend this demotion style of keywords to `do` as well, which should
  avoid having its own line if possible.
 > ```C++
 > int x; do
 > {
 >     ...
 > }
 > while((x = foo()));
 > ```
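As a sketch of the first and third points above: a constructor which rejects bad input by throwing, and a function whose whole body is the `try` block. `port` and `describe` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

struct port
{
	int number;

	static int checked(const int n)
	{
		if(n < 1 || n > 65535)
			throw std::out_of_range{"port out of range"};

		return n;
	}

	// Construction is real program logic; throwing is the only way to
	// refuse producing an object from here.
	explicit port(const int n)
	:number{checked(n)}
	{}
};

// Function-try-block: the try/catch piggybacks on the function body
// instead of adding a nested block inside it.
inline std::string describe(const int n) try
{
	const port p{n};
	return "port " + std::to_string(p.number);
}
catch(const std::out_of_range &e)
{
	return e.what();
}
```

Here `describe(8448)` returns `port 8448` and `describe(0)` returns `port out of range`; an instance of `port` holding an invalid number never exists at any point.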
 #### Encapsulation will be relaxed
 To summarize, most structures will default to being fully public unless there
 is a very pressing reason to create a private section. Such a reason is not
 "the user *could* break something by touching this," instead it is "the user
 *will only ever* break something by touching this."
 * Do not use the keyword `class` unless your sole intent is to have the members
 immediately following it be private. Using `class` followed by a `public:`
 label is pointless.
 Note that public interfaces and private implementation patterns are still
 widely used and encouraged, even expected, but not purely using the C++
 language features. The intent here is to allow hacking on the project to be
 easy. We don't want to stifle creativity by getting in the way of developers
 implementing new ideas which do things that weren't originally intended.
 In practice, interfaces try to expose as much as possible, but require only
 a tiny surface by default for actual intended use.
 #### Pointers and References
 * The `&` or `*` prefixes the variable name; it does not postfix the type.
 This is evidenced by comma-delimited declarations. There is only one exception
 to this for universal references which is described later.
 > ```C++
 > int a, &b{a}, *c{&b}, *const d{&b}, *const *const e{&c};
 > ```
 * Biblical maxim: Use references when you can, pointers when you must.
 * Pass arguments by const reference `const foo &bar` preferably, non-const
 reference `foo &bar` if you must.
 * Use const references even if you're not referring to anything created yet.
 const references can construct, contain, and refer to an instance of the type
 with all in one magic. This style has no sympathy for erroneously expecting
 that a const reference is not a local construction; expert C++ developers
 do not make this error. See reasons for using a pointer below.
 * Passing by value indicates some kind of need for object construction in
 the argument, or that something may be std::move()'ed to and from it. Except
 for some common patterns, this is generally suspect.
 * Passing to a function with an rvalue reference argument `foo &&bar` indicates
 something will be std::move()'ed to it, and ownership is now acquired by that
 function.
 * In a function with a template `template<class foo>`, an rvalue reference in
 the prototype for something in the template `void func(foo &&bar)` is actually
 a [universal reference](https://isocpp.org/blog/2012/11/universal-references-in-c11-scott-meyers)
 which has some differences from a normal rvalue reference. To make this clear
 our style is to move the `&&` like so `void func(foo&& bar)`. This is actually
 useful because a variadic template foo `template<class... foo>` will require
 the prototype `void func(foo&&... bar)`.
 * Passing a pointer, or pointer arguments in general, indicates something may
 be null (optional), or to explicitly prevent local const construction which is
 a rare reason. Otherwise suspect.
 * Avoid using references as object members, you're most likely just limiting
 the ability to assign, move, and reuse the object because references cannot be
 reseated; then the "~~big three~~" "big five" custom constructors have to be
 created and maintained, and it becomes an unnecessary mess.
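The argument-passing rules above, condensed into one sketch. `outbox`, `relay` and `length` are hypothetical and exist only to show each kind of parameter once.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct outbox
{
	std::vector<std::string> queue;

	// Rvalue reference: ownership of the string is acquired here.
	void push(std::string &&message)
	{
		queue.emplace_back(std::move(message));
	}
};

// Universal reference, written `T&&` per the style above. An lvalue is
// copied into a temporary; an rvalue is moved into it.
template<class T>
void relay(outbox &out, T&& message)
{
	out.push(std::string(std::forward<T>(message)));
}

// Const reference: may bind to an existing string or construct a
// temporary one from whatever the caller passes.
inline std::size_t length(const std::string &s)
{
	return s.size();
}
```

Calling `relay(out, kept)` with a named string leaves `kept` intact, `relay(out, std::string{"world"})` gives the temporary away, and `length("four")` constructs its `std::string` on the spot.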
 #### Miscellaneous
 * Prefer "locality" rather than "centrality." In other words, we keep things
 in as local of a scope or file as possible to where it is used.
 * new and delete should rarely if ever be seen. This is more true than ever with
 C++14 std::make_unique() and std::make_shared().
 * We allow some C-style arrays, especially on the stack, even C99 dynamic sized ones;
 there's no problem here, just be responsible.
 * `alloca()` will not be used.
 * C format strings are still acceptable. This is an IRC project, with heavy
 use of strings and complex formats and all the stringencies. We even have
 our own custom *protocol safe* format string library, and that should be used
 where possible.
 * streams and standard streams are generally avoided in this project. We could
 have taken the direction to customize C++'s stream interface to make it
 performant, but otherwise the streams are generally slow and heavy. Instead we
 chose a more classical approach with format strings and buffers -- but without
 sacrificing type safety with our RTTI-based fmt library.
 * ~~varargs are still legitimate.~~ There are just many cases when template
 varargs, now being available, are a better choice; they can also be inlined.
 	* Our template va_rtti is starting to emerge as a suitable replacement
 	for any use of varargs.
 * When using a `switch` over an `enum` type, put what would be the `default` case after/outside
 of the `switch` unless the situation specifically calls for one. We use -Wswitch so changes to
 the enum will provide a good warning to update any `switch`.
 * Prototypes should name their argument variables to make them easier to understand, except if
 such a name is redundant because the type carries enough information to make it obvious. In
 other words, if you have a prototype like `foo(const std::string &message)` you should name
 `message` because std::string is common and *what* the string is for is otherwise opaque.
 OTOH, if you have `foo(const options &options, const std::string &message)` one should skip
 the name for `options &` as it just adds redundant text to the prototype.
 * Consider any code inside a runtime `assert()` statement to **entirely**
 disappear in optimized builds. If some implementations of `assert()` may only
 elide the boolean check and thus preserve the inner statement and the effects
 of its execution: this is not standard; we do not rely on this. Do not use
 `assert()` to check return values of statements that need to be executed in
 optimized builds.
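Two of the points above, the `switch` over an `enum` and the vanishing `assert()`, are sketched here. `membership`, `reflect`, `pop_front` and `skip_sigil` are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string_view>

enum class membership { JOIN, LEAVE, BAN };

inline std::string_view reflect(const membership m)
{
	switch(m)
	{
		case membership::JOIN:   return "join";
		case membership::LEAVE:  return "leave";
		case membership::BAN:    return "ban";
	}

	// What would have been `default:` lives out here, so a newly added
	// enumerator trips -Wswitch instead of silently landing in it.
	throw std::invalid_argument{"unknown membership"};
}

// Removes one leading character; returns false if there was none.
inline bool pop_front(std::string_view &s)
{
	if(s.empty())
		return false;

	s.remove_prefix(1);
	return true;
}

inline std::string_view skip_sigil(std::string_view id)
{
	// Not `assert(pop_front(id))`: that call may disappear entirely in
	// an optimized build. Execute first, then assert on the result.
	[[maybe_unused]] const bool popped{pop_front(id)};
	assert(popped);
	return id;
}
```

`skip_sigil("@user:host.tld")` returns `user:host.tld` regardless of whether assertions are compiled in.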
 #### Comments
 * `/* */` Multi-line comments are not normally used. We reserve this for
 debugging and temporary multi-line grey-outs. The goal for rarely using this
 is to not impede anybody attempting to refactor or grey-out a large swath of
 code.
 * `//` Primary developer comment; used even on multiple lines.
 * `///` Documentation comment; the same style as the single line comment; the
 documentation is applied to code that follows the comment block.
 * `///<` Documentation comment; this documents code preceding the comment.
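The three comment forms in use look like this on a hypothetical struct:

```cpp
#include <cstddef>

/// Counters for one listener. This block documents the struct which
/// follows it.
struct listener_stats
{
	std::size_t accepted {0};  ///< Documents the member preceding it.
	std::size_t refused {0};   ///< Connections turned away.

	// Primary developer comment; the same form is used even when the
	// remark spans several lines.
	std::size_t total() const
	{
		return accepted + refused;
	}
};
```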
 ##### Documentation will be pedantic, windy and even patronizing
 This is considered a huge anti-pattern in most other contexts where comments
 and documentation are minimal, read by experts, end up being misleading, tend
 to diverge from their associated code after maintenance, etc. This project is
 an exception. Consider two things:
 1. This is a free and open source public internet project. The goal here
 is to make it easy for many-eyeballs to understand everything. Then,
 many-eyeballs can help fix comments which become misleading.
 2. Most free and open source public internet projects are written in C
 because C++ is complicated with a steep learning curve. It is believed
 C++ reduces the amount of many-eyeballs. A huge number of contributions
 to these projects come from people with limited experience working on
 their "first project."
 Therefore, writers of documentation will consider a reader which has
 encountered IRCd as their first project, specifically in C++. Patronizing
 explanations of common/standard C++ patterns and intricacies can be made.
 ### Art & Tableaux
 * Tab style is **tabs before spaces**. Tabs set an indentation level and
 then spaces format things *at that level*. This is one of the hardest styles
 to get right and then enforce, but it looks the best for everyone. The point
 here is that the tab-width becomes a personal setting -- nobody has to argue
 whether it's worth 2 or 4 or 8 spaces... Remember, tabs are never used to
 align things that would fall out of alignment if the tab-width changed.
 * Only one blank line at a time. While an entire section could be devoted to
 *where* to create whitespace, for now, just know to only use a single blank
 line to do so. There are ways to cheat. I am a huge fan of whitespace and I
 will share some of these ways. For example, a comment block may end in a
 line starting with `//` with no text after it. Combined with the allowed
 completely blank line after that you now have more whitespace.
 ### Conventions
 These are things you should know when mulling over the code as a whole. Knowing
 these things will help you avoid various gotchas and not waste your time
 debugging little surprises. You may or may not agree with some of these
 choices (specifically the lack of choices in many cases) but that's why they're
 explicitly discussed here. Conventions are not laws: they can be ignored or
 overruled on a case basis. One should follow them by default.
 #### Null termination
 - We don't rely on null terminated strings. We always carry around two points
 of data to indicate such vectoring. Ideally this is a pair of pointers
 indicating the `begin`/`end` like an STL iterator range. `string_view` et al
 and the `buffer::` suite work this way.
 - Null terminated strings can still be used and we even still create them in
 many places on purpose just because we can.
 - Null terminated creations use the BSD `strl*` style and *not* the `strn*`
 style. Take note of this. When out of buffer space, such an `strl*` style
 will *always* add a null to the end of the buffer. Since we almost always
 have vectoring data and don't really need this null, a character of the string
 may be lost. This can happen when creating a buffer tight to the length of an
 expected string without a `+ 1`. This is actually the foundation of a case
 to move *back* to `strn*` style but it's not prudent at this time.
 - Anything named `print*` like `print(mutable_buffer, T)` always composes null
 terminated output into the buffer. These functions usually return a size_t
 which counts characters printed *not including null*. They may return a
 `string_view`/`const_buffer` of that size (never viewing the null).
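The lost character is easy to reproduce. Since `strlcpy()` is not present in every libc, this sketch supplies its own `copy_l` with the same semantics as we understand them; it is a hypothetical helper, not a function from the project.

```cpp
#include <cstddef>
#include <cstring>

// Sketch of BSD strlcpy() semantics: always null terminates when
// size > 0 and returns the length of src, so a return value >= size
// tells the caller that truncation happened.
inline std::size_t copy_l(char *const dst, const char *const src, const std::size_t size)
{
	const std::size_t len{std::strlen(src)};
	if(size)
	{
		const std::size_t n{len < size? len : size - 1};
		std::memcpy(dst, src, n);
		dst[n] = '\0';
	}

	return len;
}
```

Copying `"nick"` into a `char[4]` leaves `"nic"` in the buffer: the fourth byte goes to the null. With a `char[5]`, i.e. the `+ 1`, the whole string survives.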
 #### Iteration protocols
 When not using STL-iterators, you may encounter some closure/callback-based
 iterator functions. Usually that's a `for_each()`. If we want to break out
 of the loop, our conventions are as follows:
 - *find protocol* for `find()` functions. The closure returns true to break
 the loop at that element, false to continue. The `find()` function itself
 then returns a pointer or reference to that element. If the end of the
 iteration is reached then a `find()` usually returns `nullptr` or throws an
 exception, etc.
 - *test protocol* for `test()` functions (this has nothing to do with unit-
 tests or development testing). This is the same logic as the find protocol
 except the `test()` function itself returns true if the closure broke the
 loop by returning true, or false if the end of the iteration was reached.
 - *until protocol* for `until()` functions. The closure "remains true 'til
 the end": it returns true to continue and false to break the loop. `until()`
 itself returns true when the end is reached, or false if the closure broke
 the loop.
 Overloads of `for_each()` may be encountered accepting closures that return
 `void` and others that return `bool`. The `bool` overloads use the
 *until protocol* as that matches the same logic in a `for(; bool;)` loop.
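
The three protocols can be sketched over a plain `std::vector<int>`. This is a minimal illustration, not code from the project: the `sketch` namespace, the `closure` alias and the fixed element type are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace sketch
{
	using closure = std::function<bool (const int &)>;

	// find protocol: the closure returns true to break at an element; a
	// pointer to that element is returned, or nullptr at the end.
	inline const int *
	find(const std::vector<int> &v, const closure &c)
	{
		for(const auto &e : v)
			if(c(e))
				return &e;

		return nullptr;
	}

	// test protocol: same closure logic as find; returns true if the closure
	// broke the loop, false if the end was reached.
	inline bool
	test(const std::vector<int> &v, const closure &c)
	{
		for(const auto &e : v)
			if(c(e))
				return true;

		return false;
	}

	// until protocol: the closure returns false to break; returns true only
	// if the end was reached.
	inline bool
	until(const std::vector<int> &v, const closure &c)
	{
		for(const auto &e : v)
			if(!c(e))
				return false;

		return true;
	}

	inline void
	example()
	{
		const std::vector<int> v{1, 2, 3, 4};
		const auto is_three([](const int &i) { return i == 3; });

		const int *const p(sketch::find(v, is_three));
		assert(p && *p == 3);
		assert(sketch::test(v, is_three));

		// Breaks at the third element, so until() reports false.
		int visited(0);
		assert(!sketch::until(v, [&visited](const int &i) { ++visited; return i < 3; }));
		assert(visited == 3);
	}
}
```

Note that `test()` and `until()` differ only in the polarity of the closure's return value and of their own result.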
 #### nothrow is not noexcept
 Often a function is overloaded with a `std::nothrow_t` argument or our
 `util::nothrow` overload template. This means the function **will not throw
 a specific exception expected from the overload alternative** (or set of
 exceptions, etc). Any other exception may still come out of that nothrow
 overload; technically even the specific exception, if it came from somewhere
 else!
 Use the `noexcept` keyword with tact, not by default. Most of the project
 propagates exceptions. Functions that handle their errors and are expected to
 return (i.e. because they catch `std::exception`) still throw special
 exceptions like `ircd::ctx::terminated`. If the `catch(...)` and `noexcept`
 features are used, developers must cooperate by handling ctx interruptions and
 propagating terminations. This is not an issue for leaf and simple functions,
 where we tend to make use of `noexcept`, especially for non-inlines, as it
 allows better compiler optimizations.
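
The overload pattern can be sketched with the standard `std::nothrow_t` tag. This is a minimal illustration under stated assumptions: `sketch::get()`, `sketch::not_found` and the `table` alias are hypothetical names, not project API.

```cpp
#include <cassert>
#include <map>
#include <new>        // std::nothrow_t, std::nothrow
#include <stdexcept>
#include <string>

namespace sketch
{
	// Hypothetical "specific exception" of the throwing overload.
	struct not_found
	:std::runtime_error
	{
		using std::runtime_error::runtime_error;
	};

	using table = std::map<std::string, int>;

	// nothrow overload: a missing key is reported with nullptr instead of
	// not_found. It is deliberately NOT marked noexcept; anything else that
	// goes wrong inside may still throw.
	inline const int *
	get(std::nothrow_t, const table &t, const std::string &key)
	{
		const auto it(t.find(key));
		return it != t.end()? &it->second : nullptr;
	}

	// Throwing overload: a missing key raises not_found.
	inline const int &
	get(const table &t, const std::string &key)
	{
		if(const int *const ret = get(std::nothrow, t, key))
			return *ret;

		throw not_found("key not found: " + key);
	}
}
```

A caller that expects misses picks the tagged overload and checks for `nullptr`; a caller for which a miss is an error uses the plain overload and lets the exception propagate.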
 #### Indications of yielding and IO's
 There is a section on how yielding and IO can occur far up the stack from a
 benign-looking callsite in ctx/README. We try to make comments to indicate
 these things directly in the definitions and certainly in documentation.
 Some of those indications may say nothing more than `[GET]` and `[SET]` without
 any other comment. That is the minimum acceptable marking for something which
 will likely do read or write IO respectively to disk or even the network. In
 any such case the `ircd::ctx` will definitely yield if that IO happens.
 #### Nothing ticks
 The project makes considerable use of userspace threads which may be spawned by
 various subsystems to perform tasks. Some of those tasks tend to be performed at
 intervals, or may require scanning data at an interval (e.g. a timeout
 check). Our style is to not wake up a context (or similarly queue a callback in
 the plain event loop) for an empty dataset. In other words, when there is no
 work, the program should be entirely comatose and not woken up by the OS.
 For example: if you were to `strace(1)` Construct and then pull the network
 cable, eventually there would be complete silence.
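
One way to picture this is a timeout checker that derives its next wakeup from the data rather than from a fixed tick. The following is a minimal sketch with assumed names (`next_wakeup()` and the `sketch` namespace are hypothetical, and the project's own contexts and timers are not shown): when the dataset is empty, no wakeup is requested at all.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>
#include <vector>

namespace sketch
{
	using time_point = std::chrono::steady_clock::time_point;

	// Returns the earliest pending deadline, or nothing when there is no
	// work. A caller would arm a timer only for an engaged result and
	// otherwise sleep until new work is submitted.
	inline std::optional<time_point>
	next_wakeup(const std::vector<time_point> &deadlines)
	{
		if(deadlines.empty())
			return std::nullopt;

		return *std::min_element(deadlines.begin(), deadlines.end());
	}
}
```

The contrast is with a loop that wakes every N seconds to scan a list which is usually empty; that loop would show up in `strace(1)` forever.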
 ### Git / Development related
 Commits in this project tend to have a `prefix:` like `ircd::m:`. This is
 simply an indicator of where the change occurred. If multiple areas of the
 project are changed: first determine if the change in each area can stand on
 its own and break what you're doing into multiple commits; this is generally
 the case when adding a low-level feature to support something built at a higher
 level. Otherwise, prefix the commit with the largest/most-fundamental area
 being changed.
 - Prefixes tend to just be the namespace where the change is occurring.
 - Prefixes can be an actual class name if that class has a lot of nested
 assets and pretty much acts as a namespace.
 - Prefixes for changes in `modules/` where code is not in any namespace tend
 to be the path to the module i.e `modules/s_conf:` or `modules/client/sync:`
 - Prefixes for other areas of the project can just be the directory like `doc:`
 or `tools:` or `README:`
 Existing conventions for commit wording are documented here as follows:
 Generally, after the prefix, the most frequent words a commit starts with
 are "Add", "Fix", "Move", "Remove" and "Improve". Though it is not
 required, classifying what you're doing with one of those is ideal.
 - The use of the word "minor" indicates that no application logic was
 affected by a commit: i.e code formatting changes and "minor cleanup" etc.
 - The use of the word "various" indicates many not-very-related changes
 or very spread-out changes: i.e "various fixes" etc; this tends not to be
 something one is proud of using.
 - The use of the word "checkpoint" indicates something sloppy and
 incomplete is being committed; it compiles and runs; there is a pressing
 need to get it out of the dirty head for the time being.
--- a/TUNING.md
+++ b/TUNING.md
@ -0,0 +1,97 @@
 ## TUNING
 This guide is intended for system administrators to optimize Construct and
 maximize its performance for their environment. This does not cover [BUILD](BUILD.md)
 tuning, and it is expected that Construct is already installed and the [SETUP](SETUP.md)
 has been completed.
 - Some instructions may reference Construct's configuration system. This is
 accessed via the administrator's console which can be reached by striking
 `ctrl-c (SIGINT)` and then using the `conf` command (see: `help conf`). The
 console can also be reached interactively through your preferred client in
 the `!control` room. Alternatively, configuration state can be manipulated
 directly through the `!conf` room. Configuration changes take effect as a
 result of state events sent to the `!conf` room, thus all aforementioned
 methods to change configuration are the same.
 - CHANGES TO CONFIGURATION ARE EFFECTIVE IMMEDIATELY. ERRONEOUS VALUES MAY
 CAUSE UNEXPECTED BEHAVIOR AND RESULT IN PROGRAM TERMINATION. CONFIGURATION
 ERRORS MAY ALSO PREVENT STARTUP. Please see the
 [TROUBLESHOOTING](TROUBLESHOOTING.md#recovering-from-broken-configurations)
 guide for how to recover from configuration errors.
 ### Event Cache Tuning
 Most of Construct's runtime footprint in RAM consists of a cache of Matrix
 events read from the database. The data in many of these events may be
 directly accessed for fundamental server operations; for example, a client's
 access-token and user information is stored with events in special server
 rooms. The event cache is a set of LRU (Least Recently Used) caches. The size
 of these caches should be tuned to at least the "working-set size" expected
 by the server. If these caches are too small, load will be placed
 on the next storage tier. For storage devices with poor random access
 characteristics it is important these caches cover the server's working-set
 size.
 To list the event cache information, try the following commands (example output
 shown):
 ```
 > db cache events *
 COLUMN                               PCT       HITS    MISSES    INSERT                     CACHED                   CAPACITY               INSERT TOTAL               LOCKED
 *                                 61.94%   18742243   3818637   3814446      1.41 GiB (1517280856)      2.28 GiB (2449473536)      4.46 GiB (4787594200)   4.41 MiB (4628512)
 ```
 ```
 > db cache events **
 COLUMN                               PCT       HITS    MISSES    INSERT                     CACHED                   CAPACITY               INSERT TOTAL               LOCKED
 content                           17.85%    2113271     85256     83255       22.85 MiB (23962992)     128.00 MiB (134217728)     569.37 MiB (597026848)           0.00 B (0)
 depth                             90.71%      11292     96431     96431       58.06 MiB (60876968)       64.00 MiB (67108864)       59.68 MiB (62575248)           0.00 B (0)
 event_id                           9.24%     191518    153523    153523         5.92 MiB (6202768)       64.00 MiB (67108864)     865.07 MiB (907093240)           0.00 B (0)
 origin_server_ts                  99.99%       9852    566483    566258       64.00 MiB (67103832)       64.00 MiB (67108864)     353.29 MiB (370455584)           0.00 B (0)
 room_id                           99.99%    1015939    216695    216694       63.99 MiB (67102496)       64.00 MiB (67108864)     132.05 MiB (138467768)   1.93 MiB (2019088)
 sender                            39.18%      56357     80879     80879       50.16 MiB (52592768)     128.00 MiB (134217728)       50.36 MiB (52809616)           0.00 B (0)
 state_key                         40.49%       7336     89035     87181       25.91 MiB (27171856)       64.00 MiB (67108864)     383.42 MiB (402049648)           0.00 B (0)
 type                              99.92%    1716885     66485     66485       31.97 MiB (33527264)       32.00 MiB (33554432)       40.69 MiB (42667312)           0.00 B (0)
 _event_idx                        99.99%     652575    505956    505955     255.98 MiB (268418416)     256.00 MiB (268435456)     635.40 MiB (666268064)    23.45 KiB (24016)
 _room_events                      62.14%     308312     13144     13144       79.54 MiB (83405864)     128.00 MiB (134217728)       79.73 MiB (83608112)  284.73 KiB (291560)
 _room_joined                      52.73%    2087968      6789      6789         4.22 MiB (4422936)         8.00 MiB (8388608)         4.23 MiB (4431280)           0.00 B (0)
 _room_state                       25.40%    2038549     21590     21590       16.25 MiB (17044504)       64.00 MiB (67108864)       52.26 MiB (54793600)           0.00 B (0)
 _room_head                        26.41%       7986      9435      9435         2.11 MiB (2215192)         8.00 MiB (8388608)       37.56 MiB (39389688)           0.00 B (0)
 _event_json                       62.79%      82254   1166164   1166153     642.96 MiB (674189112)   1024.00 MiB (1073741824)     736.76 MiB (772552224)   3.52 MiB (3690824)
 _event_refs                       79.17%      54501    112508    112505       50.67 MiB (53127080)       64.00 MiB (67108864)       68.76 MiB (72098088)           0.00 B (0)
 _event_type                       99.77%         22      8215      8215       15.96 MiB (16738848)       16.00 MiB (16777216)       17.27 MiB (18109240)    73.93 KiB (75704)
 _event_sender                      0.00%          0     23453     23453                 0.00 B (0)       16.00 MiB (16777216)       15.01 MiB (15739768)           0.00 B (0)
 _event_horizon                    99.96%      15722     18296     18296       15.99 MiB (16769768)       16.00 MiB (16777216)       18.91 MiB (19833200)           0.00 B (0)
 _room_state_space                 67.24%       3997     24712     24712       86.06 MiB (90241400)     128.00 MiB (134217728)       92.28 MiB (96762256)           0.00 B (0)
 ```
 To view the configuration item for the size of a cache, which should match your
 output from the above command, use the following command where `<COLUMN>` is
 replaced by one of the names under `COLUMN` in the above output:
 ```
 conf ircd.m.dbs.<COLUMN>.cache.size
 ```
 To alter a cache size, set the configuration item with a byte value. In the
 example below we will set the `_event_json` cache size to 256 MiB. This change
 will take effect immediately and the cache will grow or shrink to that size.
 ```
 conf set ircd.m.dbs._event_json.cache.size 268435456
 ```
 > Tip: The best metric to figure out which caches are inadequate is not
 necessarily the utilization percentage. Caches that are too small generally
 exhibit high values under `INSERT TOTAL` as well as full utilization. If this
 value is several times higher than the cache size and growing, consider
 increasing that cache's size.
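
The arithmetic behind the byte value and the tip can be sketched as follows. This is only an illustration: the `looks_undersized()` helper and its thresholds (99% utilization counted as "full", three times the capacity counted as "several times") are assumptions, not anything Construct computes. The input numbers are taken from the example output above. A single snapshot also cannot show whether `INSERT TOTAL` is still growing, so compare two samples taken some time apart.

```cpp
#include <cstdint>

namespace sketch
{
	// Byte value for a size given in MiB, as expected by `conf set`.
	constexpr std::uint64_t
	MiB(const std::uint64_t n)
	{
		return n * 1024 * 1024;
	}

	struct cache_stats
	{
		std::uint64_t cached;        // bytes under CACHED
		std::uint64_t capacity;      // bytes under CAPACITY
		std::uint64_t insert_total;  // bytes under INSERT TOTAL
	};

	// Hypothetical heuristic: the cache is full and has turned over its
	// capacity several times. Both thresholds are assumed defaults.
	constexpr bool
	looks_undersized(const cache_stats &s,
	                 const double full = 0.99,
	                 const std::uint64_t times = 3)
	{
		return s.capacity
		&& double(s.cached) >= full * double(s.capacity)
		&& s.insert_total >= times * s.capacity;
	}

	// 256 MiB, the value used in the `conf set` example.
	static_assert(MiB(256) == 268435456);

	// `origin_server_ts` row: full, and about 5.5x its capacity inserted.
	static_assert(looks_undersized({67103832, 67108864, 370455584}));

	// `event_id` row: large INSERT TOTAL but only 9.24% utilized.
	static_assert(!looks_undersized({6202768, 67108864, 907093240}));
}
```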
 ### Client Pool Tuning
 (TODO)
--- a/Troubleshooting-problems.md
+++ b/Troubleshooting-problems.md
@ -0,0 +1,70 @@
 # TROUBLESHOOTING
 ##### Useful program options
 Start the daemon with one or more of the following program options to make it
 easier to troubleshoot and perform maintenance:
 - *-single* will start in "single user mode" which is a convenience combination
 of *-nolisten -wa -console* options described below.
 - *-nolisten* will disable the loading of any listener sockets during startup.
 - *-wa* write-avoid will discourage (but not deny) writes to the database. This
 prevents a lot of background tasks and other noise for any maintenance.
 - *-console* is a convenience to immediately drop to the administrator console
 after startup.
 - *-debug* enables full debug log output.
 ##### Recovering from broken configurations
 If your server ever fails to start because of an errant conf item, you can
 override any item using an environment variable before starting the program.
 To do this, simply replace the '.' characters with '_' in the name of the item
 when setting it in the environment. The name is otherwise the same, including
 its lower case.
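
The name mapping is mechanical and can be sketched as below. `env_name()` is a hypothetical helper written for illustration; it is not a function provided by Construct.

```cpp
#include <algorithm>
#include <string>

namespace sketch
{
	// Maps a conf item name to the environment variable name that overrides
	// it: every '.' becomes '_', and the case is left untouched.
	inline std::string
	env_name(std::string item)
	{
		std::replace(item.begin(), item.end(), '.', '_');
		return item;
	}
}
```

For example, the item `ircd.client.max_client_per_peer` would be overridden by an environment variable named `ircd_client_max_client_per_peer`.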
 Otherwise, the program can be run with the option `-defaults`. This will
 prevent initial loading of the configuration from the database. It will
 not prevent environment variable overrides (as mentioned above). Values
 will not be written back to the database unless they are explicitly set by
 the user in the console.
 ##### Recovering from database corruption
 In very rare cases after a hard crash, the journal cannot completely restore
 the data written before the crash. Due to the design of RocksDB and the way we apply it
 for Matrix, data is lost in chronological order starting from the most recent
 transaction (matrix event). The database is consistent for all events up until
 the first corrupt event, called the point-in-time.
 When any loss has occurred the daemon will fail to start normally. To enable
 point-in-time recovery use the command-line option `-pitrecdb` at the next
 invocation.
 ##### Trouble with reverse proxies and middlewares
 Construct is designed to be capable internet service software and should
 perform best when directly interfacing with remote parties. Nevertheless,
 some users wish to employ middlewares known as "reverse-proxies" through
 which all communication is forwarded. This gives the appearance, from the
 server's perspective, that all clients are connecting from the same IP
 address on different ports.
 At this time there are some known issues with reverse proxies, which
 administrators can mitigate by reviewing the following:
 1. The connection limit from a single remote IP address must be raised from
 its default, for example by entering the following in `!control` or the console:
 ```
 conf set ircd.client.max_client_per_peer 65535
 ```
 2. The server does not yet support non-SSL listening sockets. Administrators
 may have to generate locally signed certificates for communication from the
 reverse-proxy.