From 78c4c2fb374c88b83f60648a2f71bc5ffcca8b11 Mon Sep 17 00:00:00 2001
From: Jason Volk
Date: Thu, 20 Dec 2018 11:03:01 -0800
Subject: [PATCH] doc: Reorg some documentation.

---
 doc/ARCHITECTURE.md | 106 ++++++++++++++++++++++++++++++++++++++++
 ircd/README.md      | 114 +++++++-------------------------------------
 2 files changed, 124 insertions(+), 96 deletions(-)
 create mode 100644 doc/ARCHITECTURE.md

diff --git a/doc/ARCHITECTURE.md b/doc/ARCHITECTURE.md
new file mode 100644
index 000000000..d99078b2d
--- /dev/null
+++ b/doc/ARCHITECTURE.md
@@ -0,0 +1,106 @@
+# Architectural Philosophy
+
+### libircd
+
+##### Single-threaded✝
+
+The design of `libircd` is fully-asynchronous, single-thread-oriented. No code
+in the library _blocks_ the process. All operations are conducted on top of
+a single `boost::asio::io_service` which must be supplied by the executable
+linking to `libircd`. That `io_service` must be orchestrated by the executable
+at its discretion; typically the embedder's call to `ios.run()` is the only
+place the process will _block_.
+
+> Applications are limited by one or more of the following bounds:
+> - Computing: program is limited by the efficiency of the CPU over time.
+> - Space: program is limited by the space available for its dataset.
+> - I/O: program is limited by external events, disks, and networks.
+>
+> `libircd` is dominated by the **I/O bound**.
+
+Its design is heavily optimized for this assumption with its single-thread
+orientation. This methodology ensures there is an _uninterrupted_,
+_uncontended_, _predictable_ execution which is easy for developers to
+reason about intuitively with sequential-consistency in a cooperative
+coroutine model. If there are periods of execution which are computationally
+intense like parsing, hashing, cryptography, etc: this is absorbed in lieu of
+thread synchronization and bus contention.
+
+This system achieves scale through running multiple independent instances which
+synchronize at the application-logic level with message passing.
+
+✝ However, do not assume a truly threadless execution for the entire address
+space. If there is ever a long-running background computation or a call to a
+3rd party library which will do IO and block the event loop, we may use an
+additional `std::thread` to "offload" such an operation. Thus we do have a
+threading model, but it is heterogeneous.
+
+##### Introduces userspace threading✝
+
+IRCd presents an interface introducing stackful coroutines, a.k.a. userspace
+context switching, a.k.a. green threads, a.k.a. fibers. The library avoids callbacks
+as the way to break up execution when waiting for events. Instead, we harken back
+to the simple old ways of synchronous programming where control flow and data are
+easy to follow.
+
+✝ If there are certain cases where we don't want a stack to linger which may
+jeopardize the c10k'ness of the daemon the asynchronous pattern is still used.
+
+##### Can be embedded in your application with very minimal overhead.
+
+Linking to libircd from your executable allows you to customize and extend the
+functionality of the server and have control over its execution, or, simply use
+library routines provided by the library without any daemonization.
+
+##### Runs only one server at a time.
+
+Keeping with the spirit of simplicity of the original architecture, `libircd`
+continues to be a "singleton" object which uses globals and keeps actual server
+state in the library itself. In other words, **only one IRC daemon can exist
+within a process's address space at a time.** Whether or not this was a pitfall
+of the original design, it has emerged over the decades as a very profitable
+decision for making IRCd an accessible open source internet project.
+
+##### Leverages formal grammars
+
+We utilize the `boost::spirit` system of parsing and printing through formal grammars,
+rather than writing our own parsers manually. In addition, we build several tools
+on top of such formal devices like a type-safe format string library acting as a
+drop-in for `::sprintf()`, but accepting objects like `std::string` without `.c_str()`
+and prevention of outputting unprintable/unwanted characters that may have been
+injected into the system somewhere prior.
+
+##### Modular design
+
+`libircd` is designed specifically as a shared object library. The purpose of its
+shared'ness is to facilitate IRCd's modular design: IRCd ships with many other
+shared objects which introduce the "business logic" and features of the daemon. If
+`libircd` was not a shared object, every single module would have to include large
+amounts of duplicate code drawn from the static library. This would be a huge drag
+on both compilation and the runtime performance.
+
+```
+                            (module)   (module)
+                                |         |
+                                |         |
+                                V         V
+                              |-------------|
+----------------------        |             | < ---- (module)
+|                    |        |             |
+| User's executable  | <----  |   libircd   |
+|                    |        |             |
+----------------------        |             | < ---- (module)
+                              |-------------|
+                                ^         ^
+                                |         |
+                                |         |
+                            (module)   (module)
+
+```
+
+The user (which we may also refer to as the "embedder" elsewhere in
+documentation) only deals directly with `libircd` and not the modules.
+`libircd` is generally loaded with its symbols bound globally in the executable
+and on most platforms cannot be unloaded (or even loaded) manually and has not
+been tested to do so. As an aside, we do not summarily dismiss the idea of
+reload capability and would like to see it made possible.
diff --git a/ircd/README.md b/ircd/README.md
index 5d86172a9..68980ee78 100644
--- a/ircd/README.md
+++ b/ircd/README.md
@@ -8,106 +8,28 @@ which may introduce the actual application features (or the "business logic")
 of the server.

 > The executable linking and invoking `libircd` may be referred to as the
-"embedding" or just the "executable" interchangably in this documentation.
+"embedding" or "user" or "executable" interchangeably in this documentation.

-##### libircd is single-threaded✝
+## Organization

-The design of `libircd` is fully-asynchronous, single-thread-oriented. No code
-in the library _blocks_ the process. All operations are conducted on top of
-a single `boost::asio::io_service` which must be supplied by the executable
-linking to `libircd`. That `io_service` must be orchestrated by the executable
-at its discretion; typically the embedder's call to `ios.run()` is the only
-place the process will _block_.
+##### Implied #include

-> Applications are limited by one or more of the following bounds:
-- Computing: program is limited by the efficiency of the CPU over time.
-- Space: program is limited by the space available for its dataset.
-- I/O: program is limited by external events, disks, and networks.
+The `ircd.h` [standard include group](../include/ircd#what-to-include)
+is pre-compiled and included *first* by default for every compilation unit in
+this directory. Developers do not have to worry about including project headers
+in a compilation unit, especially when creating and reorganizing either of them.

-`libircd` is dominated by the **I/O bound**. Its design is heavily optimized
-for this assumption with its single-thread orientation. This methodology
-ensures there is an _uninterrupted_, _uncontended_, _predictable_ execution
-which is easy for developers to reason about intuitively with
-sequential-consistency in a cooperative coroutine model.
+Note that because `ircd.h` is included _above_ any manually included header,
+there is a theoretical possibility for a conflict. We make a serious effort
+to prevent `ircd.h` from introducing pollution outside of our very specific
+namespaces (see: [Project Namespaces](../include/ircd#project-namespaces)).

-If there are periods of execution which are computationally intense like
-parsing, hashing, cryptography, etc: this is absorbed in lieu of thread
-synchronization and bus contention. This system achieves scale through running
-multiple independent instances which synchronize at the application-logic
-level.
+##### Dependency Isolation

-✝ However, don't start assuming a truly threadless execution for the entire
-address space. If there is ever a long-running background computation or a call
-to a 3rd party library which will do IO and block the event loop, we may use an
-additional `std::thread` to "offload" such an operation. Thus we do have
-a threading model, but it is heterogeneous.
+Compilation units are primarily oriented around the inclusion of a specific
+dependency which is not involved in the [ircd.h include group](../include/ircd#what-to-include).

-##### libircd introduces userspace threading✝
-
-IRCd presents an interface introducing stackful coroutines, a.k.a. userspace
-context switching, a.k.a. green threads, a.k.a. fibers. The library avoids callbacks
-as the way to break up execution when waiting for events. Instead, we harken back
-to the simple old ways of synchronous programming where control flow and data are
-easy to follow.
-
-✝ If there are certain cases where we don't want a stack to linger which may
-jeopardize the c10k'ness of the daemon the asynchronous pattern is still used.
-
-##### libircd can be embedded in your application with very minimal overhead.
-
-Linking to libircd from your executable allows you to customize and extend the
-functionality of the server and have control over its execution, or, simply use
-library routines provided by the library without any daemonization.
-
-##### libircd runs only one server at a time.
-
-Keeping with the spirit of simplicity of the original architecture, `libircd`
-continues to be a "singleton" object which uses globals and keeps actual server
-state in the library itself. In other words, **only one IRC daemon can exist
-within a process's address space at a time.** Whether or not this was a pitfall
-of the original design, it has emerged over the decades as a very profitable
-decision for making IRCd an accessible open source internet project.
-
-##### libircd leverages formal grammars
-
-We utilize the `boost::spirit` system of parsing and printing through formal grammars,
-rather than writing our own parsers manually. In addition, we build several tools
-on top of such formal devices like a type-safe format string library acting as a
-drop-in for `::sprintf()`, but accepting objects like `std::string` without `.c_str()`
-and prevention of outputting unprintable/unwanted characters that may have been
-injected into the system somewhere prior.
-
-### Overview
-
-`libircd` is designed specifically as a shared object library. The purpose of its
-shared'ness is to facilitate IRCd's modular design: IRCd ships with many other
-shared objects which introduce the "business logic" and features of the daemon. If
-`libircd` was not a shared object, every single module would have to include large
-amounts of duplicate code drawn from the static library. This would be a huge drag
-on both compilation and the runtime performance.
-
-```
-                            (module)   (module)
-                                |         |
-                                |         |
-                                V         V
-                              |-------------|
-----------------------        |             | < ---- (module)
-|                    |        |             |
-| User's executable  | <----  |   libircd   |
-|                    |        |             |
-----------------------        |             | < ---- (module)
-                              |-------------|
-                                ^         ^
-                                |         |
-                                |         |
-                            (module)   (module)
-
-```
-
-The user (which we may also refer to as the "embedder" elsewhere in
-documentation) only deals directly with `libircd` and not the modules.
-`libircd` is generally loaded with its symbols bound globally in the executable
-and on most platforms cannot be unloaded (or even loaded) manually and has not
-been tested to do so. As an aside, we do not summarily dismiss the idea of
-reload capability and would like to see it made possible.
+For example, the `magic.cc` unit was created to include the `libmagic` header
+internally and then provide definitions to our own interfaces in `ircd.h`. We
+don't include that header from `ircd.h` nor do we include it from
+any other compilation unit. This simply isolates `libmagic` as a dependency.
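
The dependency isolation described in the last hunk can be illustrated with a minimal single-file sketch. This is not code from the project: the `vendor` namespace is a stand-in for a third-party library such as `libmagic`, and `demo::magic::mime()` is a hypothetical interface name. In a real tree the three marked sections would be separate files, and only the last one would include the dependency's header.

```cpp
// Single-file sketch of dependency isolation. In a real tree each marked
// section would live in its own file. All names below are hypothetical.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// --- Stand-in for the third-party header (the isolated dependency) ---------
namespace vendor
{
	// Pretend C-style API: classify a buffer by its leading bytes.
	inline const char *detect(const char *data, std::size_t len)
	{
		assert(data != nullptr || len == 0);
		if(len >= 4 && std::memcmp(data, "%PDF", 4) == 0)
			return "application/pdf";
		if(len >= 2 && std::memcmp(data, "#!", 2) == 0)
			return "text/x-shellscript";
		return "application/octet-stream";
	}
}

// --- Project-wide interface (the role a shared header would play) ----------
namespace demo { namespace magic
{
	// Only standard types appear here; nothing from `vendor` leaks out.
	std::string mime(const std::string &buf);
}}

// --- The isolating unit: the only code that touches `vendor` ---------------
std::string
demo::magic::mime(const std::string &buf)
{
	return vendor::detect(buf.data(), buf.size());
}
```

Callers would use only `demo::magic::mime(buf)`. Because the declaration mentions no vendor types, swapping or removing the dependency would, under these assumptions, touch a single compilation unit.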