0
0
Fork 0
mirror of https://github.com/matrix-construct/construct synced 2024-09-23 08:58:53 +02:00

doc: Reorg some documentation.

This commit is contained in:
Jason Volk 2018-12-20 11:03:01 -08:00
parent c1779fbf0d
commit 78c4c2fb37
2 changed files with 124 additions and 96 deletions

106
doc/ARCHITECTURE.md Normal file
View file

@ -0,0 +1,106 @@
# Architectural Philosophy
### libircd
##### Single-threaded✝
The design of `libircd` is fully-asynchronous, single-thread-oriented. No code
in the library _blocks_ the process. All operations are conducted on top of
a single `boost::asio::io_service` which must be supplied by the executable
linking to `libircd`. That `io_service` must be orchestrated by the executable
at its discretion; typically the embedder's call to `ios.run()` is the only
place the process will _block_.
> Applications are limited by one or more of the following bounds:
> - Computing: program is limited by the efficiency of the CPU over time.
> - Space: program is limited by the space available for its dataset.
> - I/O: program is limited by external events, disks, and networks.
>
> `libircd` is dominated by the **I/O bound**.
Its design is heavily optimized for this assumption with its single-thread
orientation. This methodology ensures there is an _uninterrupted_,
_uncontended_, _predictable_ execution which is easy for developers to
reason about intuitively with sequential-consistency in a cooperative
coroutine model. If there are periods of execution which are computationally
intense like parsing, hashing, cryptography, etc: this is absorbed in lieu of
thread synchronization and bus contention.
This system achieves scale through running multiple independent instances which
synchronize at the application-logic level with message passing.
✝ However, do not assume a truly threadless execution for the entire address
space. If there is ever a long-running background computation or a call to a
3rd party library which will do IO and block the event loop, we may use an
additional `std::thread` to "offload" such an operation. Thus we do have a
threading model, but it is heterogeneous.
##### Introduces userspace threading✝
IRCd presents an interface introducing stackful coroutines, a.k.a. userspace
context switching, a.k.a. green threads, a.k.a. fibers. The library avoids callbacks
as the way to break up execution when waiting for events. Instead, we harken back
to the simple old ways of synchronous programming where control flow and data are
easy to follow.
✝ If there are certain cases where we don't want a stack to linger which may
jeopardize the c10k'ness of the daemon the asynchronous pattern is still used.
##### Can be embedded in your application with very minimal overhead.
Linking to libircd from your executable allows you to customize and extend the
functionality of the server and have control over its execution, or, simply use
library routines provided by the library without any daemonization.
##### Runs only one server at a time.
Keeping with the spirit of simplicity of the original architecture, `libircd`
continues to be a "singleton" object which uses globals and keeps actual server
state in the library itself. In other words, **only one IRC daemon can exist
within a process's address space at a time.** Whether or not this was a pitfall
of the original design, it has emerged over the decades as a very profitable
decision for making IRCd an accessible open source internet project.
##### Leverages formal grammars
We utilize the `boost::spirit` system of parsing and printing through formal grammars,
rather than writing our own parsers manually. In addition, we build several tools
on top of such formal devices like a type-safe format string library acting as a
drop-in for `::sprintf()`, but accepting objects like `std::string` without `.c_str()`
and prevention of outputting unprintable/unwanted characters that may have been
injected into the system somewhere prior.
##### Modular design
`libircd` is designed specifically as a shared object library. The purpose of its
shared'ness is to facilitate IRCd's modular design: IRCd ships with many other
shared objects which introduce the "business logic" and features of the daemon. If
`libircd` was not a shared object, every single module would have to include large
amounts of duplicate code drawn from the static library. This would be a huge drag
on both compilation and the runtime performance.
```
(module) (module)
| |
| |
V V
|-------------|
---------------------- | | < ---- (module)
| | | |
| User's executable | <---- | libircd |
| | | |
---------------------- | | < ---- (module)
|-------------|
^ ^
| |
| |
(module) (module)
```
The user (which we may also refer to as the "embedder" elsewhere in
documentation) only deals directly with `libircd` and not the modules.
`libircd` is generally loaded with its symbols bound globally in the executable
and on most platforms cannot be unloaded (or even loaded) manually and has not
been tested to do so. As an aside, we do not summarily dismiss the idea of
reload capability and would like to see it made possible.

View file

@ -8,106 +8,28 @@ which may introduce the actual application features (or the "business logic")
of the server.
> The executable linking and invoking `libircd` may be referred to as the
"embedding" or just the "executable" interchangably in this documentation.
"embedding" or "user" or "executable" interchangably in this documentation.
##### libircd is single-threaded✝
## Organization
The design of `libircd` is fully-asynchronous, single-thread-oriented. No code
in the library _blocks_ the process. All operations are conducted on top of
a single `boost::asio::io_service` which must be supplied by the executable
linking to `libircd`. That `io_service` must be orchestrated by the executable
at its discretion; typically the embedder's call to `ios.run()` is the only
place the process will _block_.
##### Implied #include <ircd.h>
> Applications are limited by one or more of the following bounds:
- Computing: program is limited by the efficiency of the CPU over time.
- Space: program is limited by the space available for its dataset.
- I/O: program is limited by external events, disks, and networks.
The `ircd.h` [standard include group](../include/ircd#what-to-include)
is pre-compiled and included *first* by default for every compilation unit in
this directory. Developers do not have to worry about including project headers
in a compilation unit, especially when creating and reorganizing either of them.
`libircd` is dominated by the **I/O bound**. Its design is heavily optimized
for this assumption with its single-thread orientation. This methodology
ensures there is an _uninterrupted_, _uncontended_, _predictable_ execution
which is easy for developers to reason about intuitively with
sequential-consistency in a cooperative coroutine model.
Note that because `ircd.h` is include _above_ any manually included header,
there is a theoretical possibility for a conflict. We make a serious effort
to prevent `ircd.h` from introducing pollution outside of our very specific
namespaces (see: [Project Namespaces](../include/ircd#project-namespaces)).
If there are periods of execution which are computationally intense like
parsing, hashing, cryptography, etc: this is absorbed in lieu of thread
synchronization and bus contention. This system achieves scale through running
multiple independent instances which synchronize at the application-logic
level.
##### Dependency Isolation
✝ However, don't start assuming a truly threadless execution for the entire
address space. If there is ever a long-running background computation or a call
to a 3rd party library which will do IO and block the event loop, we may use an
additional `std::thread` to "offload" such an operation. Thus we do have
a threading model, but it is heterogeneous.
Compilation units are primarily oriented around the inclusion of a specific
dependency which is not involved in the [ircd.h include group](../include/ircd#what-to-include).
##### libircd introduces userspace threading✝
IRCd presents an interface introducing stackful coroutines, a.k.a. userspace
context switching, a.k.a. green threads, a.k.a. fibers. The library avoids callbacks
as the way to break up execution when waiting for events. Instead, we harken back
to the simple old ways of synchronous programming where control flow and data are
easy to follow.
✝ If there are certain cases where we don't want a stack to linger which may
jeopardize the c10k'ness of the daemon the asynchronous pattern is still used.
##### libircd can be embedded in your application with very minimal overhead.
Linking to libircd from your executable allows you to customize and extend the
functionality of the server and have control over its execution, or, simply use
library routines provided by the library without any daemonization.
##### libircd runs only one server at a time.
Keeping with the spirit of simplicity of the original architecture, `libircd`
continues to be a "singleton" object which uses globals and keeps actual server
state in the library itself. In other words, **only one IRC daemon can exist
within a process's address space at a time.** Whether or not this was a pitfall
of the original design, it has emerged over the decades as a very profitable
decision for making IRCd an accessible open source internet project.
##### libircd leverages formal grammars
We utilize the `boost::spirit` system of parsing and printing through formal grammars,
rather than writing our own parsers manually. In addition, we build several tools
on top of such formal devices like a type-safe format string library acting as a
drop-in for `::sprintf()`, but accepting objects like `std::string` without `.c_str()`
and prevention of outputting unprintable/unwanted characters that may have been
injected into the system somewhere prior.
### Overview
`libircd` is designed specifically as a shared object library. The purpose of its
shared'ness is to facilitate IRCd's modular design: IRCd ships with many other
shared objects which introduce the "business logic" and features of the daemon. If
`libircd` was not a shared object, every single module would have to include large
amounts of duplicate code drawn from the static library. This would be a huge drag
on both compilation and the runtime performance.
```
(module) (module)
| |
| |
V V
|-------------|
---------------------- | | < ---- (module)
| | | |
| User's executable | <---- | libircd |
| | | |
---------------------- | | < ---- (module)
|-------------|
^ ^
| |
| |
(module) (module)
```
The user (which we may also refer to as the "embedder" elsewhere in
documentation) only deals directly with `libircd` and not the modules.
`libircd` is generally loaded with its symbols bound globally in the executable
and on most platforms cannot be unloaded (or even loaded) manually and has not
been tested to do so. As an aside, we do not summarily dismiss the idea of
reload capability and would like to see it made possible.
For example, the `magic.cc` unit was created to include `<magic.h>`
internally and then provide definitions to our own interfaces in `ircd.h`. We
don't include `<magic.h>` from `ircd.h` nor do we include it from
any other compilation unit. This simply isolates `libmagic` as a dependency.