From deaea72f9abe756b5dca232f319be167de3ae4ea Mon Sep 17 00:00:00 2001
From: Jason Volk
Date: Wed, 25 Oct 2017 09:26:25 -0700
Subject: [PATCH] ircd::m: Update README.

---
 include/ircd/m/README.md | 224 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)

diff --git a/include/ircd/m/README.md b/include/ircd/m/README.md
index 0b331a516..86069e76a 100644
--- a/include/ircd/m/README.md
+++ b/include/ircd/m/README.md
@@ -98,3 +98,227 @@

room. Rooms may contain events of any type, but we don't invent new
`m.room.*` type events ourselves. This project tends to create events in the
namespace `ircd.*`. These events should not alter the room's functionality,
since a client with knowledge of only the published `m.room.*` events
wouldn't understand them.


#### Coherence

Matrix is specified as a directed acyclic graph of messages. The conversation
of messages moves in one direction: past to future. Messages only reference
other messages which have a lower degree of separation (depth) from the first
message in the graph (`m.room.create`). Specifically, each message references
all known messages at the last depth.

* The strong ordering of this system contributes to an intuitive "light cone"
read coherence. Knowledge of any piece of information (like an event) offers
coherent knowledge of all known information which preceded it at that point.

* Write consistency is relaxed. Multiple messages may be issued at the same
depth by independent actors, and multiple reference chains may form
independently of one another. This is what lets performance scale in a large
distributed internet system.

* Write incoherence must then be resolved with entry consistency because of
the relaxed release sequence. While parties broadcast all of their new
messages, they make no guarantee that those messages arrive and integrate at
every destination at the point of release; that wouldn't be practical. A
write which wishes to be coherent can therefore only take the best available
state its author has been made aware of and commit a new message against it.

The system has no other method of resolving incoherence. As a future thought,
some form of release commitment will have to be integrated among at least a
subset of actors for a few important updates to the graph: for example, a
two-phase commit of an important state event *or the re-introduction of
the classic IRC mode change indicating a commitment to change state.*

References to previous events:

    [A0] <-- [A1] <-- [A2]    | A has seen B1 and includes a reference in A2
     ^                 |
     |     <---<----<--'
     |     |
     ^-----[B1] <-- [B2]      | B hasn't yet seen A1 or A2

    [T0]  A release A0  :
    [T1]  A release A1  :  B acquire A0
    [T2]                :  B release B1
    [T3]  A acquire B1  :  B release B2
    [T4]  A release A2  :

Both actors now have their clock (depth) set to 2, and each will issue its
next message at clock cycle 3 referencing all messages from cycle 2, merging
the split illustrated above.

    [A0] <-- [A1] <-- [A2]          [A4]   | A now sees B3, B2, and B1
     ^                 | ^            |
     |     <---<----<--' '----. .----'
     |     |                  | |
     ^-----[B1] <-- [B2] <-- [B3]          | B now sees A2, A1, and A0
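
The depth/reference rule above is mechanical enough to sketch in code. The
following is a minimal toy model in C++, not the actual ircd::m interface;
the `event` struct and `make_next()` helper are hypothetical names used only
for illustration. An actor derives its next message by referencing every
known message at the maximum depth, then advancing its clock by one:

    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Toy model of a graph event; not the real ircd::m schema.
    struct event
    {
        std::string id;
        int64_t depth;                   // degrees of separation from m.room.create
        std::vector<std::string> prev;   // references to prior events
    };

    // Derive the next event: reference all known events at the maximum
    // depth (the current "heads"), then advance the clock by one.
    event
    make_next(const std::string &id, const std::vector<event> &known)
    {
        event next{id, 0, {}};
        for(const auto &e : known)
            next.depth = std::max(next.depth, e.depth);

        for(const auto &e : known)
            if(e.depth == next.depth)
                next.prev.emplace_back(e.id);

        ++next.depth;
        return next;
    }

In the timeline above, both actors end at depth 2, so whichever writes next
references both heads from cycle 2 and lands at depth 3: exactly the merge
shown in the second illustration.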
### Implementation

This is a single-writer/multiple-reader approach. The "core" is the only
writer. The write itself is just the saving of an event; this serves as a
transaction advancing the state of the machine, with effects visible to all
future transactions and external actors.

The core takes the pattern of
`evaluate + exclude -> write commitment -> release sequence`. The
single-writer approach means that we resolve all incoherence by exclusion,
reordering, or rejection on entry, before any writing and release of the
event. Many ircd::ctx's can orbit the inner core resolving their evaluation,
with the tightest exclusion occurring around the write at the inner core.
This also gives us the benefit of a total serialization at that point.

      :::::::
      |||||||     <-- evaluation + rejection
        \|/       <-- evaluation + exclusion / reordering
         !
         *        <-- actor serialized core write commitment
      //|||\\
    //|// \\|\\
   :::::::::::::  <-- release sequence propagation cone

The evaluation phase ensures the event commitment will work: that the event
is valid, and that it is a valid transition of the machine according to the
rules. This process may take some time, with many yields and much IO, even
network IO, if the server lacks a warm cache. During the evaluation phase,
locks and exclusions may be acquired to keep the evaluation state valid
through the write, at the expense of other contexts contending for those
resources.

> Many ircd::ctx's are concurrently working their way through the core. The
> "velocity" is low when an ircd::ctx on this path may yield often for
> various IO, allowing other events to be processed. The velocity increases
> when concurrent evaluation and reordering are no longer viable for
> maintaining coherence. Any yielding of an ircd::ctx at a higher velocity
> risks stalling the whole core.

      :::::::     <-- event input (low velocity)
      |||||||     <-- evaluation process (low velocity)
        \|/       <-- serialization process (higher velocity)

The write commitment saves the event to the database. This is a relatively
fast operation which probably won't even yield the ircd::ctx, and all
future reads from the database will see this write.

         !        <-- serial write commitment (highest velocity)

The release sequence broadcasts the event so its effects can be consumed.
This works by yielding the ircd::ctx so that all consumers can view the
event and apply its effects for their feature module, or send the event out
to clients. This is usually faster than it sounds: consumers try not to
hold up the release sequence for more than their first execution-slice,
and they copy the event if their output rate is slower.

         *        <-- event revelation (higher velocity)
      //|||\\
    //|// \\|\\
   :::::::::::::  <-- release sequence propagation cone (low velocity)
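
The whole pattern can be summarized with a rough sketch. What follows is
standard C++ illustrating the shape only, not the ircd implementation: an
ircd::ctx is a cooperative userspace context rather than a thread, and the
names `evaluate()`, `write_to_db()`, `core_write`, and `consumers` are
hypothetical stand-ins. It shows incoherence rejected on entry, the totally
serialized write at the inner core, and the release of the committed event:

    #include <functional>
    #include <mutex>
    #include <vector>

    // Hypothetical stand-ins for illustration; not the ircd interfaces.
    struct event {};
    bool evaluate(const event &) { return true; }  // validity + transition checks (stub)
    void write_to_db(const event &) {}             // save the event (stub)

    std::mutex core_write;   // tightest exclusion: the inner core
    std::vector<std::function<void (const event &)>> consumers;

    // evaluate + exclude -> write commitment -> release sequence
    bool
    commit(const event &ev)
    {
        // Evaluation: many contexts run here concurrently and may yield
        // for IO; invalid or incoherent events are rejected before any
        // write or release occurs.
        if(!evaluate(ev))
            return false;

        // Write commitment: total serialization around the write. Fast,
        // and ideally never yields while held, so the core isn't stalled.
        {
            const std::lock_guard<std::mutex> lock(core_write);
            write_to_db(ev);
        }

        // Release sequence: reveal the committed event to each consumer
        // (federation send, client sync, modules). Consumers should hold
        // the event only for their first execution-slice, copying it if
        // their output is slower.
        for(const auto &consumer : consumers)
            consumer(ev);

        return true;
    }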
The entire core commitment process, relative to an event riding through it
on an ircd::ctx, has a duration tolerable for something like a REST
interface, so the response to the user can wait for the commitment to
succeed or fail and properly inform them afterward.

The core process is then optimized by the following facts:

* The resource exclusion zone around most matrix events is either small or
non-existent because of the protocol's relaxed write consistency.

* Writes in this implementation will not delay.

"Core dilation" is a phenomenon which occurs when large numbers of events
with relaxed dependence are processed concurrently: none of them acquires
any exclusivity which would impede the others.

      :::::::
      |||||||
      |||||||     <-- Core dilation; flow shape optimized for volume.
      |||||||
      /|||||\
     ///|||\\\
    //|/|||\|\\
   :::::::::::::

Below is a close-up of charybdis's write head when tight to one
Schwarzschild-radius of matrix room surface, propagating only one event
through at a time. Vertical tracks are contexts on their journey through
each evaluation and exclusion step toward the core.

    Input Events                                              Phase
    ::::::::::::::::::::::::::::::::::::::::::::::::::::::    validation / dupcheck
    ||||||||||||||||||||||||||||||||||||||||||||||||||||||    identity/key resolution
    ||||||||||||||||||||||||||||||||||||||||||||||||||||||    verification
    |||| ||||||||||||||| ||||||||||||||| |||||||||||||||||    head resolution
    --|--|----|-|---|--|--|---|---|---|---------|---|---|-    graph resolutions
    ----------|-|---|---------|-------|-----------------|-    module evaluations
        \     | |   |         |       |               /
      ==  ==============|     |       |            ==         Lowest velocity locks
          \             |     |       |          /
      ==                |     |       |        ==             Mid velocity locks
            \           |     |      /       /
      ==                |     |    /       ==                 High velocity locks
              \         |    /   /       /
      ==           =====/=     ==                             Highest velocity lock
                \      /     /
                 \__  /   __/
                    _ | _
                      !                                       Write commitment

Above, two contexts are illustrated contending for the highest velocity
lock. The highest velocity lock is not held for significant time: the
holder has very little work left to do within the core and will release
the lock to the other context quickly. The lower velocity locks may have
to be held longer, but are also less exclusive to the other contexts.

                            * Singularity
                           [ ]
            /--------------[---]--------------\
           /               :   :               \          Federation send
          /       /--------[---]--------\       \
         /       /         :   :         \       \        Client sync
        out     /      /---[---]---\      \     out
        /      /      /    :   :    \      \      \
       /     out     /     |   |     \    out      \
      /             out    \   /    out             \
     /                      \ /                      \

                           return
                      |  result to  |
                      |  evaluator  |
                      ---------------

Above, a close-up of the release sequence. The new event is being "viewed"
by each consumer context, separated by the horizontal lines representing a
context switch from the perspective of the event travelling down. Each
consumer performs its task for how the committed event is to be propagated.

Each consumer holds a shared-lock on the event which holds up the completion
of the commitment until all consumers have released it. The ideal consumer
holds its lock for only a single context-slice while playing its part in
applying the event, like making non-blocking copies to sockets. These
consumers then go on to do the rest of their output without the original
event data, which was memory supplied by the evaluator (like an HTTP
client). All locks acquired on the entry side of the core can then be
released, and the evaluator gets the result of the successful commitment.

#### Scaling

Scaling beyond the limit of a single CPU core can be done with multiple
instances of IRCd forming a cluster of independent actors. This cluster can
extend to other machines on the network as well. The independent actors
leverage the weak write consistency and strong ordering of the matrix
protocol to scale the same way the federation scales.

Interference pattern of two IRCd'en:

    ::::::::::::::::::::::::::::::::::::
    --------\:::::::/--\:::::::/--------
             |||||||    |||||||
              \|/        \|/
               !          !
               *          *
             //|||\\    //|||\\
           //|// \\|\\//|// \\|\\
          /|/|/|\|\|\/|/|/|\|\|\|\
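
As a closing illustration, the toy model from the Coherence section can act
out this interference pattern. Assuming the hypothetical `event` struct and
`make_next()` helper from that earlier sketch (and its includes), two
independent actors commit without sharing any lock, and the split is merged
purely by reference once each sees the other's head:

    #include <cstdio>   // plus the includes from the earlier sketch

    int main()
    {
        // Each actor is an independent single-writer core; they share
        // no lock, only eventual knowledge of each other's events.
        std::vector<event> a{{"A0", 1, {}}}, b;

        b.push_back(make_next("B1", a));   // B saw A0; B1 at depth 2
        a.push_back(make_next("A1", a));   // A hasn't seen B1; A1 also at depth 2

        // After exchanging events, the next message references every
        // head at the maximum depth, merging the split.
        std::vector<event> known(a);
        known.insert(known.end(), b.begin(), b.end());

        const event a2(make_next("A2", known));
        std::printf("%s depth=%lld prev:", a2.id.c_str(), (long long)a2.depth);
        for(const auto &p : a2.prev)
            std::printf(" %s", p.c_str());
        std::printf("\n");   // prints: A2 depth=3 prev: A1 B1
    }

Nothing in this toy requires the actors to share an address space; the same
merge-by-reference applies across machines, which is why the cluster scales
the way the federation does.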