From deaea72f9abe756b5dca232f319be167de3ae4ea Mon Sep 17 00:00:00 2001
From: Jason Volk
Date: Wed, 25 Oct 2017 09:26:25 -0700
Subject: [PATCH] ircd::m: Update README.

---
 include/ircd/m/README.md | 224 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)

diff --git a/include/ircd/m/README.md b/include/ircd/m/README.md
index 0b331a516..86069e76a 100644
--- a/include/ircd/m/README.md
+++ b/include/ircd/m/README.md
@@ -98,3 +98,227 @@

room. Rooms may contain events of any type, but we don't invent new
`m.room.*` type events ourselves. This project tends to create events in the
namespace `ircd.*`. These events should not alter the room's functionality,
since a client with knowledge of only the published `m.room.*` events
wouldn't understand them.


#### Coherence

Matrix is specified as a directed acyclic graph of messages. The conversation
of messages moves in one direction: past to future. Messages only reference
other messages which have a lower degree of separation (depth) from the first
message in the graph (`m.room.create`). Specifically, each message references
all known messages at the last depth.

* The strong ordering of this system contributes to an intuitive "light cone"
read coherence. Knowledge of any piece of information (like an event) offers
coherent knowledge of all known information which preceded it at that point.

* Write consistency is relaxed. Multiple messages may be issued at the same
depth by independent actors, and multiple reference chains may form
independently of one another. This is what lets performance scale in a large
distributed internet system.

* Write incoherence must then be resolved with entry consistency because of
the relaxed release sequence. While parties broadcast all of their new
messages, they make no guarantee that those messages arrive and integrate at
every destination at the point of release; that wouldn't be practical. A
write which wishes to be coherent can therefore only take the best available
state its author has been made aware of and commit a new message against it.

The system has no other method of resolving incoherence. As a future thought,
some form of release commitment will have to be integrated among at least a
subset of actors for a few important updates to the graph: for example, a
two-phase commit of an important state event *or the re-introduction of
the classic IRC mode change indicating a commitment to change state.*

References to previous events:

    [A0] <-- [A1] <-- [A2]    | A has seen B1 and includes a reference in A2
     ^                 |
     |     <---<----<--'
     |     |
     ^-----[B1] <-- [B2]      | B hasn't yet seen A1 or A2

    [T0]  A release A0  :
    [T1]  A release A1  :  B acquire A0
    [T2]                :  B release B1
    [T3]  A acquire B1  :  B release B2
    [T4]  A release A2  :

Both actors now have their clock (depth) set to 2, and each will issue its
next message at clock cycle 3 referencing all messages from cycle 2, merging
the split illustrated above.

    [A0] <-- [A1] <-- [A2]          [A4]   | A now sees B3, B2, and B1
     ^                 | ^            |
     |     <---<----<--' '----. .----'
     |     |                  | |
     ^-----[B1] <-- [B2] <-- [B3]          | B now sees A2, A1, and A0
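
The depth/reference rule above is mechanical enough to sketch in code. The
following is a minimal toy model in C++, not the actual ircd::m interface;
the `event` struct and `make_next()` helper are hypothetical names used only
for illustration. An actor derives its next message by referencing every
known message at the maximum depth, then advancing its clock by one:

    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Toy model of a graph event; not the real ircd::m schema.
    struct event
    {
        std::string id;
        int64_t depth;                   // degrees of separation from m.room.create
        std::vector<std::string> prev;   // references to prior events
    };

    // Derive the next event: reference all known events at the maximum
    // depth (the current "heads"), then advance the clock by one.
    event
    make_next(const std::string &id, const std::vector<event> &known)
    {
        event next{id, 0, {}};
        for(const auto &e : known)
            next.depth = std::max(next.depth, e.depth);

        for(const auto &e : known)
            if(e.depth == next.depth)
                next.prev.emplace_back(e.id);

        ++next.depth;
        return next;
    }

In the timeline above, both actors end at depth 2, so whichever writes next
references both heads from cycle 2 and lands at depth 3: exactly the merge
shown in the second illustration.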
### Implementation

This is a single-writer/multiple-reader approach. The "core" is the only
writer. The write itself is just the saving of an event; this serves as a
transaction advancing the state of the machine, with effects visible to all
future transactions and external actors.

The core takes the pattern of
`evaluate + exclude -> write commitment -> release sequence`. The
single-writer approach means that we resolve all incoherence by exclusion,
reordering, or rejection on entry, before any writing and release of the
event. Many ircd::ctx's can orbit the inner core resolving their evaluation,
with the tightest exclusion occurring around the write at the inner core.
This also gives us the benefit of a total serialization at that point.

      :::::::
      |||||||     <-- evaluation + rejection
        \|/       <-- evaluation + exclusion / reordering
         !
         *        <-- actor serialized core write commitment
      //|||\\
    //|// \\|\\
   :::::::::::::  <-- release sequence propagation cone

The evaluation phase ensures the event commitment will work: that the event
is valid, and that it is a valid transition of the machine according to the
rules. This process may take some time, with many yields and much IO, even
network IO, if the server lacks a warm cache. During the evaluation phase,
locks and exclusions may be acquired to keep the evaluation state valid
through the write, at the expense of other contexts contending for those
resources.

> Many ircd::ctx's are concurrently working their way through the core. The
> "velocity" is low when an ircd::ctx on this path may yield often for
> various IO, allowing other events to be processed. The velocity increases
> when concurrent evaluation and reordering are no longer viable for
> maintaining coherence. Any yielding of an ircd::ctx at a higher velocity
> risks stalling the whole core.

      :::::::     <-- event input (low velocity)
      |||||||     <-- evaluation process (low velocity)
        \|/       <-- serialization process (higher velocity)

The write commitment saves the event to the database. This is a relatively
fast operation which probably won't even yield the ircd::ctx, and all
future reads from the database will see this write.

         !        <-- serial write commitment (highest velocity)

The release sequence broadcasts the event so its effects can be consumed.
This works by yielding the ircd::ctx so that all consumers can view the
event and apply its effects for their feature module, or send the event out
to clients. This is usually faster than it sounds: consumers try not to
hold up the release sequence for more than their first execution-slice,
and they copy the event if their output rate is slower.

         *        <-- event revelation (higher velocity)
      //|||\\
    //|// \\|\\
   :::::::::::::  <-- release sequence propagation cone (low velocity)
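
The whole pattern can be summarized with a rough sketch. What follows is
standard C++ illustrating the shape only, not the ircd implementation: an
ircd::ctx is a cooperative userspace context rather than a thread, and the
names `evaluate()`, `write_to_db()`, `core_write`, and `consumers` are
hypothetical stand-ins. It shows incoherence rejected on entry, the totally
serialized write at the inner core, and the release of the committed event:

    #include <functional>
    #include <mutex>
    #include <vector>

    // Hypothetical stand-ins for illustration; not the ircd interfaces.
    struct event {};
    bool evaluate(const event &) { return true; }  // validity + transition checks (stub)
    void write_to_db(const event &) {}             // save the event (stub)

    std::mutex core_write;   // tightest exclusion: the inner core
    std::vector<std::function<void (const event &)>> consumers;

    // evaluate + exclude -> write commitment -> release sequence
    bool
    commit(const event &ev)
    {
        // Evaluation: many contexts run here concurrently and may yield
        // for IO; invalid or incoherent events are rejected before any
        // write or release occurs.
        if(!evaluate(ev))
            return false;

        // Write commitment: total serialization around the write. Fast,
        // and ideally never yields while held, so the core isn't stalled.
        {
            const std::lock_guard<std::mutex> lock(core_write);
            write_to_db(ev);
        }

        // Release sequence: reveal the committed event to each consumer
        // (federation send, client sync, modules). Consumers should hold
        // the event only for their first execution-slice, copying it if
        // their output is slower.
        for(const auto &consumer : consumers)
            consumer(ev);

        return true;
    }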
The entire core commitment process, relative to an event riding through it
on an ircd::ctx, has a duration tolerable for something like a REST
interface, so the response to the user can wait for the commitment to
succeed or fail and properly inform them afterward.

The core process is then optimized by the following facts:

* The resource exclusion zone around most matrix events is either small or
non-existent because of the protocol's relaxed write consistency.

* Writes in this implementation will not delay.

"Core dilation" is a phenomenon which occurs when large numbers of events
with relaxed dependence are processed concurrently: none of them acquires
any exclusivity which would impede the others.

      :::::::
      |||||||
      |||||||     <-- Core dilation; flow shape optimized for volume.
      |||||||
      /|||||\
     ///|||\\\
    //|/|||\|\\
   :::::::::::::

Below is a close-up of charybdis's write head when tight to one
Schwarzschild-radius of matrix room surface, propagating only one event
through at a time. Vertical tracks are contexts on their journey through
each evaluation and exclusion step toward the core.

    Input Events                                              Phase
    ::::::::::::::::::::::::::::::::::::::::::::::::::::::    validation / dupcheck
    ||||||||||||||||||||||||||||||||||||||||||||||||||||||    identity/key resolution
    ||||||||||||||||||||||||||||||||||||||||||||||||||||||    verification
    |||| ||||||||||||||| ||||||||||||||| |||||||||||||||||    head resolution
    --|--|----|-|---|--|--|---|---|---|---------|---|---|-    graph resolutions
    ----------|-|---|---------|-------|-----------------|-    module evaluations
        \     | |   |         |       |               /
      ==  ==============|     |       |            ==         Lowest velocity locks
          \             |     |       |          /
      ==                |     |       |        ==             Mid velocity locks
            \           |     |      /       /
      ==                |     |    /       ==                 High velocity locks
              \         |    /   /       /
      ==           =====/=     ==                             Highest velocity lock
                \      /     /
                 \__  /   __/
                    _ | _
                      !                                       Write commitment

Above, two contexts are illustrated contending for the highest velocity
lock. The highest velocity lock is not held for significant time: the
holder has very little work left to do within the core and will release
the lock to the other context quickly. The lower velocity locks may have
to be held longer, but are also less exclusive to the other contexts.

                            * Singularity
                           [ ]
            /--------------[---]--------------\
           /               :   :               \          Federation send
          /       /--------[---]--------\       \
         /       /         :   :         \       \        Client sync
        out     /      /---[---]---\      \     out
        /      /      /    :   :    \      \      \
       /     out     /     |   |     \    out      \
      /             out    \   /    out             \
     /                      \ /                      \

                           return
                      |  result to  |
                      |  evaluator  |
                      ---------------

Above, a close-up of the release sequence. The new event is being "viewed"
by each consumer context, separated by the horizontal lines representing a
context switch from the perspective of the event travelling down. Each
consumer performs its task for how the committed event is to be propagated.

Each consumer holds a shared-lock on the event which holds up the completion
of the commitment until all consumers have released it. The ideal consumer
holds its lock for only a single context-slice while playing its part in
applying the event, like making non-blocking copies to sockets. These
consumers then go on to do the rest of their output without the original
event data, which was memory supplied by the evaluator (like an HTTP
client). All locks acquired on the entry side of the core can then be
released, and the evaluator gets the result of the successful commitment.

#### Scaling

Scaling beyond the limit of a single CPU core can be done with multiple
instances of IRCd forming a cluster of independent actors. This cluster can
extend to other machines on the network as well. The independent actors
leverage the weak write consistency and strong ordering of the matrix
protocol to scale the same way the federation scales.

Interference pattern of two IRCd'en:

    ::::::::::::::::::::::::::::::::::::
    --------\:::::::/--\:::::::/--------
             |||||||    |||||||
              \|/        \|/
               !          !
               *          *
             //|||\\    //|||\\
           //|// \\|\\//|// \\|\\
          /|/|/|\|\|\/|/|/|\|\|\|\
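
As a closing illustration, the toy model from the Coherence section can act
out this interference pattern. Assuming the hypothetical `event` struct and
`make_next()` helper from that earlier sketch (and its includes), two
independent actors commit without sharing any lock, and the split is merged
purely by reference once each sees the other's head:

    #include <cstdio>   // plus the includes from the earlier sketch

    int main()
    {
        // Each actor is an independent single-writer core; they share
        // no lock, only eventual knowledge of each other's events.
        std::vector<event> a{{"A0", 1, {}}}, b;

        b.push_back(make_next("B1", a));   // B saw A0; B1 at depth 2
        a.push_back(make_next("A1", a));   // A hasn't seen B1; A1 also at depth 2

        // After exchanging events, the next message references every
        // head at the maximum depth, merging the split.
        std::vector<event> known(a);
        known.insert(known.end(), b.begin(), b.end());

        const event a2(make_next("A2", known));
        std::printf("%s depth=%lld prev:", a2.id.c_str(), (long long)a2.depth);
        for(const auto &p : a2.prev)
            std::printf(" %s", p.c_str());
        std::printf("\n");   // prints: A2 depth=3 prev: A1 B1
    }

Nothing in this toy requires the actors to share an address space; the same
merge-by-reference applies across machines, which is why the cluster scales
the way the federation does.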