terminal/doc/specs/#5000 - Process Model 2.0/#5000 - Process Model 2.0.md
Mike Griese 207f15498f
Spec for Windows Terminal Window Management (#8135)
### ⇒ [doc link](https://github.com/microsoft/terminal/blob/dev/migrie/s/4472-window-management/doc/specs/%235000%20-%20Process%20Model%202.0/%234472%20-%20Windows%20Terminal%20Session%20Management.md) ⇐

## Summary of the Pull Request

This is a more detailed spec for two parts of the "Process Model 2.0" work that's being tracked in #5000. In particular, this spec focuses on the management of Windows Terminal windows, including opening new tabs in existing windows. 

Largely, the reader is expected to have already read the spec in progress in #7240, and already be familiar with the concept of "Monarch" and "Peasant" windows as introduced by that spec. For that reason, ⚠ **THIS PR IS TARGETING THE BRANCH FOR #7240** ⚠. 

### Abstract

> This document is intended to serve as an addition to the [Process Model 2.0
> Spec]. That document provides a big-picture overview of changes to the entirety
> of the Windows Terminal process architecture, including both the split of
> window/content processes, as well as the introduction of monarch/peasant
> processes. The focus of that document was to identify solutions to a set of
> scenarios that were closely intertwined, and establish these solutions would
> work together, without preventing any one scenario from working. What that
> document did not do was prescribe specific solutions to the given scenarios.
>
> This document offers a deeper dive on a subset of the issues in [#5000], to
> describe specifics for managing multiple windows with the Windows Terminal. This
> includes features such as:
>
> * Run `wt` in the current window ([#4472])
> * Single Instance Mode ([#2227])


## PR Checklist
* [x] Specs: #4472, Specs #2227
* [x] References: #5000, #4472, #2227, #7240
* [x] I work here

## Detailed Description of the Pull Request / Additional comments
_\*<sup>\*</sup><sub>\*</sub> read the spec  <sub>\*</sub><sup>\*</sup>\*_

### Why are these two separate documents?

I felt that the spec that is currently in review in #7240 and this doc should remain separate, yet closely related documents. #7240 is more about showing how this large set of problems discussed in #5000 can all be solved technically, and how those solutions can be used together. It establishes that none of the proposed solutions for components of #5000 will preclude the possibility of other components being solved. What it does _not_ do however is drill too deeply on the user experience that will be built on top of those architectural changes. 

This doc on the other hand focuses more closely on a pair of scenarios, and establishes how those scenarios will work technically, and how they'll be exposed to the user. 

### TODO:

* [x] A thought - How will we handle arguments like `--fullscreen`, `--initialSize r,c`? They only apply when creating a new window, right?
* [x] When a `wt -s 1 split-pane` command is executed, we'll need to make sure to not _also_ create a new tab
2021-02-05 06:30:02 -06:00

59 KiB

author created on last updated issue id
Mike Griese @zadjii-msft 2020-07-31 2021-02-05

Windows Terminal Process Model 2.0

Abstract

The Windows Terminal currently exists as a single process per window. It has one connection per terminal pane, which could be an additional conpty process and associated client processes. This model has proven effective for the simple windowing we've done so far. However, this single process model will not be enough for more complex scenarios, like dragging tabs into other windows.

This spec outlines changes to the Terminal process model in order to enable the following scenarios:

  • Tab Tearoff/ Reattach (#1256)
  • Run wt in the current window (#4472)
  • Single Instance Mode (#2227)
  • Quake Mode (#653)

Also discussed is "mixed elevation" (#1032 & #632), which is closely related to the above scenarios, and was investigated as a part of this re-architecture.

Inspiration

Much of the design for this feature was inspired by an older web browser process model. For a time, web browsers, would use a separate process for each of the tabs within the browser, to help isolate the actual tabs from one another. (Nowadays, browsers will have multiple tabs all hosted in a single process, to attempt to reuse some resources between tabs. They do still break out the tab content from the actual window's process).

The rxvt-unicode terminal emulator uses a similar client/server architecture. In this model, the terminal "content" is hosted by a server process, and windows act as a "client" for that content.

Background

In order to better understand some of the following technical solutions, it's important to understand some of the technical hurdles that need to be overcome.

Drag and drop tabs to create new windows

An important scenario we're targeting is the ability to drag a tab out of the window, and create a new Terminal window. When we do this, we want the new window to be able to persist all the same state that the original window had for that tab. For us, that primarily means the buffer contents and connection state.

However, how do we move the terminal state to another window? The terminal state is all located in-memory of the thread that created the TermControl hosting the terminal buffer. It's fairly challenging to re-create this state in another process.

Take a look at this document for more background on how the scenario might work.

There's really only a limited selection of things that a process could transfer to another with a drag and drop operation. If we wanted to use a string to transfer the data, we'd somehow need to serialize then entire state of the tab. That would mean turning its tree of panes, and the state of each of the buffers in the tab, into some sort of string. Then, we would have the new window deserialize that string to re-create the tab state. Consider that each buffer might include 32000 rows of 80+ characters each, each with RGB attributes. You are looking at 30MB+ of raw data to serialize and de-serialize per buffer, minimum. This sounds extremely fragile, if not impossible to do robustly.

What we need is a more effective way for separate Terminal windows to to be able to connect to and display content that's being hosted in another process.

Mixed admin and unelevated clients in a single window

Let's presume that you're a user who wants to be able to open an elevated tab within an otherwise unelevated Terminal window. We call this scenario "mixed elevation" - the tabs within the Terminal can be running either unelevated or elevated client applications.

It wouldn't be terribly difficult for the unelevated Terminal to request the permission of the user to spawn an elevated client application. The user would see a UAC prompt, they'd accept, and then they'd be able to have an elevated shell alongside their unelevated tabs.

However, this creates an escalation of privilege vector. Now, there's an unelevated window which is connected directly to an elevated process. At this point, any other unelevated application could send input to the Terminal's HWND, making it possible for another unelevated process to "drive" the Terminal window and send commands to the elevated client application.

Solution Design

Window and Content Processes

To begin, let's first take a look at a rough diagram of how the Windows Terminal process looks today:

figure-001

Currently, the entire Windows Terminal exists as a single process composed of many parts. It has a top Win32 layer responsible for the window. This window includes a UWP XAML-like App layer, which embeds many TermControls. Each of these contains the buffer and renderer, and communicates with a connection to another process.

The primary concept introduced by this spec is the idea of two types of process, which will work together to create a single Terminal window. These processes will be referred to as the "Window Process" and the "Content Process".

  • A Window Process is a process which is responsible for drawing a window to the desktop, and accepting input from the user. This is a window which hosts our XAML content, and the window which the user interacts with.
  • A Content Process is a process which hosts a single terminal instance. This is the process that hosts the terminal buffer, state machine, and connection. It is also responsible for the Renderer and DxEngine.

These two types of processes will work together to present both the UI of the Windows Terminal app, as well as the contents of the terminal buffers. A single window process may be in communication with many content processes - one per terminal instance. That means that each TermControl in a window process will be hosted in a separate process.

The window process will be full of "thin" TermControls - controls which are only the XAML layer and a WinRT object which is hosted by the content process. These thin TermControls will receive input, and have all the UI elements (including the SwapChainPanel) of the TermControl today, but the actual rendering to the swap chain, and the handling of those inputs will be done by the content process.

figure-002

As a broad outline, whenever the window wants to create a terminal, the flow will be something like the following:

  1. A window process will spawn a new content process, with a unique ID.
  2. The window process will attach itself to the content process. It will indicate that it (the window process) is the content process's hosting window.
  3. When the content process creates its swap chain, it will raise an event. The window process will use to connect that swap chain to the window process's SwapChainPanel.
  4. The content process will read output from the connection and draw to the swap chain. The contents that are rendered to the swap chain will be visible in the window process's SwapChainPanel. This is because they share the same underlying kernel object.

figure-003

The content process will be responsible for the terminal buffer and other terminal state (the core Terminal object), as well as the Renderer, DxEngine, and TerminalConnection. These are all being combined in the content process, as to maximize performance. We don't want to have to hop across the process boundary multiple times per frame, so the renderer must be in the same process as the buffer. Similarly, we want to be able to read data off of the connection as quickly as possible, and the best way to do this will be to have the connection in the same process as the Terminal core.

Technical Details

Much of the above is powered by the magic of WinRT (which is powered by the magic of COM). Whenever we create WinRT types, WinRT provides metadata about these types that not only enables us to use them in-proc (via the implementation), but also enables using these types across process boundaries.

Typically, these classes are given a unique GUID for the class. A process that wants to implement one of these out-of-proc WinRT objects can register with the system to say "I make MyClass's, and their GUID is {foo}". These are called WinRT servers. Then, consumers (clients) of that class can ask the system "I'd like to instantiate a {foo} please", which will cause the server to create a new instance of that object (in the server process), and the client will be given a WinRT object which can interact with the classes WinRT projection.

A server can register the types it produces with CoRegisterClassObject, like so:

CoRegisterClassObject(MyClassGUID,
                      winrt::make<MyClassFactory>().get(),
                      CLSCTX_LOCAL_SERVER,
                      REGCLS_MULTIPLEUSE,
                      &dwRegistration);

And a client can attempt to get an instance of MyClass with

auto myClass = create_instance<winrt::MyClass>(MyClassGUID, CLSCTX_LOCAL_SERVER);

We're going to be using that system a little differently here. Instead of using a GUID to represent a single Class, we're going to use the GUID to uniquely identify content processes. Each content process will receive a unique GUID on creation. It will register as the server for that GUID. Any window process will be able to connect to that specific content process strictly by GUID. Because each GUID is unique to each content process, any time any client calls create_instance<>(theGuid, ...), it will uniquely attempt to connect to the content process hosting theGuid.

If we wanted to have a second window process connect to the same content process as another window, all it needs is the content process's GUID.

Now that we have a WinRT object that the content process hosts, we can have the window and content process interact using that interface. The next important thing to communicate between these processes is the swapchain. The swapchain will need to be created by the content process. We also need to be able to communicate the HANDLE to that swapchain out to the window process. The window process needs to be able to attach that swapchain to the SwapChainPanel in the window.

This is tricky, because WinRT does not natively expose a HANDLE type. That would allow us to easily pass the HANDLE between these processes. Instead we'll need to manually cast the HANDLE to uint64_t at the WinRT boundary, and cast it back to a HANDLE when we want to use it.

Instead, the window will always need to be the one responsible for calling DuplicateHandle. Fortunately, the DuplicateHandle function does allow a caller to duplicate from another process into your own process.

When the content process needs to create a new swapchain, it'll raise an event. The window process can listen for that event to indicate that the swapchain changed. The window will then query for the content process's PID and create a handle to the content process. The window will query the current value of the content process's HANDLE to the swapchain. The window will then duplicate that HANDLE into it's own process space. Now that the window has a handle to the swapchain, it can use ISwapChainPanelNative2::SetSwapChainHandle to set the SwapChainPanel to use the same swapchain.

This will allow the content process to draw to the swapchain, and the window process to render that same swapchain.

author's note: There's some internal work that's going on in parallel that will make this DUplicateHandle business more structured. Unfortunately, that work isn't public quite yet. It should be noted that in the version of this we end up shipping, we'll probably need a split COM & WinRT interface between the window and content processes. The COM interface will handle the passing of the swapchain handle, and the WinRT interface will do everything else.

Scenario: Tab Tear-off and Reattach

Because all that's needed to uniquely identify an individual terminal instance is the GUID of the content process, it becomes fairly trivial to be able to "move" a terminal instance from one window to another.

drop-tab-on-existing-window

When a drag/drop operation happens, the payload of the event will contain the structure of the tree of panes. This tree will contain the GUIDs of the content processes that make up the leaf nodes of the tree. The receiving window process will then be able to use that tree to be able to re-create a similar pane structure in its window. It will use the GUIDs to be able to connect to the content processes for the terminals in that tab.

The terminal buffer never needs to move from one process to another. It always stays in a single content process. The only thing that changes is which window process is rendering the content process's swapchain.

When a new window is created from a torn-out tab in this manner, we will want to ensure that the created window maintains the same maximized state as the origin window. If the user tries to tear out a tab from a maximized window, the window we create should also be maximized.

Similar to dragging a tab from one window to another window, we can also detect when the tab was dropped somewhere outside the bounds of a tab strip. When that happens, we'll create a new WT window, and use the data package from the drag-drop to initialize the structure of the window.

tear-out-tab

For our own sanity, we'll want to ensure that tabs cannot be torn out from a "Preview" Windows Terminal window and dropped into a "Release" window. I'm not positive that the system will prevent that on our behalf, so I think it's important to call out that if this is possible, we should expressly disallow it.

Scenario: Mixed Elevation

It was initially theorized that this window/content model architecture would also help enable "mixed elevation", where there are tabs running at different integrity levels (read: elevated and unelevated) within the same terminal window. However, after investigation and research, it has become apparent that this scenario is not possible to do safely after all. There are numerous technical difficulties involved, and each with their own security risks. At the end of the day, the team wouldn't be comfortable shipping a mixed-elevation solution, because there's simply no way for us to be confident that we haven't introduced an escalation-of-privilege vector utilizing the Terminal. No matter how small the attack surface might be, we wouldn't be confident that there are no vectors for an attack.

For the time being we'll continue with the current model of having one window for unelevated processes, and another for elevated windows. We can revisit mixed elevation in the future if we can get the focus and attention from some security experts that could back up a technical solution.

Instead of supporting mixed elevation in a single window, we'll introduce a number of new properties to profiles and various actions, to improve the user experience of running elevated instances. These are detailed in the spec at Elevation Quality of Life Improvements.

Some things we considered during this investigation:

  • If a user requests a new elevated tab from an otherwise unelevated window, we could use UAC to create a new, elevated window process, and "move" all the current tabs to that window process, as well as the new elevated client. Now, the window process would be elevated, preventing it from input injection, and it would still contains all the previously existing tabs. The original window process could now be discarded, as the new elevated window process will pretend to be the original window.
    • However, it is unfortunately not possible with COM to have an elevated client attach to an unelevated server that's registered at runtime. Even in a packaged environment, the OS will reject the attempt to CoCreateInstance the content process object. This will prevent elevated windows from re-connecting to unelevated client processes.
    • We could theoretically build an RPC tunnel between content and window processes, and use the RPC connection to marshal the content process to the elevated window. However, then we would need to be responsible for securing access the the RPC endpoint, and we feel even less confident doing that.
    • Attempts were also made to use a window-broker-content architecture, with the broker process having a static CLSID in the registry, and having the window and content processes at mixed elevation levels CoCreateInstance that broker. This however also did not work across elevation levels. This may be due to a lack of Packaged COM support for mixed elevation levels. Even if this approach did end up working, we would still need to be responsible for securing the elevated windows so that an unelevated attacker couldn't hijack a content process and trigger unexpected code in the window process. We didn't feel confident that we could properly secure this channel either.

Monarch and Peasant Processes

With the current design, it's easy to connect many content processes to a window process, and move those content processes easily between windows. However, we want to make sure that it's also possible to communicate between these processes. What if we want to have only a single WT instance, so that whenever WT is run, it creates a tab in the existing window? What if you want to run a commandline in a given WT window?

In addition to the concept of Window and Content Processes, we'll also be introducing another type of categorization for window processes. These are "Monarch" and "Peasant" processes. This will allow for the coordination across the various windows by the Monarch.

There will only ever be one monarch process at a given time, and every other WT window process is a peasant process. However, we want this system to be redundant, so that if the monarch ever dies, one of the remaining peasant processes can take over for it. The new monarch will be chosen at random, so we'll call this a probabilistic elective monarchy.

Essentially, the probabilistic elective monarchy will work in the following way:

  1. We'll introduce a WinRT class (for the purpose of this doc we'll call it Monarch), with a unique GUID.
  2. When any window process starts up, it'll first try to register as the server for the Monarch WinRT class.
    • The OS will allow subsequent processes to successfully register as the server for Monarch objects, but when someone tries to create_instance a Monarch, the OS will always create the Monarch from the first process to register.
  3. After registering as a server for Monarchs, attempt to create a Monarch using winrt::create_instance.
  4. Using that Monarch, ask it for it's PID.
    • If that PID is the same as the PID of the current process, then the window process knows that it is the monarch.
    • If that PID is some other process, then we know that we're not currently the monarch.
      • If we don't currently have an ID assigned to us, then ask the Monarch to assign one to us.
      • If we do have an ID (from a previous monarch), then let the new monarch know that we exist.
  5. If we're a peasant process, then WaitForSingleObject on a handle to the monarch process. When the current monarch dies, go back to 3. At this point, we might be appointed the new monarch (by whatever process the OS uses to choose who the new server for Monarchs is.)
    • We're suggesting WaitForSingleObject here as opposed to having the monarch raise a WinRT event when it's closed, because it's possible that the monarch closes unexpectedly, before it has an opportunity to notify the peasants.

By this mechanism, the processes will be able to communicate with the Monarch, who'll be responsible for managing any inter-process communication between windows we might need. In addition, the monarch might need to track some global state that it can use to enable some of the following scenarios. An example of such global state might be "which window process was the most recently focused one?"

Author's note: The terms "monarch" and "peasant" were chosen to intentionally be a bit silly. Any relationship to an existing software model is purely coincidence. The metaphor is what is really important here, the naming just so happens to bring a bit of levity to what is otherwise a fairly dense spec. As this terminology is only intended to be used within the code itself, we're not terribly concerned about the actual choice of words here.

Scenario: Open new tabs in most recently used window

👉 Note: This scenario is covered in depth in the spec titled "Windows Terminal Session Management". This section is intended to be an overview of the scenario. It should help show how window management fits in the larger picture of the other process model changes.

A common feature of many browsers is that when a web URL is clicked somewhere, the web page is opened as a new tab in the most recently used window of the browser. This functionality is often referred to as "glomming", as the new tab "gloms" onto the existing window.

Currently, the terminal does not support such a feature - every wt invocation creates a new window. With the monarch/peasant architecture, it'll now be possible to enable such a scenario.

As each window is activated, it will call a method on the Monarch object (hosted by the monarch process) which will indicate that "I am peasant N, and I've been focused". The monarch will use those method calls to update its own internal stack of the most recently used windows.

When tab glomming is enabled, and a new wt.exe process is launched, that process will first ask the monarch if it should run the commandline in an existing window, or create its own window.

auto-glom-wt-exe

Depending on which window was the last focused, the monarch will dispatch the commandline to the appropriate window for them to handle instead. To the user, it'll seem as if the tab just opened in the most recent window.

Users should certainly be able to specify if they want new instances to glom onto the MRU window or not. You could imagine that currently, we default to the value "windowingBehavior": "useNew", meaning that each new wt gets its own new window.

Scenario: Run wt in the current window

👉 Note: This scenario is covered in depth in the spec titled "Windows Terminal Session Management". This section is intended to be an overview of the scenario. It should help show how window management fits in the larger picture of the other process model changes.

One often requested scenario is the ability to run a wt.exe commandline in the current window, as opposed to always creating a new window. With the ability to communicate between different window processes, one could imagine a logical extension of this scenario being "run a wt commandline in any given WT window".

This spec by no means attempts to fully document how this functionality should work. This scenario deserves its own spec to discuss various different naming schemes for the commandline parameters. The following is given as an example of how these arguments might be authored and implemented to satisfy some of these scenarios.

Since each window process will have it's own unique ID assigned to it by the monarch, then running a command in a given window with ID N should be as easy as something like:

wt.exe --window N new-tab ; split-pane

(or for shorthand, wt -w N new-tab ; split-pane).

This would create a new peasant, who could then ask the monarch if there is a peasant with ID N. If there is, then the peasant can connect to that other peasant, dispatch the current commandline to that other peasant, and exit, before ever creating a window. If this commandline instead creates a new monarch process, then there was no other monarch, and so there must logically not be any other existing WT windows, and the --window N argument could be safely ignored, and the command run in the current window.

We should reserve the session id 0 to always refer to "The current window", if there is one. So wt -w 0 new-tab will run new-tab in the current window (if we're being run from WT).

If wt -w 0 <commands> is run outside a WT instance, it could attempt to glom onto the most recent WT window instead. This seems more logical than something like wt --window last or some other special value indicating "run this in the MRU window".

That might be a simple, but wrong, implementation for "the current window". If the peasants always raise an event when their window is focused, and the monarch keeps track of the MRU order for peasants, then one could naively assume that the execution of wt -w 0 <commands> would always return the window the user was typing in, the current one. However, if someone were to do something like sleep 10 ; wt -w 0 <commands>, then the user could easily focus another WT window during the sleep, which would cause the MRU window to not be the same as the window executing the command.

I'm not sure that there is a better solution for the -w 0 scenario other than attempting to use the WT_SESSION environment variable. If a wt.exe process is spawned and that's in its environment variables, it could try and ask the monarch for the peasant who's hosting the session corresponding to that GUID. This is more of a theoretical solution than anything else.

Scenario: Quake Mode

"Quake mode" has a variety of different scenarios[1] that have all been requested, more than what fits cleanly into this spec. This section is not intended to be comprehensive. However, there's one key aspect of quake mode that is specifically relevant and worth mentioning here.

One of the biggest challenges with registering a global shortcut handler is what happens when multiple windows try to register for the same shortcut at the same time? From my initial research, it seems that only the first process to register for the shortcut will succeed. This makes it hard for multiple windows to handle the global shortcut key gracefully. If a second window is created, and it fails to register the global hotkey, then the first window is closed, there's no way for the second process to track that and re-register as the handler for that key.

With the addition of monarch/peasant processes, this problem becomes much easier to solve. Now, the monarch process will always be the process to register the shortcut key, whenever it's elected. If it dies and another peasant is elected monarch, then the new monarch will register as the global hotkey handler.

Then, the monarch can use it's pre-established channels of communication with the other window processes to actually drive the response we're looking for.

Alternatively, we could use an entirely other process to be in charge of the registration of the global keybinding. This process would be some sort of long-running service that's started on boot. When it detects the global hotkey, it could attempt to instantiate a Monarch object.

  • If it can't make one, then it can simply run a new instance of wt.exe, because there's not yet a running Terminal window.
  • Otherwise, it can communicate to the monarch that the global hotkey was pressed, and the monarch will take care of delegating the activation to the appropriate peasant window.

This would mitigate the need to have at least one copy of WT running already, and the user could press that keybinding at any time to start the terminal.

Rough interface design

This is by no means definitive, but one could imagine the Monarch and Peasant classes exposing the following WinRT projections:

class Peasant
{
    void AssignID(UInt64 id); // Should only be called by the monarch
    UInt64 GetID();
    UInt64 GetPID();
    Boolean ExecuteCommandline(String[] args, String currentDirectory);
    event TypedEventHandler<Object, Object> WindowActivated;
}

class Monarch : Peasant
{
    UInt64 AddPeasant(Peasant peasant);
    Boolean IsInSingleInstanceMode();
    Peasant GetPeasant(UInt64 peasantID);
    Peasant GetMostRecentPeasant();
}

The peasant process can instantiate the Peasant object itself, in its process space. Initially, the Peasant object won't have an ID assigned, and the call to AddPeasant on the Monarch will both cause the Monarch to assign the Peasant an ID, and add it to the Monarch process's list of processes. If the Peasant already had an ID assigned by a previous monarch, then the new monarch will simply re-use the value already existing in the Peasant. Now, the monarch can call methods on the Peasant object directly to trigger changes in the peasant process.

Note that the monarch also needs to have a peasant ID, because the monarch is really just a peasant with the added responsibility of keeping track of the other peasants as well.

It's important that ExecuteCommandline takes a currentDirectory parameter. Consider the scenario wt -w 0 -d . - in this scenario, the process that's executing the provided commandline should first change its working directory to the provided currentDirectory so that something like . actually refers to the directory where the command was executed.

Default Terminal

In parallel to the above scenarios, the team is also investigating ways of supporting a "default terminal application" on Windows (refer to #492). The vision for default terminals is that when a commandline application is started on Windows, instead of booting into conhost.exe as a default terminal, the system will instead boot up whatever application the user has set as their "default terminal application", whether that be the Windows Terminal or some other terminal app (ConEmu, MinTTY, Hyper, etc).

This is another scenario that deserves its own full spec, though it warrants a special callout in this document as well.

If we break down how the "defterm" scenario works, it's a bit of an inversion from the normal way that the Terminal is started. Instead of the Terminal creating a new connection, the connection already exists, it just needs to be attached to a window of some sort.

Considering all the changes proposed above, we can also support defterm with the following mechanism:

  1. When wt is started as the target of a "defterm" invocation, the system will somehow pass us a HANDLE (or pair of HANDLEs) to indicate that we're being started to be the terminal for that commandline client.
    • The details of the connection information aren't really important at this time.

Then we have two paths forward:

Option A

  1. The wt that's spawned in this way should become the content process for this connection, because it has the direct connection to the client that's attempting to start.

  2. This new content process will need a window. Depending on the configuration of the Terminal, the following could happen:

    • the content process could discover that there's no existing Monarch process, and that a new monarch should be created, with a reference to this content process (as a first tab)
    • The content process connects to the monarch, who then creates a new window process who attaches to the content process. (A new window is created.)
    • The content process connects to the monarch, who then tells an existing window process to attaches to the content process. (The client opens as a new tab either in the single instance or the most recent window, if glomming is enabled.)

OR

  1. The wt that's spawned by the defterm connection is a new window process. It should create a new content process to handle this connection.

    • the content process will need a way of being invoked by passing it handles to the new client. This way, the content process can dupe these handles into it's own process space, to be able to create the ITerminalConnection in its own process space.
  2. If this new window process is the monarch, then great! There are no other windows to glom onto, so create the tab in this window process.

  3. It'll ask the monarch if it's in single instance mode, or if the monarch is configured to glom tabs onto the most recent window, instead of spawning new ones. If it is configured in such a way, the new peasant window process will pass the content process's GUID to the Peasant* that should be the window for that new client

    • *: note that this peasant could just be the monarch (in single instance mode, for example).

👉 NOTE: This area is left intentionally vague, because we're not exactly sure what the API surface exposed for default terminal invocations will look like. Hopefully, it shows that however we do end up implementing support for default terminal apps, the proposed Window/Content/Monarch/Peasant architecture will still be a compatible solution.

UI/UX Design

There's not really all that much UI for this scenario. Ideally, all this will happen behind the scenes, without the user really being exposed to the inner machinations of the Terminal processes.

At the very least, resizes will occur off of the UI thread, which will help that scenario feel better. While resizes happen fairly responsively in Release builds, Debug builds used for development have fairly painful resizes occurring on the UI thread, blocking all input until they're finished.

Ideally, we'll want the tab that's being dragged to be able to show the contents of the tab as we're dragging it, similar to the way that Edgium currently does. However, this is expected to be challenging and something that won't be a part of the initial release of tab tear-out.

Capabilities

Accessibility

When working on the implementation of the window/content process split, we'll need to ensure that the contents of the buffer are still readable by accessibility tools like Narrator and NVDA. We'll be moving the actual buffer out of the window process into the content process, so we'll need to make sure that we can hook up the TermControlAutomationPeer across the process boundary. Presumably, this is possible trivially because it's already implemented as a WinRT type.

I'll be especially curious to make sure that TermControl::OnCreateAutomationPeer can occur off the window process's UI thread, because all cross-process calls will need to happen off the UI thread.

Security

As we'll be introducing a mechanism for crossing elevation boundaries, we'll want to be especially careful as we make these changes.

Our biggest concern regarding mixed elevation was regarding the ability for a non-elevated process to send input to another unelevated WT window, which was connected to an elevated client process. With these proposed changes, we won't have any unelevated windows connected to unelevated processes, so that concern should be mitigated.

Additionally, we'll probably want to make sure that the user is visually prompted when a connection to an elevated client is initiated. Typically, this is done with a UAC prompt, which seems reasonable in this scenario as well.

We'll likely want to default elevated windows to spawning unelevated connections, as to prevent accidentally running connections as an administrator. This is in contrast to existing behavior, where all connections in an elevated WT window are elevated.

Furthermore, we'll want to ensure that there's nothing that an unelevated client process could do to trigger any sort of input callback in the parent window. If the parent window is an elevated window, with other elevated connections, then we don't want the unelevated content process to act as a vector by which malicious software could hijack the elevated window.

Reliability

This is probably the biggest concern in this section. Because there will now be many, many more processes in the process tree, it is imperative that whenever we're doing cross-process operations, we do them safely.

Whenever we're working with an object that's hosted by another process, we'll need to make sure that we work with it in a try/catch, because at any time, the other process could be killed. At any point, a content process could be killed, without the window being closed. The window will need to be redundant to such a scenario, and if the content process is killed, display an appropriate error message, but continue running.

When the monarch process dies, we'll need to be able to reliably elect a new monarch. While we're electing a new monarch, and updating the new monarch, there's always the chance that the new monarch is also killed (This is the "Pope Stephen II" scenario). In this scenario, if at any point a peasant notices that the monarch has died, the peasant will need to begin checking who the new monarch is again.

There is certain to be a long tail of edge cases we'll discover as a product of this change. We'll need to be prepared to see an inevitable decrease in initial reliability numbers, and increased bug reports for a time. Additionally, these bugs will likely be fairly difficult to track down. Initially, it may help us to track these bugs down if we add additional tracing around each of the inter-process calls we might make. Some example situations might include:

  • When a process creates a new content process
  • When a content process is attached by a window
  • When a monarch is elected
  • when a peasant receives its new ID from a monarch
  • when either a window or content process safely exits

In any and all of these situations, we'll want to try and be as verbose as possible in the logging, to try and make tracking which process had the error occur easier.

In the long run, this may end up increasing reliability, as crashes in the buffer should no longer cause the entire terminal application to crash.

Compatibility

The biggest concern regarding compatibility will be ensuring that the Universal app version of the Windows Terminal will still work as before. Although that application isn't used by any customer-facing scenarios currently, it still represents some long term goals for the Terminal. We'll probably need to disable tear-out entirely on the universal app, at least to begin with. Additionally, we'll need to ensure that all the controls in the universal application are in-proc controls, not using the window/content split process model.

I'm also concerned that each individual atomic step along the path towards this goal still works. For more on this topic, see Implementation Plan.

Performance, Power, and Efficiency

There's no dramatic change expected here. There may be a minor delay in the spawning of new terminal instances, due to requiring cross-process hops for the instantiation of the content process.

Additionally, minor delays are expected to be introduced during startup for the initial setup of the monarch/peasant relationship.

Potential Issues

Extensions & non-terminal content

We've now created a very terminal-specific IPC mechanism for transferring terminal state from one thread to another. However, what happens when we start to have panes that contain non-terminal content in them? This is a scenario that's most often associated with the concept of extensions in the Terminal, but perhaps more relevantly could affect the Settings UI.

The Settings UI is something we intend on shipping in the Terminal as a part of 2.0, in the same time frame much of the above work is expected to be done. We also plan on hopefully making the Settings UI appear as its own tab within the Terminal. This would be the first example of having non-terminal content directly in the application. How would we support tearing out the Settings UI tab into it's own window?

Options available here include:

We could prevent tabs with non-terminal content open in them from being dragged out entirely. They're stuck in their window until the non-terminal content is removed. This seems like it would not be a good long-term solution. Users will want to be able to tear their non-terminal content out of the window.

Alternatively, we could pass some well-defined string / JSON blob from the source process to the target process, such that the extension could use that JSON to recreate whatever state the pane was last in. The extension would be able to control this content entirely.

  • Maybe they want to pass a GUID that the new process will be able to user to CreateInstance the singleton for their UI and then dupe their swap chain to the new thread...

We'll need to define a syntax for the tear-out and drag/drop scenarios anyways, so we should define that structure in a way that will allow future extensibility.

Lets examine some sample JSON[2][3]

{
  "tabs": [
    {
      "runtimeTitle": "foo",
      "runtimeColor": null,
      "panes": [
        {
          "split": "horizontal",
          "children": [
            {
              "size": 0.30,
              "contentType": "Microsoft.Terminal.TerminalControl",
              "payload": {
                "contentGuid": "{1-1-1-1}"
              }
            },
            {
              "size": 0.70,
              "contentType": "Microsoft.Terminal.TerminalControl",
              "payload": {
                "contentGuid": "{2-2-2-2}"
              }
            }
          ]
        }
      ]
    },
    {
      "runtimeTitle": null,
      "runtimeColor": null,
      "panes": [
        {
          "size": 1.0,
          "contentType": "Microsoft.Terminal.SettingsPage",
          "payload": {
            "currentPage": "globals"
          }
        }
      ]
    }
  ]
}

Here, we've got two tabs that have been serialized.

  • The first has two panes, separated with a horizontal split. It has also been renamed to "foo".
    • The first pane takes up 30% of the parent, and contains a TermControl. The content process for that control is identified by the content GUID {1-1-1-1}
    • The second pane is also a TermControl, takes up 70% of the parent, and is attached to the content process {2-2-2-2}
  • The second tab has a single pane, with a SettingsPage. The settings page has also specified in it's payload that its current page is the "globals" page.

When we send this serialized state to another window, it can use the content GUIDs to initialize new TermControls connected to the appropriate content processes. It can also pass the payload to a new SettingsPage, so the settings page can recreate whatever state it was in.

It would be at the discretion of each type of content to specify exactly what would go into the payload, and how to best deserialize it. In the future, we'd also need to consider how the content types should be identified and instantiated, but that's a questions that's deferred to a future "extensions" spec.

Mixed elevation & Monarch / Peasant issues

Because we won't be able to CoCreateInstance across different elevation levels, we won't be able to have elevated window processes communicate with unelevated ones. This means that we'll end up having one monarch per elevation level. This may end up a little confusing, as window IDs will be tracked per-elevation level. That means there could be two different windows that share the same ID - one elevated window, and one unelevated window. Only the windows running at the same integrity level will be addressable via the commandline.

This might be a little confusing to the end user, but is seen as an acceptable experience to the alternative of just completely disallowing running commands in other elevated windows. If a user is running in an elevated window, they probably will still want wt -w 0 new-tab to open in the current window.

What happens to the content if the monarch dies unexpectedly?

What happens if you only have one window process, the monarch, and it throws an exception for whatever reason? Will the content processes also attempt to be peasants here? Or perhaps is there another mechanism by which the content processes can realize all the windows are gone and start a new window process who becomes the instant-monarch for all the still-living and remaining content processes?

I suppose we could have a content process try to auto-create a new window for itself if it detects that it's window is gone, but it didn't receive a clean close. Like, the window process would need to call Content::WindowIsClosing() to indicate that it's a clean close manually. I'm not sure from and end user perspective if all the panes in a window exploding out into their own windows is totally a great idea, but it's worth noting as a possibility.

This is left unanswered for now - we can certainly experiment with the content processes auto-recreating a window for themselves in the future.

Implementation Plan

Obviously, everything that's discussed here represents an enormous amount of work. It's important that as we move towards this new model, that we do so in safe, incremental chunks, such that each step is still a viable Terminal application, but no single step is too large to review. As such, I'll attempt to break down the work required into atomic pieces, and provide a relative ordering for the work described above.

These tasks are broken into sections, because it seems that there's roughly three tracks of work to be done here, which can be done relatively independently of each other.


  1. Add Monarch/Peasant capabilities to wt window processes.
    • This does not need to involve any actual UX functionality, simply have the WT instances communicate with one another to see who is the Monarch.
    • The monarch will track and assign IDs to peasants.
  2. (Dependent on 1): Add support for running a wt commandline in an existing window
    • Monarch should assign simple IDs for use with the wt commandline
    • 0 should be reserved as an alias for "the current window"
  3. (Dependent on 1, maybe on 2): Add support for "single instance mode". New peasant windows will first ask the monarch if it's in single instance mode, and pass the peasant's commandline to the monarch if it is.
  4. (Dependent on 1): Monarch registers for the global quake hotkey, and uses that to activate the monarch.
    • Pressing the key when a window is currently focused should minimize? Do nothing?
  5. (Dependent on 4): any other quake mode improvements:
    • Summon the nearest window
    • make the window "drop down" from the top
    • Summon the MRU window
      • It would need to track a the MRU for windows, so pressing the shortcut when no window is active summons the MRU one.

  1. Change TermControl/DxEngine to create its own swap chain using DCompositionCreateSurfaceHandle and CreateSwapChainForCompositionSurfaceHandle. This is something that's already done in commit 30b8335.

  2. (Dependent on 6) Refactor TermControl to allow it to have a WinUI layer and a Core layer. The WinUI layer will be only responsible for interacting with XAML, processing input, etc, and sending it to the core layer, which handles all the logic concerning input. The core will expose these methods that the UI layer calls as projected methods

  3. (Dependent on 7): Expose the methods the UI layer calls on the core as projected methods. The control is still fundamentally in-proc, and the UI layer calls directly into the implementation of the control core, but the methods could be used x-proc.

  4. (Dependent on 8): Enable a TermControl to have an out-of proc core.

    • The core will need to be able to accept a PID using some method, and be able to DuplicateHandle the swapchain to that PID. It'll also need to raise the SwapChainHandleChanged event.
    • The Control UI layer would need to receive a GUID in the ctor, and use that connect to the out-of-proc core. It'll pass the UI's PID to the core, and attach to the core's swap chain handle.
    • The UI layer will be smart enough to call either the implementation method (if it's hosting in-proc) or the projected method (if it's out-of-proc).
  5. (Dependent on 9?) The core layer needs to be able to construct connections itself, rather than have one passed in to it.

  • In the future we'll probably want this to be more extensible, but for now we can probably just pass an enum for connection type, and an IConnectionSettings object to the core, and use the ConnectionType and setting to build the limited types of connection we currently have.
  1. (Dependent on 9, 10) wt.exe needs to be able to spawn as a content process, accepting a GUID for its ID, and spawning a single control core.
  • When the content process is first spawned, it won't create the core or connection, nor will it have any settings. The first client to connect to the content process should make sure to set up the settings before initializing the control.
  • A scratch XAML Island application might be a useful tool at this point, to test hosting the content in another process (that's not a full-blown terminal instance).
  1. (Dependent on 11) TerminalApp creates new content processes for each and every terminal instance it spawns. At this point, there's no tear-out, but the terminal instances are all out-of-proc from the window process.

  2. (Dependent on 12) Terminal can drop tabs onto another WT window process by communicating the structure of the tab's panes and their content GUIDs.

  • At this point, the tabs can't be torn out to create new windows, only move between existing windows
  • It might be hard to have the tab "attach" to the tab row of the other window.
  • It might be easiest to do this after 1, and communicate to the new window process the GUID of the old window process and the tab within the original window process, and just have the new window process ask the old window process what the structure of the tab is
    • Though, the tab won't be in the original process's list of tabs anymore, so that might not be helpful.
  1. wt accepts an initial position, size on the commandline.

  2. (Dependent on 13, 14) Tearing out a tab and dropping it not on another WT window creates a new window for the tab.

  • This will require and additional argument to wt for it to be able to inherit the structure of a given set of tabs. Either this will need to be passed on the cmdline, or the newly spawned process will need to be able to communicate with the original process to ask it what structure it should be building.
  • Doing this after 1 might be helpful.
  • Needs 14 to be able to specify the location of the new window.

  1. Add support for a NewWindow action
  2. wt accepts a commandline arg to force it to auto-elevate itself if it isn't already elevated.
  3. (Dependent on 16, 17) Users can mark a profile as "elevated": true, to always force a new elevated window to be created for elevated profiles.
  4. (Dependent on 18) Add an elevated parameter to NewTerminalArgs that will allow a newTab or splitPane action to use that value of elevated, rather than the value from the profile.
  5. Add a UAC shield to elevated terminal windows

Footnotes

[1]: See Quake mode scenarios for a longer enumeration of possible scenarios.

[2]: I'm not prescribing that this solution must involve JSON. In this scenario, I'm using JSON merely as an example serialization that we can quickly examine and discuss. If we did use JSON, we'd almost certainly want to use a string as the payload type, so we can just pass that string into some WinRT method projected by whatever type corresponds to the provided type.

[3]: The state that's serialized here for the contents of a window might also be convenient to re-use for the restoration of terminal window state across reboots. If we already know how to serialize the entire state of a Terminal window, then storing that somewhere for a future terminal launch to use seems like an obvious next step. See also #961.

Future considerations

  • A previous discussion with someone on the input team brought up the following idea: When a tab is being torn out, move all the other tabs to a new window, and continue dragging the current window. This should make the drag/drop experience feel more seamless, and might be able to allow us to render the tab content as we're drag/dropping.
    • If we pursue this, then we'll need to make sure that we re-assign the window ID's as appropriate. the new window (with the tabs that are being left behind) should still have the same peasant ID as the original window, which will now get a new ID (as to not conflict)
  • We could definitely do a NewWindowFromTab(index:int?) action that creates a new window from a current tab once this all lands.
    • Or it could be NewWindowFromTab(index:union(int,int[])?) to specify the current tab, a specific tab, or many tabs.
  • During review, it was mentioned that when links are opened in the new Edge browser, they will only glom onto an existing window if that window is open in the current virtual desktop. This seems like a good idea of a feature for the Terminal to follow as well. Since Edge is able to do it, there must be some way for an application to determine which virtual desktop it is open on. We could use that information to have the monarch track the last active window per-desktop, and only glom when there's one on the current desktop.
    • We could even imagine changing the glomToLastWindow property to accept a combined bool/enum value:
      • true or "always": always glom to the most recent window, regardless of desktop
      • "sameDesktop": Only glom if there's an existing window on this virtual desktop, otherwise create a new window
      • false or "never": Never glom, always create a new window.

Elevation and Extensions

In #4000, we're tracking adding support for 3rd party extensions to the Terminal. These extensions will be code that runs either in-proc or out-of-proc, and can interact with the Terminal in some way. As examples, they might provide settings (profiles), or add UI elements, or parse the contents of the buffer.

Because the Terminal will be executing 3rd party code, I believe extensions that run code should probably be disabled by default for elevated windows. This would help reduce the attack surface for and attacker to leverage the Terminal as an opportunity to have a user elevate the attacker's code.

Of course, the user might want to be able to use a well-behaved extension in elevated windows, when they trust the extension. We could have an additional set of settings the user could use to enable certain extensions in elevated windows. However, this setting cannot live in the normal settings.json or even state.json (see #7972, since those files are writable by any medium-IL process. Instead, this setting would ned to live in a separate file that's protected to only be writable by elevated processes. This would ensure that an attacker could not just add their extension to the list of white-listed extensions. When the settings UI wants to modify that setting, it'll need to prompt the user for permission, but that's an acceptable user experience.

TODOs

  • Experimentally prove that a elevated window can host an unelevated content
    • Research proved the opposite actually.
  • Experimentally prove that I can toss content process IDs from one window to another
    • I don't have any doubt that this will work, but I want to have a proof-of-concept just in case.
  • come up with a commandline mechanism for starting one wt.exe process either as a window process, or as a content process
    • wt --content {guid} seems reasonable to me. If there's a --content {guid} as the only two args, I think that's a distinct enough way to identify "you should be a content process".
  • What about handling XAML input events? The "thin" term control will need to do that in-proc, so it can reply on the UI thread if a particular keystroke was handled or not.
    • I'm almost certain that the Terminal::HandleKey() stuff is going to need to be handled in the thin control, and the core will need to raise events to the control? oof
  • Make sure I can pass the UiaProvider for the control core out to the Window Process, so the window process can use it to query buffer info.

Addenda

This spec also has a follow-up spec which introduces further changes upon this original draft. Please also refer to:

  • November 2020: Windows Terminal Window Management

Resources