terminal/src/terminal/parser/stateMachine.cpp

2038 lines
62 KiB
C++
Raw Normal View History

// Copyright (c) Microsoft Corporation.
// Licensed under the MIT license.
#include "precomp.h"
#include "stateMachine.hpp"
#include "ascii.hpp"
using namespace Microsoft::Console::VirtualTerminal;
//Takes ownership of the pEngine.
StateMachine::StateMachine(std::unique_ptr<IStateMachineEngine> engine) :
_engine(std::move(engine)),
_state(VTStates::Ground),
_trace(Microsoft::Console::VirtualTerminal::ParserTracing()),
_parameters{},
Refactor VT parameter handling (#7799) This PR introduces a pair of classes for managing VT parameters that automatically handle range checking and default fallback values, so the individual operations don't have to do that validation themselves. In addition to simplifying the code, this fixes a few cases where we were mishandling missing or extraneous parameters, and adds support for parameter sequences on commands that couldn't previously handle them. This PR also sets a limit on the number of parameters allowed, to help thwart DoS memory consumption attacks. ## References * The new parameter class also introduces the concept of an omitted/default parameter which is not necessarily zero, which is a prerequisite for addressing issue #4417. ## Detailed Description of the Pull Request / Additional comments There are two new classes provide by this PR: a `VTParameter` class, similar in function to a `std::optional<size_t>`, which holds an individual parameter (which may be an omitted/default value); and a `VTParameters` class, similar in function to `gsl:span<VTParameter>`, which holds a sequence of those parameters. Where `VTParameter` differs from `std::optional` is with the inclusion of two cast operators. There is a `size_t` cast that interprets omitted and zero values as 1 (the expected behaviour for most numeric parameters). And there is a generic cast, for use with the enum parameter types, which interprets omitted values as 0 (the expected behaviour for most selective parameters). The advantage of `VTParameters` class is that it has an `at` method that can never fail - out of range values simply return the a default `VTParameter` instance (this is standard behaviour in VT terminals). It also has a `size` method that will always return a minimum count of 1, since an empty parameter list is typically the equivalent of a single "default" parameter, so this guarantees you'll get at least one value when iterating over the list with `size()`. For cases where we just need to call the same dispatch method for every parameter, there is a helper `for_each` method, which repeatedly calls a given predicate function with each value in the sequence. It also collates the returned success values to determine the overall result of the sequence. As with the `size` method, this will always make at least one call, so it correctly handles empty sequences. With those two classes in place, we could get rid of all the parameter validation and default handling code in the `OutputStateMachineEngine`. We now just use the `VTParameters::at` method to grab a parameter and typically pass it straight to the appropriate dispatch method, letting the cast operators automatically handle the assignment of default values. Occasionally we might need a `value_or` call to specify a non-standard default value, but those cases are fairly rare. In some case the `OutputStateMachineEngine` was also checking whether parameters values were in range, but for the most part this shouldn't have been necessary, since that is something the dispatch classes would already have been doing themselves (in the few cases that they weren't, I've now updated them to do so). I've also updated the `InputStateMachineEngine` in a similar way to the `OutputStateMachineEngine`, getting rid of a few of the parameter extraction methods, and simplifying other parts of the implementation. It's not as clean a replacement as the output engine, but there are still benefits in using the new classes. ## Validation Steps Performed For the most part I haven't had to alter existing tests other than accounting for changes to the API. There were a couple of tests I needed to drop because they were checking for failure cases which shouldn't have been failing (unexpected parameters should never be an error), or testing output engine validation that is no longer handled at that level. I've added a few new tests to cover operations that take sequences of selective parameters (`ED`, `EL`, `TBC`, `SM`, and `RM`). And I've extended the cursor movement tests to make sure those operations can handle extraneous parameters that weren't expected. I've also added a test to verify that the state machine will correctly ignore parameters beyond the maximum 32 parameter count limit. I've also manual confirmed that the various test cases given in issues #2101 are now working as expected. Closes #2101
2020-10-15 18:12:52 +02:00
_parameterLimitReached(false),
_oscString{},
_cachedSequence{ std::nullopt },
_processingIndividually(false)
{
_ActionClear();
}
void StateMachine::SetParserMode(const Mode mode, const bool enabled) noexcept
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
{
Disable the acceptance of C1 control codes by default (#11690) There are some code pages with "unmapped" code points in the C1 range, which results in them being translated into Unicode C1 control codes, even though that is not their intended use. To avoid having these characters triggering unintentional escape sequences, this PR now disables C1 controls by default. Switching to ISO-2022 encoding will re-enable them, though, since that is the most likely scenario in which they would be required. They can also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape sequence. What I've done is add a new mode to the `StateMachine` class that controls whether C1 code points are interpreted as control characters or not. When disabled, these code points are simply dropped from the output, similar to the way a `NUL` is interpreted. This isn't exactly the way they were handled in the v1 console (which I think replaces them with the font _notdef_ glyph), but it matches the XTerm behavior, which seems more appropriate considering this is in VT mode. And it's worth noting that Windows Explorer seems to work the same way. As mentioned above, the mode can be enabled by designating the ISO-2022 coding system with a `DOCS` sequence, and it will be disabled again when UTF-8 is designated. You can also enable it explicitly with a `DECAC1` sequence (originally this was actually a DEC printer sequence, but it doesn't seem unreasonable to use it in a terminal). I've also extended the operations that save and restore "cursor state" (e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode, since it's closely tied to the code page and character sets which are also saved there. Similarly, when a `DECSTR` sequence resets the code page and character sets, I've now made it reset the C1 mode as well. I should note that the new `StateMachine` mode is controlled via a generic `SetParserMode` method (with a matching API in the `ConGetSet` interface) to allow for easier addition of other modes in the future. And I've reimplemented the existing ANSI/VT52 mode in terms of these generic methods instead of it having to have its own separate APIs. ## Validation Steps Performed Some of the unit tests for OSC sequences were using a C1 `0x9C` for the string terminator, which doesn't work by default anymore. Since that's not a good practice anyway, I thought it best to change those to a standard 7-bit terminator. However, in tests that were explicitly validating the C1 controls, I've just enabled the C1 parser mode at the start of the tests in order to get them working again. There were also some ANSI mode adapter tests that had to be updated to account for the fact that it has now been reimplemented in terms of the `SetParserMode` API. I've added a new state machine test to validate the changes in behavior when the C1 parser mode is enabled or disabled. And I've added an adapter test to verify that the `DesignateCodingSystems` and `AcceptC1Controls` methods toggle the C1 parser mode as expected. I've manually verified the test cases in #10069 and #10310 to confirm that they're no longer triggering control sequences by default. Although, as I explained above, the C1 code points are completely dropped from the output rather than displayed as _notdef_ glyphs. I think this is a reasonable compromise though. Closes #10069 Closes #10310
2021-11-18 00:40:31 +01:00
_parserMode.set(mode, enabled);
}
bool StateMachine::GetParserMode(const Mode mode) const noexcept
Disable the acceptance of C1 control codes by default (#11690) There are some code pages with "unmapped" code points in the C1 range, which results in them being translated into Unicode C1 control codes, even though that is not their intended use. To avoid having these characters triggering unintentional escape sequences, this PR now disables C1 controls by default. Switching to ISO-2022 encoding will re-enable them, though, since that is the most likely scenario in which they would be required. They can also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape sequence. What I've done is add a new mode to the `StateMachine` class that controls whether C1 code points are interpreted as control characters or not. When disabled, these code points are simply dropped from the output, similar to the way a `NUL` is interpreted. This isn't exactly the way they were handled in the v1 console (which I think replaces them with the font _notdef_ glyph), but it matches the XTerm behavior, which seems more appropriate considering this is in VT mode. And it's worth noting that Windows Explorer seems to work the same way. As mentioned above, the mode can be enabled by designating the ISO-2022 coding system with a `DOCS` sequence, and it will be disabled again when UTF-8 is designated. You can also enable it explicitly with a `DECAC1` sequence (originally this was actually a DEC printer sequence, but it doesn't seem unreasonable to use it in a terminal). I've also extended the operations that save and restore "cursor state" (e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode, since it's closely tied to the code page and character sets which are also saved there. Similarly, when a `DECSTR` sequence resets the code page and character sets, I've now made it reset the C1 mode as well. I should note that the new `StateMachine` mode is controlled via a generic `SetParserMode` method (with a matching API in the `ConGetSet` interface) to allow for easier addition of other modes in the future. And I've reimplemented the existing ANSI/VT52 mode in terms of these generic methods instead of it having to have its own separate APIs. ## Validation Steps Performed Some of the unit tests for OSC sequences were using a C1 `0x9C` for the string terminator, which doesn't work by default anymore. Since that's not a good practice anyway, I thought it best to change those to a standard 7-bit terminator. However, in tests that were explicitly validating the C1 controls, I've just enabled the C1 parser mode at the start of the tests in order to get them working again. There were also some ANSI mode adapter tests that had to be updated to account for the fact that it has now been reimplemented in terms of the `SetParserMode` API. I've added a new state machine test to validate the changes in behavior when the C1 parser mode is enabled or disabled. And I've added an adapter test to verify that the `DesignateCodingSystems` and `AcceptC1Controls` methods toggle the C1 parser mode as expected. I've manually verified the test cases in #10069 and #10310 to confirm that they're no longer triggering control sequences by default. Although, as I explained above, the C1 code points are completely dropped from the output rather than displayed as _notdef_ glyphs. I think this is a reasonable compromise though. Closes #10069 Closes #10310
2021-11-18 00:40:31 +01:00
{
return _parserMode.test(mode);
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
}
const IStateMachineEngine& StateMachine::Engine() const noexcept
{
return *_engine;
}
IStateMachineEngine& StateMachine::Engine() noexcept
{
return *_engine;
}
// Routine Description:
// - Determines if a character is a valid number character, 0-9.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isNumericParamValue(const wchar_t wch) noexcept
{
return wch >= L'0' && wch <= L'9'; // 0x30 - 0x39
}
#pragma warning(push)
#pragma warning(disable : 26497) // We don't use any of these "constexprable" functions in that fashion
// Routine Description:
// - Determines if a character belongs to the C0 escape range.
// This is character sequences less than a space character (null, backspace, new line, etc.)
// See also https://en.wikipedia.org/wiki/C0_and_C1_control_codes
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isC0Code(const wchar_t wch) noexcept
{
return (wch >= AsciiChars::NUL && wch <= AsciiChars::ETB) ||
wch == AsciiChars::EM ||
(wch >= AsciiChars::FS && wch <= AsciiChars::US);
}
// Routine Description:
// - Determines if a character is a C1 control characters.
// This is a single-character way to start a control sequence, as opposed to using ESC
// and their 7-bit equivalent.
//
// Not all single-byte codepages support C1 control codes--in some, the range that would
// be used for C1 codes are instead used for additional graphic characters.
//
// However, we do not need to worry about confusion whether a single byte, for example,
// \x9b in a single-byte stream represents a C1 CSI or some other glyph, because by the time we
// get here, everything is Unicode. Knowing whether a single-byte \x9b represents a
// single-character C1 CSI or some other glyph is handled by MultiByteToWideChar before
// we get here (if the stream was not already UTF-16). For instance, in CP_ACP, if a
// \x9b shows up, it will get converted to \x203a. So, if we get here, and have a
// \x009b, we know that it unambiguously represents a C1 CSI.
//
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isC1ControlCharacter(const wchar_t wch) noexcept
{
return (wch >= L'\x80' && wch <= L'\x9F');
}
// Routine Description:
// - Convert a C1 control characters to their 7-bit equivalent.
//
// Arguments:
// - wch - Character to convert.
// Return Value:
// - The 7-bit equivalent of the 8-bit control characters.
static constexpr wchar_t _c1To7Bit(const wchar_t wch) noexcept
{
return wch - L'\x40';
}
// Routine Description:
// - Determines if a character is a valid intermediate in an VT escape sequence.
// Intermediates are punctuation type characters that are generally vendor specific and
// modify the operational mode of a command.
// See also http://vt100.net/emu/dec_ansi_parser
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isIntermediate(const wchar_t wch) noexcept
{
return wch >= L' ' && wch <= L'/'; // 0x20 - 0x2F
}
// Routine Description:
// - Determines if a character is the delete character.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isDelete(const wchar_t wch) noexcept
{
return wch == AsciiChars::DEL;
}
// Routine Description:
// - Determines if a character is the escape character.
// Used to start escape sequences.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isEscape(const wchar_t wch) noexcept
{
return wch == AsciiChars::ESC;
}
// Routine Description:
// - Determines if a character is a delimiter between two parameters in an escape sequence.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isParameterDelimiter(const wchar_t wch) noexcept
{
return wch == L';'; // 0x3B
}
// Routine Description:
// - Determines if a character is "control sequence" beginning indicator.
// This immediately follows an escape and signifies a varying length control sequence.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isCsiIndicator(const wchar_t wch) noexcept
{
return wch == L'['; // 0x5B
}
// Routine Description:
// - Determines if a character is a private range marker for a control sequence.
// Private range markers indicate vendor-specific behavior.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isCsiPrivateMarker(const wchar_t wch) noexcept
{
return wch == L'<' || wch == L'=' || wch == L'>' || wch == L'?'; // 0x3C - 0x3F
}
// Routine Description:
// - Determines if a character is invalid in a control sequence
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isCsiInvalid(const wchar_t wch) noexcept
{
return wch == L':'; // 0x3A
}
// Routine Description:
// - Determines if a character is an invalid intermediate.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isIntermediateInvalid(const wchar_t wch) noexcept
{
// 0x30 - 0x3F
return _isNumericParamValue(wch) || _isCsiInvalid(wch) || _isParameterDelimiter(wch) || _isCsiPrivateMarker(wch);
}
// Routine Description:
// - Determines if a character is an invalid parameter.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isParameterInvalid(const wchar_t wch) noexcept
{
// 0x3A, 0x3C - 0x3F
return _isCsiInvalid(wch) || _isCsiPrivateMarker(wch);
}
// Routine Description:
// - Determines if a character is a string terminator indicator.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isStringTerminatorIndicator(const wchar_t wch) noexcept
{
return wch == L'\\'; // 0x5c
}
// Routine Description:
Improve support for VT character sets (#4496) This PR improves our VT character set support, enabling the [`SCS`] escape sequences to designate into all four G-sets with both 94- and 96-character sets, and supports invoking those G-sets into both the GL and GR areas of the code table, with [locking shifts] and [single shifts]. It also adds [`DOCS`] sequences to switch between UTF-8 and the ISO-2022 coding system (which is what the VT character sets require), and adds support for a lot more characters sets, up to around the level of a VT510. [`SCS`]: https://vt100.net/docs/vt510-rm/SCS.html [locking shifts]: https://vt100.net/docs/vt510-rm/LS.html [single shifts]: https://vt100.net/docs/vt510-rm/SS.html [`DOCS`]: https://en.wikipedia.org/wiki/ISO/IEC_2022#Interaction_with_other_coding_systems ## Detailed Description of the Pull Request / Additional comments To make it easier for us to declare a bunch of character sets, I've made a little `constexpr` class that can build up a mapping table from a base character set (ASCII or Latin1), along with a collection of mappings for the characters the deviate from the base set. Many of the character sets are simple variations of ASCII, so they're easy to define this way. This class then casts directly to a `wstring_view` which is how the translation tables are represented in most of the code. We have an array of four of these tables representing the four G-sets, two instances for the active left and right tables, and one instance for the single shift table. Initially we had just one `DesignateCharset` method, which could select the active character set. We now have two designate methods (for 94- and 96- character sets), and each takes a G-set number specifying the target of the designation, and a pair of characters identifying the character set that will be designated (at the higher VT levels, character sets are often identified by more than one character). There are then two new `LockingShift` methods to invoke these G-sets into either the GL or GR area of the code table, and a `SingleShift` method which invokes a G-set temporarily (for just the next character that is output). I should mention here that I had to make some changes to the state machine to make these single shift sequences work. The problem is that the input state machine treats `SS3` as the start of a control sequence, while the output state machine needs it to be dispatched immediately (it's literally the _Single Shift 3_ escape sequence). To make that work, I've added a `ParseControlSequenceAfterSs3` callback in the `IStateMachineEngine` interface to decide which behavior is appropriate. When it comes to mapping a character, it's simply an array reference into the appropriate `wstring_view` table. If the single shift table is set, that takes preference. Otherwise the GL table is used for characters in the range 0x20 to 0x7F, and the GR table for characters 0xA0 to 0xFF (technically some character sets will only map up to 0x7E and 0xFE, but that's easily controlled by the length of the `wstring_view`). The `DEL` character is a bit of a special case. By default it's meant to be ignored like the `NUL` character (it's essentially a time-fill character). However, it's possible that it could be remapped to a printable character in a 96-character set, so we need to check for that after the translation. This is handled in the `AdaptDispatch::Print` method, so it doesn't interfere with the primary `PrintString` code path. The biggest problem with this whole process, though, is that the GR mappings only really make sense if you have access to the raw output, but by the time the output gets to us, it would already have been translated to Unicode by the active code page. And in the case of UTF-8, the characters we eventually receive may originally have been composed from two or more code points. The way I've dealt with this was to disable the GR translations by default, and then added support for a pair of ISO-2022 `DOCS` sequences, which can switch the code page between UTF-8 and ISO-8859-1. When the code page is ISO-8859-1, we're essentially receiving the raw output bytes, so it's safe to enable the GR translations. This is not strictly correct ISO-2022 behavior, and there are edge cases where it's not going to work, but it's the best solution I could come up with. ## Validation Steps Performed As a result of the `SS3` changes in the state machine engine, I've had to move the existing `SS3` tests from the `OutputEngineTest` to the `InputEngineTest`, otherwise they would now fail (technically they should never have been output tests). I've added no additional unit tests, but I have done a lot of manual testing, and made sure we passed all the character set tests in Vttest (at least for the character sets we currently support). Note that this required a slightly hacked version of the app, since by default it doesn't expose a lot of the test to low-level terminals, and we currently identify as a VT100. Closes #3377 Closes #3487
2020-06-04 21:40:15 +02:00
// - Determines if a character is a "Single Shift Select" indicator.
// This immediately follows an escape and signifies a varying length control string.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isSs3Indicator(const wchar_t wch) noexcept
{
return wch == L'O'; // 0x4F
}
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
// Routine Description:
// - Determines if a character is the VT52 "Direct Cursor Address" command.
// This immediately follows an escape and signifies the start of a multiple
// character command sequence.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isVt52CursorAddress(const wchar_t wch) noexcept
{
return wch == L'Y'; // 0x59
}
// Routine Description:
Improve support for VT character sets (#4496) This PR improves our VT character set support, enabling the [`SCS`] escape sequences to designate into all four G-sets with both 94- and 96-character sets, and supports invoking those G-sets into both the GL and GR areas of the code table, with [locking shifts] and [single shifts]. It also adds [`DOCS`] sequences to switch between UTF-8 and the ISO-2022 coding system (which is what the VT character sets require), and adds support for a lot more characters sets, up to around the level of a VT510. [`SCS`]: https://vt100.net/docs/vt510-rm/SCS.html [locking shifts]: https://vt100.net/docs/vt510-rm/LS.html [single shifts]: https://vt100.net/docs/vt510-rm/SS.html [`DOCS`]: https://en.wikipedia.org/wiki/ISO/IEC_2022#Interaction_with_other_coding_systems ## Detailed Description of the Pull Request / Additional comments To make it easier for us to declare a bunch of character sets, I've made a little `constexpr` class that can build up a mapping table from a base character set (ASCII or Latin1), along with a collection of mappings for the characters the deviate from the base set. Many of the character sets are simple variations of ASCII, so they're easy to define this way. This class then casts directly to a `wstring_view` which is how the translation tables are represented in most of the code. We have an array of four of these tables representing the four G-sets, two instances for the active left and right tables, and one instance for the single shift table. Initially we had just one `DesignateCharset` method, which could select the active character set. We now have two designate methods (for 94- and 96- character sets), and each takes a G-set number specifying the target of the designation, and a pair of characters identifying the character set that will be designated (at the higher VT levels, character sets are often identified by more than one character). There are then two new `LockingShift` methods to invoke these G-sets into either the GL or GR area of the code table, and a `SingleShift` method which invokes a G-set temporarily (for just the next character that is output). I should mention here that I had to make some changes to the state machine to make these single shift sequences work. The problem is that the input state machine treats `SS3` as the start of a control sequence, while the output state machine needs it to be dispatched immediately (it's literally the _Single Shift 3_ escape sequence). To make that work, I've added a `ParseControlSequenceAfterSs3` callback in the `IStateMachineEngine` interface to decide which behavior is appropriate. When it comes to mapping a character, it's simply an array reference into the appropriate `wstring_view` table. If the single shift table is set, that takes preference. Otherwise the GL table is used for characters in the range 0x20 to 0x7F, and the GR table for characters 0xA0 to 0xFF (technically some character sets will only map up to 0x7E and 0xFE, but that's easily controlled by the length of the `wstring_view`). The `DEL` character is a bit of a special case. By default it's meant to be ignored like the `NUL` character (it's essentially a time-fill character). However, it's possible that it could be remapped to a printable character in a 96-character set, so we need to check for that after the translation. This is handled in the `AdaptDispatch::Print` method, so it doesn't interfere with the primary `PrintString` code path. The biggest problem with this whole process, though, is that the GR mappings only really make sense if you have access to the raw output, but by the time the output gets to us, it would already have been translated to Unicode by the active code page. And in the case of UTF-8, the characters we eventually receive may originally have been composed from two or more code points. The way I've dealt with this was to disable the GR translations by default, and then added support for a pair of ISO-2022 `DOCS` sequences, which can switch the code page between UTF-8 and ISO-8859-1. When the code page is ISO-8859-1, we're essentially receiving the raw output bytes, so it's safe to enable the GR translations. This is not strictly correct ISO-2022 behavior, and there are edge cases where it's not going to work, but it's the best solution I could come up with. ## Validation Steps Performed As a result of the `SS3` changes in the state machine engine, I've had to move the existing `SS3` tests from the `OutputEngineTest` to the `InputEngineTest`, otherwise they would now fail (technically they should never have been output tests). I've added no additional unit tests, but I have done a lot of manual testing, and made sure we passed all the character set tests in Vttest (at least for the character sets we currently support). Note that this required a slightly hacked version of the app, since by default it doesn't expose a lot of the test to low-level terminals, and we currently identify as a VT100. Closes #3377 Closes #3487
2020-06-04 21:40:15 +02:00
// - Determines if a character is "operating system control string" beginning
// indicator.
// This immediately follows an escape and signifies a signifies a varying
// length control sequence, quite similar to CSI.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isOscIndicator(const wchar_t wch) noexcept
{
return wch == L']'; // 0x5D
}
// Routine Description:
// - Determines if a character is a delimiter between two parameters in a "operating system control sequence"
// This occurs in the middle of a control sequence after escape and OscIndicator have been recognized,
// after the parameter indicating which OSC action to take.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isOscDelimiter(const wchar_t wch) noexcept
{
return wch == L';'; // 0x3B
}
// Routine Description:
// - Determines if a character should be ignored in a operating system control sequence
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isOscInvalid(const wchar_t wch) noexcept
{
return wch <= L'\x17' ||
wch == L'\x19' ||
(wch >= L'\x1c' && wch <= L'\x1f');
}
// Routine Description:
// - Determines if a character is "operating system control string" termination indicator.
// This signals the end of an OSC string collection.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isOscTerminator(const wchar_t wch) noexcept
{
return wch == AsciiChars::BEL; // Bell character
}
// Routine Description:
// - Determines if a character is "device control string" beginning
// indicator.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isDcsIndicator(const wchar_t wch) noexcept
{
return wch == L'P'; // 0x50
}
// Routine Description:
// - Determines if a character is valid for a DCS pass through sequence.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isDcsPassThroughValid(const wchar_t wch) noexcept
{
// 0x20 - 0x7E
return wch >= AsciiChars::SPC && wch < AsciiChars::DEL;
}
// Routine Description:
// - Determines if a character is "start of string" beginning
// indicator.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isSosIndicator(const wchar_t wch) noexcept
{
return wch == L'X'; // 0x58
}
// Routine Description:
// - Determines if a character is "private message" beginning
// indicator.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isPmIndicator(const wchar_t wch) noexcept
{
return wch == L'^'; // 0x5E
}
// Routine Description:
// - Determines if a character is "application program command" beginning
// indicator.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isApcIndicator(const wchar_t wch) noexcept
{
return wch == L'_'; // 0x5F
}
// Routine Description:
// - Determines if a character indicates an action that should be taken in the ground state -
// These are C0 characters and the C1 [single-character] CSI.
// Arguments:
// - wch - Character to check.
// Return Value:
// - True if it is. False if it isn't.
static constexpr bool _isActionableFromGround(const wchar_t wch) noexcept
{
return (wch <= AsciiChars::US) || _isC1ControlCharacter(wch) || _isDelete(wch);
}
#pragma warning(pop)
// Routine Description:
// - Triggers the Execute action to indicate that the listener should immediately respond to a C0 control character.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionExecute(const wchar_t wch)
{
_trace.TraceOnExecute(wch);
const bool success = _engine->ActionExecute(wch);
// Trace the result.
_trace.DispatchSequenceTrace(success);
}
// Routine Description:
// - Triggers the Execute action to indicate that the listener should
// immediately respond to a C0 control character, with the added
// information that we're executing it from the Escape state.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionExecuteFromEscape(const wchar_t wch)
{
_trace.TraceOnExecuteFromEscape(wch);
const bool success = _engine->ActionExecuteFromEscape(wch);
// Trace the result.
_trace.DispatchSequenceTrace(success);
}
// Routine Description:
// - Triggers the Print action to indicate that the listener should render the character given.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionPrint(const wchar_t wch)
{
_trace.TraceOnAction(L"Print");
const bool success = _engine->ActionPrint(wch);
// Trace the result.
_trace.DispatchSequenceTrace(success);
}
// Routine Description:
// - Triggers the EscDispatch action to indicate that the listener should handle a simple escape sequence.
// These sequences traditionally start with ESC and a simple letter. No complicated parameters.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionEscDispatch(const wchar_t wch)
{
_trace.TraceOnAction(L"EscDispatch");
Refactor VT control sequence identification (#7304) This PR changes the way VT control sequences are identified and dispatched, to be more efficient and easier to extend. Instead of parsing the intermediate characters into a vector, and then having to identify a sequence using both that vector and the final char, we now use just a single `uint64_t` value as the identifier. The way the identifier is constructed is by taking the private parameter prefix, each of the intermediate characters, and then the final character, and shifting them into a 64-bit integer one byte at a time, in reverse order. For example, the `DECTLTC` control has a private parameter prefix of `?`, one intermediate of `'`, and a final character of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and `0x73` respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control. The reason for storing them in reverse order, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). When in reverse order, this can easily be achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence. With 64 bits we have enough space for a private prefix, six intermediates, and the final char, which is way more than we should ever need (the _DEC STD 070_ specification recommends supporting at least three intermediates, but in practice we're unlikely to see more than two). With this new way of identifying controls, it should now be possible for every action code to be unique (for the most part). So I've also used this PR to clean up the action codes a bit, splitting the codes for the escape sequences from the control sequences, and sorting them into alphabetical order (which also does a reasonable job of clustering associated controls). ## Validation Steps Performed I think the existing unit tests should be good enough to confirm that all sequences are still being dispatched correctly. However, I've also manually tested a number of sequences to make sure they were still working as expected, in particular those that used intermediates, since they were the most affected by the dispatch code refactoring. Since these changes also affected the input state machine, I've done some manual testing of the conpty keyboard handling (both with and without the new Win32 input mode enabled) to make sure the keyboard VT sequences were processed correctly. I've also manually tested the various VT mouse modes in Vttest to confirm that they were still working correctly too. Closes #7276
2020-08-18 20:57:52 +02:00
const bool success = _engine->ActionEscDispatch(_identifier.Finalize(wch));
// Trace the result.
_trace.DispatchSequenceTrace(success);
if (!success)
{
// Suppress it and log telemetry on failed cases
TermTelemetry::Instance().LogFailed(wch);
}
}
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
// Routine Description:
// - Triggers the Vt52EscDispatch action to indicate that the listener should handle a VT52 escape sequence.
// These sequences start with ESC and a single letter, sometimes followed by parameters.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionVt52EscDispatch(const wchar_t wch)
{
_trace.TraceOnAction(L"Vt52EscDispatch");
Refactor VT control sequence identification (#7304) This PR changes the way VT control sequences are identified and dispatched, to be more efficient and easier to extend. Instead of parsing the intermediate characters into a vector, and then having to identify a sequence using both that vector and the final char, we now use just a single `uint64_t` value as the identifier. The way the identifier is constructed is by taking the private parameter prefix, each of the intermediate characters, and then the final character, and shifting them into a 64-bit integer one byte at a time, in reverse order. For example, the `DECTLTC` control has a private parameter prefix of `?`, one intermediate of `'`, and a final character of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and `0x73` respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control. The reason for storing them in reverse order, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). When in reverse order, this can easily be achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence. With 64 bits we have enough space for a private prefix, six intermediates, and the final char, which is way more than we should ever need (the _DEC STD 070_ specification recommends supporting at least three intermediates, but in practice we're unlikely to see more than two). With this new way of identifying controls, it should now be possible for every action code to be unique (for the most part). So I've also used this PR to clean up the action codes a bit, splitting the codes for the escape sequences from the control sequences, and sorting them into alphabetical order (which also does a reasonable job of clustering associated controls). ## Validation Steps Performed I think the existing unit tests should be good enough to confirm that all sequences are still being dispatched correctly. However, I've also manually tested a number of sequences to make sure they were still working as expected, in particular those that used intermediates, since they were the most affected by the dispatch code refactoring. Since these changes also affected the input state machine, I've done some manual testing of the conpty keyboard handling (both with and without the new Win32 input mode enabled) to make sure the keyboard VT sequences were processed correctly. I've also manually tested the various VT mouse modes in Vttest to confirm that they were still working correctly too. Closes #7276
2020-08-18 20:57:52 +02:00
const bool success = _engine->ActionVt52EscDispatch(_identifier.Finalize(wch),
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
{ _parameters.data(), _parameters.size() });
// Trace the result.
_trace.DispatchSequenceTrace(success);
if (!success)
{
// Suppress it and log telemetry on failed cases
TermTelemetry::Instance().LogFailed(wch);
}
}
// Routine Description:
// - Triggers the CsiDispatch action to indicate that the listener should handle a control sequence.
// These sequences perform various API-type commands that can include many parameters.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionCsiDispatch(const wchar_t wch)
{
_trace.TraceOnAction(L"CsiDispatch");
Refactor VT control sequence identification (#7304) This PR changes the way VT control sequences are identified and dispatched, to be more efficient and easier to extend. Instead of parsing the intermediate characters into a vector, and then having to identify a sequence using both that vector and the final char, we now use just a single `uint64_t` value as the identifier. The way the identifier is constructed is by taking the private parameter prefix, each of the intermediate characters, and then the final character, and shifting them into a 64-bit integer one byte at a time, in reverse order. For example, the `DECTLTC` control has a private parameter prefix of `?`, one intermediate of `'`, and a final character of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and `0x73` respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control. The reason for storing them in reverse order, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). When in reverse order, this can easily be achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence. With 64 bits we have enough space for a private prefix, six intermediates, and the final char, which is way more than we should ever need (the _DEC STD 070_ specification recommends supporting at least three intermediates, but in practice we're unlikely to see more than two). With this new way of identifying controls, it should now be possible for every action code to be unique (for the most part). So I've also used this PR to clean up the action codes a bit, splitting the codes for the escape sequences from the control sequences, and sorting them into alphabetical order (which also does a reasonable job of clustering associated controls). ## Validation Steps Performed I think the existing unit tests should be good enough to confirm that all sequences are still being dispatched correctly. However, I've also manually tested a number of sequences to make sure they were still working as expected, in particular those that used intermediates, since they were the most affected by the dispatch code refactoring. Since these changes also affected the input state machine, I've done some manual testing of the conpty keyboard handling (both with and without the new Win32 input mode enabled) to make sure the keyboard VT sequences were processed correctly. I've also manually tested the various VT mouse modes in Vttest to confirm that they were still working correctly too. Closes #7276
2020-08-18 20:57:52 +02:00
const bool success = _engine->ActionCsiDispatch(_identifier.Finalize(wch),
{ _parameters.data(), _parameters.size() });
// Trace the result.
_trace.DispatchSequenceTrace(success);
if (!success)
{
// Suppress it and log telemetry on failed cases
TermTelemetry::Instance().LogFailed(wch);
}
}
// Routine Description:
// - Triggers the Collect action to indicate that the state machine should store this character as part of an escape/control sequence.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
Refactor VT control sequence identification (#7304) This PR changes the way VT control sequences are identified and dispatched, to be more efficient and easier to extend. Instead of parsing the intermediate characters into a vector, and then having to identify a sequence using both that vector and the final char, we now use just a single `uint64_t` value as the identifier. The way the identifier is constructed is by taking the private parameter prefix, each of the intermediate characters, and then the final character, and shifting them into a 64-bit integer one byte at a time, in reverse order. For example, the `DECTLTC` control has a private parameter prefix of `?`, one intermediate of `'`, and a final character of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and `0x73` respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control. The reason for storing them in reverse order, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). When in reverse order, this can easily be achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence. With 64 bits we have enough space for a private prefix, six intermediates, and the final char, which is way more than we should ever need (the _DEC STD 070_ specification recommends supporting at least three intermediates, but in practice we're unlikely to see more than two). With this new way of identifying controls, it should now be possible for every action code to be unique (for the most part). So I've also used this PR to clean up the action codes a bit, splitting the codes for the escape sequences from the control sequences, and sorting them into alphabetical order (which also does a reasonable job of clustering associated controls). ## Validation Steps Performed I think the existing unit tests should be good enough to confirm that all sequences are still being dispatched correctly. However, I've also manually tested a number of sequences to make sure they were still working as expected, in particular those that used intermediates, since they were the most affected by the dispatch code refactoring. Since these changes also affected the input state machine, I've done some manual testing of the conpty keyboard handling (both with and without the new Win32 input mode enabled) to make sure the keyboard VT sequences were processed correctly. I've also manually tested the various VT mouse modes in Vttest to confirm that they were still working correctly too. Closes #7276
2020-08-18 20:57:52 +02:00
void StateMachine::_ActionCollect(const wchar_t wch) noexcept
{
_trace.TraceOnAction(L"Collect");
// store collect data
Refactor VT control sequence identification (#7304) This PR changes the way VT control sequences are identified and dispatched, to be more efficient and easier to extend. Instead of parsing the intermediate characters into a vector, and then having to identify a sequence using both that vector and the final char, we now use just a single `uint64_t` value as the identifier. The way the identifier is constructed is by taking the private parameter prefix, each of the intermediate characters, and then the final character, and shifting them into a 64-bit integer one byte at a time, in reverse order. For example, the `DECTLTC` control has a private parameter prefix of `?`, one intermediate of `'`, and a final character of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and `0x73` respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control. The reason for storing them in reverse order, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). When in reverse order, this can easily be achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence. With 64 bits we have enough space for a private prefix, six intermediates, and the final char, which is way more than we should ever need (the _DEC STD 070_ specification recommends supporting at least three intermediates, but in practice we're unlikely to see more than two). With this new way of identifying controls, it should now be possible for every action code to be unique (for the most part). So I've also used this PR to clean up the action codes a bit, splitting the codes for the escape sequences from the control sequences, and sorting them into alphabetical order (which also does a reasonable job of clustering associated controls). ## Validation Steps Performed I think the existing unit tests should be good enough to confirm that all sequences are still being dispatched correctly. However, I've also manually tested a number of sequences to make sure they were still working as expected, in particular those that used intermediates, since they were the most affected by the dispatch code refactoring. Since these changes also affected the input state machine, I've done some manual testing of the conpty keyboard handling (both with and without the new Win32 input mode enabled) to make sure the keyboard VT sequences were processed correctly. I've also manually tested the various VT mouse modes in Vttest to confirm that they were still working correctly too. Closes #7276
2020-08-18 20:57:52 +02:00
_identifier.AddIntermediate(wch);
}
// Routine Description:
// - Triggers the Param action to indicate that the state machine should store this character as a part of a parameter
// to a control sequence.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionParam(const wchar_t wch)
{
_trace.TraceOnAction(L"Param");
Refactor VT parameter handling (#7799) This PR introduces a pair of classes for managing VT parameters that automatically handle range checking and default fallback values, so the individual operations don't have to do that validation themselves. In addition to simplifying the code, this fixes a few cases where we were mishandling missing or extraneous parameters, and adds support for parameter sequences on commands that couldn't previously handle them. This PR also sets a limit on the number of parameters allowed, to help thwart DoS memory consumption attacks. ## References * The new parameter class also introduces the concept of an omitted/default parameter which is not necessarily zero, which is a prerequisite for addressing issue #4417. ## Detailed Description of the Pull Request / Additional comments There are two new classes provide by this PR: a `VTParameter` class, similar in function to a `std::optional<size_t>`, which holds an individual parameter (which may be an omitted/default value); and a `VTParameters` class, similar in function to `gsl:span<VTParameter>`, which holds a sequence of those parameters. Where `VTParameter` differs from `std::optional` is with the inclusion of two cast operators. There is a `size_t` cast that interprets omitted and zero values as 1 (the expected behaviour for most numeric parameters). And there is a generic cast, for use with the enum parameter types, which interprets omitted values as 0 (the expected behaviour for most selective parameters). The advantage of `VTParameters` class is that it has an `at` method that can never fail - out of range values simply return the a default `VTParameter` instance (this is standard behaviour in VT terminals). It also has a `size` method that will always return a minimum count of 1, since an empty parameter list is typically the equivalent of a single "default" parameter, so this guarantees you'll get at least one value when iterating over the list with `size()`. For cases where we just need to call the same dispatch method for every parameter, there is a helper `for_each` method, which repeatedly calls a given predicate function with each value in the sequence. It also collates the returned success values to determine the overall result of the sequence. As with the `size` method, this will always make at least one call, so it correctly handles empty sequences. With those two classes in place, we could get rid of all the parameter validation and default handling code in the `OutputStateMachineEngine`. We now just use the `VTParameters::at` method to grab a parameter and typically pass it straight to the appropriate dispatch method, letting the cast operators automatically handle the assignment of default values. Occasionally we might need a `value_or` call to specify a non-standard default value, but those cases are fairly rare. In some case the `OutputStateMachineEngine` was also checking whether parameters values were in range, but for the most part this shouldn't have been necessary, since that is something the dispatch classes would already have been doing themselves (in the few cases that they weren't, I've now updated them to do so). I've also updated the `InputStateMachineEngine` in a similar way to the `OutputStateMachineEngine`, getting rid of a few of the parameter extraction methods, and simplifying other parts of the implementation. It's not as clean a replacement as the output engine, but there are still benefits in using the new classes. ## Validation Steps Performed For the most part I haven't had to alter existing tests other than accounting for changes to the API. There were a couple of tests I needed to drop because they were checking for failure cases which shouldn't have been failing (unexpected parameters should never be an error), or testing output engine validation that is no longer handled at that level. I've added a few new tests to cover operations that take sequences of selective parameters (`ED`, `EL`, `TBC`, `SM`, and `RM`). And I've extended the cursor movement tests to make sure those operations can handle extraneous parameters that weren't expected. I've also added a test to verify that the state machine will correctly ignore parameters beyond the maximum 32 parameter count limit. I've also manual confirmed that the various test cases given in issues #2101 are now working as expected. Closes #2101
2020-10-15 18:12:52 +02:00
// Once we've reached the parameter limit, additional parameters are ignored.
if (!_parameterLimitReached)
{
Refactor VT parameter handling (#7799) This PR introduces a pair of classes for managing VT parameters that automatically handle range checking and default fallback values, so the individual operations don't have to do that validation themselves. In addition to simplifying the code, this fixes a few cases where we were mishandling missing or extraneous parameters, and adds support for parameter sequences on commands that couldn't previously handle them. This PR also sets a limit on the number of parameters allowed, to help thwart DoS memory consumption attacks. ## References * The new parameter class also introduces the concept of an omitted/default parameter which is not necessarily zero, which is a prerequisite for addressing issue #4417. ## Detailed Description of the Pull Request / Additional comments There are two new classes provide by this PR: a `VTParameter` class, similar in function to a `std::optional<size_t>`, which holds an individual parameter (which may be an omitted/default value); and a `VTParameters` class, similar in function to `gsl:span<VTParameter>`, which holds a sequence of those parameters. Where `VTParameter` differs from `std::optional` is with the inclusion of two cast operators. There is a `size_t` cast that interprets omitted and zero values as 1 (the expected behaviour for most numeric parameters). And there is a generic cast, for use with the enum parameter types, which interprets omitted values as 0 (the expected behaviour for most selective parameters). The advantage of `VTParameters` class is that it has an `at` method that can never fail - out of range values simply return the a default `VTParameter` instance (this is standard behaviour in VT terminals). It also has a `size` method that will always return a minimum count of 1, since an empty parameter list is typically the equivalent of a single "default" parameter, so this guarantees you'll get at least one value when iterating over the list with `size()`. For cases where we just need to call the same dispatch method for every parameter, there is a helper `for_each` method, which repeatedly calls a given predicate function with each value in the sequence. It also collates the returned success values to determine the overall result of the sequence. As with the `size` method, this will always make at least one call, so it correctly handles empty sequences. With those two classes in place, we could get rid of all the parameter validation and default handling code in the `OutputStateMachineEngine`. We now just use the `VTParameters::at` method to grab a parameter and typically pass it straight to the appropriate dispatch method, letting the cast operators automatically handle the assignment of default values. Occasionally we might need a `value_or` call to specify a non-standard default value, but those cases are fairly rare. In some case the `OutputStateMachineEngine` was also checking whether parameters values were in range, but for the most part this shouldn't have been necessary, since that is something the dispatch classes would already have been doing themselves (in the few cases that they weren't, I've now updated them to do so). I've also updated the `InputStateMachineEngine` in a similar way to the `OutputStateMachineEngine`, getting rid of a few of the parameter extraction methods, and simplifying other parts of the implementation. It's not as clean a replacement as the output engine, but there are still benefits in using the new classes. ## Validation Steps Performed For the most part I haven't had to alter existing tests other than accounting for changes to the API. There were a couple of tests I needed to drop because they were checking for failure cases which shouldn't have been failing (unexpected parameters should never be an error), or testing output engine validation that is no longer handled at that level. I've added a few new tests to cover operations that take sequences of selective parameters (`ED`, `EL`, `TBC`, `SM`, and `RM`). And I've extended the cursor movement tests to make sure those operations can handle extraneous parameters that weren't expected. I've also added a test to verify that the state machine will correctly ignore parameters beyond the maximum 32 parameter count limit. I've also manual confirmed that the various test cases given in issues #2101 are now working as expected. Closes #2101
2020-10-15 18:12:52 +02:00
// If we have no parameters and we're about to add one, get the next value ready here.
if (_parameters.empty())
{
_parameters.push_back({});
}
// On a delimiter, increase the number of params we've seen.
// "Empty" params should still count as a param -
// eg "\x1b[0;;m" should be three params
if (_isParameterDelimiter(wch))
{
// If we receive a delimiter after we've already accumulated the
// maximum allowed parameters, then we need to set a flag to
// indicate that further parameter characters should be ignored.
if (_parameters.size() >= MAX_PARAMETER_COUNT)
{
_parameterLimitReached = true;
}
else
{
// Otherwise move to next param.
_parameters.push_back({});
}
}
else
{
// Accumulate the character given into the last (current) parameter.
// If the value hasn't been initialized yet, it'll start as 0.
auto currentParameter = _parameters.back().value_or(0);
_AccumulateTo(wch, currentParameter);
_parameters.back() = currentParameter;
}
}
}
// Routine Description:
// - Triggers the Clear action to indicate that the state machine should erase all internal state.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionClear()
{
_trace.TraceOnAction(L"Clear");
// clear all internal stored state.
Refactor VT control sequence identification (#7304) This PR changes the way VT control sequences are identified and dispatched, to be more efficient and easier to extend. Instead of parsing the intermediate characters into a vector, and then having to identify a sequence using both that vector and the final char, we now use just a single `uint64_t` value as the identifier. The way the identifier is constructed is by taking the private parameter prefix, each of the intermediate characters, and then the final character, and shifting them into a 64-bit integer one byte at a time, in reverse order. For example, the `DECTLTC` control has a private parameter prefix of `?`, one intermediate of `'`, and a final character of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and `0x73` respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control. The reason for storing them in reverse order, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). When in reverse order, this can easily be achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence. With 64 bits we have enough space for a private prefix, six intermediates, and the final char, which is way more than we should ever need (the _DEC STD 070_ specification recommends supporting at least three intermediates, but in practice we're unlikely to see more than two). With this new way of identifying controls, it should now be possible for every action code to be unique (for the most part). So I've also used this PR to clean up the action codes a bit, splitting the codes for the escape sequences from the control sequences, and sorting them into alphabetical order (which also does a reasonable job of clustering associated controls). ## Validation Steps Performed I think the existing unit tests should be good enough to confirm that all sequences are still being dispatched correctly. However, I've also manually tested a number of sequences to make sure they were still working as expected, in particular those that used intermediates, since they were the most affected by the dispatch code refactoring. Since these changes also affected the input state machine, I've done some manual testing of the conpty keyboard handling (both with and without the new Win32 input mode enabled) to make sure the keyboard VT sequences were processed correctly. I've also manually tested the various VT mouse modes in Vttest to confirm that they were still working correctly too. Closes #7276
2020-08-18 20:57:52 +02:00
_identifier.Clear();
_parameters.clear();
Refactor VT parameter handling (#7799) This PR introduces a pair of classes for managing VT parameters that automatically handle range checking and default fallback values, so the individual operations don't have to do that validation themselves. In addition to simplifying the code, this fixes a few cases where we were mishandling missing or extraneous parameters, and adds support for parameter sequences on commands that couldn't previously handle them. This PR also sets a limit on the number of parameters allowed, to help thwart DoS memory consumption attacks. ## References * The new parameter class also introduces the concept of an omitted/default parameter which is not necessarily zero, which is a prerequisite for addressing issue #4417. ## Detailed Description of the Pull Request / Additional comments There are two new classes provide by this PR: a `VTParameter` class, similar in function to a `std::optional<size_t>`, which holds an individual parameter (which may be an omitted/default value); and a `VTParameters` class, similar in function to `gsl:span<VTParameter>`, which holds a sequence of those parameters. Where `VTParameter` differs from `std::optional` is with the inclusion of two cast operators. There is a `size_t` cast that interprets omitted and zero values as 1 (the expected behaviour for most numeric parameters). And there is a generic cast, for use with the enum parameter types, which interprets omitted values as 0 (the expected behaviour for most selective parameters). The advantage of `VTParameters` class is that it has an `at` method that can never fail - out of range values simply return the a default `VTParameter` instance (this is standard behaviour in VT terminals). It also has a `size` method that will always return a minimum count of 1, since an empty parameter list is typically the equivalent of a single "default" parameter, so this guarantees you'll get at least one value when iterating over the list with `size()`. For cases where we just need to call the same dispatch method for every parameter, there is a helper `for_each` method, which repeatedly calls a given predicate function with each value in the sequence. It also collates the returned success values to determine the overall result of the sequence. As with the `size` method, this will always make at least one call, so it correctly handles empty sequences. With those two classes in place, we could get rid of all the parameter validation and default handling code in the `OutputStateMachineEngine`. We now just use the `VTParameters::at` method to grab a parameter and typically pass it straight to the appropriate dispatch method, letting the cast operators automatically handle the assignment of default values. Occasionally we might need a `value_or` call to specify a non-standard default value, but those cases are fairly rare. In some case the `OutputStateMachineEngine` was also checking whether parameters values were in range, but for the most part this shouldn't have been necessary, since that is something the dispatch classes would already have been doing themselves (in the few cases that they weren't, I've now updated them to do so). I've also updated the `InputStateMachineEngine` in a similar way to the `OutputStateMachineEngine`, getting rid of a few of the parameter extraction methods, and simplifying other parts of the implementation. It's not as clean a replacement as the output engine, but there are still benefits in using the new classes. ## Validation Steps Performed For the most part I haven't had to alter existing tests other than accounting for changes to the API. There were a couple of tests I needed to drop because they were checking for failure cases which shouldn't have been failing (unexpected parameters should never be an error), or testing output engine validation that is no longer handled at that level. I've added a few new tests to cover operations that take sequences of selective parameters (`ED`, `EL`, `TBC`, `SM`, and `RM`). And I've extended the cursor movement tests to make sure those operations can handle extraneous parameters that weren't expected. I've also added a test to verify that the state machine will correctly ignore parameters beyond the maximum 32 parameter count limit. I've also manual confirmed that the various test cases given in issues #2101 are now working as expected. Closes #2101
2020-10-15 18:12:52 +02:00
_parameterLimitReached = false;
_oscString.clear();
_oscParameter = 0;
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
_dcsStringHandler = nullptr;
_engine->ActionClear();
}
// Routine Description:
// - Triggers the Ignore action to indicate that the state machine should eat this character and say nothing.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionIgnore() noexcept
{
// do nothing.
_trace.TraceOnAction(L"Ignore");
}
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
// Routine Description:
// - Triggers the end of a data string when a CAN, SUB, or ESC is seen.
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_ActionInterrupt()
{
// This is only applicable for DCS strings. OSC strings require a full
// ST sequence to be received before they can be dispatched.
if (_state == VTStates::DcsPassThrough)
{
// The ESC signals the end of the data string.
_dcsStringHandler(AsciiChars::ESC);
_dcsStringHandler = nullptr;
}
}
// Routine Description:
// - Stores this character as part of the param indicating which OSC action to take.
// Arguments:
// - wch - Character to collect.
// Return Value:
// - <none>
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
void StateMachine::_ActionOscParam(const wchar_t wch) noexcept
{
_trace.TraceOnAction(L"OscParamCollect");
_AccumulateTo(wch, _oscParameter);
}
// Routine Description:
// - Stores this character as part of the OSC string
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionOscPut(const wchar_t wch)
{
_trace.TraceOnAction(L"OscPut");
_oscString.push_back(wch);
}
// Routine Description:
// - Triggers the CsiDispatch action to indicate that the listener should handle a control sequence.
// These sequences perform various API-type commands that can include many parameters.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionOscDispatch(const wchar_t wch)
{
_trace.TraceOnAction(L"OscDispatch");
const bool success = _engine->ActionOscDispatch(wch, _oscParameter, _oscString);
// Trace the result.
_trace.DispatchSequenceTrace(success);
if (!success)
{
// Suppress it and log telemetry on failed cases
TermTelemetry::Instance().LogFailed(wch);
}
}
// Routine Description:
// - Triggers the Ss3Dispatch action to indicate that the listener should handle a control sequence.
// These sequences perform various API-type commands that can include many parameters.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
void StateMachine::_ActionSs3Dispatch(const wchar_t wch)
{
_trace.TraceOnAction(L"Ss3Dispatch");
const bool success = _engine->ActionSs3Dispatch(wch, { _parameters.data(), _parameters.size() });
// Trace the result.
_trace.DispatchSequenceTrace(success);
if (!success)
{
// Suppress it and log telemetry on failed cases
TermTelemetry::Instance().LogFailed(wch);
}
}
// Routine Description:
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
// - Triggers the DcsDispatch action to indicate that the listener should handle a control sequence.
// The returned handler function will be used to process the subsequent data string characters.
// Arguments:
// - wch - Character to dispatch.
// Return Value:
// - <none>
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
void StateMachine::_ActionDcsDispatch(const wchar_t wch)
{
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
_trace.TraceOnAction(L"DcsDispatch");
_dcsStringHandler = _engine->ActionDcsDispatch(_identifier.Finalize(wch),
{ _parameters.data(), _parameters.size() });
// If the returned handler is null, the sequence is not supported.
const bool success = _dcsStringHandler != nullptr;
// Trace the result.
_trace.DispatchSequenceTrace(success);
if (success)
{
// If successful, enter the pass through state.
_EnterDcsPassThrough();
}
else
{
// Otherwise ignore remaining chars and log telemetry on failed cases
_EnterDcsIgnore();
TermTelemetry::Instance().LogFailed(wch);
}
}
// Routine Description:
// - Moves the state machine into the Ground state.
// This state is entered:
// 1. By default at the beginning of operation
// 2. After any execute/dispatch action.
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterGround() noexcept
{
_state = VTStates::Ground;
_cachedSequence.reset(); // entering ground means we've completed the pending sequence
_trace.TraceStateChange(L"Ground");
}
// Routine Description:
// - Moves the state machine into the Escape state.
// This state is entered:
// 1. When the Escape character is seen at any time.
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterEscape()
{
_state = VTStates::Escape;
_trace.TraceStateChange(L"Escape");
_ActionClear();
_trace.ClearSequenceTrace();
}
// Routine Description:
// - Moves the state machine into the EscapeIntermediate state.
// This state is entered:
// 1. When EscIntermediate characters are seen after an Escape entry (only from the Escape state)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterEscapeIntermediate() noexcept
{
_state = VTStates::EscapeIntermediate;
_trace.TraceStateChange(L"EscapeIntermediate");
}
// Routine Description:
// - Moves the state machine into the CsiEntry state.
// This state is entered:
// 1. When the CsiEntry character is seen after an Escape entry (only from the Escape state)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterCsiEntry()
{
_state = VTStates::CsiEntry;
_trace.TraceStateChange(L"CsiEntry");
_ActionClear();
}
// Routine Description:
// - Moves the state machine into the CsiParam state.
// This state is entered:
// 1. When valid parameter characters are detected on entering a CSI (from CsiEntry state)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterCsiParam() noexcept
{
_state = VTStates::CsiParam;
_trace.TraceStateChange(L"CsiParam");
}
// Routine Description:
// - Moves the state machine into the CsiIgnore state.
// This state is entered:
// 1. When an invalid character is detected during a CSI sequence indicating we should ignore the whole sequence.
// (From CsiEntry, CsiParam, or CsiIntermediate states.)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterCsiIgnore() noexcept
{
_state = VTStates::CsiIgnore;
_trace.TraceStateChange(L"CsiIgnore");
}
// Routine Description:
// - Moves the state machine into the CsiIntermediate state.
// This state is entered:
// 1. When an intermediate character is seen immediately after entering a control sequence (from CsiEntry)
// 2. When an intermediate character is seen while collecting parameter data (from CsiParam)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterCsiIntermediate() noexcept
{
_state = VTStates::CsiIntermediate;
_trace.TraceStateChange(L"CsiIntermediate");
}
// Routine Description:
// - Moves the state machine into the OscParam state.
// This state is entered:
// 1. When an OscEntry character (']') is seen after an Escape entry (only from the Escape state)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterOscParam() noexcept
{
_state = VTStates::OscParam;
_trace.TraceStateChange(L"OscParam");
}
// Routine Description:
// - Moves the state machine into the OscString state.
// This state is entered:
// 1. When a delimiter character (';') is seen in the OSC Param state.
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterOscString() noexcept
{
_state = VTStates::OscString;
_trace.TraceStateChange(L"OscString");
}
// Routine Description:
// - Moves the state machine into the OscTermination state.
// This state is entered:
// 1. When an ESC is seen in an OSC string. This escape will be followed by a
// '\', as to encode a 0x9C as a 7-bit ASCII char stream.
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterOscTermination() noexcept
{
_state = VTStates::OscTermination;
_trace.TraceStateChange(L"OscTermination");
}
// Routine Description:
// - Moves the state machine into the Ss3Entry state.
// This state is entered:
// 1. When the Ss3Entry character is seen after an Escape entry (only from the Escape state)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterSs3Entry()
{
_state = VTStates::Ss3Entry;
_trace.TraceStateChange(L"Ss3Entry");
_ActionClear();
}
// Routine Description:
// - Moves the state machine into the Ss3Param state.
// This state is entered:
// 1. When valid parameter characters are detected on entering a SS3 (from Ss3Entry state)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterSs3Param() noexcept
{
_state = VTStates::Ss3Param;
_trace.TraceStateChange(L"Ss3Param");
}
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
// Routine Description:
// - Moves the state machine into the VT52Param state.
// This state is entered:
// 1. When a VT52 Cursor Address escape is detected, so parameters are expected to follow.
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterVt52Param() noexcept
{
_state = VTStates::Vt52Param;
_trace.TraceStateChange(L"Vt52Param");
}
// Routine Description:
// - Moves the state machine into the DcsEntry state.
// This state is entered:
// 1. When the DcsEntry character is seen after an Escape entry (only from the Escape state)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterDcsEntry()
{
_state = VTStates::DcsEntry;
_trace.TraceStateChange(L"DcsEntry");
_ActionClear();
}
// Routine Description:
// - Moves the state machine into the DcsParam state.
// This state is entered:
// 1. When valid parameter characters are detected on entering a DCS (from DcsEntry state)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterDcsParam() noexcept
{
_state = VTStates::DcsParam;
_trace.TraceStateChange(L"DcsParam");
}
// Routine Description:
// - Moves the state machine into the DcsIgnore state.
// This state is entered:
// 1. When an invalid character is detected during a DCS sequence indicating we should ignore the whole sequence.
// (From DcsEntry, DcsParam, DcsPassThrough, or DcsIntermediate states.)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterDcsIgnore() noexcept
{
_state = VTStates::DcsIgnore;
_trace.TraceStateChange(L"DcsIgnore");
}
// Routine Description:
// - Moves the state machine into the DcsIntermediate state.
// This state is entered:
// 1. When an intermediate character is seen immediately after entering a control sequence (from DcsEntry)
// 2. When an intermediate character is seen while collecting parameter data (from DcsParam)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterDcsIntermediate() noexcept
{
_state = VTStates::DcsIntermediate;
_trace.TraceStateChange(L"DcsIntermediate");
}
// Routine Description:
// - Moves the state machine into the DcsPassThrough state.
// This state is entered:
// 1. When a data string character is seen immediately after entering a control sequence (from DcsEntry)
// 2. When a data string character is seen while collecting parameter data (from DcsParam)
// 3. When a data string character is seen while collecting intermediate data (from DcsIntermediate)
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterDcsPassThrough() noexcept
{
_state = VTStates::DcsPassThrough;
_trace.TraceStateChange(L"DcsPassThrough");
}
// Routine Description:
// - Moves the state machine into the SosPmApcString state.
// This state is entered:
// 1. When the Sos character is seen after an Escape entry
// 2. When the Pm character is seen after an Escape entry
// 3. When the Apc character is seen after an Escape entry
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::_EnterSosPmApcString() noexcept
{
_state = VTStates::SosPmApcString;
_trace.TraceStateChange(L"SosPmApcString");
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the Ground state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Print all other characters
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventGround(const wchar_t wch)
{
_trace.TraceOnEvent(L"Ground");
if (_isC0Code(wch) || _isDelete(wch))
{
_ActionExecute(wch);
}
else
{
_ActionPrint(wch);
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the Escape state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Ignore Delete characters
// 3. Collect Intermediate characters
// 4. Enter Control Sequence state
// 5. Dispatch an Escape action.
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventEscape(const wchar_t wch)
{
_trace.TraceOnEvent(L"Escape");
if (_isC0Code(wch))
{
if (_engine->DispatchControlCharsFromEscape())
{
_ActionExecuteFromEscape(wch);
_EnterGround();
}
else
{
_ActionExecute(wch);
}
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else if (_isIntermediate(wch))
{
if (_engine->DispatchIntermediatesFromEscape())
{
_ActionEscDispatch(wch);
_EnterGround();
}
else
{
_ActionCollect(wch);
_EnterEscapeIntermediate();
}
}
Disable the acceptance of C1 control codes by default (#11690) There are some code pages with "unmapped" code points in the C1 range, which results in them being translated into Unicode C1 control codes, even though that is not their intended use. To avoid having these characters triggering unintentional escape sequences, this PR now disables C1 controls by default. Switching to ISO-2022 encoding will re-enable them, though, since that is the most likely scenario in which they would be required. They can also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape sequence. What I've done is add a new mode to the `StateMachine` class that controls whether C1 code points are interpreted as control characters or not. When disabled, these code points are simply dropped from the output, similar to the way a `NUL` is interpreted. This isn't exactly the way they were handled in the v1 console (which I think replaces them with the font _notdef_ glyph), but it matches the XTerm behavior, which seems more appropriate considering this is in VT mode. And it's worth noting that Windows Explorer seems to work the same way. As mentioned above, the mode can be enabled by designating the ISO-2022 coding system with a `DOCS` sequence, and it will be disabled again when UTF-8 is designated. You can also enable it explicitly with a `DECAC1` sequence (originally this was actually a DEC printer sequence, but it doesn't seem unreasonable to use it in a terminal). I've also extended the operations that save and restore "cursor state" (e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode, since it's closely tied to the code page and character sets which are also saved there. Similarly, when a `DECSTR` sequence resets the code page and character sets, I've now made it reset the C1 mode as well. I should note that the new `StateMachine` mode is controlled via a generic `SetParserMode` method (with a matching API in the `ConGetSet` interface) to allow for easier addition of other modes in the future. And I've reimplemented the existing ANSI/VT52 mode in terms of these generic methods instead of it having to have its own separate APIs. ## Validation Steps Performed Some of the unit tests for OSC sequences were using a C1 `0x9C` for the string terminator, which doesn't work by default anymore. Since that's not a good practice anyway, I thought it best to change those to a standard 7-bit terminator. However, in tests that were explicitly validating the C1 controls, I've just enabled the C1 parser mode at the start of the tests in order to get them working again. There were also some ANSI mode adapter tests that had to be updated to account for the fact that it has now been reimplemented in terms of the `SetParserMode` API. I've added a new state machine test to validate the changes in behavior when the C1 parser mode is enabled or disabled. And I've added an adapter test to verify that the `DesignateCodingSystems` and `AcceptC1Controls` methods toggle the C1 parser mode as expected. I've manually verified the test cases in #10069 and #10310 to confirm that they're no longer triggering control sequences by default. Although, as I explained above, the C1 code points are completely dropped from the output rather than displayed as _notdef_ glyphs. I think this is a reasonable compromise though. Closes #10069 Closes #10310
2021-11-18 00:40:31 +01:00
else if (_parserMode.test(Mode::Ansi))
{
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
if (_isCsiIndicator(wch))
{
_EnterCsiEntry();
}
else if (_isOscIndicator(wch))
{
_EnterOscParam();
}
Improve support for VT character sets (#4496) This PR improves our VT character set support, enabling the [`SCS`] escape sequences to designate into all four G-sets with both 94- and 96-character sets, and supports invoking those G-sets into both the GL and GR areas of the code table, with [locking shifts] and [single shifts]. It also adds [`DOCS`] sequences to switch between UTF-8 and the ISO-2022 coding system (which is what the VT character sets require), and adds support for a lot more characters sets, up to around the level of a VT510. [`SCS`]: https://vt100.net/docs/vt510-rm/SCS.html [locking shifts]: https://vt100.net/docs/vt510-rm/LS.html [single shifts]: https://vt100.net/docs/vt510-rm/SS.html [`DOCS`]: https://en.wikipedia.org/wiki/ISO/IEC_2022#Interaction_with_other_coding_systems ## Detailed Description of the Pull Request / Additional comments To make it easier for us to declare a bunch of character sets, I've made a little `constexpr` class that can build up a mapping table from a base character set (ASCII or Latin1), along with a collection of mappings for the characters the deviate from the base set. Many of the character sets are simple variations of ASCII, so they're easy to define this way. This class then casts directly to a `wstring_view` which is how the translation tables are represented in most of the code. We have an array of four of these tables representing the four G-sets, two instances for the active left and right tables, and one instance for the single shift table. Initially we had just one `DesignateCharset` method, which could select the active character set. We now have two designate methods (for 94- and 96- character sets), and each takes a G-set number specifying the target of the designation, and a pair of characters identifying the character set that will be designated (at the higher VT levels, character sets are often identified by more than one character). There are then two new `LockingShift` methods to invoke these G-sets into either the GL or GR area of the code table, and a `SingleShift` method which invokes a G-set temporarily (for just the next character that is output). I should mention here that I had to make some changes to the state machine to make these single shift sequences work. The problem is that the input state machine treats `SS3` as the start of a control sequence, while the output state machine needs it to be dispatched immediately (it's literally the _Single Shift 3_ escape sequence). To make that work, I've added a `ParseControlSequenceAfterSs3` callback in the `IStateMachineEngine` interface to decide which behavior is appropriate. When it comes to mapping a character, it's simply an array reference into the appropriate `wstring_view` table. If the single shift table is set, that takes preference. Otherwise the GL table is used for characters in the range 0x20 to 0x7F, and the GR table for characters 0xA0 to 0xFF (technically some character sets will only map up to 0x7E and 0xFE, but that's easily controlled by the length of the `wstring_view`). The `DEL` character is a bit of a special case. By default it's meant to be ignored like the `NUL` character (it's essentially a time-fill character). However, it's possible that it could be remapped to a printable character in a 96-character set, so we need to check for that after the translation. This is handled in the `AdaptDispatch::Print` method, so it doesn't interfere with the primary `PrintString` code path. The biggest problem with this whole process, though, is that the GR mappings only really make sense if you have access to the raw output, but by the time the output gets to us, it would already have been translated to Unicode by the active code page. And in the case of UTF-8, the characters we eventually receive may originally have been composed from two or more code points. The way I've dealt with this was to disable the GR translations by default, and then added support for a pair of ISO-2022 `DOCS` sequences, which can switch the code page between UTF-8 and ISO-8859-1. When the code page is ISO-8859-1, we're essentially receiving the raw output bytes, so it's safe to enable the GR translations. This is not strictly correct ISO-2022 behavior, and there are edge cases where it's not going to work, but it's the best solution I could come up with. ## Validation Steps Performed As a result of the `SS3` changes in the state machine engine, I've had to move the existing `SS3` tests from the `OutputEngineTest` to the `InputEngineTest`, otherwise they would now fail (technically they should never have been output tests). I've added no additional unit tests, but I have done a lot of manual testing, and made sure we passed all the character set tests in Vttest (at least for the character sets we currently support). Note that this required a slightly hacked version of the app, since by default it doesn't expose a lot of the test to low-level terminals, and we currently identify as a VT100. Closes #3377 Closes #3487
2020-06-04 21:40:15 +02:00
else if (_isSs3Indicator(wch) && _engine->ParseControlSequenceAfterSs3())
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
{
_EnterSs3Entry();
}
else if (_isDcsIndicator(wch))
{
_EnterDcsEntry();
}
else if (_isSosIndicator(wch) || _isPmIndicator(wch) || _isApcIndicator(wch))
{
_EnterSosPmApcString();
}
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
else
{
_ActionEscDispatch(wch);
_EnterGround();
}
}
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
else if (_isVt52CursorAddress(wch))
{
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
_EnterVt52Param();
}
else
{
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
_ActionVt52EscDispatch(wch);
_EnterGround();
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the EscapeIntermediate state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Ignore Delete characters
// 3. Collect Intermediate characters
// 4. Dispatch an Escape action.
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventEscapeIntermediate(const wchar_t wch)
{
_trace.TraceOnEvent(L"EscapeIntermediate");
if (_isC0Code(wch))
{
_ActionExecute(wch);
}
else if (_isIntermediate(wch))
{
_ActionCollect(wch);
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
Disable the acceptance of C1 control codes by default (#11690) There are some code pages with "unmapped" code points in the C1 range, which results in them being translated into Unicode C1 control codes, even though that is not their intended use. To avoid having these characters triggering unintentional escape sequences, this PR now disables C1 controls by default. Switching to ISO-2022 encoding will re-enable them, though, since that is the most likely scenario in which they would be required. They can also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape sequence. What I've done is add a new mode to the `StateMachine` class that controls whether C1 code points are interpreted as control characters or not. When disabled, these code points are simply dropped from the output, similar to the way a `NUL` is interpreted. This isn't exactly the way they were handled in the v1 console (which I think replaces them with the font _notdef_ glyph), but it matches the XTerm behavior, which seems more appropriate considering this is in VT mode. And it's worth noting that Windows Explorer seems to work the same way. As mentioned above, the mode can be enabled by designating the ISO-2022 coding system with a `DOCS` sequence, and it will be disabled again when UTF-8 is designated. You can also enable it explicitly with a `DECAC1` sequence (originally this was actually a DEC printer sequence, but it doesn't seem unreasonable to use it in a terminal). I've also extended the operations that save and restore "cursor state" (e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode, since it's closely tied to the code page and character sets which are also saved there. Similarly, when a `DECSTR` sequence resets the code page and character sets, I've now made it reset the C1 mode as well. I should note that the new `StateMachine` mode is controlled via a generic `SetParserMode` method (with a matching API in the `ConGetSet` interface) to allow for easier addition of other modes in the future. And I've reimplemented the existing ANSI/VT52 mode in terms of these generic methods instead of it having to have its own separate APIs. ## Validation Steps Performed Some of the unit tests for OSC sequences were using a C1 `0x9C` for the string terminator, which doesn't work by default anymore. Since that's not a good practice anyway, I thought it best to change those to a standard 7-bit terminator. However, in tests that were explicitly validating the C1 controls, I've just enabled the C1 parser mode at the start of the tests in order to get them working again. There were also some ANSI mode adapter tests that had to be updated to account for the fact that it has now been reimplemented in terms of the `SetParserMode` API. I've added a new state machine test to validate the changes in behavior when the C1 parser mode is enabled or disabled. And I've added an adapter test to verify that the `DesignateCodingSystems` and `AcceptC1Controls` methods toggle the C1 parser mode as expected. I've manually verified the test cases in #10069 and #10310 to confirm that they're no longer triggering control sequences by default. Although, as I explained above, the C1 code points are completely dropped from the output rather than displayed as _notdef_ glyphs. I think this is a reasonable compromise though. Closes #10069 Closes #10310
2021-11-18 00:40:31 +01:00
else if (_parserMode.test(Mode::Ansi))
{
_ActionEscDispatch(wch);
_EnterGround();
}
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
else if (_isVt52CursorAddress(wch))
{
_EnterVt52Param();
}
else
{
_ActionVt52EscDispatch(wch);
_EnterGround();
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the CsiEntry state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Ignore Delete characters
// 3. Collect Intermediate characters
// 4. Begin to ignore all remaining parameters when an invalid character is detected (CsiIgnore)
// 5. Store parameter data
// 6. Collect Control Sequence Private markers
// 7. Dispatch a control sequence with parameters for action
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventCsiEntry(const wchar_t wch)
{
_trace.TraceOnEvent(L"CsiEntry");
if (_isC0Code(wch))
{
_ActionExecute(wch);
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else if (_isIntermediate(wch))
{
_ActionCollect(wch);
_EnterCsiIntermediate();
}
else if (_isCsiInvalid(wch))
{
_EnterCsiIgnore();
}
else if (_isNumericParamValue(wch) || _isParameterDelimiter(wch))
{
_ActionParam(wch);
_EnterCsiParam();
}
else if (_isCsiPrivateMarker(wch))
{
_ActionCollect(wch);
_EnterCsiParam();
}
else
{
_ActionCsiDispatch(wch);
_EnterGround();
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the CsiIntermediate state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Ignore Delete characters
// 3. Collect Intermediate characters
// 4. Begin to ignore all remaining parameters when an invalid character is detected (CsiIgnore)
// 5. Dispatch a control sequence with parameters for action
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventCsiIntermediate(const wchar_t wch)
{
_trace.TraceOnEvent(L"CsiIntermediate");
if (_isC0Code(wch))
{
_ActionExecute(wch);
}
else if (_isIntermediate(wch))
{
_ActionCollect(wch);
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else if (_isIntermediateInvalid(wch))
{
_EnterCsiIgnore();
}
else
{
_ActionCsiDispatch(wch);
_EnterGround();
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the CsiIgnore state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Ignore Delete characters
// 3. Collect Intermediate characters
// 4. Begin to ignore all remaining parameters when an invalid character is detected (CsiIgnore)
// 5. Return to Ground
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventCsiIgnore(const wchar_t wch)
{
_trace.TraceOnEvent(L"CsiIgnore");
if (_isC0Code(wch))
{
_ActionExecute(wch);
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else if (_isIntermediate(wch))
{
_ActionIgnore();
}
else if (_isIntermediateInvalid(wch))
{
_ActionIgnore();
}
else
{
_EnterGround();
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the CsiParam state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Ignore Delete characters
// 3. Collect Intermediate characters
// 4. Begin to ignore all remaining parameters when an invalid character is detected (CsiIgnore)
// 5. Store parameter data
// 6. Dispatch a control sequence with parameters for action
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventCsiParam(const wchar_t wch)
{
_trace.TraceOnEvent(L"CsiParam");
if (_isC0Code(wch))
{
_ActionExecute(wch);
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else if (_isNumericParamValue(wch) || _isParameterDelimiter(wch))
{
_ActionParam(wch);
}
else if (_isIntermediate(wch))
{
_ActionCollect(wch);
_EnterCsiIntermediate();
}
else if (_isParameterInvalid(wch))
{
_EnterCsiIgnore();
}
else
{
_ActionCsiDispatch(wch);
_EnterGround();
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the OscParam state.
// Events in this state will:
// 1. Collect numeric values into an Osc Param
// 2. Move to the OscString state on a delimiter
// 3. Ignore everything else.
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
void StateMachine::_EventOscParam(const wchar_t wch) noexcept
{
_trace.TraceOnEvent(L"OscParam");
if (_isOscTerminator(wch))
{
_EnterGround();
}
else if (_isNumericParamValue(wch))
{
_ActionOscParam(wch);
}
else if (_isOscDelimiter(wch))
{
_EnterOscString();
}
else
{
_ActionIgnore();
}
}
// Routine Description:
// - Processes a character event into a Action that occurs while in the OscParam state.
// Events in this state will:
// 1. Trigger the OSC action associated with the param on an OscTerminator
// 2. If we see a ESC, enter the OscTermination state. We'll wait for one
// more character before we dispatch the string.
// 3. Ignore OscInvalid characters.
// 4. Collect everything else into the OscString
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventOscString(const wchar_t wch)
{
_trace.TraceOnEvent(L"OscString");
if (_isOscTerminator(wch))
{
_ActionOscDispatch(wch);
_EnterGround();
}
else if (_isEscape(wch))
{
_EnterOscTermination();
}
else if (_isOscInvalid(wch))
{
_ActionIgnore();
}
else
{
// add this character to our OSC string
_ActionOscPut(wch);
}
}
// Routine Description:
// - Handle the two-character termination of a OSC sequence.
// Events in this state will:
// 1. Trigger the OSC action associated with the param on an OscTerminator
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
// 2. Otherwise treat this as a normal escape character event.
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
void StateMachine::_EventOscTermination(const wchar_t wch)
{
_trace.TraceOnEvent(L"OscTermination");
if (_isStringTerminatorIndicator(wch))
{
_ActionOscDispatch(wch);
_EnterGround();
}
else
{
_EnterEscape();
_EventEscape(wch);
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the Ss3Entry state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Ignore Delete characters
// 3. Begin to ignore all remaining parameters when an invalid character is detected (CsiIgnore)
// 4. Store parameter data
// 5. Dispatch a control sequence with parameters for action
// SS3 sequences are structurally the same as CSI sequences, just with a
// different initiation. It's safe to reuse CSI's functions for
// determining if a character is a parameter, delimiter, or invalid.
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventSs3Entry(const wchar_t wch)
{
_trace.TraceOnEvent(L"Ss3Entry");
if (_isC0Code(wch))
{
_ActionExecute(wch);
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else if (_isCsiInvalid(wch))
{
// It's safe for us to go into the CSI ignore here, because both SS3 and
// CSI sequences ignore characters the same way.
_EnterCsiIgnore();
}
else if (_isNumericParamValue(wch) || _isParameterDelimiter(wch))
{
_ActionParam(wch);
_EnterSs3Param();
}
else
{
_ActionSs3Dispatch(wch);
_EnterGround();
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the CsiParam state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Ignore Delete characters
// 3. Begin to ignore all remaining parameters when an invalid character is detected (CsiIgnore)
// 4. Store parameter data
// 5. Dispatch a control sequence with parameters for action
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventSs3Param(const wchar_t wch)
{
_trace.TraceOnEvent(L"Ss3Param");
if (_isC0Code(wch))
{
_ActionExecute(wch);
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else if (_isNumericParamValue(wch) || _isParameterDelimiter(wch))
{
_ActionParam(wch);
}
else if (_isParameterInvalid(wch))
{
_EnterCsiIgnore();
}
else
{
_ActionSs3Dispatch(wch);
_EnterGround();
}
}
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
// Routine Description:
// - Processes a character event into an Action that occurs while in the Vt52Param state.
// Events in this state will:
// 1. Execute C0 control characters
// 2. Ignore Delete characters
// 3. Store exactly two parameter characters
// 4. Dispatch a control sequence with parameters for action (always Direct Cursor Address)
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventVt52Param(const wchar_t wch)
{
_trace.TraceOnEvent(L"Vt52Param");
if (_isC0Code(wch))
{
_ActionExecute(wch);
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else
{
_parameters.push_back(wch);
if (_parameters.size() == 2)
{
// The command character is processed before the parameter values,
// but it will always be 'Y', the Direct Cursor Address command.
_ActionVt52EscDispatch(L'Y');
_EnterGround();
}
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the DcsEntry state.
// Events in this state will:
// 1. Ignore C0 control characters
// 2. Ignore Delete characters
// 3. Begin to ignore all remaining characters when an invalid character is detected (DcsIgnore)
// 4. Store parameter data
// 5. Collect Intermediate characters
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
// 6. Dispatch the Final character in preparation for parsing the data string
// DCS sequences are structurally almost the same as CSI sequences, just with an
// extra data string. It's safe to reuse CSI functions for
// determining if a character is a parameter, delimiter, or invalid.
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventDcsEntry(const wchar_t wch)
{
_trace.TraceOnEvent(L"DcsEntry");
if (_isC0Code(wch))
{
_ActionIgnore();
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else if (_isCsiInvalid(wch))
{
_EnterDcsIgnore();
}
else if (_isNumericParamValue(wch) || _isParameterDelimiter(wch))
{
_ActionParam(wch);
_EnterDcsParam();
}
else if (_isIntermediate(wch))
{
_ActionCollect(wch);
_EnterDcsIntermediate();
}
else
{
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
_ActionDcsDispatch(wch);
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the DcsIgnore state.
// In this state the entire DCS string is considered invalid and we will ignore everything.
// The termination state is handled outside when an ESC is seen.
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventDcsIgnore() noexcept
{
_trace.TraceOnEvent(L"DcsIgnore");
_ActionIgnore();
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the DcsIntermediate state.
// Events in this state will:
// 1. Ignore C0 control characters
// 2. Ignore Delete characters
// 3. Collect intermediate data.
// 4. Begin to ignore all remaining intermediates when an invalid character is detected (DcsIgnore)
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
// 5. Dispatch the Final character in preparation for parsing the data string
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventDcsIntermediate(const wchar_t wch)
{
_trace.TraceOnEvent(L"DcsIntermediate");
if (_isC0Code(wch))
{
_ActionIgnore();
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
else if (_isIntermediate(wch))
{
_ActionCollect(wch);
}
else if (_isIntermediateInvalid(wch))
{
_EnterDcsIgnore();
}
else
{
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
_ActionDcsDispatch(wch);
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the DcsParam state.
// Events in this state will:
// 1. Ignore C0 control characters
// 2. Ignore Delete characters
// 3. Collect DCS parameter data
// 4. Enter DcsIntermediate if we see an intermediate
// 5. Begin to ignore all remaining parameters when an invalid character is detected (DcsIgnore)
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
// 6. Dispatch the Final character in preparation for parsing the data string
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventDcsParam(const wchar_t wch)
{
_trace.TraceOnEvent(L"DcsParam");
if (_isC0Code(wch))
{
_ActionIgnore();
}
else if (_isDelete(wch))
{
_ActionIgnore();
}
if (_isNumericParamValue(wch) || _isParameterDelimiter(wch))
{
_ActionParam(wch);
}
else if (_isIntermediate(wch))
{
_ActionCollect(wch);
_EnterDcsIntermediate();
}
else if (_isParameterInvalid(wch))
{
_EnterDcsIgnore();
}
else
{
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
_ActionDcsDispatch(wch);
}
}
// Routine Description:
// - Processes a character event into an Action that occurs while in the DcsPassThrough state.
// Events in this state will:
// 1. Pass through if character is valid.
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
// 2. Ignore everything else.
// The termination state is handled outside when an ESC is seen.
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventDcsPassThrough(const wchar_t wch)
{
_trace.TraceOnEvent(L"DcsPassThrough");
if (_isC0Code(wch) || _isDcsPassThroughValid(wch))
{
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
if (!_dcsStringHandler(wch))
{
_EnterDcsIgnore();
}
}
else
{
_ActionIgnore();
}
}
// Routine Description:
// - Handle SOS/PM/APC string.
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
// In this state the entire string is ignored.
// The termination state is handled outside when an ESC is seen.
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
void StateMachine::_EventSosPmApcString(const wchar_t /*wch*/) noexcept
{
_trace.TraceOnEvent(L"SosPmApcString");
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
_ActionIgnore();
}
// Routine Description:
// - Entry to the state machine. Takes characters one by one and processes them according to the state machine rules.
// Arguments:
// - wch - New character to operate upon
// Return Value:
// - <none>
void StateMachine::ProcessCharacter(const wchar_t wch)
{
_trace.TraceCharInput(wch);
// Process "from anywhere" events first.
const bool isFromAnywhereChar = (wch == AsciiChars::CAN || wch == AsciiChars::SUB);
// GH#4201 - If this sequence was ^[^X or ^[^Z, then we should
// _ActionExecuteFromEscape, as to send a Ctrl+Alt+key key. We should only
// do this for the InputStateMachineEngine - the OutputEngine should execute
// these from any state.
if (isFromAnywhereChar && !(_state == VTStates::Escape && _engine->DispatchControlCharsFromEscape()))
{
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
_ActionInterrupt();
_ActionExecute(wch);
_EnterGround();
}
// Preprocess C1 control characters and treat them as ESC + their 7-bit equivalent.
else if (_isC1ControlCharacter(wch))
{
Disable the acceptance of C1 control codes by default (#11690) There are some code pages with "unmapped" code points in the C1 range, which results in them being translated into Unicode C1 control codes, even though that is not their intended use. To avoid having these characters triggering unintentional escape sequences, this PR now disables C1 controls by default. Switching to ISO-2022 encoding will re-enable them, though, since that is the most likely scenario in which they would be required. They can also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape sequence. What I've done is add a new mode to the `StateMachine` class that controls whether C1 code points are interpreted as control characters or not. When disabled, these code points are simply dropped from the output, similar to the way a `NUL` is interpreted. This isn't exactly the way they were handled in the v1 console (which I think replaces them with the font _notdef_ glyph), but it matches the XTerm behavior, which seems more appropriate considering this is in VT mode. And it's worth noting that Windows Explorer seems to work the same way. As mentioned above, the mode can be enabled by designating the ISO-2022 coding system with a `DOCS` sequence, and it will be disabled again when UTF-8 is designated. You can also enable it explicitly with a `DECAC1` sequence (originally this was actually a DEC printer sequence, but it doesn't seem unreasonable to use it in a terminal). I've also extended the operations that save and restore "cursor state" (e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode, since it's closely tied to the code page and character sets which are also saved there. Similarly, when a `DECSTR` sequence resets the code page and character sets, I've now made it reset the C1 mode as well. I should note that the new `StateMachine` mode is controlled via a generic `SetParserMode` method (with a matching API in the `ConGetSet` interface) to allow for easier addition of other modes in the future. And I've reimplemented the existing ANSI/VT52 mode in terms of these generic methods instead of it having to have its own separate APIs. ## Validation Steps Performed Some of the unit tests for OSC sequences were using a C1 `0x9C` for the string terminator, which doesn't work by default anymore. Since that's not a good practice anyway, I thought it best to change those to a standard 7-bit terminator. However, in tests that were explicitly validating the C1 controls, I've just enabled the C1 parser mode at the start of the tests in order to get them working again. There were also some ANSI mode adapter tests that had to be updated to account for the fact that it has now been reimplemented in terms of the `SetParserMode` API. I've added a new state machine test to validate the changes in behavior when the C1 parser mode is enabled or disabled. And I've added an adapter test to verify that the `DesignateCodingSystems` and `AcceptC1Controls` methods toggle the C1 parser mode as expected. I've manually verified the test cases in #10069 and #10310 to confirm that they're no longer triggering control sequences by default. Although, as I explained above, the C1 code points are completely dropped from the output rather than displayed as _notdef_ glyphs. I think this is a reasonable compromise though. Closes #10069 Closes #10310
2021-11-18 00:40:31 +01:00
// But note that we only do this if C1 control code parsing has been
// explicitly requested, since there are some code pages with "unmapped"
// code points that get translated as C1 controls when that is not their
// intended use. In order to avoid them triggering unintentional escape
// sequences, we ignore these characters by default.
if (_parserMode.test(Mode::AcceptC1))
{
ProcessCharacter(AsciiChars::ESC);
ProcessCharacter(_c1To7Bit(wch));
}
}
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
// Don't go to escape from the OSC string state - ESC can be used to terminate OSC strings.
else if (_isEscape(wch) && _state != VTStates::OscString)
{
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
_ActionInterrupt();
_EnterEscape();
}
else
{
// Then pass to the current state as an event
switch (_state)
{
case VTStates::Ground:
return _EventGround(wch);
case VTStates::Escape:
return _EventEscape(wch);
case VTStates::EscapeIntermediate:
return _EventEscapeIntermediate(wch);
case VTStates::CsiEntry:
return _EventCsiEntry(wch);
case VTStates::CsiIntermediate:
return _EventCsiIntermediate(wch);
case VTStates::CsiIgnore:
return _EventCsiIgnore(wch);
case VTStates::CsiParam:
return _EventCsiParam(wch);
case VTStates::OscParam:
return _EventOscParam(wch);
case VTStates::OscString:
return _EventOscString(wch);
case VTStates::OscTermination:
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
return _EventOscTermination(wch);
case VTStates::Ss3Entry:
return _EventSs3Entry(wch);
case VTStates::Ss3Param:
return _EventSs3Param(wch);
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
case VTStates::Vt52Param:
return _EventVt52Param(wch);
case VTStates::DcsEntry:
return _EventDcsEntry(wch);
case VTStates::DcsIgnore:
return _EventDcsIgnore();
case VTStates::DcsIntermediate:
return _EventDcsIntermediate(wch);
case VTStates::DcsParam:
return _EventDcsParam(wch);
case VTStates::DcsPassThrough:
return _EventDcsPassThrough(wch);
case VTStates::SosPmApcString:
return _EventSosPmApcString(wch);
default:
return;
}
}
}
// Method Description:
// - Pass the current string we're processing through to the engine. It may eat
// the string, it may write it straight to the input unmodified, it might
// write the string to the tty application. A pointer to this function will
// get handed to the OutputStateMachineEngine, so that it can write strings
// it doesn't understand to the tty.
// This does not modify the state of the state machine. Callers should be in
// the Action*Dispatch state, and upon completion, the state's handler (eg
// _EventCsiParam) should move us into the ground state.
// Arguments:
// - <none>
// Return Value:
// - true if the engine successfully handled the string.
bool StateMachine::FlushToTerminal()
{
bool success{ true };
if (success && _cachedSequence.has_value())
{
// Flush the partial sequence to the terminal before we flush the rest of it.
// We always want to clear the sequence, even if we failed, so we don't accumulate bad state
// and dump it out elsewhere later.
success = _engine->ActionPassThroughString(*_cachedSequence);
_cachedSequence.reset();
}
if (success)
{
// _pwchCurr is incremented after a call to ProcessCharacter to indicate
// that pwchCurr was processed.
// However, if we're here, then the processing of pwchChar triggered the
// engine to request the entire sequence get passed through, including pwchCurr.
success = _engine->ActionPassThroughString(_CurrentRun());
}
return success;
}
// Routine Description:
// - Helper for entry to the state machine. Will take an array of characters
// and print as many as it can without encountering a character indicating
// a escape sequence, then feed characters into the state machine one at a
// time until we return to the ground state.
// Arguments:
// - string - Characters to operate upon
// Return Value:
// - <none>
void StateMachine::ProcessString(const std::wstring_view string)
{
size_t start = 0;
size_t current = start;
_currentString = string;
_runOffset = 0;
_runSize = 0;
while (current < string.size())
{
// The run will be everything from the start INCLUDING the current one
// in case we process the current character and it turns into a passthrough
// fallback that picks up this _run inside `FlushToTerminal` above.
_runOffset = start;
_runSize = current - start + 1;
if (_processingIndividually)
{
// If we're processing characters individually, send it to the state machine.
ProcessCharacter(til::at(string, current));
++current;
if (_state == VTStates::Ground) // Then check if we're back at ground. If we are, the next character (pwchCurr)
{ // is the start of the next run of characters that might be printable.
_processingIndividually = false;
start = current;
}
}
else
{
if (_isActionableFromGround(til::at(string, current))) // If the current char is the start of an escape sequence, or should be executed in ground state...
{
if (_runSize > 0)
stab in the dark to fix x86 tests. (#4202) ## Summary of the Pull Request Perform checking on `std::basic_string_view<T>.substr()` calls to prevent running out of bounds and sporadic Privileged Instruction throws during x86 tests. ## PR Checklist * [x] Closes the x86 tests failing all over the place since #4125 for no apparent reason * [x] I work here * [x] Tests pass ## Detailed Description of the Pull Request / Additional comments It appears that not all `std::basic_string_view<T>.substr()` calls are created equally. I rooted around for other versions of the code in our source tree and found several versions that were less careful about checking the start position and the size than the one that appears when building locally on dev machines. My theory is that one of these older versions is deployed somewhere in the CI. Instead of clamping down the size parameter appropriately or throwing correctly when the position is out of bounds, I believe that it's just creating a substring with a bad range over an invalid/uninitialized memory region. Then when the test operates on that, sometimes it turns out to trigger the privileged instruction NTSTATUS error we are seeing in CI. ## Test Procedure 1. Fixed the thing 2. Ran the CI and it worked 3. Reverted everything and turned off all of the CI build except just the parser tests (and supporting libraries) 4. Ran CI and it failed 5. Put the fix back on top (cherry-pick) 6. It worked. 7. Ran it again. 8. It worked. 9. Turn all the rest of the CI build back on
2020-01-14 01:46:07 +01:00
{
// Because the run above is composed INCLUDING current, we must
stab in the dark to fix x86 tests. (#4202) ## Summary of the Pull Request Perform checking on `std::basic_string_view<T>.substr()` calls to prevent running out of bounds and sporadic Privileged Instruction throws during x86 tests. ## PR Checklist * [x] Closes the x86 tests failing all over the place since #4125 for no apparent reason * [x] I work here * [x] Tests pass ## Detailed Description of the Pull Request / Additional comments It appears that not all `std::basic_string_view<T>.substr()` calls are created equally. I rooted around for other versions of the code in our source tree and found several versions that were less careful about checking the start position and the size than the one that appears when building locally on dev machines. My theory is that one of these older versions is deployed somewhere in the CI. Instead of clamping down the size parameter appropriately or throwing correctly when the position is out of bounds, I believe that it's just creating a substring with a bad range over an invalid/uninitialized memory region. Then when the test operates on that, sometimes it turns out to trigger the privileged instruction NTSTATUS error we are seeing in CI. ## Test Procedure 1. Fixed the thing 2. Ran the CI and it worked 3. Reverted everything and turned off all of the CI build except just the parser tests (and supporting libraries) 4. Ran CI and it failed 5. Put the fix back on top (cherry-pick) 6. It worked. 7. Ran it again. 8. It worked. 9. Turn all the rest of the CI build back on
2020-01-14 01:46:07 +01:00
// trim it off here since we just determined it's actionable
// and only pass through everything before it.
_runSize -= 1;
const auto allLeadingUpTo = _CurrentRun();
stab in the dark to fix x86 tests. (#4202) ## Summary of the Pull Request Perform checking on `std::basic_string_view<T>.substr()` calls to prevent running out of bounds and sporadic Privileged Instruction throws during x86 tests. ## PR Checklist * [x] Closes the x86 tests failing all over the place since #4125 for no apparent reason * [x] I work here * [x] Tests pass ## Detailed Description of the Pull Request / Additional comments It appears that not all `std::basic_string_view<T>.substr()` calls are created equally. I rooted around for other versions of the code in our source tree and found several versions that were less careful about checking the start position and the size than the one that appears when building locally on dev machines. My theory is that one of these older versions is deployed somewhere in the CI. Instead of clamping down the size parameter appropriately or throwing correctly when the position is out of bounds, I believe that it's just creating a substring with a bad range over an invalid/uninitialized memory region. Then when the test operates on that, sometimes it turns out to trigger the privileged instruction NTSTATUS error we are seeing in CI. ## Test Procedure 1. Fixed the thing 2. Ran the CI and it worked 3. Reverted everything and turned off all of the CI build except just the parser tests (and supporting libraries) 4. Ran CI and it failed 5. Put the fix back on top (cherry-pick) 6. It worked. 7. Ran it again. 8. It worked. 9. Turn all the rest of the CI build back on
2020-01-14 01:46:07 +01:00
_engine->ActionPrintString(allLeadingUpTo); // ... print all the chars leading up to it as part of the run...
_trace.DispatchPrintRunTrace(allLeadingUpTo);
}
_processingIndividually = true; // begin processing future characters individually...
start = current;
continue;
}
else
{
++current; // Otherwise, add this char to the current run to be printed.
}
}
}
// When we leave the loop, current has been advanced to the length of the string itself
// (or one past the array index to the final char) so this `substr` operation doesn't +1
// to include the final character (unlike the one inside the top of the loop above.)
if (start < string.size())
{
_runOffset = start;
_runSize = std::string::npos;
}
else
{
_runSize = 0;
}
const auto run = _CurrentRun();
// If we're at the end of the string and have remaining un-printed characters,
if (!_processingIndividually && !run.empty())
{
// print the rest of the characters in the string
_engine->ActionPrintString(run);
_trace.DispatchPrintRunTrace(run);
}
else if (_processingIndividually)
{
Return to ground when we flush the last char (#2823) ## Summary of the Pull Request The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. <!-- Please review the items on the PR checklist before submitting--> ## PR Checklist * [x] Closes #2746 * [x] I work here * [x] Tests added/passed * [n/a] Requires documentation to be updated <hr> * Return to ground when we flush the last char The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Fixes #2746. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. * Update src/terminal/parser/stateMachine.cpp Co-Authored-By: Dustin L. Howett (MSFT) <duhowett@microsoft.com> * add the comment @miniksa wanted
2019-10-04 17:47:39 +02:00
// One of the "weird things" in VT input is the case of something like
// <kbd>alt+[</kbd>. In VT, that's encoded as `\x1b[`. However, that's
// also the start of a CSI, and could be the start of a longer sequence,
// there's no way to know for sure. For an <kbd>alt+[</kbd> keypress,
// the parser originally would just sit in the `CsiEntry` state after
// processing it, which would pollute the following keypress (e.g.
// <kbd>alt+[</kbd>, <kbd>A</kbd> would be processed like `\x1b[A`,
// which is _wrong_).
//
// Fortunately, for VT input, each keystroke comes in as an individual
// write operation. So, if at the end of processing a string for the
// InputEngine, we find that we're not in the Ground state, that implies
// that we've processed some input, but not dispatched it yet. This
// block at the end of `ProcessString` will then re-process the
// undispatched string, but it will ensure that it dispatches on the
// last character of the string. For our previous `\x1b[` scenario, that
// means we'll make sure to call `_ActionEscDispatch('[')`., which will
// properly decode the string as <kbd>alt+[</kbd>.
if (_engine->FlushAtEndOfString())
{
// Reset our state, and put all but the last char in again.
ResetState();
// Chars to flush are [pwchSequenceStart, pwchCurr)
auto wchIter = run.cbegin();
while (wchIter < run.cend() - 1)
{
ProcessCharacter(*wchIter);
wchIter++;
}
// Manually execute the last char [pwchCurr]
switch (_state)
{
case VTStates::Ground:
_ActionExecute(*wchIter);
Return to ground when we flush the last char (#2823) ## Summary of the Pull Request The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. <!-- Please review the items on the PR checklist before submitting--> ## PR Checklist * [x] Closes #2746 * [x] I work here * [x] Tests added/passed * [n/a] Requires documentation to be updated <hr> * Return to ground when we flush the last char The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Fixes #2746. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. * Update src/terminal/parser/stateMachine.cpp Co-Authored-By: Dustin L. Howett (MSFT) <duhowett@microsoft.com> * add the comment @miniksa wanted
2019-10-04 17:47:39 +02:00
break;
case VTStates::Escape:
case VTStates::EscapeIntermediate:
_ActionEscDispatch(*wchIter);
Return to ground when we flush the last char (#2823) ## Summary of the Pull Request The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. <!-- Please review the items on the PR checklist before submitting--> ## PR Checklist * [x] Closes #2746 * [x] I work here * [x] Tests added/passed * [n/a] Requires documentation to be updated <hr> * Return to ground when we flush the last char The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Fixes #2746. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. * Update src/terminal/parser/stateMachine.cpp Co-Authored-By: Dustin L. Howett (MSFT) <duhowett@microsoft.com> * add the comment @miniksa wanted
2019-10-04 17:47:39 +02:00
break;
case VTStates::CsiEntry:
case VTStates::CsiIntermediate:
case VTStates::CsiIgnore:
case VTStates::CsiParam:
_ActionCsiDispatch(*wchIter);
Return to ground when we flush the last char (#2823) ## Summary of the Pull Request The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. <!-- Please review the items on the PR checklist before submitting--> ## PR Checklist * [x] Closes #2746 * [x] I work here * [x] Tests added/passed * [n/a] Requires documentation to be updated <hr> * Return to ground when we flush the last char The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Fixes #2746. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. * Update src/terminal/parser/stateMachine.cpp Co-Authored-By: Dustin L. Howett (MSFT) <duhowett@microsoft.com> * add the comment @miniksa wanted
2019-10-04 17:47:39 +02:00
break;
case VTStates::OscParam:
case VTStates::OscString:
case VTStates::OscTermination:
_ActionOscDispatch(*wchIter);
Return to ground when we flush the last char (#2823) ## Summary of the Pull Request The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. <!-- Please review the items on the PR checklist before submitting--> ## PR Checklist * [x] Closes #2746 * [x] I work here * [x] Tests added/passed * [n/a] Requires documentation to be updated <hr> * Return to ground when we flush the last char The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Fixes #2746. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. * Update src/terminal/parser/stateMachine.cpp Co-Authored-By: Dustin L. Howett (MSFT) <duhowett@microsoft.com> * add the comment @miniksa wanted
2019-10-04 17:47:39 +02:00
break;
case VTStates::Ss3Entry:
case VTStates::Ss3Param:
_ActionSs3Dispatch(*wchIter);
Return to ground when we flush the last char (#2823) ## Summary of the Pull Request The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. <!-- Please review the items on the PR checklist before submitting--> ## PR Checklist * [x] Closes #2746 * [x] I work here * [x] Tests added/passed * [n/a] Requires documentation to be updated <hr> * Return to ground when we flush the last char The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Fixes #2746. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. * Update src/terminal/parser/stateMachine.cpp Co-Authored-By: Dustin L. Howett (MSFT) <duhowett@microsoft.com> * add the comment @miniksa wanted
2019-10-04 17:47:39 +02:00
break;
}
Return to ground when we flush the last char (#2823) ## Summary of the Pull Request The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. <!-- Please review the items on the PR checklist before submitting--> ## PR Checklist * [x] Closes #2746 * [x] I work here * [x] Tests added/passed * [n/a] Requires documentation to be updated <hr> * Return to ground when we flush the last char The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground. Fixes #2746. Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it. Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name. * Update src/terminal/parser/stateMachine.cpp Co-Authored-By: Dustin L. Howett (MSFT) <duhowett@microsoft.com> * add the comment @miniksa wanted
2019-10-04 17:47:39 +02:00
// microsoft/terminal#2746: Make sure to return to the ground state
// after dispatching the characters
_EnterGround();
}
else
{
// If the engine doesn't require flushing at the end of the string, we
// want to cache the partial sequence in case we have to flush the whole
// thing to the terminal later.
if (!_cachedSequence)
{
_cachedSequence.emplace(std::wstring{});
}
auto& cachedSequence = *_cachedSequence;
cachedSequence.append(run);
}
}
}
// Routine Description:
// - Wherever the state machine is, whatever it's going, go back to ground.
// This is used by conhost to "jiggle the handle" - when VT support is
// turned off, we don't want any bad state left over for the next input it's turned on for
// Arguments:
// - <none>
// Return Value:
// - <none>
void StateMachine::ResetState() noexcept
{
_EnterGround();
}
// Routine Description:
// - Takes the given printable character and accumulates it as the new ones digit
// into the given size_t. All existing value is moved up by 10.
// - For example, if your value had 437 and you put in the printable number 2,
// this function will update value to 4372.
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
// - Clamps to 32767 if it gets too big.
// Arguments:
// - wch - Printable character to accumulate into the value (after conversion to number, of course)
// - value - The value to update with the printable character. See example above.
// Return Value:
// - <none> - But really it's the update to the given value parameter.
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
void StateMachine::_AccumulateTo(const wchar_t wch, size_t& value) noexcept
{
const size_t digit = wch - L'0';
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
value = value * 10 + digit;
// Values larger than the maximum should be mapped to the largest supported value.
if (value > MAX_PARAMETER_VALUE)
{
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
value = MAX_PARAMETER_VALUE;
}
}