terminal/src/terminal/parser/stateMachine.hpp

185 lines
6.2 KiB
C++
Raw Normal View History

// Copyright (c) Microsoft Corporation.
// Licensed under the MIT license.
/*
Module Name:
- stateMachine.hpp
Abstract:
- This declares the entire state machine for handling Virtual Terminal Sequences
- The design is based from the specifications at http://vt100.net
- The actual implementation of actions decoded by the StateMachine should be
implemented in an IStateMachineEngine.
*/
#pragma once
#include "IStateMachineEngine.hpp"
#include "telemetry.hpp"
#include "tracing.hpp"
#include <memory>
namespace Microsoft::Console::VirtualTerminal
{
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
// The DEC STD 070 reference recommends supporting up to at least 16384 for
// parameter values, so 32767 should be more than enough. At most we might
// want to increase this to 65535, since that is what XTerm and VTE support,
// but for now 32767 is the safest limit for our existing code base.
constexpr size_t MAX_PARAMETER_VALUE = 32767;
Refactor VT parameter handling (#7799) This PR introduces a pair of classes for managing VT parameters that automatically handle range checking and default fallback values, so the individual operations don't have to do that validation themselves. In addition to simplifying the code, this fixes a few cases where we were mishandling missing or extraneous parameters, and adds support for parameter sequences on commands that couldn't previously handle them. This PR also sets a limit on the number of parameters allowed, to help thwart DoS memory consumption attacks. ## References * The new parameter class also introduces the concept of an omitted/default parameter which is not necessarily zero, which is a prerequisite for addressing issue #4417. ## Detailed Description of the Pull Request / Additional comments There are two new classes provide by this PR: a `VTParameter` class, similar in function to a `std::optional<size_t>`, which holds an individual parameter (which may be an omitted/default value); and a `VTParameters` class, similar in function to `gsl:span<VTParameter>`, which holds a sequence of those parameters. Where `VTParameter` differs from `std::optional` is with the inclusion of two cast operators. There is a `size_t` cast that interprets omitted and zero values as 1 (the expected behaviour for most numeric parameters). And there is a generic cast, for use with the enum parameter types, which interprets omitted values as 0 (the expected behaviour for most selective parameters). The advantage of `VTParameters` class is that it has an `at` method that can never fail - out of range values simply return the a default `VTParameter` instance (this is standard behaviour in VT terminals). It also has a `size` method that will always return a minimum count of 1, since an empty parameter list is typically the equivalent of a single "default" parameter, so this guarantees you'll get at least one value when iterating over the list with `size()`. For cases where we just need to call the same dispatch method for every parameter, there is a helper `for_each` method, which repeatedly calls a given predicate function with each value in the sequence. It also collates the returned success values to determine the overall result of the sequence. As with the `size` method, this will always make at least one call, so it correctly handles empty sequences. With those two classes in place, we could get rid of all the parameter validation and default handling code in the `OutputStateMachineEngine`. We now just use the `VTParameters::at` method to grab a parameter and typically pass it straight to the appropriate dispatch method, letting the cast operators automatically handle the assignment of default values. Occasionally we might need a `value_or` call to specify a non-standard default value, but those cases are fairly rare. In some case the `OutputStateMachineEngine` was also checking whether parameters values were in range, but for the most part this shouldn't have been necessary, since that is something the dispatch classes would already have been doing themselves (in the few cases that they weren't, I've now updated them to do so). I've also updated the `InputStateMachineEngine` in a similar way to the `OutputStateMachineEngine`, getting rid of a few of the parameter extraction methods, and simplifying other parts of the implementation. It's not as clean a replacement as the output engine, but there are still benefits in using the new classes. ## Validation Steps Performed For the most part I haven't had to alter existing tests other than accounting for changes to the API. There were a couple of tests I needed to drop because they were checking for failure cases which shouldn't have been failing (unexpected parameters should never be an error), or testing output engine validation that is no longer handled at that level. I've added a few new tests to cover operations that take sequences of selective parameters (`ED`, `EL`, `TBC`, `SM`, and `RM`). And I've extended the cursor movement tests to make sure those operations can handle extraneous parameters that weren't expected. I've also added a test to verify that the state machine will correctly ignore parameters beyond the maximum 32 parameter count limit. I've also manual confirmed that the various test cases given in issues #2101 are now working as expected. Closes #2101
2020-10-15 18:12:52 +02:00
// The DEC STD 070 reference requires that a minimum of 16 parameter values
// are supported, but most modern terminal emulators will allow around twice
// that number.
constexpr size_t MAX_PARAMETER_COUNT = 32;
class StateMachine final
{
#ifdef UNIT_TESTING
friend class OutputEngineTest;
friend class InputEngineTest;
#endif
public:
StateMachine(std::unique_ptr<IStateMachineEngine> engine);
Disable the acceptance of C1 control codes by default (#11690) There are some code pages with "unmapped" code points in the C1 range, which results in them being translated into Unicode C1 control codes, even though that is not their intended use. To avoid having these characters triggering unintentional escape sequences, this PR now disables C1 controls by default. Switching to ISO-2022 encoding will re-enable them, though, since that is the most likely scenario in which they would be required. They can also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape sequence. What I've done is add a new mode to the `StateMachine` class that controls whether C1 code points are interpreted as control characters or not. When disabled, these code points are simply dropped from the output, similar to the way a `NUL` is interpreted. This isn't exactly the way they were handled in the v1 console (which I think replaces them with the font _notdef_ glyph), but it matches the XTerm behavior, which seems more appropriate considering this is in VT mode. And it's worth noting that Windows Explorer seems to work the same way. As mentioned above, the mode can be enabled by designating the ISO-2022 coding system with a `DOCS` sequence, and it will be disabled again when UTF-8 is designated. You can also enable it explicitly with a `DECAC1` sequence (originally this was actually a DEC printer sequence, but it doesn't seem unreasonable to use it in a terminal). I've also extended the operations that save and restore "cursor state" (e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode, since it's closely tied to the code page and character sets which are also saved there. Similarly, when a `DECSTR` sequence resets the code page and character sets, I've now made it reset the C1 mode as well. I should note that the new `StateMachine` mode is controlled via a generic `SetParserMode` method (with a matching API in the `ConGetSet` interface) to allow for easier addition of other modes in the future. And I've reimplemented the existing ANSI/VT52 mode in terms of these generic methods instead of it having to have its own separate APIs. ## Validation Steps Performed Some of the unit tests for OSC sequences were using a C1 `0x9C` for the string terminator, which doesn't work by default anymore. Since that's not a good practice anyway, I thought it best to change those to a standard 7-bit terminator. However, in tests that were explicitly validating the C1 controls, I've just enabled the C1 parser mode at the start of the tests in order to get them working again. There were also some ANSI mode adapter tests that had to be updated to account for the fact that it has now been reimplemented in terms of the `SetParserMode` API. I've added a new state machine test to validate the changes in behavior when the C1 parser mode is enabled or disabled. And I've added an adapter test to verify that the `DesignateCodingSystems` and `AcceptC1Controls` methods toggle the C1 parser mode as expected. I've manually verified the test cases in #10069 and #10310 to confirm that they're no longer triggering control sequences by default. Although, as I explained above, the C1 code points are completely dropped from the output rather than displayed as _notdef_ glyphs. I think this is a reasonable compromise though. Closes #10069 Closes #10310
2021-11-18 00:40:31 +01:00
enum class Mode : size_t
{
AcceptC1,
Ansi,
};
void SetParserMode(const Mode mode, const bool enabled) noexcept;
bool GetParserMode(const Mode mode) const noexcept;
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
void ProcessCharacter(const wchar_t wch);
void ProcessString(const std::wstring_view string);
void ResetState() noexcept;
bool FlushToTerminal();
const IStateMachineEngine& Engine() const noexcept;
IStateMachineEngine& Engine() noexcept;
private:
void _ActionExecute(const wchar_t wch);
void _ActionExecuteFromEscape(const wchar_t wch);
void _ActionPrint(const wchar_t wch);
void _ActionEscDispatch(const wchar_t wch);
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
void _ActionVt52EscDispatch(const wchar_t wch);
Refactor VT control sequence identification (#7304) This PR changes the way VT control sequences are identified and dispatched, to be more efficient and easier to extend. Instead of parsing the intermediate characters into a vector, and then having to identify a sequence using both that vector and the final char, we now use just a single `uint64_t` value as the identifier. The way the identifier is constructed is by taking the private parameter prefix, each of the intermediate characters, and then the final character, and shifting them into a 64-bit integer one byte at a time, in reverse order. For example, the `DECTLTC` control has a private parameter prefix of `?`, one intermediate of `'`, and a final character of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and `0x73` respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control. The reason for storing them in reverse order, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). When in reverse order, this can easily be achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence. With 64 bits we have enough space for a private prefix, six intermediates, and the final char, which is way more than we should ever need (the _DEC STD 070_ specification recommends supporting at least three intermediates, but in practice we're unlikely to see more than two). With this new way of identifying controls, it should now be possible for every action code to be unique (for the most part). So I've also used this PR to clean up the action codes a bit, splitting the codes for the escape sequences from the control sequences, and sorting them into alphabetical order (which also does a reasonable job of clustering associated controls). ## Validation Steps Performed I think the existing unit tests should be good enough to confirm that all sequences are still being dispatched correctly. However, I've also manually tested a number of sequences to make sure they were still working as expected, in particular those that used intermediates, since they were the most affected by the dispatch code refactoring. Since these changes also affected the input state machine, I've done some manual testing of the conpty keyboard handling (both with and without the new Win32 input mode enabled) to make sure the keyboard VT sequences were processed correctly. I've also manually tested the various VT mouse modes in Vttest to confirm that they were still working correctly too. Closes #7276
2020-08-18 20:57:52 +02:00
void _ActionCollect(const wchar_t wch) noexcept;
void _ActionParam(const wchar_t wch);
void _ActionCsiDispatch(const wchar_t wch);
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
void _ActionOscParam(const wchar_t wch) noexcept;
void _ActionOscPut(const wchar_t wch);
void _ActionOscDispatch(const wchar_t wch);
void _ActionSs3Dispatch(const wchar_t wch);
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
void _ActionDcsDispatch(const wchar_t wch);
void _ActionClear();
void _ActionIgnore() noexcept;
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
void _ActionInterrupt();
void _EnterGround() noexcept;
void _EnterEscape();
void _EnterEscapeIntermediate() noexcept;
void _EnterCsiEntry();
void _EnterCsiParam() noexcept;
void _EnterCsiIgnore() noexcept;
void _EnterCsiIntermediate() noexcept;
void _EnterOscParam() noexcept;
void _EnterOscString() noexcept;
void _EnterOscTermination() noexcept;
void _EnterSs3Entry();
void _EnterSs3Param() noexcept;
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
void _EnterVt52Param() noexcept;
void _EnterDcsEntry();
void _EnterDcsParam() noexcept;
void _EnterDcsIgnore() noexcept;
void _EnterDcsIntermediate() noexcept;
void _EnterDcsPassThrough() noexcept;
void _EnterSosPmApcString() noexcept;
void _EventGround(const wchar_t wch);
void _EventEscape(const wchar_t wch);
void _EventEscapeIntermediate(const wchar_t wch);
void _EventCsiEntry(const wchar_t wch);
void _EventCsiIntermediate(const wchar_t wch);
void _EventCsiIgnore(const wchar_t wch);
void _EventCsiParam(const wchar_t wch);
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
void _EventOscParam(const wchar_t wch) noexcept;
void _EventOscString(const wchar_t wch);
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
void _EventOscTermination(const wchar_t wch);
void _EventSs3Entry(const wchar_t wch);
void _EventSs3Param(const wchar_t wch);
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
void _EventVt52Param(const wchar_t wch);
void _EventDcsEntry(const wchar_t wch);
void _EventDcsIgnore() noexcept;
void _EventDcsIntermediate(const wchar_t wch);
void _EventDcsParam(const wchar_t wch);
void _EventDcsPassThrough(const wchar_t wch);
void _EventSosPmApcString(const wchar_t wch) noexcept;
Clamp parameter values to a maximum of 32767. (#5200) ## Summary of the Pull Request This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`. ## References #3956 - the PR where the cap was changed to the range of `size_t` #4254 - one example of a crash caused by the higher range ## PR Checklist * [x] Closes #5160 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx ## Detailed Description of the Pull Request / Additional comments The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places. ## Validation Steps Performed I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected. I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 14:49:57 +02:00
void _AccumulateTo(const wchar_t wch, size_t& value) noexcept;
enum class VTStates
{
Ground,
Escape,
EscapeIntermediate,
CsiEntry,
CsiIntermediate,
CsiIgnore,
CsiParam,
OscParam,
OscString,
OscTermination,
Ss3Entry,
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
Ss3Param,
Vt52Param,
DcsEntry,
DcsIgnore,
DcsIntermediate,
DcsParam,
DcsPassThrough,
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
SosPmApcString
};
Microsoft::Console::VirtualTerminal::ParserTracing _trace;
std::unique_ptr<IStateMachineEngine> _engine;
VTStates _state;
Disable the acceptance of C1 control codes by default (#11690) There are some code pages with "unmapped" code points in the C1 range, which results in them being translated into Unicode C1 control codes, even though that is not their intended use. To avoid having these characters triggering unintentional escape sequences, this PR now disables C1 controls by default. Switching to ISO-2022 encoding will re-enable them, though, since that is the most likely scenario in which they would be required. They can also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape sequence. What I've done is add a new mode to the `StateMachine` class that controls whether C1 code points are interpreted as control characters or not. When disabled, these code points are simply dropped from the output, similar to the way a `NUL` is interpreted. This isn't exactly the way they were handled in the v1 console (which I think replaces them with the font _notdef_ glyph), but it matches the XTerm behavior, which seems more appropriate considering this is in VT mode. And it's worth noting that Windows Explorer seems to work the same way. As mentioned above, the mode can be enabled by designating the ISO-2022 coding system with a `DOCS` sequence, and it will be disabled again when UTF-8 is designated. You can also enable it explicitly with a `DECAC1` sequence (originally this was actually a DEC printer sequence, but it doesn't seem unreasonable to use it in a terminal). I've also extended the operations that save and restore "cursor state" (e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode, since it's closely tied to the code page and character sets which are also saved there. Similarly, when a `DECSTR` sequence resets the code page and character sets, I've now made it reset the C1 mode as well. I should note that the new `StateMachine` mode is controlled via a generic `SetParserMode` method (with a matching API in the `ConGetSet` interface) to allow for easier addition of other modes in the future. And I've reimplemented the existing ANSI/VT52 mode in terms of these generic methods instead of it having to have its own separate APIs. ## Validation Steps Performed Some of the unit tests for OSC sequences were using a C1 `0x9C` for the string terminator, which doesn't work by default anymore. Since that's not a good practice anyway, I thought it best to change those to a standard 7-bit terminator. However, in tests that were explicitly validating the C1 controls, I've just enabled the C1 parser mode at the start of the tests in order to get them working again. There were also some ANSI mode adapter tests that had to be updated to account for the fact that it has now been reimplemented in terms of the `SetParserMode` API. I've added a new state machine test to validate the changes in behavior when the C1 parser mode is enabled or disabled. And I've added an adapter test to verify that the `DesignateCodingSystems` and `AcceptC1Controls` methods toggle the C1 parser mode as expected. I've manually verified the test cases in #10069 and #10310 to confirm that they're no longer triggering control sequences by default. Although, as I explained above, the C1 code points are completely dropped from the output rather than displayed as _notdef_ glyphs. I think this is a reasonable compromise though. Closes #10069 Closes #10310
2021-11-18 00:40:31 +01:00
til::enumset<Mode> _parserMode{ Mode::Ansi };
Add support for VT52 emulation (#4789) ## Summary of the Pull Request This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode. ## References PR #2017 defined the initial specification for VT52 support. PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences. ## PR Checklist * [x] Closes #976 * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA * [x] Tests added/passed * [ ] Requires documentation to be updated * [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017 ## Detailed Description of the Pull Request / Additional comments Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character. Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command. The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class. I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. ## Validation Steps Performed I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset. For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes. In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 23:20:40 +02:00
std::wstring_view _currentString;
size_t _runOffset;
size_t _runSize;
// Construct current run.
//
// Note: We intentionally use this method to create the run lazily for better performance.
// You may find the usage of offset & size unsafe, but under heavy load it shows noticeable performance benefit.
std::wstring_view _CurrentRun() const
{
return _currentString.substr(_runOffset, _runSize);
}
Refactor VT control sequence identification (#7304) This PR changes the way VT control sequences are identified and dispatched, to be more efficient and easier to extend. Instead of parsing the intermediate characters into a vector, and then having to identify a sequence using both that vector and the final char, we now use just a single `uint64_t` value as the identifier. The way the identifier is constructed is by taking the private parameter prefix, each of the intermediate characters, and then the final character, and shifting them into a 64-bit integer one byte at a time, in reverse order. For example, the `DECTLTC` control has a private parameter prefix of `?`, one intermediate of `'`, and a final character of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and `0x73` respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control. The reason for storing them in reverse order, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). When in reverse order, this can easily be achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence. With 64 bits we have enough space for a private prefix, six intermediates, and the final char, which is way more than we should ever need (the _DEC STD 070_ specification recommends supporting at least three intermediates, but in practice we're unlikely to see more than two). With this new way of identifying controls, it should now be possible for every action code to be unique (for the most part). So I've also used this PR to clean up the action codes a bit, splitting the codes for the escape sequences from the control sequences, and sorting them into alphabetical order (which also does a reasonable job of clustering associated controls). ## Validation Steps Performed I think the existing unit tests should be good enough to confirm that all sequences are still being dispatched correctly. However, I've also manually tested a number of sequences to make sure they were still working as expected, in particular those that used intermediates, since they were the most affected by the dispatch code refactoring. Since these changes also affected the input state machine, I've done some manual testing of the conpty keyboard handling (both with and without the new Win32 input mode enabled) to make sure the keyboard VT sequences were processed correctly. I've also manually tested the various VT mouse modes in Vttest to confirm that they were still working correctly too. Closes #7276
2020-08-18 20:57:52 +02:00
VTIDBuilder _identifier;
Refactor VT parameter handling (#7799) This PR introduces a pair of classes for managing VT parameters that automatically handle range checking and default fallback values, so the individual operations don't have to do that validation themselves. In addition to simplifying the code, this fixes a few cases where we were mishandling missing or extraneous parameters, and adds support for parameter sequences on commands that couldn't previously handle them. This PR also sets a limit on the number of parameters allowed, to help thwart DoS memory consumption attacks. ## References * The new parameter class also introduces the concept of an omitted/default parameter which is not necessarily zero, which is a prerequisite for addressing issue #4417. ## Detailed Description of the Pull Request / Additional comments There are two new classes provide by this PR: a `VTParameter` class, similar in function to a `std::optional<size_t>`, which holds an individual parameter (which may be an omitted/default value); and a `VTParameters` class, similar in function to `gsl:span<VTParameter>`, which holds a sequence of those parameters. Where `VTParameter` differs from `std::optional` is with the inclusion of two cast operators. There is a `size_t` cast that interprets omitted and zero values as 1 (the expected behaviour for most numeric parameters). And there is a generic cast, for use with the enum parameter types, which interprets omitted values as 0 (the expected behaviour for most selective parameters). The advantage of `VTParameters` class is that it has an `at` method that can never fail - out of range values simply return the a default `VTParameter` instance (this is standard behaviour in VT terminals). It also has a `size` method that will always return a minimum count of 1, since an empty parameter list is typically the equivalent of a single "default" parameter, so this guarantees you'll get at least one value when iterating over the list with `size()`. For cases where we just need to call the same dispatch method for every parameter, there is a helper `for_each` method, which repeatedly calls a given predicate function with each value in the sequence. It also collates the returned success values to determine the overall result of the sequence. As with the `size` method, this will always make at least one call, so it correctly handles empty sequences. With those two classes in place, we could get rid of all the parameter validation and default handling code in the `OutputStateMachineEngine`. We now just use the `VTParameters::at` method to grab a parameter and typically pass it straight to the appropriate dispatch method, letting the cast operators automatically handle the assignment of default values. Occasionally we might need a `value_or` call to specify a non-standard default value, but those cases are fairly rare. In some case the `OutputStateMachineEngine` was also checking whether parameters values were in range, but for the most part this shouldn't have been necessary, since that is something the dispatch classes would already have been doing themselves (in the few cases that they weren't, I've now updated them to do so). I've also updated the `InputStateMachineEngine` in a similar way to the `OutputStateMachineEngine`, getting rid of a few of the parameter extraction methods, and simplifying other parts of the implementation. It's not as clean a replacement as the output engine, but there are still benefits in using the new classes. ## Validation Steps Performed For the most part I haven't had to alter existing tests other than accounting for changes to the API. There were a couple of tests I needed to drop because they were checking for failure cases which shouldn't have been failing (unexpected parameters should never be an error), or testing output engine validation that is no longer handled at that level. I've added a few new tests to cover operations that take sequences of selective parameters (`ED`, `EL`, `TBC`, `SM`, and `RM`). And I've extended the cursor movement tests to make sure those operations can handle extraneous parameters that weren't expected. I've also added a test to verify that the state machine will correctly ignore parameters beyond the maximum 32 parameter count limit. I've also manual confirmed that the various test cases given in issues #2101 are now working as expected. Closes #2101
2020-10-15 18:12:52 +02:00
std::vector<VTParameter> _parameters;
bool _parameterLimitReached;
std::wstring _oscString;
size_t _oscParameter;
Introduce a mechanism for passing through DCS data strings (#9307) This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
2021-04-30 21:17:30 +02:00
IStateMachineEngine::StringHandler _dcsStringHandler;
std::optional<std::wstring> _cachedSequence;
// This is tracked per state machine instance so that separate calls to Process*
// can start and finish a sequence.
bool _processingIndividually;
};
}