terminal/src/terminal/parser/stateMachine.hpp
James Holderness 6742965bb8
Disable the acceptance of C1 control codes by default (#11690)
There are some code pages with "unmapped" code points in the C1 range,
which results in them being translated into Unicode C1 control codes,
even though that is not their intended use. To avoid having these
characters triggering unintentional escape sequences, this PR now
disables C1 controls by default.

Switching to ISO-2022 encoding will re-enable them, though, since that
is the most likely scenario in which they would be required. They can
also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape
sequence.

What I've done is add a new mode to the `StateMachine` class that
controls whether C1 code points are interpreted as control characters or
not. When disabled, these code points are simply dropped from the
output, similar to the way a `NUL` is interpreted.

This isn't exactly the way they were handled in the v1 console (which I
think replaces them with the font _notdef_ glyph), but it matches the
XTerm behavior, which seems more appropriate considering this is in VT
mode. And it's worth noting that Windows Explorer seems to work the same
way.

As mentioned above, the mode can be enabled by designating the ISO-2022
coding system with a `DOCS` sequence, and it will be disabled again when
UTF-8 is designated. You can also enable it explicitly with a `DECAC1`
sequence (originally this was actually a DEC printer sequence, but it
doesn't seem unreasonable to use it in a terminal).

I've also extended the operations that save and restore "cursor state"
(e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode,
since it's closely tied to the code page and character sets which are
also saved there. Similarly, when a `DECSTR` sequence resets the code
page and character sets, I've now made it reset the C1 mode as well.

I should note that the new `StateMachine` mode is controlled via a
generic `SetParserMode` method (with a matching API in the `ConGetSet`
interface) to allow for easier addition of other modes in the future.
And I've reimplemented the existing ANSI/VT52 mode in terms of these
generic methods instead of it having to have its own separate APIs.

## Validation Steps Performed

Some of the unit tests for OSC sequences were using a C1 `0x9C` for the
string terminator, which doesn't work by default anymore. Since that's
not a good practice anyway, I thought it best to change those to a
standard 7-bit terminator. However, in tests that were explicitly
validating the C1 controls, I've just enabled the C1 parser mode at the
start of the tests in order to get them working again.

There were also some ANSI mode adapter tests that had to be updated to
account for the fact that it has now been reimplemented in terms of the
`SetParserMode` API.

I've added a new state machine test to validate the changes in behavior
when the C1 parser mode is enabled or disabled. And I've added an
adapter test to verify that the `DesignateCodingSystems` and
`AcceptC1Controls` methods toggle the C1 parser mode as expected.

I've manually verified the test cases in #10069 and #10310 to confirm
that they're no longer triggering control sequences by default.
Although, as I explained above, the C1 code points are completely
dropped from the output rather than displayed as _notdef_ glyphs. I
think this is a reasonable compromise though.

Closes #10069
Closes #10310
2021-11-17 23:40:31 +00:00

185 lines
6.2 KiB
C++

// Copyright (c) Microsoft Corporation.
// Licensed under the MIT license.
/*
Module Name:
- stateMachine.hpp
Abstract:
- This declares the entire state machine for handling Virtual Terminal Sequences
- The design is based from the specifications at http://vt100.net
- The actual implementation of actions decoded by the StateMachine should be
implemented in an IStateMachineEngine.
*/
#pragma once
#include "IStateMachineEngine.hpp"
#include "telemetry.hpp"
#include "tracing.hpp"
#include <memory>
namespace Microsoft::Console::VirtualTerminal
{
// The DEC STD 070 reference recommends supporting up to at least 16384 for
// parameter values, so 32767 should be more than enough. At most we might
// want to increase this to 65535, since that is what XTerm and VTE support,
// but for now 32767 is the safest limit for our existing code base.
constexpr size_t MAX_PARAMETER_VALUE = 32767;
// The DEC STD 070 reference requires that a minimum of 16 parameter values
// are supported, but most modern terminal emulators will allow around twice
// that number.
constexpr size_t MAX_PARAMETER_COUNT = 32;
class StateMachine final
{
#ifdef UNIT_TESTING
friend class OutputEngineTest;
friend class InputEngineTest;
#endif
public:
StateMachine(std::unique_ptr<IStateMachineEngine> engine);
enum class Mode : size_t
{
AcceptC1,
Ansi,
};
void SetParserMode(const Mode mode, const bool enabled);
bool GetParserMode(const Mode mode) const;
void ProcessCharacter(const wchar_t wch);
void ProcessString(const std::wstring_view string);
void ResetState() noexcept;
bool FlushToTerminal();
const IStateMachineEngine& Engine() const noexcept;
IStateMachineEngine& Engine() noexcept;
private:
void _ActionExecute(const wchar_t wch);
void _ActionExecuteFromEscape(const wchar_t wch);
void _ActionPrint(const wchar_t wch);
void _ActionEscDispatch(const wchar_t wch);
void _ActionVt52EscDispatch(const wchar_t wch);
void _ActionCollect(const wchar_t wch) noexcept;
void _ActionParam(const wchar_t wch);
void _ActionCsiDispatch(const wchar_t wch);
void _ActionOscParam(const wchar_t wch) noexcept;
void _ActionOscPut(const wchar_t wch);
void _ActionOscDispatch(const wchar_t wch);
void _ActionSs3Dispatch(const wchar_t wch);
void _ActionDcsDispatch(const wchar_t wch);
void _ActionClear();
void _ActionIgnore() noexcept;
void _ActionInterrupt();
void _EnterGround() noexcept;
void _EnterEscape();
void _EnterEscapeIntermediate() noexcept;
void _EnterCsiEntry();
void _EnterCsiParam() noexcept;
void _EnterCsiIgnore() noexcept;
void _EnterCsiIntermediate() noexcept;
void _EnterOscParam() noexcept;
void _EnterOscString() noexcept;
void _EnterOscTermination() noexcept;
void _EnterSs3Entry();
void _EnterSs3Param() noexcept;
void _EnterVt52Param() noexcept;
void _EnterDcsEntry();
void _EnterDcsParam() noexcept;
void _EnterDcsIgnore() noexcept;
void _EnterDcsIntermediate() noexcept;
void _EnterDcsPassThrough() noexcept;
void _EnterSosPmApcString() noexcept;
void _EventGround(const wchar_t wch);
void _EventEscape(const wchar_t wch);
void _EventEscapeIntermediate(const wchar_t wch);
void _EventCsiEntry(const wchar_t wch);
void _EventCsiIntermediate(const wchar_t wch);
void _EventCsiIgnore(const wchar_t wch);
void _EventCsiParam(const wchar_t wch);
void _EventOscParam(const wchar_t wch) noexcept;
void _EventOscString(const wchar_t wch);
void _EventOscTermination(const wchar_t wch);
void _EventSs3Entry(const wchar_t wch);
void _EventSs3Param(const wchar_t wch);
void _EventVt52Param(const wchar_t wch);
void _EventDcsEntry(const wchar_t wch);
void _EventDcsIgnore() noexcept;
void _EventDcsIntermediate(const wchar_t wch);
void _EventDcsParam(const wchar_t wch);
void _EventDcsPassThrough(const wchar_t wch);
void _EventSosPmApcString(const wchar_t wch) noexcept;
void _AccumulateTo(const wchar_t wch, size_t& value) noexcept;
enum class VTStates
{
Ground,
Escape,
EscapeIntermediate,
CsiEntry,
CsiIntermediate,
CsiIgnore,
CsiParam,
OscParam,
OscString,
OscTermination,
Ss3Entry,
Ss3Param,
Vt52Param,
DcsEntry,
DcsIgnore,
DcsIntermediate,
DcsParam,
DcsPassThrough,
SosPmApcString
};
Microsoft::Console::VirtualTerminal::ParserTracing _trace;
std::unique_ptr<IStateMachineEngine> _engine;
VTStates _state;
til::enumset<Mode> _parserMode{ Mode::Ansi };
std::wstring_view _currentString;
size_t _runOffset;
size_t _runSize;
// Construct current run.
//
// Note: We intentionally use this method to create the run lazily for better performance.
// You may find the usage of offset & size unsafe, but under heavy load it shows noticeable performance benefit.
std::wstring_view _CurrentRun() const
{
return _currentString.substr(_runOffset, _runSize);
}
VTIDBuilder _identifier;
std::vector<VTParameter> _parameters;
bool _parameterLimitReached;
std::wstring _oscString;
size_t _oscParameter;
IStateMachineEngine::StringHandler _dcsStringHandler;
std::optional<std::wstring> _cachedSequence;
// This is tracked per state machine instance so that separate calls to Process*
// can start and finish a sequence.
bool _processingIndividually;
};
}