Commit graph

26 commits

Author SHA1 Message Date
Leonard Hecker f2386de422
Improve performance and binary size of til::enumset (#11493)
This commit approximately doubles the performance of til::enumset
and reduces it's binary footprint by approximately 1kB.
Most of the binary size can be attributed to exception handling.

Unfortunately this commit removes assertions that the given values are less than
the number of bits in the `underlying_type`. However I believe this to be a good
trade-off as the tests previously only happened at runtime, while tests at
compile time would be highly preferable. Such tests are technically possible,
however MSVC fails to compile (valid) `static_assert`s containing
`static_cast`s over a parameter pack at the time of writing.
With future MSVC versions such checks can be added to this class.

This change was initially discussed in #10492, but was forgotten to
be considered before it was merged. Since the work was already done,
this commit re-introduces the optimization. It's free!

## PR Checklist
* [x] I work here.
* [x] Tests added/passed

## Validation Steps Performed
* Run `printf '\e]8;;http://example.com\e\\This is a link\e]8;;\e\\\n'` in WSL
* A wild dotted line appears ✔️
2021-11-23 18:44:58 +00:00
James Holderness 6742965bb8
Disable the acceptance of C1 control codes by default (#11690)
There are some code pages with "unmapped" code points in the C1 range,
which results in them being translated into Unicode C1 control codes,
even though that is not their intended use. To avoid having these
characters triggering unintentional escape sequences, this PR now
disables C1 controls by default.

Switching to ISO-2022 encoding will re-enable them, though, since that
is the most likely scenario in which they would be required. They can
also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape
sequence.

What I've done is add a new mode to the `StateMachine` class that
controls whether C1 code points are interpreted as control characters or
not. When disabled, these code points are simply dropped from the
output, similar to the way a `NUL` is interpreted.

This isn't exactly the way they were handled in the v1 console (which I
think replaces them with the font _notdef_ glyph), but it matches the
XTerm behavior, which seems more appropriate considering this is in VT
mode. And it's worth noting that Windows Explorer seems to work the same
way.

As mentioned above, the mode can be enabled by designating the ISO-2022
coding system with a `DOCS` sequence, and it will be disabled again when
UTF-8 is designated. You can also enable it explicitly with a `DECAC1`
sequence (originally this was actually a DEC printer sequence, but it
doesn't seem unreasonable to use it in a terminal).

I've also extended the operations that save and restore "cursor state"
(e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode,
since it's closely tied to the code page and character sets which are
also saved there. Similarly, when a `DECSTR` sequence resets the code
page and character sets, I've now made it reset the C1 mode as well.

I should note that the new `StateMachine` mode is controlled via a
generic `SetParserMode` method (with a matching API in the `ConGetSet`
interface) to allow for easier addition of other modes in the future.
And I've reimplemented the existing ANSI/VT52 mode in terms of these
generic methods instead of it having to have its own separate APIs.

## Validation Steps Performed

Some of the unit tests for OSC sequences were using a C1 `0x9C` for the
string terminator, which doesn't work by default anymore. Since that's
not a good practice anyway, I thought it best to change those to a
standard 7-bit terminator. However, in tests that were explicitly
validating the C1 controls, I've just enabled the C1 parser mode at the
start of the tests in order to get them working again.

There were also some ANSI mode adapter tests that had to be updated to
account for the fact that it has now been reimplemented in terms of the
`SetParserMode` API.

I've added a new state machine test to validate the changes in behavior
when the C1 parser mode is enabled or disabled. And I've added an
adapter test to verify that the `DesignateCodingSystems` and
`AcceptC1Controls` methods toggle the C1 parser mode as expected.

I've manually verified the test cases in #10069 and #10310 to confirm
that they're no longer triggering control sequences by default.
Although, as I explained above, the C1 code points are completely
dropped from the output rather than displayed as _notdef_ glyphs. I
think this is a reasonable compromise though.

Closes #10069
Closes #10310
2021-11-17 23:40:31 +00:00
Chester Liu 6a37818c07
Defer _run substr operation in StateMachine::ProcessString (#10471)
Do not eagerly create run by substring to reduce the unnecessary overhead.

Switching from `.substr` to offset/length tracking saves us 7% CPU time.
2021-06-29 17:47:27 -05:00
James Holderness 2559ed6efa
Introduce a mechanism for passing through DCS data strings (#9307)
This PR introduces a mechanism via which DCS data strings can be passed
through directly to the dispatch method that will be handling them, so
the data can be processed as it is received, rather than being buffered
in the state machine. This also simplifies the way string termination is
handled, so it now more closely matches the behaviour of the original
DEC terminals.

* Initial support for DCS sequences was introduced in PR #6328.
* Handling of DCS (and other) C1 controls was added in PR #7340.
* This is a prerequisite for Sixel (#448) and Soft Font (#9164) support.

The way this now works, a `DCS` sequence is dispatched as soon as the
final character of the `VTID` is received. Based on that ID, the
`OutputStateMachineEngine` should forward the call to the corresponding
dispatch method, and its the responsibility of that method to return an
appropriate handler function for the sequence.

From then on, the `StateMachine` will pass on all of the remaining bytes
in the data string to the handler function. When a data string is
terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass
on one final `ESC` character to let the handler know that the sequence
is finished. The handler can also end a sequence prematurely by
returning false, and then all remaining data bytes will be ignored.

Note that once a `DCS` sequence has been dispatched, it's not possible
to abort the data string. Both `CAN` and `SUB` are considered valid
forms of termination, and an `ESC` doesn't necessarily have to be
followed by a `\` for the string terminator. This is because the data
string is typically processed as it's received. For example, when
outputting a Sixel image, you wouldn't erase the parts that had already
been displayed if the data string is terminated early.

With this new way of handling the string termination, I was also able to
simplify some of the `StateMachine` processing, and get rid of a few
states that are no longer necessary. These changes don't apply to the
`OSC` sequences, though, since we're more likely to want to match the
XTerm behavior for those cases (which requires a valid `ST` control for
the sequence to be accepted).

## Validation Steps Performed

For the unit tests, I've had to make a few changes to some of the
`OutputEngineTests` to account for the updated `StateMachine`
processing. I've also added a new `StateMachineTest` to confirm that the
data strings are correctly passed through to the string handler under
all forms of termination.

To test whether the framework is actually usable, I've been working on
DRCS Soft Font support branched off of this PR, and haven't encountered
any problems. To test the throughput speed, I also hacked together a
basic Sixel parser, and that seemed to perform reasonably well.

Closes #7316
2021-04-30 19:17:30 +00:00
Michael Niksa 5220738d8e
Fix VT parser memory leak in tracing (#8618)
Fix memory leak that occurs from not dispatching the end of sequences on all actions (since it is buffering up all characters for trace reasons.) Also don't bother storing if no one is listening.

## PR Checklist
- [x] Closes #8283
* [x] Fixes leak found while bumbling around.
* [x] I work here.

## Detailed Description of the Pull Request / Additional comments
- We trace all the things leading up to the Action phase in the VT parser for ETW tracing to make debugging the parser easier, but we made two mistakes.
- At some point, three of the actions (related to print/execute) weren't dispatching the stored up sequence to tracing and not clearing it. So printing/executing in a giant run over and over caused the vector to bloat and bloat and bloat forever.
- We're storing things even when no one is listening. That's a waste.

## Validation Steps Performed
- Watched it grow every time I did `type big.txt` under `taskman.exe`. Then watched it not do that after.
- I did technically WPR it to figure out this was the culprit.
2021-01-04 17:14:08 +00:00
James Holderness 55151a4a04
Refactor VT parameter handling (#7799)
This PR introduces a pair of classes for managing VT parameters that
automatically handle range checking and default fallback values, so the
individual operations don't have to do that validation themselves. In
addition to simplifying the code, this fixes a few cases where we were
mishandling missing or extraneous parameters, and adds support for
parameter sequences on commands that couldn't previously handle them.
This PR also sets a limit on the number of parameters allowed, to help
thwart DoS memory consumption attacks.

## References

* The new parameter class also introduces the concept of an
  omitted/default parameter which is not necessarily zero, which is a
  prerequisite for addressing issue #4417.

## Detailed Description of the Pull Request / Additional comments

There are two new classes provide by this PR: a `VTParameter` class,
similar in function to a `std::optional<size_t>`, which holds an
individual parameter (which may be an omitted/default value); and a
`VTParameters` class, similar in function to `gsl:span<VTParameter>`,
which holds a sequence of those parameters.

Where `VTParameter` differs from `std::optional` is with the inclusion
of two cast operators. There is a `size_t` cast that interprets omitted
and zero values as 1 (the expected behaviour for most numeric
parameters). And there is a generic cast, for use with the enum
parameter types, which interprets omitted values as 0 (the expected
behaviour for most selective parameters).

The advantage of `VTParameters` class is that it has an `at` method that
can never fail - out of range values simply return the a default
`VTParameter` instance (this is standard behaviour in VT terminals). It
also has a `size` method that will always return a minimum count of 1,
since an empty parameter list is typically the equivalent of a single
"default" parameter, so this guarantees you'll get at least one value
when iterating over the list with `size()`.

For cases where we just need to call the same dispatch method for every
parameter, there is a helper `for_each` method, which repeatedly calls a
given predicate function with each value in the sequence. It also
collates the returned success values to determine the overall result of
the sequence. As with the `size` method, this will always make at least
one call, so it correctly handles empty sequences.

With those two classes in place, we could get rid of all the parameter
validation and default handling code in the `OutputStateMachineEngine`.
We now just use the `VTParameters::at` method to grab a parameter and
typically pass it straight to the appropriate dispatch method, letting
the cast operators automatically handle the assignment of default
values. Occasionally we might need a `value_or` call to specify a
non-standard default value, but those cases are fairly rare.

In some case the `OutputStateMachineEngine` was also checking whether
parameters values were in range, but for the most part this shouldn't
have been necessary, since that is something the dispatch classes would
already have been doing themselves (in the few cases that they weren't,
I've now updated them to do so).

I've also updated the `InputStateMachineEngine` in a similar way to the
`OutputStateMachineEngine`, getting rid of a few of the parameter
extraction methods, and simplifying other parts of the implementation.
It's not as clean a replacement as the output engine, but there are
still benefits in using the new classes.

## Validation Steps Performed

For the most part I haven't had to alter existing tests other than
accounting for changes to the API. There were a couple of tests I needed
to drop because they were checking for failure cases which shouldn't
have been failing (unexpected parameters should never be an error), or
testing output engine validation that is no longer handled at that
level.

I've added a few new tests to cover operations that take sequences of
selective parameters (`ED`, `EL`, `TBC`, `SM`, and `RM`). And I've
extended the cursor movement tests to make sure those operations can
handle extraneous parameters that weren't expected. I've also added a
test to verify that the state machine will correctly ignore parameters
beyond the maximum 32 parameter count limit.

I've also manual confirmed that the various test cases given in issues
#2101 are now working as expected.

Closes #2101
2020-10-15 16:12:52 +00:00
Chester Liu f91b53d5fd
Preprocess and convert C1 controls to their 7 bit equivalent (#7340)
C1 control characters are now first converted to their 7 bit equivalent.
This allows us to unify the logic of C1 and C0 escape handling. This
also adds support for SOS/PM/APC string.

* Unify the logic for C1 and C0 escape handling by converting C1 to C0
  beforehand. This adds support for various C1 characters, including
  IND(8/4), NEL(8/5), HTS(8/8), RI(8/13), SS2(8/14), SS3(8/15),
  OSC(9/13), etc. 
* Add support for SOS/PM/APC escape sequences. Fixes #7032
* Use "Variable Length String" logic to unify the string termination
  handling of OSC, DCS and SOS/PM/APC. This fixes an issue where OSC
  action is successfully dispatched even when terminated with non-ST
  character. Introduced by #6328, the DCS PassThrough is spared from
  this issue. This PR puts them together and add test cases for them.

References:
https://vt100.net/docs/vt510-rm/chapter4.html
https://vt100.net/emu/dec_ansi_parser

Closes #7032
Closes #7317
2020-09-16 22:30:46 +00:00
James Holderness 7fcff4d33a
Refactor VT control sequence identification (#7304)
This PR changes the way VT control sequences are identified and
dispatched, to be more efficient and easier to extend. Instead of
parsing the intermediate characters into a vector, and then having to
identify a sequence using both that vector and the final char, we now
use just a single `uint64_t` value as the identifier.

The way the identifier is constructed is by taking the private parameter
prefix, each of the intermediate characters, and then the final
character, and shifting them into a 64-bit integer one byte at a time,
in reverse order. For example, the `DECTLTC` control has a private
parameter prefix of `?`, one intermediate of `'`, and a final character
of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and
`0x73` respectively, and reversing them gets you 0x73273F, so that would
then be the identifier for the control.

The reason for storing them in reverse order, is because sometimes we
need to look at the first intermediate to determine the operation, and
treat the rest of the sequence as a kind of sub-identifier (the
character set designation sequences are one example of this). When in
reverse order, this can easily be achieved by masking off the low byte
to get the first intermediate, and then shifting the value right by 8
bits to get a new identifier with the rest of the sequence.

With 64 bits we have enough space for a private prefix, six
intermediates, and the final char, which is way more than we should ever
need (the _DEC STD 070_ specification recommends supporting at least
three intermediates, but in practice we're unlikely to see more than
two).

With this new way of identifying controls, it should now be possible for
every action code to be unique (for the most part). So I've also used
this PR to clean up the action codes a bit, splitting the codes for the
escape sequences from the control sequences, and sorting them into
alphabetical order (which also does a reasonable job of clustering
associated controls).

## Validation Steps Performed

I think the existing unit tests should be good enough to confirm that
all sequences are still being dispatched correctly. However, I've also
manually tested a number of sequences to make sure they were still
working as expected, in particular those that used intermediates, since
they were the most affected by the dispatch code refactoring.

Since these changes also affected the input state machine, I've done
some manual testing of the conpty keyboard handling (both with and
without the new Win32 input mode enabled) to make sure the keyboard VT
sequences were processed correctly. I've also manually tested the
various VT mouse modes in Vttest to confirm that they were still working
correctly too.

Closes #7276
2020-08-18 18:57:52 +00:00
Chester Liu acac35023d
Add initial support for VT DCS sequences (#6328)
As the title suggests, this commit adds initial support for the VT DCS
sequences. The parameters are parsed but not yet used. The pass through
data is yet to be handled. This effectively fixes #120 by making Sixel
graphics sequences *ignored* instead of printed.

* https://vt100.net/docs/vt510-rm/chapter4.html
* https://vt100.net/emu/dec_ansi_parser

Tests added.

References #448
Closes #120
2020-08-17 10:30:07 -07:00
James Holderness 96a77cb74b
Improve support for VT character sets (#4496)
This PR improves our VT character set support, enabling the [`SCS`]
escape sequences to designate into all four G-sets with both 94- and
96-character sets, and supports invoking those G-sets into both the GL
and GR areas of the code table, with [locking shifts] and [single
shifts]. It also adds [`DOCS`] sequences to switch between UTF-8 and the
ISO-2022 coding system (which is what the VT character sets require),
and adds support for a lot more characters sets, up to around the level
of a VT510.

[`SCS`]: https://vt100.net/docs/vt510-rm/SCS.html
[locking shifts]: https://vt100.net/docs/vt510-rm/LS.html
[single shifts]: https://vt100.net/docs/vt510-rm/SS.html
[`DOCS`]: https://en.wikipedia.org/wiki/ISO/IEC_2022#Interaction_with_other_coding_systems

## Detailed Description of the Pull Request / Additional comments

To make it easier for us to declare a bunch of character sets, I've made
a little `constexpr` class that can build up a mapping table from a base
character set (ASCII or Latin1), along with a collection of mappings for
the characters the deviate from the base set. Many of the character sets
are simple variations of ASCII, so they're easy to define this way.

This class then casts directly to a `wstring_view` which is how the
translation tables are represented in most of the code. We have an array
of four of these tables representing the four G-sets, two instances for
the active left and right tables, and one instance for the single shift
table.

Initially we had just one `DesignateCharset` method, which could select
the active character set. We now have two designate methods (for 94- and
96- character sets), and each takes a G-set number specifying the target
of the designation, and a pair of characters identifying the character
set that will be designated (at the higher VT levels, character sets are
often identified by more than one character).

There are then two new `LockingShift` methods to invoke these G-sets
into either the GL or GR area of the code table, and a `SingleShift`
method which invokes a G-set temporarily (for just the next character
that is output).

I should mention here that I had to make some changes to the state
machine to make these single shift sequences work. The problem is that
the input state machine treats `SS3` as the start of a control sequence,
while the output state machine needs it to be dispatched immediately
(it's literally the _Single Shift 3_ escape sequence). To make that
work, I've added a `ParseControlSequenceAfterSs3` callback in the
`IStateMachineEngine` interface to decide which behavior is appropriate.

When it comes to mapping a character, it's simply an array reference
into the appropriate `wstring_view` table. If the single shift table is
set, that takes preference. Otherwise the GL table is used for
characters in the range 0x20 to 0x7F, and the GR table for characters
0xA0 to 0xFF (technically some character sets will only map up to 0x7E
and 0xFE, but that's easily controlled by the length of the
`wstring_view`).

The `DEL` character is a bit of a special case. By default it's meant to
be ignored like the `NUL` character (it's essentially a time-fill
character). However, it's possible that it could be remapped to a
printable character in a 96-character set, so we need to check for that
after the translation. This is handled in the `AdaptDispatch::Print`
method, so it doesn't interfere with the primary `PrintString` code
path.

The biggest problem with this whole process, though, is that the GR
mappings only really make sense if you have access to the raw output,
but by the time the output gets to us, it would already have been
translated to Unicode by the active code page. And in the case of UTF-8,
the characters we eventually receive may originally have been composed
from two or more code points.

The way I've dealt with this was to disable the GR translations by
default, and then added support for a pair of ISO-2022 `DOCS` sequences,
which can switch the code page between UTF-8 and ISO-8859-1. When the
code page is ISO-8859-1, we're essentially receiving the raw output
bytes, so it's safe to enable the GR translations. This is not strictly
correct ISO-2022 behavior, and there are edge cases where it's not going
to work, but it's the best solution I could come up with.

## Validation Steps Performed

As a result of the `SS3` changes in the state machine engine, I've had
to move the existing `SS3` tests from the `OutputEngineTest` to the
`InputEngineTest`, otherwise they would now fail (technically they
should never have been output tests).

I've added no additional unit tests, but I have done a lot of manual
testing, and made sure we passed all the character set tests in Vttest
(at least for the character sets we currently support). Note that this
required a slightly hacked version of the app, since by default it
doesn't expose a lot of the test to low-level terminals, and we
currently identify as a VT100.

Closes #3377
Closes #3487
2020-06-04 19:40:15 +00:00
James Holderness d92c8293ce
Add support for VT52 emulation (#4789)
## Summary of the Pull Request

This PR adds support for the core VT52 commands, and implements the `DECANM` private mode sequence, which switches the terminal between ANSI mode and VT52-compatible mode.

## References

PR #2017 defined the initial specification for VT52 support.
PR #4044 removed the original VT52 cursor ops that conflicted with VT100 sequences.

## PR Checklist
* [x] Closes #976
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [x] Tests added/passed
* [ ] Requires documentation to be updated
* [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #2017

## Detailed Description of the Pull Request / Additional comments

Most of the work involves updates to the parsing state machine, which behaves differently in VT52 mode. `CSI`, `OSC`, and `SS3` sequences are not applicable, and there is one special-case escape sequence (_Direct Cursor Address_), which requires an additional state to handle parameters that come _after_ the final character.

Once the parsing is handled though, it's mostly just a matter of dispatching the commands to existing methods in the `ITermDispatch` interface. Only one new method was required in the interface to handle the _Identify_ command.

The only real new functionality is in the `TerminalInput` class, which needs to generate different escape sequences for certain keys in VT52 mode. This does not yet support _all_ of the VT52 key sequences, because the VT100 support is itself not yet complete. But the basics are in place, and I think the rest is best left for a follow-up issue, and potentially a refactor of the `TerminalInput` class.

I should point out that the original spec called for a new _Graphic Mode_ character set, but I've since discovered that the VT terminals that _emulate_ VT52 just use the existing VT100 _Special Graphics_ set, so that is really what we should be doing too. We can always consider adding the VT52 graphic set as a option later, if there is demand for strict VT52 compatibility. 

## Validation Steps Performed

I've added state machine and adapter tests to confirm that the `DECANM` mode changing sequences are correctly dispatched and forwarded to the `ConGetSet` handler. I've also added state machine tests that confirm the VT52 escape sequences are dispatched correctly when the ANSI mode is reset.

For fuzzing support, I've extended the VT command fuzzer to generate the different kinds of VT52 sequences, as well as mode change sequences to switch between the ANSI and VT52 modes.

In terms of manual testing, I've confirmed that the _Test of VT52 mode_ in Vttest now works as expected.
2020-06-01 21:20:40 +00:00
James Holderness 8585bc6bde
Clamp parameter values to a maximum of 32767. (#5200)
## Summary of the Pull Request

This PR clamps the parameter values in the VT `StateMachine` parser to 32767, which was the initial limit prior to PR #3956. This fixes a number of overflow bugs (some of which could cause the app to crash), since much of the code is not prepared to handle values outside the range of a `short`.

## References

#3956 - the PR where the cap was changed to the range of `size_t`
#4254 - one example of a crash caused by the higher range

## PR Checklist
* [x] Closes #5160
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [x] Tests added/passed
* [ ] Requires documentation to be updated
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

## Detailed Description of the Pull Request / Additional comments

The DEC STD 070 reference recommends supporting up to at least 16384 for parameter values, so 32767 should be more than enough for any standard VT sequence. It might be nice to increase the limit to 65535 at some point, since that is the cap used by both XTerm and VTE. However, that is not essential, since there are very few situations where you'd even notice the difference. For now, 32767 is the safest choice for us, since anything greater than that has the potential to overflow and crash the app in a number of places.

## Validation Steps Performed

I had to make a couple of modifications to the range checks in the `OutputEngineTest`, more or less reverting to the pre-#3956 behavior, but after that all of the unit tests passed as expected.

I manually confirmed that this fixes the hanging test case from #5160, as well as overflow issues in the cursor operations, and crashes in `IL` and `DL` (see https://github.com/microsoft/terminal/issues/4254#issuecomment-575292926).
2020-04-01 12:49:57 +00:00
Mike Griese 7b9c8c7055
Fix C-M-z, C-M-x in Conpty (#4940)
## Summary of the Pull Request

This PR ensures that Conpty properly treats `^[^Z` and `^[^X` as
<kbd>Ctrl+Alt+z</kbd> and <kbd>Ctrl+Alt+x</kbd>, instead of <kbd>Ctrl+z</kbd>
and <kbd>Ctrl+x</kbd>.

## References

## PR Checklist
* [x] Closes #4201
* [x] I work here
* [x] Tests added/passed
* [n/a] Requires documentation to be updated

## Detailed Description of the Pull Request / Additional comments

`^Z` and `^X` are special control characters, SUB and CAN. For the output state
machine, these characters are supposed to be executed from _any_ state. However,
we shouldn't do this for the input engine. With the current behavior, these
characters are immediately executed regardless of what state we're in. That
means we end up synthesizing <kbd>Ctrl+z/x</kbd> for these characters. However,
for the InputStateMachine engine, when these characters are preceeded by `^[`
(ESC), we want to treat them as <kbd>Ctrl+Alt+z/x</kbd>.

This just adds a check in `StateMachine` to see if we should immediately execute
these characters from any state, similar to many of the other exceptions we
already perform in the StateMachine for the input engine.

## Validation Steps Performed
* ran tests
* checked `showkey -a` in gnome-terminal
* checked `showkey -a` in conhost
* checked `showkey -a` in vt pipeterm (conhost as a conpty terminal)
* checked `showkey -a` in Windows Terminal
2020-03-16 16:59:48 +00:00
Dustin L. Howett (MSFT) 64ac0d25e0
vt: make sure to Flush the entire parsed sequence (#4870)
When we had to flush unknown sequences to the terminal, we were only
taking the _most recent run_ with us; therefore, if we received `\e[?12`
and `34h` in separate packets we would _only_ send out `34h`.

This change fixes that issue by ensuring that we cache partial bits of
sequences we haven't yet completed, just in case we need to flush them.

Fixes #3080.
Fixes #3081.
2020-03-11 16:44:52 -07:00
Josh Soref a13ccfd0f5
Fix a bunch of spelling errors across the project (#4295)
Generated by https://github.com/jsoref/spelling `f`; to maintain your repo, please consider `fchurn`

I generally try to ignore upstream bits. I've accidentally included some items from the `deps/` directory. I expect someone will give me a list of items to drop, I'm happy to drop whole files/directories, or to split the PR into multiple items (E.g. comments/locals/public).

Closes #4294
2020-02-10 20:40:01 +00:00
Michael Niksa 4129ceb904 stab in the dark to fix x86 tests. (#4202)
## Summary of the Pull Request
Perform checking on `std::basic_string_view<T>.substr()` calls to
prevent running out of bounds and sporadic Privileged Instruction throws
during x86 tests.

## PR Checklist
* [x] Closes the x86 tests failing all over the place since #4125 for no
  apparent reason
* [x] I work here
* [x] Tests pass 

## Detailed Description of the Pull Request / Additional comments
It appears that not all `std::basic_string_view<T>.substr()` calls are
created equally. I rooted around for other versions of the code in our
source tree and found several versions that were less careful about
checking the start position and the size than the one that appears when
building locally on dev machines. 

My theory is that one of these older versions is deployed somewhere in
the CI. Instead of clamping down the size parameter appropriately or
throwing correctly when the position is out of bounds, I believe that
it's just creating a substring with a bad range over an
invalid/uninitialized memory region. Then when the test operates on
that, sometimes it turns out to trigger the privileged instruction
NTSTATUS error we are seeing in CI.

## Test Procedure
1. Fixed the thing
2. Ran the CI and it worked
3. Reverted everything and turned off all of the CI build except just
   the parser tests (and supporting libraries)
4. Ran CI and it failed
5. Put the fix back on top (cherry-pick)
6. It worked.
7. Ran it again.
8. It worked.
9. Turn all the rest of the CI build back on
2020-01-14 00:46:07 +00:00
Michael Niksa 913c5ec744 Correct passthrough sequences after refactor (#4125)
## Summary of the Pull Request
When refactoring the `StateMachine::ProcessString` algorithm to use safer structures, I made an off-by-one error when attempting to simplify the loop. 

## References
- Introduced in #3956 

## PR Checklist
* [x] Closes #4116
* [x] I work here.
* [x] Tests added/passed
* [x] No documentation
* [x] I'm a core contributor.

## Detailed Description of the Pull Request / Additional comments
The algorithm in use exploited holding onto some pointers and sizes as it rotated around the loop to call back as member variables in the pass-through function `FlushToTerminal`. 

As a part of the refactor, I adjusted to persisting a `std::wstring_view` of the currently processing string instead of pointer/size. I also attempted to simplify the loop at the same time as both the individual and group branches were performing some redundant operations in respect to updating the "run" length. 

Turns out, I made a mistake here. I wrote it so it worked correctly for the bottom half where we transition from bulk printing to an escape but then I messed up the top case.

## Validation Steps Performed
- [x] Manual validation of the exact command given in the bug report.
- [x] Wrote automated tests to validate both paths through the `ProcessString` loop that work with the `_run` variable.
2020-01-08 19:16:49 +00:00
Michael Niksa d711d731d7
Apply audit mode to TerminalConnection/Core/Settings and WinCon… (#4016)
## Summary of the Pull Request
- Enables auditing of some Terminal libraries (Connection, Core, Settings)
- Also audit WinConPTY.LIB since Connection depends on it

## PR Checklist
* [x] Rolls audit out to more things
* [x] I work here
* [x] Tests should still pass
* [x] Am core contributor

## Detailed Description of the Pull Request / Additional comments
This is turning on the auditing of these projects (as enabled by the heavier lifting in the other refactor) and then cleaning up the remaining warnings.

## Validation Steps Performed
- [x] Built it
- [x] Ran the tests
2020-01-03 10:44:27 -08:00
Michael Niksa 322989d017 Apply audit mode to TerminalInput/Adapter/Parser libraries (#4005)
<!-- Enter a brief description/summary of your PR here. What does it fix/what does it change/how was it tested (even manually, if necessary)? -->
## Summary of the Pull Request
- Enables auditing of Virtual Terminal libraries (input, adapter, parser)

<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [x] Rolls audit out to more things
* [x] I work here
* [x] Tests should still pass
* [x] Am core contributor
* [x] Closes #3957

<!-- Provide a more detailed description of the PR, other things fixed or any additional comments/features here -->
## Detailed Description of the Pull Request / Additional comments
This is turning on the auditing of these projects (as enabled by the heavier lifting in the other refactor) and then cleaning up the remaining warnings.

<!-- Describe how you validated the behavior. Add automated tests wherever possible, but list manual validation steps taken as well -->
## Validation Steps Performed
- [x] Built it
- [x] Ran the tests
2020-01-03 14:25:21 +00:00
Michael Niksa 6f667f48ae
Make the terminal parser/adapter and related classes use modern… (#3956)
## Summary of the Pull Request
Refactors parsing/adapting libraries and consumers to use safer and/or more consistent mechanisms for passing information.

## PR Checklist
* [x] I work here
* [x] Tests still pass
* [x] Am a core contributor.

## Detailed Description of the Pull Request / Additional comments
This is in support of hopefully turning audit mode on to more projects. If I turned it on, it would immediately complain about certain classes of issues like pointer and size, pointer math, etc. The changes in this refactoring will eliminate those off the top.

Additionally, this has caught a bunch of comments all over the VT classes that weren't updated to match the parameters lists.

Additionally, this has caught a handful of member variables on classes that were completely unused (and now gone).

Additionally, I'm killing almost all hungarian and shortening variable names. I'm only really leaving 'p' for pointers.

Additionally, this is vaguely in support of a future where we can have "infinite scrollback" in that I'm moving things to size_t across the board. I know it's a bit of a memory cost, but all the casting and moving between types is error prone and unfun to save a couple bytes.

## Validation Steps Performed
- [x] build it
- [x] run all the tests
- [x] everyone looked real hard at it
2019-12-19 14:12:53 -08:00
Dustin L. Howett (MSFT) dd2fbef39d
Move the VT parser's parse state into instanced storage (#3110)
The VT parser used to be keeping a boolean used to determine whether it
was in bulk or single-character parse mode in a function-level static.
That turned out to not be great.

Fixes #3108; fixes #3073.
2019-10-09 11:01:15 -07:00
Mike Griese 53c81a08f8
Return to ground when we flush the last char (#2823)
## Summary of the Pull Request
The InputStateMachineEngine was incorrectly not returning to the ground state after flushing the last sequence. That means that something like alt+backspace would leave us in the Escape state, not the ground state. This makes sure we return to ground.

Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was originally used for a theoretical time when we only open-sourced the parser. It's unnecessary now, and we can get rid of it.

Also includes a small patch to bcz.cmd, to make sure bx works with projects with a space in their name.

<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [x] Closes #2746
* [x] I work here
* [x] Tests added/passed
* [n/a] Requires documentation to be updated


<hr>

* Return to ground when we flush the last char

  The InputStateMachineEngine was incorrectly not returning to the ground state
  after flushing the last sequence. That means that something like alt+backspace
  would leave us in the Escape state, not the ground state. This makes sure we
  return to ground.

  Fixes #2746.

  Additionally removes the "Parser.UnitTests-common.vcxproj" file, which was
  originally used for a theoretical time when we only open-sourced the parser.
  It's unnecessary now, and we can get rid of it.

  Also includes a small patch to bcz.cmd, to make sure bx works with projects
  with a space in their name.

* Update src/terminal/parser/stateMachine.cpp

Co-Authored-By: Dustin L. Howett (MSFT) <duhowett@microsoft.com>

* add the comment @miniksa wanted
2019-10-04 10:47:39 -05:00
Mike Griese 94bcbb9204
The InputStateMachine should dispatch Intermediate characters (#1267)
The OutputStateMachine needs to collect "Intermediate" characters to be able to call [`Designate G0 Character Set`](https://invisible-island.net/xterm/ctlseqs/ctlseqs.html#h2-Controls-beginning-with-ESC) (as well as other sequences we don't yet support).

However, the InputStateMachine used by conpty to process input should _not_ collect these characters. The input engine uses `\x1b` as an indicator that a key was pressed with `Alt`. For keys like `/`, we want to dispatch the key immediately, instead of collecting it and leaving us in the Escape state.
2019-06-14 14:48:12 -05:00
adiviness 9b92986b49
add clang-format conf to the project, format the c++ code (#1141) 2019-06-11 13:27:09 -07:00
James Holderness 19dbec8c33 Support any number of leading zeros in VT parameter values (#1191)
* Support any number of leading zeros in VT parameter values.

* Add unit tests for leading zeros in VT parameter values.
2019-06-10 11:59:30 -05:00
Dustin Howett d4d59fa339 Initial release of the Windows Terminal source code
This commit introduces all of the Windows Terminal and Console Host source,
under the MIT license.
2019-05-02 15:29:04 -07:00