2020-05-28 17:42:13 +02:00
|
|
|
https://(?:(?:[-a-zA-Z0-9?&=]*\.|)microsoft\.com)/[-a-zA-Z0-9?&=_#\/.]*
|
2020-04-14 22:17:15 +02:00
|
|
|
https://aka\.ms/[-a-zA-Z0-9?&=\/_]*
|
Improve support for VT character sets (#4496)
This PR improves our VT character set support, enabling the [`SCS`]
escape sequences to designate into all four G-sets with both 94- and
96-character sets, and supports invoking those G-sets into both the GL
and GR areas of the code table, with [locking shifts] and [single
shifts]. It also adds [`DOCS`] sequences to switch between UTF-8 and the
ISO-2022 coding system (which is what the VT character sets require),
and adds support for a lot more characters sets, up to around the level
of a VT510.
[`SCS`]: https://vt100.net/docs/vt510-rm/SCS.html
[locking shifts]: https://vt100.net/docs/vt510-rm/LS.html
[single shifts]: https://vt100.net/docs/vt510-rm/SS.html
[`DOCS`]: https://en.wikipedia.org/wiki/ISO/IEC_2022#Interaction_with_other_coding_systems
## Detailed Description of the Pull Request / Additional comments
To make it easier for us to declare a bunch of character sets, I've made
a little `constexpr` class that can build up a mapping table from a base
character set (ASCII or Latin1), along with a collection of mappings for
the characters the deviate from the base set. Many of the character sets
are simple variations of ASCII, so they're easy to define this way.
This class then casts directly to a `wstring_view` which is how the
translation tables are represented in most of the code. We have an array
of four of these tables representing the four G-sets, two instances for
the active left and right tables, and one instance for the single shift
table.
Initially we had just one `DesignateCharset` method, which could select
the active character set. We now have two designate methods (for 94- and
96- character sets), and each takes a G-set number specifying the target
of the designation, and a pair of characters identifying the character
set that will be designated (at the higher VT levels, character sets are
often identified by more than one character).
There are then two new `LockingShift` methods to invoke these G-sets
into either the GL or GR area of the code table, and a `SingleShift`
method which invokes a G-set temporarily (for just the next character
that is output).
I should mention here that I had to make some changes to the state
machine to make these single shift sequences work. The problem is that
the input state machine treats `SS3` as the start of a control sequence,
while the output state machine needs it to be dispatched immediately
(it's literally the _Single Shift 3_ escape sequence). To make that
work, I've added a `ParseControlSequenceAfterSs3` callback in the
`IStateMachineEngine` interface to decide which behavior is appropriate.
When it comes to mapping a character, it's simply an array reference
into the appropriate `wstring_view` table. If the single shift table is
set, that takes preference. Otherwise the GL table is used for
characters in the range 0x20 to 0x7F, and the GR table for characters
0xA0 to 0xFF (technically some character sets will only map up to 0x7E
and 0xFE, but that's easily controlled by the length of the
`wstring_view`).
The `DEL` character is a bit of a special case. By default it's meant to
be ignored like the `NUL` character (it's essentially a time-fill
character). However, it's possible that it could be remapped to a
printable character in a 96-character set, so we need to check for that
after the translation. This is handled in the `AdaptDispatch::Print`
method, so it doesn't interfere with the primary `PrintString` code
path.
The biggest problem with this whole process, though, is that the GR
mappings only really make sense if you have access to the raw output,
but by the time the output gets to us, it would already have been
translated to Unicode by the active code page. And in the case of UTF-8,
the characters we eventually receive may originally have been composed
from two or more code points.
The way I've dealt with this was to disable the GR translations by
default, and then added support for a pair of ISO-2022 `DOCS` sequences,
which can switch the code page between UTF-8 and ISO-8859-1. When the
code page is ISO-8859-1, we're essentially receiving the raw output
bytes, so it's safe to enable the GR translations. This is not strictly
correct ISO-2022 behavior, and there are edge cases where it's not going
to work, but it's the best solution I could come up with.
## Validation Steps Performed
As a result of the `SS3` changes in the state machine engine, I've had
to move the existing `SS3` tests from the `OutputEngineTest` to the
`InputEngineTest`, otherwise they would now fail (technically they
should never have been output tests).
I've added no additional unit tests, but I have done a lot of manual
testing, and made sure we passed all the character set tests in Vttest
(at least for the character sets we currently support). Note that this
required a slightly hacked version of the app, since by default it
doesn't expose a lot of the test to low-level terminals, and we
currently identify as a VT100.
Closes #3377
Closes #3487
2020-06-04 21:40:15 +02:00
|
|
|
https://www\.itscj\.ipsj\.or\.jp/iso-ir/[-0-9]+\.pdf
|
|
|
|
https://www\.vt100\.net/docs/[-a-zA-Z0-9#_\/.]*
|
2020-05-04 23:47:29 +02:00
|
|
|
https://www.w3.org/[-a-zA-Z0-9?&=\/_#]*
|
ci: run spell check in CI, fix remaining issues (#4799)
This commit introduces a github action to check our spelling and fixes
the following misspelled words so that we come up green.
It also renames TfEditSes to TfEditSession, because Ses is not a word.
currently, excerpt, fallthrough, identified, occurred, propagate,
provided, rendered, resetting, separate, succeeded, successfully,
terminal, transferred, adheres, breaks, combining, preceded,
architecture, populated, previous, setter, visible, window, within,
appxmanifest, hyphen, control, offset, powerpoint, suppress, parsing,
prioritized, aforementioned, check in, build, filling, indices, layout,
mapping, trying, scroll, terabyte, vetoes, viewport, whose
2020-03-25 19:02:53 +01:00
|
|
|
https://(?:(?:www\.|)youtube\.com|youtu.be)/[-a-zA-Z0-9?&=]*
|
2021-11-23 16:21:42 +01:00
|
|
|
https://(?:[a-z-]+\.|)github(?:usercontent|)\.com/[-a-zA-Z0-9?%&=_\/.+]*
|
2021-01-22 06:11:11 +01:00
|
|
|
https://www.xfree86.org/[-a-zA-Z0-9?&=\/_#]*
|
2020-05-28 15:01:52 +02:00
|
|
|
[Pp]ublicKeyToken="?[0-9a-fA-F]{16}"?
|
|
|
|
(?:[{"]|UniqueIdentifier>)[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}(?:[}"]|</UniqueIdentifier)
|
Improve support for VT character sets (#4496)
This PR improves our VT character set support, enabling the [`SCS`]
escape sequences to designate into all four G-sets with both 94- and
96-character sets, and supports invoking those G-sets into both the GL
and GR areas of the code table, with [locking shifts] and [single
shifts]. It also adds [`DOCS`] sequences to switch between UTF-8 and the
ISO-2022 coding system (which is what the VT character sets require),
and adds support for a lot more characters sets, up to around the level
of a VT510.
[`SCS`]: https://vt100.net/docs/vt510-rm/SCS.html
[locking shifts]: https://vt100.net/docs/vt510-rm/LS.html
[single shifts]: https://vt100.net/docs/vt510-rm/SS.html
[`DOCS`]: https://en.wikipedia.org/wiki/ISO/IEC_2022#Interaction_with_other_coding_systems
## Detailed Description of the Pull Request / Additional comments
To make it easier for us to declare a bunch of character sets, I've made
a little `constexpr` class that can build up a mapping table from a base
character set (ASCII or Latin1), along with a collection of mappings for
the characters the deviate from the base set. Many of the character sets
are simple variations of ASCII, so they're easy to define this way.
This class then casts directly to a `wstring_view` which is how the
translation tables are represented in most of the code. We have an array
of four of these tables representing the four G-sets, two instances for
the active left and right tables, and one instance for the single shift
table.
Initially we had just one `DesignateCharset` method, which could select
the active character set. We now have two designate methods (for 94- and
96- character sets), and each takes a G-set number specifying the target
of the designation, and a pair of characters identifying the character
set that will be designated (at the higher VT levels, character sets are
often identified by more than one character).
There are then two new `LockingShift` methods to invoke these G-sets
into either the GL or GR area of the code table, and a `SingleShift`
method which invokes a G-set temporarily (for just the next character
that is output).
I should mention here that I had to make some changes to the state
machine to make these single shift sequences work. The problem is that
the input state machine treats `SS3` as the start of a control sequence,
while the output state machine needs it to be dispatched immediately
(it's literally the _Single Shift 3_ escape sequence). To make that
work, I've added a `ParseControlSequenceAfterSs3` callback in the
`IStateMachineEngine` interface to decide which behavior is appropriate.
When it comes to mapping a character, it's simply an array reference
into the appropriate `wstring_view` table. If the single shift table is
set, that takes preference. Otherwise the GL table is used for
characters in the range 0x20 to 0x7F, and the GR table for characters
0xA0 to 0xFF (technically some character sets will only map up to 0x7E
and 0xFE, but that's easily controlled by the length of the
`wstring_view`).
The `DEL` character is a bit of a special case. By default it's meant to
be ignored like the `NUL` character (it's essentially a time-fill
character). However, it's possible that it could be remapped to a
printable character in a 96-character set, so we need to check for that
after the translation. This is handled in the `AdaptDispatch::Print`
method, so it doesn't interfere with the primary `PrintString` code
path.
The biggest problem with this whole process, though, is that the GR
mappings only really make sense if you have access to the raw output,
but by the time the output gets to us, it would already have been
translated to Unicode by the active code page. And in the case of UTF-8,
the characters we eventually receive may originally have been composed
from two or more code points.
The way I've dealt with this was to disable the GR translations by
default, and then added support for a pair of ISO-2022 `DOCS` sequences,
which can switch the code page between UTF-8 and ISO-8859-1. When the
code page is ISO-8859-1, we're essentially receiving the raw output
bytes, so it's safe to enable the GR translations. This is not strictly
correct ISO-2022 behavior, and there are edge cases where it's not going
to work, but it's the best solution I could come up with.
## Validation Steps Performed
As a result of the `SS3` changes in the state machine engine, I've had
to move the existing `SS3` tests from the `OutputEngineTest` to the
`InputEngineTest`, otherwise they would now fail (technically they
should never have been output tests).
I've added no additional unit tests, but I have done a lot of manual
testing, and made sure we passed all the character set tests in Vttest
(at least for the character sets we currently support). Note that this
required a slightly hacked version of the app, since by default it
doesn't expose a lot of the test to low-level terminals, and we
currently identify as a VT100.
Closes #3377
Closes #3487
2020-06-04 21:40:15 +02:00
|
|
|
(?:0[Xx]|\\x|U\+|#)[a-f0-9A-FGgRr]{2,}[Uu]?[Ll]{0,2}\b
|
2020-05-18 00:13:48 +02:00
|
|
|
microsoft/cascadia-code\@[0-9a-fA-F]{40}
|
ci: run spell check in CI, fix remaining issues (#4799)
This commit introduces a github action to check our spelling and fixes
the following misspelled words so that we come up green.
It also renames TfEditSes to TfEditSession, because Ses is not a word.
currently, excerpt, fallthrough, identified, occurred, propagate,
provided, rendered, resetting, separate, succeeded, successfully,
terminal, transferred, adheres, breaks, combining, preceded,
architecture, populated, previous, setter, visible, window, within,
appxmanifest, hyphen, control, offset, powerpoint, suppress, parsing,
prioritized, aforementioned, check in, build, filling, indices, layout,
mapping, trying, scroll, terabyte, vetoes, viewport, whose
2020-03-25 19:02:53 +01:00
|
|
|
\d+x\d+Logo
|
|
|
|
Scro\&ll
|
|
|
|
# selectionInput.cpp
|
|
|
|
:\\windows\\syste\b
|
Fix copying wrapped lines by implementing better scrolling (#5181)
Now that the Terminal is doing a better job of actually marking which
lines were and were not wrapped, we're not always copying lines as
"wrapped" when they should be. We're more correctly marking lines as not
wrapped, when previously we'd leave them marked wrapped.
The real problem is here in the `ScrollFrame` method - we'd manually
newline the cursor to make the terminal's viewport shift down to a new
line. If we had to scroll the viewport for a _wrapped_ line, this would
cause the Terminal to mark that line as broken, because conpty would
emit an extra `\n` that didn't actually exist.
This more correctly implements `ScrollFrame`. Now, well move where we
"thought" the cursor was, so when we get to the next `PaintBufferLine`,
if the cursor needs to newline for the next line, it'll newline, but if
we're in the middle of a wrapped line, we'll just keep printing the
wrapped line.
A couple follow up bugs were found to be caused by the same bad logic.
See #5039 and #5161 for more details on the investigations there.
## References
* #4741 RwR, which probably made this worse
* #5122, which I branched off of
* #1245, #357 - a pair of other conpty wrapped lines bugs
* #5228 - A followup issue for this PR
## PR Checklist
* [x] Closes #5113
* [x] Closes #5180 (by fixing DECRST 25)
* [x] Closes #5039
* [x] Closes #5161 (by ensuring we only `removeSpaces` on the actual
bottom line)
* [x] I work here
* [x] Tests added/passed
* [n/a] Requires documentation to be updated
## Validation Steps Performed
* Checked the cases from #1245, #357 to validate that they still work
* Added more and more tests for these scenarios, and then I added MORE
tests
* The entire team played with this in selfhost builds
2020-04-09 02:06:25 +02:00
|
|
|
TestUtils::VerifyExpectedString\(tb, L"[^"]+"
|
2020-06-30 03:55:40 +02:00
|
|
|
(?:hostSm|mach)\.ProcessString\(L"[^"]+"
|
ci: update to Spell check to 0.0.17a (#9014)
### Plurals and paste tenses
In the past, plurals `foo`+`s` and past tenses `foo`+`ed` were
automatically tolerated. This turned out to be a bad design choice on my
part.
The basic example is that `potatos` would sometimes be treated as a
mistake and sometimes not (depending on the presence of `potato`).
You can see in this PR, that this logic resulted in `Applys` being
accepted as a word along with `AppContainered` -- there's nothing
intrinsically wrong w/ the latter, but unfortunately in order to screen
out the former, my shortcut just couldn't stick around. This means that
the `dictionary`/`expect` files will grow perhaps by a tiny bit, but as
you can see, not really by much.
This is also why `thereses` (a user) was accepted as a word in the past
(therese is in the base dictionary, so `therese` + `s` was acceptable).
### Pull requests
When GitHub initially introduced GitHub Actions, the event for
`pull_request` was created without enough permission for a tool like
this to work properly. I worked around that by using the `schedule`
event. In 2020, they introduced a replacement event
`pull_request_target` which has enough permission. This means that I can
stop relying on the `schedule` event.
### Miscellaneous
* I've folded together some `expect/` files since now is as good a time
as any.
* I've included a hint about `excludes.txt` (I added a similar one for
our primary repo recently, and it came up this week in
`microsoft/terminal` -- @zadjii-msft)
* I've standardized on a default of `.github/actions/spelling` to make
the out of the box experience easier for new adopters, so I'm applying
that change here -- if you're attached to the old directory name,
specifying it is still supported. -- note the directory rename may
cause a merge conflict for people with open PRs and changes to the
contents, this shouldn't be a big problem.
2021-02-03 20:17:38 +01:00
|
|
|
\b([A-Za-z])\g{-1}{3,}\b
|
2020-10-15 02:29:10 +02:00
|
|
|
0x[0-9A-Za-z]+
|
2020-06-30 03:55:40 +02:00
|
|
|
Base64::s_(?:En|De)code\(L"[^"]+"
|
|
|
|
VERIFY_ARE_EQUAL\(L"[^"]+"
|
2020-10-16 04:02:59 +02:00
|
|
|
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789\+/"
|
2020-07-17 19:11:45 +02:00
|
|
|
std::memory_order_[\w]+
|
ci: spelling: update to v0.0.18 (#10035)
Co-authored-by: Josh Soref <jsoref@users.noreply.github.com>
<!-- Enter a brief description/summary of your PR here. What does it fix/what does it change/how was it tested (even manually, if necessary)? -->
## Summary of the Pull Request
Upgrade check-spelling to [v0.0.18](https://github.com/check-spelling/check-spelling/releases/tag/v0.0.18)
<!-- Other than the issue solved, is this relevant to any other issues/existing PRs? -->
## References
<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [ ] Closes #xxx
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx
<!-- Provide a more detailed description of the PR, other things fixed or any additional comments/features here -->
## Detailed Description of the Pull Request / Additional comments
I've replaced the `dictionary` directory with `allow` and `reject`. When terminal got check-spelling, I didn't have a way to do `allow`/`reject` (but they were added a while ago). With this release, the bot will complain about items that are in user managed files that wouldn't be valid, this is mostly `-`s in dictionary files, but it also includes numbers `0`..`9` and `_`. If a specific token needs to be accepted but not its sub-elements, the item should be added to `patterns.txt` instead (`D2DERR_SHADER_COMPILE_FAILED` is an example).
With this version, check-spelling defaults to only considering tokens with at least 3 letters. It's possible to tune it back to 2 (or even 1), but in testing, the 2 character tokens have ended up not being worthwhile. (This can be [adjusted](https://github.com/check-spelling/check-spelling/wiki/Configuration#shortest_word) if it turns out that people manage to misspell two character tokens often enough to justify checking them.)
<!-- Describe how you validated the behavior. Add automated tests wherever possible, but list manual validation steps taken as well -->
## Validation Steps Performed
I ran a number of passes of the spell checker in https://github.com/check-spelling/terminal/actions (note: I tend to delete this repository, so this link may be dead at some point, and action run logs expire).
2021-05-14 15:28:37 +02:00
|
|
|
D2DERR_SHADER_COMPILE_FAILED
|
2021-06-11 01:48:54 +02:00
|
|
|
TIL_FEATURE_[0-9A-Z_]+
|
2021-09-30 00:03:05 +02:00
|
|
|
vcvars\w*
|