12 commits

Author SHA1 Message Date
Dustin L. Howett 1df3182865
Fully regenerate CodepointWidthDetector from Unicode 13.0 (#8035)
This commit also adds an override UCD and migrates all of the overrides
from GetQuickCharWidth into it.

GetQuickCharWidth
-----------------

The removal of overrides from GQCW reduces the number of comparisons
required for looking up a single character's width from 41 (32
individual ranged comparisons from GQCW + 8+1 from the binary search in
CPWD) to 11 (2 from GQCW, 8+1 from CPWD).
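
To illustrate the kind of lookup those counts describe, here is a minimal
sketch of a binary search over a sorted range table. `WidthRange`, `kRanges`
and `LookupWidth` are hypothetical names, and the four ranges are
illustrative rather than the generated CPWD table. With a table of a few
hundred ranges, a search like this takes roughly eight halving steps plus one
final containment check, which may be where the 8+1 figure comes from.

```cpp
#include <algorithm>
#include <array>

enum class Width { Narrow, Wide, Ambiguous };

// Hypothetical table entry: an inclusive codepoint range and its width.
struct WidthRange {
    char32_t first;
    char32_t last;
    Width width;
};

// Illustrative, sorted, non-overlapping ranges (NOT the generated CPWD table).
constexpr std::array<WidthRange, 4> kRanges{{
    { 0x1100, 0x115F, Width::Wide }, // Hangul Jamo initial consonants
    { 0x2E80, 0x2E99, Width::Wide }, // CJK Radicals Supplement (first part)
    { 0x3041, 0x3096, Width::Wide }, // Hiragana
    { 0xAC00, 0xD7A3, Width::Wide }, // Hangul Syllables
}};

// Looks up a codepoint's width; anything not listed is treated as Narrow.
Width LookupWidth(char32_t cp)
{
    // Binary search for the first range whose upper bound is not below cp.
    const auto it = std::lower_bound(kRanges.begin(), kRanges.end(), cp,
        [](const WidthRange& r, char32_t value) { return r.last < value; });
    // One final comparison confirms cp actually falls inside that range.
    if (it != kRanges.end() && it->first <= cp) {
        return it->width;
    }
    return Width::Narrow;
}
```

In this sketch a reserved codepoint such as U+2E9A falls between two listed
ranges and therefore comes back as `Narrow`.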

GQCW also incorrectly marked 67 reserved codepoints as `Wide` when they
should have been `Narrow`.

The codepoints whose definitions have changed from `Wide` to `Narrow` are:

```
2E9A 2EF4 2EF5 2EF6 2EF7 2EF8 2EF9 2EFA 2EFB 2EFC 2EFD 2EFE 2EFF 2FD6
2FD7 2FD8 2FD9 2FDA 2FDB 2FDC 2FDD 2FDE 2FDF 2FE0 2FE1 2FE2 2FE3 2FE4
2FE5 2FE6 2FE7 2FE8 2FE9 2FEA 2FEB 2FEC 2FED 2FEE 2FEF 2FFC 2FFD 2FFE
2FFF 31E4 31E5 31E6 31E7 31E8 31E9 31EA 31EB 31EC 31ED 31EE 31EF 321F
A48D A48E A48F FE1A FE1B FE1C FE1D FE1E FE1F FE53 FE67
```

All of them are reserved, but those reserved regions are marked as narrow
in the UCD.

This change also offers us the chance to document exactly why we're
overriding a specific character range. Comments from the override
document will be copied to the generated CPWD table.

New in Unicode 13.0
------------------

Some widths have changed due to previously-reserved characters becoming
_used_ such as U+32FF SQUARE ERA NAME REIWA, the Tangut components
756-768, the entire Khitan Small Script character set, and the Tangut
Ideographs.

A number of the changes in this diff are due to better/worse comment
tracking and the removal of the Emoji/EPres comments. The script once
mistakenly applied comments to packed regions (it has since been updated
not to do so).

Validation
----------

I built a test application that compared codepoints 0-FFFF for GQCW
against their new registered widths.
2020-10-27 17:36:28 +00:00
Dustin L. Howett 4aecbf3833
Clear the last error before calling Mb2Wc in ConvertToW (#7391)
When the console functional tests are running on OneCoreUAP, the
newly-introduced (65bd4e327, #4309) FillOutputCharacterA tests will
actually fail because of radio interference on the return value of GLE.
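
The underlying hazard is an error slot that is only written on failure, so a
stale value from an earlier call can be misread as a new failure. The sketch
below shows the same pattern with the portable `errno`/`strtol` pair instead
of the Win32 last-error value. `ParseLong` and its `clearFirst` switch are
hypothetical and exist only to show both behaviours side by side; this is an
analogy, not the code from this change.

```cpp
#include <cerrno>
#include <cstdlib>

// Parses a decimal long into `out`. strtol only writes errno on failure, so a
// stale value left by an earlier call must be cleared before we inspect it.
bool ParseLong(const char* text, long& out, bool clearFirst)
{
    if (clearFirst) {
        errno = 0; // analogous to clearing the last error before converting
    }
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (end == text || errno == ERANGE) {
        return false; // no digits parsed, or (apparently) out of range
    }
    out = value;
    return true;
}
```

If `errno` already holds `ERANGE` from unrelated work, the call without the
reset reports a failure for perfectly valid input, while the call with the
reset succeeds.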

Fixes MSFT-28163465
2020-08-25 17:17:21 +00:00
Leon Liang 7ae34336da
Make most emojis full-width (#5795)
The table that we refer to in `CodepointWidthDetector.cpp` to determine
whether or not a codepoint should be rendered as Wide vs Narrow was
based off EastAsianWidth[1]. If a codepoint isn't included in this
table, it's considered Narrow. Many emojis aren't specified in the
EAW list, so this PR supplements our table with emoji codepoints from
emoji-data[2] in order to render most, if not all, emojis as full-width. 

There are certain codepoints I've added to the comments (in case we want
to add them officially to the table in the future) that Microsoft
decided to give an emoji presentation even if it's specified as
Narrow/Ambiguous in the EAW list and are _not_ specified in the Unicode
emoji list. These include all of the Mahjong Tiles block, different
direction pencils (✎✐), different pointing index fingers (☜, ☞) among
others. I have no idea if I've captured all of them, as I don't know of
an easy way to detect which are Microsoft specific emojis.

## Validation Steps Performed
I have looked at so many emojis that I dream emoji.

These screenshots aren't encompassing _all_ emoji but I've tried to grab
a couple from all across the codepoint ranges:

Before:
![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png)

After:
![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png)

[1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
[2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

Closes #900
2020-05-08 22:31:09 +00:00
Michael Kitzan 65bd4e327c
Fix FillConsoleOutputCharacterA crash (#4309)
## Summary of the Pull Request
Despite being specified as `noexcept`, `FillConsoleOutputCharacterA` throws an exception when `ConvertToW` is called with a character argument that can't be converted. This PR fixes the throw by wrapping `ConvertToW` in a try-catch_return.

## PR Checklist
* [x] Closes #4258
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [x] Tests added/passed: thanks @miniksa 

## Detailed Description of the Pull Request / Additional comments
Following the semantics of the other `FillConsoleOutputCharacter*` functions, the output parameter `cellsModified` is set to `0`. The try-catch_return is also what other functions in this family use in case of errors.
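
As a rough illustration of the pattern, here is a minimal sketch using plain `try`/`catch` rather than the project's catch-and-return macro. `ConvertOrThrow` and `FillWithChar` are hypothetical stand-ins, not the console's actual functions, and the `-1` error code is an arbitrary choice for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a conversion helper that throws on bad input.
std::wstring ConvertOrThrow(char ch)
{
    if (static_cast<unsigned char>(ch) > 0x7F) {
        throw std::invalid_argument("character cannot be converted");
    }
    return std::wstring(1, static_cast<wchar_t>(ch));
}

// Returns 0 on success or a nonzero error code; no exception may escape,
// because an exception leaving a noexcept function terminates the process.
int FillWithChar(char ch, std::size_t length, std::wstring& buffer,
                 std::size_t& cellsModified) noexcept
{
    cellsModified = 0; // callers see 0 if anything below fails
    try {
        const std::wstring wide = ConvertOrThrow(ch);
        buffer.assign(length, wide.front());
        cellsModified = length;
        return 0;
    } catch (...) {
        return -1; // translate the exception into an error code
    }
}
```

Setting `cellsModified` before the `try` block means every failure path reports zero modified cells without extra bookkeeping in the handler.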

## Validation Steps Performed
Original repro no longer crashes.
2020-02-10 14:09:08 -08:00
Josh Soref a13ccfd0f5
Fix a bunch of spelling errors across the project (#4295)
Generated by https://github.com/jsoref/spelling `f`; to maintain your repo, please consider `fchurn`

I generally try to ignore upstream bits. I've accidentally included some items from the `deps/` directory. I expect someone will give me a list of items to drop; I'm happy to drop whole files/directories, or to split the PR into multiple items (e.g. comments/locals/public).

Closes #4294
2020-02-10 20:40:01 +00:00
Dustin L. Howett (MSFT) 9e5792ba51 Always use a VK in MapVirtualKeyW(..., MAPVK_VK_TO_VSC) (#3199)
Fixes #2873.
2019-10-17 14:58:41 -07:00
Michael Niksa 18bacfe973 A few PR comments. A constexpr here, a misleading comment there, and an extraneous local. 2019-09-09 16:01:28 -07:00
Michael Niksa 9678dd894c C26414, don't use smart pointers for locals 2019-09-03 11:27:43 -07:00
Michael Niksa 65dec36cb1 C26446, Use .at instead of array indices 2019-08-29 11:05:32 -07:00
adiviness 9b92986b49
add clang-format conf to the project, format the c++ code (#1141) 2019-06-11 13:27:09 -07:00
Michael Niksa 87e85603b9 Merged PR 3215853: Fix spacing/layout for block characters and many retroactively-recategorized emoji (and more!)
This encompasses a handful of problems with column counting.

The Terminal project didn't set a fallback column counter. Oops. I've fixed this to use the `DxEngine` as the fallback.

The `DxEngine` didn't implement its fallback method. Oops. I've fixed this to use the `CustomTextLayout` to figure out the advances based on the same font and fallback pattern as the real final layout, just without "rounding" it into cells yet.
- `CustomTextLayout` has been updated to move the advance-correction into a separate phase from glyph shaping. Previously, we corrected the advances to nice round cell counts during shaping, which is fine for drawing, but hard for column count analysis.
- Now that there are separate phases, an `Analyze` method was added to the `CustomTextLayout` which just performs the text analysis steps and the glyph shaping, but no advance correction to column boundaries nor actual drawing.

I've taken the caching code that I was working on to improve chafa, and I've brought it into this. Now that we're doing a lot of fallback and heavy lifting in terms of analysis via the layout, we should cache the results until the font changes.

I've adjusted how column counting is done overall. It's always been in these phases:
1. We used a quick-lookup of ranges of characters we knew to rapidly decide `Narrow`, `Wide` or `Invalid` (a.k.a. "I dunno")
2. If it was `Invalid`, we consulted a table based off of the Unicode standard that has either `Narrow`, `Wide`, or `Ambiguous` as a result.
3. If it's still `Ambiguous`, we consult a render engine fallback (usually GDI or now DX) to see how many columns it would take.
4. If we still don't know, then it's `Wide` to be safe.
- I've added an additional flow here. The quick-lookup can now return `Ambiguous` off the bat for some glyph characters in the x2000-x3000 range that used to just be simple shapes but have been retroactively recategorized as emoji and are frequently now using full width color glyphs.
- This new state sends the lookup straight to the render engine, if one is available, instead of consulting the Unicode standard table first. The half/fullwidth table doesn't appear to have been updated to reclassify these characters as ambiguous, but we'd like to keep that table as a "generated from the spec" sort of table and keep our exceptions in the "quick lookup" function.
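
The phases above, including the new early `Ambiguous` result, can be sketched roughly as follows. The `Lookup` and `Fallback` callables and the `IsWide` function are hypothetical stand-ins for the quick lookup, the Unicode-derived table and the render engine; this is an outline of the flow under those assumptions, not the actual implementation.

```cpp
#include <functional>

enum class Width { Narrow, Wide, Ambiguous, Invalid };

// Hypothetical stand-ins for the sources of truth described above.
using Lookup = std::function<Width(char32_t)>;
using Fallback = std::function<bool(char32_t)>; // true if the glyph is wide

bool IsWide(char32_t cp, const Lookup& quick, const Lookup& table,
            const Fallback& renderer)
{
    Width w = quick(cp); // phase 1: fast range checks
    if (w == Width::Invalid) {
        w = table(cp); // phase 2: table generated from the Unicode data
    }
    if (w == Width::Narrow) {
        return false;
    }
    if (w == Width::Wide) {
        return true;
    }
    // Ambiguous from either source: a quick-lookup Ambiguous never
    // consults the table and lands here directly.
    if (renderer) {
        return renderer(cp); // phase 3: ask the render engine to measure
    }
    return true; // phase 4: assume Wide to be safe
}
```

Passing an empty `Fallback` models a host with no render engine attached, in which case ambiguous characters are treated as wide.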

I have confirmed the following things "just work" now:
- The windows logo flag from the demo. (💖🌌😊)
- The dotted chart on the side of crossterm demo (•)
- The powerline characters that make arrows with the Consolas patched font (██)
- An accented é
- The warning and checkmark symbols appearing same size as the X. (✔⚠🔥)

Related work items: #21167256, #21237515, #21243859, #21274645, #21296827
2019-05-02 15:29:10 -07:00
Dustin Howett d4d59fa339 Initial release of the Windows Terminal source code
This commit introduces all of the Windows Terminal and Console Host source,
under the MIT license.
2019-05-02 15:29:04 -07:00