maxmustermann/terminal

Author	SHA1	Message	Date
Dustin L. Howett	1df3182865	Fully regenerate CodepointWidthDetector from Unicode 13.0 (#8035)

This commit also adds an override UCD and migrates all of the overrides from GetQuickCharWidth into it.

GetQuickCharWidth
-----------------

The removal of overrides from GQCW reduces the number of comparisons required for looking up a single character's width from 41 (32 individual ranged comparisons from GQCW + 8+1 from the binary search in CPWD) to 11 (2 from GQCW, 8+1 from CPWD).

GQCW also incorrectly marked 67 reserved codepoints as `Wide` when they should have been `Narrow`. The codepoints whose definitions have changed from `Wide` to `Narrow` are:

```
2E9A 2EF4 2EF5 2EF6 2EF7 2EF8 2EF9 2EFA 2EFB 2EFC 2EFD 2EFE 2EFF 2FD6 2FD7 2FD8 2FD9 2FDA 2FDB 2FDC 2FDD 2FDE 2FDF 2FE0 2FE1 2FE2 2FE3 2FE4 2FE5 2FE6 2FE7 2FE8 2FE9 2FEA 2FEB 2FEC 2FED 2FEE 2FEF 2FFC 2FFD 2FFE 2FFF 31E4 31E5 31E6 31E7 31E8 31E9 31EA 31EB 31EC 31ED 31EE 31EF 321F A48D A48E A48F FE1A FE1B FE1C FE1D FE1E FE1F FE53 FE67
```

All of them are reserved, but those reserved regions are marked as narrow in the UCD.

This change also offers us the chance to document exactly why we're overriding a specific character range. Comments from the override document will be copied to the generated CPWD table.

New in Unicode 13.0
-------------------

Some widths have changed due to previously-reserved characters becoming _used_, such as U+32FF SQUARE ERA NAME REIWA, the Tangut components 756-768, the entire Khitan Small Script character set, and the Tangut Ideographs.

A number of the changes in this diff are due to better/worse comment tracking and the removal of the Emoji/EPres comments. The script once mistakenly applied comments to packed regions (it has been updated to not do so).

Validation
----------

I built a test application that compared codepoints 0-FFFF for GQCW against their new registered widths.	2020-10-27 17:36:28 +00:00
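The lookup described in the commit above (a short quick check followed by a binary search over sorted codepoint ranges) can be sketched roughly as follows. This is a minimal illustration, not the generated CPWD table: the names `Range`, `kRanges`, and `LookupWidth` are hypothetical, the ASCII fast path is an assumption about what the two remaining GQCW comparisons do, and the table holds only a handful of ranges chosen for the example. Codepoints that fall in no range come back as `Narrow`, which is how the reserved codepoint 2E9A ends up narrow here.

```cpp
#include <cstddef>
#include <cstdint>

enum class Width : std::uint8_t { Narrow, Wide, Ambiguous };

struct Range {
    std::uint32_t lower;  // first codepoint in the range (inclusive)
    std::uint32_t upper;  // last codepoint in the range (inclusive)
    Width width;
};

// Illustrative subset only, sorted by `lower` with no overlapping ranges.
constexpr Range kRanges[] = {
    {0x1100, 0x115F, Width::Wide},    // Hangul Jamo initial consonants
    {0x2E80, 0x2E99, Width::Wide},    // CJK Radicals Supplement (2E9A is reserved)
    {0x2E9B, 0x2EF3, Width::Wide},    // CJK Radicals Supplement, continued
    {0x3041, 0x3096, Width::Wide},    // Hiragana
    {0x4E00, 0x9FFF, Width::Wide},    // CJK Unified Ideographs
    {0x1F600, 0x1F64F, Width::Wide},  // Emoticons
};

constexpr Width LookupWidth(std::uint32_t cp) {
    // Assumed fast path: printable ASCII is always narrow (two comparisons).
    if (cp >= 0x20 && cp <= 0x7E) {
        return Width::Narrow;
    }
    // Binary search for the range containing `cp`.
    std::size_t lo = 0;
    std::size_t hi = sizeof(kRanges) / sizeof(kRanges[0]);
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (cp < kRanges[mid].lower) {
            hi = mid;
        } else if (cp > kRanges[mid].upper) {
            lo = mid + 1;
        } else {
            return kRanges[mid].width;
        }
    }
    // Not present in the table: treated as narrow.
    return Width::Narrow;
}
```

With a table of a few hundred ranges, the search needs roughly eight or nine probes, which is consistent with the "8+1" figure quoted in the commit message.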
Dustin L. Howett (MSFT)	ba1a298d6b	Partially regenerate codepoint widths from Emoji 13.0 (#5934)

This removes all glyphs from the emoji list that do not default to "emoji presentation" (EPres). It removes all local overrides, but retains the comments about the emoji we left out that are Microsoft-specific.

This brings us fully in line with the most popular terminals on OS X, except that we squash our emoji down to fit in one cell and they let them hang over the edges and damage other characters. Oh well.

## Detailed Description of the Pull Request / Additional comments

Late Friday evening, I tested my emoji test file on iTerm2. In so doing, I realized that @j4james and @leonMSFT were right the entire time in #5914: Emoji that require `U+FE0F` must not be double-width by default.

I finally banged up a PowerShell script that parses the UCD and emits a codepoint width table. Once checked in, this will be definitive.

Refs #900, #5914. Fixes #5941.	2020-05-17 13:32:43 -07:00
Dustin L. Howett (MSFT)	c39f9c6626	CodepointWidthDetector: reclassify U+25FB, U+25FC as Narrow (#5914) This seems to be in line with the emoji-sequences table in the latest version of the Unicode standard: those glyphs require U+FE0F to activate their emoji presentation. Since we don't support composing U+FE0F, we should not present them as emoji by default. Fixes #5910. Yes, I hate this.	2020-05-14 23:49:08 +00:00
Leon Liang	cf62922ad8	Revert some emoji back to narrow width

A couple of codepoints, namely the card suits, male and female signs, and white and black smiling faces, were changed to have a two-column width as part of #5795 since they were specified as emoji in Unicode's emoji list v13.0[1]. These particular glyphs also show up in some of the most fundamental code pages, such as CP437[2] and WGL4[3]. We should not be touching the width of the glyphs in these code pages, as abruptly changing a long-standing narrow glyph to use two columns will surely break (and has already broken) things.

[1] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt
[2] https://en.wikipedia.org/wiki/Code_page_437
[3] https://en.wikipedia.org/wiki/Windows_Glyph_List_4

Closes #5822	2020-05-12 19:38:11 +00:00
Leon Liang	7ae34336da	Make most emojis full-width (#5795)

The table that we refer to in `CodepointWidthDetector.cpp` to determine whether a codepoint should be rendered as Wide or Narrow was based on EastAsianWidth[1]. If a codepoint isn't included in this table, it is considered Narrow. Many emojis aren't specified in the EAW list, so this PR supplements our table with emoji codepoints from emoji-data[2] in order to render most, if not all, emojis as full-width.

There are certain codepoints I've added to the comments (in case we want to add them officially to the table in the future) that Microsoft decided to give an emoji presentation even though they are specified as Narrow/Ambiguous in the EAW list and are _not_ specified in the Unicode emoji list. These include all of the Mahjong Tiles block, different direction pencils (✎✐), and different pointing index fingers (☜, ☞), among others. I have no idea if I've captured all of them, as I don't know of an easy way to detect which are Microsoft-specific emojis.

## Validation Steps Performed

I have looked at so many emojis that I dream emoji. These screenshots don't cover _all_ emoji, but I've tried to grab a couple from all across the codepoint ranges:

Before: ![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png)
After: ![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png)

[1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
[2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

Closes #900	2020-05-08 22:31:09 +00:00
Chester Liu	4f8acb4b9f	Make CodepointWidthDetector::GetWidth faster (#3727)

This is a subset of #3578 which I think is harmless and the first step towards making things right.

References #3546 #3578

## Detailed Description of the Pull Request / Additional comments

For more robust Unicode support, `CodepointWidthDetector` should provide concrete width information rather than a simple boolean of `IsWide`. Currently only `IsWide` is widely used and optimized using a quick lookup table and a fallback cache. This PR moves those optimizations into `GetWidth`.

## Validation Steps Performed

API remains unchanged. Things are not broken.	2020-04-04 00:56:22 +00:00
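To illustrate the shape of the change described above, here is a minimal sketch in which the quick lookup and the fallback cache live in `GetWidth`, and `IsWide` is only a thin wrapper over it. The class name `WidthDetector`, the `Fallback` callback type, and the tiny inline table are all hypothetical and stand in for the real `CodepointWidthDetector`. The sketch also assumes a fallback is always supplied.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

enum class Width { Narrow, Wide, Ambiguous };

class WidthDetector {
public:
    // Hypothetical fallback: asks a renderer whether a codepoint is wide.
    using Fallback = std::function<bool(std::uint32_t)>;

    explicit WidthDetector(Fallback fallback) : _fallback(std::move(fallback)) {
        assert(_fallback && "this sketch requires a fallback");
    }

    // Returns a concrete width; ambiguous codepoints are resolved once
    // through the fallback and then served from the cache.
    Width GetWidth(std::uint32_t cp) {
        if (cp >= 0x20 && cp <= 0x7E) {
            return Width::Narrow;  // quick path for printable ASCII
        }
        const Width fromTable = TableLookup(cp);
        if (fromTable != Width::Ambiguous) {
            return fromTable;
        }
        const auto it = _cache.find(cp);
        if (it != _cache.end()) {
            return it->second;
        }
        const Width resolved = _fallback(cp) ? Width::Wide : Width::Narrow;
        _cache.emplace(cp, resolved);
        return resolved;
    }

    // The boolean API is kept, but simply delegates to GetWidth.
    bool IsWide(std::uint32_t cp) { return GetWidth(cp) == Width::Wide; }

private:
    // Stand-in for the generated table; only two entries for illustration.
    static Width TableLookup(std::uint32_t cp) {
        if (cp >= 0x4E00 && cp <= 0x9FFF) return Width::Wide;  // CJK ideographs
        if (cp == 0x00E9) return Width::Ambiguous;             // é
        return Width::Narrow;
    }

    Fallback _fallback;
    std::unordered_map<std::uint32_t, Width> _cache;
};
```

Because `IsWide` goes through `GetWidth`, both entry points share the same fast path and cache, so the fallback is consulted at most once per ambiguous codepoint.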
Michael Niksa	6735311fc9	Suppress last two errors (C26455 default constructor throw in DxEngine because it's due for refactoring soon anyway & C26444 custom construction/destruction on OutputCellIterator because I can't see what's going on and it needs more investigation and shouldn't hold this up). Also run codeformat.	2019-09-03 16:18:19 -07:00
Michael Niksa	87f5852a72	Define actual constructor for CodepointWidthDetector as default isn't cutting it.	2019-09-03 15:03:54 -07:00
Michael Niksa	4f1157c044	C26447,C26440 - is noexcept but can throw or doesn't throw but not noexcept	2019-08-29 15:23:07 -07:00
Michael Niksa	b33a59816e	C26496, mark const if it's never written after creation	2019-08-29 11:27:39 -07:00
Dustin L. Howett (MSFT)	16e1e29a12	Replace CodepointWidthDetector's runtime table with a static one (#2368)

This commit replaces CodepointWidthDetector's dynamically-generated map with a static constexpr one that's compiled into the binary. It also almost totally removes the notion of an `Invalid` width. We definitely had gaps in our character coverage where we'd report a character as invalid, but we'd then flatten that down to `Narrow` when asked. By combining the not-present state and the narrow state, we get to save a significant chunk of data.

I've tested this by feeding it all 0x10FFFF codepoints (and then some) and making sure they 100% match the old code's outputs.

| Metric                       | Then          | Now            |
|------------------------------|---------------|----------------|
| disk space                   | 56k (`.text`) | 3k (`.rdata`)  |
| runtime memory (allocations) | 1088          | 0              |
| runtime memory (bytes)       | 51k           | ~0             |
| memory behavior              | not shared    | fully shared   |
| lookup time                  | ~31ns         | ~9ns           |
| first hit penalty            | ~170000ns     | 0ns            |
| lines of code                | 1088          | 285            |
| clarity                      | extreme       | slightly worse |

I also took a moment and cleaned up a stray boolean that we didn't need.	2019-08-16 10:54:17 -07:00
Michael Niksa	87e85603b9	Merged PR 3215853: Fix spacing/layout for block characters and many retroactively-recategorized emoji (and more!)

This encompasses a handful of problems with column counting.

The Terminal project didn't set a fallback column counter. Oops. I've fixed this to use the `DxEngine` as the fallback.

The `DxEngine` didn't implement its fallback method. Oops. I've fixed this to use the `CustomTextLayout` to figure out the advances based on the same font and fallback pattern as the real final layout, just without "rounding" it into cells yet.
- `CustomTextLayout` has been updated to move the advance-correction into a separate phase from glyph shaping. Previously, we corrected the advances to nice round cell counts during shaping, which is fine for drawing, but hard for column count analysis.
- Now that there are separate phases, an `Analyze` method was added to the `CustomTextLayout` which just performs the text analysis steps and the glyph shaping, but no advance correction to column boundaries nor actual drawing.

I've taken the caching code that I was working on to improve chafa, and I've brought it into this. Now that we're doing a lot of fallback and heavy lifting in terms of analysis via the layout, we should cache the results until the font changes.

I've adjusted how column counting is done overall. It's always been in these phases:
1. We used a quick-lookup of ranges of characters we knew to rapidly decide `Narrow`, `Wide` or `Invalid` (a.k.a. "I dunno")
2. If it was `Invalid`, we consulted a table based off of the Unicode standard that has either `Narrow`, `Wide`, or `Ambiguous` as a result.
3. If it's still `Ambiguous`, we consult a render engine fallback (usually GDI or now DX) to see how many columns it would take.
4. If we still don't know, then it's `Wide` to be safe.
- I've added an additional flow here. The quick-lookup can now return `Ambiguous` off the bat for some glyph characters in the x2000-x3000 range that used to just be simple shapes but have been retroactively recategorized as emoji and are frequently now using full width color glyphs.
- This new state causes the lookup to go immediately to the render engine if it is available instead of consulting the Unicode standard table first because the half/fullwidth table doesn't appear to have been updated for this nuance to reclass these characters as ambiguous, but we'd like to keep that table as a "generated from the spec" sort of table and keep our exceptions in the "quick lookup" function.

I have confirmed the following things "just work" now:
- The windows logo flag from the demo. (⚫⚪💖✅🌌😊)
- The dotted chart on the side of crossterm demo (•)
- The powerline characters that make arrows with the Consolas patched font (██)
- An accented é
- The warning and checkmark symbols appearing same size as the X. (✔⚠🔥)

Related work items: #21167256, #21237515, #21243859, #21274645, #21296827	2019-05-02 15:29:10 -07:00
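The phased decision described in this commit, including the new path where the quick lookup returns `Ambiguous` and skips the Unicode table, can be sketched as below. Everything here is hypothetical: `QuickLookup`, `TableLookup`, `IsWide`, and the `EngineIsWide` function pointer are stand-ins for the real quick-lookup function, the spec-generated table, and the render engine fallback, and the codepoints chosen (U+26AA/U+26AB for ⚪⚫, U+00E9 for é) are just examples taken from the list of things reported to work.

```cpp
#include <cstdint>

enum class Width { Narrow, Wide, Ambiguous, Invalid };

// Hypothetical render engine hook: returns true if the glyph needs two columns.
using EngineIsWide = bool (*)(std::uint32_t);

// Phase 1: quick lookup of well-known ranges, plus local exceptions.
constexpr Width QuickLookup(std::uint32_t cp) {
    if (cp >= 0x20 && cp <= 0x7E) return Width::Narrow;         // printable ASCII
    if (cp >= 0x4E00 && cp <= 0x9FFF) return Width::Wide;       // CJK ideographs
    if (cp == 0x26AA || cp == 0x26AB) return Width::Ambiguous;  // shapes recategorized as emoji
    return Width::Invalid;                                      // "I dunno"
}

// Phase 2: stand-in for the table generated from the Unicode standard.
constexpr Width TableLookup(std::uint32_t cp) {
    if (cp >= 0x3041 && cp <= 0x3096) return Width::Wide;  // Hiragana
    if (cp == 0x00E9) return Width::Ambiguous;             // é is East Asian Ambiguous
    return Width::Narrow;
}

constexpr bool IsWide(std::uint32_t cp, EngineIsWide engine) {
    Width width = QuickLookup(cp);
    // Only an unknown result goes to the table; a quick-lookup `Ambiguous`
    // skips it and goes straight to the render engine.
    if (width == Width::Invalid) {
        width = TableLookup(cp);
    }
    if (width == Width::Ambiguous) {
        // Phase 3: ask the render engine. Phase 4: with no engine, assume wide.
        return engine != nullptr ? engine(cp) : true;
    }
    return width == Width::Wide;
}

// Two trivial engines, used only to exercise the sketch.
constexpr bool EngineSaysWide(std::uint32_t) { return true; }
constexpr bool EngineSaysNarrow(std::uint32_t) { return false; }
```

Keeping the exceptions in the quick lookup leaves the table untouched, which matches the stated goal of treating it as generated from the spec.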
Dustin Howett	d4d59fa339	Initial release of the Windows Terminal source code This commit introduces all of the Windows Terminal and Console Host source, under the MIT license.	2019-05-02 15:29:04 -07:00

13 commits