terminal/.github/actions/spell-check/expect
Dustin L. Howett eccfb537ad
tools: add a powershell script to generate CPWD from the UCD (#5946)
This commit introduces Generate-CodepointWidthsFromUCD, a PowerShell 7+
script that parses a UCD XML database in the UAX #42 format from
https://www.unicode.org/Public/UCD/latest/ucdxml/ and generates
CodepointWidthDetector's (CPWD's) giant width array.
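In the UAX #42 format, assigned code points appear as `<char>` elements carrying either a `cp` attribute or a `first-cp`/`last-cp` pair, with properties such as `ea` (East_Asian_Width) as further attributes. The script itself is PowerShell; the C++ below is only a rough sketch of reading one such element. It uses a regex instead of a real XML parser and ignores the `<group>` attribute inheritance used by the grouped variants of the database. `CharElement` and `ParseCharElement` are hypothetical names, not part of CPWD.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <regex>
#include <string>

// One code point element: either cp="XXXX" or first-cp="XXXX" last-cp="YYYY",
// plus property attributes such as ea, Emoji and EPres.
struct CharElement {
    char32_t first;
    char32_t last;
    std::map<std::string, std::string> attrs;
};

// Sketch only: pulls name="value" pairs out of a single element with a regex.
// A real reader should use an XML parser and honour <group> inheritance.
std::optional<CharElement> ParseCharElement(const std::string& element) {
    static const std::regex attrRe(R"re(([A-Za-z_][-A-Za-z0-9_]*)="([^"]*)")re");
    CharElement out{};
    for (std::sregex_iterator it(element.begin(), element.end(), attrRe), end; it != end; ++it) {
        out.attrs[(*it)[1].str()] = (*it)[2].str();
    }
    // UCD XML writes code points as bare hex digits, e.g. "00A1".
    auto hex = [](const std::string& s) { return static_cast<char32_t>(std::stoul(s, nullptr, 16)); };
    if (auto cp = out.attrs.find("cp"); cp != out.attrs.end()) {
        out.first = out.last = hex(cp->second);
    } else if (out.attrs.count("first-cp") && out.attrs.count("last-cp")) {
        out.first = hex(out.attrs["first-cp"]);
        out.last = hex(out.attrs["last-cp"]);
    } else {
        return std::nullopt;  // not a code point element this sketch understands
    }
    assert(out.first <= out.last);
    return out;
}
```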

By default, it emits one UnicodeRange for every run of non-narrow glyphs
that share the same Width + Emoji + Emoji Presentation class; it can also
be run in "packing" and "full" modes.
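The commit doesn't spell out how the three properties collapse into a CPWD width. The sketch below assumes that Emoji Presentation or East Asian Width `W`/`F` means Wide, `A` means Ambiguous, and everything else is Narrow. It then shows the default mode's splitting: a new run starts whenever the (`ea`, `Emoji`, `EPres`) triple changes or a narrow code point interrupts the sequence. All type and function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for CPWD's width enum.
enum class CodepointWidth { Narrow, Wide, Ambiguous };

// The three UCD properties the default mode keys on; in UAX #42 XML they are
// the `ea` (East_Asian_Width), `Emoji` and `EPres` attributes.
struct CodepointProps {
    std::string ea;  // "A", "F", "H", "N", "Na" or "W"
    bool emoji;
    bool emojiPresentation;
};

// Assumed mapping (not confirmed by the commit): emoji presentation or
// Wide/Fullwidth -> Wide, Ambiguous -> Ambiguous, everything else -> Narrow.
CodepointWidth Classify(const CodepointProps& p) {
    if (p.emojiPresentation || p.ea == "W" || p.ea == "F") return CodepointWidth::Wide;
    if (p.ea == "A") return CodepointWidth::Ambiguous;
    return CodepointWidth::Narrow;
}

struct ClassRun {
    char32_t first;
    char32_t last;
    CodepointProps props;
};

// props[i] describes code point start + i. A new run begins whenever the
// (ea, Emoji, EPres) triple changes or a Narrow code point breaks the sequence.
std::vector<ClassRun> SplitByClass(char32_t start, const std::vector<CodepointProps>& props) {
    assert(start + props.size() <= 0x110000);  // stay inside the Unicode codespace
    std::vector<ClassRun> runs;
    for (std::size_t i = 0; i < props.size(); ++i) {
        const char32_t cp = start + static_cast<char32_t>(i);
        const CodepointProps& p = props[i];
        if (Classify(p) == CodepointWidth::Narrow) continue;  // default mode skips narrow glyphs
        if (!runs.empty() && runs.back().last + 1 == cp) {
            const CodepointProps& q = runs.back().props;
            if (std::tie(q.ea, q.emoji, q.emojiPresentation) ==
                std::tie(p.ea, p.emoji, p.emojiPresentation)) {
                runs.back().last = cp;  // same class and contiguous: extend the run
                continue;
            }
        }
        runs.push_back({cp, cp, p});
    }
    return runs;
}
```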

* Packing mode: ignore the Width/Emoji/Emoji Presentation class and
  combine adjacent runs that CPWD will treat the same (that is, runs
  with the same resulting CodepointWidth).
     * This is for optimizing the number of individual ranges emitted
       into code.
* Full mode: include narrow codepoints (helpful for visualization)
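Packing can be sketched as a single pass over sorted ranges, assuming "adjacent" means the next range starts exactly one code point after the previous one ends. `UnicodeRange` and `CodepointWidth` here are minimal stand-ins with guessed member names, not CPWD's real definitions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for CPWD's types; the real field names may differ.
enum class CodepointWidth { Narrow, Wide, Ambiguous };

struct UnicodeRange {
    char32_t first;
    char32_t last;
    CodepointWidth width;
};

// Packing mode (sketch): given ranges sorted by code point, fuse each range into
// its predecessor when the two are contiguous and resolve to the same width.
std::vector<UnicodeRange> Pack(const std::vector<UnicodeRange>& sorted) {
    std::vector<UnicodeRange> packed;
    for (const auto& r : sorted) {
        assert(r.first <= r.last);
        if (!packed.empty() && packed.back().last + 1 == r.first &&
            packed.back().width == r.width) {
            packed.back().last = r.last;  // extend the previous run
        } else {
            packed.push_back(r);
        }
    }
    return packed;
}
```

Because only the resulting width is compared, two runs that differed only in their Emoji or Emoji Presentation class collapse into one entry, which keeps the emitted array short.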

It also supports overrides, provided in an XML document in the same format
as the UCD itself. Entries in the override files are applied after the
entire UCD has been read and replace any ranges they overlap.
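One way to replace the impacted ranges is to trim or split every overlapping range around the override and then insert the override itself. This is an assumed approach, not necessarily the script's actual algorithm. The override is kept even when its width is Narrow, matching the entry for `0xfe20` through `0xfe2f` in the sample override output. The types are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for CPWD's types.
enum class CodepointWidth { Narrow, Wide, Ambiguous };

struct UnicodeRange {
    char32_t first;
    char32_t last;
    CodepointWidth width;
};

// Applies one override to a sorted, non-overlapping list of ranges: anything the
// override touches is trimmed (or split in two), then the override is inserted.
std::vector<UnicodeRange> ApplyOverride(const std::vector<UnicodeRange>& ranges, const UnicodeRange& o) {
    assert(o.first <= o.last);
    std::vector<UnicodeRange> out;
    for (const auto& r : ranges) {
        if (r.last < o.first || r.first > o.last) {
            out.push_back(r);  // no overlap: keep as-is
            continue;
        }
        if (r.first < o.first) {
            out.push_back({r.first, static_cast<char32_t>(o.first - 1), r.width});  // left remainder
        }
        if (r.last > o.last) {
            out.push_back({static_cast<char32_t>(o.last + 1), r.last, r.width});  // right remainder
        }
    }
    out.push_back(o);  // kept even when Narrow, so the table records the override
    std::sort(out.begin(), out.end(),
              [](const UnicodeRange& a, const UnicodeRange& b) { return a.first < b.first; });
    return out;
}
```

Since overrides are applied only after the whole UCD has been read, a caller would fold `ApplyOverride` over the override entries in order.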

The output (when packing) looks like this:

```c++
// Generated by Generate-CodepointWidthsFromUCD -Pack:True -Full:False
// on 05/17/2020 02:47:55 (UTC) from Unicode 13.0.0.
// 66182 (0x10286) codepoints covered.
static constexpr std::array<UnicodeRange, 23> s_wideAndAmbiguousTable{
    UnicodeRange{ 0xa1, 0xa1, CodepointWidth::Ambiguous },
    UnicodeRange{ 0xa4, 0xa4, CodepointWidth::Ambiguous },
    UnicodeRange{ 0xa7, 0xa8, CodepointWidth::Ambiguous },
    // ...
    UnicodeRange{ 0x1f210, 0x1f23b, CodepointWidth::Wide },
    UnicodeRange{ 0x1f37e, 0x1f393, CodepointWidth::Wide },
    UnicodeRange{ 0x100000, 0x10fffd, CodepointWidth::Ambiguous },
};
```
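The coverage figure in the header is the sum of `last - first + 1` over all emitted ranges (66182 is 0x10286). A hypothetical C++ emitter producing the same line shapes might look like this; the script's real formatting code is PowerShell and may differ in detail. `EmitTable` and `WidthName` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for CPWD's types.
enum class CodepointWidth { Narrow, Wide, Ambiguous };

struct UnicodeRange {
    char32_t first;
    char32_t last;
    CodepointWidth width;
};

const char* WidthName(CodepointWidth w) {
    switch (w) {
    case CodepointWidth::Wide: return "CodepointWidth::Wide";
    case CodepointWidth::Ambiguous: return "CodepointWidth::Ambiguous";
    case CodepointWidth::Narrow: break;
    }
    return "CodepointWidth::Narrow";
}

// Renders the coverage comment and the array in the shape of the sample output.
std::string EmitTable(const std::vector<UnicodeRange>& ranges) {
    std::uint64_t covered = 0;
    for (const auto& r : ranges) {
        assert(r.first <= r.last);
        covered += static_cast<std::uint64_t>(r.last - r.first) + 1;  // inclusive range
    }
    std::ostringstream os;
    os << "// " << covered << " (0x" << std::hex << std::uppercase << covered
       << std::nouppercase << std::dec << ") codepoints covered.\n";
    os << "static constexpr std::array<UnicodeRange, " << ranges.size()
       << "> s_wideAndAmbiguousTable{\n";
    for (const auto& r : ranges) {
        // Cast to uint32_t: streaming a char32_t directly does not print a number.
        os << std::hex << "    UnicodeRange{ 0x" << static_cast<std::uint32_t>(r.first)
           << ", 0x" << static_cast<std::uint32_t>(r.last) << ", " << WidthName(r.width)
           << " },\n" << std::dec;
    }
    os << "};\n";
    return os.str();
}
```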

The output (when overriding) looks like this:

```c++
// Generated by Generate-CodepointWidthsFromUCD.ps1 -Pack:True -Full:False -NoOverrides:False
// on 5/22/2020 11:17:39 PM (UTC) from Unicode 13.0.0.
// 321205 (0x4E6B5) codepoints covered.
// 240 (0xF0) codepoints overridden.
static constexpr std::array<UnicodeRange, 23> s_wideAndAmbiguousTable{
    UnicodeRange{ 0xa1, 0xa1, CodepointWidth::Ambiguous },
    // ...
    UnicodeRange{ 0xfe20, 0xfe2f, CodepointWidth::Narrow }, // narrow combining ligatures (split into left/right halves, which take 2 columns together)
    // ...
    UnicodeRange{ 0x100000, 0x10fffd, CodepointWidth::Ambiguous },
};
```
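Because the table is sorted and its ranges don't overlap, a consumer can find a code point's width with a binary search. The sketch below assumes that code points absent from the table are Narrow (the default, non-full output omits narrow ranges) and uses a handful of entries copied from the sample outputs. `LookupWidth` and `s_sampleTable` are hypothetical, not CPWD's actual lookup.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for CPWD's types.
enum class CodepointWidth { Narrow, Wide, Ambiguous };

struct UnicodeRange {
    char32_t first;
    char32_t last;
    CodepointWidth width;
};

// A few entries taken from the sample outputs; the generated table is longer.
static constexpr std::array<UnicodeRange, 5> s_sampleTable{
    UnicodeRange{ 0xa1, 0xa1, CodepointWidth::Ambiguous },
    UnicodeRange{ 0xa4, 0xa4, CodepointWidth::Ambiguous },
    UnicodeRange{ 0xfe20, 0xfe2f, CodepointWidth::Narrow },
    UnicodeRange{ 0x1f210, 0x1f23b, CodepointWidth::Wide },
    UnicodeRange{ 0x100000, 0x10fffd, CodepointWidth::Ambiguous },
};

// Finds the first range whose upper bound is >= cp; it contains cp only if it
// also starts at or before cp. Code points not in the table are Narrow.
template <std::size_t N>
CodepointWidth LookupWidth(const std::array<UnicodeRange, N>& table, char32_t cp) {
    assert(cp <= 0x10FFFF);
    const auto it = std::lower_bound(table.begin(), table.end(), cp,
        [](const UnicodeRange& r, char32_t value) { return r.last < value; });
    if (it != table.end() && it->first <= cp) {
        return it->width;
    }
    return CodepointWidth::Narrow;
}
```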
2020-06-03 07:16:14 +00:00
alphabet.txt ci: spelling: update to 0.0.16a; update advice (#5922) 2020-05-28 08:01:52 -05:00
expect.txt tools: add a powershell script to generate CPWD from the UCD (#5946) 2020-06-03 07:16:14 +00:00
README.md ci: spelling: update to 0.0.16a; update advice (#5922) 2020-05-28 08:01:52 -05:00
web.txt Add a context menu entry to "Open Windows Terminal here" (#6100) 2020-05-28 15:42:13 +00:00

The contents of each .txt file in this directory are merged together.

  • alphabet is a sample for alphabet-related items
  • web is a sample for web/HTML-related items
  • expect is the main list of expected items; there is nothing special about its file name beyond the extension, which is important.

These terms are things that exist in the project temporarily but aren't necessarily words.

If something is a word that could come and go, it probably belongs in a dictionary.
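The merge the README describes can be sketched as collecting the lines of every `.txt` file into one set. This assumes one term per line and that only the `.txt` extension decides whether a file participates; check-spelling's real loader may treat comments, patterns, or ordering differently. `MergeExpectFiles` is a hypothetical name.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Merge every *.txt file in `dir` into one sorted set of terms, one term per
// line. Files with any other extension (such as README.md) are ignored.
std::set<std::string> MergeExpectFiles(const fs::path& dir) {
    assert(fs::is_directory(dir));
    std::set<std::string> terms;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (!entry.is_regular_file() || entry.path().extension() != ".txt") continue;
        std::ifstream in(entry.path());
        std::string line;
        while (std::getline(in, line)) {
            if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
            if (!line.empty()) terms.insert(line);
        }
    }
    return terms;
}
```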