terminal/src/host/ut_host/CodepointWidthDetectorTests.cpp

102 lines
3.4 KiB
C++
Raw Normal View History

// Copyright (c) Microsoft Corporation.
// Licensed under the MIT license.
#include "precomp.h"
#include "WexTestClass.h"
#include "../../inc/consoletaeftemplates.hpp"
#include "CommonState.hpp"
#include "../types/inc/CodepointWidthDetector.hpp"
using namespace WEX::Logging;
static constexpr std::wstring_view emoji = L"\xD83E\xDD22"; // U+1F922 nauseated face
static constexpr std::wstring_view ambiguous = L"\x414"; // U+0414 cyrillic capital de
Merged PR 3215853: Fix spacing/layout for block characters and many retroactively-recategorized emoji (and more!) This encompasses a handful of problems with column counting. The Terminal project didn't set a fallback column counter. Oops. I've fixed this to use the `DxEngine` as the fallback. The `DxEngine` didn't implement its fallback method. Oops. I've fixed this to use the `CustomTextLayout` to figure out the advances based on the same font and fallback pattern as the real final layout, just without "rounding" it into cells yet. - `CustomTextLayout` has been updated to move the advance-correction into a separate phase from glyph shaping. Previously, we corrected the advances to nice round cell counts during shaping, which is fine for drawing, but hard for column count analysis. - Now that there are separate phases, an `Analyze` method was added to the `CustomTextLayout` which just performs the text analysis steps and the glyph shaping, but no advance correction to column boundaries nor actual drawing. I've taken the caching code that I was working on to improve chafa, and I've brought it into this. Now that we're doing a lot of fallback and heavy lifting in terms of analysis via the layout, we should cache the results until the font changes. I've adjusted how column counting is done overall. It's always been in these phases: 1. We used a quick-lookup of ranges of characters we knew to rapidly decide `Narrow`, `Wide` or `Invalid` (a.k.a. "I dunno") 2. If it was `Invalid`, we consulted a table based off of the Unicode standard that has either `Narrow`, `Wide`, or `Ambiguous` as a result. 3. If it's still `Ambiguous`, we consult a render engine fallback (usually GDI or now DX) to see how many columns it would take. 4. If we still don't know, then it's `Wide` to be safe. - I've added an additional flow here. The quick-lookup can now return `Ambiguous` off the bat for some glyph characters in the x2000-x3000 range that used to just be simple shapes but have been retroactively recategorized as emoji and are frequently now using full width color glyphs. - This new state causes the lookup to go immediately to the render engine if it is available instead of consulting the Unicode standard table first because the half/fullwidth table doesn't appear to have been updated for this nuance to reclass these characters as ambiguous, but we'd like to keep that table as a "generated from the spec" sort of table and keep our exceptions in the "quick lookup" function. I have confirmed the following things "just work" now: - The windows logo flag from the demo. (⚫⚪💖✅🌌😊) - The dotted chart on the side of crossterm demo (•) - The powerline characters that make arrows with the Consolas patched font (██) - An accented é - The warning and checkmark symbols appearing same size as the X. (✔⚠🔥) Related work items: #21167256, #21237515, #21243859, #21274645, #21296827
2019-05-02 01:13:53 +02:00
// codepoint and utf16 encoded string
static const std::vector<std::tuple<unsigned int, std::wstring, CodepointWidth>> testData = {
{ 0x7, L"\a", CodepointWidth::Narrow }, // BEL
{ 0x20, L" ", CodepointWidth::Narrow },
{ 0x39, L"9", CodepointWidth::Narrow },
{ 0x414, L"\x414", CodepointWidth::Ambiguous }, // U+0414 cyrillic capital de
{ 0x1104, L"\x1104", CodepointWidth::Wide }, // U+1104 hangul choseong ssangtikeut
{ 0x306A, L"\x306A", CodepointWidth::Wide }, // U+306A hiragana na
{ 0x30CA, L"\x30CA", CodepointWidth::Wide }, // U+30CA katakana na
{ 0x72D7, L"\x72D7", CodepointWidth::Wide }, // U+72D7
{ 0x1F47E, L"\xD83D\xDC7E", CodepointWidth::Wide }, // U+1F47E alien monster
{ 0x1F51C, L"\xD83D\xDD1C", CodepointWidth::Wide } // U+1F51C SOON
};
class CodepointWidthDetectorTests
{
TEST_CLASS(CodepointWidthDetectorTests);
TEST_METHOD(CanLookUpEmoji)
{
Merged PR 3215853: Fix spacing/layout for block characters and many retroactively-recategorized emoji (and more!) This encompasses a handful of problems with column counting. The Terminal project didn't set a fallback column counter. Oops. I've fixed this to use the `DxEngine` as the fallback. The `DxEngine` didn't implement its fallback method. Oops. I've fixed this to use the `CustomTextLayout` to figure out the advances based on the same font and fallback pattern as the real final layout, just without "rounding" it into cells yet. - `CustomTextLayout` has been updated to move the advance-correction into a separate phase from glyph shaping. Previously, we corrected the advances to nice round cell counts during shaping, which is fine for drawing, but hard for column count analysis. - Now that there are separate phases, an `Analyze` method was added to the `CustomTextLayout` which just performs the text analysis steps and the glyph shaping, but no advance correction to column boundaries nor actual drawing. I've taken the caching code that I was working on to improve chafa, and I've brought it into this. Now that we're doing a lot of fallback and heavy lifting in terms of analysis via the layout, we should cache the results until the font changes. I've adjusted how column counting is done overall. It's always been in these phases: 1. We used a quick-lookup of ranges of characters we knew to rapidly decide `Narrow`, `Wide` or `Invalid` (a.k.a. "I dunno") 2. If it was `Invalid`, we consulted a table based off of the Unicode standard that has either `Narrow`, `Wide`, or `Ambiguous` as a result. 3. If it's still `Ambiguous`, we consult a render engine fallback (usually GDI or now DX) to see how many columns it would take. 4. If we still don't know, then it's `Wide` to be safe. - I've added an additional flow here. The quick-lookup can now return `Ambiguous` off the bat for some glyph characters in the x2000-x3000 range that used to just be simple shapes but have been retroactively recategorized as emoji and are frequently now using full width color glyphs. - This new state causes the lookup to go immediately to the render engine if it is available instead of consulting the Unicode standard table first because the half/fullwidth table doesn't appear to have been updated for this nuance to reclass these characters as ambiguous, but we'd like to keep that table as a "generated from the spec" sort of table and keep our exceptions in the "quick lookup" function. I have confirmed the following things "just work" now: - The windows logo flag from the demo. (⚫⚪💖✅🌌😊) - The dotted chart on the side of crossterm demo (•) - The powerline characters that make arrows with the Consolas patched font (██) - An accented é - The warning and checkmark symbols appearing same size as the X. (✔⚠🔥) Related work items: #21167256, #21237515, #21243859, #21274645, #21296827
2019-05-02 01:13:53 +02:00
CodepointWidthDetector widthDetector;
VERIFY_IS_TRUE(widthDetector.IsWide(emoji));
}
TEST_METHOD(CanExtractCodepoint)
{
Merged PR 3215853: Fix spacing/layout for block characters and many retroactively-recategorized emoji (and more!) This encompasses a handful of problems with column counting. The Terminal project didn't set a fallback column counter. Oops. I've fixed this to use the `DxEngine` as the fallback. The `DxEngine` didn't implement its fallback method. Oops. I've fixed this to use the `CustomTextLayout` to figure out the advances based on the same font and fallback pattern as the real final layout, just without "rounding" it into cells yet. - `CustomTextLayout` has been updated to move the advance-correction into a separate phase from glyph shaping. Previously, we corrected the advances to nice round cell counts during shaping, which is fine for drawing, but hard for column count analysis. - Now that there are separate phases, an `Analyze` method was added to the `CustomTextLayout` which just performs the text analysis steps and the glyph shaping, but no advance correction to column boundaries nor actual drawing. I've taken the caching code that I was working on to improve chafa, and I've brought it into this. Now that we're doing a lot of fallback and heavy lifting in terms of analysis via the layout, we should cache the results until the font changes. I've adjusted how column counting is done overall. It's always been in these phases: 1. We used a quick-lookup of ranges of characters we knew to rapidly decide `Narrow`, `Wide` or `Invalid` (a.k.a. "I dunno") 2. If it was `Invalid`, we consulted a table based off of the Unicode standard that has either `Narrow`, `Wide`, or `Ambiguous` as a result. 3. If it's still `Ambiguous`, we consult a render engine fallback (usually GDI or now DX) to see how many columns it would take. 4. If we still don't know, then it's `Wide` to be safe. - I've added an additional flow here. The quick-lookup can now return `Ambiguous` off the bat for some glyph characters in the x2000-x3000 range that used to just be simple shapes but have been retroactively recategorized as emoji and are frequently now using full width color glyphs. - This new state causes the lookup to go immediately to the render engine if it is available instead of consulting the Unicode standard table first because the half/fullwidth table doesn't appear to have been updated for this nuance to reclass these characters as ambiguous, but we'd like to keep that table as a "generated from the spec" sort of table and keep our exceptions in the "quick lookup" function. I have confirmed the following things "just work" now: - The windows logo flag from the demo. (⚫⚪💖✅🌌😊) - The dotted chart on the side of crossterm demo (•) - The powerline characters that make arrows with the Consolas patched font (██) - An accented é - The warning and checkmark symbols appearing same size as the X. (✔⚠🔥) Related work items: #21167256, #21237515, #21243859, #21274645, #21296827
2019-05-02 01:13:53 +02:00
CodepointWidthDetector widthDetector;
for (const auto& data : testData)
{
const auto& expected = std::get<0>(data);
const auto& wstr = std::get<1>(data);
const auto result = widthDetector._extractCodepoint({ wstr.c_str(), wstr.size() });
VERIFY_ARE_EQUAL(result, expected);
}
}
TEST_METHOD(CanGetWidths)
{
Merged PR 3215853: Fix spacing/layout for block characters and many retroactively-recategorized emoji (and more!) This encompasses a handful of problems with column counting. The Terminal project didn't set a fallback column counter. Oops. I've fixed this to use the `DxEngine` as the fallback. The `DxEngine` didn't implement its fallback method. Oops. I've fixed this to use the `CustomTextLayout` to figure out the advances based on the same font and fallback pattern as the real final layout, just without "rounding" it into cells yet. - `CustomTextLayout` has been updated to move the advance-correction into a separate phase from glyph shaping. Previously, we corrected the advances to nice round cell counts during shaping, which is fine for drawing, but hard for column count analysis. - Now that there are separate phases, an `Analyze` method was added to the `CustomTextLayout` which just performs the text analysis steps and the glyph shaping, but no advance correction to column boundaries nor actual drawing. I've taken the caching code that I was working on to improve chafa, and I've brought it into this. Now that we're doing a lot of fallback and heavy lifting in terms of analysis via the layout, we should cache the results until the font changes. I've adjusted how column counting is done overall. It's always been in these phases: 1. We used a quick-lookup of ranges of characters we knew to rapidly decide `Narrow`, `Wide` or `Invalid` (a.k.a. "I dunno") 2. If it was `Invalid`, we consulted a table based off of the Unicode standard that has either `Narrow`, `Wide`, or `Ambiguous` as a result. 3. If it's still `Ambiguous`, we consult a render engine fallback (usually GDI or now DX) to see how many columns it would take. 4. If we still don't know, then it's `Wide` to be safe. - I've added an additional flow here. The quick-lookup can now return `Ambiguous` off the bat for some glyph characters in the x2000-x3000 range that used to just be simple shapes but have been retroactively recategorized as emoji and are frequently now using full width color glyphs. - This new state causes the lookup to go immediately to the render engine if it is available instead of consulting the Unicode standard table first because the half/fullwidth table doesn't appear to have been updated for this nuance to reclass these characters as ambiguous, but we'd like to keep that table as a "generated from the spec" sort of table and keep our exceptions in the "quick lookup" function. I have confirmed the following things "just work" now: - The windows logo flag from the demo. (⚫⚪💖✅🌌😊) - The dotted chart on the side of crossterm demo (•) - The powerline characters that make arrows with the Consolas patched font (██) - An accented é - The warning and checkmark symbols appearing same size as the X. (✔⚠🔥) Related work items: #21167256, #21237515, #21243859, #21274645, #21296827
2019-05-02 01:13:53 +02:00
CodepointWidthDetector widthDetector;
for (const auto& data : testData)
{
const auto& expected = std::get<2>(data);
const auto& wstr = std::get<1>(data);
const auto result = widthDetector.GetWidth({ wstr.c_str(), wstr.size() });
VERIFY_ARE_EQUAL(result, expected);
}
}
Merged PR 3215853: Fix spacing/layout for block characters and many retroactively-recategorized emoji (and more!) This encompasses a handful of problems with column counting. The Terminal project didn't set a fallback column counter. Oops. I've fixed this to use the `DxEngine` as the fallback. The `DxEngine` didn't implement its fallback method. Oops. I've fixed this to use the `CustomTextLayout` to figure out the advances based on the same font and fallback pattern as the real final layout, just without "rounding" it into cells yet. - `CustomTextLayout` has been updated to move the advance-correction into a separate phase from glyph shaping. Previously, we corrected the advances to nice round cell counts during shaping, which is fine for drawing, but hard for column count analysis. - Now that there are separate phases, an `Analyze` method was added to the `CustomTextLayout` which just performs the text analysis steps and the glyph shaping, but no advance correction to column boundaries nor actual drawing. I've taken the caching code that I was working on to improve chafa, and I've brought it into this. Now that we're doing a lot of fallback and heavy lifting in terms of analysis via the layout, we should cache the results until the font changes. I've adjusted how column counting is done overall. It's always been in these phases: 1. We used a quick-lookup of ranges of characters we knew to rapidly decide `Narrow`, `Wide` or `Invalid` (a.k.a. "I dunno") 2. If it was `Invalid`, we consulted a table based off of the Unicode standard that has either `Narrow`, `Wide`, or `Ambiguous` as a result. 3. If it's still `Ambiguous`, we consult a render engine fallback (usually GDI or now DX) to see how many columns it would take. 4. If we still don't know, then it's `Wide` to be safe. - I've added an additional flow here. The quick-lookup can now return `Ambiguous` off the bat for some glyph characters in the x2000-x3000 range that used to just be simple shapes but have been retroactively recategorized as emoji and are frequently now using full width color glyphs. - This new state causes the lookup to go immediately to the render engine if it is available instead of consulting the Unicode standard table first because the half/fullwidth table doesn't appear to have been updated for this nuance to reclass these characters as ambiguous, but we'd like to keep that table as a "generated from the spec" sort of table and keep our exceptions in the "quick lookup" function. I have confirmed the following things "just work" now: - The windows logo flag from the demo. (⚫⚪💖✅🌌😊) - The dotted chart on the side of crossterm demo (•) - The powerline characters that make arrows with the Consolas patched font (██) - An accented é - The warning and checkmark symbols appearing same size as the X. (✔⚠🔥) Related work items: #21167256, #21237515, #21243859, #21274645, #21296827
2019-05-02 01:13:53 +02:00
static bool FallbackMethod(const std::wstring_view glyph)
{
if (glyph.size() < 1)
{
return false;
}
else
{
return (glyph.at(0) % 2) == 1;
}
}
TEST_METHOD(AmbiguousCache)
{
// Set up a detector with fallback.
CodepointWidthDetector widthDetector;
widthDetector.SetFallbackMethod(std::bind(&FallbackMethod, std::placeholders::_1));
// Ensure fallback cache is empty.
VERIFY_ARE_EQUAL(0u, widthDetector._fallbackCache.size());
// Lookup ambiguous width character.
widthDetector.IsWide(ambiguous);
// Cache should hold it.
VERIFY_ARE_EQUAL(1u, widthDetector._fallbackCache.size());
// Cached item should match what we expect
const auto it = widthDetector._fallbackCache.begin();
VERIFY_ARE_EQUAL(ambiguous, it->first);
VERIFY_ARE_EQUAL(FallbackMethod(ambiguous), it->second);
// Cache should empty when font changes.
widthDetector.NotifyFontChanged();
VERIFY_ARE_EQUAL(0u, widthDetector._fallbackCache.size());
}
};