Commit graph

3 commits

Author SHA1 Message Date
Michael Niksa 4f1157c044 C26447,C26440 - is noexcept but can throw or doesn't throw but not noexcept 2019-08-29 15:23:07 -07:00
Michael Niksa 6aac2c06e3
Change ParseNext function in UTF16 parser to never yield invalid data… (#1129)
…. It will return a replacement character at that point if it was given bad data. #788

## Summary of the Pull Request

This modifies the parser used while inserting text into the underlying data buffer so that it never returns an empty sequence. An empty sequence is invalid because you can't insert "nothing" into the buffer, and the buffer enforced this with a fail-fast crash. Now we instead insert U+FFFD (the Unicode replacement character) � to signal that something was invalid and has been replaced.
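The following is a minimal C++ sketch of the "never return an empty sequence" rule. It assumes a standalone function over a `std::wstring_view`. `ParseNextSketch`, `IsLeadSurrogate`, `IsTrailSurrogate`, and `kReplacementChar` are hypothetical names for illustration, not the actual `Utf16Parser` interface in the Terminal source. It shows the four cases: a complete pair, a lone lead, a lone trail, and an ordinary code unit.

```cpp
#include <string_view>

// Hypothetical names for illustration; this is not the Terminal's Utf16Parser API.
// U+FFFD REPLACEMENT CHARACTER, stored with static lifetime so returned views stay valid.
inline constexpr wchar_t kReplacementChar[] = { static_cast<wchar_t>(0xFFFD) };

constexpr bool IsLeadSurrogate(wchar_t wch) noexcept
{
    return wch >= 0xD800 && wch <= 0xDBFF;
}

constexpr bool IsTrailSurrogate(wchar_t wch) noexcept
{
    return wch >= 0xDC00 && wch <= 0xDFFF;
}

// Returns the first unit of text in `wstr` that is safe to insert:
//  - a lead followed by a trail     -> the two-unit surrogate pair
//  - a lead with no trail after it  -> U+FFFD
//  - a trail with no lead before it -> U+FFFD
//  - any other code unit            -> that single code unit
// The result is empty only when the input itself is empty.
std::wstring_view ParseNextSketch(std::wstring_view wstr) noexcept
{
    if (wstr.empty())
    {
        return {};
    }
    const wchar_t first = wstr.front();
    if (IsLeadSurrogate(first))
    {
        if (wstr.size() >= 2 && IsTrailSurrogate(wstr[1]))
        {
            return std::wstring_view{ wstr.data(), 2 }; // complete pair
        }
        return std::wstring_view{ kReplacementChar, 1 }; // lead without its trail
    }
    if (IsTrailSurrogate(first))
    {
        return std::wstring_view{ kReplacementChar, 1 }; // trail without its lead
    }
    return std::wstring_view{ wstr.data(), 1 }; // ordinary BMP code unit
}
```

Each replacement stands in for exactly one consumed code unit. A caller can therefore advance its input by `result.size()` after every call.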

## PR Checklist
* [x] Closes #788 and internal MSFT: 20990158
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [x] Tests added/passed
* [x] Requires documentation to be updated
* [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #788

## Detailed Description of the Pull Request / Additional comments

The solution here isn't perfect and isn't going to solve all of our problems. I was basically trying to stop the crash while not getting in the way of the other things coming down the pipe for the input channels.

I considered the following:
1. Remove the fail fast assertion from the buffer
  - I didn't want to do this because it really is invalid to get all the way to placing the text into the buffer and then request that a zero-length string be inserted. I feel the fail-fast is a good indication that something is terribly wrong elsewhere and should be corrected.
2. Update the UTF16 parser in order to stop returning empty strings
  - This is what I ultimately did. If it would ever return just a lead, it now returns �. If it would ever return just a trail, it returns �. Otherwise it returns them as a pair if both are present, or it returns a single valid codepoint. I am now assuming the following: if the parse function is called from an Output Iterator on a string that doesn't contain all the pieces of data it needs, then someone at a higher level mishandled the data. The data is invalid, and it should be repaired into replacements.
  - This moves the responsibility up out of the buffer layer. Folks inserting into the buffer must identify half a sequence and hold onto it until the next piece arrives, if they're reading from a stream where this can happen (e.g. one `wchar_t` at a time). There can be many different routes into the buffer from many different streams/channels. Buffering low, right near the insertion point, is therefore bad, because it might pair loose `wchar_t`s across stream entrypoints.
3. Update the iterator, on creating views, to disallow/transform empty strings. 
  - I considered this solution as well, but it would have required, under some circumstances, a second parsing of the string to identify lead/trail status from outside the `Utf16Parser` class to realize when to use the � character. So I avoided the double-parse.
4. Change the cooked read classes to identify when they have pulled the lead `wchar_t` of a sequence, and then try to pull another one.
   - I was going to attempt this, but @adiviness said that he tried it and it made all sorts of other weirdness happen with the edit line.
   - Additionally, @adiviness has an ongoing series of efforts to make cooked read significantly less horrible and disgusting. I didn't want to get in the way of that.
5. Change the `GetChar` method on the input buffer queue to return a `char32_t` or a `wstring_view`, to transform a standalone lead/trail, etc.
    - The `GetChar` method is used by several different accessors and API calls to retrieve information from the input queue, transforming the key events into plain characters. Changing it at that level would change them all. Long-term, it is probably warranted to do so, as all of those consumers likely need to become aware of UTF-16 surrogates before we can declare victory. But there are two problems:
      1. This gets in the way of @adiviness's work on cooked read data.
      2. This goes WAY beyond the scope of what I want to accomplish here, as the immediate goal is to stop the crash, not fix the world.
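Option 2 pushes the job of holding half a surrogate pair up to whoever reads the stream. Below is a minimal sketch of that per-stream buffering, assuming input arrives one `wchar_t` at a time. `SurrogateStreamJoiner`, `Push`, and `Flush` are hypothetical names, not part of the Terminal codebase. Each stream keeps its own instance rather than sharing one near the buffer, so a lead from one channel is never paired with a trail from another.

```cpp
#include <optional>
#include <string>

// Hypothetical per-stream helper (not from the Terminal codebase). Feed it one
// wchar_t at a time; it holds a lone lead surrogate until the next unit arrives.
class SurrogateStreamJoiner
{
public:
    // Returns text that is safe to insert. Empty while a lead is being held.
    std::wstring Push(wchar_t wch)
    {
        std::wstring out;
        if (_pendingLead.has_value())
        {
            if (IsTrail(wch))
            {
                // The held lead got its trail: emit the complete pair.
                out.push_back(*_pendingLead);
                out.push_back(wch);
                _pendingLead.reset();
                return out;
            }
            // The held lead was never completed: replace it, then handle wch below.
            out.push_back(kReplacement);
            _pendingLead.reset();
        }

        if (IsLead(wch))
        {
            _pendingLead = wch; // wait for the next unit before emitting anything
        }
        else if (IsTrail(wch))
        {
            out.push_back(kReplacement); // trail with no preceding lead
        }
        else
        {
            out.push_back(wch);
        }
        return out;
    }

    // Call at end of stream so a dangling lead becomes U+FFFD instead of being lost.
    std::wstring Flush()
    {
        std::wstring out;
        if (_pendingLead.has_value())
        {
            out.push_back(kReplacement);
            _pendingLead.reset();
        }
        return out;
    }

private:
    static constexpr wchar_t kReplacement = static_cast<wchar_t>(0xFFFD);

    static constexpr bool IsLead(wchar_t wch) noexcept { return wch >= 0xD800 && wch <= 0xDBFF; }
    static constexpr bool IsTrail(wchar_t wch) noexcept { return wch >= 0xDC00 && wch <= 0xDFFF; }

    std::optional<wchar_t> _pendingLead;
};
```

While a lead is held, `Push` returns an empty string. In that case the caller should skip the insert rather than hand empty text down to the buffer.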


I've validated this by:
1. Writing some additional tests against the `Utf16Parser` to simulate some of the theoretical sequences that could arrive and need to be corrected into replacement characters, per a verbal discussion and whiteboarding session with @adiviness.
2. Manually triggering the emoji panel and inserting a bunch of emoji. I then moved the cursor around left and right, deleted at assorted points with the backspace key, pressed enter to commit, and used the up-arrow history to recommit them to see what happened. There were no crashes. The behavior is still weird and not great... but that's outside the scope of no crashy crashy.
2019-06-04 15:22:18 -07:00
Dustin Howett d4d59fa339 Initial release of the Windows Terminal source code
This commit introduces all of the Windows Terminal and Console Host source,
under the MIT license.
2019-05-02 15:29:04 -07:00