mirror of https://github.com/matrix-construct/construct synced 2024-11-13 21:41:06 +01:00

21 commits

Author      SHA1        Message  Date
Jason Volk  4a8302038a  ircd::gpt::vocab: Add tokenization and detokenization count() convenience.  2022-09-21 16:42:07 -07:00
Jason Volk  56d944f33e  ircd::gpt::vocab: Add space-prefix convenience argument.  2022-07-01 17:36:45 -07:00
Jason Volk  69ca5e3395  ircd::gpt: Fix fs::fd/map options regressions.  2022-06-30 15:55:23 -07:00
Jason Volk  78848925ee  ircd::gpt: Various refactoring.  2022-06-19 20:14:22 -07:00
Jason Volk  366289823e  ircd::gpt::vocab: Simplify overflow truncation length.  2022-06-17 21:11:53 -07:00
Jason Volk  858b56e4fe  ircd::gpt::vocab: Fix bug.  2021-10-15 11:40:32 -07:00
Jason Volk  2df266e3be  ircd::gpt::vocab: Improve debug fmtstr.  2021-10-06 13:13:47 -07:00
Jason Volk  d2b9e88a65  ircd::gpt::vocab: Simplify masks w/ sign extension.  2021-09-14 23:39:55 -07:00
Jason Volk  782379aeb4  ircd::gpt::vocab: Simplify UTF-8 length gauge.  2021-09-14 23:39:55 -07:00
Jason Volk  71b1b44a7f  ircd::utf: Rename encode() to encode_sparse().  2021-08-08 09:47:02 -07:00
Jason Volk  4f97dcf456  ircd: Vector initialization fixes for GCC.  2021-05-14 05:57:47 -07:00
Jason Volk  665eeb6cd7  ircd::gpt::vocab: No-split mask for trailing punctuation.  2021-04-22 12:27:57 -07:00
Jason Volk  aaced40d90  ircd::gpt::vocab: Mask erroneous trailing character case; fix pretoken case.  2021-04-22 12:27:57 -07:00
Jason Volk  b2f788e255  ircd::gpt::vocab: Minor reorg pre-tokenize related.  2021-04-22 12:27:57 -07:00
Jason Volk  1e08339955  ircd::gpt::vocab: Fixes for additional missing cases.  2021-04-22 12:27:57 -07:00
Jason Volk  eeadc15319  ircd::gpt::vocab: Fixes for additional mismatching cases.  2021-04-22 12:27:57 -07:00
Jason Volk  0a6be0efed  ircd::gpt::vocab: Fix string length accumulation.  2021-04-22 12:27:57 -07:00
Jason Volk  0a87754c99  ircd::gpt::vocab: Fix token init missing null terminations.  2021-04-22 12:27:57 -07:00
Jason Volk  734948863f  ircd::gpt::vocab: Add token debug string tool.  2021-03-09 04:50:19 -08:00
Jason Volk  53c4260a21  ircd::gpt: Add Basic Latin (lower) and C0 replacement LUT; various.  2021-03-09 04:50:19 -08:00
Jason Volk  29b99dcf4d  ircd::gpt: Split vocab related into separate unit.  2021-03-02 11:13:59 -08:00
Renamed from ircd/gpt.cc
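The log does not say what the "C0 replacement LUT" (53c4260a21) or the "space-prefix" argument (56d944f33e) contain. As an assumption only, the sketch below shows the byte-to-code-point table published with OpenAI's GPT-2 encoder (`bytes_to_unicode` in its `encoder.py`): printable bytes map to themselves, while C0 controls, space, DEL, the C1 range and the soft hyphen are remapped in order to code points starting at U+0100. That remapping is why a space-prefixed GPT-2 token shows up as `Ġ` (U+0120). The name `gpt2_byte_lut` is hypothetical and this is not ircd's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from ircd::gpt): GPT-2 style byte -> code point table.
// Printable bytes keep their own value; the 68 remaining bytes are assigned,
// in ascending byte order, the code points U+0100, U+0101, ...
std::array<char32_t, 256> gpt2_byte_lut()
{
    std::array<char32_t, 256> lut{};

    auto printable = [](unsigned b)
    {
        return (b >= 0x21 && b <= 0x7E)    // '!' .. '~'
            || (b >= 0xA1 && b <= 0xAC)    // U+00A1 .. U+00AC
            || (b >= 0xAE && b <= 0xFF);   // U+00AE .. U+00FF
    };

    unsigned n = 0;                        // number of bytes remapped so far
    for (unsigned b = 0; b < 256; ++b)
        lut[b] = printable(b) ? char32_t(b) : char32_t(0x100 + n++);

    // 0x00-0x20 (33 bytes) + 0x7F-0xA0 (34 bytes) + 0xAD (1 byte)
    assert(n == 68);
    return lut;
}
```

Under this mapping, `gpt2_byte_lut()[' ']` is U+0120 and `gpt2_byte_lut()[0x00]` is U+0100, so no vocabulary string ever needs to contain a raw control character or a bare space.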