Jason Volk | 4a8302038a | ircd::gpt::vocab: Add tokenization and detokenization count() convenience. | 2022-09-21 16:42:07 -07:00
Jason Volk | 56d944f33e | ircd::gpt::vocab: Add space-prefix convenience argument. | 2022-07-01 17:36:45 -07:00
Jason Volk | 69ca5e3395 | ircd::gpt: Fix fs::fd/map options regressions. | 2022-06-30 15:55:23 -07:00
Jason Volk | 78848925ee | ircd::gpt: Various refactoring. | 2022-06-19 20:14:22 -07:00
Jason Volk | 366289823e | ircd::gpt::vocab: Simplify overflow truncation length. | 2022-06-17 21:11:53 -07:00
Jason Volk | 858b56e4fe | ircd::gpt::vocab: Fix bug. | 2021-10-15 11:40:32 -07:00
Jason Volk | 2df266e3be | ircd::gpt::vocab: Improve debug fmtstr. | 2021-10-06 13:13:47 -07:00
Jason Volk | d2b9e88a65 | ircd::gpt::vocab: Simplify masks w/ sign extension. | 2021-09-14 23:39:55 -07:00
Jason Volk | 782379aeb4 | ircd::gpt::vocab: Simplify UTF-8 length gauge. | 2021-09-14 23:39:55 -07:00
Jason Volk | 71b1b44a7f | ircd::utf: Rename encode() to encode_sparse(). | 2021-08-08 09:47:02 -07:00
Jason Volk | 4f97dcf456 | ircd: Vector initialization fixes for GCC. | 2021-05-14 05:57:47 -07:00
Jason Volk | 665eeb6cd7 | ircd::gpt::vocab: No-split mask for trailing punctuation. | 2021-04-22 12:27:57 -07:00
Jason Volk | aaced40d90 | ircd::gpt::vocab: Mask erroneous trailing character case; fix pretoken case. | 2021-04-22 12:27:57 -07:00
Jason Volk | b2f788e255 | ircd::gpt::vocab: Minor reorg pre-tokenize related. | 2021-04-22 12:27:57 -07:00
Jason Volk | 1e08339955 | ircd::gpt::vocab: Fixes for additional missing cases. | 2021-04-22 12:27:57 -07:00
Jason Volk | eeadc15319 | ircd::gpt::vocab: Fixes for additional mismatching cases. | 2021-04-22 12:27:57 -07:00
Jason Volk | 0a6be0efed | ircd::gpt::vocab: Fix string length accumulation. | 2021-04-22 12:27:57 -07:00
Jason Volk | 0a87754c99 | ircd::gpt::vocab: Fix token init missing null terminations. | 2021-04-22 12:27:57 -07:00
Jason Volk | 734948863f | ircd::gpt::vocab: Add token debug string tool. | 2021-03-09 04:50:19 -08:00
Jason Volk | 53c4260a21 | ircd::gpt: Add Basic Latin (lower) and C0 replacement LUT; various. | 2021-03-09 04:50:19 -08:00
Jason Volk | 29b99dcf4d | ircd::gpt: Split vocab related into separate unit. | 2021-03-02 11:13:59 -08:00