One method to set them all... or something like that.
The defaults for git-grep options were scattered over the run
function body. This change refactors them into a separate method.
The application of defaults is checked implicitly by existing
tests and linters, and the new approach makes it very easy
to verify that the desired defaults are set.
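
As an illustration only, gathering defaults in one method could look
like the sketch below. The struct, its fields, the method name and the
default values are all assumptions made for this example, not the
actual Forgejo code.

```go
package main

import (
	"fmt"
	"time"
)

// GrepOptions is a hypothetical options struct; the field names are
// illustrative and not taken from the Forgejo source.
type GrepOptions struct {
	MaxResultLimit int
	Timeout        time.Duration
}

// ensureDefaults fills every unset (zero or negative) field in one
// place, so the run function does not have to scatter these checks.
// The values 50 and 2s are placeholders chosen for the example.
func (o *GrepOptions) ensureDefaults() {
	if o.MaxResultLimit <= 0 {
		o.MaxResultLimit = 50
	}
	if o.Timeout <= 0 {
		o.Timeout = 2 * time.Second
	}
}

func main() {
	opts := GrepOptions{MaxResultLimit: 10} // explicit value is kept
	opts.ensureDefaults()
	fmt.Println(opts.MaxResultLimit, opts.Timeout)
}
```

With every default in a single method, a reviewer can check them by
reading one function instead of the whole run body.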
We need to shorten the timeout to effectively bound the
computation size. This protects against "too big" repos.
It also protects, to some extent, against overly long lines
if the timeout is kept very low (basically so that grep times out
before it can run out of memory).
Docs-PR: forgejo/docs#812
There is no reason to reject initial dashes in git-grep
expressions... other than the code not supporting it previously.
A new method is introduced to relax the security checks.
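
One way to accept such expressions safely, sketched here under the
assumption that the pattern is passed through `git grep`'s `-e`
option, is shown below. The helper name `grepArgs` is hypothetical
and the actual change may be structured differently.

```go
package main

import "fmt"

// grepArgs builds a hypothetical argument list for `git grep`.
// The pattern always follows `-e`, so a value such as "--help" is
// parsed as the search expression rather than as an option.
func grepArgs(pattern string) []string {
	return []string{"grep", "-e", pattern}
}

func main() {
	fmt.Println(grepArgs("--help"))
}
```

Because the pattern travels as its own argument right after `-e`,
no shell quoting or dash filtering is needed for it.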
It is a waste of resources to scan binary files looking for matches
because such matches are never returned: they appear as empty
lines in the current format.
Notably, even if they were returned, it is unlikely that matching
in binary files makes sense when the goal is "code search".
Analogously to how it happens for MaxResultLimit.
The default of 20 is inspired by a well-known, commercial code
hosting platform.
Unbounded limits are risky because they expose Forgejo to a class
of DoS attacks where queries are crafted to take advantage of
missing bounds.
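
A minimal sketch of bounding a caller-supplied limit follows. The
function and constant names are hypothetical, and falling back to the
default for unset or oversized values is an assumption about the
intended behavior; only the value 20 comes from the text above.

```go
package main

import "fmt"

// defaultLimit mirrors the default of 20 mentioned above; the
// constant name is hypothetical.
const defaultLimit = 20

// boundedLimit never returns an unbounded value: requests that are
// unset (zero or negative) or larger than the default fall back to
// the default, anything else is used as given.
func boundedLimit(requested int) int {
	if requested <= 0 || requested > defaultLimit {
		return defaultLimit
	}
	return requested
}

func main() {
	fmt.Println(boundedLimit(0), boundedLimit(5), boundedLimit(1000000))
}
```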
- The parser of `git grep`'s output uses `bufio.Scanner`, which is a good
choice overall. However, it has a limit that usually goes unnoticed:
by default it will not read a line longer than `64 * 1024` bytes, and
that limit can be hit in practical scenarios.
- Use `bufio.Reader` instead, which doesn't have this limitation but is
a bit harder to work with as it's a lower-level primitive.
- Adds unit test.
- Resolves https://codeberg.org/forgejo/forgejo/issues/3149
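
The difference between the two readers can be reproduced with the
standalone sketch below. It is not the parser from this change: the
helpers `scanLines` and `readLines` are hypothetical and only count
lines.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// scanLines counts lines with bufio.Scanner and its default buffer,
// which gives up with bufio.ErrTooLong on a line that does not fit
// in bufio.MaxScanTokenSize (64 * 1024) bytes.
func scanLines(input string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	n := 0
	for sc.Scan() {
		n++
	}
	return n, sc.Err()
}

// readLines counts lines with bufio.Reader.ReadString, which grows
// its result as needed and has no fixed per-line limit.
func readLines(input string) (int, error) {
	rd := bufio.NewReader(strings.NewReader(input))
	n := 0
	for {
		line, err := rd.ReadString('\n')
		if len(line) > 0 {
			n++ // also counts a final line without trailing newline
		}
		if errors.Is(err, io.EOF) {
			return n, nil
		}
		if err != nil {
			return n, err
		}
	}
}

func main() {
	long := strings.Repeat("x", 64*1024+1) + "\nshort\n"

	_, err := scanLines(long)
	fmt.Println("scanner hit ErrTooLong:", errors.Is(err, bufio.ErrTooLong))

	n, err := readLines(long)
	fmt.Println("reader lines:", n, "err:", err)
}
```

The Scanner variant stops at the oversized line, while the Reader
variant counts both lines at the cost of handling `io.EOF` by hand.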