Fix for gitea putting everything into one request without batching and
sending it to Elasticsearch for indexing as issued in #28117
This issue occured in large repositories while Gitea tries to
index the code using ElasticSearch.
I've applied necessary changes that takes batch length from below config
(app.ini)
```
[queue.code_indexer]
BATCH_LENGTH=<length_int>
```
and batches all requests to Elasticsearch in chunks as configured in the
above config
Resolves https://github.com/go-gitea/gitea/issues/28704
Example of an entry in the generated `APKINDEX` file:
```
C:Q1xCO3H9LTTEbhKt9G1alSC87I56c=
P:hello
V:2.12-r1
A:x86_64
T:The GNU Hello program produces a familiar, friendly greeting
U:https://www.gnu.org/software/hello/
L:GPL-3.0-or-later
S:15403
I:36864
o:hello
m:
t:1705934118
D:so:libc.musl-x86_64.so.1
p:cmd:hello=2.12-r1
i:foobar=1.0 !baz
k:42
```
the `i:` and `k:` entries are new.
---------
Co-authored-by: KN4CK3R <admin@oldschoolhack.me>
Fixes#28660
Fixes an admin api bug related to `user.LoginSource`
Fixed `/user/emails` response not identical to GitHub api
This PR unifies the user update methods. The goal is to keep the logic
only at one place (having audit logs in mind). For example, do the
password checks only in one method not everywhere a password is updated.
After that PR is merged, the user creation should be next.
In #28691, schedule plans will be deleted when a repo's actions unit is
disabled. But when the unit is enabled, the schedule plans won't be
created again.
This PR fixes the bug. The schedule plans will be created again when the
actions unit is re-enabled
Renames it to `ENABLED` to be consistent with other settings and
deprecates it.
I believe this change is necessary because other setting groups such as
`attachment`, `cors`, `mailer`, etc. have an `ENABLED` setting, but
`oauth2` is the only one with an `ENABLE` setting, which could cause
confusion for users.
This is no longer a breaking change because `ENABLE` has been set as
deprecated and as an alias to `ENABLED`.
## Purpose
This is a refactor toward building an abstraction over managing git
repositories.
Afterwards, it does not matter anymore if they are stored on the local
disk or somewhere remote.
## What this PR changes
We used `git.OpenRepository` everywhere previously.
Now, we should split them into two distinct functions:
Firstly, there are temporary repositories which do not change:
```go
git.OpenRepository(ctx, diskPath)
```
Gitea managed repositories having a record in the database in the
`repository` table are moved into the new package `gitrepo`:
```go
gitrepo.OpenRepository(ctx, repo_model.Repo)
```
Why is `repo_model.Repository` the second parameter instead of file
path?
Because then we can easily adapt our repository storage strategy.
The repositories can be stored locally, however, they could just as well
be stored on a remote server.
## Further changes in other PRs
- A Git Command wrapper on package `gitrepo` could be created. i.e.
`NewCommand(ctx, repo_model.Repository, commands...)`. `git.RunOpts{Dir:
repo.RepoPath()}`, the directory should be empty before invoking this
method and it can be filled in the function only. #28940
- Remove the `RepoPath()`/`WikiPath()` functions to reduce the
possibility of mistakes.
---------
Co-authored-by: delvh <dev.lh@web.de>
The `ToUTF8*` functions were stripping BOM, while BOM is actually valid
in UTF8, so the stripping must be optional depending on use case. This
does:
- Add a options struct to all `ToUTF8*` functions, that by default will
strip BOM to preserve existing behaviour
- Remove `ToUTF8` function, it was dead code
- Rename `ToUTF8WithErr` to `ToUTF8`
- Preserve BOM in Monaco Editor
- Remove a unnecessary newline in the textarea value. Browsers did
ignore it, it seems but it's better not to rely on this behaviour.
Fixes: https://github.com/go-gitea/gitea/issues/28743
Related: https://github.com/go-gitea/gitea/issues/6716 which seems to
have once introduced a mechanism that strips and re-adds the BOM, but
from what I can tell, this mechanism was removed at some point after
that PR.
Fix#22066
# Purpose
This PR fix the releases will be deleted when mirror repository sync the
tags.
# The problem
In the previous implementation of #19125. All releases record in
databases of one mirror repository will be deleted before sync.
Ref:
https://github.com/go-gitea/gitea/pull/19125/files#diff-2aa04998a791c30e5a02b49a97c07fcd93d50e8b31640ce2ddb1afeebf605d02R481
# The Pros
This PR introduced a new method which will load all releases from
databases and all tags on git data into memory. And detect which tags
needs to be inserted, which tags need to be updated or deleted. Only
tags releases(IsTag=true) which are not included in git data will be
deleted, only tags which sha1 changed will be updated. So it will not
delete any real releases include drafts.
# The Cons
The drawback is the memory usage will be higher than before if there are
many tags on this repository. This PR defined a special release struct
to reduce columns loaded from database to memory.
This should fix https://github.com/go-gitea/gitea/issues/28927
Technically older versions of Git would support this flag as well, but
per https://github.com/go-gitea/gitea/pull/28466 that's the version
where using it (object-format=sha256) left "experimental" state.
`sha1` is (currently) the default, so older clients should be unaffected
in either case.
Signed-off-by: jolheiser <john.olheiser@gmail.com>
When LFS hooks are present in gitea-repositories, operations like git
push for creating a pull request fail. These repositories are not meant
to include LFS files or git push them, that is handled separately. And
so they should not have LFS hooks.
Installing git-lfs on some systems (like Debian Linux) will
automatically set up /etc/gitconfig to create LFS hooks in repositories.
For most git commands in Gitea this is not a problem, either because
they run on a temporary clone or the git command does not create LFS
hooks.
But one case where this happens is git archive for creating repository
archives. To fix that, add a GIT_CONFIG_NOSYSTEM=1 to disable using the
system configuration for that command.
According to a comment, GIT_CONFIG_NOSYSTEM is not used for all git
commands because the system configuration can be intentionally set up
for Gitea to use.
Resolves#19810, #21148
Sometimes you need to work on a feature which depends on another (unmerged) feature.
In this case, you may create a PR based on that feature instead of the main branch.
Currently, such PRs will be closed without the possibility to reopen in case the parent feature is merged and its branch is deleted.
Automatic target branch change make life a lot easier in such cases.
Github and Bitbucket behave in such way.
Example:
$PR_1$: main <- feature1
$PR_2$: feature1 <- feature2
Currently, merging $PR_1$ and deleting its branch leads to $PR_2$ being closed without the possibility to reopen.
This is both annoying and loses the review history when you open a new PR.
With this change, $PR_2$ will change its target branch to main ($PR_2$: main <- feature2) after $PR_1$ has been merged and its branch has been deleted.
This behavior is enabled by default but can be disabled.
For security reasons, this target branch change will not be executed when merging PRs targeting another repo.
Fixes#27062Fixes#18408
---------
Co-authored-by: Denys Konovalov <kontakt@denyskon.de>
Co-authored-by: delvh <dev.lh@web.de>
Fixes#26548
This PR refactors the rendering of markup links. The old code uses
`strings.Replace` to change some urls while the new code uses more
context to decide which link should be generated.
The added tests should ensure the same output for the old and new
behaviour (besides the bug).
We may need to refactor the rendering a bit more to make it clear how
the different helper methods render the input string. There are lots of
options (resolve links / images / mentions / git hashes / emojis / ...)
but you don't really know what helper uses which options. For example,
we currently support images in the user description which should not be
allowed I think:
<details>
<summary>Profile</summary>
https://try.gitea.io/KN4CK3R
![grafik](https://github.com/go-gitea/gitea/assets/1666336/109ae422-496d-4200-b52e-b3a528f553e5)
</details>
---------
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Fixes#27114.
* In Gitea 1.12 (#9532), a "dismiss stale approvals" branch protection
setting was introduced, for ignoring stale reviews when verifying the
approval count of a pull request.
* In Gitea 1.14 (#12674), the "dismiss review" feature was added.
* This caused confusion with users (#25858), as "dismiss" now means 2
different things.
* In Gitea 1.20 (#25882), the behavior of the "dismiss stale approvals"
branch protection was modified to actually dismiss the stale review.
For some users this new behavior of dismissing the stale reviews is not
desirable.
So this PR reintroduces the old behavior as a new "ignore stale
approvals" branch protection setting.
---------
Co-authored-by: delvh <dev.lh@web.de>
Fix#28157
This PR fix the possible bugs about actions schedule.
## The Changes
- Move `UpdateRepositoryUnit` and `SetRepoDefaultBranch` from models to
service layer
- Remove schedules plan from database and cancel waiting & running
schedules tasks in this repository when actions unit has been disabled
or global disabled.
- Remove schedules plan from database and cancel waiting & running
schedules tasks in this repository when default branch changed.
Mainly for MySQL/MSSQL.
It is important for Gitea to use case-sensitive database charset
collation. If the database is using a case-insensitive collation, Gitea
will show startup error/warning messages, and show the errors/warnings
on the admin panel's Self-Check page.
Make `gitea doctor convert` work for MySQL to convert the collations of
database & tables & columns.
* Fix#28131
## ⚠️ BREAKING ⚠️
It is not quite breaking, but it's highly recommended to convert the
database&table&column to a consistent and case-sensitive collation.
In #26365 issue references were disabled entirely for documents,
intending to match GitHub behavior. However cross-references do appear
to work in documents on GitHub.
This is useful for example to write release notes in a markdown document
and reference issues. While the simpler syntax may create links when not
intended, hopefully the cross-reference syntax is unique enough to avoid
it.
The CORS code has been unmaintained for long time, and the behavior is
not correct.
This PR tries to improve it. The key point is written as comment in
code. And add more tests.
Fix#28515Fix#27642Fix#17098
Fix#28526, regression of
* #26365
(although the author of #26365 has recent activities, but there is no
response for the regression, so I proposed this quick fix and keep the
fix simple to make it easier to backport to 1.21)
Nowadays, cache will be used on almost everywhere of Gitea and it cannot
be disabled, otherwise some features will become unaviable.
Then I think we can just remove the option for cache enable. That means
cache cannot be disabled.
But of course, we can still use cache configuration to set how should
Gitea use the cache.
The 4 functions are duplicated, especially as interface methods. I think
we just need to keep `MustID` the only one and remove other 3.
```
MustID(b []byte) ObjectID
MustIDFromString(s string) ObjectID
NewID(b []byte) (ObjectID, error)
NewIDFromString(s string) (ObjectID, error)
```
Introduced the new interfrace method `ComputeHash` which will replace
the interface `HasherInterface`. Now we don't need to keep two
interfaces.
Reintroduced `git.NewIDFromString` and `git.MustIDFromString`. The new
function will detect the hash length to decide which objectformat of it.
If it's 40, then it's SHA1. If it's 64, then it's SHA256. This will be
right if the commitID is a full one. So the parameter should be always a
full commit id.
@AdamMajer Please review.
- Modify the `Password` field in `CreateUserOption` struct to remove the
`Required` tag
- Update the `v1_json.tmpl` template to include the `email` field and
remove the `password` field
---------
Signed-off-by: Bo-Yi Wu <appleboy.tw@gmail.com>
Update golang.org/x/crypto for CVE-2023-48795 and update other packages.
`go-git` is not updated because it needs time to figure out why some
tests fail.
- If a topic has zero repository count, it means that none of the
repositories are using that topic, that would make them 'useless' to
keep. One caveat is that if that topic is going to be used in the
future, it will be added again to the database, but simply with a new
ID.
Refs: https://codeberg.org/forgejo/forgejo/pulls/1964
Co-authored-by: Gusted <postmaster@gusted.xyz>
- Remove `ObjectFormatID`
- Remove function `ObjectFormatFromID`.
- Use `Sha1ObjectFormat` directly but not a pointer because it's an
empty struct.
- Store `ObjectFormatName` in `repository` struct
- When a repository is orphaned and has objects stored in any of the
storages such as repository avatar or attachments the delete function
would error, because the storage module wasn't initalized.
- Add code to initialize the storage module.
Refs: https://codeberg.org/forgejo/forgejo/pulls/1954
Co-authored-by: Gusted <postmaster@gusted.xyz>
Refactor Hash interfaces and centralize hash function. This will allow
easier introduction of different hash function later on.
This forms the "no-op" part of the SHA256 enablement patch.
## Changes
- Add deprecation warning to `Token` and `AccessToken` authentication
methods in swagger.
- Add deprecation warning header to API response. Example:
```
HTTP/1.1 200 OK
...
Warning: token and access_token API authentication is deprecated
...
```
- Add setting `DISABLE_QUERY_AUTH_TOKEN` to reject query string auth
tokens entirely. Default is `false`
## Next steps
- `DISABLE_QUERY_AUTH_TOKEN` should be true in a subsequent release and
the methods should be removed in swagger
- `DISABLE_QUERY_AUTH_TOKEN` should be removed and the implementation of
the auth methods in question should be removed
## Open questions
- Should there be further changes to the swagger documentation?
Deprecation is not yet supported for security definitions (coming in
[OpenAPI Spec version
3.2.0](https://github.com/OAI/OpenAPI-Specification/issues/2506))
- Should the API router logger sanitize urls that use `token` or
`access_token`? (This is obviously an insufficient solution on its own)
---------
Co-authored-by: delvh <dev.lh@web.de>
1. Do not sort the "checks" slice again and again when "Register", it
just wastes CPU when the Gitea instance runs
2. If a check doesn't exist, tell the end user
3. Add some tests
The function `GetByBean` has an obvious defect that when the fields are
empty values, it will be ignored. Then users will get a wrong result
which is possibly used to make a security problem.
To avoid the possibility, this PR removed function `GetByBean` and all
references.
And some new generic functions have been introduced to be used.
The recommand usage like below.
```go
// if query an object according id
obj, err := db.GetByID[Object](ctx, id)
// query with other conditions
obj, err := db.Get[Object](ctx, builder.Eq{"a": a, "b":b})
```
It will fix#28268 .
<img width="1313" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc">
<img width="1318" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9">
## ⚠️ BREAKING ⚠️
But need to give up some features:
<img width="1312" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610">
However, such abandonment may fix#28055 .
## Backgroud
When the user switches the dashboard context to an org, it means they
want to search issues in the repos that belong to the org. However, when
they switch to themselves, it means all repos they can access because
they may have created an issue in a public repo that they don't own.
<img width="286" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97">
It's a confusing design. Think about this: What does "In your
repositories" mean when the user switches to an org? Repos belong to the
user or the org?
Whatever, it has been broken by #26012 and its following PRs. After the
PR, it searches for issues in repos that the dashboard context user owns
or has been explicitly granted access to, so it causes #28268.
## How to fix it
It's not really difficult to fix it. Just extend the repo scope to
search issues when the dashboard context user is the doer. Since the
user may create issues or be mentioned in any public repo, we can just
set `AllPublic` to true, which is already supported by indexers. The DB
condition will also support it in this PR.
But the real difficulty is how to count the search results grouped by
repos. It's something like "search issues with this keyword and those
filters, and return the total number and the top results. **Then, group
all of them by repo and return the counts of each group.**"
<img width="314" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4">
Before #26012, it was being done in the DB, but it caused the results to
be incomplete (see the description of #26012).
And to keep this, #26012 implement it in an inefficient way, just count
the issues by repo one by one, so it cannot work when `AllPublic` is
true because it's almost impossible to do this for all public repos.
1bfcdeef4c/modules/indexer/issues/indexer.go (L318-L338)
## Give up unnecessary features
We may can resovle `TODO: use "group by" of the indexer engines to
implement it`, I'm sure it can be done with Elasticsearch, but IIRC,
Bleve and Meilisearch don't support "group by".
And the real question is, does it worth it? Why should we need to know
the counts grouped by repos?
Let me show you my search dashboard on gitea.com.
<img width="1304" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff">
I never think the long repo list helps anything.
And if we agree to abandon it, things will be much easier. That is this
PR.
## TODO
I know it's important to filter by repos when searching issues. However,
it shouldn't be the way we have it now. It could be implemented like
this.
<img width="1316" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe">
The indexers support it well now, but it requires some frontend work,
which I'm not good at. So, I think someone could help do that in another
PR and merge this one to fix the bug first.
Or please block this PR and help to complete it.
Finally, "Switch dashboard context" is also a design that needs
improvement. In my opinion, it can be accomplished by adding filtering
conditions instead of "switching".
The summary string ends up in the database, and (at least) MySQL &
PostgreSQL require valid UTF8 strings.
Fixes#28178
Co-authored-by: Darrin Smart <darrin@filmlight.ltd.uk>
Previously only the first term had to be matched. That default
Meilisearch behavior makes sense for e.g. some kind of autocomplete to
find and select a single result. But for filtering issues it means you
can't narrow down results by adding more terms.
This is also more consistent with other indexers and GitHub.
---
Reference:
https://www.meilisearch.com/docs/reference/api/search#matching-strategy
The git command may operate the git directory (add/remove) files in any
time.
So when the code iterates the directory, some files may disappear during
the "walk". All "IsNotExist" errors should be ignored.
Fix#26765
gitea doctor failed at checking and fixing 'delete-orphaned-repos',
because table name 'user' needs quoting to be correctly recognized by at
least PostgreSQL.
fixes#28199
- Currently the repository description uses the same sanitizer as a
normal markdown document. This means that element such as heading and
images are allowed and can be abused.
- Create a minimal restricted sanitizer for the repository description,
which only allows what the postprocessor currently allows, which are
links and emojis.
- Added unit testing.
- Resolves https://codeberg.org/forgejo/forgejo/issues/1202
- Resolves https://codeberg.org/Codeberg/Community/issues/1122
(cherry picked from commit 631c87cc2347f0036a75dcd21e24429bbca28207)
Co-authored-by: Gusted <postmaster@gusted.xyz>
Fix#25473
Although there was `m.Post("/login/oauth/access_token", CorsHandler()...`,
it never really worked, because it still lacks the "OPTIONS" handler.
Fixes#27819
We have support for two factor logins with the normal web login and with
basic auth. For basic auth the two factor check was implemented at three
different places and you need to know that this check is necessary. This
PR moves the check into the basic auth itself.
- On user deletion, delete action runners that the user has created.
- Add a database consistency check to remove action runners that have
nonexistent belonging owner.
- Resolves https://codeberg.org/forgejo/forgejo/issues/1720
(cherry picked from commit 009ca7223dab054f7f760b7ccae69e745eebfabb)
Co-authored-by: Gusted <postmaster@gusted.xyz>
This patchset changes the connection string builder to use net.URL and
the host/port parser to use the stdlib function for splitting host from
port. It also adds a footnote about a potentially required portnumber
for postgres UNIX sockets.
Fixes: #24552
This PR adds a prefix path for all minio storage and override base path
will override the path.
The previous behavior is undefined officially, so it will be marked as
breaking.
After many refactoring PRs for the "locale" and "template context
function", now the ".locale" is not needed for web templates any more.
This PR does a clean up for:
1. Remove `ctx.Data["locale"]` for web context.
2. Use `ctx.Locale` in `500.tmpl`, for consistency.
3. Add a test check for `500 page` locale usage.
4. Remove the `Str2html` and `DotEscape` from mail template context
data, they are copy&paste errors introduced by #19169 and #16200 . These
functions are template functions (provided by the common renderer), but
not template data variables.
5. Make email `SendAsync` function mockable (I was planning to add more
tests but it would make this PR much too complex, so the tests could be
done in another PR)