The fix against the race incorrectly assumes the sha of the commit being
pushed belongs to the base repository. It finds the highest possible
pull request ID from the head repository instead of looking it up in
the base repository.
Figuring out if a PR was created in the future based on the highest
index of the base repository would require collecting all of them
because there is no way to know in advance which repository may be
involved in the race.
Fixing this race can be done either by:
* Introducing a new field in the pull_request table https://codeberg.org/forgejo/forgejo/pulls/2842
which feels more like a hack than a real solution
* Refactoring the logic
which would be a significant undertaking
The race has been in the codebase for a very long time and manifests
itself in the CI, when events happen in quick succession. The only
concrete manifestation was however fixed by https://codeberg.org/forgejo/forgejo/issues/2009
Since this race now only exists in theory and not in practice, let's
revert this bugous commit until a proper solution is implemented.
Fixes: https://codeberg.org/forgejo/forgejo/issues/2817
This reverts commit 036f1eddc5.
Conflicts:
services/pull/pull.go
Unlike other async processing in the queue, we should sync branches to
the DB immediately when handling git hook calling. If it fails, users
can see the error message in the output of the git command.
It can avoid potential inconsistency issues, and help #29494.
---------
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
(cherry picked from commit 25b842df261452a29570ba89ffc3a4842d73f68c)
Conflicts:
routers/web/repo/wiki.go
services/repository/branch.go
services/repository/migrate.go
services/wiki/wiki.go
also apply to Forgejo specific usage of the refactored functions
Now we can get object format name from git command line or from the
database repository table. Assume the column is right, we don't need to
read from git command line every time.
This also fixed a possible bug that the object format is wrong when
migrating a sha256 repository from external.
<img width="658" alt="image"
src="https://github.com/go-gitea/gitea/assets/81045/6e9a9dcf-13bf-4267-928b-6bf2c2560423">
(cherry picked from commit b79c30435f439af8243ee281310258cdf141e27b)
Conflicts:
routers/web/repo/blame.go
services/agit/agit.go
context
* Split TestPullRequest out of AddTestPullRequestTask
* Before scheduling the task, AddTestPullRequestTask stores the max
index of the repository
* When the task runs, it does not take into account pull requests that
have an index higher than the recorded max index
When AddTestPullRequestTask is called with isSync == true, it is the
direct consequence of a new commit being pushed. Forgejo knows nothing
of this new commit yet. If a PR is created later and its head
references the new commit, it will have an index that is higher and
must not be taken into account. It would be acting and triggering a
notification for a PR based on an event that happened before it
existed.
Refs: https://codeberg.org/forgejo/forgejo/issues/2009
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/2236
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Co-authored-by: Earl Warren <contact@earl-warren.org>
Co-committed-by: Earl Warren <contact@earl-warren.org>
(cherry picked from commit b3be895a30)
## Purpose
This is a refactor toward building an abstraction over managing git
repositories.
Afterwards, it does not matter anymore if they are stored on the local
disk or somewhere remote.
## What this PR changes
We used `git.OpenRepository` everywhere previously.
Now, we should split them into two distinct functions:
Firstly, there are temporary repositories which do not change:
```go
git.OpenRepository(ctx, diskPath)
```
Gitea managed repositories having a record in the database in the
`repository` table are moved into the new package `gitrepo`:
```go
gitrepo.OpenRepository(ctx, repo_model.Repo)
```
Why is `repo_model.Repository` the second parameter instead of file
path?
Because then we can easily adapt our repository storage strategy.
The repositories can be stored locally, however, they could just as well
be stored on a remote server.
## Further changes in other PRs
- A Git Command wrapper on package `gitrepo` could be created. i.e.
`NewCommand(ctx, repo_model.Repository, commands...)`. `git.RunOpts{Dir:
repo.RepoPath()}`, the directory should be empty before invoking this
method and it can be filled in the function only. #28940
- Remove the `RepoPath()`/`WikiPath()` functions to reduce the
possibility of mistakes.
---------
Co-authored-by: delvh <dev.lh@web.de>
- Remove `ObjectFormatID`
- Remove function `ObjectFormatFromID`.
- Use `Sha1ObjectFormat` directly but not a pointer because it's an
empty struct.
- Store `ObjectFormatName` in `repository` struct
Refactor Hash interfaces and centralize hash function. This will allow
easier introduction of different hash function later on.
This forms the "no-op" part of the SHA256 enablement patch.
Fix#28056
This PR will check whether the repo has zero branch when pushing a
branch. If that, it means this repository hasn't been synced.
The reason caused that is after user upgrade from v1.20 -> v1.21, he
just push branches without visit the repository user interface. Because
all repositories routers will check whether a branches sync is necessary
but push has not such check.
For every repository, it has two states, synced or not synced. If there
is zero branch for a repository, then it will be assumed as non-sync
state. Otherwise, it's synced state. So if we think it's synced, we just
need to update branch/insert new branch. Otherwise do a full sync. So
that, for every push, there will be almost no extra load added. It's
high performance than yours.
For the implementation, we in fact will try to update the branch first,
if updated success with affect records > 0, then all are done. Because
that means the branch has been in the database. If no record is
affected, that means the branch does not exist in database. So there are
two possibilities. One is this is a new branch, then we just need to
insert the record. Another is the branches haven't been synced, then we
need to sync all the branches into database.
- Push commits updates are run in a queue and updates can come from less
traceable places such as Git over SSH, therefor add more information
about on which repository the pushUpdate failed.
Refs: https://codeberg.org/forgejo/forgejo/pulls/1723
(cherry picked from commit 37ab946039)
Co-authored-by: Gusted <postmaster@gusted.xyz>
To avoid deadlock problem, almost database related functions should be
have ctx as the first parameter.
This PR do a refactor for some of these functions.
When branch's commit CommitMessage is too long, the column maybe too
short.(TEXT 16K for mysql).
This PR will fix it to only store the summary because these message will
only show on branch list or possible future search?
Related #14180
Related #25233
Related #22639Close#19786
Related #12763
This PR will change all the branches retrieve method from reading git
data to read database to reduce git read operations.
- [x] Sync git branches information into database when push git data
- [x] Create a new table `Branch`, merge some columns of `DeletedBranch`
into `Branch` table and drop the table `DeletedBranch`.
- [x] Read `Branch` table when visit `code` -> `branch` page
- [x] Read `Branch` table when list branch names in `code` page dropdown
- [x] Read `Branch` table when list git ref compare page
- [x] Provide a button in admin page to manually sync all branches.
- [x] Sync branches if repository is not empty but database branches are
empty when visiting pages with branches list
- [x] Use `commit_time desc` as the default FindBranch order by to keep
consistent as before and deleted branches will be always at the end.
---------
Co-authored-by: Jason Song <i@wolfogre.com>
Extract from #22743
`DeleteBranch` will trigger a push update event, so that
`pull_service.CloseBranchPulls` has been invoked twice and
`AddDeletedBranch` is better to be moved to push update then even user
delete a branch via git command, it will also be triggered.
Co-authored-by: Giteabot <teabot@gitea.io>
Before there was a "graceful function": RunWithShutdownFns, it's mainly
for some modules which doesn't support context.
The old queue system doesn't work well with context, so the old queues
need it.
After the queue refactoring, the new queue works with context well, so,
use Golang context as much as possible, the `RunWithShutdownFns` could
be removed (replaced by RunWithCancel for context cancel mechanism), the
related code could be simplified.
This PR also fixes some legacy queue-init problems, eg:
* typo : archiver: "unable to create codes indexer queue" => "unable to
create repo-archive queue"
* no nil check for failed queues, which causes unfriendly panic
After this PR, many goroutines could have better display name:
![image](https://github.com/go-gitea/gitea/assets/2114189/701b2a9b-8065-4137-aeaa-0bda2b34604a)
![image](https://github.com/go-gitea/gitea/assets/2114189/f1d5f50f-0534-40f0-b0be-f2c9daa5fe92)
This PR replaces all string refName as a type `git.RefName` to make the
code more maintainable.
Fix#15367
Replaces #23070
It also fixed a bug that tags are not sync because `git remote --prune
origin` will not remove local tags if remote removed.
We in fact should use `git fetch --prune --tags origin` but not `git
remote update origin` to do the sync.
Some answer from ChatGPT as ref.
> If the git fetch --prune --tags command is not working as expected,
there could be a few reasons why. Here are a few things to check:
>
>Make sure that you have the latest version of Git installed on your
system. You can check the version by running git --version in your
terminal. If you have an outdated version, try updating Git and see if
that resolves the issue.
>
>Check that your Git repository is properly configured to track the
remote repository's tags. You can check this by running git config
--get-all remote.origin.fetch and verifying that it includes
+refs/tags/*:refs/tags/*. If it does not, you can add it by running git
config --add remote.origin.fetch "+refs/tags/*:refs/tags/*".
>
>Verify that the tags you are trying to prune actually exist on the
remote repository. You can do this by running git ls-remote --tags
origin to list all the tags on the remote repository.
>
>Check if any local tags have been created that match the names of tags
on the remote repository. If so, these local tags may be preventing the
git fetch --prune --tags command from working properly. You can delete
local tags using the git tag -d command.
---------
Co-authored-by: delvh <dev.lh@web.de>
# ⚠️ Breaking
Many deprecated queue config options are removed (actually, they should
have been removed in 1.18/1.19).
If you see the fatal message when starting Gitea: "Please update your
app.ini to remove deprecated config options", please follow the error
messages to remove these options from your app.ini.
Example:
```
2023/05/06 19:39:22 [E] Removed queue option: `[indexer].ISSUE_INDEXER_QUEUE_TYPE`. Use new options in `[queue.issue_indexer]`
2023/05/06 19:39:22 [E] Removed queue option: `[indexer].UPDATE_BUFFER_LEN`. Use new options in `[queue.issue_indexer]`
2023/05/06 19:39:22 [F] Please update your app.ini to remove deprecated config options
```
Many options in `[queue]` are are dropped, including:
`WRAP_IF_NECESSARY`, `MAX_ATTEMPTS`, `TIMEOUT`, `WORKERS`,
`BLOCK_TIMEOUT`, `BOOST_TIMEOUT`, `BOOST_WORKERS`, they can be removed
from app.ini.
# The problem
The old queue package has some legacy problems:
* complexity: I doubt few people could tell how it works.
* maintainability: Too many channels and mutex/cond are mixed together,
too many different structs/interfaces depends each other.
* stability: due to the complexity & maintainability, sometimes there
are strange bugs and difficult to debug, and some code doesn't have test
(indeed some code is difficult to test because a lot of things are mixed
together).
* general applicability: although it is called "queue", its behavior is
not a well-known queue.
* scalability: it doesn't seem easy to make it work with a cluster
without breaking its behaviors.
It came from some very old code to "avoid breaking", however, its
technical debt is too heavy now. It's a good time to introduce a better
"queue" package.
# The new queue package
It keeps using old config and concept as much as possible.
* It only contains two major kinds of concepts:
* The "base queue": channel, levelqueue, redis
* They have the same abstraction, the same interface, and they are
tested by the same testing code.
* The "WokerPoolQueue", it uses the "base queue" to provide "worker
pool" function, calls the "handler" to process the data in the base
queue.
* The new code doesn't do "PushBack"
* Think about a queue with many workers, the "PushBack" can't guarantee
the order for re-queued unhandled items, so in new code it just does
"normal push"
* The new code doesn't do "pause/resume"
* The "pause/resume" was designed to handle some handler's failure: eg:
document indexer (elasticsearch) is down
* If a queue is paused for long time, either the producers blocks or the
new items are dropped.
* The new code doesn't do such "pause/resume" trick, it's not a common
queue's behavior and it doesn't help much.
* If there are unhandled items, the "push" function just blocks for a
few seconds and then re-queue them and retry.
* The new code doesn't do "worker booster"
* Gitea's queue's handlers are light functions, the cost is only the
go-routine, so it doesn't make sense to "boost" them.
* The new code only use "max worker number" to limit the concurrent
workers.
* The new "Push" never blocks forever
* Instead of creating more and more blocking goroutines, return an error
is more friendly to the server and to the end user.
There are more details in code comments: eg: the "Flush" problem, the
strange "code.index" hanging problem, the "immediate" queue problem.
Almost ready for review.
TODO:
* [x] add some necessary comments during review
* [x] add some more tests if necessary
* [x] update documents and config options
* [x] test max worker / active worker
* [x] re-run the CI tasks to see whether any test is flaky
* [x] improve the `handleOldLengthConfiguration` to provide more
friendly messages
* [x] fine tune default config values (eg: length?)
## Code coverage:
![image](https://user-images.githubusercontent.com/2114189/236620635-55576955-f95d-4810-b12f-879026a3afdf.png)
When the base repository contains multiple branches with the same
commits as the base branch, pull requests can show a long list of
commits already in the base branch as having been added.
What this is supposed to do is exclude commits already in the base
branch. But the mechansim to do so assumed a commit only exists in a
single branch. Now use `git rev-list A B --not branchName` instead of
filtering commits afterwards.
The logic to detect if there was a force push also was wrong for
multiple branches. If the old commit existed in any branch in the base
repository it would assume there was no force push. Instead check if the
old commit is an ancestor of the new commit.
Related to: #22294#23186#23054
Replace: #23218
Some discussion is in the comments of #23218.
Highlights:
- Add Expiration for cache context. If a cache context has been used for
more than 10s, the cache data will be ignored, and warning logs will be
printed.
- Add `discard` field to `cacheContext`, a `cacheContext` with `discard`
true will drop all cached data and won't store any new one.
- Introduce `WithNoCacheContext`, if one wants to run long-life tasks,
but the parent context is a cache context,
`WithNoCacheContext(perentCtx)` will discard the cache data, so it will
be safe to keep the context for a long time.
- It will be fine to treat an original context as a cache context, like
`GetContextData(context.Backgraud())`, no warning logs will be printed.
Some cases about nesting:
When:
- *A*, *B* or *C* means a cache context.
- ~*A*~, ~*B*~ or ~*C*~ means a discard cache context.
- `ctx` means `context.Backgrand()`
- *A(ctx)* means a cache context with `ctx` as the parent context.
- *B(A(ctx))* means a cache context with `A(ctx)` as the parent context.
- `With` means `WithCacheContext`
- `WithNo` means `WithNoCacheContext`
So:
- `With(ctx)` -> *A(ctx)*
- `With(With(ctx))` -> *A(ctx)*, not *B(A(ctx))*
- `With(With(With(ctx)))` -> *A(ctx)*, not *C(B(A(ctx)))*
- `WithNo(ctx)` -> *ctx*, not *~A~(ctx)*
- `WithNo(With(ctx))` -> *~A~(ctx)*
- `WithNo(WithNo(With(ctx)))` -> *~A~(ctx)*, not *~B~(~A~(ctx))*
- `With(WithNo(With(ctx)))` -> *B(~A~(ctx))*
- `WithNo(With(WithNo(With(ctx))))` -> *~B~(~A~(ctx))*
- `With(WithNo(With(WithNo(With(ctx)))))` -> *C(~B~(~A~(ctx)))*
To avoid duplicated load of the same data in an HTTP request, we can set
a context cache to do that. i.e. Some pages may load a user from a
database with the same id in different areas on the same page. But the
code is hidden in two different deep logic. How should we share the
user? As a result of this PR, now if both entry functions accept
`context.Context` as the first parameter and we just need to refactor
`GetUserByID` to reuse the user from the context cache. Then it will not
be loaded twice on an HTTP request.
But of course, sometimes we would like to reload an object from the
database, that's why `RemoveContextData` is also exposed.
The core context cache is here. It defines a new context
```go
type cacheContext struct {
ctx context.Context
data map[any]map[any]any
lock sync.RWMutex
}
var cacheContextKey = struct{}{}
func WithCacheContext(ctx context.Context) context.Context {
return context.WithValue(ctx, cacheContextKey, &cacheContext{
ctx: ctx,
data: make(map[any]map[any]any),
})
}
```
Then you can use the below 4 methods to read/write/del the data within
the same context.
```go
func GetContextData(ctx context.Context, tp, key any) any
func SetContextData(ctx context.Context, tp, key, value any)
func RemoveContextData(ctx context.Context, tp, key any)
func GetWithContextCache[T any](ctx context.Context, cacheGroupKey string, cacheTargetID any, f func() (T, error)) (T, error)
```
Then let's take a look at how `system.GetString` implement it.
```go
func GetSetting(ctx context.Context, key string) (string, error) {
return cache.GetWithContextCache(ctx, contextCacheKey, key, func() (string, error) {
return cache.GetString(genSettingCacheKey(key), func() (string, error) {
res, err := GetSettingNoCache(ctx, key)
if err != nil {
return "", err
}
return res.SettingValue, nil
})
})
}
```
First, it will check if context data include the setting object with the
key. If not, it will query from the global cache which may be memory or
a Redis cache. If not, it will get the object from the database. In the
end, if the object gets from the global cache or database, it will be
set into the context cache.
An object stored in the context cache will only be destroyed after the
context disappeared.
The update by rebase code reuses the merge code but shortcircuits and
pushes back up to the head. However, it doesn't set the correct pushing
environment - and just uses the same environment as the base repo. This
leads to the push update failing and thence the PR becomes out-of-sync
with the head.
This PR fixes this and adjusts the trace logging elsewhere to help make
this clearer.
Fix#18802
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
After #22362, we can feel free to use transactions without
`db.DefaultContext`.
And there are still lots of models using `db.DefaultContext`, I think we
should refactor them carefully and one by one.
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Change all license headers to comply with REUSE specification.
Fix#16132
Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github>
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
This PR adds a context parameter to a bunch of methods. Some helper
`xxxCtx()` methods got replaced with the normal name now.
Co-authored-by: delvh <dev.lh@web.de>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Fix#19513
This PR introduce a new db method `InTransaction(context.Context)`,
and also builtin check on `db.TxContext` and `db.WithTx`.
There is also a new method `db.AutoTx` has been introduced but could be used by other PRs.
`WithTx` will always open a new transaction, if a transaction exist in context, return an error.
`AutoTx` will try to open a new transaction if no transaction exist in context.
That means it will always enter a transaction if there is no error.
Co-authored-by: delvh <dev.lh@web.de>
Co-authored-by: 6543 <6543@obermui.de>
For normal commits the notification url was wrong because oldCommitID is received from the shrinked commits list.
This PR moves the commits list shrinking after the oldCommitID assignment.
* Move access and repo permission to models/perm/access
* fix test
* Move some git related files into sub package models/git
* Fix build
* fix git test
* move lfs to sub package
* move more git related functions to models/git
* Move functions sequence
* Some improvements per @KN4CK3R and @delvh
* Move some repository related code into sub package
* Move more repository functions out of models
* Fix lint
* Some performance optimization for webhooks and others
* some refactors
* Fix lint
* Fix
* Update modules/repository/delete.go
Co-authored-by: delvh <dev.lh@web.de>
* Fix test
* Merge
* Fix test
* Fix test
* Fix test
* Fix test
Co-authored-by: delvh <dev.lh@web.de>
When a new branch is pushed the old SHA is always listed as the empty sha and thus the compare link that is created does not work correctly.
Therefore when creating the compare link for new branches:
1. Attempt to get the parent of the first commit and use that as the basis
for the compare link.
2. If this is not possible make a comparison to the default branch
3. Finally if that is not possible simply do not show a compare link.
However, there are multiple broken compare links remaining therefore, in order for these to not break we will simply make the compare link redirect to the default branch.
Fix#19144
Signed-off-by: a1012112796 <1012112796@qq.com>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Andrew Thornton <art27@cantab.net>
* Remove `db.DefaultContext` usage in routers, use `ctx` directly
* Use `ctx` directly if there is one, remove some `db.DefaultContext` in `services`
* Use ctx instead of db.DefaultContext for `cmd` and some `modules` packages
* fix incorrect context usage
* Start adding mechanism to return unhandled data
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Create pushback interface
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Add Pausable interface to WorkerPool and Manager
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Implement Pausable and PushBack for the bytefifos
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Implement Pausable and Pushback for ChannelQueues and ChannelUniqueQueues
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Wire in UI for pausing
Signed-off-by: Andrew Thornton <art27@cantab.net>
* add testcases and fix a few issues
Signed-off-by: Andrew Thornton <art27@cantab.net>
* fix build
Signed-off-by: Andrew Thornton <art27@cantab.net>
* prevent "race" in the test
Signed-off-by: Andrew Thornton <art27@cantab.net>
* fix jsoniter mismerge
Signed-off-by: Andrew Thornton <art27@cantab.net>
* fix conflicts
Signed-off-by: Andrew Thornton <art27@cantab.net>
* fix format
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Add warnings for no worker configurations and prevent data-loss with redis/levelqueue
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Use StopTimer
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Lauris BH <lauris@nix.lv>
Co-authored-by: 6543 <6543@obermui.de>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
This PR continues the work in #17125 by progressively ensuring that git
commands run within the request context.
This now means that the if there is a git repo already open in the context it will be used instead of reopening it.
Signed-off-by: Andrew Thornton <art27@cantab.net>