0
0
Fork 0
mirror of https://codeberg.org/forgejo/forgejo.git synced 2025-01-25 22:40:13 +01:00
Commit graph

36 commits

Author SHA1 Message Date
Jason Song
375fd15fbf
Refactor indexer ()
Refactor `modules/indexer` to make it more maintainable. And it can be
easier to support more features. I'm trying to solve some of issue
searching, this is a precursor to making functional changes.

Current supported engines and the index versions:

| engines | issues | code |
| - | - | - |
| db | Just a wrapper for database queries, doesn't need version | - |
| bleve | The version of index is **2** | The version of index is **6**
|
| elasticsearch | The old index has no version, will be treated as
version **0** in this PR | The version of index is **1** |
| meilisearch | The old index has no version, will be treated as version
**0** in this PR | - |


## Changes

### Split

Splited it into mutiple packages

```text
indexer
├── internal
│   ├── bleve
│   ├── db
│   ├── elasticsearch
│   └── meilisearch
├── code
│   ├── bleve
│   ├── elasticsearch
│   └── internal
└── issues
    ├── bleve
    ├── db
    ├── elasticsearch
    ├── internal
    └── meilisearch
```

- `indexer/interanal`: Internal shared package for indexer.
- `indexer/interanal/[engine]`: Internal shared package for each engine
(bleve/db/elasticsearch/meilisearch).
- `indexer/code`: Implementations for code indexer.
- `indexer/code/internal`: Internal shared package for code indexer.
- `indexer/code/[engine]`: Implementation via each engine for code
indexer.
- `indexer/issues`: Implementations for issues indexer.

### Deduplication

- Combine `Init/Ping/Close` for code indexer and issues indexer.
- ~Combine `issues.indexerHolder` and `code.wrappedIndexer` to
`internal.IndexHolder`.~ Remove it, use dummy indexer instead when the
indexer is not ready.
- Duplicate two copies of creating ES clients.
- Duplicate two copies of `indexerID()`.


### Enhancement

- [x] Support index version for elasticsearch issues indexer, the old
index without version will be treated as version 0.
- [x] Fix spell of `elastic_search/ElasticSearch`, it should be
`Elasticsearch`.
- [x] Improve versioning of ES index. We don't need `Aliases`:
- Gitea does't need aliases for "Zero Downtime" because it never delete
old indexes.
- The old code of issues indexer uses the orignal name to create issue
index, so it's tricky to convert it to an alias.
- [x] Support index version for meilisearch issues indexer, the old
index without version will be treated as version 0.
- [x] Do "ping" only when `Ping` has been called, don't ping
periodically and cache the status.
- [x] Support the context parameter whenever possible.
- [x] Fix outdated example config.
- [x] Give up the requeue logic of issues indexer: When indexing fails,
call Ping to check if it was caused by the engine being unavailable, and
only requeue the task if the engine is unavailable.
- It is fragile and tricky, could cause data losing (It did happen when
I was doing some tests for this PR). And it works for ES only.
- Just always requeue the failed task, if it caused by bad data, it's a
bug of Gitea which should be fixed.

---------

Co-authored-by: Giteabot <teabot@gitea.io>
2023-06-23 12:37:56 +00:00
wxiaoguang
18f26cfbf7
Improve queue and logger context ()
Before there was a "graceful function": RunWithShutdownFns, it's mainly
for some modules which doesn't support context.

The old queue system doesn't work well with context, so the old queues
need it.

After the queue refactoring, the new queue works with context well, so,
use Golang context as much as possible, the `RunWithShutdownFns` could
be removed (replaced by RunWithCancel for context cancel mechanism), the
related code could be simplified.

This PR also fixes some legacy queue-init problems, eg:

* typo : archiver: "unable to create codes indexer queue" => "unable to
create repo-archive queue"
* no nil check for failed queues, which causes unfriendly panic

After this PR, many goroutines could have better display name:

![image](https://github.com/go-gitea/gitea/assets/2114189/701b2a9b-8065-4137-aeaa-0bda2b34604a)

![image](https://github.com/go-gitea/gitea/assets/2114189/f1d5f50f-0534-40f0-b0be-f2c9daa5fe92)
2023-05-26 07:31:55 +00:00
Lunny Xiao
c355728a6f
Remove unnecessary code ()
As title, remove unnecessary code.
2023-05-10 04:57:06 +00:00
wxiaoguang
6f9c278559
Rewrite queue ()
# ⚠️ Breaking

Many deprecated queue config options are removed (actually, they should
have been removed in 1.18/1.19).

If you see the fatal message when starting Gitea: "Please update your
app.ini to remove deprecated config options", please follow the error
messages to remove these options from your app.ini.

Example:

```
2023/05/06 19:39:22 [E] Removed queue option: `[indexer].ISSUE_INDEXER_QUEUE_TYPE`. Use new options in `[queue.issue_indexer]`
2023/05/06 19:39:22 [E] Removed queue option: `[indexer].UPDATE_BUFFER_LEN`. Use new options in `[queue.issue_indexer]`
2023/05/06 19:39:22 [F] Please update your app.ini to remove deprecated config options
```

Many options in `[queue]` are are dropped, including:
`WRAP_IF_NECESSARY`, `MAX_ATTEMPTS`, `TIMEOUT`, `WORKERS`,
`BLOCK_TIMEOUT`, `BOOST_TIMEOUT`, `BOOST_WORKERS`, they can be removed
from app.ini.

# The problem

The old queue package has some legacy problems:

* complexity: I doubt few people could tell how it works.
* maintainability: Too many channels and mutex/cond are mixed together,
too many different structs/interfaces depends each other.
* stability: due to the complexity & maintainability, sometimes there
are strange bugs and difficult to debug, and some code doesn't have test
(indeed some code is difficult to test because a lot of things are mixed
together).
* general applicability: although it is called "queue", its behavior is
not a well-known queue.
* scalability: it doesn't seem easy to make it work with a cluster
without breaking its behaviors.

It came from some very old code to "avoid breaking", however, its
technical debt is too heavy now. It's a good time to introduce a better
"queue" package.

# The new queue package

It keeps using old config and concept as much as possible.

* It only contains two major kinds of concepts:
    * The "base queue": channel, levelqueue, redis
* They have the same abstraction, the same interface, and they are
tested by the same testing code.
* The "WokerPoolQueue", it uses the "base queue" to provide "worker
pool" function, calls the "handler" to process the data in the base
queue.
* The new code doesn't do "PushBack"
* Think about a queue with many workers, the "PushBack" can't guarantee
the order for re-queued unhandled items, so in new code it just does
"normal push"
* The new code doesn't do "pause/resume"
* The "pause/resume" was designed to handle some handler's failure: eg:
document indexer (elasticsearch) is down
* If a queue is paused for long time, either the producers blocks or the
new items are dropped.
* The new code doesn't do such "pause/resume" trick, it's not a common
queue's behavior and it doesn't help much.
* If there are unhandled items, the "push" function just blocks for a
few seconds and then re-queue them and retry.
* The new code doesn't do "worker booster"
* Gitea's queue's handlers are light functions, the cost is only the
go-routine, so it doesn't make sense to "boost" them.
* The new code only use "max worker number" to limit the concurrent
workers.
* The new "Push" never blocks forever
* Instead of creating more and more blocking goroutines, return an error
is more friendly to the server and to the end user.

There are more details in code comments: eg: the "Flush" problem, the
strange "code.index" hanging problem, the "immediate" queue problem.

Almost ready for review.

TODO:

* [x] add some necessary comments during review
* [x] add some more tests if necessary
* [x] update documents and config options
* [x] test max worker / active worker
* [x] re-run the CI tasks to see whether any test is flaky
* [x] improve the `handleOldLengthConfiguration` to provide more
friendly messages
* [x] fine tune default config values (eg: length?)

## Code coverage:

![image](https://user-images.githubusercontent.com/2114189/236620635-55576955-f95d-4810-b12f-879026a3afdf.png)
2023-05-08 19:49:59 +08:00
Lunny Xiao
5cf7da63ee
Refactor config provider ()
This PR introduces more abstract about `ConfigProvider` and hides more `ini` references.

---------

Co-authored-by: delvh <dev.lh@web.de>
2023-04-25 23:06:39 +08:00
wxiaoguang
e422342eeb
Allow adding new files to an empty repo ()
![image](https://user-images.githubusercontent.com/2114189/232561612-2bfcfd0a-fc04-47ba-965f-5d0bcea46c54.png)
2023-04-19 21:40:42 +08:00
Lunny Xiao
c53ad052d8
Refactor the setting to make unit test easier ()
Some bugs caused by less unit tests in fundamental packages. This PR
refactor `setting` package so that create a unit test will be easier
than before.

- All `LoadFromXXX` files has been splited as two functions, one is
`InitProviderFromXXX` and `LoadCommonSettings`. The first functions will
only include the code to create or new a ini file. The second function
will load common settings.
- It also renames all functions in setting from `newXXXService` to
`loadXXXSetting` or `loadXXXFrom` to make the function name less
confusing.
- Move `XORMLog` to `SQLLog` because it's a better name for that.

Maybe we should finally move these `loadXXXSetting` into the `XXXInit`
function? Any idea?

---------

Co-authored-by: 6543 <6543@obermui.de>
Co-authored-by: delvh <dev.lh@web.de>
2023-02-20 00:12:01 +08:00
Lunny Xiao
0a7d3ff786
refactor some functions to support ctx as first parameter ()
Co-authored-by: KN4CK3R <admin@oldschoolhack.me>
Co-authored-by: Lauris BH <lauris@nix.lv>
2022-12-03 10:48:26 +08:00
flynnnnnnnnnn
e81ccc406b
Implement FSFE REUSE for golang files ()
Change all license headers to comply with REUSE specification.

Fix 

Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github>
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
2022-11-27 18:20:29 +00:00
wxiaoguang
88f2e457d8
Fix data-race problems in git module (quick patch) ()
* Fix data-race problems in git module

* use HomeDir instead of setting.RepoRootPath

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2022-06-11 11:56:27 +08:00
wxiaoguang
a0051634b9
Refactor git module, make Gitea use internal git config ()
* Refactor git module, make Gitea use internal git config, add safe.directory config

* introduce git.InitSimple and git.InitWithConfigSync, make serv cmd use gitconfig

* use HOME instead of GIT_CONFIG_GLOBAL, because git always needs a correct HOME

* fix cmd env in cmd/serv.go

* fine tune error message

* Fix a incorrect test case

* fix configAddNonExist

* fix configAddNonExist logic, add `--fixed-value` flag, add tests

* add configSetNonExist function in case it's needed.

* use configSetNonExist for `user.name` and `user.email`

* add some comments

* Update cmd/serv.go

Co-authored-by: zeripath <art27@cantab.net>

* Update cmd/serv.go

Co-authored-by: zeripath <art27@cantab.net>

* Update modules/git/git.go

Co-authored-by: zeripath <art27@cantab.net>

* Update modules/setting/setting.go

Co-authored-by: zeripath <art27@cantab.net>

* Update modules/git/repo_attribute.go

Co-authored-by: zeripath <art27@cantab.net>

* fix spaces in messages

* use `configSet("core.protectNTFS", ...)` instead of `globalCommandArgs`

* remove GIT_CONFIG_NOSYSTEM, continue to use system's git config

* Update cmd/serv.go

Co-authored-by: zeripath <art27@cantab.net>

* fix merge

* remove code for safe.directory

* separate git.CommonEnvs to CommonGitCmdEnvs and CommonCmdServEnvs

* avoid Golang's data race error

Co-authored-by: zeripath <art27@cantab.net>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2022-06-10 09:57:49 +08:00
Lunny Xiao
fd7d83ace6
Move almost all functions' parameter db.Engine to context.Context ()
* Move almost all functions' parameter db.Engine to context.Context
* remove some unnecessary wrap functions
2022-05-20 22:08:52 +08:00
Lunny Xiao
b8911fb456
Use a struct as test options ()
* Use a struct as test options

* Fix name

* Fix test
2022-04-14 21:58:21 +08:00
6543
3e88af898a
Make git.OpenRepository accept Context ()
* OpenRepositoryCtx -> OpenRepository
* OpenRepository -> openRepositoryWithDefaultContext, only for internal usage
2022-03-30 03:13:41 +08:00
zeripath
d3dbdbe6c5
Prevent intermittent failures in RepoIndexerTest (2) ()
So whilst  fixes one issue it caused another. We need to initialise the Git
module first.

Related 
Fix 

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: 6543 <6543@obermui.de>
2022-03-27 17:54:51 -04:00
zeripath
793ce9dacf
Prevent intermittent failures in RepoIndexerTest ()
The RepoIndexerTest is failing with considerable frequency due to a race inherrent in
its design. This PR adjust this test to avoid the reliance on waiting for the populate
repo indexer to run and forcibly adds the repo to the queue. It then flushes the queue.

It may be worth separating out the tests somewhat by testing the Index function
directly away from the queue however, this forceful method should solve the current
problem.

Signed-off-by: Andrew Thornton <art27@cantab.net>
2022-03-27 15:05:01 +08:00
zeripath
f1c6cf7c51
Prevent Stats Indexer reporting error if repo dir missing ()
Repositories missing their directory should not report an error from the stats
indexer.

Close 

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2022-02-24 23:22:09 -05:00
zeripath
a82fd98d53
Pause queues ()
* Start adding mechanism to return unhandled data

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Create pushback interface

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Add Pausable interface to WorkerPool and Manager

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Implement Pausable and PushBack for the bytefifos

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Implement Pausable and Pushback for ChannelQueues and ChannelUniqueQueues

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Wire in UI for pausing

Signed-off-by: Andrew Thornton <art27@cantab.net>

* add testcases and fix a few issues

Signed-off-by: Andrew Thornton <art27@cantab.net>

* fix build

Signed-off-by: Andrew Thornton <art27@cantab.net>

* prevent "race" in the test

Signed-off-by: Andrew Thornton <art27@cantab.net>

* fix jsoniter mismerge

Signed-off-by: Andrew Thornton <art27@cantab.net>

* fix conflicts

Signed-off-by: Andrew Thornton <art27@cantab.net>

* fix format

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Add warnings for no worker configurations and prevent data-loss with redis/levelqueue

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Use StopTimer

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: Lauris BH <lauris@nix.lv>
Co-authored-by: 6543 <6543@obermui.de>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
2022-01-22 21:22:14 +00:00
6543
54e9ee37a7
format with gofumpt ()
* gofumpt -w -l .

* gofumpt -w -l -extra .

* Add linter

* manual fix

* change make fmt
2022-01-20 18:46:10 +01:00
qwerty287
9d943bf374
Add missing X-Total-Count and fix some related bugs ()
* Add missing `X-Total-Count` and fix some related bugs

Adds `X-Total-Count` header to APIs that return a list but doesn't have it yet.
Fixed bugs:
* not returned after reporting error (39eb82446c/routers/api/v1/user/star.go (L70))
* crash with index out of bounds, API issue/issueSubscriptions

I also found various endpoints that return lists but do not apply/support pagination yet:
```
/repos/{owner}/{repo}/issues/{index}/labels
/repos/{owner}/{repo}/issues/comments/{id}/reactions
/repos/{owner}/{repo}/branch_protections
/repos/{owner}/{repo}/contents
/repos/{owner}/{repo}/hooks/git
/repos/{owner}/{repo}/issue_templates
/repos/{owner}/{repo}/releases/{id}/assets
/repos/{owner}/{repo}/reviewers
/repos/{owner}/{repo}/teams
/user/emails
/users/{username}/heatmap
```
If this is not expected, an new issue should be opened.

Closes 

* fmt

* Update routers/api/v1/repo/issue_subscription.go

Co-authored-by: KN4CK3R <admin@oldschoolhack.me>

* Use FindAndCount

Co-authored-by: KN4CK3R <admin@oldschoolhack.me>
Co-authored-by: 6543 <6543@obermui.de>
2021-12-15 13:39:34 +08:00
Lunny Xiao
719bddcd76
Move repository model into models/repo ()
* Some refactors related repository model

* Move more methods out of repository

* Move repository into models/repo

* Fix test

* Fix test

* some improvements

* Remove unnecessary function
2021-12-10 09:27:50 +08:00
zeripath
01087e9eef
Make Requests Processes and create process hierarchy. Associate OpenRepository with context. ()
This PR registers requests with the process manager and manages hierarchy within the processes.

Git repos are then associated with a context, (usually the request's context) - with sub commands using this context as their base context.

Signed-off-by: Andrew Thornton <art27@cantab.net>
2021-11-30 20:06:32 +00:00
wxiaoguang
750a8465f5
A better go code formatter, and now make fmt can run in Windows ()
* go build / format tools
* re-format imports
2021-11-17 20:34:35 +08:00
wxiaoguang
df64fa4865
Decouple unit test code from business code () 2021-11-12 22:36:47 +08:00
Lunny Xiao
4a57c9ea17
Fix some lints ()
Fix some linting problems.
2021-10-17 20:47:12 +01:00
wxiaoguang
b231d0deab
Ignore Sync errors on pipes when doing CheckAttributeReader.CheckPath, fix the hang of git cat-file ()
* Ignore Sync errors on pipes when doing `CheckAttributeReader.CheckPath`

* apply env patch

* Drop the Sync and fix a number of issues with the Close function

Signed-off-by: Andrew Thornton <art27@cantab.net>

* add logs for DBIndexer and CheckPath

* Fix some more closing bugs

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Add test case for language_stats

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Update modules/indexer/stats/db.go

Co-authored-by: Lauris BH <lauris@nix.lv>

Co-authored-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Lauris BH <lauris@nix.lv>
Co-authored-by: 6543 <6543@obermui.de>
2021-09-20 20:46:51 +01:00
Lunny Xiao
a4bfef265d
Move db related basic functions to models/db ()
* Move db related basic functions to models/db

* Fix lint

* Fix lint

* Fix test

* Fix lint

* Fix lint

* revert unnecessary change

* Fix test

* Fix wrong replace string

* Use *Context

* Correct committer spelling and fix wrong replaced words

Co-authored-by: zeripath <art27@cantab.net>
2021-09-19 19:49:59 +08:00
zeripath
d6a33cef23
If the default branch is not present do not report error on stats indexing (follow-up of ) ()
 doesn't completely fix this problem because the error returned is an ObjectNotExist
error not a BranchNotExist error.

Add test for ErrObjectNotExist too

Fix 

Signed-off-by: Andrew Thornton <art27@cantab.net>
2021-04-22 17:35:29 +02:00
zeripath
f719ffc783
If the default branch is not present do not report error on stats indexing ()
* If the default branch is not present do not report error on stats indexing

Fix 

Signed-off-by: Andrew Thornton <art27@cantab.net>

* as per lunny

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: techknowlogick <techknowlogick@gitea.io>
2021-04-22 09:19:21 +08:00
zeripath
511f6138d4
Use native git variants by default with go-git variants as build tag ()
* Move last commit cache back into modules/git

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Remove go-git from the interface for last commit cache

Signed-off-by: Andrew Thornton <art27@cantab.net>

* move cacheref to last_commit_cache

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Remove go-git from routers/private/hook

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Move FindLFSFiles to pipeline

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Make no-go-git variants

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Submodule RefID

Signed-off-by: Andrew Thornton <art27@cantab.net>

* fix issue with GetCommitsInfo

Signed-off-by: Andrew Thornton <art27@cantab.net>

* fix GetLastCommitForPaths

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Improve efficiency

Signed-off-by: Andrew Thornton <art27@cantab.net>

* More efficiency

Signed-off-by: Andrew Thornton <art27@cantab.net>

* even faster

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Reduce duplication

* As per @lunny

Signed-off-by: Andrew Thornton <art27@cantab.net>

* attempt to fix drone

Signed-off-by: Andrew Thornton <art27@cantab.net>

* fix test-tags

Signed-off-by: Andrew Thornton <art27@cantab.net>

* default to use no-go-git variants and add gogit build tag

Signed-off-by: Andrew Thornton <art27@cantab.net>

* placate lint

Signed-off-by: Andrew Thornton <art27@cantab.net>

* as per @6543

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: 6543 <6543@obermui.de>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
2020-12-17 22:00:47 +08:00
zeripath
bac65f1b82
Fix incorrect error logging in Stats indexer and OAuth2 ()
* Fix incorrect logging in oauth2.go

Fix 

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Handle ErrAlreadyInQueue in stats indexer

Fix 

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Fixes type in error message of indexer

Add the missing character in the error message.

Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Co-authored-by: Lieven Hollevoet <hollie@lika.be>
2020-08-01 10:45:26 -04:00
Cirno the Strongest
9d652002c6
Fix language stat calculation ()
* Fix language stat calculation

* Group languages and ignore 0 size files

* remove unneeded code
2020-05-31 01:58:55 +03:00
Lauris BH
ea4c139cd2
Change language statistics to save size instead of percentage ()
* Change language statistics to save size instead of percentage in database

Co-Authored-By: Cirno the Strongest <1447794+CirnoT@users.noreply.github.com>

* Do not exclude if only language

* Fix edge cases with special langauges

Co-authored-by: Cirno the Strongest <1447794+CirnoT@users.noreply.github.com>
2020-05-30 10:46:15 +03:00
Lauris BH
a1d796f521
Index code and stats only for non-empty repositories ()
Fix test and switch to unique queue

Fix MySQL support when deleting old statistics
2020-02-14 13:42:30 +01:00
Lunny Xiao
3d69bbd58f
Fix queue pop error and stat empty repository error ()
* Fix queue pop error and stat empty repository error

* Fix error
2020-02-12 18:12:27 +08:00
Lauris BH
ad2642a8aa
Language statistics bar for repositories ()
* Implementation for calculating language statistics

Impement saving code language statistics to database

Implement rendering langauge stats

Add primary laguage to show in repository list

Implement repository stats indexer queue

Add indexer test

Refactor to use queue module

* Do not timeout for queues
2020-02-11 11:34:17 +02:00