minio

Author	SHA1	Message	Date
Klaus Post	8a6b13c239	Avoid synchronizing usage writes (#11560 ) If the periodic `case <-t.C:` save gets held up for a long time it will end up synchronize all disk writes for saving the caches. We add jitter to per set writes so they don't sync up and don't hold a lock for the write, since it isn't needed anyway. If an outage prevents writes for a long while we also add individual waits for each disk in case there was a queue. Furthermore limit the number of buffers kept to 2GiB, since this could get huge in large clusters. This will not act as a hard limit but should be enough for normal operation.	2021-02-18 00:38:37 -08:00
Harshavardhana	ffea6fcf09	fix: rename crawler as scanner in config (#11549 )	2021-02-17 12:04:11 -08:00
Krishnan Parthasarathi	b87fae0049	Simplify PutObjReader for plain-text reader usage (#11470 ) This change moves away from a unified constructor for plaintext and encrypted usage. NewPutObjReader is simplified for the plain-text reader use. For encrypted reader use, WithEncryption should be called on an initialized PutObjReader. Plaintext: func NewPutObjReader(rawReader hash.Reader) PutObjReader The hash.Reader is used to provide payload size and md5sum to the downstream consumers. This is different from the previous version in that there is no need to pass nil values for unused parameters. Encrypted: func WithEncryption(encReader hash.Reader, key crypto.ObjectKey) (*PutObjReader, error) This method sets up encrypted reader along with the key to seal the md5sum produced by the plain-text reader (already setup when NewPutObjReader was called). Usage: ``` pReader := NewPutObjReader(rawReader) // ... other object handler code goes here // Prepare the encrypted hashed reader pReader, err = pReader.WithEncryption(encReader, objEncKey) ```	2021-02-10 08:52:50 -08:00
Harshavardhana	e0055609bb	fix: crawler to skip healing the drives in a set being healed (#11274 ) If an erasure set had a drive replacement recently, we don't need to attempt healing on another drive with in the same erasure set - this would ensure we do not double heal the same content and also prioritizes usage for such an erasure set to be calculated sooner.	2021-01-19 02:40:52 -08:00
Harshavardhana	628ef081d1	fix: preserve cache calculated previously while moving from v2 to v3 (#11269 ) This ensures that all the prometheus monitoring and usage trackers to avoid alerts configured, although we cannot support v1 to v2 here - we can v2 to v3.	2021-01-13 09:58:08 -08:00
Klaus Post	51dad1d130	Fix missing GetObjectNInfo Closure (#11243 ) Review for missing Close of returned value from `GetObjectNInfo`. This was often obscured by the stuff that auto-unlocks when reaching EOF.	2021-01-08 10:12:26 -08:00
Harshavardhana	3e1221a01c	fix: log once updating dataUsageCache versions (#11190 ) also reduce usage of *bytes.Buffer for reading `usage-cache.bin`	2020-12-31 09:45:09 -08:00
Klaus Post	4bca62a0bd	crawler: Stream bucket usage cache data (#11068 ) Stream bucket caches to storage and through RPC calls.	2020-12-10 13:03:22 -08:00
Harshavardhana	dc819afa44	fix: auto update crawler meta version PR `038bcd9079` introduced version '3', we need to make sure that we do not print an unexpected error instead log a message to indicate we will auto update the version.	2020-12-08 10:40:51 -08:00
Ritesh H Shukla	038bcd9079	Add replication capacity metrics support in crawler (#10786 )	2020-12-07 13:47:48 -08:00
Klaus Post	e6ea5c2703	crawler: Missing folder heal check per set (#10876 )	2020-12-01 12:07:39 -08:00
Harshavardhana	ca88ca753c	ignore typed errors correctly in list cache layer (#10879 ) bonus write bucket metadata cache with enough quorum Possible fix for #10868	2020-11-12 09:28:56 -08:00
Klaus Post	493c714663	Remove erasureSets and erasureObjects from ObjectLayer (#10442 )	2020-09-10 09:18:19 -07:00
Klaus Post	c097ce9c32	continous healing based on crawler (#10103 ) Design: https://gist.github.com/klauspost/792fe25c315caf1dd15c8e79df124914	2020-08-24 13:47:01 -07:00
Harshavardhana	caad314faa	add ruleguard support, fix all the reported issues (#10335 )	2020-08-24 12:11:20 -07:00
Klaus Post	00d3cc4b69	Enforce quota checks after crawl (#10036 ) Enforce bucket quotas when crawling has finished. This ensures that we will not do quota enforcement on old data. Additionally, delete less if we are closer to quota than we thought.	2020-07-14 18:59:05 -07:00
Klaus Post	43d6e3ae06	merge object lifecycle checks into usage crawler (#9579 )	2020-06-12 10:28:21 -07:00
Harshavardhana	53aaa5d2a5	Export bucket usage counts as part of bucket metrics (#9710 ) Bonus fixes in quota enforcement to use the new datastructure and use timedValue to cache a value/reload automatically avoids one less global variable.	2020-05-27 06:45:43 -07:00
Harshavardhana	498389123e	avoid unnecessary logging on fresh/newly replaced drives (#9470 ) data usage tracker and crawler seem to be logging non-actionable information on console, which is not useful and is fixed on its own in almost all deployments, lets keep this logging to minimal.	2020-04-28 01:16:57 -07:00
Klaus Post	073aac3d92	add data update tracking using bloom filter (#9208 ) By monitoring PUT/DELETE and heal operations it is possible to track changed paths and keep a bloom filter for this data. This can help prioritize paths to scan. The bloom filter can identify paths that have not changed, and the few collisions will only result in a marginal extra workload. This can be implemented on either a bucket+(1 prefix level) with reasonable performance. The bloom filter is set to have a false positive rate at 1% at 1M entries. A bloom table of this size is about ~2500 bytes when serialized. To not force a full scan of all paths that have changed cycle bloom filters would need to be kept, so we guarantee that dirty paths have been scanned within cycle runs. Until cycle bloom filters have been collected all paths are considered dirty.	2020-04-27 10:06:21 -07:00
Harshavardhana	b1a2169dcc	fix: data usage crawler env handling, usage-cache.bin location (#9163 ) canonicalize the ENVs such that we can bring these ENVs as part of the config values, as a subsequent change. - fix location of per bucket usage to `.minio.sys/buckets/<bucket_name>/usage-cache.bin` - fix location of the overall usage in `json` at `.minio.sys/buckets/.usage.json` (avoid conflicts with a bucket named `usage.json` ) - fix location of the overall usage in `msgp` at `.minio.sys/buckets/.usage.bin` (avoid conflicts with a bucket named `usage.bin`	2020-03-19 09:47:47 -07:00
Klaus Post	8d98662633	re-implement data usage crawler to be more efficient (#9075 ) Implementation overview: https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b	2020-03-18 16:19:29 -07:00

22 commits