minio

Author	SHA1	Message	Date
Harshavardhana	14d8a931fe	re-use io.Copy buffers with 32k pools (#13553 ) Borrowed idea from Go's usage of this optimization for ReadFrom() on client side, we should re-use the 32k buffers io.Copy() allocates for generic copy from a reader to writer. the performance increase for reads for really tiny objects is at this range after this change. > * Fastest: +7.89% (+1.3 MiB/s) throughput, +7.89% (+1308.1) obj/s	2021-11-02 08:11:50 -07:00
Harshavardhana	4ed0eb7012	remove double reads updating object metadata (#13542 ) Removes RLock/RUnlock for updating metadata, since we already take a write lock to update metadata, this change removes reading of xl.meta as well as an additional lock, the performance gain should increase 3x theoretically for - PutObjectRetention - PutObjectLegalHold This optimization is mainly for Veeam like workloads that require a certain level of iops from these API calls, we were losing iops.	2021-10-30 08:22:04 -07:00
Klaus Post	421160631a	MakeBucket: Delete leftover buckets on error (#13368 ) In (erasureServerPools).MakeBucketWithLocation, delete the created buckets if any set returns an error. Add `NoRecreate` option, which will not recreate the bucket in `DeleteBucket`, if the operation fails. Additionally use context.Background() for operations we always want to be performed.	2021-10-06 10:24:40 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Harshavardhana	35f2552fc5	reduce extra getObjectInfo() calls during ILM transition (#13091 ) * reduce extra getObjectInfo() calls during ILM transition This PR also changes expiration logic to be non-blocking, scanner is now free from additional costs incurred due to slower object layer calls and hitting the drives. * move verifying expiration inside locks	2021-08-27 17:06:47 -07:00
Klaus Post	88d719689c	Synchronize bucket cycle numbers (#13058 ) Synchronize bucket cycles so it is much more likely that the same prefixes will be picked up for scanning. Use the global bloom filter cycle for that. Bump bloom filter versions to clear those.	2021-08-25 08:25:26 -07:00
Harshavardhana	035882d292	fix: remove parentIsObject() check (#12851 ) we will allow situations such as ``` a/b/1.txt a/b ``` and ``` a/b a/b/1.txt ``` we are going to document that this usecase is not supported and we will never support it, if any application does this users have to delete the top level parent to make sure namespace is accessible at lower level. rest of the situations where the prefixes get created across sets are supported as is.	2021-08-03 13:26:57 -07:00
Anis Elleuch	7722b91e1d	s3: Force a prefix removal using a special header (#12504 ) An S3 client can send `x-minio-force-delete: true` to remove a prefix.	2021-06-15 18:43:14 -07:00
Harshavardhana	fdc2020b10	move to iam, bucket policy from minio/pkg (#12400 )	2021-05-29 21:16:42 -07:00
Harshavardhana	1aa5858543	move madmin to github.com/minio/madmin-go (#12239 )	2021-05-06 08:52:02 -07:00
Harshavardhana	4eb9b6eaf8	preserve metadata multipart restore (#12139 ) avoid re-read of xl.meta instead just use the success criteria from PutObjectPart() and check the ETag matches per Part, if they match then the parts have been successfully restored as is. Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-24 19:07:27 -07:00
Krishnan Parthasarathi	c829e3a13b	Support for remote tier management (#12090 ) With this change, MinIO's ILM supports transitioning objects to a remote tier. This change includes support for Azure Blob Storage, AWS S3 compatible object storage incl. MinIO and Google Cloud Storage as remote tier storage backends. Some new additions include: - Admin APIs remote tier configuration management - Simple journal to track remote objects to be 'collected' This is used by object API handlers which 'mutate' object versions by overwriting/replacing content (Put/CopyObject) or removing the version itself (e.g DeleteObjectVersion). - Rework of previous ILM transition to fit the new model In the new model, a storage class (a.k.a remote tier) is defined by the 'remote' object storage type (one of s3, azure, GCS), bucket name and a prefix. * Fixed bugs, review comments, and more unit-tests - Leverage inline small object feature - Migrate legacy objects to the latest object format before transitioning - Fix restore to particular version if specified - Extend SharedDataDirCount to handle transitioned and restored objects - Restore-object should accept version-id for version-suspended bucket (#12091) - Check if remote tier creds have sufficient permissions - Bonus minor fixes to existing error messages Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Krishna Srinivas <krishna@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	d46386246f	api: Introduce metadata update APIs to update only metadata (#11962 ) Current implementation heavily relies on readAllFileInfo but with the advent of xl.meta inlined with data, we cannot easily avoid reading data when we are only interested in updating metadata; this invariably leads to write amplification during metadata updates, repeatedly reading data when we are only interested in updating metadata. This PR ensures that we implement a metadata only update API at storage layer, that handles updates to metadata alone for any given version - given the version is valid and present. This helps reduce the chattiness for the following calls: - PutObjectTags - DeleteObjectTags - PutObjectLegalHold - PutObjectRetention - ReplicateObject (updates metadata on replication status)	2021-04-04 13:32:31 -07:00
Poorna Krishnamoorthy	47c09a1e6f	Various improvements in replication (#11949 ) - collect real time replication metrics for prometheus. - add pending_count, failed_count metric for total pending/failed replication operations. - add API to get replication metrics - add MRF worker to handle spill-over replication operations - multiple issues found with replication - fixes an issue when client sends a bucket name with `/` at the end from SetRemoteTarget API call make sure to trim the bucket name to avoid any extra `/`. - hold write locks in GetObjectNInfo during replication to ensure that object version stack is not overwritten while reading the content. - add additional protection during WriteMetadata() to ensure that we always write a valid FileInfo{} and avoid ever writing empty FileInfo{} to the lowest layers. Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-03 09:03:42 -07:00
Anis Elleuch	eac66e67ec	Use maximum parity for config files (#11740 ) Some deployments have low parity (EC:2), but we really do not need to save our config data with the same parity configuration. N/2 would be better to keep MinIO configurations intact when an unexpected number of drives fail.	2021-03-09 10:19:47 -08:00
Klaus Post	fa9cf1251b	Improve healing and reporting (#11312 ) * Provide information on actively healing, buckets healed/queued, objects healed/failed. * Add concurrent healing of multiple sets (typically on startup). * Add bucket level resume, so restarts will only heal non-healed buckets. * Print summary after healing a disk is done.	2021-03-04 14:36:23 -08:00
Harshavardhana	c6a120df0e	fix: Prometheus metrics to re-use storage disks (#11647 ) also re-use storage disks for all `mc admin server info` calls as well, implement a new LocalStorageInfo() API call at ObjectLayer to lookup local disks storageInfo also fixes bugs where there were double calls to StorageInfo()	2021-03-02 17:28:04 -08:00
Harshavardhana	9171d6ef65	rename all references from crawl -> scanner (#11621 )	2021-02-26 15:11:42 -08:00
Andreas Auernhammer	1f659204a2	remove GetObject from ObjectLayer interface (#11635 ) This commit removes the `GetObject` method from the `ObjectLayer` interface. The `GetObject` method is no longer used by the HTTP handlers implementing the high-level S3 semantics. Instead, they use the `GetObjectNInfo` method, which returns both an object handle and the object metadata. Therefore, it is no longer necessary that a concrete `ObjectLayer` implements `GetObject`.	2021-02-26 09:52:02 -08:00
Krishna Srinivas	876b79b8d8	read-health check endpoint returns success if cluster can serve read requests (#11310 )	2021-02-09 01:00:44 -08:00
Anis Elleuch	e96fdcd5ec	tagging: Add event notif for PUT object tagging (#11366 ) An optimization to avoid double calling for during PutObject tagging	2021-02-01 13:52:51 -08:00
Poorna Krishnamoorthy	fd3f02637a	fix: replication regression due to proxying requests (#11356 ) In PR #11165 due to incorrect proxying for 2 way replication even when the object was not yet replicated Additionally, fix metadata comparisons when deciding to do full replication vs metadata copy. fixes #11340	2021-01-27 11:22:34 -08:00
Harshavardhana	a6c146bd00	validate storage class across pools when setting config (#11320 ) ``` mc admin config set alias/ storage_class standard=EC:3 ``` should only succeed if parity ratio is valid for all server pools, if not we should fail proactively. This PR also needs to bring other changes now that we need to cater for variadic drive counts per pool. Bonus fixes also various bugs reproduced with - GetObjectWithPartNumber() - CopyObjectPartWithOffsets() - CopyObjectWithMetadata() - PutObjectPart,PutObject with truncated streams	2021-01-22 12:09:24 -08:00
Ritesh H Shukla	b4add82bb6	Updated Prometheus metrics (#11141 ) * Add metrics for nodes online and offline * Add cluster capacity metrics * Introduce v2 metrics	2021-01-18 20:35:38 -08:00
Harshavardhana	4315f93421	fix: make sure parentDirIsObject is used at set level (#11280 ) parentDirIsObject is not using set level understanding to check for parent objects, without this it can lead to objects that can actually reside on a separate set as objects and would conflict.	2021-01-17 01:11:48 -08:00
Poorna Krishnamoorthy	7824e19d20	Allow synchronous replication if enabled. (#11165 ) Synchronous replication can be enabled by setting the --sync flag while adding a remote replication target. This PR also adds proxying on GET/HEAD to another node in an active-active replication setup in the event of a 404 on the current node.	2021-01-11 22:36:51 -08:00
Harshavardhana	e7ae49f9c9	fix: calculate prometheus disks_offline/disks_total correctly (#11215 ) fixes #11196	2021-01-04 09:42:09 -08:00
Harshavardhana	027e17468a	fix: discarding results do not attempt in-memory metacache writer (#11163 ) Optimizations include - do not write the metacache block if the size of the block is '0' and it is the first block - where listing is attempted for a transient prefix, this helps to avoid creating lots of empty metacache entries for `minioMetaBucket` - avoid the entire initialization sequence of cacheCh , metacacheBlockWriter if we are simply going to skip them when discardResults is set to true. - No need to hold write locks while writing metacache blocks - each block is unique, per bucket, per prefix and also is written by a single node.	2020-12-24 15:02:02 -08:00
Anis Elleuch	2ecaab55a6	admin: ServerInfo returns info without object layer initialized (#11142 )	2020-12-21 09:35:19 -08:00
Harshavardhana	8368ab76aa	fix: remove the requirement for healing buckets in ListBucketsHeal (#11098 ) With the new refactor of bucket healing, healing a bucket happens automatically, including its metadata, so there is no need to redundantly heal buckets in ListBucketsHeal as well; remove it.	2020-12-14 12:07:07 -08:00
Harshavardhana	2eb52ca5f4	fix: heal bucket metadata right before healing bucket (#11097 ) optimization mainly to avoid listing the entire `.minio.sys/buckets/.minio.sys` directory, this can get really huge and comes in the way of startup routines, contents inside `.minio.sys/buckets/.minio.sys` are rather transient and not necessary to be healed.	2020-12-13 11:57:08 -08:00
Harshavardhana	9c53cc1b83	fix: heal multiple buckets in bulk (#11029 ) makes server startup, orders of magnitude faster with large number of buckets	2020-12-05 13:00:44 -08:00
Poorna Krishnamoorthy	1ebf6f146a	Add support for ILM transition (#10565 ) This PR adds transition support for ILM to transition data to another MinIO target represented by a storage class ARN. Subsequent GET or HEAD for that object will be streamed from the transition tier. If PostRestoreObject API is invoked, the transitioned object can be restored for duration specified to the source cluster.	2020-11-19 18:47:17 -08:00
Harshavardhana	9a34fd5c4a	Revert "Revert "Add delete marker replication support (#10396 )"" This reverts commit `267d7bf0a9`.	2020-11-19 18:43:58 -08:00
Harshavardhana	267d7bf0a9	Revert "Add delete marker replication support (#10396 )" This reverts commit `50c10a5087`. PR is moved to origin/dev branch	2020-11-12 11:43:14 -08:00
Poorna Krishnamoorthy	50c10a5087	Add delete marker replication support (#10396 ) Delete marker replication is implemented for V2 configuration specified in AWS spec (though AWS allows it only in the V1 configuration). This PR also brings in a MinIO only extension of replicating permanent deletes, i.e. deletes specifying version id are replicated to target cluster.	2020-11-10 15:24:14 -08:00
Klaus Post	2294e53a0b	Don't retain context in locker (#10515 ) Use the context for internal timeouts, but disconnect it from outgoing calls so we always receive the results and cancel it remotely.	2020-11-04 08:25:42 -08:00
Harshavardhana	b686bb9c83	fix: replaced drive properly by healing the entire drive (#10799 ) Bonus fixes, we do not need reload format anymore as the replaced drive is healed locally we only need to ensure that drive heal reloads the drive properly. We preserve the UUID of the original order, this means that the replacement in `format.json` doesn't mean that the drive needs to be reloaded into memory anymore. fixes #10791	2020-10-31 01:34:48 -07:00
Harshavardhana	0104af6bcc	delayed locks until we have started reading the body (#10474 ) This is to ensure that Go contexts work properly, after some interesting experiments I found that Go net/http doesn't cancel the context when Body is non-zero and hasn't been read till EOF. The following gist explains this, this can lead to pile up of go-routines on the server which will never be canceled and will die at a really later point in time, which can simply overwhelm the server. https://gist.github.com/harshavardhana/c51dcfd055780eaeb71db54f9c589150 To avoid this refactor the locking such that we take locks after we have started reading from the body and only take locks when needed. Also, remove contextReader as it's not useful, doesn't work as expected context is not canceled until the body reaches EOF so there is no point in wrapping it with context and putting a `select {` on it which can unnecessarily increase the CPU overhead. We will still use the context to cancel the lockers etc. Additional simplification in the locker code to avoid timers as re-using them is a complicated ordeal avoid them in the hot path, since locking is very common this may avoid lots of allocations.	2020-09-14 15:57:13 -07:00
Harshavardhana	37da0c647e	fix: delete marker compatibility behavior for suspended bucket (#10395 ) - delete-marker should be created on a suspended bucket as `null` - delete-marker should delete any pre-existing `null` versioned object and create an entry `null`	2020-09-02 00:19:03 -07:00
Harshavardhana	a20d4568a2	fix: make sure to use uniform drive count calculation (#10208 ) It is possible in situations when server was deployed in asymmetric configuration in the past such as ``` minio server ~/fs{1...4}/disk{1...5} ``` Results in setDriveCount of 10 in older releases but with fairly recent releases we have moved to having server affinity which means that a set drive count ascertained from above config will be now '4' While the object layer makes sure that we honor `format.json`, the storageClass configuration however was by mistake using the global value obtained by heuristics, which leads to prematurely using lower parity without being requested by an administrator. This PR fixes this behavior.	2020-08-05 13:31:12 -07:00
Harshavardhana	ec06089eda	fix: re-implement cluster healthcheck (#10101 )	2020-07-20 18:31:22 -07:00
Harshavardhana	2955aae8e4	feat: Add notification support for bucketCreates and removal (#10075 )	2020-07-20 12:52:49 -07:00
Harshavardhana	14b1c9f8e4	fix: return Range errors after If-Matches (#10045 ) closes #7292	2020-07-17 13:01:22 -07:00
Anis Elleuch	778e9c864f	Move dependency from minio-go v6 to v7 (#10042 )	2020-07-14 09:38:05 -07:00
Harshavardhana	2d17c16d93	fix: make sure to honor versioning from browser UI deletes (#10016 )	2020-07-10 22:21:04 -07:00
Harshavardhana	2743d4ca87	fix: Add support for preserving mtime for replication (#9995 ) This PR is needed for bucket replication support	2020-07-08 17:36:56 -07:00
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	2020-06-12 20:04:01 -07:00
Harshavardhana	5686a7e273	fix NAS gateway support for policy/notification (#9765 ) Fixes #9764	2020-06-03 13:18:54 -07:00

117 commits