minio

Author	SHA1	Message	Date
Harshavardhana	bb639d9f29	remove double reads delete versions (#13544 ) deleting collection of versions belonging to same object, we can avoid re-reading the xl.meta from the disk instead purge all the requested versions in-memory, the tradeoff is to allocate a map to de-dup the versions, allow disks to be read only once per object. additionally reduce the data transfer between nodes by shortening msgp data values.	2021-11-01 10:50:07 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Klaus Post	24722ddd02	Remove inline data hack (#12946 ) move the code down to the storage layer, this logic decouples the inline data from the size parameter making it flexible and future proof.	2021-08-13 08:25:54 -07:00
Harshavardhana	40a2fa8e81	fix: add more optimizations to putMetacacheObject() (#12916 ) - avoid extra lookup for 'xl.meta' since we are definitely sure that it doesn't exist. - use this in newMultipartUpload() as well - also additionally do not write with O_DSYNC to avoid loading the drives, instead create 'xl.meta' for listing operations without O_DSYNC since these are ephemeral objects. - do the same with newMultipartUpload() since it gets synced when the PutObjectPart() is attempted, we do not need to tax newMultipartUpload() instead.	2021-08-10 11:12:22 -07:00
Krishnan Parthasarathi	29eea52e14	Skip transitioning of object versions if inlined (#12705 )	2021-07-16 09:38:27 -07:00
Aditya Manthramurthy	30a3921d3e	[Tiering] Support remote tiers with object versioning (#12342 ) - Adds versioning support for S3 based remote tiers that have versioning enabled. This ensures that when reading or deleting we specify the specific version ID of the object. In case of deletion, this is important to ensure that the object version is actually deleted instead of simply being marked for deletion. - Stores the remote object's version id in the tier-journal. Tier-journal file version is not bumped up as serializing the new struct version is compatible with old journals without the remote object version id. - `storageRESTVersion` is bumped up as FileInfo struct now includes a `TransitionRemoteVersionID` member. - Azure and GCS support for this feature will be added subsequently. Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>	2021-06-03 14:26:51 -07:00
Nitish Tiwari	776589f0da	Add free inode metric for Prometheus (#12225 )	2021-05-06 12:50:48 -07:00
Krishnan Parthasarathi	c829e3a13b	Support for remote tier management (#12090 ) With this change, MinIO's ILM supports transitioning objects to a remote tier. This change includes support for Azure Blob Storage, AWS S3 compatible object storage incl. MinIO and Google Cloud Storage as remote tier storage backends. Some new additions include: - Admin APIs remote tier configuration management - Simple journal to track remote objects to be 'collected' This is used by object API handlers which 'mutate' object versions by overwriting/replacing content (Put/CopyObject) or removing the version itself (e.g DeleteObjectVersion). - Rework of previous ILM transition to fit the new model In the new model, a storage class (a.k.a remote tier) is defined by the 'remote' object storage type (one of s3, azure, GCS), bucket name and a prefix. * Fixed bugs, review comments, and more unit-tests - Leverage inline small object feature - Migrate legacy objects to the latest object format before transitioning - Fix restore to particular version if specified - Extend SharedDataDirCount to handle transitioned and restored objects - Restore-object should accept version-id for version-suspended bucket (#12091) - Check if remote tier creds have sufficient permissions - Bonus minor fixes to existing error messages Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Krishna Srinivas <krishna@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Anis Elleuch	0eb146e1b2	add additional metrics per disk API latency, API call counts #11250 ) ``` mc admin info --json ``` provides these details, for now, we shall eventually expose this at Prometheus level eventually. Co-authored-by: Harshavardhana <harsha@minio.io>	2021-03-16 20:06:57 -07:00
Harshavardhana	9ccc483df6	[feat]: change erasure coding default block size from 10MiB to 1MiB (#11721 ) major performance improvements in range GETs to avoid large read amplification when ranges are tiny and random ``` ------------------- Operation: GET Operations: 142014 -> 339421 Duration: 4m50s -> 4m56s * Average: +139.41% (+1177.3 MiB/s) throughput, +139.11% (+658.4) obj/s * Fastest: +125.24% (+1207.4 MiB/s) throughput, +132.32% (+612.9) obj/s * 50% Median: +139.06% (+1175.7 MiB/s) throughput, +133.46% (+660.9) obj/s * Slowest: +203.40% (+1267.9 MiB/s) throughput, +198.59% (+753.5) obj/s ``` TTFB from 10MiB BlockSize ``` * First Access TTFB: Avg: 81ms, Median: 61ms, Best: 20ms, Worst: 2.056s ``` TTFB from 1MiB BlockSize ``` * First Access TTFB: Avg: 22ms, Median: 21ms, Best: 8ms, Worst: 91ms ``` Full object reads however do see a slight change which won't be noticeable in real world, so not doing any comparisons TTFB still had improvements with full object reads with 1MiB ``` * First Access TTFB: Avg: 68ms, Median: 35ms, Best: 11ms, Worst: 1.16s ``` v/s TTFB with 10MiB ``` * First Access TTFB: Avg: 388ms, Median: 98ms, Best: 20ms, Worst: 4.156s ``` This change should affect all new uploads, previous uploads should continue to work with business as usual. But dramatic improvements can be seen with these changes.	2021-03-06 14:09:34 -08:00
Harshavardhana	b3c56b53fb	fix: metacache should only rename entries during cleanup (#11503 ) To avoid large delays in metacache cleanup, use rename instead of recursive delete calls, renames are cheaper move the content to minioMetaTmpBucket and then cleanup this folder once in 24hrs instead. If the new cache can replace an existing one, we should let it replace since that is currently being saved anyways, this avoids pile up of 1000's of metacache entires for same listing calls that are not necessary to be stored on disk.	2021-02-11 10:22:03 -08:00
Anis Elleuch	65aa2bc614	ilm: Remove object in HEAD/GET if having an applicable ILM rule (#11296 ) Remove an object on the fly if there is a lifecycle rule with delete expiry action for the corresponding object.	2021-02-01 09:52:11 -08:00
Anis Elleuch	e9ac7b0fb7	heal: Remove empty directories (#11354 ) Since the introduction of __XLDIR__, an empty directory does not have a meaning anymore in erasure mode. Make healing removes it wherever it finds it.	2021-01-27 02:19:28 -08:00
Harshavardhana	f21d650ed4	fix: readData in bulk call using messagepack byte wrappers (#11228 ) This PR refactors the way we use buffers for O_DIRECT and to re-use those buffers for messagepack reader writer. After some extensive benchmarking found that not all objects have this benefit, and only objects smaller than 64KiB see this benefit overall. Benefits are seen from almost all objects from 1KiB - 32KiB Beyond this no objects see benefit with bulk call approach as the latency of bytes sent over the wire v/s streaming content directly from disk negate each other with no remarkable benefits. All other optimizations include reuse of msgp.Reader, msgp.Writer using sync.Pool's for all internode calls.	2021-01-07 19:27:31 -08:00
Harshavardhana	d0027c3c41	do not use large buffers if not necessary (#11220 ) without this change, there is a performance regression for small objects GETs, this makes the overall speed to go back to pre '59d363' commit days.	2021-01-04 18:51:52 -08:00
Harshavardhana	c4131c2798	feat: Small object optimization read data in single bulk call (#11207 )	2021-01-03 11:27:57 -08:00
Harshavardhana	c19e6ce773	avoid a crash in crawler when lifecycle is not initialized (#11170 ) Bonus for static buffers use bytes.NewReader instead of bytes.NewBuffer, to use a more reader friendly implementation	2020-12-26 22:58:06 -08:00
Poorna Krishnamoorthy	1ebf6f146a	Add support for ILM transition (#10565 ) This PR adds transition support for ILM to transition data to another MinIO target represented by a storage class ARN. Subsequent GET or HEAD for that object will be streamed from the transition tier. If PostRestoreObject API is invoked, the transitioned object can be restored for duration specified to the source cluster.	2020-11-19 18:47:17 -08:00
Harshavardhana	9a34fd5c4a	Revert "Revert "Add delete marker replication support (#10396 )"" This reverts commit `267d7bf0a9`.	2020-11-19 18:43:58 -08:00
Harshavardhana	267d7bf0a9	Revert "Add delete marker replication support (#10396 )" This reverts commit `50c10a5087`. PR is moved to origin/dev branch	2020-11-12 11:43:14 -08:00
Poorna Krishnamoorthy	50c10a5087	Add delete marker replication support (#10396 ) Delete marker replication is implemented for V2 configuration specified in AWS spec (though AWS allows it only in the V1 configuration). This PR also brings in a MinIO only extension of replicating permanent deletes, i.e. deletes specifying version id are replicated to target cluster.	2020-11-10 15:24:14 -08:00
Harshavardhana	1a1f00fa15	fix: use internode data for DisksInfo, VolsInfo in message pack (#10821 ) Similar to #10775 for fewer memory allocations, since we use getOnlineDisks() extensively for listing we should optimize it further. Additionally, remove all unused walkers from the storage layer	2020-11-04 10:10:54 -08:00
Klaus Post	37749f4623	Optimize FileInfo(Version) transfer (#10775 ) File Info decoding, in particular, is showing up as a major allocator and time consumer for internode data transfers Switch to message pack for cross-server transfers: ``` MSGP: Size: 945 bytes BenchmarkEncodeFileInfoMsgp-32 1558444 866 ns/op 1.16 MB/s 0 B/op 0 allocs/op BenchmarkDecodeFileInfoMsgp-32 479968 2487 ns/op 0.40 MB/s 848 B/op 18 allocs/op GOB: Size: 1409 bytes BenchmarkEncodeFileInfoGOB-32 333339 3237 ns/op 0.31 MB/s 576 B/op 19 allocs/op BenchmarkDecodeFileInfoGOB-32 20869 57837 ns/op 0.02 MB/s 16439 B/op 428 allocs/op ```	2020-11-02 17:07:52 -08:00
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler is used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	2020-10-28 09:18:35 -07:00
Harshavardhana	17747db93f	fix: support healing older content (#10076 ) This PR adds support for healing older content i.e from 2yrs, 1yr. Also handles other situations where our config was not encrypted yet. This PR also ensures that our Listing is consistent and quorum friendly, such that we don't list partial objects	2020-07-17 17:41:29 -07:00
Anis Elleuch	21a37e3393	fix: ListObjectVersions should return ordered Version & DeleteMarker (#9959 ) The S3 specification says that versions are ordered in the response of list object versions. mc snapshot needs this to know which version comes first especially when two versions have the same exact last-modified field.	2020-07-03 09:15:44 -07:00
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	2020-06-12 20:04:01 -07:00
poornas	3d3b75fb8d	Avoid overwriting object tags when changing lock (#9794 )	2020-06-10 08:16:30 -07:00
Anis Elleuch	48e614b167	honor lifecycle expiration with tag rule (#9604 )	2020-05-14 16:21:03 -07:00
Harshavardhana	23a8411732	Add a generic Walk()'er to list a bucket, optinally prefix (#9026 ) This generic Walk() is used by likes of Lifecyle, or KMS to rotate keys or any other functionality which relies on this functionality.	2020-02-25 21:22:28 +05:30
Harshavardhana	b3f22eac56	Offload listing to posix layer (#7611 ) This PR adds one API WalkCh which sorts and sends list over the network Each disk walks independently in a sorted manner.	2019-05-14 13:49:10 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	2019-04-09 11:39:42 -07:00
Anis Elleuch	380d6c6435	Use getObjectInfo() in both FS and XL ListObjects() to simplify and to return complete object information (#3247 )	2016-11-13 11:48:02 -08:00
Harshavardhana	bccf549463	server: Move all the top level files into cmd folder. (#2490 ) This change brings a change which was done for the 'mc' package to allow for clean repo and have a cleaner github drop in experience.	2016-08-18 16:23:42 -07:00

35 commits