Commit graph

81 commits

Author SHA1 Message Date
Harshavardhana 83ed1f361b fix: make sure parentDirIsObject is used at set level (#11280)
parentDirIsObject is not using set level understanding
to check for parent objects, without this it can lead to
objects that can actually reside on a separate set as
objects and would conflict.
2021-07-29 09:02:52 -07:00
Klaus Post 2439d4fb3c Don't retain context in locker (#10515)
Use the context for internal timeouts, but disconnect it from outgoing
calls so we always receive the results and cancel it remotely.
2020-11-04 10:08:58 -08:00
Harshavardhana 0570c21671 fix: replaced drive properly by healing the entire drive
Bonus fixes, we do not need reload format anymore
as the replaced drive is healed locally we only need
to ensure that drive heal reloads the drive properly.

We preserve the UUID of the original order, this means
that the replacement in `format.json` doesn't mean that
the drive needs to be reloaded into memory anymore.

fixes #10791
2020-10-31 00:30:14 -07:00
Harshavardhana 0104af6bcc
delayed locks until we have started reading the body (#10474)
This is to ensure that Go contexts work properly, after some
interesting experiments I found that Go net/http doesn't
cancel the context when Body is non-zero and hasn't been
read till EOF.

The following gist explains this, this can lead to pile up
of go-routines on the server which will never be canceled
and will die at a really later point in time, which can
simply overwhelm the server.

https://gist.github.com/harshavardhana/c51dcfd055780eaeb71db54f9c589150

To avoid this refactor the locking such that we take locks after we
have started reading from the body and only take locks when needed.

Also, remove contextReader as it's not useful, doesn't work as expected
context is not canceled until the body reaches EOF so there is no point
in wrapping it with context and putting a `select {` on it which
can unnecessarily increase the CPU overhead.

We will still use the context to cancel the lockers etc.
Additional simplification in the locker code to avoid timers
as re-using them is a complicated ordeal avoid them in
the hot path, since locking is very common this may avoid
lots of allocations.
2020-09-14 15:57:13 -07:00
Harshavardhana 37da0c647e
fix: delete marker compatibility behavior for suspended bucket (#10395)
- delete-marker should be created on a suspended bucket as `null`
- delete-marker should delete any pre-existing `null` versioned
  object and create an entry `null`
2020-09-02 00:19:03 -07:00
Harshavardhana a20d4568a2
fix: make sure to use uniform drive count calculation (#10208)
It is possible in situations when server was deployed
in asymmetric configuration in the past such as

```
minio server ~/fs{1...4}/disk{1...5}
```

Results in setDriveCount of 10 in older releases
but with fairly recent releases we have moved to
having server affinity which means that a set drive
count ascertained from above config will be now '4'

While the object layer make sure that we honor
`format.json` the storageClass configuration however
was by mistake was using the global value obtained
by heuristics. Which leads to prematurely using
lower parity without being requested by the an
administrator.

This PR fixes this behavior.
2020-08-05 13:31:12 -07:00
Harshavardhana ec06089eda
fix: re-implement cluster healthcheck (#10101) 2020-07-20 18:31:22 -07:00
Harshavardhana 2955aae8e4
feat: Add notification support for bucketCreates and removal (#10075) 2020-07-20 12:52:49 -07:00
Harshavardhana 14b1c9f8e4
fix: return Range errors after If-Matches (#10045)
closes #7292
2020-07-17 13:01:22 -07:00
Anis Elleuch 778e9c864f
Move dependency from minio-go v6 to v7 (#10042) 2020-07-14 09:38:05 -07:00
Harshavardhana 2d17c16d93
fix: make sure to honor versioning from browser UI deletes (#10016) 2020-07-10 22:21:04 -07:00
Harshavardhana 2743d4ca87
fix: Add support for preserving mtime for replication (#9995)
This PR is needed for bucket replication support
2020-07-08 17:36:56 -07:00
Harshavardhana 4915433bd2
Support bucket versioning (#9377)
- Implement a new xl.json 2.0.0 format to support,
  this moves the entire marshaling logic to POSIX
  layer, top layer always consumes a common FileInfo
  construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
  for object placements.

Fixes #2111
2020-06-12 20:04:01 -07:00
Harshavardhana 5686a7e273
fix NAS gateway support for policy/notification (#9765)
Fixes #9764
2020-06-03 13:18:54 -07:00
Harshavardhana b2db8123ec
Preserve errors returned by diskInfo to detect disk errors (#9727)
This PR basically reverts #9720 and re-implements it differently
2020-05-28 13:03:04 -07:00
Harshavardhana b330c2c57e
Introduce simpler GetMultipartInfo call for performance (#9722)
Advantages avoids 100's of stats which are needed for each
upload operation in FS/NAS gateway mode when uploading a large
multipart object, dramatically increases performance for
multipart uploads by avoiding recursive calls.

For other gateway's simplifies the approach since
azure, gcs, hdfs gateway's don't capture any specific
metadata during upload which needs handler validation
for encryption/compression.

Erasure coding was already optimized, additionally
just avoids small allocations of large data structure.

Fixes #7206
2020-05-28 12:36:20 -07:00
P R 3f6d624c7b
add gateway object tagging support (#9124) 2020-05-23 11:09:35 -07:00
Harshavardhana bd032d13ff
migrate all bucket metadata into a single file (#9586)
this is a major overhaul by migrating off all
bucket metadata related configs into a single
object '.metadata.bin' this allows us for faster
bootups across 1000's of buckets and as well
as keeps the code simple enough for future
work and additions.

Additionally also fixes #9396, #9394
2020-05-19 13:53:54 -07:00
Harshavardhana a1de9cec58
cleanup object-lock/bucket tagging for gateways (#9548)
This PR is to ensure that we call the relevant object
layer APIs for necessary S3 API level functionalities
allowing gateway implementations to return proper
errors as NotImplemented{}

This allows for all our tests in mint to behave
appropriately and can be handled appropriately as
well.
2020-05-08 13:44:44 -07:00
Bala FA 3773874cd3
add bucket tagging support (#9389)
This patch also simplifies object tagging support
2020-05-05 14:18:13 -07:00
Klaus Post 073aac3d92
add data update tracking using bloom filter (#9208)
By monitoring PUT/DELETE and heal operations it is possible
to track changed paths and keep a bloom filter for this data. 

This can help prioritize paths to scan. The bloom filter can identify
paths that have not changed, and the few collisions will only result
in a marginal extra workload. This can be implemented on either a
bucket+(1 prefix level) with reasonable performance.

The bloom filter is set to have a false positive rate at 1% at 1M 
entries. A bloom table of this size is about ~2500 bytes when serialized.

To not force a full scan of all paths that have changed cycle bloom
filters would need to be kept, so we guarantee that dirty paths have
been scanned within cycle runs. Until cycle bloom filters have been
collected all paths are considered dirty.
2020-04-27 10:06:21 -07:00
Harshavardhana 282c9f790a
fix: validate partNumber in queryParam as part of preConditions (#9386) 2020-04-20 22:01:59 -07:00
Bala FA 2c3e34f001
add force delete option of non-empty bucket (#9166)
passing HTTP header `x-minio-force-delete: true` would 
allow standard S3 API DeleteBucket to delete a non-empty
bucket forcefully.
2020-03-27 21:52:59 -07:00
Anis Elleuch db2155551a
heal: Pass scan mode to HealObjects to deep scan full quorum objects (#9159)
As an optimization of the healing, HealObjects() avoid sending an
object to the background healing subsystem when the object is
present in all disks.

However, HealObjects() should have checked the scan type, if this
deep, always pass the object to the healing subsystem.
2020-03-18 17:50:00 -07:00
Klaus Post 8d98662633
re-implement data usage crawler to be more efficient (#9075)
Implementation overview: 

https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b
2020-03-18 16:19:29 -07:00
Harshavardhana 23a8411732
Add a generic Walk()'er to list a bucket, optinally prefix (#9026)
This generic Walk() is used by likes of Lifecyle, or
KMS to rotate keys or any other functionality which
relies on this functionality.
2020-02-25 21:22:28 +05:30
Harshavardhana ab7d3cd508
fix: Speed up multi-object delete by taking bulk locks (#8974)
Change distributed locking to allow taking bulk locks
across objects, reduces usually 1000 calls to 1.

Also allows for situations where multiple clients sends
delete requests to objects with following names

```
{1,2,3,4,5}
```

```
{5,4,3,2,1}
```

will block and ensure that we do not fail the request
on each other.
2020-02-21 11:29:57 +05:30
Anis Elleuch d4dcf1d722
metrics: Use StorageInfo() instead to have consistent info (#9006)
Metrics used to have its own code to calculate offline disks.
StorageInfo() was avoided because it is an expensive operation
by sending calls to all nodes.

To make metrics & server info share the same code, a new
argument `local` is added to StorageInfo() so it will only
query local disks when needed.

Metrics now calls StorageInfo() as server info handler does
but with the local flag set to false.

Co-authored-by: Praveen raj Mani <praveen@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2020-02-20 09:21:33 +05:30
Krishnan Parthasarathi 026265f8f7
Add support for bucket encryption feature (#8890)
- pkg/bucket/encryption provides support for handling bucket 
  encryption configuration
- changes under cmd/ provide support for AES256 algorithm only

Co-Authored-By: Poorna  <poornas@users.noreply.github.com>
Co-authored-by: Harshavardhana <harsha@minio.io>
2020-02-05 15:12:34 +05:30
Klaus Post 9990464cd5
Fix recursive deep scan of buckets (#8900) 2020-01-30 17:20:07 +05:30
Harshavardhana f98616dce7
heal: Optimize heal listing by avoiding batches (#8901)
Also limit the heal per object if there is incoming
requests by suspending heal for longer periods of time.
2020-01-29 12:05:44 +05:30
Harshavardhana 0cbebf0f57 Rename pkg/{tagging,lifecycle} to pkg/bucket sub-directory (#8892)
Rename to allow for more such features to come in a more
proper hierarchical manner.
2020-01-27 14:12:34 -08:00
Harshavardhana f14f60a487 fix: Avoid double usage calculation on every restart (#8856)
On every restart of the server, usage was being
calculated which is not useful instead wait for
sufficient time to start the crawling routine.

This PR also avoids lots of double allocations
through strings, optimizes usage of string builders
and also avoids crawling through symbolic links.

Fixes #8844
2020-01-21 14:07:49 -08:00
Nitish Tiwari 61c17c8933 Add ObjectTagging Support (#8754)
This PR adds support for AWS S3 ObjectTagging API as explained here
https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html
2020-01-20 08:45:59 -08:00
Praveen raj Mani 5d09233115 Fix Readiness check (#8681)
- Remove goroutine-check in Readiness check
- Bring in quorum check for readiness

Fixes #8385

Co-authored-by: Harshavardhana <harsha@minio.io>
2019-12-28 22:24:43 +05:30
Nitish Tiwari 3df7285c3c Add Support for Cache and S3 related metrics in Prometheus endpoint (#8591)
This PR adds support below metrics

- Cache Hit Count
- Cache Miss Count
- Data served from Cache (in Bytes)
- Bytes received from AWS S3
- Bytes sent to AWS S3
- Number of requests sent to AWS S3

Fixes #8549
2019-12-05 23:16:06 -08:00
poornas ca96560d56 Add object retention at the per object (#8528)
level - this PR builds on #8120 which
added PutBucketObjectLockConfiguration and
GetBucketObjectLockConfiguration APIS

This PR implements PutObjectRetention,
GetObjectRetention API and enhances
PUT and GET API operations to display
governance metadata if permissions allow.
2019-11-20 13:18:09 -08:00
Harshavardhana e9b2bf00ad Support MinIO to be deployed on more than 32 nodes (#8492)
This PR implements locking from a global entity into
a more localized set level entity, allowing for locks
to be held only on the resources which are writing
to a collection of disks rather than a global level.

In this process this PR also removes the top-level
limit of 32 nodes to an unlimited number of nodes. This
is a precursor change before bring in bucket expansion.
2019-11-13 12:17:45 -08:00
Krishnan Parthasarathi 559a59220e Add initial support for bucket lifecycle (#7563)
This PR is based off @sinhaashish's PR for object lifecycle
management, which includes support only for,
- Expiration of object
- Filter using object prefix (_not_ object tags)

N B the code for actual expiration of objects will be included in a
subsequent PR.
2019-07-19 21:20:33 +01:00
Anis Elleuch 7abadfccc2 Add self-healing feature (#7604)
- Background Heal routine receives heal requests from a channel, either to
heal format, buckets or objects
- Daily sweeper lists all objects in all buckets, these objects
don't necessarly have read quorum so they can be removed if
these objects are unhealable
- Heal daily ops receives objects from the daily sweeper
and send them to the heal routine.
2019-06-08 22:14:07 -07:00
Harshavardhana 2c0b3cadfc Update go mod with sem versions of our libraries (#7687) 2019-05-29 16:35:12 -07:00
Anis Elleuch 9c90a28546 Implement bulk delete (#7607)
Bulk delete at storage level in Multiple Delete Objects API

In order to accelerate bulk delete in Multiple Delete objects API,
a new bulk delete is introduced in storage layer, which will accept
a list of objects to delete rather than only one. Consequently,
a new API is also need to be added to Object API.
2019-05-13 12:25:49 -07:00
kannappanr 5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Anis Elleuch facbd653ba Add normal/deep type of heal scanning (#7251)
Healing scan used to read all objects parts to check for bitrot
checksum. This commit will add a quicker way of healing scan
by only checking if parts are actually present in disks or not.
2019-03-14 13:08:51 -07:00
Harshavardhana 7079abc931 Implement HealObjects API to simplify healing (#7351) 2019-03-13 17:35:09 -07:00
Anis Elleuch b05825ffe8 s3: Fix precondition failed in CopyObjectPart when src is encrypted (#7276)
CopyObject precondition checks into GetObjectReader
in order to perform SSE-C pre-condition checks using the
last 32 bytes of encrypted ETag rather than the decrypted
ETag

This also necessitates moving precondition checks for
gateways to gateway layer rather than object handler check
2019-03-06 12:38:41 -08:00
Harshavardhana 082f777281 Revamp bucket metadata healing (#7208)
Bucket metadata healing in the current code was executed multiple
times each time for a given set. Bucket metadata just like
objects are hashed in accordance with its name on any given set,
to allow hashing to play a role we should let the top level
code decide where to navigate.

Current code also had 3 bucket metadata files hardcoded, whereas
we should make it generic by listing and navigating the .minio.sys
to heal such objects.

We also had another bug where due to isObjectDangling changes
without pre-existing bucket metadata files, we were erroneously
reporting it as grey/corrupted objects.

This PR fixes all of the above items.
2019-02-11 09:23:13 +05:30
poornas 40b8d11209 Move metadata into ObjectOptions for NewMultipart and PutObject (#7060) 2019-02-09 11:01:06 +05:30
Harshavardhana 30135eed86 Redo how to handle stale dangling files (#7171)
foo.CORRUPTED should never be created because when
multiple sets are involved we would hash the file
to wrong a location, this PR removes the code.

But allows DeleteBucket() to work properly to delete
dangling buckets/objects. Also adds another option
to Healing where a user needs to specify `--remove`
such that all dangling objects will be deleted with
user confirmation.
2019-02-05 17:58:48 -08:00
poornas 5a80cbec2a Add double encryption at S3 gateway. (#6423)
This PR adds pass-through, single encryption at gateway and double
encryption support (gateway encryption with pass through of SSE
headers to backend).

If KMS is set up (either with Vault as KMS or using
MINIO_SSE_MASTER_KEY),gateway will automatically perform
single encryption. If MINIO_GATEWAY_SSE is set up in addition to
Vault KMS, double encryption is performed.When neither KMS nor
MINIO_GATEWAY_SSE is set, do a pass through to backend.

When double encryption is specified, MINIO_GATEWAY_SSE can be set to
"C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3
encryption at gateway/backend or both to support more than one option.

Fixes #6323, #6696
2019-01-05 14:16:42 -08:00