Commit graph

139 commits

Author SHA1 Message Date
Harshavardhana
40fcd3dc48 Deprecate listDirFactory in HealObjects, rely on ListObjectsHeal (#8419) 2019-10-22 03:13:04 +05:30
Harshavardhana
68a519a468
Use errgroups instead of sync.WaitGroup as needed (#8354) 2019-10-14 09:44:51 -07:00
Harshavardhana
3b8adf7528 Move storageclass config handling into cmd/config/storageclass (#8360)
Continuation of the changes done in PR #8351 to refactor,
add tests and move global handling into a more idiomatic
style for Go as packages.
2019-10-07 11:20:24 +05:30
Harshavardhana
127641731a
Parallelize initialization of storageDisks (#8288) 2019-09-27 16:47:12 -07:00
Harshavardhana
c8fbc94329
Fix writing 'format.json' and make it atomic (#8296)
- Choose a unique uuid such that under situations of duplicate
  mounts we do not append to an existing json entry.
- Avoid AppendFile instead use WriteAll() to write the entire
  byte array atomically.
2019-09-24 18:47:26 -07:00
Harshavardhana
5392eee250 Avoid recursion and use a simple loop to merge entries (#8239)
This avoids stack overflows when there are
lot of entries to be skipped, this PR also
optimizes the code to reuse the buffers.
2019-09-17 06:08:37 +05:30
Harshavardhana
e12f52e2c6 Enhancements to daily-sweeper routine to reduce CPU load (#8209)
- ListObjectsHeal should list only objects
  which need healing, not the entire namespace.
- DeleteObjects() to be used to delete 1000s of
  objects in bulk instead of serially.
2019-09-11 00:38:44 +05:30
Harshavardhana
42e716a094
formatsToDrivesInfo should return drives with correct order (#8157)
This is a defensive change to avoid any future issues,
from this part of the code. New change also ensures
to populate UUID if present for the right disk.
2019-08-30 14:11:18 -07:00
Praveen raj Mani
e211f6f52e Parallelize the DiskInfo calls in xl.StorageInfo() (#8115) 2019-08-22 20:02:40 -07:00
Harshavardhana
e6d8e272ce
Use const slashSeparator instead of "/" everywhere (#8028) 2019-08-06 12:08:58 -07:00
Praveen raj Mani
b976521c83 Ignore faulty disks in xl-sets Storage info (#7878) 2019-08-02 12:17:26 -07:00
Krishnan Parthasarathi
559a59220e Add initial support for bucket lifecycle (#7563)
This PR is based off @sinhaashish's PR for object lifecycle
management, which includes support only for,
- Expiration of object
- Filter using object prefix (_not_ object tags)

N B the code for actual expiration of objects will be included in a
subsequent PR.
2019-07-19 21:20:33 +01:00
Krishna Srinivas
a2e904b966 Support any string as delimiter for listing (#7882) 2019-07-05 14:06:12 -07:00
Krishna Srinivas
338e9a9be9 Put object client disconnect (#7824)
Fail putObject  and postpolicy in case client prematurely disconnects
Use request's context to cancel lock requests on client disconnects
2019-06-28 22:09:17 -07:00
Anis Elleuch
7abadfccc2 Add self-healing feature (#7604)
- Background Heal routine receives heal requests from a channel, either to
heal format, buckets or objects
- Daily sweeper lists all objects in all buckets, these objects
don't necessarly have read quorum so they can be removed if
these objects are unhealable
- Heal daily ops receives objects from the daily sweeper
and send them to the heal routine.
2019-06-08 22:14:07 -07:00
Anis Elleuch
158b8c2e86 sets: Correctly set IsTruncated in listing (#7675)
IsTruncated should not be set to true if there is no further
possible entries beyond maxKeys.

This commit will also move wide testing on object API from xl
to xl sets.
2019-05-22 13:36:16 -07:00
Harshavardhana
55aa20595f
Remove an empty entry being added into XML marshal (#7656) 2019-05-16 13:09:21 -07:00
Harshavardhana
b3f22eac56 Offload listing to posix layer (#7611)
This PR adds one API WalkCh which
sorts and sends list over the network

Each disk walks independently in a sorted manner.
2019-05-14 13:49:10 -07:00
Anis Elleuch
9c90a28546 Implement bulk delete (#7607)
Bulk delete at storage level in Multiple Delete Objects API

In order to accelerate bulk delete in Multiple Delete objects API,
a new bulk delete is introduced in storage layer, which will accept
a list of objects to delete rather than only one. Consequently,
a new API is also need to be added to Object API.
2019-05-13 12:25:49 -07:00
Harshavardhana
64998fc4ab Remove delayIsLeaf requirement simplify ListObjects further (#7593) 2019-05-02 10:36:57 +05:30
Harshavardhana
f767a2538a
Optimize listing with leaf check offloaded to posix (#7541)
Other listing optimizations include

- remove double sorting while filtering object entries
- improve error message when upload-id is not in quorum
- use jsoniter for full unmarshal json, instead of gjson
- remove unused code
2019-04-23 14:54:28 -07:00
Harshavardhana
620e462413 Implement S3-HDFS gateway (#7440)
- [x] Support bucket and regular object operations
- [x] Supports Select API on HDFS
- [x] Implement multipart API support
- [x] Completion of ListObjects support
2019-04-17 09:52:08 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Harshavardhana
0188009c7e Expose total and available disk space (#7453) 2019-04-05 09:51:50 +05:30
Harshavardhana
4a698c731b HealObjects should remove objects without quorum (#7407)
This PR adds a way to list objects without quorum
such that they can purged by `mc admin heal --remove`
2019-03-26 14:57:44 -07:00
Anis Elleuch
facbd653ba Add normal/deep type of heal scanning (#7251)
Healing scan used to read all objects parts to check for bitrot
checksum. This commit will add a quicker way of healing scan
by only checking if parts are actually present in disks or not.
2019-03-14 13:08:51 -07:00
Harshavardhana
7079abc931 Implement HealObjects API to simplify healing (#7351) 2019-03-13 17:35:09 -07:00
kannappanr
ce4563370c
Distributed: Allow healing if all disks are on root partitions (#7358)
If all the disks are on root partitions in distributed mode, consider it
to be a test setup and allow healing to proceed.

Fixes #7346
2019-03-12 16:47:06 -07:00
Harshavardhana
ce588d1489 Improve ListObjects performance by listing in parallel (#7270)
The side affect of this change memory
increase, but this is a trade-off between
performance and actual memory usage.

For all practical scenarios this should be
an adequate change.
2019-02-27 14:39:22 -08:00
Harshavardhana
b6c00405ec Do not pro-actively return false in isObjectDir() (#7246)
We should change the logic for both isObject()
and isObjectDir() leaf detection to be done
with quorum, due to how our directory navigation
works - this allows for properly deleting all
the dangling directories or objects if any.
2019-02-15 16:21:19 -08:00
Harshavardhana
df35d7db9d Introduce staticcheck for stricter builds (#7035) 2019-02-13 18:29:36 +05:30
Harshavardhana
082f777281 Revamp bucket metadata healing (#7208)
Bucket metadata healing in the current code was executed multiple
times each time for a given set. Bucket metadata just like
objects are hashed in accordance with its name on any given set,
to allow hashing to play a role we should let the top level
code decide where to navigate.

Current code also had 3 bucket metadata files hardcoded, whereas
we should make it generic by listing and navigating the .minio.sys
to heal such objects.

We also had another bug where due to isObjectDangling changes
without pre-existing bucket metadata files, we were erroneously
reporting it as grey/corrupted objects.

This PR fixes all of the above items.
2019-02-11 09:23:13 +05:30
poornas
40b8d11209 Move metadata into ObjectOptions for NewMultipart and PutObject (#7060) 2019-02-09 11:01:06 +05:30
Krishna Srinivas
2d168b532b Allow format.json healing on dev/test setup (single node XL, all root disks) (#7170) 2019-02-06 11:44:19 -08:00
Harshavardhana
30135eed86 Redo how to handle stale dangling files (#7171)
foo.CORRUPTED should never be created because when
multiple sets are involved we would hash the file
to wrong a location, this PR removes the code.

But allows DeleteBucket() to work properly to delete
dangling buckets/objects. Also adds another option
to Healing where a user needs to specify `--remove`
such that all dangling objects will be deleted with
user confirmation.
2019-02-05 17:58:48 -08:00
Krishna Srinivas
82af0be1aa Healing process should not heal root disk (#7089) 2019-01-23 15:29:29 -08:00
poornas
5a80cbec2a Add double encryption at S3 gateway. (#6423)
This PR adds pass-through, single encryption at gateway and double
encryption support (gateway encryption with pass through of SSE
headers to backend).

If KMS is set up (either with Vault as KMS or using
MINIO_SSE_MASTER_KEY),gateway will automatically perform
single encryption. If MINIO_GATEWAY_SSE is set up in addition to
Vault KMS, double encryption is performed.When neither KMS nor
MINIO_GATEWAY_SSE is set, do a pass through to backend.

When double encryption is specified, MINIO_GATEWAY_SSE can be set to
"C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3
encryption at gateway/backend or both to support more than one option.

Fixes #6323, #6696
2019-01-05 14:16:42 -08:00
poornas
f6980c4630 fix ConfigSys and NotificationSys initialization for NAS (#6920) 2018-12-05 14:03:42 -08:00
poornas
5f6d717b7a Fix: Preserve MD5Sum for SSE encrypted objects (#6680)
To conform with AWS S3 Spec on ETag for SSE-S3 encrypted objects,
encrypt client sent MD5Sum and store it on backend as ETag.Extend
this behavior to SSE-C encrypted objects.
2018-11-14 17:36:41 -08:00
Harshavardhana
3f643acb99 HealBucket was double counting endpoints (#6707)
Endpoint comparisons blindly without looking
if its local is wrong because the actual drive
for a local disk is always going to provide just
the path without the HTTP endpoint.

Add code such that this is taken care properly in
all situations. Without this PR HealBucket() would
wrongly conclude that the healing doesn't have quorum
when there are larger number of local disks involved.

Fixes #6703
2018-10-26 10:25:52 -07:00
Harshavardhana
54ae364def Introduce STS client grants API and OPA policy integration (#6168)
This PR introduces two new features

- AWS STS compatible STS API named AssumeRoleWithClientGrants

```
POST /?Action=AssumeRoleWithClientGrants&Token=<jwt>
```

This API endpoint returns temporary access credentials, access
tokens signature types supported by this API

  - RSA keys
  - ECDSA keys

Fetches the required public key from the JWKS endpoints, provides
them as rsa or ecdsa public keys.

- External policy engine support, in this case OPA policy engine

- Credentials are stored on disks
2018-10-09 14:00:01 -07:00
Anis Elleuch
cbc5d78a09 Handle read/quorum errors when initializing all subsystems (#6585)
- Only require len(disks)/2 to initialize the cluster
- Fix checking of read/write quorm in subsystems init
- Add retry mechanism in policy and notification to avoid aborting in case of read/write quorums errors
2018-10-08 15:47:13 -07:00
Krishna Srinivas
81bee93b8d Move remote disk StorageAPI abstraction from RPC to REST (#6464) 2018-10-04 17:44:06 -07:00
Praveen raj Mani
ce9d36d954 Add object compression support (#6292)
Add support for streaming (golang/LZ77/snappy) compression.
2018-09-28 09:06:17 +05:30
poornas
ed703c065d Add ObjectOptions to GetObjectNInfo (#6533) 2018-09-27 15:36:45 +05:30
Anis Elleuch
aa4e2b1542 Use GetObjectNInfo in CopyObject and CopyObjectPart (#6489) 2018-09-25 12:39:46 -07:00
Aditya Manthramurthy
36e51d0cee Add GetObjectNInfo to object layer (#6449)
The new call combines GetObjectInfo and GetObject, and returns an
object with a ReadCloser interface.

Also adds a number of end-to-end encryption tests at the handler
level.
2018-09-20 19:22:09 -07:00
poornas
5c0b98abf0 Add ObjectOptions to ObjectLayer calls (#6382) 2018-09-10 09:42:43 -07:00
Harshavardhana
d0d015361c Fix config subsystem to wait on quorum number of formatted disks (#6407) 2018-09-05 20:55:55 +05:30
Harshavardhana
4487f70f08 Revert all GetObjectNInfo related PRs (#6398)
* Revert "Encrypted reader wrapped in NewGetObjectReader should be closed (#6383)"

This reverts commit 53a0bbeb5b.

* Revert "Change SelectAPI to use new GetObjectNInfo API (#6373)"

This reverts commit 5b05df215a.

* Revert "Implement GetObjectNInfo object layer call (#6290)"

This reverts commit e6d740ce09.
2018-08-31 13:10:12 -07:00
Aditya Manthramurthy
e6d740ce09 Implement GetObjectNInfo object layer call (#6290)
This combines calling GetObjectInfo and GetObject while returning a
io.ReadCloser for the object's body. This allows the two operations to
be under a single lock, fixing a race between getting object info and
reading the object body.
2018-08-27 15:28:23 +05:30
Krishna Srinivas
52f6d5aafc Rename of structs and methods (#6230)
Rename of ErasureStorage to Erasure (and rename of related variables and methods)
2018-08-23 23:35:37 -07:00
Harshavardhana
0fe9e95250 Validate prefixes on all sets (#6294)
This PR fixes a regression introduced in 8eb838bf91
where hashing technique was used on prefixes to get the right set
to perform the operation, this is not correct since prefixes and
their corresponding keys might hash to a different value depending
on the key length.

For prefixes/directories we should look everywhere to support proper
quorum based listing.

Fixes #6293
2018-08-16 16:49:38 -07:00
Oleg Kovalov
37de2dbd3b simplifying if-else chains to switches (#6208) 2018-08-06 10:26:40 -07:00
Harshavardhana
556a51120c Deprecate ListLocks and ClearLocks (#6233)
No locks are ever left in memory, we also
have a periodic interval of clearing stale locks
anyways. The lock instrumentation was not complete
and was seldom used.

Deprecate this for now and bring it back later if
it is really needed. This also in-turn seems to improve
performance slightly.
2018-08-02 23:09:42 +05:30
Harshavardhana
ad86454580 Make sure to handle FaultyDisks in listing ops (#6204)
Continuing from PR 157ed65c35

Our posix.go implementation did not handle  I/O errors
properly on the disks, this led to situations where
top-level callers such as ListObjects might return early
without even verifying all the available disks.

This commit tries to address this in Kubernetes, drbd/nbd based
persistent volumes which can disconnect under load and
result in the situations with disks return I/O errors.

This commit also simplifies listing operation, listing
never returns any error. We can avoid this since we pretty
much ignore most of the errors anyways. When objects are
accessed directly we return proper errors.
2018-07-27 15:32:19 -07:00
Anis Elleuch
be1700f595 Avoid startup abort when a notify target is down (#6126)
Minio server was preventing itself to start when any notification
target is down and not running. The PR changes the behavior by
avoiding startup abort in that case, so the user will still
be able to access Minio server using mc admin commands after
a restart or set config commands.
2018-07-10 07:20:31 +05:30
wd256
25f9b0bc3b Handle ListObjectsV2 start-after parameter in ObjectLayer (#6078) 2018-07-01 09:52:45 +05:30
Harshavardhana
e5e522fc61
docs: fix all Chinese doc links for the new docs site (#6097)
Additionally fix typos, default to US locale words
2018-06-28 16:02:02 -07:00
Harshavardhana
abf209b1dd load bucket policies using object layer API (#6084)
This PR fixes an issue during gateway mode
where underlying policies were not translated
into meaningful policies.
2018-06-27 12:29:48 +05:30
Harshavardhana
df1b33013f Fix byte pool usage, use only one pool for all sets. (#5990) 2018-06-01 16:41:23 -07:00
Harshavardhana
000e360196 Deprecate showing drive capacity and total free (#5976)
This addresses a situation that we shouldn't be
displaying Total/Free anymore, instead we should simply
show the total usage.
2018-05-23 17:30:25 -07:00
Harshavardhana
e6ec645035 Implement support for calculating disk usage per tenant (#5969)
Fixes #5961
2018-05-23 15:41:29 +05:30
Harshavardhana
c872c30ea3 fix: introduce isLeafDir in healing to fix the crash (#5920)
This PR also supports healing directories.

Fixes #5917
2018-05-10 16:53:42 -07:00
Anis Elleuch
6d5f2a4391 Better support of empty directories (#5890)
Better support of HEAD and listing of zero sized objects with trailing
slash (a.k.a empty directory). For that, isLeafDir function is added
to indicate if the specified object is an empty directory or not. Each
backend (xl, fs) has the responsibility to store that information.
Currently, in both of XL & FS, an empty directory is represented by
an empty directory in the backend.

isLeafDir() checks if the given path is an empty directory or not,
since dir listing is costly if the latter contains too many objects,
readDirN() is added in this PR to list only N number of entries.
In isLeadDir(), we will only list one entry to check if a directory
is empty or not.
2018-05-09 01:38:21 -07:00
Anis Elleuch
9439dfef64 Use defer style to stop tickers to avoid current/possible misuse (#5883)
This commit ensures that all tickers are stopped using defer ticker.Stop()
style. This will also fix one bug seen when a client starts to listen to
event notifications and that case will result a leak in tickers.
2018-05-04 10:43:20 -07:00
Harshavardhana
5f9041571f Heal only when atleast one of the disk is unformatted (#5866)
Current healing has an issue when disks are healed
even when they are offline without knowing if disk
is unformatted. This can lead to issues of pre-maturely
removing the disk from the set just because it was
temporarily offline.

There is an increasing number of `mc admin heal` usage
on a cron or regular basis. It is possible that if healing
code saw disk is offline it might prematurely take it down,
this causes availability issues.

Fixes #5826
2018-05-01 09:07:39 +05:30
Krishna Srinivas
9aace6d36d Continue healing other objects even if objects without quorum exist (#5851)
fixes #5815
2018-04-25 11:56:39 -07:00
Bala FA
0d52126023 Enhance policy handling to support SSE and WORM (#5790)
- remove old bucket policy handling
- add new policy handling
- add new policy handling unit tests

This patch brings support to bucket policy to have more control not
limiting to anonymous.  Bucket owner controls to allow/deny any rest
API.

For example server side encryption can be controlled by allowing
PUT/GET objects with encryptions including bucket owner.
2018-04-24 15:53:30 -07:00
Harshavardhana
4a874dfbc1
Ignore prefix renames when dest directory is not empty (#5798)
Also make sure to not modify the underlying errors from
layers, we should return the error as is and one object
layer should translate the errors.

Fixes #5797
2018-04-11 17:15:42 -07:00
kannappanr
cef992a395
Remove error package and cause functions (#5784) 2018-04-10 09:36:37 -07:00
Harshavardhana
1d31ad499f Make sure to re-load reference format after HealFormat (#5772)
This PR introduces ReloadFormat API call at objectlayer
to facilitate this. Previously we repurposed HealFormat
but we never ended up updating our reference format on
peers.

Fixes #5700
2018-04-09 22:55:41 +05:30
Krishna Srinivas
ae8e863ff4 disk.String() represents just path and not URL when disk is a local disk (#5785) 2018-04-06 16:59:31 -07:00
kannappanr
f8a3fd0c2a
Create logger package and rename errorIf to LogIf (#5678)
Removing message from error logging
Replace errors.Trace with LogIf
2018-04-05 15:04:40 -07:00
Harshavardhana
85a57d2021 Make sure to close the disk connections (#5752)
Since we do not re-use storageDisks after moving
the connections to object layer we should close them
appropriately otherwise we have a lot of connection
leaks and these can compound as the time goes by.

This PR also refactors the initialization code to
re-use storageDisks for given set of endpoints until
we have confirmed a valid reference format.
2018-04-04 10:28:48 +05:30
Harshavardhana
8eb838bf91 Extend quorum based listing for prefixes (#5749)
Previous PR 2afd196c83 fixed
the issue of quorum based listing for regular objects, this
PR continues on this idea by extending this support to
object directory prefixes as well.

Fixes #5733
2018-04-02 17:26:34 -07:00
Harshavardhana
6e9c853312 After healing re-load disks with the new format (#5718)
This PR also fixes correct calculation of drive states
before and after healing of objects.

Fixes #5700
Fixes #5708
2018-03-28 06:41:39 +05:30
Harshavardhana
de44be86d0 Use readQuorum instead of writeQuorum to check bucket exists (#5715)
Fixes #5708
Fixes #5700
2018-03-26 16:36:57 -07:00
Harshavardhana
f23944aed7 Fix heal bucket deadlock after replacing disks (#5661)
Fixes #5659
2018-03-16 15:09:31 -07:00
Krishna Srinivas
9ede179a21 Use context.Background() instead of nil
Rename Context[Get|Set] -> [Get|Set]Context
2018-03-15 16:28:25 -07:00
Krishna Srinivas
e452377b24 Add context to the object-interface methods.
Make necessary changes to xl fs azure sia
2018-03-15 16:28:25 -07:00
Krishna Srinivas
9083bc152e Flat multipart backend implementation for Erasure backend (#5447) 2018-03-15 13:55:23 -07:00
Bala FA
0e4431725c make notification as separate package (#5294)
* Remove old notification files

* Add net package

* Add event package

* Modify minio to take new notification system
2018-03-15 13:03:41 -07:00
Anis Elleuch
120b061966 Add multipart support in SSE-C encryption (#5576)
*) Add Put/Get support of multipart in encryption
*) Add GET Range support for encryption
*) Add CopyPart encrypted support
*) Support decrypting of large single PUT object
2018-03-01 11:37:57 -08:00
Harshavardhana
7cc678c653 Support encryption for CopyObject, GET-Range requests (#5544)
- Implement CopyObject encryption support
- Handle Range GETs for encrypted objects

Fixes #5193
2018-02-23 15:07:21 -08:00
Harshavardhana
0ea54c9858 Change CopyObject{Part} to single srcInfo argument (#5553)
Refactor such that metadata and etag are
combined to a single argument `srcInfo`.

This is a precursor change for #5544 making
it easier for us to provide encryption/decryption
functions.
2018-02-21 14:18:47 +05:30
poornas
25107c2e11 Add NAS gateway support (#5516) 2018-02-20 12:21:12 -08:00
Harshavardhana
03923947c4
Fix delete bucket policies properly (#5540)
There was bug in previous PR where deleteBucketMetadata()
was never called, fix it correctly.
2018-02-16 20:16:48 -08:00
Harshavardhana
fb96779a8a Add large bucket support for erasure coded backend (#5160)
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.

Some design details and restrictions:

- Objects are distributed using consistent ordering
  to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
  properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
  requirement, you can start with multiple
  such sets statically.
- Static sets set of disks and cannot be
  changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
  changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
  slower since List happens on all servers,
  and is merged at this sets layer.

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5460
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496
2018-02-15 17:45:57 -08:00