Commit graph

64 commits

Author SHA1 Message Date
Harshavardhana a63bc9254d Add 'disk' tag to log output to enhance 'disk not found' errors (#6460) 2018-09-13 21:42:50 -07:00
Krishna Srinivas 52f6d5aafc Rename of structs and methods (#6230)
Rename of ErasureStorage to Erasure (and rename of related variables and methods)
2018-08-23 23:35:37 -07:00
Harshavardhana 556a51120c Deprecate ListLocks and ClearLocks (#6233)
No locks are ever left in memory, we also
have a periodic interval of clearing stale locks
anyways. The lock instrumentation was not complete
and was seldom used.

Deprecate this for now and bring it back later if
it is really needed. This also in-turn seems to improve
performance slightly.
2018-08-02 23:09:42 +05:30
Harshavardhana ad86454580 Make sure to handle FaultyDisks in listing ops (#6204)
Continuing from PR 157ed65c35

Our posix.go implementation did not handle  I/O errors
properly on the disks, this led to situations where
top-level callers such as ListObjects might return early
without even verifying all the available disks.

This commit tries to address this in Kubernetes, drbd/nbd based
persistent volumes which can disconnect under load and
result in the situations with disks return I/O errors.

This commit also simplifies listing operation, listing
never returns any error. We can avoid this since we pretty
much ignore most of the errors anyways. When objects are
accessed directly we return proper errors.
2018-07-27 15:32:19 -07:00
Harshavardhana 000e360196 Deprecate showing drive capacity and total free (#5976)
This addresses a situation that we shouldn't be
displaying Total/Free anymore, instead we should simply
show the total usage.
2018-05-23 17:30:25 -07:00
Harshavardhana e6ec645035 Implement support for calculating disk usage per tenant (#5969)
Fixes #5961
2018-05-23 15:41:29 +05:30
Bala FA 0d52126023 Enhance policy handling to support SSE and WORM (#5790)
- remove old bucket policy handling
- add new policy handling
- add new policy handling unit tests

This patch brings support to bucket policy to have more control not
limiting to anonymous.  Bucket owner controls to allow/deny any rest
API.

For example server side encryption can be controlled by allowing
PUT/GET objects with encryptions including bucket owner.
2018-04-24 15:53:30 -07:00
kannappanr cef992a395
Remove error package and cause functions (#5784) 2018-04-10 09:36:37 -07:00
kannappanr f8a3fd0c2a
Create logger package and rename errorIf to LogIf (#5678)
Removing message from error logging
Replace errors.Trace with LogIf
2018-04-05 15:04:40 -07:00
Harshavardhana 85a57d2021 Make sure to close the disk connections (#5752)
Since we do not re-use storageDisks after moving
the connections to object layer we should close them
appropriately otherwise we have a lot of connection
leaks and these can compound as the time goes by.

This PR also refactors the initialization code to
re-use storageDisks for given set of endpoints until
we have confirmed a valid reference format.
2018-04-04 10:28:48 +05:30
Harshavardhana 6e9c853312 After healing re-load disks with the new format (#5718)
This PR also fixes correct calculation of drive states
before and after healing of objects.

Fixes #5700
Fixes #5708
2018-03-28 06:41:39 +05:30
Krishna Srinivas e452377b24 Add context to the object-interface methods.
Make necessary changes to xl fs azure sia
2018-03-15 16:28:25 -07:00
Krishna Srinivas 9083bc152e Flat multipart backend implementation for Erasure backend (#5447) 2018-03-15 13:55:23 -07:00
Harshavardhana fb96779a8a Add large bucket support for erasure coded backend (#5160)
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.

Some design details and restrictions:

- Objects are distributed using consistent ordering
  to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
  properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
  requirement, you can start with multiple
  such sets statically.
- Static sets set of disks and cannot be
  changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
  changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
  slower since List happens on all servers,
  and is merged at this sets layer.

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5460
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496
2018-02-15 17:45:57 -08:00
Harshavardhana 8de6cf4124 update dsync implementation to fix a regression (#5513)
Currently minio master requires 4 servers, we
have decided to run on a minimum of 2 servers
instead - fixes a regression from previous
releases where 3 server setups were supported.
2018-02-12 15:16:12 +05:30
poornas 4f73fd9487 Unify gateway and object layer. (#5487)
* Unify gateway and object layer. Bring bucket policies into
object layer.
2018-02-09 15:19:30 -08:00
Harshavardhana 0c880bb852 Deprecate and remove in-memory object caching (#5481)
in-memory caching cannot be cleanly implemented
without the access to GC which Go doesn't naturally
provide. At times we have seen that object caching
is more of an hindrance rather than a boon for
our use cases.

Removing it completely from our implementation
  related to #5160 and #5182
2018-02-02 10:17:13 -08:00
Nitish Tiwari e2d5a87b26 Fix free and total space reported in startup banner (#5419)
With storage class support, the free and total space
reported in Minio XL startup banner should be based on
totalDisks - standardClassParityDisks, instead of totalDisks/2.

fixes #5416
2018-01-17 11:25:51 -08:00
poornas 0bb6247056 Move nslocking from s3 layer to object layer (#5382)
Fixes #5350
2018-01-13 10:04:52 +05:30
Nitish Tiwari 42633748db
Update madmin package to return storage class parity (#5387)
After the addition of Storage Class support, readQuorum
and writeQuorum are decided on a per object basis, instead
of deployment wide static quorums.

This PR updates madmin api to remove readQuorum/writeQuorum
and add Standard storage class and reduced redundancy storage
class parity as return values. Since these parity values are
used to decide the quorum for each object.

Fixes #5378
2018-01-12 07:52:52 +05:30
Nitish Tiwari 545a9e4a82 Fix storage class related issues (#5322)
- Add storage class metadata validation for request header
- Change storage class header values to be consistent with AWS S3
- Refactor internal method to take only the reqd argument
2017-12-27 10:06:16 +05:30
Nitish Tiwari 1a3dbbc9dd
Add x-amz-storage-class support (#5295)
This adds configurable data and parity options on a per object
basis. To use variable parity

- Users can set environment variables to cofigure variable
parity

- Then add header x-amz-storage-class to putobject requests
with relevant storage class values

Fixes #4997
2017-12-22 16:58:13 +05:30
Harshavardhana 490c30f853
erasure: Support cleaning up of stale multipart objects (#5250)
Just like our single directory/disk setup, this PR brings
the functionality to cleanup stale multipart objects
older > 2 weeks.
2017-11-30 18:11:42 -08:00
Harshavardhana 8efa82126b
Convert errors tracer into a separate package (#5221) 2017-11-25 11:58:29 -08:00
Nitish Tiwari fcc61fa46a Remove minimum inodes reqd check (#4747) 2017-08-03 20:07:22 -07:00
Harshavardhana 075b8903d7 fs: Add safe locking semantics for format.json (#4523)
This patch also reverts previous changes which were
merged for migration to the newer disk format. We will
be bringing these changes in subsequent releases. But
we wish to add protection in this release such that
future release migrations are protected.

Revert "fs: Migration should handle bucketConfigs as regular objects. (#4482)"
This reverts commit 976870a391.

Revert "fs: Migrate object metadata to objects directory. (#4195)"
This reverts commit 76f4f20609.
2017-06-12 17:40:28 -07:00
Harshavardhana 7765081db7 cache: Increasing caching GC percent from 20 to 50. (#4041)
Previous value was set to avoid large cache value build
up but we can clearly see this can cause lots of GC
pauses which can lead to significant drop in performance.

Change this value to 50% and decrease the value to 25%
once the 75% cache size is used. To have a larger
window for GC pauses.

Another change is to only allow caching if a server has
more than 24GB of RAM instead of 8GB.
2017-04-15 02:16:49 -07:00
Aditya Manthramurthy 604417baf4 Allow cluster to start when only n/2 servers are up (#4066)
Fixes #3234.

Relaxes the quorum requirement to start the object layer, and skips
quick-healing at start-up (as no write quorum is present).
2017-04-09 00:28:27 -07:00
Bala FA 2df8160f6a server: handle command line and env variables at one place. (#3975) 2017-03-30 11:21:19 -07:00
Harshavardhana 43317530d5 Fix odd shadowing bug in XL init. (#3874)
Fixes #3873
2017-03-08 20:42:45 -08:00
Harshavardhana 47ac410ab0 Code cleanup - simplify server side code. (#3870)
Fix all the issues reported by `gosimple` tool.
2017-03-08 10:00:47 -08:00
Harshavardhana e49efcb9d9 xl: quickHeal heal bucket only when needed. (#3854)
This improves the startup time significantly
for clusters which have lot of buckets.

Also fixes a bug where `.minio.sys` is created
on disks which do not have `format.json`
2017-03-06 02:00:15 -08:00
Bala FA 208dd15245 Remove globalMaxCacheSize and globalCacheExpiry variables (#3826)
This patch fixes below

* Remove global variables globalMaxCacheSize and globalCacheExpiry.
* Make global variables into constant in objcache package.
2017-03-02 10:34:37 -08:00
Harshavardhana 9df01035da Remove XL references in public docs to Erasure. (#3725)
Ref #3722
2017-02-09 23:26:44 -08:00
Krishnan Parthasarathi 586058f079 Implement mgmt REST APIs to heal storage format. (#3604)
* Implement heal format REST API handler
* Implement admin peer rpc handler to re-initialize storage
* Implement HealFormat API in pkg/madmin
* Update pkg/madmin API.md to incl. HealFormat
* Added unit tests for ReInitDisks rpc handler and HealFormatHandler
2017-01-23 00:32:55 -08:00
Harshavardhana 1c699d8d3f fs: Re-implement object layer to remember the fd (#3509)
This patch re-writes FS backend to support shared backend sharing locks for safe concurrent access across multiple servers.
2017-01-16 17:05:00 -08:00
Harshavardhana 7bbb532b4b Add a isErr function to check for errs.
DisksInfo() should handle collection of some
base errors as offlineDisks.
2017-01-02 10:52:43 -08:00
Harshavardhana b363709c11 caching: Optimize memory allocations. (#3405)
This change brings in changes at multiple places

 - Reuse buffers at almost all locations ranging
   from rpc, fs, xl, checksum etc.
 - Change caching behavior to disable itself
   under low memory conditions i.e < 8GB of RAM.
 - Only objects cached are of size 1/10th the size
   of the cache for example if 4GB is the cache size
   the maximum object size which will be cached
   is going to be 400MB. This change is an
   optimization to cache more objects rather
   than few larger objects.
 - If object cache is enabled default GC
   percent has been reduced to 20% in lieu
   with newly found behavior of GC. If the cache
   utilization reaches 75% of the maximum value
   GC percent is reduced to 10% to make GC
   more aggressive.
 - Do not use *bytes.Buffer* due to its growth
   requirements. For every allocation *bytes.Buffer*
   allocates an additional buffer for its internal
   purposes. This is undesirable for us, so
   implemented a new cappedWriter which is capped to a
   desired size, beyond this all writes rejected.

Possible fix for #3403.
2016-12-08 20:35:07 -08:00
Harshavardhana 46a6fde813 xl/fs: Fix initializing meta volume bug. 2016-11-25 18:17:53 -08:00
Bala FA 0f2e493c9a Use isErrIgnored() function wherever applicable. (#3343) 2016-11-23 20:05:04 -08:00
Bala FA 825000bc34 Use humanize constants for KiB, MiB and GiB units. (#3322) 2016-11-22 18:18:22 -08:00
Harshavardhana 5197649081 utils: reduceErrs returns and validates quorum errors. (#3300)
This is needed as explained by @krisis

Lets say we have following errors.

```
[]error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccesDenied}
```

Since the last two errors are filtered, the maximum is nil,
depending on map order.

Let's say we get nil from reduceErr. Clearly at this point
we don't have quorum nodes agreeing about the data and since
GetObject only requires N/2 (Read quorum) and isDiskQuorum
would have returned true. This is problematic and can lead to
undersiable consequences.

Fixes #3298
2016-11-21 01:47:26 -08:00
Harshavardhana 0b9f0d14a1 auth/rpc: Take remote disk offline after maximum allowed attempts. (#3288)
Disks when are offline for a long period of time, we should
ignore the disk after trying Login upto 5 times.

This is to reduce the network chattiness, this also reduces
the overall time spent on `net.Dial`.

Fixes #3286
2016-11-20 16:57:12 -08:00
Harshavardhana c91d3791f9 heal: Add healing support for bucket, bucket metadata files. (#3252)
This patch implements healing in general but it is only used
as part of quickHeal().

Fixes #3237
2016-11-16 16:42:23 -08:00
Harshavardhana 716316f711 Reduce number of envs and options from command line. (#3230)
Ref #3229

After review with @abperiasamy we decided to remove all the unnecessary options

- MINIO_BROWSER (Implemented as a security feature but now deemed obsolete
  since even if blocking access to MINIO_BROWSER, s3 API port is open)
- MINIO_CACHE_EXPIRY (Defaults to 72h)
- MINIO_MAXCONN (No one used this option and we don't test this)
- MINIO_ENABLE_FSMETA (Enable FSMETA all the time)

Remove --ignore-disks option - this option was implemented when XL layer
 would initialize the backend disks and heal them automatically to disallow
 XL accidentally using the root partition itself this option was introduced.

This behavior has been changed XL no longer automatically initializes
`format.json`  a HEAL is controlled activity, so ignore-disks is not
useful anymore. This change also addresses the problems of our documentation
going forward and keeps things simple. This patch brings in reduction of
options and defaulting them to a valid known inputs.  This patch also
serves as a guideline of limiting many ways to do the same thing.
2016-11-11 16:40:55 -08:00
Krishnan Parthasarathi 7d50361ca9 Move housekeeping before object layer initialization (#3001)
In a distributed setup that the server should not perform any operation
on the storage layer after it is exported via RPC. e.g, cleaning up of
temporary directories under .minio.sys/tmp may interfere with ongoing
PUT objects being served by the distributed setup.
2016-10-19 19:59:48 -07:00
Harshavardhana f8e13fb00e server: Startup sequence should be more idempotent. (#2974)
Fixes #2971 - honors ignore-disks option properly.
Fixes #2969 - change the net.Dial to have a timeout of 3secs.
2016-10-17 14:31:33 -07:00
Anis Elleuch 334cdb5d64 XL total/free space calculation is done inside xl module (#2945) 2016-10-16 14:24:15 -07:00
Harshavardhana f22862aa28 heal: Refactor heal command. (#2901)
- return errors for heal operation through rpc replies.
  - implement rotating wheel for healing status.

Fixes #2491
2016-10-14 19:57:40 -07:00
Harshavardhana 3cfb23750a control: Implement service command 'stop,restart,status'. (#2883)
- stop - stops all the servers.
- restart - restart all the servers.
- status - prints status of storage info about the cluster.
2016-10-09 23:03:10 -07:00