Commit graph

26 commits

Harshavardhana 029758cb20
fix: retain the previous UUID for newly replaced drives (#10759)
Only newly replaced drives get a new `format.json`;
the remaining disks keep their in-memory reference
format, so they stay online without having to
reload it.

Keeping the reference format intact means UUIDs
never change once drives are formatted.
2020-10-26 10:29:29 -07:00
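
A minimal sketch of the idea in the commit above, using a hypothetical driveFormat type rather than MinIO's actual format.json structs: when a replaced drive is reformatted, its UUID is copied from the reference format at the same position, so the identity of every slot stays stable.

```go
package main

import "fmt"

// driveFormat is a simplified stand-in for the on-disk format.json
// (hypothetical fields, not MinIO's actual struct).
type driveFormat struct {
	Version string
	This    string     // UUID of the drive holding this format.json
	Sets    [][]string // UUIDs of every drive, per erasure set
}

// formatForReplacedDrive builds a format.json for a freshly replaced
// drive at position (set, disk), reusing the UUID recorded in the
// reference format so the drive's identity never changes.
func formatForReplacedDrive(ref *driveFormat, set, disk int) *driveFormat {
	return &driveFormat{
		Version: ref.Version,
		This:    ref.Sets[set][disk], // retain the previous UUID
		Sets:    ref.Sets,            // keep the reference layout intact
	}
}

func main() {
	ref := &driveFormat{
		Version: "1",
		Sets:    [][]string{{"uuid-a", "uuid-b", "uuid-c", "uuid-d"}},
	}
	// Drive at set 0, index 2 was replaced; it gets back "uuid-c".
	fmt.Println(formatForReplacedDrive(ref, 0, 2).This)
}
```
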
Harshavardhana 6a8c62f9fd
make sure to preserve UUID from reference format (#10748)
The reference format should be the source of truth;
when inconsistent drives reconnect, add them back
to their original position.

Remove the automatic fix for existing offline
disk UUIDs.
2020-10-24 13:23:08 -07:00
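
A small companion sketch of the "original position" idea above, again with illustrative types: a reconnecting drive's UUID is looked up in the reference format to find the (set, disk) slot it should be re-attached to.

```go
package main

import (
	"errors"
	"fmt"
)

// findDiskIndex is an illustrative helper (not MinIO's actual code): it
// returns the (set, disk) position a drive's UUID occupies in the
// reference format, so a reconnecting drive can be re-attached at
// exactly the slot it was originally formatted for.
func findDiskIndex(refSets [][]string, uuid string) (int, int, error) {
	for s, set := range refSets {
		for d, id := range set {
			if id == uuid {
				return s, d, nil
			}
		}
	}
	return -1, -1, errors.New("uuid not present in reference format")
}

func main() {
	ref := [][]string{
		{"uuid-a", "uuid-b"},
		{"uuid-c", "uuid-d"},
	}
	set, disk, err := findDiskIndex(ref, "uuid-d")
	fmt.Println(set, disk, err) // 1 1 <nil>
}
```
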
Harshavardhana ad726b49b4
rename zones to serverSets to avoid terminology conflict (#10679)
Since we are bringing in availability zones, we
should avoid using "zones" for the server
expansion concept.
2020-10-15 14:28:50 -07:00
Harshavardhana f1cc16e788
fix: background heal rely on getOnlineDisks() (#10687) 2020-10-15 13:06:23 -07:00
Harshavardhana 00eb6f6bc9
cache DiskInfo at storage layer for performance (#10586)
Repeated `mc admin info` calls on busy setups no
longer move HDD heads unnecessarily, providing
better responsiveness for the call overall.

Bonus change: allow listTolerancePerSet to be N-1
for good entries, to avoid skipping entries when
one of the disks goes offline.
2020-09-29 09:54:41 -07:00
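
A minimal sketch of the caching idea, assuming a hypothetical DiskInfo type and a fixed TTL (the real storage layer's types and refresh policy differ): repeated callers are served the cached value instead of hitting the drive every time.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// DiskInfo is a simplified stand-in for the storage layer's disk stats.
type DiskInfo struct {
	Total, Free uint64
}

// cachedDiskInfo serves DiskInfo from memory and only calls the
// expensive fetch function once the cached value is older than ttl.
type cachedDiskInfo struct {
	mu      sync.Mutex
	ttl     time.Duration
	fetched time.Time
	last    DiskInfo
	fetch   func() DiskInfo // hits the actual drive
}

func (c *cachedDiskInfo) Get() DiskInfo {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Since(c.fetched) > c.ttl {
		c.last = c.fetch()
		c.fetched = time.Now()
	}
	return c.last
}

func main() {
	calls := 0
	c := &cachedDiskInfo{
		ttl: time.Second,
		fetch: func() DiskInfo {
			calls++ // in reality this would stat the drive
			return DiskInfo{Total: 100, Free: 40}
		},
	}
	c.Get()
	c.Get() // served from cache, no second disk hit
	fmt.Println("fetches:", calls)
}
```
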
Harshavardhana 66174692a2
add '.healing.bin' for tracking currently healing disk (#10573)
Add a hint on the disk to track a fresh disk
being healed, allowing for restartable heals, and
also use this as a way to track and remove disks.

There are more pending changes to move all the
disk formatting logic to the backend drives; this
PR doesn't deal with that refactor, it only makes
healing easier to track in the future.
2020-09-28 19:39:32 -07:00
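
A rough sketch of the marker-file mechanism, with a hypothetical on-drive path and JSON payload (the actual `.healing.bin` contents are different): the marker is written when healing starts, survives restarts so heals can resume, and is removed when healing completes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// healingTracker is a simplified stand-in for the state kept in
// `.healing.bin`; the real file carries more fields.
type healingTracker struct {
	DiskID  string    `json:"diskID"`
	Started time.Time `json:"started"`
}

func markerPath(drive string) string {
	// Hypothetical location; shown only to illustrate the mechanism.
	return filepath.Join(drive, ".minio.sys", ".healing.bin")
}

// startHealing drops the marker so a restarted server knows this drive
// is still being healed and can resume instead of starting over.
func startHealing(drive, diskID string) error {
	data, err := json.Marshal(healingTracker{DiskID: diskID, Started: time.Now()})
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(markerPath(drive)), 0o755); err != nil {
		return err
	}
	return os.WriteFile(markerPath(drive), data, 0o644)
}

// finishHealing removes the marker once the drive is fully healed.
func finishHealing(drive string) error {
	return os.Remove(markerPath(drive))
}

func main() {
	drive, _ := os.MkdirTemp("", "drive")
	defer os.RemoveAll(drive)

	_ = startHealing(drive, "uuid-c")
	_, err := os.Stat(markerPath(drive))
	fmt.Println("healing in progress:", err == nil)
	_ = finishHealing(drive)
}
```
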
Harshavardhana ca989eb0b3
avoid ListBuckets returning quorum errors when node is down (#10555)
Also, revamp the way ListBuckets works and make a
few portions of the healing logic parallel:

- walk objects for healing disks in parallel
- collect the list of buckets in parallel across drives
- provide a consistent view for listBuckets()
2020-09-24 09:53:38 -07:00
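
A condensed sketch of collecting buckets from every drive in parallel and merging the results into one consistent view; the per-drive list functions here are illustrative stand-ins, and MinIO's real implementation additionally applies quorum to the merged metadata.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// listBucketsAllDrives queries each drive concurrently, ignores drives
// that are down, and returns a deduplicated, sorted view so callers see
// a consistent bucket list even when a node is offline.
func listBucketsAllDrives(drives []func() ([]string, error)) []string {
	var mu sync.Mutex
	seen := map[string]struct{}{}

	var wg sync.WaitGroup
	for _, list := range drives {
		wg.Add(1)
		go func(list func() ([]string, error)) {
			defer wg.Done()
			buckets, err := list()
			if err != nil {
				return // offline drive: skip instead of failing quorum
			}
			mu.Lock()
			for _, b := range buckets {
				seen[b] = struct{}{}
			}
			mu.Unlock()
		}(list)
	}
	wg.Wait()

	out := make([]string, 0, len(seen))
	for b := range seen {
		out = append(out, b)
	}
	sort.Strings(out)
	return out
}

func main() {
	drives := []func() ([]string, error){
		func() ([]string, error) { return []string{"photos", "logs"}, nil },
		func() ([]string, error) { return nil, fmt.Errorf("disk offline") },
		func() ([]string, error) { return []string{"logs", "backups"}, nil },
	}
	fmt.Println(listBucketsAllDrives(drives)) // [backups logs photos]
}
```
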
Harshavardhana e60834838f
fix: background disk heal, to reload format consistently (#10502)
It was observed in a VMware vSphere environment
that during a pod replacement, `mc admin info`
might report incorrect offline nodes for the
replaced drive. The issue eventually goes away,
but it takes quite a lot of time for all servers
to be in sync.

This PR fixes this behavior properly.
2020-09-16 21:14:35 -07:00
Harshavardhana b0e1d4ce78
re-attach offline drive after new drive replacement (#10416)
Drive healing was inconsistent when one of the
drives was offline while a new drive was being
replaced; this change ensures that we can add the
offline drive back into the mix by healing it again.
2020-09-04 17:09:02 -07:00
Harshavardhana a359e36e35
tolerate listing with only readQuorum disks (#10357)
We can reduce this further in the future, but this
is a good value to keep around. With the advent of
continuous healing, we can be assured that the
namespace will eventually be consistent, so it is
okay to avoid listing across all drives on all sets.

Bonus: parallel Pop()'s have the potential to wait
too long on large drive setups and cause more
slowness instead of gaining any performance, so
remove them for now.

Also, implement a load-balanced reply for local
disks, ensuring that local disks have an affinity for

- cleanupStaleMultipartUploads()
2020-08-26 19:29:35 -07:00
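
A small sketch of the selection policy this commit gestures at, with an illustrative disk type: pick at most readQuorum online drives for listing and give local drives first priority.

```go
package main

import "fmt"

// disk is a simplified stand-in: whether it is attached locally and online.
type disk struct {
	Name   string
	Local  bool
	Online bool
}

// pickListingDisks returns at most readQuorum online disks, preferring
// local drives so listing work has an affinity for the local node.
func pickListingDisks(disks []disk, readQuorum int) []disk {
	var picked []disk
	// First pass: local online disks.
	for _, d := range disks {
		if len(picked) == readQuorum {
			return picked
		}
		if d.Online && d.Local {
			picked = append(picked, d)
		}
	}
	// Second pass: fill the remainder with remote online disks.
	for _, d := range disks {
		if len(picked) == readQuorum {
			break
		}
		if d.Online && !d.Local {
			picked = append(picked, d)
		}
	}
	return picked
}

func main() {
	disks := []disk{
		{"d1", true, true}, {"d2", false, true},
		{"d3", false, true}, {"d4", true, false},
	}
	for _, d := range pickListingDisks(disks, 2) {
		fmt.Println(d.Name) // d1, then d2
	}
}
```
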
Harshavardhana 74116204ce
handle fresh setup with mixed drives (#10273)
In fresh drive setups, when one of the drives is
a root drive, we should ignore such a root drive
and not proceed to format it.

This PR handles this properly by marking the
disks which are root disks and taking them
offline.
2020-08-18 14:37:26 -07:00
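
A minimal, Linux-specific sketch of one way to detect that a mount path actually sits on the root drive, by comparing device IDs; MinIO's real root-disk check handles more cases than this.

```go
package main

import (
	"fmt"
	"syscall"
)

// isRootDisk reports whether diskPath lives on the same block device as
// "/", i.e. the "drive" is really just a directory on the root disk and
// should not be formatted. (Illustrative check, Linux-only.)
func isRootDisk(diskPath string) (bool, error) {
	var rootSt, diskSt syscall.Stat_t
	if err := syscall.Stat("/", &rootSt); err != nil {
		return false, err
	}
	if err := syscall.Stat(diskPath, &diskSt); err != nil {
		return false, err
	}
	// Same device ID means the path shares the root drive.
	return rootSt.Dev == diskSt.Dev, nil
}

func main() {
	root, err := isRootDisk("/tmp")
	fmt.Println(root, err)
}
```
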
Harshavardhana 2a9819aff8
fix: refactor background heal for cluster health (#10225) 2020-08-07 19:43:06 -07:00
Harshavardhana 6c6137b2e7
add cluster maintenance healthcheck drive heal affinity (#10218) 2020-08-07 13:22:53 -07:00
Harshavardhana 187c3f62df
fix: heal replaced drives properly (#10069)
Healing was not working properly when drives were
replaced, due to the error check in the root disk
calculation; this PR fixes that behavior.

This PR also adds an additional fix for missing
metadata entries from .minio.sys as part of
disk healing.

Added code to ignore and print more
context-sensitive errors for better debugging.

This PR is a continuation of the fix in 7b14e9b660
2020-07-17 10:08:04 -07:00
Harshavardhana 7b14e9b660
fix: diskInfo should check diskID only if disk is online (#10058)
closes #10057
2020-07-16 07:30:05 -07:00
Harshavardhana 4915433bd2
Support bucket versioning (#9377)
- Implement a new xl.json 2.0.0 format to support
  versioning; this moves the entire marshaling logic
  to the POSIX layer, and the top layer always
  consumes a common FileInfo construct, which
  simplifies metadata reads.
- Implement list object versions
- Migrate from crchash to siphash for object
  placement in new deployments.

Fixes #2111
2020-06-12 20:04:01 -07:00
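
A brief sketch of hash-based object placement with SipHash, assuming the third-party github.com/dchest/siphash package and illustrative key constants (in practice the key is derived from the deployment's identity so placement stays stable):

```go
package main

import (
	"fmt"

	"github.com/dchest/siphash"
)

// objectSetIndex returns the erasure set an object name maps to. The
// SipHash keys here are illustrative; in a real deployment the key is
// derived from the cluster's identity so placement is stable.
func objectSetIndex(object string, setCount int) int {
	const k0, k1 = 0x0706050403020100, 0x0f0e0d0c0b0a0908
	return int(siphash.Hash(k0, k1, []byte(object)) % uint64(setCount))
}

func main() {
	for _, obj := range []string{"photos/a.jpg", "logs/2020-06-12.log"} {
		fmt.Printf("%s -> set %d\n", obj, objectSetIndex(obj, 4))
	}
}
```
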
Harshavardhana 342ade03f6
deprecate listDir usage for healing (#9792)
listDir was incorrectly used for healing, which
is slower; instead, use Walk() to heal the
entire set.
2020-06-09 17:09:19 -07:00
Harshavardhana 71ce63f79c
fix: background heal to call HealFormat only if needed (#9491)
In large setups this avoids unnecessary data transfer
across nodes and potential locks.

This PR also optimizes the heal result channel,
which should not be created for each queueHealTask,
as it is expensive to create/close channels for a
large number of objects.
2020-04-30 20:23:00 -07:00
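
A rough sketch of the channel optimization described above, using hypothetical healTask/healResult types: every queued object reports on one long-lived result channel instead of creating and closing a fresh channel per task.

```go
package main

import (
	"fmt"
	"sync"
)

// healTask and healResult are illustrative placeholders, not MinIO's types.
type healTask struct{ object string }

type healResult struct {
	object string
	err    error
}

// healWorker drains tasks and reports on a single shared result channel,
// avoiding the cost of creating/closing one channel per queued object.
func healWorker(tasks <-chan healTask, results chan<- healResult, wg *sync.WaitGroup) {
	defer wg.Done()
	for t := range tasks {
		// Pretend to heal the object here.
		results <- healResult{object: t.object}
	}
}

func main() {
	tasks := make(chan healTask)
	results := make(chan healResult) // one channel for all tasks

	var wg sync.WaitGroup
	wg.Add(1)
	go healWorker(tasks, results, &wg)

	go func() {
		for _, o := range []string{"bucket/a", "bucket/b"} {
			tasks <- healTask{object: o}
		}
		close(tasks)
		wg.Wait()
		close(results)
	}()

	for r := range results {
		fmt.Println("healed:", r.object, r.err)
	}
}
```
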
Bala FA 95e89f1712
proactive deep heal object when a bitrot is detected (#9192) 2020-04-01 12:14:00 -07:00
Harshavardhana cfc9cfd84a
fix: various optimizations, idiomatic changes (#9179)
- acquire a single leader lock for all background
  operations - healing, crawling and applying
  lifecycle policies.

- simplify lifecycle to avoid network calls, which
  was a bug in the implementation - we should hold
  the leader lock and do everything from there,
  since we have access to the entire namespace.

- make listing and walking avoid interfering with
  other work by slowing themselves down like the
  crawler.

- effectively use the global context everywhere to
  ensure proper shutdown in cache, lifecycle and
  healing

- don't read `format.json` for Prometheus metrics
  in the StorageInfo() call.
2020-03-22 12:16:36 -07:00
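
A compact sketch of the single-leader pattern referenced in the first bullet above; an in-process lock stands in for MinIO's dsync-based distributed lock, and only the node that wins it runs the healing, crawling and lifecycle loops.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// leaderLock is a stand-in for a cluster-wide lock; in a real deployment
// this would be a distributed lock, not an in-process mutex.
type leaderLock struct{ mu sync.Mutex }

func (l *leaderLock) TryAcquire() bool { return l.mu.TryLock() }

// runBackgroundOps blocks until this node becomes the leader, then runs
// all background loops (healing, crawling, lifecycle) from that one node.
func runBackgroundOps(ctx context.Context, lock *leaderLock) {
	for !lock.TryAcquire() {
		select {
		case <-ctx.Done():
			return
		case <-time.After(time.Second): // retry until leadership is won
		}
	}
	fmt.Println("leader acquired: starting healing, crawling, lifecycle")
	// go healLoop(ctx); go crawlLoop(ctx); go lifecycleLoop(ctx)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	runBackgroundOps(ctx, &leaderLock{})
}
```
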
Harshavardhana 442e1698cb
heal: Avoid spinning up object healing during startup (#8819)
Auto-heal disks, metadata and buckets in the
background, but not objects; let auto-heal kick
in for objects after the cluster has been up for
a while.
2020-01-15 01:08:39 -08:00
Harshavardhana 5aa5dcdc6d
lock: improve locker initialization at init (#8776)
Use the reference format to initialize lockers
during startup; also handle `nil` for NetLocker
in dsync and remove the *errorLocker* implementation.

Add further tuning parameters such as

 - DialTimeout is now 15 seconds, down from 30 seconds
 - KeepAliveTimeout is now 20 seconds, 5 seconds
   more than the default of 15 seconds
 - ResponseHeaderTimeout is set to 10 seconds
 - ExpectContinueTimeout is reduced to 3 seconds
 - DualStack is enabled by default, so remove
   setting it to `true`
 - IdleConnTimeout is reduced to 30 seconds from
   1 minute to avoid idle connection build-up

Fixes #8773
2020-01-10 02:35:06 -08:00
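
A sketch of a net/http transport configured with the timeouts listed above; the field names are standard library ones, while the concrete values are simply those quoted in the commit description.

```go
package main

import (
	"net"
	"net/http"
	"time"
)

// newInternodeTransport mirrors the tuning described in the commit:
// shorter dial and idle timeouts so stale or slow connections are
// detected quickly instead of piling up. (Values from the commit text;
// the actual MinIO transport sets more fields than shown here.)
func newInternodeTransport() *http.Transport {
	dialer := &net.Dialer{
		Timeout:   15 * time.Second, // was 30 seconds
		KeepAlive: 20 * time.Second, // 5 seconds above the 15-second default
		// DualStack is enabled by default in net.Dialer, so it is not set here.
	}
	return &http.Transport{
		DialContext:           dialer.DialContext,
		ResponseHeaderTimeout: 10 * time.Second,
		ExpectContinueTimeout: 3 * time.Second,
		IdleConnTimeout:       30 * time.Second, // was 1 minute
	}
}

func main() {
	client := &http.Client{Transport: newInternodeTransport()}
	_ = client
}
```
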
Harshavardhana 347b29d059 Implement bucket expansion (#8509) 2019-11-19 17:42:27 -08:00
Harshavardhana 822eb5ddc7 Bring in safe mode support (#8478)
This PR refactors object layer handling such
that upon failure in sub-system initialization
the server reaches a stage of safe-mode operation
wherein only certain API operations are enabled
and available.

This allows for fixing many scenarios such as

 - incorrect configuration in vault, etcd,
   notification targets
 - missing files, incomplete config migrations,
   inability to read encrypted content, etc.
 - any other issues related to notification,
   policies, lifecycle, etc.
2019-11-09 09:27:23 -08:00
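
A skeletal sketch of gating handlers behind a safe-mode flag, with a hypothetical allow-list and paths (not MinIO's actual wiring): when a sub-system fails to initialize, only a small set of operations stays reachable.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// safeMode is flipped on when a sub-system fails to initialize
// (illustrative global, not MinIO's actual variable).
var safeMode atomic.Bool

// allowedInSafeMode is the small set of operations that stay available
// so an operator can repair configuration (hypothetical paths).
var allowedInSafeMode = map[string]bool{
	"/minio/admin/config": true,
	"/minio/health/live":  true,
}

// guard rejects every request outside the allow-list while in safe mode.
func guard(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if safeMode.Load() && !allowedInSafeMode[r.URL.Path] {
			http.Error(w, "server is in safe mode", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	safeMode.Store(true) // e.g. vault/etcd config could not be loaded
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	_ = http.ListenAndServe(":9000", guard(mux))
}
```
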
Harshavardhana 9e7a3e6adc Extend further validation of config values (#8469)
- This PR allows config KVS to be validated properly
  without being affected by ENV overrides, and rejects
  invalid values during the set operation

- Expands unit tests and refactors the error handling
  for notification targets, returning an error instead
  of ignoring targets with invalid KVS

- Does all the prep work for implementing safe-mode
  style operation for the MinIO server, and introduces
  a new global variable to toggle safe-mode based
  operations.
  NOTE: this PR itself doesn't provide safe-mode operations
2019-10-30 23:39:09 -07:00
Anis Elleuch a49d4a9cb2 xl: Rewrite auto-healing and implement auto new-disk healer (#8114)
The new auto-healing model selects one node that is
always responsible for auto-healing the whole cluster,
erasure set by erasure set. If that node dies, another
node will be elected as the leading operator to
perform healing.

This code also adds a goroutine which checks every
10 minutes whether there are any new unformatted
disks and heals them if so; only the erasure set
which has the new disk will be healed.
2019-10-28 10:27:49 -07:00
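
A closing sketch of the periodic new-disk check described in this commit, with placeholder discovery and heal functions: the elected healer node wakes on a ticker, finds erasure sets containing unformatted disks, and heals only those sets. The real cadence is the 10 minutes mentioned above; a short interval is used here so the example terminates quickly.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// monitorNewDisks runs on the elected healer node only. Every interval
// it asks for the erasure sets that currently contain unformatted disks
// and heals just those sets. Both callbacks are illustrative stand-ins.
func monitorNewDisks(ctx context.Context, interval time.Duration,
	findNewDiskSets func() []int, healSet func(set int)) {

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, set := range findNewDiskSets() {
				healSet(set) // only the set with the new disk is healed
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 250*time.Millisecond)
	defer cancel()
	monitorNewDisks(ctx, 100*time.Millisecond,
		func() []int { return []int{1} },
		func(set int) { fmt.Println("healing erasure set", set) })
}
```
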