Commit graph

6 commits

Author SHA1 Message Date
Frank Wessels 98b62cbec8 Implement an offline mode for a distributed node (#4646)
Implement an offline mode for remote storage to cache the
offline status of a node in order to prevent network calls
that are bound to fail. After a time interval an attempt
will be made to restore the connection and mark the node
as online if successful.

Fixes #4183
2017-08-11 11:49:35 -07:00
Aditya Manthramurthy 8975da4e84 Add new ReadFileWithVerify storage-layer API (#4349)
This is an enhancement to the XL/distributed-XL mode. FS mode is
unaffected.

The ReadFileWithVerify storage-layer call is similar to ReadFile with
the additional functionality of performing bit-rot checking. It
accepts additional parameters for a hashing algorithm to use and the
expected hex-encoded hash string.

This patch provides significant performance improvement because:

1. combines the step of reading the file (during
erasure-decoding/reconstruction) with bit-rot verification;

2. limits the number of file-reads; and

3. avoids transferring the file over the network for bit-rot
verification.

ReadFile API is implemented as ReadFileWithVerify with empty hashing
arguments.

Credits to AB and Harsha for the algorithmic improvement.

Fixes #4236.
2017-05-16 14:21:52 -07:00
Harshavardhana 62f8343879 Add constants for commonly used values. (#3588)
This is a consolidation effort, avoiding usage
of naked strings in codebase. Whenever possible
use constants which can be repurposed elsewhere.

This also fixes `goconst ./...` reported issues.
2017-01-18 12:24:34 -08:00
Harshavardhana 8562b22823 Fix delays and iterim fix for the partial fix in #3502 (#3511)
This patch uses a technique where in a retryable storage
before object layer initialization has a higher delay
and waits for longer period upto 4 times with time unit
of seconds.

And uses another set of configuration after the disks
have been formatted, i.e use a lower retry backoff rate
and retrying only once per 5 millisecond.

Network IO error count is reduced to a lower value i.e 256
before we reject the disk completely. This is done so that
combination of retry logic and total error count roughly
come to around 2.5secs which is when we basically take the
disk offline completely.

NOTE: This patch doesn't fix the issue of what if the disk
is completely dead and comes back again after the initialization.
Such a mutating state requires a change in our startup sequence
which will be done subsequently. This is an interim fix to alleviate
users from these issues.
2016-12-30 17:08:02 -08:00
Harshavardhana 41cf580bb1 Improve reconnection logic, allow jitters. (#3502)
Attempt a reconnect also if disk not found.

This is needed since any network operation error
is converted to disk not found but we also need
to make sure if disk is really not available. 

Additionally we also need to retry more than
once because the server might be in startup
sequence which would render other servers to
wrongly think that the server is offline.
2016-12-29 03:13:51 -08:00
Harshavardhana 6efee2072d objectLayer: Check for format.json in a wrapped disk. (#3311)
This is needed to validate if the `format.json` indeed exists
when a fresh node is brought online.

This wrapped implementation also connects to the remote node
by attempting a re-login. Subsequently after a successful
connect `format.json` is validated as well.

Fixes #3207
2016-11-23 15:48:10 -08:00