Merge #16577: util: CBufferedFile fixes and unit test

efd2474d17 util: CBufferedFile fixes (Larry Ruane)

Pull request description:

  The `CBufferedFile` object guarantees its user is able to "rewind" the data stream (that's being read from a file) up to a certain number of bytes, as specified by the user in the constructor. This guarantee is not honored due to a bug in the `SetPos` method.

  Such rewinding is done in `LoadExternalBlockFile()` (currently the only user of this object), which deserializes a series of `CBlock` objects. If that function encounters something unexpected in the data stream, which is coming from a `blocks/blk00???.dat` file, it "rewinds" to an earlier position in the stream to try to get in sync again. The `CBufferedFile` object does not actually rewind its file offset; it simply repositions its internal offset, `nReadPos`, to an earlier position within the object's private buffer; this is why there's a limit to how far the user may rewind.

  If `LoadExternalBlockFile()` needs to rewind (call `blkdat.SetPos()`), the stream may not be positioned as it should be, causing errors in deserialization. This need to rewind is probably rare, which is likely why this bug hasn't been noticed already. But if this object is used elsewhere in the future, this could be a serious problem, especially as, due to the nature of the bug, the `SetPos()` _sometimes_ works.

  This PR adds a unit test for `CBufferedFile` that fails due to this bug. (Until now it has had no unit tests.) The unit test provides good documentation and examples for developers trying to understand `LoadExternalBlockFile()` and for future users of this object.

  This PR also adds code to throw an exception from the constructor if the rewind argument is not less than the buffer size (since that doesn't make any sense).

  Finally, I discovered that the object is too restrictive in one respect: When the deserialization methods call this object's `read` method, a check ensures that the number of bytes being requested is less than the size of the buffer (adjusting for the rewind size), else it throws an exception. This restriction is unnecessary; the object being deserialized can be larger than the buffer because multiple reads from disk can satisfy the request.

ACKs for top commit:
  laanwj:
    ACK ~after squash.~ efd2474d17
  mzumsande:
    I had intended to follow up earlier on my last comment, ACK efd2474d17. I reviewed the code, ran tests and did a successful reindex on testnet with this branch.

Tree-SHA512: 695529e0af38bae2af4e0cc2895dda56a71b9059c3de04d32e09c0165a50f6aacee499f2042156ab5eaa6f0349bab6bcca4ef9f6f9ded4e60d4483beab7e4554
This commit is contained in:
Wladimir J. van der Laan 2019-09-26 13:37:06 +02:00
commit ab765c2ec7
No known key found for this signature in database
GPG key ID: 1E4AED62986CD25D
2 changed files with 257 additions and 11 deletions

View file

@ -735,16 +735,17 @@ protected:
size_t nBytes = fread((void*)&vchBuf[pos], 1, readNow, src);
if (nBytes == 0) {
throw std::ios_base::failure(feof(src) ? "CBufferedFile::Fill: end of file" : "CBufferedFile::Fill: fread failed");
} else {
nSrcPos += nBytes;
return true;
}
nSrcPos += nBytes;
return true;
}
public:
CBufferedFile(FILE *fileIn, uint64_t nBufSize, uint64_t nRewindIn, int nTypeIn, int nVersionIn) :
nType(nTypeIn), nVersion(nVersionIn), nSrcPos(0), nReadPos(0), nReadLimit(std::numeric_limits<uint64_t>::max()), nRewind(nRewindIn), vchBuf(nBufSize, 0)
{
if (nRewindIn >= nBufSize)
throw std::ios_base::failure("Rewind limit must be less than buffer size");
src = fileIn;
}
@ -777,8 +778,6 @@ public:
void read(char *pch, size_t nSize) {
if (nSize + nReadPos > nReadLimit)
throw std::ios_base::failure("Read attempted past buffer limit");
if (nSize + nRewind > vchBuf.size())
throw std::ios_base::failure("Read larger than buffer size");
while (nSize > 0) {
if (nReadPos == nSrcPos)
Fill();
@ -802,16 +801,19 @@ public:
//! rewind to a given reading position
bool SetPos(uint64_t nPos) {
nReadPos = nPos;
if (nReadPos + nRewind < nSrcPos) {
nReadPos = nSrcPos - nRewind;
size_t bufsize = vchBuf.size();
if (nPos + bufsize < nSrcPos) {
// rewinding too far, rewind as far as possible
nReadPos = nSrcPos - bufsize;
return false;
} else if (nReadPos > nSrcPos) {
}
if (nPos > nSrcPos) {
// can't go this far forward, go as far as possible
nReadPos = nSrcPos;
return false;
} else {
return true;
}
nReadPos = nPos;
return true;
}
bool Seek(uint64_t nPos) {

View file

@ -2,6 +2,7 @@
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
#include <random.h>
#include <streams.h>
#include <test/setup_common.h>
@ -202,4 +203,247 @@ BOOST_AUTO_TEST_CASE(streams_serializedata_xor)
std::string(ds.begin(), ds.end()));
}
BOOST_AUTO_TEST_CASE(streams_buffered_file)
{
FILE* file = fsbridge::fopen("streams_test_tmp", "w+b");
// The value at each offset is the offset.
for (uint8_t j = 0; j < 40; ++j) {
fwrite(&j, 1, 1, file);
}
rewind(file);
// The buffer size (second arg) must be greater than the rewind
// amount (third arg).
try {
CBufferedFile bfbad(file, 25, 25, 222, 333);
BOOST_CHECK(false);
} catch (const std::exception& e) {
BOOST_CHECK(strstr(e.what(),
"Rewind limit must be less than buffer size") != nullptr);
}
// The buffer is 25 bytes, allow rewinding 10 bytes.
CBufferedFile bf(file, 25, 10, 222, 333);
BOOST_CHECK(!bf.eof());
// These two members have no functional effect.
BOOST_CHECK_EQUAL(bf.GetType(), 222);
BOOST_CHECK_EQUAL(bf.GetVersion(), 333);
uint8_t i;
bf >> i;
BOOST_CHECK_EQUAL(i, 0);
bf >> i;
BOOST_CHECK_EQUAL(i, 1);
// After reading bytes 0 and 1, we're positioned at 2.
BOOST_CHECK_EQUAL(bf.GetPos(), 2);
// Rewind to offset 0, ok (within the 10 byte window).
BOOST_CHECK(bf.SetPos(0));
bf >> i;
BOOST_CHECK_EQUAL(i, 0);
// We can go forward to where we've been, but beyond may fail.
BOOST_CHECK(bf.SetPos(2));
bf >> i;
BOOST_CHECK_EQUAL(i, 2);
// If you know the maximum number of bytes that should be
// read to deserialize the variable, you can limit the read
// extent. The current file offset is 3, so the following
// SetLimit() allows zero bytes to be read.
BOOST_CHECK(bf.SetLimit(3));
try {
bf >> i;
BOOST_CHECK(false);
} catch (const std::exception& e) {
BOOST_CHECK(strstr(e.what(),
"Read attempted past buffer limit") != nullptr);
}
// The default argument removes the limit completely.
BOOST_CHECK(bf.SetLimit());
// The read position should still be at 3 (no change).
BOOST_CHECK_EQUAL(bf.GetPos(), 3);
// Read from current offset, 3, forward until position 10.
for (uint8_t j = 3; j < 10; ++j) {
bf >> i;
BOOST_CHECK_EQUAL(i, j);
}
BOOST_CHECK_EQUAL(bf.GetPos(), 10);
// We're guaranteed (just barely) to be able to rewind to zero.
BOOST_CHECK(bf.SetPos(0));
BOOST_CHECK_EQUAL(bf.GetPos(), 0);
bf >> i;
BOOST_CHECK_EQUAL(i, 0);
// We can set the position forward again up to the farthest
// into the stream we've been, but no farther. (Attempting
// to go farther may succeed, but it's not guaranteed.)
BOOST_CHECK(bf.SetPos(10));
bf >> i;
BOOST_CHECK_EQUAL(i, 10);
BOOST_CHECK_EQUAL(bf.GetPos(), 11);
// Now it's only guaranteed that we can rewind to offset 1
// (current read position, 11, minus rewind amount, 10).
BOOST_CHECK(bf.SetPos(1));
BOOST_CHECK_EQUAL(bf.GetPos(), 1);
bf >> i;
BOOST_CHECK_EQUAL(i, 1);
// We can stream into large variables, even larger than
// the buffer size.
BOOST_CHECK(bf.SetPos(11));
{
uint8_t a[40 - 11];
bf >> a;
for (uint8_t j = 0; j < sizeof(a); ++j) {
BOOST_CHECK_EQUAL(a[j], 11 + j);
}
}
BOOST_CHECK_EQUAL(bf.GetPos(), 40);
// We've read the entire file, the next read should throw.
try {
bf >> i;
BOOST_CHECK(false);
} catch (const std::exception& e) {
BOOST_CHECK(strstr(e.what(),
"CBufferedFile::Fill: end of file") != nullptr);
}
// Attempting to read beyond the end sets the EOF indicator.
BOOST_CHECK(bf.eof());
// Still at offset 40, we can go back 10, to 30.
BOOST_CHECK_EQUAL(bf.GetPos(), 40);
BOOST_CHECK(bf.SetPos(30));
bf >> i;
BOOST_CHECK_EQUAL(i, 30);
BOOST_CHECK_EQUAL(bf.GetPos(), 31);
// We're too far to rewind to position zero.
BOOST_CHECK(!bf.SetPos(0));
// But we should now be positioned at least as far back as allowed
// by the rewind window (relative to our farthest read position, 40).
BOOST_CHECK(bf.GetPos() <= 30);
// We can explicitly close the file, or the destructor will do it.
bf.fclose();
fs::remove("streams_test_tmp");
}
BOOST_AUTO_TEST_CASE(streams_buffered_file_rand)
{
// Make this test deterministic.
SeedInsecureRand(true);
for (int rep = 0; rep < 50; ++rep) {
FILE* file = fsbridge::fopen("streams_test_tmp", "w+b");
size_t fileSize = InsecureRandRange(256);
for (uint8_t i = 0; i < fileSize; ++i) {
fwrite(&i, 1, 1, file);
}
rewind(file);
size_t bufSize = InsecureRandRange(300) + 1;
size_t rewindSize = InsecureRandRange(bufSize);
CBufferedFile bf(file, bufSize, rewindSize, 222, 333);
size_t currentPos = 0;
size_t maxPos = 0;
for (int step = 0; step < 100; ++step) {
if (currentPos >= fileSize)
break;
// We haven't read to the end of the file yet.
BOOST_CHECK(!bf.eof());
BOOST_CHECK_EQUAL(bf.GetPos(), currentPos);
// Pretend the file consists of a series of objects of varying
// sizes; the boundaries of the objects can interact arbitrarily
// with the CBufferFile's internal buffer. These first three
// cases simulate objects of various sizes (1, 2, 5 bytes).
switch (InsecureRandRange(5)) {
case 0: {
uint8_t a[1];
if (currentPos + 1 > fileSize)
continue;
bf.SetLimit(currentPos + 1);
bf >> a;
for (uint8_t i = 0; i < 1; ++i) {
BOOST_CHECK_EQUAL(a[i], currentPos);
currentPos++;
}
break;
}
case 1: {
uint8_t a[2];
if (currentPos + 2 > fileSize)
continue;
bf.SetLimit(currentPos + 2);
bf >> a;
for (uint8_t i = 0; i < 2; ++i) {
BOOST_CHECK_EQUAL(a[i], currentPos);
currentPos++;
}
break;
}
case 2: {
uint8_t a[5];
if (currentPos + 5 > fileSize)
continue;
bf.SetLimit(currentPos + 5);
bf >> a;
for (uint8_t i = 0; i < 5; ++i) {
BOOST_CHECK_EQUAL(a[i], currentPos);
currentPos++;
}
break;
}
case 3: {
// Find a byte value (that is at or ahead of the current position).
size_t find = currentPos + InsecureRandRange(8);
if (find >= fileSize)
find = fileSize - 1;
bf.FindByte(static_cast<char>(find));
// The value at each offset is the offset.
BOOST_CHECK_EQUAL(bf.GetPos(), find);
currentPos = find;
bf.SetLimit(currentPos + 1);
uint8_t i;
bf >> i;
BOOST_CHECK_EQUAL(i, currentPos);
currentPos++;
break;
}
case 4: {
size_t requestPos = InsecureRandRange(maxPos + 4);
bool okay = bf.SetPos(requestPos);
// The new position may differ from the requested position
// because we may not be able to rewind beyond the rewind
// window, and we may not be able to move forward beyond the
// farthest position we've reached so far.
currentPos = bf.GetPos();
BOOST_CHECK_EQUAL(okay, currentPos == requestPos);
// Check that we can position within the rewind window.
if (requestPos <= maxPos &&
maxPos > rewindSize &&
requestPos >= maxPos - rewindSize) {
// We requested a position within the rewind window.
BOOST_CHECK(okay);
}
break;
}
}
if (maxPos < currentPos)
maxPos = currentPos;
}
}
fs::remove("streams_test_tmp");
}
BOOST_AUTO_TEST_SUITE_END()