Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect behavior with ifeof() when file length matches default buffer size #29

Open
welchr opened this issue Feb 28, 2023 · 0 comments

Comments

@welchr
Copy link
Member

welchr commented Feb 28, 2023

I believe this library is deprecated, but wanted to leave this info for future reference in case someone else runs into this problem.

It seems as though ifeof() does not return true when the last line in the file fills the buffer exactly (to length DEFAULT_BUFFER_SIZE).

If you read through a file using String::readLine():

int String::ReadLine(IFILE & f)
{
int ch;
char *ptr = buffer;
char *endBuffer = buffer + size;
len = 0;
while ( ((ch = f->ifgetc()) != EOF) && (ch != '\n'))
{
if (ptr >= endBuffer - 1)
{
// resize: 1 byte for the next character, 1 byte
// for the NUL at the end.
Grow(len + 2);
endBuffer = buffer + size;
ptr = buffer + len;
}
*ptr++ = ch;
len++;
}
// NB: assumes that buffer is always allocated.
buffer[len] = 0;
if ((ch == EOF) && (len == 0))
{
// Indicate error/EOF if nothing was read.
return -1;
}
return 1;
}
#endif

It calls ifgetc() repeatedly:

inline int ifgetc()
{
if (myBufferIndex >= myCurrentBufferSize)
{
// at the last index, read a new buffer.
myCurrentBufferSize = readFromFile(myFileBuffer, myAllocatedBufferSize);
myBufferIndex = 0;
// If the buffer index is still greater than or equal to the
// myCurrentBufferSize, then we failed to read the file - return EOF.
// NB: This only needs to be checked when myCurrentBufferSize
// is changed. Simplify check - readFromFile returns zero on EOF
if (myCurrentBufferSize == 0)
{
return(EOF);
}
}
return(myFileBuffer[myBufferIndex++]);
}

When the last line fills the buffer exactly, myBufferIndex == myCurrentBufferSize == DEFAULT_BUFFER_SIZE.

But when ifeof() checks to see if we have arrived at EOF:

inline int ifeof() const
{
// Not EOF if we are not at the end of the buffer.
if (myBufferIndex < myCurrentBufferSize)
{
// There are still available bytes in the buffer, so NOT EOF.
return false;
}
else
{
if (myFileTypePtr == NULL)
{
// No myFileTypePtr, so not eof (return 0).
return 0;
}
// exhausted our buffer, so check the file for eof.
return myFileTypePtr->eof();
}
}

It checks if myBufferIndex < myCurrentBufferSize, which will be false, and it thinks the file should continue being read from.

This can result in a seg fault depending on what you try to do with the buffer at that point, after you've tried to read another line (that doesn't exist) from the file.

Code sample:

#include "InputFile.h"
#include "StringBasics.h"
#include "StringArray.h"
#include <iostream>
#include <string>

int main(int argc, char *argv[]) {
  std::string bad_filepath = argv[1];
  IFILE test = ifopen(bad_filepath.c_str(), "r");

  // This loop is roughly what RAREMETAL does when reading in a covariance file
  while (!ifeof(test)) {
    String buffer;
    buffer.ReadLine(test);

    StringArray tokens;
    tokens.AddTokens(buffer, "\t ");

    int p = tokens[0].Find("chr");
  }

  std::cout << "Finished";
  return 0;
}

Attached file, run with above code, will cause a seg fault.

The simple solution to this (if you're a user looking to get around this bug) is just add a zero somewhere it doesn't matter at the end of the file, or an extra space, and the file will be read successfully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant