
LittleFS file caching #18

Open
atanisoft opened this issue Nov 8, 2020 · 59 comments

@atanisoft

I've encountered an odd case where the caching in LittleFS seems to be causing an issue that I'm not seeing on SD or SPIFFS.

Using a code pattern similar to the one below, the second file handle does not see the data written through the first file handle:

uint8_t buf1[1] = {'a'};
uint8_t buf2[1] = {0};
int fd1 = open("/fs/file.bin", O_RDWR);
assert(fd1 >= 0);
int fd2 = open("/fs/file.bin", O_RDWR);
assert(fd2 >= 0);
lseek(fd1, 0, SEEK_SET);
write(fd1, &buf1, sizeof(buf1));
fsync(fd1);
lseek(fd2, 0, SEEK_SET);
read(fd2, &buf2, sizeof(buf2));
assert(buf1[0] == buf2[0]);

Using two handles for the same file doesn't make a ton of sense in most cases, especially within the same function, but this pattern works on other FS types.

Any ideas or suggestions on config tweaks to reduce the chances of the two file handles being out of sync?

@BrianPugh (Member)

investigating

@BrianPugh (Member)

So I'm working on this branch:

https://github.com/joltwallet/esp_littlefs/tree/multi-fd-sync

I added your code snippet as a unit test, and there is some non-determinism going on. It usually works the first time, but fails the second time. Trying to figure out if it's a problem in my port, or if it's an upstream issue.

@atanisoft (Author)

Interesting, is it using a cache per file handle or per file in the FS? If it is per handle, is it possible the caches are getting out of sync?

@BrianPugh (Member)

So within my port, the function lfs_file_open will be called twice, and two unique file descriptors will be returned. This means there are also two independent lfs_file_t structures. I'm looking through the upstream lfs code right now, and I really can't figure out why this works sometimes. I can't figure out why, the first time this code snippet is run, the (relatively) independent second lfs_file_t structure gets updated with the new file length. Still looking.

@BrianPugh (Member)

Operating under the assumption that upstream littlefs cannot handle the same file being open multiple times (waiting to hear back from the authors), I can add this functionality to my port relatively easily. However, with the easy way to implement this, opening the same file multiple times would return the same small-int-value file descriptor. Would this be acceptable? For example:

uint8_t buf1[1] = {'a'};
uint8_t buf2[1] = {0};
int fd1 = open("/fs/file.bin", O_RDWR);     // let's say this returns 3
assert(fd1 >= 0);
int fd2 = open("/fs/file.bin", O_RDWR);     // also returns 3
assert(fd2 >= 0);
lseek(fd1, 0, SEEK_SET);
write(fd1, &buf1, sizeof(buf1));
fsync(fd1);
lseek(fd2, 0, SEEK_SET);
read(fd2, &buf2, sizeof(buf2));
assert(buf1[0] == buf2[0]);

close(fd1);           // each descriptor still needs to be closed, or really "close" just needs to be called the same number of times on any of the descriptors
close(fd2);

@atanisoft (Author)

Perhaps track the lfs_file_t instance and return the same in your API as part of littlefs_vfs_open()? If that is what you are proposing I think it should be fine to return the same FD back to the caller (ESP VFS). The ESP VFS layer will assign a unique FD and translate it back to the VFS IMPL FD automatically.

I'd suggest tracking N usages so your VFS adapter holds onto the lfs_file_t until the last FD has been closed.

@BrianPugh (Member)

That's exactly how I was going to implement it 👍. I can work on that more this evening.

It seems simpler to keep a count in the vfs_littlefs_file_t struct rather than have a unique FD and search along the linked list to see whether another instance of the same file is open. Actually, the more I think about it, it might not be too hard to just do this "properly" with unique file descriptors. I'll investigate that as well.
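For reference, a minimal sketch of that refcounting idea; the struct, the fixed-size path buffer, and the list walk are all illustrative, not the actual driver code:

#include <stdlib.h>
#include <string.h>
#include "lfs.h"

// Illustrative bookkeeping: one shared entry per open path, reference counted.
typedef struct open_file {
    lfs_file_t file;          /* shared littlefs file state */
    char path[64];            /* key used to detect a second open */
    int open_count;           /* how many descriptors share this entry */
    struct open_file *next;
} open_file_t;

static open_file_t *s_open_files;

/* Return an already-open entry for this path, or NULL. */
static open_file_t *find_open(const char *path)
{
    for (open_file_t *f = s_open_files; f; f = f->next) {
        if (strcmp(f->path, path) == 0) return f;
    }
    return NULL;
}

/* Open path, reusing the existing lfs_file_t if the file is already open. */
static open_file_t *shared_open(lfs_t *lfs, const char *path, int flags)
{
    open_file_t *f = find_open(path);
    if (f) { f->open_count++; return f; }     /* second open: bump the count */
    f = calloc(1, sizeof(*f));
    if (!f) return NULL;
    strncpy(f->path, path, sizeof(f->path) - 1);
    if (lfs_file_open(lfs, &f->file, path, flags) < 0) { free(f); return NULL; }
    f->open_count = 1;
    f->next = s_open_files;
    s_open_files = f;
    return f;
}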

@atanisoft (Author)

atanisoft commented Nov 9, 2020

Once you have something ready to test, I can test it in my application and see how it performs. If you have a CAN-USB adapter for your PC and a CAN transceiver, you could possibly test it locally yourself as well.

@BrianPugh (Member)

@atanisoft I implemented (and merged) the lazy solution where the same small-integer FD is returned. It required significantly less code change, and doing it the "proper" way may have some performance impact on people who really push the file system. Let me know if this solution works for you (it passes the unit test). Thanks!

See:
#19

@atanisoft (Author)

I put one comment on #19, but it won't impact my testing of the fix. I'll do some testing in the full setup and confirm there are no issues.

@atanisoft (Author)

This appears to have fixed the caching issue I was seeing. More work will be needed in the dependency that is opening the same file multiple times, so that it can be consolidated to a single file handle for all flows, but that will take more time.

@BrianPugh (Member)

I can't see your comment on the PR, are you sure you submitted it?

Also, after sleeping on it, I realized that this fix doesn't totally solve your issue, unless your use case is exactly reflected in that snippet. With the current solution, each individual file descriptor doesn't have its own position in the file. Thinking of a way around this...

@atanisoft (Author)

With the current solution, each individual file descriptor doesn't have its own position in the file. Thinking of a way around this...

Yeah, I was thinking of the same and was considering commenting on it as well.

I can't see your comment on the PR, are you sure you submitted it?

Somehow it got stuck in a Pending state...

@BrianPugh (Member)

Alright, I have a greater understanding of the upstream lfs codebase now, and I also realized my test would pass the first time due to artifacts left behind from a previous failed test. Now that I know nothing magical is going on behind the scenes (like the lfs file struct magically getting updated), I can tackle this a bit better.

@BrianPugh (Member)

BrianPugh commented Nov 10, 2020

Alright here is my new attack plan:

  1. Create a new struct like:

typedef struct {
    lfs_file_t file;
    uint8_t open_count;
} vfs_littlefs_file_wrapper_t;

With this new setup, every open gets a unique FD (as it should), but multiple FDs can point back to the same vfs_littlefs_file_wrapper_t.

  2. Replace vfs_littlefs_file_t.file with a pointer to the struct described in (1). Add a new attribute lfs_off_t pos to vfs_littlefs_file_t. This will keep track of this file descriptor's position in the file.
  3. Before every file operation, run lfs_file_seek(..., vfs_littlefs_file_t.pos, LFS_SEEK_SET) if it mismatches the lfs_file_t.pos attribute. This function just flushes any writes that are in memory and then sets the internal pos attribute, so it should be a very cheap operation.
  4. After every file operation, copy lfs_file_t.pos over to vfs_littlefs_file_t.pos.
  5. As currently, on every close, decrement open_count and deallocate the vfs_littlefs_file_wrapper_t when it reaches 0 (sketched below).
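If it helps, here is a rough sketch of what steps 2–5 could look like, reusing the wrapper struct above. Everything below is illustrative only, not the actual driver code:

#include <stdlib.h>
#include <sys/types.h>
#include "lfs.h"

typedef struct {
    vfs_littlefs_file_wrapper_t *wrapper;  /* step 2: shared, reference-counted */
    lfs_off_t pos;                         /* step 2: this descriptor's own position */
} vfs_littlefs_file_t;

/* Step 3: re-seek if another descriptor moved the shared lfs_file_t. */
static int sync_pos(lfs_t *lfs, vfs_littlefs_file_t *f)
{
    if (lfs_file_tell(lfs, &f->wrapper->file) != (lfs_soff_t)f->pos) {
        if (lfs_file_seek(lfs, &f->wrapper->file, f->pos, LFS_SEEK_SET) < 0) return -1;
    }
    return 0;
}

static ssize_t fd_read(lfs_t *lfs, vfs_littlefs_file_t *f, void *buf, size_t len)
{
    if (sync_pos(lfs, f) < 0) return -1;
    lfs_ssize_t n = lfs_file_read(lfs, &f->wrapper->file, buf, len);
    if (n < 0) return -1;
    f->pos = lfs_file_tell(lfs, &f->wrapper->file);   /* step 4 */
    return n;
}

/* Step 5: the last close tears down the shared wrapper. */
static int fd_close(lfs_t *lfs, vfs_littlefs_file_t *f)
{
    int err = 0;
    if (--f->wrapper->open_count == 0) {
        err = lfs_file_close(lfs, &f->wrapper->file);
        free(f->wrapper);
    }
    free(f);
    return err;
}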

How does this plan sound?

Possible downsides:

  1. Each fopen now requires an additional malloc, which would increase memory usage per FD by 4 bytes for the pointer, a few bytes of allocator overhead (not sure how much overhead esp-idf's malloc has), and 4 more bytes for the aligned open_count. All in all, let's say this would increase the memory overhead by about 12 bytes per FD.

@atanisoft (Author)

I think overall it sounds good, but do add some form of locking to ensure no two threads are directly modifying the lfs_file_t concurrently.

@BrianPugh (Member)

All of the above operations would be done while the filesystem mutex is held, so there should be no issues.

@BrianPugh (Member)

@X-Ryl669 do you have any thoughts on this? You have a keen eye for these sorts of things.

@X-Ryl669 (Contributor)

Honestly, this is a lot of change for a borderline-invalid usage pattern (so it adds a lot of potential bugs and multitasking issues).
If I were you, I would:

  1. Check whether the filenames are stored in the structures (this depends on some build flags, IIRC).
  2. Prevent opening an already-opened file by searching for it: return -1 if you try to open an already-opened file and set errno to EBUSY. This means adding a hash check against the table of open files (an acceptable cost, IMHO). No bugs, no multitasking issues. I would prefer this one (sketched below).

Please note that the file might not be opened with the same attributes (for example, once in read mode and later in read/write mode).
In that case, you can't reuse the file descriptor nor the file_t object.
What could you do in that case?
Close the file and reopen it with the new mode (or a combined mode)?
If you do that, suddenly a read-only file descriptor will be allowed to write!

So unless it's fixed upstream, there is no valid solution that's not a hack like the proposed implementation above, IMHO.
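For illustration, a minimal sketch of option 2; path_is_open and do_open stand in for whatever lookup and open path the driver already has, they are not real esp_littlefs functions:

#include <errno.h>
#include <stdbool.h>

/* Hypothetical helpers; a real driver would consult its table of open files. */
extern bool path_is_open(const char *path);
extern int  do_open(const char *path, int flags, int mode);

static int open_refusing_duplicates(const char *path, int flags, int mode)
{
    if (path_is_open(path)) {
        errno = EBUSY;     /* refuse a second open of the same file */
        return -1;
    }
    return do_open(path, flags, mode);
}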

@atanisoft (Author)

Note that this is a perfectly valid use case: multiple file handles on the same file from different threads. It works on other FS types (including SPIFFS).

But you are right, the multiple file handles could have different mode flags, which would need to be accounted for in the read/write implementations.

@X-Ryl669 (Contributor)

I never had the need for this. Anyway, I don't think the HAL should be the right layer to implement this if the underlying driver does not, since in all cases it'll lead to bugs that are almost impossible to solve here (I'm not even talking about partial writes, truncates, and all the other oddities).
Hopefully, Brian has already reported the bug to the littlefs developers, so it'll likely be fixed at the right place, IMHO, that is: the filesystem code. Meanwhile, you can already query, in your code, for an existing file descriptor via its path and, if that fails, open it.

@atanisoft (Author)

I never had the need for this.

Yeah, this would definitely be considered a corner case.

I don't think the HAL should be the right layer to implement this if the underlying driver does not

I suspect the underlying FS code does support this but it may be dependent on using the same lfs_file_t or enforcing some sort of cache synchronization method.

Meanwhile, you can already query, in your code, for an existing file descriptor via its path and, if that fails, open it.

If that is how this is implemented, LittleFS will no longer be an option for a number of projects I know of. If there were a way to disable or reduce the cache size, that might be a better option than throwing an error when the same file is opened in multiple code paths.

@X-Ryl669 (Contributor)

X-Ryl669 commented Nov 11, 2020

I suspect the underlying FS code does support this but it may be dependent on using the same lfs_file_t or enforcing some sort of cache synchronization method.

No, this is not possible, since the open options can be different, the current position in the file will have to be different, and so on (think about ftruncate, for example).

If that is how this is implemented, LittleFS will no longer be an option for a number of projects I know of.

I don't know about your project. I understand how much simpler it seems not to care about the file path when calling open. Yet there is a can of worms hiding in this simplicity, and I really doubt SPIFFS is acting correctly in that case (and FatFS certainly does not, even if it seems to work for a simple test case).

In your example it seems to work, but it'll break as soon as the operations happen in different tasks, unless you have a big lock/mutex for all your FS operations; that's not how ESP-IDF is implemented, since it would really hurt performance.

This is because the underlying write to the flash is not atomic once it touches multiple pages. So you might have a situation where Task X writes "A..B..C" and Task Y reads "A.." or "A..B" or "A..B..C" or the unexpected "A.. ..C", since writing the B page is not finished yet. What happens if one task seeks to B and the other task truncates to A?

Honestly, I don't know about your software, but if it relies on atomic operations on the FS, I don't think it'll work correctly on ESP-IDF anyway, whatever the FS.

@BrianPugh (Member)

These are good points. Alright, I think my action plan on this (for now) is to:

  1. Revert PR #19 on master in the meantime, since it's a half-assed attempt that doesn't really solve it.
  2. Continue working on PR #21. It's a mostly working solution right now; I just want to polish it up a bit and test it more. As @X-Ryl669 said, this won't properly handle modes. I could add that in the same way I handled file position, but I also agree that these features shouldn't be added in my layer. That said, as I was going through the LFS codebase, I think the chances are slim that this is/will be properly supported at the littlefs level. I don't think files really know of each other at the root lfs layer.
  3. Reorganize (2) a little bit so we can enable/disable this feature.

@atanisoft (Author)

The issue of atomicity is not a problem in the code I'm using, and it is working correctly on SPIFFS and FatFS (SD) as well as on other platforms (mainly Linux).

I'll take a look at PR #21; as long as it allows the file to be opened multiple times it should be fine. It can be left to the consumer code to ensure atomicity is preserved across tasks if/when needed.

@geky
geky commented Nov 15, 2020

Hello, I left some comments in the littlefs thread (littlefs-project/littlefs#483 (comment)). TLDR: littlefs files are transactional in the sense that each open file gets an independent snapshot of the file's contents.

What is the use case for opening the same file multiple times? Is it primarily a usability thing to keep track of multiple file positions?

@atanisoft (Author)

@geky it's a bit of a bug in a library I'm using, but it works on all FS types except LittleFS from what I've tested so far. It would seem there should be a way to flush the caches even across multiple file handles, so that data written through one handle can be read by another after a flush operation.

@BrianPugh (Member)

So it's hard to determine what the best path forward is here. There are multiple possible solutions, but they're all sort of hacky/weird.

@geky
geky commented Nov 15, 2020

Wait, no, I misread that. I wonder if this is actually an undefined area for FatFS and you risk data corruption if the two file descriptors' caches overlap each other.

@atanisoft (Author)

It's possible when using flash for storage, but a bit unlikely on SD storage.

@johnnytolengo
johnnytolengo commented Sep 17, 2021

Hello @BrianPugh, I got an error compiling your multi-fd branch:

esp_littlefs.c:204:9: error: unknown field 'pwrite_p' specified in initializer

It seems that pwrite_p is for some reason duplicated from write_p. Removing .pwrite_p and .pread_p, it compiles :)

What a pity that it's not documented...
In fact, I am hoping that this branch is already thread-safe for ESP32, especially when the same file is opened multiple times.

  1. Is it already thread-safe?
  2. What are the improvements over the default branch?

Thanks

@BrianPugh (Member)

So I abandoned that effort; it was too big of a hack that didn't actually solve the fundamental problem.

This littlefs driver is threadsafe, but you cannot safely write and read from the same file using multiple file descriptors.

@johnnytolengo
OK, so if I understand correctly, the master branch is thread-safe when you are accessing multiple different files, but it's not safe when you are accessing the same file from two different cores at the same time.
Is that correct?

@johnnytolengo
Hello @BrianPugh, I was reading the esp_littlefs code for several hours; I am looking for some solution to avoid crashes when writing and reading the same file simultaneously.
Any idea how to do that? Some workaround, at least for now?
Why did you delete the "multi-fd-sync" branch? Was something wrong with it?

Thanks

@BrianPugh (Member)

It's an upstream littlefs limitation. See this upstream PR draft:

littlefs-project/littlefs#513

I should just delete that branch; it was an attempt at a workaround, but it ended up just making a mess and not really solving the underlying problem.

@X-Ryl669 (Contributor)

X-Ryl669 commented Sep 22, 2021

Meanwhile, in this specific usage case, you can also do this:

  1. Whenever you write to a file (or call fsync), maintain a counter.
  2. Walk the efs->cache to look for your file's name hash, incrementing the counter upon finding one.
  3. If the counter is higher than 2, walk the efs->cache again and close all matching files, then reopen them. You might want to remember the current position in the file (via ftell) and restore it once reopened (see the sketch below).

This requires storing the file's path (so no CONFIG_LITTLEFS_USE_ONLY_HASH).

This scheme will obviously not work if you use ftruncate to shrink the file below any other reader's reading position.
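As an illustration of the close-and-reopen step, here is what restoring the position could look like at the POSIX level (a sketch only, not driver code; error handling kept minimal):

#include <fcntl.h>
#include <unistd.h>

/* Close a descriptor and reopen it on the same path, restoring its position,
 * so it picks up the metadata written through another descriptor. */
static int reopen_at_same_pos(int fd, const char *path, int flags)
{
    off_t pos = lseek(fd, 0, SEEK_CUR);   /* remember the current position */
    close(fd);
    fd = open(path, flags);
    if (fd >= 0 && pos > 0) lseek(fd, pos, SEEK_SET);
    return fd;
}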

@johnnytolengo
@X-Ryl669 at first look your proposal sounds reasonable, bearing in mind that MCUs will never have too many files being read and written simultaneously.
In my case it's because I have a "core 1" task saving values to "logs.log" and another "core 0" web-service task continuously reading that "logs.log" file.
Without controlling that overlap the FS crashes, and the only way to recover is to format it.
I think that is a super common use case :)

@BrianPugh (Member)

@johnnytolengo can you use a circular buffer in memory instead? Or possibly cycle through a few log files (log_0.txt, log_1.txt, etc.)? For a lot of filesystems (I'm pretty sure) the action you are describing is undefined.
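For illustration, a minimal sketch of the rotating-log idea; the paths, slot count, and size limit below are arbitrary:

#include <stdio.h>

#define LOG_SLOTS     4
#define LOG_MAX_BYTES 4096

/* Append a line to the current slot; move on to the next slot when it gets big. */
static void log_append(const char *line)
{
    static int slot = 0;
    char path[32];
    snprintf(path, sizeof(path), "/littlefs/log_%d.txt", slot);
    FILE *f = fopen(path, "a");
    if (!f) return;
    fputs(line, f);
    fputc('\n', f);
    long size = ftell(f);
    fclose(f);                       /* closed between writes, so readers see a consistent file */
    if (size >= LOG_MAX_BYTES) {
        slot = (slot + 1) % LOG_SLOTS;
        snprintf(path, sizeof(path), "/littlefs/log_%d.txt", slot);
        remove(path);                /* start the next slot fresh */
    }
}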

@X-Ryl669 (Contributor)

Append-only mode is where littlefs performs least efficiently. In your situation, I would either:

  1. Use a mutex that prevents writing while a read is requested.
  2. Upon a read, open the file, read it completely, and close it.
  3. Release the mutex and let any pending writes go through.

You could also use a single file descriptor (and still use a mutex): when a read is requested, ftell the current write position, fseek to the part to read, read it, and seek back to the write position.
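A small sketch of that single-descriptor trick (the caller is assumed to hold the mutex; error handling omitted):

#include <unistd.h>

/* Remember the write position, jump back to read, then restore it. */
static ssize_t read_back(int fd, off_t from, void *buf, size_t len)
{
    off_t write_pos = lseek(fd, 0, SEEK_CUR);  /* "ftell" the current write position */
    lseek(fd, from, SEEK_SET);                 /* seek to the part to read */
    ssize_t n = read(fd, buf, len);
    lseek(fd, write_pos, SEEK_SET);            /* seek back to the write position */
    return n;
}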

On ESP32 we have plenty of memory; you can spare 64 kB for the live logs and, as Brian says, swap the live buffer once it's full by writing it all at once.

@johnnytolengo
What a pity that littlefs is not file-descriptor safe :)
I think a good alternative is to use some RAM buffer (when available) to keep the logs in; otherwise just use a mutex to protect the FS while writing.

It seems like a simple thing, but it turns out to be more complicated than anything else...

@BrianPugh (Member)

BrianPugh commented Sep 23, 2021

Just to be clear, this isn't an issue with the filesystem being thread-safe. The issue is having multiple independent file descriptors open at the same time pointing at the same file, where at least one of them is writing.

@johnnytolengo
Well... that's your point of view. In fact, if you are running on different threads the filesystem is only partially safe; sadly the chances of corrupting it are very high.

@BrianPugh (Member)

Could you be a bit more precise on how you think a corruption occurs? You can have multiple file descriptors in multiple threads, all doing writes, and everything will be fine so long as they aren't pointing to the same filename.

@johnnytolengo
You are right, just with multiple file descriptors :) but we have to take care in case the same fd is used at the same time.
It would be nice if the FS could track the paths to check whether a file is already taken by another task and at least return an error, or use a lock to wait until the file is safely available.

@BrianPugh (Member)

You can share the same file descriptor across threads and it will work. Once again, the issue is multiple file descriptors pointing to the same file where at least one of them is writing.

@johnnytolengo
johnnytolengo commented Sep 24, 2021

Hmm... I never thought about that possibility, but I have no clue how to use the same file descriptor in multiple threads.
The only function I know that takes and returns the fd is e.g. File fdlog = fs.open("mylog.log");
How do I share fdlog, and what happens once fdlog.close() is called in the other thread?

@X-Ryl669 (Contributor)

X-Ryl669 commented Sep 24, 2021

The code that's creating the tasks should open the descriptor and close it. So you don't care about sharing here, since it's not the tasks' responsibility to do that, and you'll be safe. You'll have something like:

static TaskHandle_t task1Handle, task2Handle;
static volatile bool isRunning = true;
static int fd = -1;

void app_main(void)
{
   fd = open("/littlefs/log.log", O_RDWR | O_CREAT, 0666);    // opened once, before the tasks exist
   xTaskCreate(task1, "task1", 4096, &fd, 5, &task1Handle);
   xTaskCreate(task2, "task2", 4096, &fd, 5, &task2Handle);
   while (isRunning) { vTaskDelay(pdMS_TO_TICKS(100)); }
   close(fd);                                                 // closed once, after the tasks are done
}

Or you can use a mutex for the file operations:

static int fd = -1;
static SemaphoreHandle_t m;          // protects the fd variable

void app_main(void)
{
   m = xSemaphoreCreateMutex();
   xTaskCreate(task1, "task1", 4096, NULL, 5, NULL);
   xTaskCreate(task2, "task2", 4096, NULL, 5, NULL);
   while (isRunning) { vTaskDelay(pdMS_TO_TICKS(100)); }
}

void task1(void *arg)
{
   /* ... */
   xSemaphoreTake(m, portMAX_DELAY);
   if (fd == -1) fd = open("/littlefs/whatever", O_RDWR | O_CREAT, 0666);
   // Process fd
   // read(fd, buf, len) or write(fd, buf, len)
   xSemaphoreGive(m);
}

void task2(void *arg)
{
   /* ... */
   xSemaphoreTake(m, portMAX_DELAY);
   if (fd == -1) fd = open("/littlefs/whatever", O_RDWR | O_CREAT, 0666);
   // Process fd
   // read(fd, buf, len) or write(fd, buf, len)
   close(fd); fd = -1;
   xSemaphoreGive(m);
}

In that case, the mutex protects the fd variable, not what's pointed to by the descriptor (which has its own mutex/semaphore anyway).

@johnnytolengo
Impressive! Thank you for such an example.

@alonbl
alonbl commented Oct 19, 2023

I am having the same issue with the log use case: one task writes to the log and another should be able to read and present it. This means that the log file is opened for append in one task and opened for read in the other task to read its content. I cannot close the writer, and I need the reader to be able to read.

@X-Ryl669 (Contributor)

X-Ryl669 commented Oct 19, 2023

That's a bit of necromancy going on here. To sum up what's written above:

  1. littlefs provides atomic changes to the flash filesystem's state. This means a file is never "half" modified on disk.
  2. Yet the operations that ensure this aren't atomic by themselves. So while you are writing to a file, a temporary bunch of pages is modified. The larger the file, the larger the number of pages that may be rewritten (and the longer it takes).
  3. Only when the last data is written successfully is the "pointer" to the new file updated (this is the "atomic" guarantee that littlefs provides).
  4. This obviously doesn't apply to the metadata about the files, such as the current reading position, the current file size, and so on: these only exist in RAM.
  5. There is no protection in littlefs (or this ESP-IDF driver) against misuse of this metadata. This means that if you have two threads whose behavior depends on such metadata, you must synchronize them yourself. Typically, if one thread is writing/truncating/deleting a file, it's changing the file size and the write or read position, but the other thread won't be informed about those changes and will not behave correctly in that case (understand: it'll crash).
  6. You can add this synchronization on your side if you need this (see the post above).

In the "write to the log and read it" case that seems to be the concern here, the easiest solution is to protect log access with a mutex so that a write and a read cannot happen at the same time. In the read thread: take the mutex, open the file, read it, close it, release the mutex. In the write thread: take the mutex while writing and syncing, then release it.

Another solution that works a bit better is to delegate the write and read operations to a single thread and post an event to read or write to its main event loop. This thread reacts to a read event by rewinding the read position by the size of the buffer to capture, reading it, and posting the buffer to the task that requested the read. The write event fetches the current log buffer and writes it to the file. Only a single file descriptor is required in that case.
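For what it's worth, here is a sketch of that single-owner-thread pattern using a FreeRTOS queue; all of the names, sizes, and the payload format are made up for illustration:

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"

typedef struct {
    enum { LOG_WRITE, LOG_READ } op;
    char data[128];                 /* payload to write, or buffer for the read result */
    QueueHandle_t reply;            /* where to post the read result (LOG_READ only) */
} log_req_t;

static QueueHandle_t s_log_queue;   /* created elsewhere with xQueueCreate(n, sizeof(log_req_t)) */

/* The only task that ever touches the log file descriptor. */
static void logger_task(void *arg)
{
    int fd = (int)(intptr_t)arg;    /* opened once by whoever created this task */
    log_req_t req;
    for (;;) {
        if (xQueueReceive(s_log_queue, &req, portMAX_DELAY) != pdTRUE) continue;
        if (req.op == LOG_WRITE) {
            write(fd, req.data, strlen(req.data));
            fsync(fd);
        } else {
            /* rewind and read from the start of the log, then hand it back */
            lseek(fd, 0, SEEK_SET);
            ssize_t n = read(fd, req.data, sizeof(req.data) - 1);
            req.data[n > 0 ? n : 0] = '\0';
            if (req.reply) xQueueSend(req.reply, &req, portMAX_DELAY);
        }
    }
}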

@alonbl
alonbl commented Oct 19, 2023

Thank you for explaining this; yes, special custom workarounds can be applied.

The "atomic" guarantee is the unexpected behavior in a filesystem-based setup; these limitations (or differences from a standard filesystem) need to be communicated so that people understand they need to create these workarounds. For example, now I understand that a log file must be closed periodically in order for it to survive a power loss, as otherwise the entire log would be lost.

@atanisoft (Author)

@alonbl A possible (simple) solution is to call flush(fd) from the task that handles writing, this should trigger the per-fd caching to be invalidated and changes persisted.

@BrianPugh (Member)

Hey guys, thanks for posting workarounds here. However, if you want this to be properly solved, maybe voice your thoughts to geky at littlefs-project/littlefs#513

@alonbl
alonbl commented Oct 19, 2023

@alonbl A possible (simple) solution is to call flush(fd) from the task that handles writing, this should trigger the per-fd caching to be invalidated and changes persisted.

Hi @atanisoft,

I do not see flush(fd) available

esp_littlefs_example.c:65:5: error: implicit declaration of function 'flush'; did you mean 'fflush'? [-Werror=implicit-function-declaration]
   65 |     flush(fd);

I also expect that if this works, then the file content is committed and should also be visible to other open() attempts, which conflicts with what @X-Ryl669 wrote in his summary.

Have you checked your solution and found it working? Can you please share a complete example?

Thanks,

@X-Ryl669 (Contributor)

You need to close the reader's fd (if it's in its own thread) while you hold the mutex, because the metadata it refers to will be wrong as soon as the write fd modifies the file. Once you've closed it, the only way to read is to reopen, which forces fetching the updated metadata.

On the write thread you don't have to close it; you can sync it (with fflush), but this must be done with the mutex taken (so the sequence is: hold the mutex, write data, sync, release the mutex).
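Roughly, in code (a sketch only; the mutex, path, and open flags are placeholders):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

static SemaphoreHandle_t s_log_mutex;   /* created once with xSemaphoreCreateMutex() */
static int s_writer_fd;                 /* opened once (e.g. O_WRONLY | O_APPEND) and kept open */

static void log_write(const char *line)
{
    xSemaphoreTake(s_log_mutex, portMAX_DELAY);
    write(s_writer_fd, line, strlen(line));
    fsync(s_writer_fd);                 /* push the data out while we hold the mutex */
    xSemaphoreGive(s_log_mutex);
}

static ssize_t log_read_all(char *buf, size_t len)
{
    xSemaphoreTake(s_log_mutex, portMAX_DELAY);
    int fd = open("/littlefs/logs.log", O_RDONLY);   /* a fresh descriptor sees the synced data */
    ssize_t n = -1;
    if (fd >= 0) {
        n = read(fd, buf, len);
        close(fd);                      /* never keep a reader fd across writes */
    }
    xSemaphoreGive(s_log_mutex);
    return n;
}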

@atanisoft (Author)

Have you checked your solution and found it working? Can you please share a complete example?

Yes, I've used it and it does work. I use the VFS approach of open(path, mode) and later call fsync(fd) to force the writes to be flushed to storage.

In my original use case there were two file handles being created for the same underlying file; this was a bug in the library I was using, as it should have been using a single file handle. There were different behaviors observed between the two file handles, much like @X-Ryl669 describes above. After fixing the library to uniformly use a single file handle and implementing flushing to storage, the issues were largely gone. The critical piece that needs to be handled is ensuring you have locking to prevent concurrent read/write operations if the latest written data is interesting to the reader, and either reopening the read file handle or using a single file handle for both reads and writes (which increases the need for locking).
