Tuesday 6 December 2016

performance - Writing a binary file in C++ very fast

The best solution is to implement an async writing with double buffering.


Look at the time line:


------------------------------------------------>
FF|WWWWWWWW|FF|WWWWWWWW|FF|WWWWWWWW|FF|WWWWWWWW|

The 'F' represents time for buffer filling, and 'W' represents time for writing buffer to disk. So the problem in wasting time between writing buffers to file. However, by implementing writing on a separate thread, you can start filling the next buffer right away like this:


------------------------------------------------> (main thread, fills buffers)
FF|ff______|FF______|ff______|________|
------------------------------------------------> (writer thread)
|WWWWWWWW|wwwwwwww|WWWWWWWW|wwwwwwww|

F - filling 1st buffer
f - filling 2nd buffer
W - writing 1st buffer to file
w - writing 2nd buffer to file
_ - wait while operation is completed


This approach with buffer swaps is very useful when filling a buffer requires more complex computation (hence, more time).
I always implement a CSequentialStreamWriter class that hides asynchronous writing inside, so for the end-user the interface has just Write function(s).


And the buffer size must be a multiple of disk cluster size. Otherwise, you'll end up with poor performance by writing a single buffer to 2 adjacent disk clusters.


Writing the last buffer.
When you call Write function for the last time, you have to make sure that the current buffer is being filled should be written to disk as well. Thus CSequentialStreamWriter should have a separate method, let's say Finalize (final buffer flush), which should write to disk the last portion of data.


Error handling.
While the code start filling 2nd buffer, and the 1st one is being written on a separate thread, but write fails for some reason, the main thread should be aware of that failure.


------------------------------------------------> (main thread, fills buffers)
FF|fX|
------------------------------------------------> (writer thread)
__|X|

Let's assume the interface of a CSequentialStreamWriter has Write function returns bool or throws an exception, thus having an error on a separate thread, you have to remember that state, so next time you call Write or Finilize on the main thread, the method will return False or will throw an exception. And it does not really matter at which point you stopped filling a buffer, even if you wrote some data ahead after the failure - most likely the file would be corrupted and useless.

No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?

Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...