Saturday, 22 April 2017

c++ - Why are std::fstreams so slow?

I was working on a simple parser and when profiling I observed the bottleneck is in... file read! I extracted very simple test to compare the performance of fstreams and FILE* when reading a big blob of data:



#include 
#include
#include
#include
#include


void measure(const std::string& test, std::function function)
{
auto start_time = std::chrono::high_resolution_clock::now();

function();

auto duration = std::chrono::duration_cast(std::chrono::high_resolution_clock::now() - start_time);
std::cout<(duration.count()) * 0.000001<<" ms"<}


#define BUFFER_SIZE (1024 * 1024 * 1024)

int main(int argc, const char * argv[])
{
auto buffer = new char[BUFFER_SIZE];
memset(buffer, 123, BUFFER_SIZE);

measure("FILE* write", [buffer]()
{
FILE* file = fopen("test_file_write", "wb");

fwrite(buffer, 1, BUFFER_SIZE, file);
fclose(file);
});
measure("FILE* read", [buffer]()
{
FILE* file = fopen("test_file_read", "rb");
fread(buffer, 1, BUFFER_SIZE, file);
fclose(file);
});
measure("fstream write", [buffer]()

{
std::ofstream stream("test_stream_write", std::ios::binary);
stream.write(buffer, BUFFER_SIZE);
});
measure("fstream read", [buffer]()
{
std::ifstream stream("test_stream_read", std::ios::binary);
stream.read(buffer, BUFFER_SIZE);
});


delete[] buffer;
}


The results of running this code on my machine are:



FILE* write 1388.59 ms
FILE* read 1292.51 ms
fstream write 3105.38 ms
fstream read 3319.82 ms



fstream write/read are about 2 times slower than FILE* write/read! And this while reading a big blob of data, without any parsing or other features of fstreams. I'm running the code on Mac OS, Intel I7 2.6GHz, 16GB 1600 MHz Ram, SSD drive. Please note that running again same code the time for FILE* read is very low (about 200 ms) probably because the file gets cached... This is why the files opened for reading are not created using the code.



Why when reading just a blob of binary data using fstream is so slow compared to FILE*?



EDIT 1: I updated the code and the times. Sorry for the delay!



EDIT 2: I added command line and new results (very similar to previous ones!)




$ clang++  main.cpp -std=c++11 -stdlib=libc++ -O3
$ ./a.out
FILE* write 1417.9 ms
FILE* read 1292.59 ms
fstream write 3214.02 ms
fstream read 3052.56 ms


Following the results for the second run:




$ ./a.out
FILE* write 1428.98 ms
FILE* read 196.902 ms
fstream write 3343.69 ms
fstream read 2285.93 ms


It looks like the file gets cached when reading for both FILE* and stream as the time reduces with the same amount for both of them.



EDIT 3: I reduced the code to this:




FILE* file = fopen("test_file_write", "wb");
fwrite(buffer, 1, BUFFER_SIZE, file);
fclose(file);

std::ofstream stream("test_stream_write", std::ios::binary);
stream.write(buffer, BUFFER_SIZE);


And started the profiler. It seems like stream spends lots of time in xsputn function, and the actual write calls have the same duration (as it should be, it's the same function...)




Running    Time     Self       Symbol Name
3266.0ms 66.9% 0,0 std::__1::basic_ostream >::write(char const*, long)
3265.0ms 66.9% 2145,0 std::__1::basic_streambuf >::xsputn(char const*, long)
1120.0ms 22.9% 7,0 std::__1::basic_filebuf >::overflow(int)
1112.0ms 22.7% 2,0 fwrite
1127.0ms 23.0% 0,0 fwrite


EDIT 4 For some reason this question is marked as duplicate. I wanted to point out that I don't use printf at all, I use only std::cout to write the time. The files used in the read part are the output from the write part, copied with different name to avoid caching

No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?

Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...