Saturday, 30 April 2016

c++ - Performance bottleneck with CSV parser



My current parser is given below. Reading a ~10 MB CSV file into an STL vector takes ~30 seconds, which is too slow for my liking, given that I have over 100 MB of data that needs to be read in every time the program is run. Can anyone give some advice on how to improve performance? Indeed, would it be faster in plain C?




#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Declared before main() so that `infile >> data` can find it
std::istream& operator>>(std::istream& ins, std::vector<double>& data);

int main() {
    std::vector<double> data;
    std::ifstream infile("data.csv");
    infile >> data;
    std::cin.get();
    return 0;
}

std::istream& operator>>(std::istream& ins, std::vector<double>& data)
{
    data.clear();

    // Reserve data vector: count columns in the first line, then count rows
    std::string line, field;
    std::getline(ins, line);
    std::stringstream ssl(line), ssf;

    std::size_t rows = 1, cols = 0;
    while (std::getline(ssl, field, ',')) cols++;

    while (std::getline(ins, line)) rows++;

    std::cout << rows << " x " << cols << "\n";

    ins.clear(); // clear bad state after eof
    ins.seekg(0);

    data.reserve(rows * cols);

    // Populate data: split each line on ',' and convert each field
    double f = 0.0;
    while (std::getline(ins, line)) {
        ssl.str(line);
        ssl.clear();
        while (std::getline(ssl, field, ',')) {
            ssf.str(field);
            ssf.clear();
            ssf >> f;
            data.push_back(f);
        }
    }
    return ins;
}


NB: I also have OpenMP at my disposal, and the contents will eventually be used for GPGPU computation with CUDA.


Answer



On my machine, your reserve code takes about 1.1 seconds and your populate code takes 8.5 seconds.
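Per-phase figures like these could be collected with a small timing wrapper. The following is only a sketch: `time_seconds` is a hypothetical helper name, not something from the question's code, and it measures wall-clock time with `std::chrono::steady_clock`.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in seconds.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping the row-counting pass and the populate loop in separate calls, for example `double t = time_seconds([&] { /* populate loop */ });`, would give one figure per phase.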



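Adding std::ios::sync_with_stdio(false); made no difference with my compiler, which is plausible because that call only affects the standard streams such as std::cin and std::cout, not a std::ifstream.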




The C code below takes 2.3 seconds.



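// Assumes `file` is a FILE* opened for reading and `data` is a
// float array already sized to hold every value.
int i = 0;
while( true ) {
    float x;
    // fscanf returns the number of fields converted, or EOF at end of input
    int j = fscanf( file, "%f", &x );
    if( j != 1 ) break;  // stop at EOF or on a field that is not a number
    data[i++] = x;

    // skip ',' or '\n'
    getc( file );
}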
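For readers who want to stay in C++, a similar approach might look like the sketch below: read the whole file into memory once, then walk the buffer with `std::strtod` instead of building a `std::stringstream` per field. The names `parse_csv_doubles` and `load_csv` are hypothetical, and the sketch assumes the file holds only numeric fields separated by commas and newlines (no quoting, no text columns) and that the "C" numeric locale is in effect. It has not been timed against the figures above.

```cpp
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: parse every numeric field in a CSV text buffer.
// Assumes purely numeric fields separated by ',' and '\n'.
std::vector<double> parse_csv_doubles(const std::string& text) {
    std::vector<double> values;
    const char* p = text.c_str();        // null-terminated, so strtod stops safely
    const char* end = p + text.size();
    while (p < end) {
        char* next = nullptr;
        const double v = std::strtod(p, &next);
        if (next == p) {
            ++p;                         // no conversion: skip a separator or stray character
        } else {
            values.push_back(v);
            p = next;                    // continue right after the parsed number
        }
    }
    return values;
}

// Hypothetical helper: read a whole file into memory, then parse it.
std::vector<double> load_csv(const char* path) {
    std::ifstream in(path, std::ios::binary);
    const std::string text((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    return parse_csv_doubles(text);
}
```

A call such as `std::vector<double> data = load_csv("data.csv");` would then replace the `infile >> data;` line. Unlike the question's code, this sketch does not count rows and columns first, so the vector grows as needed rather than being reserved up front.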
