My current parser is given below - Reading in ~10MB CSV to an STL vector takes ~30secs, which is too slow for my liking given I've got over 100MB which needs to be read in every time the program is run. Can anyone give some advice on how to improve performance? Indeed, would it be faster in plain C?
int main() {
std::vector data;
std::ifstream infile( "data.csv" );
infile >> data;
std::cin.get();
return 0;
}
std::istream& operator >> (std::istream& ins, std::vector& data)
{
data.clear();
// Reserve data vector
std::string line, field;
std::getline(ins, line);
std::stringstream ssl(line), ssf;
std::size_t rows = 1, cols = 0;
while (std::getline(ssl, field, ',')) cols++;
while (std::getline(ins, line)) rows++;
std::cout << rows << " x " << cols << "\n";
ins.clear(); // clear bad state after eof
ins.seekg(0);
data.reserve(rows*cols);
// Populate data
double f = 0.0;
while (std::getline(ins, line)) {
ssl.str(line);
ssl.clear();
while (std::getline(ssl, field, ',')) {
ssf.str(field);
ssf.clear();
ssf >> f;
data.push_back(f);
}
}
return ins;
}
NB: I have also have openMP at my disposal, and the contents will eventually be used for GPGPU computation with CUDA.
Answer
On my machine, your reserve code takes about 1.1 seconds and your populate code takes 8.5 seconds.
Adding std::ios::sync_with_stdio(false); made no difference to my compiler.
The below C code takes 2.3 seconds.
int i = 0;
int j = 0;
while( true ) {
float x;
j = fscanf( file, "%f", & x );
if( j == EOF ) break;
data[i++] = x;
// skip ',' or '\n'
int ch = getc(file);
}
No comments:
Post a Comment