Tuesday, 24 January 2017

python - How to read a large file line by line



I want to iterate over each line of an entire file. One way to do this is to read the entire file, save it to a list, and then go over the lines of interest. This method uses a lot of memory, so I am looking for an alternative.



My code so far:



for each_line in fileinput.input(input_file):
    do_something(each_line)

    for each_line_again in fileinput.input(input_file):
        do_something(each_line_again)


Executing this code gives an error message: device active.



Any suggestions?



The purpose is to calculate pair-wise string similarity: for each line in the file, I want to calculate the Levenshtein distance to every other line.


Answer



The correct, fully Pythonic way to read a file is the following:




with open(...) as f:
    for line in f:
        # Do something with 'line'


The with statement closes the file when the block exits, even if an exception is raised in the inner block. The for line in f loop treats the file object f as an iterable: it reads through a buffer and yields one line at a time, so the whole file is never held in memory and you don't have to worry about large files.
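For the pair-wise comparison described in the question, one possible approach is to give each loop its own file object instead of nesting two fileinput.input() calls, since the module-level fileinput.input() keeps a single shared state and cannot be started again while a loop over it is still running. The following is only a sketch under assumptions: the names levenshtein and pairwise_distances are hypothetical helpers written for this illustration, the sample file contents are made up, and the inner file is re-read once per outer line, which keeps memory low at the cost of extra I/O.

```python
import os
import tempfile


def levenshtein(a, b):
    """Edit distance between two strings (classic dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def pairwise_distances(path):
    """Yield (i, j, distance) for every pair of lines with i < j.

    Each loop opens its own file object, so neither pass holds more
    than the current line (plus the read buffer) in memory.
    """
    with open(path) as outer:
        for i, line_a in enumerate(outer):
            with open(path) as inner:
                for j, line_b in enumerate(inner):
                    if j > i:
                        yield i, j, levenshtein(line_a.rstrip("\n"),
                                                line_b.rstrip("\n"))


# Demo on a small temporary file (hypothetical sample data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("kitten\nsitting\nmitten\n")
    demo_path = tmp.name

results = list(pairwise_distances(demo_path))
os.remove(demo_path)
print(results)
```

Because pairwise_distances is a generator, results can also be consumed one pair at a time instead of being collected into a list, which matters when the number of pairs is large.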




There should be one -- and preferably only one -- obvious way to do it.



