Monday, 5 June 2017

python - Why is printing to stdout so slow? Can it be sped up?



I've always been amazed/frustrated with how long it takes to simply output to the terminal with a print statement. After some recent painfully slow logging I decided to look into it and was quite surprised to find that almost all the time spent is waiting for the terminal to process the results.



Can writing to stdout be sped up somehow?




I wrote a script ('print_timer.py' at the bottom of this question) to compare timing when writing 100k lines to stdout, to file, and with stdout redirected to /dev/null. Here is the timing result:



$ python print_timer.py
this is a test
this is a test

this is a test
-----
timing summary (100k lines each)
-----

print :11.950 s
write to file (+ fsync) : 0.122 s
print with stdout = /dev/null : 0.050 s


Wow. To make sure python isn't doing something behind the scenes like recognizing that I reassigned stdout to /dev/null or something, I did the redirection outside the script...



$ python print_timer.py > /dev/null
-----
timing summary (100k lines each)

-----
print : 0.053 s
write to file (+fsync) : 0.108 s
print with stdout = /dev/null : 0.045 s


So it isn't a python trick, it is just the terminal. I always knew dumping output to /dev/null sped things up, but never figured it was that significant!



It amazes me how slow the tty is. How can it be that writing to physical disk is WAY faster than writing to the "screen" (presumably an all-RAM op), and is effectively as fast as simply dumping to the garbage with /dev/null?




This link talks about how the terminal will block I/O so it can "parse [the input], update its frame buffer, communicate with the X server in order to scroll the window and so on"... but I don't fully get it. What can be taking so long?



I expect there is no way out (short of a faster tty implementation?) but figure I'd ask anyway.






UPDATE: after reading some comments I wondered how much impact my screen size actually has on the print time, and it does have some significance. The really slow numbers above are with my Gnome terminal blown up to 1920x1200. If I reduce it very small I get...



-----
timing summary (100k lines each)

-----
print : 2.920 s
write to file (+fsync) : 0.121 s
print with stdout = /dev/null : 0.048 s


That is certainly better (~4x), but doesn't change my question. It only adds to my question as I don't understand why the terminal screen rendering should slow down an application writing to stdout. Why does my program need to wait for screen rendering to continue?



Are all terminal/tty apps not created equal? I have yet to experiment. It really seems to me like a terminal should be able to buffer all incoming data, parse/render it invisibly, and only render the most recent chunk that is visible in the current screen configuration at a sensible frame rate. So if I can write+fsync to disk in ~0.1 seconds, a terminal should be able to complete the same operation in something of that order (with maybe a few screen updates while it did it).




I'm still kind of hoping there is a tty setting that can be changed from the application side to make this behaviour better for programmer. If this is strictly a terminal application issue, then this maybe doesn't even belong on StackOverflow?



What am I missing?






Here is the python program used to generate the timing:



import time, sys, tty
import os


lineCount = 100000
line = "this is a test"
summary = ""

cmd = "print"
startTime_s = time.time()
for x in range(lineCount):
print line
t = time.time() - startTime_s

summary += "%-30s:%6.3f s\n" % (cmd, t)

#Add a newline to match line outputs above...
line += "\n"

cmd = "write to file (+fsync)"
fp = file("out.txt", "w")
startTime_s = time.time()
for x in range(lineCount):
fp.write(line)

os.fsync(fp.fileno())
t = time.time() - startTime_s
summary += "%-30s:%6.3f s\n" % (cmd, t)

cmd = "print with stdout = /dev/null"
sys.stdout = file(os.devnull, "w")
startTime_s = time.time()
for x in range(lineCount):
fp.write(line)
t = time.time() - startTime_s

summary += "%-30s:%6.3f s\n" % (cmd, t)

print >> sys.stderr, "-----"
print >> sys.stderr, "timing summary (100k lines each)"
print >> sys.stderr, "-----"
print >> sys.stderr, summary

Answer



Thanks for all the comments! I've ended up answering it myself with your help. It feels dirty answering your own question, though.




Question 1: Why is printing to stdout slow?



Answer: Printing to stdout is not inherently slow. It is the terminal you work with that is slow. And it has pretty much zero to do with I/O buffering on the application side (eg: python file buffering). See below.



Question 2: Can it be sped up?



Answer: Yes it can, but seemingly not from the program side (the side doing the 'printing' to stdout). To speed it up, use a faster different terminal emulator.



Explanation...




I tried a self-described 'lightweight' terminal program called wterm and got significantly better results. Below is the output of my test script (at the bottom of the question) when running in wterm at 1920x1200 in on the same system where the basic print option took 12s using gnome-terminal:




-----
timing summary (100k lines each)
-----
print : 0.261 s
write to file (+fsync) : 0.110 s
print with stdout = /dev/null : 0.050 s



0.26s is MUCH better than 12s! I don't know whether wterm is more intelligent about how it renders to screen along the lines of how I was suggesting (render the 'visible' tail at a reasonable frame rate), or whether it just "does less" than gnome-terminal. For the purposes of my question I've got the answer, though. gnome-terminal is slow.



So - If you have a long running script that you feel is slow and it spews massive amounts of text to stdout... try a different terminal and see if it is any better!



Note that I pretty much randomly pulled wterm from the ubuntu/debian repositories. This link might be the same terminal, but I'm not sure. I did not test any other terminal emulators.






Update: Because I had to scratch the itch, I tested a whole pile of other terminal emulators with the same script and full screen (1920x1200). My manually collected stats are here:





wterm 0.3s
aterm 0.3s
rxvt 0.3s
mrxvt 0.4s
konsole 0.6s
yakuake 0.7s
lxterminal 7s
xterm 9s

gnome-terminal 12s
xfce4-terminal 12s
vala-terminal 18s
xvt 48s


The recorded times are manually collected, but they were pretty consistent. I recorded the best(ish) value. YMMV, obviously.



As a bonus, it was an interesting tour of some of the various terminal emulators available out there! I'm amazed my first 'alternate' test turned out to be the best of the bunch.


No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?

Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...