Sunday, February 26, 2023

Use xargs

Get out of the habit of reaching for while read as an idiom. When you're doing batch processing, use xargs instead to pass the input to your command as arguments.

For example, imagine you're piping some data out with cat:

$ time ( cat data.txt | while read line; do echo $line; done )
a
b
c
d
e
f
g
0.00s user 0.00s system 85% cpu 0.005 total

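The loop runs its body once per line. When that body is an external command rather than a shell builtin like echo, that means one new process per line, and with large batches of data this can really add up. Surprisingly, with our small example the while-read loop is actually faster, as we'll see. xargs is an extra process itself and has to exec an external echo binary, a fixed cost that only pays off once there's enough input. But the point is that xargs collects the input and hands it to echo as arguments, processing it all at once: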

$ time ( cat data.txt | xargs echo )
a b c d e f g
0.00s user 0.01s system 110% cpu 0.008 total
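To make that batching visible, here's a small sketch using the POSIX -n flag, which caps how many arguments xargs passes to each invocation of the command. Without -n, xargs packs in as many arguments as the system's command-line length limit allows, which is why echo above received all seven letters in one go. The input is generated with printf instead of reading data.txt, so the sketch runs anywhere.

```shell
#!/bin/sh
# Same seven letters as data.txt, generated inline so the sketch is self-contained.
# -n 3: pass at most three arguments to each echo, so echo runs three times.
printf '%s\n' a b c d e f g | xargs -n 3 echo
```

This prints "a b c", "d e f", and "g" on three separate lines, one line per echo invocation.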

Now consider the same idiom on a larger amount of data: thousands of lines or more. Here's the benchmark again, this time on the first 10,000 lines of a bigger file:

$ time ( head -n 10000 Notes/all.txt | while read line; do echo $line; done )
..
.. // omitted for brevity
0.10s user 0.18s system 105% cpu 0.267 total

$ time ( head -n 10000 Notes/all.txt | xargs echo )
..
.. // snipped again
0.02s user 0.08s system 100% cpu 0.100 total
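The gap should grow further when the loop body is an external command instead of a builtin like echo, since each iteration then starts a new process. Here's a minimal sketch that counts process launches rather than timing them. It uses sh -c 'echo ran' as a hypothetical stand-in for any real external program, and seven generated lines that mirror data.txt.

```shell
#!/bin/sh
# Count how many times an external command (here, a child sh) is started.
# "sh -c 'echo ran' sh ARGS..." is a hypothetical stand-in for a real program:
# it prints one "ran" line per invocation, however many ARGS it receives.

# while-read: the body runs once per input line, so sh starts seven times.
loop_runs=$(printf '%s\n' a b c d e f g |
  while IFS= read -r line; do sh -c 'echo ran' sh "$line"; done |
  wc -l | tr -d ' ')

# xargs: all seven lines fit in one argument list, so sh starts once.
xargs_runs=$(printf '%s\n' a b c d e f g |
  xargs sh -c 'echo ran' sh |
  wc -l | tr -d ' ')

# tr -d ' ' above strips the padding some wc implementations add.
echo "while-read started sh $loop_runs times"
echo "xargs started sh $xargs_runs time(s)"
```

This reports 7 launches for the loop and 1 for xargs. With real programs, that per-line process startup is where most of the loop's extra time goes.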

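On this run, xargs finished in 0.100 s versus 0.267 s for the loop, roughly 2.7 times faster, and used less user and system CPU time as well. When possible, use xargs. You'll likely save time and CPU cycles. :)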
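The "when possible" matters, though. By default, xargs splits its input on blanks as well as newlines and treats quotes and backslashes specially, so it isn't a drop-in replacement for a loop that handles one whole line at a time. Here's a minimal sketch of the difference, using made-up sample lines, plus one workaround: converting newlines to NUL bytes and using -0, which both GNU and BSD/macOS xargs support.

```shell
#!/bin/sh
# Two input lines, each containing a space.
# Default xargs splits on the blanks too, so echo receives four separate arguments.
printf '%s\n' 'hello world' 'foo bar' | xargs -n 1 echo

# Turning newlines into NUL bytes and using -0 keeps each line
# intact as a single argument.
printf '%s\n' 'hello world' 'foo bar' | tr '\n' '\0' | xargs -0 -n 1 echo
```

The first command prints hello, world, foo, and bar on four lines. The second prints "hello world" and "foo bar" on two. With GNU xargs, -d '\n' has the same effect without the tr step.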
