Tuesday, 9 February 2016

bash - Parsing JSON with Unix tools



I'm trying to parse JSON returned from a curl request, like so:



curl 'http://twitter.com/users/username.json' |
sed -e 's/[{}]/''/g' |
awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'


The above splits the JSON into fields, for example:



% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...


How do I print a specific field (denoted by the -v k=text)?


Answer



There are a number of tools designed specifically for manipulating JSON from the command line, such as jq. They are much easier and more reliable than doing it with Awk:




curl -s 'https://api.github.com/users/lambda' | jq -r '.name'
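
To come back to the question's -v k=text, the field name can be handed to jq as a variable instead of being written into the filter. The following is only a minimal sketch: jq_field is a hypothetical helper name, not something jq provides, and it assumes the field sits at the top level of the JSON object.

```shell
# Hypothetical helper (the name jq_field is an assumption for illustration):
# print one top-level field of a JSON object read from stdin.
# --arg binds the shell argument to the jq variable $k;
# -r prints string values without the surrounding quotes.
jq_field() {
    jq -r --arg k "$1" '.[$k]'
}
```

For example, piping the curl output above into jq_field name should behave like the jq -r '.name' call. A nested field would need a path in the filter instead, such as jq -r '.status.text'.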


You can also do this with tools that are likely already installed on your system, such as Python with its json module. This avoids extra dependencies while still giving you a proper JSON parser. The following examples assume you want to use UTF-8, which the original JSON should be encoded in and which most modern terminals use as well:



Python 3:



curl -s 'https://api.github.com/users/lambda' | \
python3 -c "import sys, json; print(json.load(sys.stdin)['name'])"
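
The Python 3 one-liner can also be wrapped so that the field name is an argument, much like the -v k=text in the question. This is a rough sketch rather than a definitive implementation: json_field is a hypothetical helper name, the sample JSON is made up from the fields shown in the question, and only top-level fields are handled.

```shell
# Hypothetical helper (the name json_field is an assumption for illustration):
# print one top-level field of a JSON object read from stdin.
# The key is passed as an argument (sys.argv[1]) rather than spliced
# into the Python source, so no quoting of the key is needed.
json_field() {
    python3 -c 'import sys, json; print(json.load(sys.stdin)[sys.argv[1]])' "$1"
}

# Sample input modelled on the fields shown in the question.
printf '%s' '{"truncated": false, "text": "My status", "favorited": false}' | json_field text
```

Note that non-string values are printed in Python's notation, so a JSON false comes out as False. A missing key raises a KeyError, which makes the command exit with a non-zero status.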



Python 2:



export PYTHONIOENCODING=utf8
curl -s 'https://api.github.com/users/lambda' | \
python2 -c "import sys, json; print json.load(sys.stdin)['name']"


Historical notes




This answer originally recommended jsawk, which should still work but is a little more cumbersome to use than jq. It also depends on a standalone JavaScript interpreter being installed, which is less common than a Python interpreter, so the approaches above are probably preferable:



curl -s 'https://api.github.com/users/lambda' | jsawk -a 'return this.name'


This answer also originally used the Twitter API from the question, but that API no longer works, which makes the examples hard to copy and test, and the new Twitter API requires API keys. I've therefore switched to the GitHub API, which can be used easily without API keys. The first example, applied to the original question, would be:



curl 'http://twitter.com/users/username.json' | jq -r '.text'

