First, your data is not valid JSON; there is one comma too many:
{
"TestNames": [
{
"Name": "test1",
"CreateDate": "2016-08-30T10:52:52Z",
"Id": "testId1", <--- Remove that!
},
{
"Name": "test2",
"CreateDate": "2016-08-30T10:52:13Z",
"Id": "testId2"
}
]
}
Once you've fixed that you can use jq for parsing json on the command line:
echo "$x" | jq -r '.TestNames[]|"\(.Name) , \(.Id)"'
If you need to keep the output values, read them into an associative array:
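declare -A map1
while read -r name id ; do
echo "$name"
echo "$id"
map1[$name]=$id
done < <(echo "$x" | jq -r '.TestNames[]|"\(.Name) \(.Id)"')
echo "count : ${#map1[@]}"
# After the loop $name is empty (read hit end of input), so iterate over the keys
for key in "${!map1[@]}"; do echo "$key: ${map1[$key]}"; done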
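For comparison, here is a minimal Python sketch of the same Name-to-Id mapping, in case jq is not available. It assumes the corrected sample data shown above; the function name name_to_id is made up for illustration.

```python
import json

# Sample document from the answer above, with the trailing comma removed.
raw = """
{
  "TestNames": [
    {"Name": "test1", "CreateDate": "2016-08-30T10:52:52Z", "Id": "testId1"},
    {"Name": "test2", "CreateDate": "2016-08-30T10:52:13Z", "Id": "testId2"}
  ]
}
"""


def name_to_id(text):
    """Return a dict mapping each entry's Name to its Id (hypothetical helper)."""
    data = json.loads(text)
    return {entry["Name"]: entry["Id"] for entry in data["TestNames"]}


mapping = name_to_id(raw)
print("count :", len(mapping))
for name, id_ in mapping.items():
    print(name, ",", id_)
```

A dict plays the role of the Bash associative array here, and json.loads raises an error on the original input with the extra comma instead of silently producing partial results.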
Answer from hek2mgl on Stack Overflow
I'd recommend using jq, a command-line JSON parser :
$ echo '''{
"Name": "test1",
"CreateDate": "2016-08-30T10:52:52Z",
"Id": "testId1"
}''' | jq '.Name + " , " + .Id'
"test1 , testId1"
$ echo '''{ "TestNames":
[{
"Name": "test1",
"CreateDate": "2016-08-30T10:52:52Z",
"Id": "testId1"
},
{
"Name": "test2",
"CreateDate": "2016-08-30T10:52:13Z",
"Id": "testId2"
}]
}''' | jq '.TestNames[] | .Name + " , " + .Id'
"test1 , testId1"
"test2 , testId2"
If you really cannot use a proper JSON parser such as jq[1], try an awk-based solution:
Bash 4.x:
readarray -t values < <(awk -F\" 'NF>=3 {print $4}' myfile.json)
Bash 3.x:
IFS=$'\n' read -d '' -ra values < <(awk -F\" 'NF>=3 {print $4}' myfile.json)
This stores all property values in Bash array ${values[@]}, which you can inspect with declare -p values.
These solutions have limitations:
- each property must be on its own line,
- all values must be double-quoted,
- embedded escaped double quotes are not supported.
All these limitations reinforce the recommendation to use a proper JSON parser.
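To illustrate why a parser sidesteps these limitations, here is a small Python sketch. The input string is hypothetical: it is compact (everything on one line), contains an escaped double quote, and has an unquoted number, which are exactly the cases listed above. The helper name property_values is an assumption for illustration.

```python
import json

# Hypothetical input: one line, an escaped quote, and an unquoted number.
text = '{"name": "say \\"hi\\"", "size": 3, "path": "a:b"}'


def property_values(doc):
    """Return the top-level property values of a JSON object as strings."""
    obj = json.loads(doc)
    # Strings are kept as-is; other types are rendered back as JSON text.
    return [v if isinstance(v, str) else json.dumps(v) for v in obj.values()]


values = property_values(text)
print(values)
```

The result is a plain list, comparable to the Bash array ${values[@]}, and it does not depend on how the file happens to be formatted.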
Note: The following alternative solutions use the Bash 4.x+ readarray -t values command, but they also work with the Bash 3.x alternative, IFS=$'\n' read -d '' -ra values.
grep + cut combination: A single grep command won't do (unless you use GNU grep - see below), but adding cut helps:
readarray -t values < <(grep '"' myfile.json | cut -d '"' -f4)
GNU grep: Using -P to support PCREs, which support \K to drop everything matched so far (a more flexible alternative to a look-behind assertion) as well as look-ahead assertions ((?=...)):
readarray -t values < <(grep -Po ':\s*"\K.+(?="\s*,?\s*$)' myfile.json)
Finally, here's a pure Bash (3.x+) solution:
What makes this a viable alternative in terms of performance is that no external utilities are called in each loop iteration; however, for larger input files, a solution based on external utilities will be much faster.
#!/usr/bin/env bash
declare -a values # declare the array
# Read each line and use regex parsing (with Bash's `=~` operator)
# to extract the value.
while read -r line; do
# Extract the value from between the double quotes
# and add it to the array.
[[ $line =~ :[[:blank:]]+\"(.*)\" ]] && values+=( "${BASH_REMATCH[1]}" )
done < myfile.json
declare -p values # print the array
[1] Here's what a robust jq-based solution would look like (Bash 4.x):
readarray -t values < <(jq -r '.[]' myfile.json)
jq is good enough to solve this problem
paste -s <(jq '.files[].name' YourJsonString) <(jq '.files[].age' YourJsonString) <( jq '.files[].websiteurl' YourJsonString)
So that you get a table and you can grep any rows or awk print any columns you want
Using jq:
readarray -t arr < <(jq -r '.[].item2' json)
printf '%s\n' "${arr[@]}"
If you need a more hardened way, for inputs with newlines or other special characters and avoiding word splitting, use the following (it needs Bash 4.4 or newer and NUL-delimited input):
readarray -td '' arr
Output:
value2
value2_2
Check:
Process Substitution >(command ...) or <(...) is replaced by a temporary filename. Writing or reading that file causes bytes to get piped to the command inside. Often used in combination with file redirection: cmd1 2> >(cmd2).
See http://mywiki.wooledge.org/ProcessSubstitution http://mywiki.wooledge.org/BashFAQ/024
The following is actually buggy:
# BAD: Output line of * is replaced with list of local files; can't deal with whitespace
arr=( $( curl -k "$url" | jq -r '.[].item2' ) )
If you have bash 4.4 or newer, a best-of-all-worlds option is available:
# BEST: Supports bash 4.4+, with failure detection and newlines in data
{ readarray -t -d '' arr && wait "$!"; } < <(
set -o pipefail
curl --fail -k "$url" | jq -j '.[].item2 | (., "\u0000")'
)
...whereas with bash 4.0, you can have terseness at the cost of failure detection and literal newline support:
# OK (with bash 4.0), but can't detect failure and doesn't support values with newlines
readarray -t arr < <(curl -k "$url" | jq -r '.[].item2' )
...or bash 3.x compatibility and failure detection, but without newline support:
# OK: Supports bash 3.x; no support for newlines in values, but can detect failures
IFS=$'\n' read -r -d '' -a arr < <(
set -o pipefail
curl --fail -k "$url" | jq -r '.[].item2' && printf '\0'
)
...or bash 3.x compatibility and newline support, but without failure detection:
# OK: Supports bash 3.x and supports newlines in values; does not detect failures
arr=( )
while IFS= read -r -d '' item; do
arr+=( "$item" )
done < <(curl --fail -k "$url" | jq -j '.[] | (.item2, "\u0000")')
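The trade-offs above (newlines in values, glob characters, failure detection) can be sketched in Python for comparison. This is only an illustration under assumptions: the payload is a hypothetical stand-in for the curl output, and item2_values is a made-up helper name.

```python
import json


def item2_values(text):
    """Return the item2 field of every element in a JSON array.

    Raises json.JSONDecodeError on malformed input and KeyError when an
    element lacks item2, so failures are not silently ignored.
    """
    return [element["item2"] for element in json.loads(text)]


# Hypothetical payload: the second value contains a newline, the third a "*".
payload = '[{"item1": "a", "item2": "value2"}, {"item2": "two\\nlines"}, {"item2": "*"}]'
arr = item2_values(payload)
for value in arr:
    print(repr(value))
```

Because the values never pass through word splitting or glob expansion, the newline and the asterisk survive unchanged.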
The availability of parsers in nearly every programming language is one of the advantages of JSON as a data-interchange format.
Rather than trying to implement a JSON parser, you are likely better off using either a tool built for JSON parsing such as jq or a general purpose script language that has a JSON library.
For example, using jq, you could pull out the ImageId from the first item of the Instances array as follows:
jq '.Instances[0].ImageId' test.json
Alternatively, to get the same information using Ruby's JSON library:
ruby -rjson -e 'j = JSON.parse(File.read("test.json")); puts j["Instances"][0]["ImageId"]'
I won't answer all of your revised questions and comments but the following is hopefully enough to get you started.
Suppose that you had a Ruby script that could read JSON from STDIN and output the second line in your example output[0]. That script might look something like:
#!/usr/bin/env ruby
require 'json'
data = JSON.parse(ARGF.read)
instance_id = data["Instances"][0]["InstanceId"]
name = data["Instances"][0]["Tags"].find {|t| t["Key"] == "Name" }["Value"]
owner = data["Instances"][0]["Tags"].find {|t| t["Key"] == "Owner" }["Value"]
cost_center = data["Instances"][0]["SubnetId"].split("-")[1][0..3]
puts "#{instance_id}\t#{name}\t#{cost_center}\t#{owner}"
How could you use such a script to accomplish your whole goal? Well, suppose you already had the following:
- a command to list all your instances
- a command to get the JSON above for any instance on your list and output it to STDOUT
One way would be to use your shell to combine these tools:
echo -e "Instance id\tName\tcost centre\tOwner"
for instance in $(list-instances); do
get-json-for-instance "$instance" | ./ugly-ruby-script.rb
done
Now, maybe you have a single command that gives you one JSON blob for all instances, with more items in that "Instances" array. If that is the case, you'll just need to modify the script a bit to iterate through the array rather than simply using the first item.
In the end, the way to solve this problem, is the way to solve many problems in Unix. Break it down into easier problems. Find or write tools to solve the easier problem. Combine those tools with your shell or other operating system features.
[0] Note that I have no idea where you get cost-center from, so I just made it up.
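As a rough sketch of that last modification (iterating over every item of the Instances array rather than only the first), here is a Python version. The sample data is hypothetical and only mimics the shape discussed above; tag_value and instance_rows are made-up helper names, and the cost centre column is left out because its source is unknown.

```python
import json


def tag_value(instance, key, default=""):
    """Return the Value of the tag whose Key matches, or default if absent."""
    for tag in instance.get("Tags", []):
        if tag.get("Key") == key:
            return tag.get("Value", default)
    return default


def instance_rows(text):
    """Yield (instance id, name, owner) for every item in the Instances array."""
    for instance in json.loads(text)["Instances"]:
        yield (
            instance["InstanceId"],
            tag_value(instance, "Name"),
            tag_value(instance, "Owner"),
        )


# Hypothetical sample shaped like the data discussed above.
sample = json.dumps({"Instances": [
    {"InstanceId": "i-1", "Tags": [{"Key": "Owner", "Value": "Ann"},
                                   {"Key": "Name", "Value": "web"}]},
    {"InstanceId": "i-2", "Tags": []},
]})

rows = list(instance_rows(sample))
print("Instance id\tName\tOwner")
for row in rows:
    print("\t".join(row))
```

Looking tags up by Key rather than by position keeps the output stable when the tag order differs between instances.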
You can use the following Python script to parse that data. Let's assume that you have the JSON data in files like array1.json, array2.json and so on.
import json
import sys
# Load the JSON file named on the command line
with open(sys.argv[1]) as jdata:
    data = json.load(jdata)
instance = data["Instances"][0]
print("InstanceId", "Name", "Owner", sep=" - ")
# Tags are looked up by position, as in the sample data
print(instance["InstanceId"], instance["Tags"][1]["Value"], instance["Tags"][2]["Value"], sep=" - ")
And then just run:
$ for x in *.json; do python3 parse.py "$x"; done
InstanceId - Name - Owner
i-1234576 - RDS_Machine (us-east-1c) - Jyoti Bhanot
I haven't seen cost in your data, that's why I didn't include that.
Following the discussion in the comments, I have updated the parse.py script to read from standard input:
import json
import sys
# Read the JSON document from standard input
data = json.load(sys.stdin)
instance = data["Instances"][0]
print("InstanceId", "Name", "Owner", sep=" - ")
print(instance["InstanceId"], instance["Tags"][1]["Value"], instance["Tags"][2]["Value"], sep=" - ")
You can then run the following command:
# ec2-describe-instance <instance> | python3 parse.py
This can be done entirely in jq.
jq -r '
.countries |
map(
.country as $country |
.city | map("country: \( $country ), city: \( . )\n") | add
) |
join("\n")
'
Gives:
country: India, city: India1
country: India, city: India2
country: India, city: India3
country: USA, city: USA1
country: USA, city: USA2
country: USA, city: USA3
If you don't need that blank line, it's a lot simpler.
jq -r '
.countries[] |
.country as $country |
.city[] |
"country: \( $country ), city: \( . )"
'
With a tip from Oliver Sinclair InfoSleuth, the latter can be reduced to
jq -r '.countries[] | "country: \( .country ), city: \( .city[] )"'
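The same flattening can be sketched in Python. The input shape below is inferred from the jq filters above (a countries array whose entries hold a country string and a city array), so treat it as an assumption; country_city_lines is a made-up helper name.

```python
import json

# Input shape inferred from the jq filters above (an assumption).
doc = """
{"countries": [
  {"country": "India", "city": ["India1", "India2", "India3"]},
  {"country": "USA", "city": ["USA1", "USA2", "USA3"]}
]}
"""


def country_city_lines(text):
    """Return one 'country: X, city: Y' line per (country, city) pair."""
    return [
        f"country: {entry['country']}, city: {city}"
        for entry in json.loads(text)["countries"]
        for city in entry["city"]
    ]


lines = country_city_lines(doc)
print("\n".join(lines))
```

The nested comprehension mirrors what .city[] does inside the jq string interpolation: one output line is produced for each city of each country.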
Assuming the file is formatted as presented ...
One simple awk script based on the double quote as delimiter:
awk -F'"' '
$2=="country" { for (i=8; i<=NF; i+=2)
printf "country:%s, city:%s\n", $4, $i
}
' country.json
This generates:
country:India, city:India1
country:India, city:India2
country:India, city:India3
country:USA, city:USA1
country:USA, city:USA2
country:USA, city:USA3
If you really want a blank line between sections of output:
awk -F'"' '
$2=="country" { printf "%s", pfx
for (i=8; i<=NF; i+=2)
printf "country:%s, city:%s\n", $4, $i
pfx="\n"
}
' country.json
This generates:
country:India, city:India1 # no leading blank line
country:India, city:India2
country:India, city:India3
# blank line only between sections
country:USA, city:USA1
country:USA, city:USA2
country:USA, city:USA3 # no trailing blank line
There are a number of tools specifically designed for manipulating JSON from the command line, which will be a lot easier and more reliable than doing it with Awk, such as jq:
curl -s 'https://api.github.com/users/lambda' | jq -r '.name'
You can also do this with tools that are likely already installed on your system, like Python using the json module, and so avoid any extra dependencies while still having the benefit of a proper JSON parser. The following assumes you want to use UTF-8, which the original JSON should be encoded in and which most modern terminals use as well:
Python 3:
curl -s 'https://api.github.com/users/lambda' | \
python3 -c "import sys, json; print(json.load(sys.stdin)['name'])"
Python 2:
export PYTHONIOENCODING=utf8
curl -s 'https://api.github.com/users/lambda' | \
python2 -c "import sys, json; print json.load(sys.stdin)['name']"
Frequently Asked Questions
Why not a pure shell solution?
The standard POSIX/Single Unix Specification shell is a very limited language which doesn't contain facilities for representing sequences (list or arrays) or associative arrays (also known as hash tables, maps, dicts, or objects in some other languages). This makes representing the result of parsing JSON somewhat tricky in portable shell scripts. There are somewhat hacky ways to do it, but many of them can break if keys or values contain certain special characters.
Bash 4 and later, zsh, and ksh have support for arrays and associative arrays, but these shells are not universally available (macOS stopped updating Bash at Bash 3, due to a change from GPLv2 to GPLv3, while many Linux systems don't have zsh installed out of the box). It's possible that you could write a script that would work in either Bash 4 or zsh, one of which is available on most macOS, Linux, and BSD systems these days, but it would be tough to write a shebang line that worked for such a polyglot script.
Finally, writing a full-fledged JSON parser in shell would be a significant enough dependency that you might as well just use an existing dependency like jq or Python instead. A good implementation is not going to be a one-liner, or even a small five-line snippet.
Why not use awk, sed, or grep?
It is possible to use these tools to do some quick extraction from JSON with a known shape and formatted in a known way, such as one key per line. There are several examples of suggestions for this in other answers.
However, these tools are designed for line based or record based formats; they are not designed for recursive parsing of matched delimiters with possible escape characters.
So these quick and dirty solutions using awk/sed/grep are likely to be fragile, and break if some aspect of the input format changes, such as collapsing whitespace, or adding additional levels of nesting to the JSON objects, or an escaped quote within a string. A solution that is robust enough to handle all JSON input without breaking will also be fairly large and complex, and so not too much different than adding another dependency on jq or Python.
I have had to deal with large amounts of customer data being deleted due to poor input parsing in a shell script before, so I never recommend quick and dirty methods that may be fragile in this way. If you're doing some one-off processing, see the other answers for suggestions, but I still highly recommend just using an existing tested JSON parser.
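The fragility described above can be seen in a small Python sketch that compares a quick regex with a real parser. The document is hypothetical: the name contains an escaped quote, and a nested object reuses the same key.

```python
import json
import re

# Hypothetical document: escaped quotes in the value, plus a nested "name" key.
doc = '{"name": "Al \\"the cat\\" B", "owner": {"name": "nested"}}'

# Quick-and-dirty extraction, in the spirit of grep -o '"name": "[^"]*'
naive = re.findall(r'"name": "([^"]*)', doc)

# A real parser understands escapes and nesting.
parsed = json.loads(doc)["name"]

print("regex :", naive)
print("parser:", parsed)
```

The regex stops at the first escaped quote and also picks up the nested key, while the parser returns only the top-level value with its escapes resolved.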
Historical notes
This answer originally recommended jsawk, which should still work, but is a little more cumbersome to use than jq, and depends on a standalone JavaScript interpreter being installed which is less common than a Python interpreter, so the above answers are probably preferable:
curl -s 'https://api.github.com/users/lambda' | jsawk -a 'return this.name'
This answer also originally used the Twitter API from the question, but that API no longer works, making it hard to copy the examples to test out, and the new Twitter API requires API keys, so I've switched to using the GitHub API which can be used easily without API keys. The first answer for the original question would be:
curl 'http://twitter.com/users/username.json' | jq -r '.text'
To quickly extract the values for a particular key, I personally like to use "grep -o", which only returns the regex's match. For example, to get the "text" field from tweets, something like:
grep -Po '"text":.*?[^\\]",' tweets.json
This regex is more robust than you might think; for example, it deals fine with strings having embedded commas and escaped quotes inside them. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it of course.)
And to further clean (albeit keeping the string's original escaping) you can use something like: | perl -pe 's/"text"://; s/^"//; s/",$//'. (I did this for this analysis.)
To all the haters who insist you should use a real JSON parser -- yes, that is essential for correctness, but
- To do a really quick analysis, like counting values to check on data cleaning bugs or get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.
- grep -o is orders of magnitude faster than the Python standard json library, at least when doing this for tweets (which are ~2 KB each). I'm not sure if this is just because json is slow (I should compare to yajl sometime); but in principle, a regex should be faster since it's finite state and much more optimizable, instead of a parser that has to support recursion, and in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite state transducer that did proper (depth-limited) JSON parsing, that would be fantastic! In the meantime we have "grep -o".)
To write maintainable code, I always use a real parsing library. I haven't tried jsawk, but if it works well, that would address point #1.
One last, wackier, solution: I wrote a script that uses Python json and extracts the keys you want, into tab-separated columns; then I pipe through a wrapper around awk that allows named access to columns. In here: the json2tsv and tsvawk scripts. So for this example it would be:
json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'
This approach doesn't address #2, is more inefficient than a single Python script, and it's a little brittle: it forces normalization of newlines and tabs in string values, to play nice with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than grep -o.
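As a rough illustration of the idea (not the actual json2tsv script mentioned above), here is a Python sketch that turns records into tab-separated rows. It assumes one JSON object per input line, which is an assumption about tweets.json, and the function name json_to_tsv is made up.

```python
import json


def json_to_tsv(lines, keys):
    """Turn JSON-lines records into tab-separated rows for the given keys."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        cells = []
        for key in keys:
            value = record.get(key, "")
            if not isinstance(value, str):
                value = json.dumps(value)
            # Normalize tabs and newlines so each record stays on one line.
            cells.append(
                value.replace("\t", " ").replace("\r", " ").replace("\n", " ")
            )
        rows.append("\t".join(cells))
    return rows


# Hypothetical records containing a tab and a newline inside the text field.
sample = ['{"id": 1, "text": "hello\\tworld"}', '{"id": 2, "text": "two\\nlines"}']
rows = json_to_tsv(sample, ["id", "text"])
for row in rows:
    print(row)
```

The normalization step is what makes the output safe for awk's field and record view, at the cost of altering the original strings, which is the brittleness noted above.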
Just use a filter that would return each item in the array. Then loop over the results, just make sure you use the compact output option (-c) so each result is put on a single line and is treated as one item in the loop.
jq -c '.[]' input.json | while read -r i; do
# do stuff with $i
done
By leveraging the power of Bash arrays, you can do something like:
# read each item in the JSON array to an item in the Bash array
readarray -t my_array < <(jq --compact-output '.[]' input.json)
# iterate through the Bash array
for item in "${my_array[@]}"; do
original_name=$(jq --raw-output '.original_name' <<< "$item")
changed_name=$(jq --raw-output '.changed_name' <<< "$item")
# do your stuff
done
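For comparison, the same iteration can be sketched in Python, where each array item is already a dict and no per-item jq call is needed. The sample content is hypothetical; only the field names original_name and changed_name come from the loop above.

```python
import json

# Hypothetical input.json content; the field names come from the loop above.
text = (
    '[{"original_name": "a.txt", "changed_name": "b.txt"},'
    ' {"original_name": "c.txt", "changed_name": "d.txt"}]'
)

# Each array element becomes a dict, so fields are read directly.
pairs = [(item["original_name"], item["changed_name"]) for item in json.loads(text)]

for original_name, changed_name in pairs:
    # do your stuff
    print(original_name, "->", changed_name)
```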
A bash function like below can be used:
function jsonValue() {
KEY=$1
num=$2
awk -F"[,:}]" '{for(i=1;i<=NF;i++){if($i~/'$KEY'\042/){print $(i+1)}}}' | tr -d '"' | sed -n ${num}p
}
I've saved this function as jsonVal and then sourced this file using source jsonVal. You can very well use it within your script.
It expects two arguments. First argument is the property name. If you need all values, skip second argument. If specific value is needed, you can add the second argument as shown below.
[root@localhost Desktop]# cat data.json | jsonValue id
4568734
3678976
[root@localhost Desktop]# cat data.json | jsonValue id 1
4568734
[root@localhost Desktop]# cat data.json | jsonValue id 2
3678976
[root@localhost Desktop]# cat data.json | jsonValue name
suneel
adi
[root@localhost Desktop]# cat data.json | jsonValue name 1
suneel
[root@localhost Desktop]# cat data.json | jsonValue name 2
adi
[root@localhost Desktop]#
Hope this helps.
Using tools other than proper json parsers will always be prone to errors or security issues.
Your best option: If you don't have the tools you need to do your work, ask your IT/Server admin to install them.
Anyways, the following will work at least for your example:
Using grep -P:
$ curl ... | grep -Po '"name":"\K[^"]*'
suneel
adi
With normal grep:
$ curl ... | grep -o '"name":"[^"]*' | cut -d'"' -f4
suneel
adi
If you have "name" somewhere outside of "people" which you do not want, this will obviously fail.
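A parser avoids that failure case, since it can restrict the lookup to the "people" array. The following Python sketch assumes a document layout inferred from the sample output above, with a top-level "name" added to show the difference; people_names is a made-up helper name.

```python
import json

# Hypothetical document: the "people" layout is inferred from the sample
# output above; the top-level "name" is added to show the failure case.
doc = (
    '{"name": "directory", "people": ['
    '{"id": 4568734, "name": "suneel"}, {"id": 3678976, "name": "adi"}]}'
)


def people_names(text):
    """Return the name of every entry in the people array only."""
    return [person["name"] for person in json.loads(text)["people"]]


names = people_names(doc)
print("\n".join(names))
```

The top-level "name" is ignored here, whereas the grep patterns above would have returned it along with the two people.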