First, your data is not valid JSON; there is one comma too many:

{
  "TestNames": [
    {
      "Name": "test1",
      "CreateDate": "2016-08-30T10:52:52Z",
      "Id": "testId1", <--- Remove that!
    },
    {
      "Name": "test2",
      "CreateDate": "2016-08-30T10:52:13Z",
      "Id": "testId2"
    }
  ]
}

Once you've fixed that you can use jq for parsing json on the command line:

echo "$x" | jq -r '.TestNames[]|"\(.Name) , \(.Id)"'

If you need to keep the output values, store them in an associative array (Bash 4+):

declare -A map1

while read -r name id; do
    echo "$name"
    echo "$id"
    map1[$name]=$id

done < <(echo "$x" | jq -r '.TestNames[]|"\(.Name) \(.Id)"')

echo "count : ${#map1[@]}"
echo "in loop: ${map1[$name]}"
Answer from hek2mgl on Stack Overflow
Top answer:

If you really cannot use a proper JSON parser such as jq,[1] try an awk-based solution:

Bash 4.x:

readarray -t values < <(awk -F\" 'NF>=3 {print $4}' myfile.json)

Bash 3.x:

IFS=$'\n' read -d '' -ra values < <(awk -F\" 'NF>=3 {print $4}' myfile.json)

This stores all property values in Bash array ${values[@]}, which you can inspect with
declare -p values.
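For example, with an invented sample file that satisfies the constraints listed below:

```shell
# Invented sample: one double-quoted property per line.
cat > /tmp/sample.json <<'EOF'
{
  "Name": "test1",
  "Id": "testId1"
}
EOF

# With " as the field separator, field 4 is the property value.
readarray -t values < <(awk -F\" 'NF>=3 {print $4}' /tmp/sample.json)
declare -p values   # declare -a values=([0]="test1" [1]="testId1")
```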

These solutions have limitations:

  • each property must be on its own line,
  • all values must be double-quoted,
  • embedded escaped double quotes are not supported.

All these limitations reinforce the recommendation to use a proper JSON parser.


Note: The following alternative solutions use the Bash 4.x+ readarray -t values command, but they also work with the Bash 3.x alternative, IFS=$'\n' read -d '' -ra values.

grep + cut combination: A single grep command won't do (unless you use GNU grep - see below), but adding cut helps:

readarray -t values < <(grep '"' myfile.json | cut -d '"' -f4)
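A quick sketch with an invented two-property sample (grep keeps only the quoted lines, cut takes the fourth "-delimited field, i.e. the value):

```shell
# Invented sample file, one double-quoted property per line.
printf '%s\n' '{' '  "Name": "test1",' '  "Id": "testId1"' '}' > /tmp/sample2.json
readarray -t values < <(grep '"' /tmp/sample2.json | cut -d '"' -f4)
declare -p values   # declare -a values=([0]="test1" [1]="testId1")
```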

GNU grep: Using -P to support PCREs, which support \K to drop everything matched so far (a more flexible alternative to a look-behind assertion) as well as look-ahead assertions ((?=...)):

readarray -t values < <(grep -Po ':\s*"\K.+(?="\s*,?\s*$)' myfile.json)

Finally, here's a pure Bash (3.x+) solution:

What makes this a viable alternative in terms of performance is that no external utilities are called in each loop iteration; however, for larger input files, a solution based on external utilities will be much faster.

#!/usr/bin/env bash

declare -a values # declare the array

# Read each line and use regex parsing (with Bash's `=~` operator)
# to extract the value.
while read -r line; do
  # Extract the value from between the double quotes
  # and add it to the array.
  [[ $line =~ :[[:blank:]]+\"(.*)\" ]] && values+=( "${BASH_REMATCH[1]}" )
done < myfile.json

declare -p values # print the array
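For example, feeding the loop the same kind of input through a here-document instead of myfile.json (pure Bash, nothing external):

```shell
declare -a values
# Same regex extraction as above, reading from a here-document.
while read -r line; do
  [[ $line =~ :[[:blank:]]+\"(.*)\" ]] && values+=( "${BASH_REMATCH[1]}" )
done <<'EOF'
{
  "Name": "test1",
  "Id": "testId1"
}
EOF
declare -p values   # declare -a values=([0]="test1" [1]="testId1")
```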

[1] Here's what a robust jq-based solution would look like (Bash 4.x):
readarray -t values < <(jq -r '.[]' myfile.json)

Another answer:

jq is good enough to solve this problem:

paste -s <(jq -r '.files[].name' yourfile.json) <(jq -r '.files[].age' yourfile.json) <(jq -r '.files[].websiteurl' yourfile.json)

This gives you a table, so you can grep for any rows or print any columns you want with awk.
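A sketch of what that looks like (the files/name/age/websiteurl schema here is invented for illustration; -r strips jq's quotes, and -s gives one row per field):

```shell
# Invented sample data inlined instead of a file.
json='{"files":[{"name":"a","age":1,"websiteurl":"x.example"},
               {"name":"b","age":2,"websiteurl":"y.example"}]}'
paste -s <(jq -r '.files[].name' <<<"$json") \
         <(jq -r '.files[].age'  <<<"$json") \
         <(jq -r '.files[].websiteurl' <<<"$json")
# Prints three tab-separated rows: names, ages, urls.
```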

Top answer, from "parse one field from a JSON array into bash array" (Unix & Linux Stack Exchange):

Using jq:

readarray -t arr < <(jq -r '.[].item2' json)
printf '%s\n' "${arr[@]}"

If you need a more hardened way, read NUL-delimited data instead:

readarray -td '' arr

This handles inputs containing newlines or other special characters and avoids word splitting (the producer must then emit NUL-separated values, for example with jq -j).

Output:

value2
value2_2

For background: process substitution, >(command ...) or <(...), is replaced by a temporary filename; writing to or reading from that file causes the bytes to be piped to or from the command inside. It is often used in combination with redirection: cmd1 2> >(cmd2). See http://mywiki.wooledge.org/ProcessSubstitution and http://mywiki.wooledge.org/BashFAQ/024
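A minimal demonstration of why this matters for the loops above: with < <(...) the while loop runs in the current shell, so variables it sets survive (unlike cmd | while ..., where the loop runs in a subshell):

```shell
# <(...) expands to a filename such as /dev/fd/63; reading from it
# yields the inner command's output.
count=0
while read -r line; do
  count=$((count + 1))
done < <(printf 'one\ntwo\nthree\n')
echo "count=$count"   # count=3
```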

Another answer:

The following is actually buggy:

# BAD: unquoted expansion glob-expands tokens like * into local file names and splits values on whitespace
arr=( $( curl -k "$url" | jq -r '.[].item2' ) )

If you have bash 4.4 or newer, a best-of-all-worlds option is available:

# BEST: Supports bash 4.4+, with failure detection and newlines in data
{ readarray -t -d '' arr && wait "$!"; } < <(
  set -o pipefail
  curl --fail -k "$url" | jq -j '.[].item2 | (., "\u0000")'
)

...whereas with bash 4.0, you can have terseness at the cost of failure detection and literal newline support:

# OK (with bash 4.0), but can't detect failure and doesn't support values with newlines
readarray -t arr < <(curl -k "$url" | jq -r '.[].item2' )

...or bash 3.x compatibility and failure detection, but without newline support:

# OK: Supports bash 3.x; no support for newlines in values, but can detect failures
IFS=$'\n' read -r -d '' -a arr < <(
  set -o pipefail
  curl --fail -k "$url" | jq -r '.[].item2' && printf '\0'
)

...or bash 3.x compatibility and newline support, but without failure detection:

# OK: Supports bash 3.x and supports newlines in values; does not detect failures
arr=( )
while IFS= read -r -d '' item; do
  arr+=( "$item" )
done < <(curl --fail -k "$url" | jq -j '.[] | (.item2, "\u0000")')
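To try the NUL-delimited loop without curl, substitute any local producer; here the JSON (with a value containing a literal newline) is inlined, jq assumed installed:

```shell
# Invented sample: the second item2 value contains an escaped newline.
json='[{"item2":"first value"},{"item2":"line1\nline2"}]'
arr=( )
while IFS= read -r -d '' item; do
  arr+=( "$item" )
done < <(jq -j '.[] | (.item2, "\u0000")' <<<"$json")
declare -p arr   # arr[1] contains a real newline, preserved intact
```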
Top answer, from "How to parse JSON with shell scripting in Linux?" (Unix & Linux Stack Exchange):

The availability of parsers in nearly every programming language is one of the advantages of JSON as a data-interchange format.

Rather than trying to implement a JSON parser, you are likely better off using either a tool built for JSON parsing such as jq or a general purpose script language that has a JSON library.

For example, using jq, you could pull out the ImageID from the first item of the Instances array as follows:

jq '.Instances[0].ImageId' test.json

Alternatively, to get the same information using Ruby's JSON library:

ruby -rjson -e 'j = JSON.parse(File.read("test.json")); puts j["Instances"][0]["ImageId"]'

I won't answer all of your revised questions and comments but the following is hopefully enough to get you started.

Suppose that you had a Ruby script that could read JSON from STDIN and output the second line in your example output[0]. That script might look something like:

#!/usr/bin/env ruby
require 'json'

data = JSON.parse(ARGF.read)
instance_id = data["Instances"][0]["InstanceId"]
name = data["Instances"][0]["Tags"].find {|t| t["Key"] == "Name" }["Value"]
owner = data["Instances"][0]["Tags"].find {|t| t["Key"] == "Owner" }["Value"]
cost_center = data["Instances"][0]["SubnetId"].split("-")[1][0..3]
puts "#{instance_id}\t#{name}\t#{cost_center}\t#{owner}"

How could you use such a script to accomplish your whole goal? Well, suppose you already had the following:

  • a command to list all your instances
  • a command to get the JSON above for any instance on your list and output it to STDOUT

One way would be to use your shell to combine these tools:

echo -e "Instance id\tName\tcost centre\tOwner"
for instance in $(list-instances); do
    get-json-for-instance $instance | ./ugly-ruby-scriptrb
done
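A runnable sketch of that wiring, with both commands stubbed out as hypothetical shell functions and jq standing in for the Ruby script, purely to keep the example self-contained:

```shell
# Hypothetical stand-ins for the real commands; names and JSON invented.
list-instances() { printf '%s\n' i-111 i-222; }
get-json-for-instance() {
  printf '{"Instances":[{"InstanceId":"%s"}]}\n' "$1"
}

# Same loop shape as above, with jq doing the per-instance extraction.
for instance in $(list-instances); do
  get-json-for-instance "$instance" | jq -r '.Instances[0].InstanceId'
done
# i-111
# i-222
```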

Now, maybe you have a single command that gives you one JSON blob for all instances, with more items in that "Instances" array. Well, if that is the case, you'll just need to modify the script a bit to iterate through the array rather than simply using the first item.

In the end, the way to solve this problem is the way to solve many problems in Unix. Break it down into easier problems. Find or write tools to solve the easier problems. Combine those tools with your shell or other operating system features.

[0] Note that I have no idea where you get cost-center from, so I just made it up.

Another answer:

You can use the following Python script to parse that data. Let's assume that you have the JSON data from the arrays in files like array1.json, array2.json, and so on.

import json
import sys

with open(sys.argv[1]) as jdata:
    data = json.load(jdata)

print("InstanceId", " - ", "Name", " - ", "Owner")
print(data["Instances"][0]["InstanceId"], " - ", data["Instances"][0]["Tags"][1]["Value"], " - ", data["Instances"][0]["Tags"][2]["Value"])

And then just run:

$ for x in *.json; do python3 parse.py "$x"; done
InstanceId  -  Name  -  Owner
i-1234576  -  RDS_Machine (us-east-1c)  -  Jyoti Bhanot

I haven't seen cost in your data, which is why I didn't include it.

Following the discussion in the comments, I have updated the parse.py script:

import json
import sys

data = json.loads(sys.stdin.read())

print("InstanceId", " - ", "Name", " - ", "Owner")
print(data["Instances"][0]["InstanceId"], " - ", data["Instances"][0]["Tags"][1]["Value"], " - ", data["Instances"][0]["Tags"][2]["Value"])

You can then run something like:

# ec2-describe-instance <instance> | python3 parse.py
Top answer, from "Parsing JSON with Unix tools" (Stack Overflow):

There are a number of tools specifically designed for manipulating JSON from the command line; they will be a lot easier and more reliable to use than doing it with awk. One such tool is jq:

curl -s 'https://api.github.com/users/lambda' | jq -r '.name'

You can also do this with tools that are likely already installed on your system, like Python with its json module, and so avoid any extra dependencies while still having the benefit of a proper JSON parser. The following examples assume you want to use UTF-8, which the original JSON should be encoded in and which most modern terminals use as well:

Python 3:

curl -s 'https://api.github.com/users/lambda' | \
    python3 -c "import sys, json; print(json.load(sys.stdin)['name'])"

Python 2:

export PYTHONIOENCODING=utf8
curl -s 'https://api.github.com/users/lambda' | \
    python2 -c "import sys, json; print json.load(sys.stdin)['name']"

Frequently Asked Questions

Why not a pure shell solution?

The standard POSIX/Single Unix Specification shell is a very limited language which doesn't contain facilities for representing sequences (list or arrays) or associative arrays (also known as hash tables, maps, dicts, or objects in some other languages). This makes representing the result of parsing JSON somewhat tricky in portable shell scripts. There are somewhat hacky ways to do it, but many of them can break if keys or values contain certain special characters.

Bash 4 and later, zsh, and ksh have support for arrays and associative arrays, but these shells are not universally available (macOS stopped updating Bash at Bash 3, due to a change from GPLv2 to GPLv3, while many Linux systems don't have zsh installed out of the box). It's possible that you could write a script that would work in either Bash 4 or zsh, one of which is available on most macOS, Linux, and BSD systems these days, but it would be tough to write a shebang line that worked for such a polyglot script.

Finally, writing a full fledged JSON parser in shell would be a significant enough dependency that you might as well just use an existing dependency like jq or Python instead. It's not going to be a one-liner, or even small five-line snippet, to do a good implementation.

Why not use awk, sed, or grep?

It is possible to use these tools to do some quick extraction from JSON with a known shape and formatted in a known way, such as one key per line. There are several examples of suggestions for this in other answers.

However, these tools are designed for line based or record based formats; they are not designed for recursive parsing of matched delimiters with possible escape characters.

So these quick and dirty solutions using awk/sed/grep are likely to be fragile, and break if some aspect of the input format changes, such as collapsing whitespace, or adding additional levels of nesting to the JSON objects, or an escaped quote within a string. A solution that is robust enough to handle all JSON input without breaking will also be fairly large and complex, and so not too much different than adding another dependency on jq or Python.

I have had to deal with large amounts of customer data being deleted due to poor input parsing in a shell script before, so I never recommend quick and dirty methods that may be fragile in this way. If you're doing some one-off processing, see the other answers for suggestions, but I still highly recommend just using an existing tested JSON parser.

Historical notes

This answer originally recommended jsawk, which should still work, but is a little more cumbersome to use than jq, and depends on a standalone JavaScript interpreter being installed which is less common than a Python interpreter, so the above answers are probably preferable:

curl -s 'https://api.github.com/users/lambda' | jsawk -a 'return this.name'

This answer also originally used the Twitter API from the question, but that API no longer works, making it hard to copy the examples to test out, and the new Twitter API requires API keys, so I've switched to using the GitHub API which can be used easily without API keys. The first answer for the original question would be:

curl 'http://twitter.com/users/username.json' | jq -r '.text'
Another answer:

To quickly extract the values for a particular key, I personally like to use "grep -o", which only returns the regex's match. For example, to get the "text" field from tweets, something like:

grep -Po '"text":.*?[^\\]",' tweets.json

This regex is more robust than you might think; for example, it deals fine with strings having embedded commas and escaped quotes inside them. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it of course.)
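For example, against an invented one-line record with an embedded comma and escaped quotes (GNU grep required for -P):

```shell
# Invented tweet-like record; the value contains \" and a comma.
echo '{"id":1,"text":"a \"quoted\" word, and a comma","user":"x"}' \
  | grep -Po '"text":.*?[^\\]",'
# "text":"a \"quoted\" word, and a comma",
```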

And to further clean (albeit keeping the string's original escaping) you can use something like: | perl -pe 's/"text"://; s/^"//; s/",$//'. (I did this for this analysis.)

To all the haters who insist you should use a real JSON parser -- yes, that is essential for correctness, but

  1. To do a really quick analysis, like counting values to check on data cleaning bugs or get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.
  2. grep -o is orders of magnitude faster than the Python standard json library, at least when doing this for tweets (which are ~2 KB each). I'm not sure if this is just because json is slow (I should compare to yajl sometime); but in principle, a regex should be faster since it's finite state and much more optimizable, instead of a parser that has to support recursion, and in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite state transducer that did proper (depth-limited) JSON parsing, that would be fantastic! In the meantime we have "grep -o".)

To write maintainable code, I always use a real parsing library. I haven't tried jsawk, but if it works well, that would address point #1.

One last, wackier, solution: I wrote a script that uses Python json and extracts the keys you want into tab-separated columns; then I pipe through a wrapper around awk that allows named access to columns: the json2tsv and tsvawk scripts. So for this example it would be:

json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'

This approach doesn't address #2, is more inefficient than a single Python script, and it's a little brittle: it forces normalization of newlines and tabs in string values, to play nice with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than grep -o.
