@Jeff Mercado blew my mind! I didn't know array subtraction was allowed...

echo -n '{"all":["A","B","C","ABC"],"some":["B","C"]}' | jq '.all-.some'

yields

[
  "A",
  "ABC"
]
Answer from Jon on Stack Overflow
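
As a small sketch of how array subtraction treats duplicates and missing elements (the sample arrays are made up for illustration, and `jq` is assumed to be on the PATH):

```shell
# Subtraction removes every occurrence of each element found in the right-hand array.
dupes=$(jq -n -c '["A","B","B","C"] - ["B"]')
echo "$dupes"    # ["A","C"]

# Elements of the right-hand array that do not occur on the left are simply ignored.
absent=$(jq -n -c '["A","B"] - ["Z"]')
echo "$absent"   # ["A","B"]
```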
Discussions

linux - How to merge arrays from multiple arrays in two JSON with JQ - Unix & Linux Stack Exchange
I have two JSON files (file1.json and file2.json) with the following same structure as defined below with two array lists as shown below. The first file is (file1.json): { "Lists1": [ ... More on unix.stackexchange.com
🌐 unix.stackexchange.com
diff - Using jq or alternative command line tools to compare JSON files - Stack Overflow
This script reports whether the two entities are equivalent in the sense that their normalized values are equal, where normalization of all component arrays is achieved by recursively sorting them, innermost first. This script assumes that the jq of interest is $JQ if it exists and otherwise ... More on stackoverflow.com
🌐 stackoverflow.com
jq - Merge JSON arrays on dissimilar keys - Unix & Linux Stack Exchange
I have 2 JSON files with arrays (extracted from restAPI using curl in bash). Both files are arrays with a .result object at the top which needs to remain. The first has a .name field and many (o... More on unix.stackexchange.com
🌐 unix.stackexchange.com
how to get the intersection of two JSON arrays using jq - Stack Overflow
Given arrays X and Y (preferably both as inputs, but otherwise, with one as input and the other hardcoded), how can I use jq to output the array containing all elements common to both? e.g. what is a More on stackoverflow.com
🌐 stackoverflow.com
December 23, 2020
Top answer
1 of 2
6

Assuming the top-most keys of all documents are always the same across all documents, extract the keys into a separate variable, then reduce (accumulate) the data over these keys.

jq -s '
    (.[0] | keys[]) as $k |
    reduce .[] as $item (null; .[$k] += $item[$k])' file*.json

Note the use of -s to read all the input into a single array.

This iterates over the keys (Lists1 and Lists2) and, for each key, accumulates that key's data from every document in a new structure (starting from null).

Assuming that the input JSON documents are well-formed:

{
"Lists1": [{"point":"a","coordinates":[2289.48096,2093.48096]}],
"Lists2": [{"point":"b","coordinates":[2289.48096,2093.48096]}]
}
{
"Lists1": [{"point":"c","coordinates":[2289.48096,2093.48096]}],
"Lists2": [{"point":"d","coordinates":[2289.48096,2093.48096]}]
}

You will get the following resulting document containing two objects:

{
"Lists1": [{"point":"a","coordinates":[2289.48096,2093.48096]},{"point":"c","coordinates":[2289.48096,2093.48096]}]
}
{
"Lists2": [{"point":"b","coordinates":[2289.48096,2093.48096]},{"point":"d","coordinates":[2289.48096,2093.48096]}]
}

If you want both keys in the same object instead:

jq -s '
    [ (.[0] | keys[]) as $k |
      reduce .[] as $item (null; .[$k] += $item[$k]) ] | add' file*.json
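
For a quick check of the single-object variant without creating files, here is a minimal sketch. The two inline documents and the key names L1 and L2 are hypothetical stand-ins for file1.json and file2.json:

```shell
# Two hypothetical documents with the same top-level keys; -s slurps them into one array.
combined=$(printf '%s\n' '{"L1":["a"],"L2":["b"]}' '{"L1":["c"],"L2":["d"]}' |
  jq -s -c '[ (.[0] | keys[]) as $k
              | reduce .[] as $item (null; .[$k] += $item[$k]) ] | add')
echo "$combined"   # {"L1":["a","c"],"L2":["b","d"]}
```

The brackets collect the per-key objects into one array, and add then merges them into a single object.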
2 of 2
2

If the keys are not always the same across the documents, this one will do the job:

jq --slurp '
    reduce (.[] | to_entries | .[]) as {$key, $value} (
        {};
        .[$key] += $value
    )
    ' file*.json

Given these two files:

{
    "Lists1": [{"point":"a","coordinates":"..."}],
    "Lists2": [{"point":"b","coordinates":"..."}]
}
{
    "Lists1": [{"point":"c","coordinates":"..."}],
    "Lists2": [{"point":"d","coordinates":"..."}],
    "Lists3": [{"point":"e","coordinates":"..."}]
}

the output is:

{
    "Lists1":[{"point":"a","coordinates":"..."},{"point":"c","coordinates":"..."}],
    "Lists2":[{"point":"b","coordinates":"..."},{"point":"d","coordinates":"..."}],
    "Lists3":[{"point":"e","coordinates":"..."}]
}
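
To see why this handles differing keys, it can help to look at what to_entries produces. The sketch below uses made-up inline documents and the explicit `{key: $k, value: $v}` destructuring form, which should behave the same as the shorthand:

```shell
# to_entries turns an object into an array of {key, value} pairs.
entries=$(jq -n -c '{"Lists1":[1],"Lists3":[2]} | to_entries')
echo "$entries"   # [{"key":"Lists1","value":[1]},{"key":"Lists3","value":[2]}]

# Every pair from every document is folded into one object, whatever keys it has.
merged=$(printf '%s\n' '{"a":[1]}' '{"a":[2],"b":[3]}' |
  jq -s -c 'reduce (.[] | to_entries | .[]) as {key: $k, value: $v} ({}; .[$k] += $v)')
echo "$merged"    # {"a":[1,2],"b":[3]}
```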

Top answer
1 of 1
5

The following uses the relational JOIN() function in jq to join the two result arrays on the elements that are equal with respect to the lower-case variant of the name key (servers) and the cmdb.name key (IPs). It also uses INDEX() to build an index of the IP file's result array. The JOIN() function gives us arrays (pairs in the example) of matching objects that we merge using the add function. After joining and merging, we are left with objects containing both the name and the cmdb.name keys, so we delete the latter in each object.

jq '.result = [JOIN(INDEX(input.result[]; ."cmdb.name"|ascii_downcase); .result[]; .name|ascii_downcase) | add | del(."cmdb.name")]' servers.json ips.json

The jq expression, nicely formatted:

.result =
  [
    JOIN(
      # index on the second file's .cmdb.name key in each result object
      INDEX(
        input.result[];
        ."cmdb.name" | ascii_downcase
      );
      .result[];             # join on the first file's result objects
      .name | ascii_downcase # match using the .name key
    )
    | add               # merge the matched objects
    | del(."cmdb.name") # delete that key we don't want
  ]

Result:

{
  "result": [
    {
      "os": "Microsoft Windows Server 2019 Standard",
      "name": "SERVER1",
      "ip_address": "10.0.0.10",
      "interface": "Intel Wireless-AC 9560 160MHz"
    },
    {
      "os": "Microsoft Windows Server 2019 Standard",
      "name": "SERVER2",
      "ip_address": "10.0.0.10",
      "interface": "Wi-Fi"
    },
    {
      "os": "Microsoft Windows Server 2019 Standard",
      "name": "server3",
      "ip_address": ""
    },
    {
      "os": "Microsoft Windows Server 2016 Standard",
      "name": "server4",
      "ip_address": "10.0.0.10",
      "interface": "Intel Dual Band Wireless-AC 8265"
    }
  ]
}
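
A stripped-down sketch of the same INDEX/JOIN pattern may make the mechanics easier to follow. The field names host and ip and the inline arrays are hypothetical, and a jq version that ships INDEX and JOIN (1.6 or later, as far as I know) is assumed:

```shell
# INDEX builds an object keyed by the index expression.
index=$(jq -n -c '[{"host":"srv1","ip":"10.0.0.1"}] | INDEX(.[]; .host)')
echo "$index"   # {"srv1":{"host":"srv1","ip":"10.0.0.1"}}

# JOIN emits [item, match-or-null] pairs; add merges each pair into one object.
joined=$(jq -n -c '
  [{"name":"SRV1","os":"x"},{"name":"srv2","os":"y"}] as $servers
  | [{"host":"srv1","ip":"10.0.0.1"}] as $ips
  | INDEX($ips[]; .host) as $idx
  | [ JOIN($idx; $servers[]; .name | ascii_downcase) | add ]')
echo "$joined"
```

An item with no match is paired with null, and adding null leaves the object unchanged, which is consistent with server3 in the result above having no interface key.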
Top answer
1 of 5
14

NOTE: This solution assumes array1 has no duplicates.

Simple Explanation

The complexity of all these answers obscures understanding the principle. That's unfortunate because the principle is simple:

  • array1 minus array2 returns:
  • everything that's left in array1
  • after removing everything that is in array2
  • (and discarding the rest of array2)

Simple Demo

# From array1, subtract array2, leaving the remainder
$ jq --null-input '[1,2,3,4] - [2,4,6,8]'
[
  1,
  3
]

# Subtract the remainder from the original
$ jq --null-input '[1,2,3,4] - [1,3]'
[
  2,
  4
]

# Put it all together
$ jq --null-input '[1,2,3,4] - ([1,2,3,4] - [2,4,6,8])'
[
  2,
  4
]
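
The no-duplicates assumption from the note at the top matters, as this small sketch with made-up arrays suggests. Passing array1 through unique first is one possible workaround:

```shell
# Duplicates in array1 survive the double subtraction.
with_dupes=$(jq -n -c '[1,2,2,3] as $a | [2,3,4] as $b | $a - ($a - $b)')
echo "$with_dupes"   # [2,2,3]

# De-duplicating array1 first yields a proper set intersection.
deduped=$(jq -n -c '([1,2,2,3] | unique) as $a | [2,3,4] as $b | $a - ($a - $b)')
echo "$deduped"      # [2,3]
```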

comm Demo

def comm:
  (.[0] - (.[0] - .[1])) as $d |
    [.[0]-$d, .[1]-$d, $d]
;

With that understanding, I was able to imitate the behavior of the *nix comm command, which its manual describes as follows:

With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

$ echo 'def comm: (.[0]-(.[0]-.[1])) as $d | [.[0]-$d,.[1]-$d, $d];' > comm.jq
$ echo '{"a":101, "b":102, "c":103, "d":104}'                        > 1.json
$ echo '{         "b":202,          "d":204, "f":206, "h":208}'      > 2.json

$ jq --slurp '.' 1.json 2.json
[
  {
    "a": 101,
    "b": 102,
    "c": 103,
    "d": 104
  },
  {
    "b": 202,
    "d": 204,
    "f": 206,
    "h": 208
  }
]

$ jq --slurp '[.[] | keys | sort]' 1.json 2.json
[
  [
    "a",
    "b",
    "c",
    "d"
  ],
  [
    "b",
    "d",
    "f",
    "h"
  ]
]

$ jq --slurp 'include "comm"; [.[] | keys | sort] | comm' 1.json 2.json
[
  [
    "a",
    "c"
  ],
  [
    "f",
    "h"
  ],
  [
    "b",
    "d"
  ]
]

$ jq --slurp 'include "comm"; [.[] | keys | sort] | comm[2]' 1.json 2.json
[
  "b",
  "d"
]
2 of 5
5

A simple and quite fast (but somewhat naive) filter that probably does essentially what you want can be defined as follows:

   # x and y are arrays
   def intersection(x;y):
     ( (x|unique) + (y|unique) | sort) as $sorted
     | reduce range(1; $sorted|length) as $i
         ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;

If x is provided as input on STDIN, and y is provided in some other way (e.g. def y: ...), then you could use this as: intersection(.;y)

Other ways to provide two distinct arrays as input include:

  • using the --slurp option
  • using --arg a v, which passes v as a string to be parsed with fromjson (or --argjson a v, which passes v as JSON, if available in your jq)

Here's a simpler but slower def that's nevertheless quite fast in practice:

    def i(x;y):
       if (y|length) == 0 then []
       else (x|unique) as $x
       | $x - ($x - y)
       end ;

Here's a standalone filter for finding the intersection of arbitrarily many arrays:

# Input: an array of arrays
def intersection:
  def i(y): ((unique + (y|unique)) | sort) as $sorted
  | reduce range(1; $sorted|length) as $i
       ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
  reduce .[1:][] as $a (.[0]; i($a)) ;

Examples:

[ [1,2,4], [2,4,5], [4,5,6]] #=> [4]
[[]]                         #=> []
[]                           #=> null

Of course, if x and y are already known to be sorted and/or unique, more efficient solutions are possible. See in particular "Finite Sets of JSON Entities".
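
As a usage sketch for the two-argument intersection(x;y) filter, one array can arrive on STDIN and the other via --argjson. The sample arrays are made up, and a jq with --argjson (1.5 or later) is assumed:

```shell
# x comes from STDIN (.), y from the command line ($y).
common=$(echo '[3,1,2,3]' | jq -c --argjson y '[2,3,5]' '
  def intersection(x; y):
    ((x | unique) + (y | unique) | sort) as $sorted
    | reduce range(1; $sorted | length) as $i
        ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end);
  intersection(.; $y)')
echo "$common"   # [2,3]
```

Because both inputs are de-duplicated before being concatenated and sorted, any adjacent equal pair must come from both arrays.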

Top answer
1 of 2
4
$ jq -r '[ .[].list1[] ] | join(" ")' file
val1 val2 val3 val4 val5 val6

Create a new array with all the elements of each list1 array from each top-level key. Then, join its elements with spaces. This would give you the values in the order they occur in the input file.

An alternative (and arguably neater) approach is with map(.list1) which returns an array of arrays that you may flatten and join up:

$ jq -r 'map(.list1) | flatten | join(" ")' file
val1 val2 val3 val4 val5 val6

Your attempt generates one joined string per top-level key because .list is each of the list1 arrays in turn. Your approach would work if you wrapped everything up to the last pipe symbol in [ ... ] (and expanded .list with .list[]) to generate a single array that you then join. This is what I do in my first approach above, only with a slightly shorter expression to generate the elements of that array.

$ jq -r '[ to_entries[] |  { list: .value.list1 } | .list[] ] | join(" ")' file
val1 val2 val3 val4 val5 val6
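
If some top-level keys lack a list1 array, `.list1[]` would fail when it tries to iterate over null. A possible variation, sketched here with hypothetical inline input, is to use the error-suppressing `[]?` form:

```shell
# k2 has no list1; the trailing ? makes the iteration over null yield nothing instead of failing.
joined=$(echo '{"k1":{"list1":["v1","v2"]},"k2":{"list2":["x"]},"k3":{"list1":["v3"]}}' |
  jq -r '[ .[].list1[]? ] | join(" ")')
echo "$joined"   # v1 v2 v3
```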
2 of 2
0

Using Raku (formerly known as Perl_6)

~$ raku -MJSON::Tiny -e 'my %hash = from-json($_) given lines;  
                         my @a = %hash.values.map({ $_.values if $_{"list1"} }); 
                         .say for @a.sort.join(" ");'  file

OR:

~$ raku -MJSON::Tiny -e 'my  %hash = from-json($_) given lines; 
                         for %hash.values.sort() { print .values.sort ~ " " if $_{"list1"} };
                         put "";'  file

Raku is a programming language in the Perl-family that provides high-level support for Unicode. Like Perl, Raku has associative arrays (hashes and/or maps) built-in. The above code is admittedly rather verbose (first example), but you should be able to get the flavor of the language from both examples above:

  • Raku's community-supported JSON::Tiny is called at the command line,
  • All lines are given as one data element to the from-json function, which decodes the input and stores it in %hash,
  • First Example: Using a map, the values of the hash are searched through for "list1" keys. If (if) found, these are stored in the @a array. Then the @a array is printed.
  • Second Example: the %hash is iterated through using for, searched through for "list1" keys, and if found the associated values are printed (each followed by a space). A final put call adds a newline.

Sample Input (includes bogus "list2" elements)

{
    "key1": {
        "list1": [
            "val1",
            "val2",
            "val3"
        ]
    },
    "key2": {
        "list1": [
            "val4",
            "val5"
        ]
    },
    "key3": {
        "list1": [
            "val6"
        ]
    },
    "key4": {
        "list2": [
            "val7"
        ]
    }
}

Sample Output:

val1 val2 val3 val4 val5 val6

Finally, in any programming solution it is often instructive to look at intermediate data-structures. So here's what the %hash looks like after decoding JSON input:

~$ raku -MJSON::Tiny -e 'my %hash = from-json($_) given lines;  .say for %hash.sort;'  file
key1 => {list1 => [val1 val2 val3]}
key2 => {list1 => [val4 val5]}
key3 => {list1 => [val6]}
key4 => {list2 => [val7]}

https://raku.land/cpan:MORITZ/JSON::Tiny
https://docs.raku.org/language/hashmap
https://raku.org