I used the following function (details can be found here):
def flatten_data(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
Unfortunately, this flattens the whole JSON completely: if you have multi-level JSON (many nested dictionaries), it flattens everything into a single row with a huge number of columns.
What I used in the end was json_normalize(), specifying exactly the structure I required. A nice example of how to do it that way can be found here.
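For reference, here is roughly what that behaviour looks like on a small made-up record (the definition is repeated so the snippet runs standalone):

```python
def flatten_data(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

# A small invented record: nested dict plus a list
data = {'a': 1, 'b': {'c': 2, 'd': [3, 4]}}
print(flatten_data(data))
# → {'a': 1, 'b_c': 2, 'b_d_0': 3, 'b_d_1': 4}
```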
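A rough sketch of the json_normalize() approach with an explicit structure (the payload and field names here are invented for illustration):

```python
import pandas as pd

# Invented payload: one top-level field plus a list of nested records
data = {'school': 'ABC',
        'students': [{'name': 'Tom', 'grade': 9},
                     {'name': 'Ann', 'grade': 10}]}

# record_path says which nested list becomes the rows;
# meta says which outer fields to carry along into each row
df = pd.json_normalize(data, record_path='students', meta='school')
print(df)
#   name  grade school
# 0  Tom      9    ABC
# 1  Ann     10    ABC
```

This only flattens the parts you ask for, instead of exploding every nested level into its own column.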
Cross-posting (but then adapting further) from https://stackoverflow.com/a/62186053/4355695 : In this repo: https://github.com/ScriptSmith/socialreaper/blob/master/socialreaper/tools.py#L8 , I found an implementation of the list-inclusion comment by @roneo on the answer posted by @Imran.
I've added checks to it for catching empty lists and empty dicts, and also added print lines that help one understand precisely how this function works. You can turn on those print statements by passing crumbs=True in the function's args.
from collections.abc import MutableMapping

def flatten(dictionary, parent_key=False, separator='.', crumbs=False):
    """
    Turn a nested dictionary into a flattened dictionary
    :param dictionary: The dictionary to flatten
    :param parent_key: The string to prepend to dictionary's keys
    :param separator: The string used to separate flattened keys
    :param crumbs: Whether to print a trace of what the function is doing
    :return: A flattened dictionary
    """
    items = []
    for key, value in dictionary.items():
        if crumbs: print('checking:', key)
        new_key = str(parent_key) + separator + key if parent_key else key
        if isinstance(value, MutableMapping):
            if crumbs: print(new_key, ': dict found')
            if not value.items():
                if crumbs: print('Adding key-value pair:', new_key, None)
                items.append((new_key, None))
            else:
                # pass crumbs down so the trace covers nested levels too
                items.extend(flatten(value, new_key, separator, crumbs).items())
        elif isinstance(value, list):
            if crumbs: print(new_key, ': list found')
            if len(value):
                for k, v in enumerate(value):
                    items.extend(flatten({str(k): v}, new_key, separator, crumbs).items())
            else:
                if crumbs: print('Adding key-value pair:', new_key, None)
                items.append((new_key, None))
        else:
            if crumbs: print('Adding key-value pair:', new_key, value)
            items.append((new_key, value))
    return dict(items)
Test it:
ans = flatten({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3], 'e': {'f': [], 'g': {}}}, crumbs=True)
print('\nflattened:', ans)
Output:
checking: a
Adding key-value pair: a 1
checking: c
c : dict found
checking: a
Adding key-value pair: c.a 2
checking: b
c.b : dict found
checking: x
Adding key-value pair: c.b.x 5
checking: y
Adding key-value pair: c.b.y 10
checking: d
d : list found
checking: 0
Adding key-value pair: d.0 1
checking: 1
Adding key-value pair: d.1 2
checking: 2
Adding key-value pair: d.2 3
checking: e
e : dict found
checking: f
e.f : list found
Adding key-value pair: e.f None
checking: g
e.g : dict found
Adding key-value pair: e.g None
flattened: {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'd.0': 1, 'd.1': 2, 'd.2': 3, 'e.f': None, 'e.g': None}
And that does the job I need done: I can throw any complicated JSON at this and it flattens it out for me. I added a check to the original code to handle empty lists too.
Credits to https://github.com/ScriptSmith, whose repo I found the initial flatten function in.
Testing OP's sample JSON, here's the output:
{'count': 13,
'virtualmachine.0.id': '1082e2ed-ff66-40b1-a41b-26061afd4a0b',
'virtualmachine.0.name': 'test-2',
'virtualmachine.0.displayname': 'test-2',
'virtualmachine.0.securitygroup.0.id': '9e649fbc-3e64-4395-9629-5e1215b34e58',
'virtualmachine.0.securitygroup.0.name': 'test',
'virtualmachine.0.securitygroup.0.tags': None,
'virtualmachine.0.nic.0.id': '79568b14-b377-4d4f-b024-87dc22492b8e',
'virtualmachine.0.nic.0.networkid': '05c0e278-7ab4-4a6d-aa9c-3158620b6471',
'virtualmachine.0.nic.1.id': '3d7f2818-1f19-46e7-aa98-956526c5b1ad',
'virtualmachine.0.nic.1.networkid': 'b4648cfd-0795-43fc-9e50-6ee9ddefc5bd',
'virtualmachine.0.nic.1.traffictype': 'Guest',
'virtualmachine.0.hypervisor': 'KVM',
'virtualmachine.0.affinitygroup': None,
'virtualmachine.0.isdynamicallyscalable': False}
So you'll see that the 'tags' and 'affinitygroup' keys are also handled and added to the output; the original code omitted them.
2021-05-30 : Updated: collections.MutableMapping is changed to collections.abc.MutableMapping
2023-01-11 : edited, added separator arg in second items.extend() call as advised by @MHebes
2024-02-20 : how did that .abc go missing from the import statement?
2025-07-21 : moved crumbs param into function's args
pip install flatten-json
I am working with extremely nested JSON data and need to flatten out the structure. I have been using pandas json_normalize, but only on a fraction of the data so far, and I need to start flattening all of it. With only a few GB of data, json_normalize is taking me around 3 hours to complete. I need it to run much faster in order to finish my analysis on all of the data. How do I make this more efficient? Is there a better route to go with this function? My team is thinking about moving our work to pyspark, but I am hesitant, as the rest of the ETL processing doesn't take long at all; it is really this part of the process that takes forever. I have also seen people online recommend pandas json_normalize for this procedure rather than pyspark. I would appreciate any insight, thanks!
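One knob that may be worth trying before reaching for pyspark: pd.json_normalize accepts a max_level argument that stops flattening at a given depth, so you only pay for the levels you actually need. A minimal sketch with made-up data:

```python
import pandas as pd

# Invented records with two levels of nesting
records = [{'id': 1, 'meta': {'a': {'x': 1}}},
           {'id': 2, 'meta': {'a': {'x': 2}}}]

# max_level=1 flattens only one level down; deeper dicts
# are kept as raw values instead of being expanded further
df = pd.json_normalize(records, max_level=1)
print(df.columns.tolist())
# the 'meta.a' column still holds dicts like {'x': 1}
```

Whether this actually helps depends on how deep you genuinely need the flattening to go; if you need every leaf, profiling where the time goes (parsing vs. normalizing) would be the next step.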