You’re trying to flatten 2 different “depths” in the JSON file, which can’t be done in a single json_normalize call. You can simply use two pd.json_normalize calls, since all entries contain ids you can use to match up the parsed data later:

>>> pd.json_normalize(d, record_path='view')
       id  user_id parent_id            created_at            updated_at rating_count rating_sum        message                                            replies
0  109205     6354      None  2020-11-03T23:32:49Z  2020-11-03T23:32:49Z         None       None  message text1  [{'id': 109298, 'user_id': 5457, 'parent_id': ...
>>> pd.json_normalize(d, record_path=['view', 'replies'])
       id  user_id  parent_id            created_at            updated_at rating_count rating_sum        message
0  109298     5457     109205  2020-11-04T19:42:59Z  2020-11-04T19:42:59Z         None       None  message text2
1  109299     5457     109205  2020-11-04T19:42:59Z  2020-11-04T19:42:59Z         None       None  message text3

(I’ve added a second reply to your example, with the same data and the id incremented by 1, so we can see what happens when there are several replies per view.)
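The outputs above correspond to an input shaped roughly like this — a hypothetical reconstruction, since the original `d` from the question isn’t reproduced here:

```python
import pandas as pd

# Reconstructed input (values taken from the printed frames above).
d = {
    "view": [
        {
            "id": 109205, "user_id": 6354, "parent_id": None,
            "created_at": "2020-11-03T23:32:49Z",
            "updated_at": "2020-11-03T23:32:49Z",
            "rating_count": None, "rating_sum": None,
            "message": "message text1",
            "replies": [
                {
                    "id": 109298, "user_id": 5457, "parent_id": 109205,
                    "created_at": "2020-11-04T19:42:59Z",
                    "updated_at": "2020-11-04T19:42:59Z",
                    "rating_count": None, "rating_sum": None,
                    "message": "message text2",
                },
                {
                    "id": 109299, "user_id": 5457, "parent_id": 109205,
                    "created_at": "2020-11-04T19:42:59Z",
                    "updated_at": "2020-11-04T19:42:59Z",
                    "rating_count": None, "rating_sum": None,
                    "message": "message text3",
                },
            ],
        }
    ]
}

# One row per view; the replies column still holds raw lists of dicts.
views = pd.json_normalize(d, record_path="view")
# One row per reply, across all views.
replies = pd.json_normalize(d, record_path=["view", "replies"])
```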

Alternatively, you can run the second pd.json_normalize on the replies column of the previous result, which is probably less work. This works best if you .explode() the column first to get one row per reply:

>>> pd.json_normalize(view['replies'].explode())
       id  user_id  parent_id            created_at            updated_at rating_count rating_sum        message
0  109298     5457     109205  2020-11-04T19:42:59Z  2020-11-04T19:42:59Z         None       None  message text2
1  109299     5457     109205  2020-11-04T19:42:59Z  2020-11-04T19:42:59Z         None       None  message text3
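As a minimal sketch on synthetic data (column names invented for illustration): .explode() turns the list column into one row per element, and json_normalize then flattens each dict into columns.

```python
import pandas as pd

# Synthetic example: one view row whose "replies" cell holds a list of dicts.
view = pd.DataFrame({
    "id": [1],
    "replies": [[{"id": 10, "msg": "a"}, {"id": 11, "msg": "b"}]],
})

# explode(): one row per reply; json_normalize(): one column per dict key.
flat = pd.json_normalize(view["replies"].explode())
```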

So here’s a way to construct a single dataframe with all the info:

>>> view = pd.json_normalize(d, record_path='view')
>>> df = pd.merge(
...     view.drop(columns=['replies']),
...     pd.json_normalize(view['replies'].explode()),
...     left_on='id', right_on='parent_id', how='right',
...     suffixes=('_view', '_reply')
... )
>>> df
   id_view  user_id_view parent_id_view       created_at_view       updated_at_view rating_count_view rating_sum_view   message_view  id_reply  user_id_reply  parent_id_reply      created_at_reply      updated_at_reply rating_count_reply rating_sum_reply  message_reply
0   109205          6354           None  2020-11-03T23:32:49Z  2020-11-03T23:32:49Z              None            None  message text1    109298           5457           109205  2020-11-04T19:42:59Z  2020-11-04T19:42:59Z               None             None  message text2
1   109205          6354           None  2020-11-03T23:32:49Z  2020-11-03T23:32:49Z              None            None  message text1    109299           5457           109205  2020-11-04T19:42:59Z  2020-11-04T19:42:59Z               None             None  message text3
>>> df[['user_id_view', 'message_view', 'user_id_reply', 'message_reply']]
   user_id_view   message_view  user_id_reply  message_reply
0          6354  message text1           5457  message text2
1          6354  message text1           5457  message text3
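One note on the how='right' choice: it drops views that have no replies at all. A sketch on synthetic data (column names invented, not from the original question) of how='left', which instead keeps reply-less views with NaN in the reply columns:

```python
import pandas as pd

views = pd.DataFrame({"id": [1, 2], "message": ["v1", "v2"]})
replies = pd.DataFrame({"id": [10], "parent_id": [1], "message": ["r1"]})

# how="left" keeps view 2 even though no reply points at it.
left = pd.merge(views, replies, left_on="id", right_on="parent_id",
                how="left", suffixes=("_view", "_reply"))
```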
Answer from Cimbali on Stack Overflow

Top answer (1 of 2, 28 votes):

You could just pass data without any extra params.

df = pd.json_normalize(data)  # pd.io.json.json_normalize in pandas < 1.0
df

   complete    mid.c    mid.h    mid.l    mid.o                  time  volume
0      True  119.743  119.891  119.249  119.341  1488319200.000000000   14651
1      True  119.893  119.954  119.552  119.738  1488348000.000000000   10738
2      True  119.946  120.221  119.840  119.888  1488376800.000000000   10041

If you want to change the column order, use df.reindex:

df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df

                   time  volume  complete    mid.h    mid.l    mid.c    mid.o
0  1488319200.000000000   14651      True  119.891  119.249  119.743  119.341
1  1488348000.000000000   10738      True  119.954  119.552  119.893  119.738
2  1488376800.000000000   10041      True  120.221  119.840  119.946  119.888
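The input isn’t shown in the answer; here is a self-contained sketch with data reconstructed from the printed output above (two of the three rows, values assumed):

```python
import pandas as pd

# Reconstructed candle-style records: each row nests its prices in a "mid" dict.
data = [
    {"complete": True, "time": "1488319200.000000000", "volume": 14651,
     "mid": {"c": 119.743, "h": 119.891, "l": 119.249, "o": 119.341}},
    {"complete": True, "time": "1488348000.000000000", "volume": 10738,
     "mid": {"c": 119.893, "h": 119.954, "l": 119.552, "o": 119.738}},
]

df = pd.json_normalize(data)  # flattens "mid" into mid.c, mid.h, mid.l, mid.o
df = df.reindex(columns=["time", "volume", "complete",
                         "mid.h", "mid.l", "mid.c", "mid.o"])
```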
Second answer (2 of 2, 9 votes):

The data in the OP (after being deserialized from a JSON string, preferably with json.load()) is a list of nested dictionaries, which is an ideal data structure for pd.json_normalize() because it converts a list of dictionaries by flattening each dictionary into a single row. So the length of the list determines the number of rows, and the total number of key-value paths in the dictionaries determines the number of columns.
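A two-line illustration of that rule on synthetic records: two dictionaries give two rows, and the union of their keys gives the columns.

```python
import pandas as pd

records = [{"a": 1, "b": 2}, {"a": 3, "c": 4}]
df = pd.json_normalize(records)
# Two rows (one per dict), three columns (keys a, b, c); missing keys become NaN.
```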

However, if a value under some key is a list, that is no longer true, because presumably the items in those lists need to be in their own separate rows. For example, if the my_data.json file looks like:

# my_data.json
[
    {"price": {"mid": ["119.743", "119.891", "119.341"], "time": "123"}},
    {"price": {"mid": ["119.893", "119.954", "119.552"], "time": "456"}},
    {"price": {"mid": ["119.946", "120.221", "119.840"], "time": "789"}}
]

then you'll want to put each value in those lists in its own row. In that case, you can pass the path to these lists as the record_path= argument. You can also attach accompanying metadata to each record by passing its path as the meta= argument.

# deserialize json into a python data structure
import json
with open('my_data.json', 'r') as f:
    data = json.load(f)

# normalize the python data structure
df = pd.json_normalize(data, record_path=['price', 'mid'], meta=[['price', 'time']], record_prefix='mid.')
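The same call as a self-contained snippet (data inlined instead of read from disk), so the effect of record_path= and meta= is easy to check:

```python
import pandas as pd

data = [
    {"price": {"mid": ["119.743", "119.891", "119.341"], "time": "123"}},
    {"price": {"mid": ["119.893", "119.954", "119.552"], "time": "456"}},
]

# Each item of each "mid" list becomes its own row, and the "price.time"
# metadata is repeated alongside the rows it belongs to.
df = pd.json_normalize(data, record_path=["price", "mid"],
                       meta=[["price", "time"]], record_prefix="mid.")
```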

Ultimately, pd.json_normalize() cannot handle anything much more complex than this kind of structure. For example, it cannot add another piece of metadata to the above example if it's nested inside yet another dictionary. Depending on the data, you'll most probably need a recursive function to parse it (FYI, pd.json_normalize() is recursive as well, but it's written for the general case and won't work for a lot of specific objects).

Oftentimes, you'll need a combination of explode(), pd.DataFrame(col.tolist()), etc. to completely parse the data.
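As a sketch of what such a recursive function might look like (a hypothetical helper, not from the original answer): flatten nested dicts into dotted keys and leave everything else, including lists, to downstream explode()/tolist() handling.

```python
import pandas as pd

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dotted keys; leave lists as-is."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out

rows = [{"a": {"b": 1}, "c": [2, 3]}]
df = pd.DataFrame([flatten(r) for r in rows])
```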

Pandas also has a convenience function, pd.read_json(), but it's even more limited than pd.json_normalize() in that it can only correctly parse a JSON array of one nesting level. Unlike pd.json_normalize(), however, it deserializes a JSON string under the hood, so you can pass the path to a JSON file directly (no need for json.load()). In other words, the following two produce the same output:

df1 = pd.read_json("my_data.json") 
df2 = pd.json_normalize(data, max_level=0)  # here, `data` is deserialized `my_data.json`
df1.equals(df2)  # True
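A quick check of that equivalence on an inlined payload (synthetic data; note that recent pandas versions want a file-like object rather than a raw string for read_json):

```python
import io
import json
import pandas as pd

payload = '[{"a": {"x": 1}}, {"a": {"x": 2}}]'

df1 = pd.read_json(io.StringIO(payload))                  # nested dicts stay as objects
df2 = pd.json_normalize(json.loads(payload), max_level=0)  # max_level=0: no flattening
```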
Top answer (1 of 3, 62 votes):

In the pandas example (below) what do the brackets mean? Is there a logic to be followed to go deeper with the []. [...]

result = json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])

Each string or list of strings in the ['state', 'shortname', ['info', 'governor']] value is a path to an element to include, in addition to the selected rows. The second json_normalize() argument (record_path, set to 'counties' in the documentation example) tells the function how to select the elements from the input data structure that make up the rows in the output, and the meta paths add further metadata that will be included with each of those rows. Think of these as table joins in a database, if you will.

The input for the US States documentation example has two dictionaries in a list, and both of these dictionaries have a counties key that references another list of dicts:

>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...         'info': {'governor': 'Rick Scott'},
...         'counties': [{'name': 'Dade', 'population': 12345},
...                      {'name': 'Broward', 'population': 40000},
...                      {'name': 'Palm Beach', 'population': 60000}]},
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {'governor': 'John Kasich'},
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> pprint(data[0]['counties'])
[{'name': 'Dade', 'population': 12345},
 {'name': 'Broward', 'population': 40000},
 {'name': 'Palm Beach', 'population': 60000}]
>>> pprint(data[1]['counties'])
[{'name': 'Summit', 'population': 1234},
 {'name': 'Cuyahoga', 'population': 1337}]

Between them there are 5 rows of data to use in the output:

>>> json_normalize(data, 'counties')
         name  population
0        Dade       12345
1     Broward       40000
2  Palm Beach       60000
3      Summit        1234
4    Cuyahoga        1337

The meta argument then names some elements that live next to those counties lists, and those are then merged in separately. The values from the first data[0] dictionary for those meta elements are ('Florida', 'FL', 'Rick Scott'), respectively, and for data[1] the values are ('Ohio', 'OH', 'John Kasich'), so you see those values attached to the counties rows that came from the same top-level dictionary, repeated 3 and 2 times respectively:

>>> data[0]['state'], data[0]['shortname'], data[0]['info']['governor']
('Florida', 'FL', 'Rick Scott')
>>> data[1]['state'], data[1]['shortname'], data[1]['info']['governor']
('Ohio', 'OH', 'John Kasich')
>>> json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich

So, if you pass in a list for the meta argument, then each element in the list is a separate path, and each of those separate paths identifies data to add to the rows in the output.
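The documentation example above, condensed into a runnable script (using the modern pd.json_normalize import):

```python
import pandas as pd

data = [
    {"state": "Florida", "shortname": "FL",
     "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade", "population": 12345},
                  {"name": "Broward", "population": 40000},
                  {"name": "Palm Beach", "population": 60000}]},
    {"state": "Ohio", "shortname": "OH",
     "info": {"governor": "John Kasich"},
     "counties": [{"name": "Summit", "population": 1234},
                  {"name": "Cuyahoga", "population": 1337}]},
]

# record_path picks the rows; the meta paths join in per-state columns.
result = pd.json_normalize(data, "counties",
                           ["state", "shortname", ["info", "governor"]])
```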

In your example JSON, there are only a few nested lists to elevate with the first argument, as 'counties' did in the example. The only candidate in that data structure is the nested 'authors' key; you'd have to extract each ['_source', 'authors'] path, after which you can add other keys from the parent object to augment those rows.

The meta argument then pulls in the _id key from the outermost objects, followed by the nested ['_source', 'title'] and ['_source', 'journal'] paths.

The record_path argument takes the authors lists as the starting point; these look like:

>>> d['hits']['hits'][0]['_source']['authors']   # this value is None, and is skipped
>>> d['hits']['hits'][1]['_source']['authors']
[{'affiliations': ['Punjabi University'],
  'author_id': '780E3459',
  'author_name': 'munish puri'},
 {'affiliations': ['Punjabi University'],
  'author_id': '48D92C79',
  'author_name': 'rajesh dhaliwal'},
 {'affiliations': ['Punjabi University'],
  'author_id': '7D9BD37C',
  'author_name': 'r s singh'}]
>>> d['hits']['hits'][2]['_source']['authors']
[{'author_id': '7FF872BC',
  'author_name': 'barbara eileen ryan'}]
>>> # etc.

and so gives you the following rows:

>>> json_normalize(d['hits']['hits'], ['_source', 'authors'])
           affiliations author_id          author_name
0  [Punjabi University]  780E3459          munish puri
1  [Punjabi University]  48D92C79      rajesh dhaliwal
2  [Punjabi University]  7D9BD37C            r s singh
3                   NaN  7FF872BC  barbara eileen ryan
4                   NaN  0299B8E9     fraser j harbutt
5                   NaN  7DAB7B72   richard m freeland

and then we can use the third argument, meta, to add more columns like _id, _source.title and _source.journal, using ['_id', ['_source', 'journal'], ['_source', 'title']]:

>>> json_normalize(
...     d['hits']['hits'],
...     ['_source', 'authors'],
...     ['_id', ['_source', 'journal'], ['_source', 'title']]
... )
           affiliations author_id          author_name       _id   \
0  [Punjabi University]  780E3459          munish puri  7AF8EBC3  
1  [Punjabi University]  48D92C79      rajesh dhaliwal  7AF8EBC3
2  [Punjabi University]  7D9BD37C            r s singh  7AF8EBC3
3                   NaN  7FF872BC  barbara eileen ryan  7521A721
4                   NaN  0299B8E9     fraser j harbutt  7DAEB9A4
5                   NaN  7DAB7B72   richard m freeland  7B3236C5

                                     _source.journal
0  Journal of Industrial Microbiology & Biotechno...
1  Journal of Industrial Microbiology & Biotechno...
2  Journal of Industrial Microbiology & Biotechno...
3                     The American Historical Review
4                     The American Historical Review
5                     The American Historical Review

                                       _source.title  \
0  Development of a stable continuous flow immobi...
1  Development of a stable continuous flow immobi...
2  Development of a stable continuous flow immobi...
3  Feminism and the women's movement : dynamics o...
4  The iron curtain : Churchill, America, and the...
5  The Truman Doctrine and the origins of McCarth...
Second answer (2 of 3, 25 votes):

You can also have a look at the library flatten_json, which does not require you to write column hierarchies as in json_normalize:

import pandas as pd
from flatten_json import flatten

data = d['hits']['hits']
dict_flattened = (flatten(record, '.') for record in data)
df = pd.DataFrame(dict_flattened)
print(df)

See https://github.com/amirziai/flatten.
