You’re trying to flatten two different “depths” in the JSON file, which can’t be done in a single json_normalize call. You could simply use two pd.json_normalize calls, since all entries contain ids that let you match up the parsed data later:
>>> pd.json_normalize(d, record_path='view')
id user_id parent_id created_at updated_at rating_count rating_sum message replies
0 109205 6354 None 2020-11-03T23:32:49Z 2020-11-03T23:32:49Z None None message text1 [{'id': 109298, 'user_id': 5457, 'parent_id': ...
>>> pd.json_normalize(d, record_path=['view', 'replies'])
id user_id parent_id created_at updated_at rating_count rating_sum message
0 109298 5457 109205 2020-11-04T19:42:59Z 2020-11-04T19:42:59Z None None message text2
1 109299 5457 109205 2020-11-04T19:42:59Z 2020-11-04T19:42:59Z None None message text3
(I’ve added a second reply to your example, with the same data and the id incremented by 1, so we can see what happens with several replies per view.)
Alternatively, you can run your second pd.json_normalize on the replies column of your previous result, which is probably less work. This works best if you .explode() the column first to get one row per reply:
>>> pd.json_normalize(view['replies'].explode())
id user_id parent_id created_at updated_at rating_count rating_sum message
0 109298 5457 109205 2020-11-04T19:42:59Z 2020-11-04T19:42:59Z None None message text2
1 109299 5457 109205 2020-11-04T19:42:59Z 2020-11-04T19:42:59Z None None message text3
So here’s a way to construct a single dataframe with all the info:
>>> view = pd.json_normalize(d, record_path='view')
>>> df = pd.merge(
... view.drop(columns=['replies']),
... pd.json_normalize(view['replies'].explode()),
... left_on='id', right_on='parent_id', how='right',
... suffixes=('_view', '_reply')
... )
>>> df
id_view user_id_view parent_id_view created_at_view updated_at_view rating_count_view rating_sum_view message_view id_reply user_id_reply parent_id_reply created_at_reply updated_at_reply rating_count_reply rating_sum_reply message_reply
0 109205 6354 None 2020-11-03T23:32:49Z 2020-11-03T23:32:49Z None None message text1 109298 5457 109205 2020-11-04T19:42:59Z 2020-11-04T19:42:59Z None None message text2
1 109205 6354 None 2020-11-03T23:32:49Z 2020-11-03T23:32:49Z None None message text1 109299 5457 109205 2020-11-04T19:42:59Z 2020-11-04T19:42:59Z None None message text3
>>> df[['user_id_view', 'message_view', 'user_id_reply', 'message_reply']]
user_id_view message_view user_id_reply message_reply
0 6354 message text1 5457 message text2
1 6354 message text1 5457 message text3
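The whole pipeline above can be run end to end on a minimal inline version of the data. The ids and messages below are illustrative stand-ins for the question's actual structure; fields like created_at are omitted for brevity:

```python
import pandas as pd

# Minimal made-up data matching the question's structure
d = {"view": [{
    "id": 109205, "user_id": 6354, "parent_id": None, "message": "message text1",
    "replies": [
        {"id": 109298, "user_id": 5457, "parent_id": 109205, "message": "message text2"},
        {"id": 109299, "user_id": 5457, "parent_id": 109205, "message": "message text3"},
    ],
}]}

view = pd.json_normalize(d, record_path="view")
df = pd.merge(
    view.drop(columns=["replies"]),          # one row per view, minus the raw replies
    pd.json_normalize(view["replies"].explode()),  # one row per reply
    left_on="id", right_on="parent_id", how="right",
    suffixes=("_view", "_reply"),
)
```

The how='right' keeps every reply row and repeats its parent view's columns alongside it, which is exactly the join described above.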
Answer from Cimbali on Stack Overflow
You could just pass the data without any extra params (note that pd.io.json.json_normalize has been deprecated in favour of the top-level pd.json_normalize):
df = pd.json_normalize(data)
df
complete mid.c mid.h mid.l mid.o time volume
0 True 119.743 119.891 119.249 119.341 1488319200.000000000 14651
1 True 119.893 119.954 119.552 119.738 1488348000.000000000 10738
2 True 119.946 120.221 119.840 119.888 1488376800.000000000 10041
If you want to change the column order, use df.reindex:
df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df
time volume complete mid.h mid.l mid.c mid.o
0 1488319200.000000000 14651 True 119.891 119.249 119.743 119.341
1 1488348000.000000000 10738 True 119.954 119.552 119.893 119.738
2 1488376800.000000000 10041 True 120.221 119.840 119.946 119.888
The data in the OP (after being deserialized from a JSON string, preferably using json.load()) is a list of nested dictionaries, which is an ideal structure for pd.json_normalize(): it takes a list of dictionaries and flattens each dictionary into a single row. So the length of the list determines the number of rows, and the total number of key-value paths across the dictionaries determines the number of columns.
However, if a value under some key is a list, that no longer holds, because presumably the items in those lists need to be in separate rows. For example, if the my_data.json file looks like:
# my_data.json
[
{"price": {"mid": ["119.743", "119.891", "119.341"], "time": "123"}},
{"price": {"mid": ["119.893", "119.954", "119.552"], "time": "456"}},
{"price": {"mid": ["119.946", "120.221", "119.840"], "time": "789"}}
]
then you'll want to put each value in the list in its own row. In that case, you can pass the path to these lists as the record_path= argument. You can also make each record carry its accompanying metadata, whose path you pass as the meta= argument.
# deserialize json into a python data structure
import json
import pandas as pd
with open('my_data.json', 'r') as f:
data = json.load(f)
# normalize the python data structure
df = pd.json_normalize(data, record_path=['price', 'mid'], meta=[['price', 'time']], record_prefix='mid.')
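The same call can be run with the sample data inlined (equivalent to loading my_data.json with json.load()), to see what the record_path/meta combination produces:

```python
import pandas as pd

# Inline equivalent of the my_data.json contents above
data = [
    {"price": {"mid": ["119.743", "119.891", "119.341"], "time": "123"}},
    {"price": {"mid": ["119.893", "119.954", "119.552"], "time": "456"}},
    {"price": {"mid": ["119.946", "120.221", "119.840"], "time": "789"}},
]

# Each of the 9 mid values gets its own row; the record's price.time
# metadata is repeated alongside each of them.
df = pd.json_normalize(data, record_path=['price', 'mid'],
                       meta=[['price', 'time']], record_prefix='mid.')
```

The result has 9 rows (three mid values per input record) and the meta column appears as price.time.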

Ultimately, pd.json_normalize() cannot handle anything more complex than this kind of structure. For example, it cannot add another metadata to the above example if it's nested inside another dictionary. Depending on the data, you'll most probably need a recursive function to parse it (FYI, pd.json_normalize() is a recursive function as well but it's for a general case and won't work for a lot of specific objects).
Oftentimes, you'll need a combination of explode(), pd.DataFrame(col.tolist()), etc. to completely parse the data.
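As a minimal sketch of that explode() + pd.DataFrame(col.tolist()) pattern (the column and field names here are made up): a column holding lists of dicts is first exploded to one row per item, then the dicts are expanded into columns.

```python
import pandas as pd

# Made-up data: "tags" holds lists of dicts of varying length
df = pd.DataFrame({
    "id": [1, 2],
    "tags": [[{"k": "a"}, {"k": "b"}], [{"k": "c"}]],
})

exploded = df.explode("tags").reset_index(drop=True)  # one row per list item
expanded = pd.DataFrame(exploded["tags"].tolist())    # each dict becomes columns
result = pd.concat([exploded.drop(columns="tags"), expanded], axis=1)
```

The parent columns (here id) are repeated for every item that came from the same row, just like the meta columns in json_normalize.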
Pandas also has a convenience function pd.read_json(), but it's even more limited than pd.json_normalize() in that it can only correctly parse a JSON array of one nesting level. Unlike pd.json_normalize(), however, it deserializes a JSON string under the hood, so you can pass the path to a JSON file directly (no need for json.load()). In other words, the following two produce the same output:
df1 = pd.read_json("my_data.json")
df2 = pd.json_normalize(data, max_level=0) # here, `data` is deserialized `my_data.json`
df1.equals(df2) # True
I’ve been using the json_normalize function in pandas to read through a folder of json files and build a dataframe for the entire folder. It’s working today, but it takes much longer than I was hoping it would. Are there any optimizations I can look into, or even an alternative to pandas that could normalize json faster?
For context, the folder holds about 800 MB of json files (each file ~2 MB), and it takes roughly 14 minutes to parse through them all and build the dataframe.
Is it also possible the slow piece is concatenating the dataframes together? How would I go about optimizing that?
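Quite possibly, yes: growing a dataframe with pd.concat inside the loop is quadratic in the number of files, while a single final concat over a list of frames is linear. A hedged sketch of that pattern (the folder and file contents here are hypothetical stand-ins, built in a temp directory so the example runs):

```python
import json
import pathlib
import tempfile
import pandas as pd

# Hypothetical setup: a folder of JSON files, each holding a list of records
folder = pathlib.Path(tempfile.mkdtemp())
for i in range(3):
    (folder / f"part{i}.json").write_text(json.dumps([{"id": i, "v": i * 10}]))

# Normalize each file into its own frame, then concatenate ONCE at the end,
# instead of calling pd.concat inside the loop.
frames = [pd.json_normalize(json.loads(p.read_text()))
          for p in sorted(folder.glob("*.json"))]
df = pd.concat(frames, ignore_index=True)
```

If profiling shows json_normalize itself dominates, a faster JSON parser or flattening the dicts yourself before building the frame are the usual next steps.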
In the pandas example (below) what do the brackets mean? Is there a logic to be followed to go deeper with the []? [...]
result = json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
Each string or list of strings in the ['state', 'shortname', ['info', 'governor']] value is a path to an element to include, in addition to the selected rows. The second json_normalize() argument (record_path, set to 'counties' in the documentation example) tells the function how to select the elements from the input data structure that make up the rows in the output, and the meta paths add further metadata that is included with each of those rows. Think of these as table joins in a database, if you will.
The input for the US States documentation example has two dictionaries in a list, and both of these dictionaries have a counties key that references another list of dicts:
>>> data = [{'state': 'Florida',
... 'shortname': 'FL',
... 'info': {'governor': 'Rick Scott'},
... 'counties': [{'name': 'Dade', 'population': 12345},
... {'name': 'Broward', 'population': 40000},
... {'name': 'Palm Beach', 'population': 60000}]},
... {'state': 'Ohio',
... 'shortname': 'OH',
... 'info': {'governor': 'John Kasich'},
... 'counties': [{'name': 'Summit', 'population': 1234},
... {'name': 'Cuyahoga', 'population': 1337}]}]
>>> pprint(data[0]['counties'])
[{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}]
>>> pprint(data[1]['counties'])
[{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]
Between them there are 5 rows of data to use in the output:
>>> json_normalize(data, 'counties')
name population
0 Dade 12345
1 Broward 40000
2 Palm Beach 60000
3 Summit 1234
4 Cuyahoga 1337
The meta argument then names some elements that live next to those counties lists, and those are then merged in separately. The values from the first data[0] dictionary for those meta elements are ('Florida', 'FL', 'Rick Scott'), respectively, and for data[1] the values are ('Ohio', 'OH', 'John Kasich'), so you see those values attached to the counties rows that came from the same top-level dictionary, repeated 3 and 2 times respectively:
>>> data[0]['state'], data[0]['shortname'], data[0]['info']['governor']
('Florida', 'FL', 'Rick Scott')
>>> data[1]['state'], data[1]['shortname'], data[1]['info']['governor']
('Ohio', 'OH', 'John Kasich')
>>> json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
So, if you pass in a list for the meta argument, then each element in the list is a separate path, and each of those separate paths identifies data to add to the rows in the output.
In your example JSON, there are only a few nested lists to elevate with the first argument, like 'counties' did in the example. The only candidate in that data structure is the nested 'authors' key; you'd have to extract each ['_source', 'authors'] path, after which you can add other keys from the parent object to augment those rows.
The meta argument then pulls in the _id key from the outermost objects, followed by the nested ['_source', 'title'] and ['_source', 'journal'] paths.
The record_path argument takes the authors lists as the starting point, these look like:
>>> d['hits']['hits'][0]['_source']['authors'] # this value is None, and is skipped
>>> d['hits']['hits'][1]['_source']['authors']
[{'affiliations': ['Punjabi University'],
'author_id': '780E3459',
'author_name': 'munish puri'},
{'affiliations': ['Punjabi University'],
'author_id': '48D92C79',
'author_name': 'rajesh dhaliwal'},
{'affiliations': ['Punjabi University'],
'author_id': '7D9BD37C',
'author_name': 'r s singh'}]
>>> d['hits']['hits'][2]['_source']['authors']
[{'author_id': '7FF872BC',
'author_name': 'barbara eileen ryan'}]
>>> # etc.
and so gives you the following rows:
>>> json_normalize(d['hits']['hits'], ['_source', 'authors'])
affiliations author_id author_name
0 [Punjabi University] 780E3459 munish puri
1 [Punjabi University] 48D92C79 rajesh dhaliwal
2 [Punjabi University] 7D9BD37C r s singh
3 NaN 7FF872BC barbara eileen ryan
4 NaN 0299B8E9 fraser j harbutt
5 NaN 7DAB7B72 richard m freeland
and then we can use the third argument, meta, to add more columns like _id, _source.title and _source.journal, using ['_id', ['_source', 'journal'], ['_source', 'title']]:
>>> json_normalize(
... d['hits']['hits'],
... ['_source', 'authors'],
... ['_id', ['_source', 'journal'], ['_source', 'title']]
... )
affiliations author_id author_name _id \
0 [Punjabi University] 780E3459 munish puri 7AF8EBC3
1 [Punjabi University] 48D92C79 rajesh dhaliwal 7AF8EBC3
2 [Punjabi University] 7D9BD37C r s singh 7AF8EBC3
3 NaN 7FF872BC barbara eileen ryan 7521A721
4 NaN 0299B8E9 fraser j harbutt 7DAEB9A4
5 NaN 7DAB7B72 richard m freeland 7B3236C5
_source.journal
0 Journal of Industrial Microbiology & Biotechno...
1 Journal of Industrial Microbiology & Biotechno...
2 Journal of Industrial Microbiology & Biotechno...
3 The American Historical Review
4 The American Historical Review
5 The American Historical Review
_source.title \
0 Development of a stable continuous flow immobi...
1 Development of a stable continuous flow immobi...
2 Development of a stable continuous flow immobi...
3 Feminism and the women's movement : dynamics o...
4 The iron curtain : Churchill, America, and the...
5 The Truman Doctrine and the origins of McCarth...
You can also have a look at the library flatten_json, which does not require you to write column hierarchies as in json_normalize:
import pandas as pd
from flatten_json import flatten
data = d['hits']['hits']
dict_flattened = (flatten(record, '.') for record in data)
df = pd.DataFrame(dict_flattened)
print(df)
See https://github.com/amirziai/flatten.
Hi all, I'm trying to flatten a JSON to a dataframe using json_normalize, but one column has mixed-type data: the first few rows are JSON objects and later rows are arrays, so json_normalize is not working as expected for that column. Any help would be appreciated.