Pandas
pandas.pydata.org โ€บ docs โ€บ reference โ€บ api โ€บ pandas.json_normalize.html
pandas.json_normalize โ€” pandas 3.0.2 documentation
Normalizes nested data up to level 1. >>> data = [ ... { ... "id": 1, ... "name": "Cole Volk", ... "fitness": {"height": 130, "weight": 60}, ... }, ... {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}}, ... { ... "id": 2, ... "name": "Faye Raker", ... "fitness": {"height": 130, ...
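The truncated documentation snippet above can be run end to end; this sketch inlines the example records (the elided third record completed following the official pandas docs) and shows the level-1 flattening:

```python
import pandas as pd

# Runnable version of the documentation example above
data = [
    {"id": 1, "name": "Cole Volk", "fitness": {"height": 130, "weight": 60}},
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {"id": 2, "name": "Faye Raker", "fitness": {"height": 130, "weight": 60}},
]

# max_level=1 flattens nested dicts one level deep; each nested key
# becomes a dotted column such as fitness.height
df = pd.json_normalize(data, max_level=1)
# columns: id, name, fitness.height, fitness.weight; one row per record
```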
Discussions

Pandas.DataFrame.json_normalize optimization
Is it also possible the slow piece is concatenating the dataframes together? How would I go about optimizing that? By showing us your code. I have a suspicion.
๐ŸŒ r/learnpython
10
1
December 9, 2022
Is there a difference between flattening or normalizing a JSON response and if so what is the value or use case?
Flattening the JSON is usually the first step: you unfold the nested data and spread it out into a table. This usually results in several repeated field values, depending on the depth of nesting. That's why normalizing is the next step, where you break the table down into smaller, related ones. You could also go straight to normalizing the JSON if it's straightforward. We had one use case for this. We have a CSV data source whose schema keeps changing and breaking our pipelines, so we decided to ingest the data as JSON to accommodate added/deleted/renamed columns. It then undergoes a series of flattening and normalization steps to bring out the data needed. All the raw data gets persisted, and our pipelines don't break when there's a schema change. Problem solved.
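As a sketch of that two-step idea (hypothetical order data, not from the thread): first flatten the nested JSON into one wide table, then normalize the repeating fields into a separate related table.

```python
import pandas as pd

# Hypothetical nested records: each order embeds its customer and items
orders = [
    {"order_id": 1,
     "customer": {"id": "c1", "name": "Ana"},
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2,
     "customer": {"id": "c1", "name": "Ana"},
     "items": [{"sku": "A", "qty": 5}]},
]

# Step 1 - flatten: one row per item, customer fields repeated on each row
flat = pd.json_normalize(
    orders, record_path="items",
    meta=["order_id", ["customer", "id"], ["customer", "name"]],
)

# Step 2 - normalize: move the repeating customer fields into their own table
customers = (flat[["customer.id", "customer.name"]]
             .drop_duplicates()
             .reset_index(drop=True))
items = flat[["order_id", "customer.id", "sku", "qty"]]
```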
๐ŸŒ r/dataengineering
6
7
July 29, 2024
python - pandas json_normalize with very nested json - Stack Overflow
I have been trying to normalize a very nested json file I will later analyze. What I am struggling with is how to go more than one level deep to normalize. I went through the pandas.io.json.
stackoverflow.com
Top answer
1 of 2
28

You could just pass data without any extra params.

df = pd.json_normalize(data)  # pd.io.json.json_normalize in older pandas versions
df

   complete    mid.c    mid.h    mid.l    mid.o                  time  volume
0      True  119.743  119.891  119.249  119.341  1488319200.000000000   14651
1      True  119.893  119.954  119.552  119.738  1488348000.000000000   10738
2      True  119.946  120.221  119.840  119.888  1488376800.000000000   10041

If you want to change the column order, use df.reindex:

df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df

                   time  volume  complete    mid.h    mid.l    mid.c    mid.o
0  1488319200.000000000   14651      True  119.891  119.249  119.743  119.341
1  1488348000.000000000   10738      True  119.954  119.552  119.893  119.738
2  1488376800.000000000   10041      True  120.221  119.840  119.946  119.888
2 of 2
9

The data in the OP (after being deserialized from a JSON string, preferably using json.load()) is a list of nested dictionaries, which is an ideal data structure for pd.json_normalize(): it takes a list of dictionaries and flattens each dictionary into a single row. So the length of the list determines the number of rows, and the total number of key-value pairs in the dictionaries determines the number of columns.

However, if a value under some key is a list, then that is no longer true, because presumably the items in those lists need to go in their own rows. For example, if the my_data.json file looks like:

# my_data.json
[
    {"price": {"mid": ["119.743", "119.891", "119.341"], "time": "123"}},
    {"price": {"mid": ["119.893", "119.954", "119.552"], "time": "456"}},
    {"price": {"mid": ["119.946", "120.221", "119.840"], "time": "789"}}
]

then you'll want to put each value in the list in its own row. In that case, you can pass the path to these lists as the record_path= argument. You can also attach each record's accompanying metadata by passing its path as the meta= argument.

# deserialize json into a python data structure
import json
with open('my_data.json', 'r') as f:
    data = json.load(f)

# normalize the python data structure
df = pd.json_normalize(data, record_path=['price', 'mid'], meta=[['price', 'time']], record_prefix='mid.')
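For reference, the same call can be run with the sample data inlined. One detail worth knowing: because the records under each mid list are scalars rather than dicts, pandas puts them in a column named 0, which record_prefix turns into "mid.0":

```python
import pandas as pd

# The sample from my_data.json above, inlined for a self-contained run
data = [
    {"price": {"mid": ["119.743", "119.891", "119.341"], "time": "123"}},
    {"price": {"mid": ["119.893", "119.954", "119.552"], "time": "456"}},
    {"price": {"mid": ["119.946", "120.221", "119.840"], "time": "789"}},
]

df = pd.json_normalize(data, record_path=["price", "mid"],
                       meta=[["price", "time"]], record_prefix="mid.")
# 9 rows (3 list items x 3 records), columns "mid.0" and "price.time";
# each record's time value is repeated once per item of its mid list
```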

Ultimately, pd.json_normalize() cannot handle anything more complex than this kind of structure. For example, it cannot add another piece of metadata to the above example if that metadata is nested inside yet another dictionary. Depending on the data, you'll most probably need a recursive function to parse it (FYI, pd.json_normalize() is recursive as well, but it targets the general case and won't work for a lot of specific structures).
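A minimal sketch of such a recursive parser (hypothetical; it only descends into dicts, and list handling would have to be added for a specific payload):

```python
def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dotted keys; any non-dict
    value (including lists) is kept as a leaf value unchanged."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out

flatten({"a": {"b": 1, "c": {"d": 2}}, "e": [3, 4]})
# -> {"a.b": 1, "a.c.d": 2, "e": [3, 4]}
```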

Oftentimes, you'll need a combination of explode(), pd.DataFrame(col.tolist()), etc. to completely parse the data.
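A sketch of that combination on hypothetical data: explode() turns a list column into rows, and pd.DataFrame(col.tolist()) expands a dict column into real columns.

```python
import pandas as pd

# Hypothetical frame with one list-valued and one dict-valued column
df = pd.DataFrame({
    "id": [1, 2],
    "tags": [["a", "b"], ["c"]],
    "stats": [{"min": 0, "max": 9}, {"min": 2, "max": 5}],
})

# One row per list element; reset the index so the later join aligns cleanly
df = df.explode("tags").reset_index(drop=True)

# Expand the dict column into columns and attach them
stats = pd.DataFrame(df.pop("stats").tolist())
df = df.join(stats)
# columns: id, tags, min, max - three rows
```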

Pandas also has a convenience function, pd.read_json(), but it's even more limited than pd.json_normalize(): it can only correctly parse a json array of one nesting level. Unlike pd.json_normalize(), however, it deserializes the json string under the hood, so you can pass it the path to a json file directly (no need for json.load()). In other words, the following two produce the same output:

df1 = pd.read_json("my_data.json") 
df2 = pd.json_normalize(data, max_level=0)  # here, `data` is deserialized `my_data.json`
df1.equals(df2)  # True
DEV Community
dev.to โ€บ ernestinem โ€บ normalize-nested-json-objects-with-pandas-1g7m
Normalize nested JSON objects with pandas - DEV Community
August 3, 2020 -

FIELDS = ['list of keys I care about']

def clean_data(data):
    table = pd.DataFrame()
    for i in range(len(data) - 1):
        df = pd.json_normalize(data[i + 1])
        df = df[FIELDS]
        table = table.append(df)
    return table
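Two caveats about that snippet, addressed in the sketch below: the loop starts at data[1] (silently dropping the first record), and DataFrame.append was removed in pandas 2.0. FIELDS here is a hypothetical list of keys:

```python
import pandas as pd

FIELDS = ["id", "name"]  # hypothetical keys of interest

def clean_data(data):
    # Normalize every record (not just data[1:]) and keep only FIELDS;
    # collecting frames and concatenating once replaces the removed
    # DataFrame.append and avoids recopying rows on every iteration.
    frames = [pd.json_normalize(record)[FIELDS] for record in data]
    return pd.concat(frames, ignore_index=True)
```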
Medium
medium.com โ€บ @whyamit404 โ€บ what-is-pandas-json-normalize-and-why-use-it-50ae0cf2d12d
What is pandas.json_normalize and Why Use It? | by whyamit404 | Medium
February 26, 2025 - Dealing with JSON Files: Whether it's a local JSON file or some web-scraped data, json_normalize makes it simple to convert into tabular format.
PyPI
pypi.org โ€บ project โ€บ json-normalize
json-normalize
June 29, 2021
Find elsewhere
GitHub
github.com โ€บ funnel-io โ€บ json-normalize
GitHub - funnel-io/json-normalize: A python package for normalizing (or flattening) a JSON-like structure into a list (generator) of flat dicts · GitHub
A python package for normalizing (or flattening) a JSON-like structure into a list (generator) of flat dicts. - funnel-io/json-normalize
Reddit
reddit.com โ€บ r/learnpython โ€บ having difficulty building a dataframe with pandas from json data. are there norms i'm not following?
r/learnpython on Reddit: Having difficulty building a dataframe with pandas from json data. Are there norms I'm not following?
March 16, 2023 -

I had data in multiple spreadsheets that was a bit confusingly organized, so I wrote a custom script in python to pull everything into a dictionary, and then converted to a json file. I didn't really give any thought to the structure of my json file other than efficiency and the avoidance of redundancy. There are a bunch of nested dictionaries. Some high level keys should be their own column headings in a dataframe, but then there are some lower level keys whose values are other dictionaries that contain both the column headings and the values.

For example, if this was store data, with each data record its own store location, there is a key labelled "sales" whose value is a dictionary. In that dictionary there is a key labelled "products" whose value is a list of products ['soap', 'hammer', 'toys',...] and then there are also multiple keys for each year. For each year, the value is a list of numbers [34, 0, 28, ...] which correspond positionally to the list of products. What I would like eventually to have is a column for each product-year combination.

I have been trying to use the normalize function of pandas and the "meta" parameter but I find it very confusing and my code keeps failing, often on key errors. I'm trying to use bing chat for advice but it's very difficult.

So really what I'm asking is whether there are norms for building dictionaries or json files, in terms of structuring the keys and values and lists? Or does it not really matter at all how I structure it because I can use the "meta" method to restructure it when building the dataframe and my problem is just that I need a better understanding of that method?

Top answer
1 of 2
3
meta can be very confusing. I think one of the issues is that you're using "values" (id/year) as keys. e.g. if you had:

>>> data
[{'id': 1, 'square footage': 2000,
  'sales': [{'product': 'hammers', 'year': '2016', 'sold': 23},
            {'product': 'hammers', 'year': '2017', 'sold': 10},
            {'product': 'screws', 'year': '2017', 'sold': 5}]},
 {'id': 2, 'square footage': 4000,
  'sales': [{'product': 'nails', 'year': '2020', 'sold': 200}]}]

You could do (resetting the index after explode so the normalized rows align):

df = pd.DataFrame(data).explode("sales").reset_index(drop=True)
sales = pd.json_normalize(df["sales"]).add_prefix("sales.")
df[sales.columns] = sales
df = df.drop(columns="sales")

>>> df
   id  square footage sales.product sales.year  sales.sold
0   1            2000       hammers       2016          23
1   1            2000       hammers       2017          10
2   1            2000        screws       2017           5
3   2            4000         nails       2020         200

As for how your data is currently structured - maybe the following example is helpful. Let's look at what normalize does on your data:

>>> pd.json_normalize(data)
   1.square footage   1.sales.products 1.sales.2016 1.sales.2017  2.square footage   2.sales.products 2.sales.2016 2.sales.2017
0              2000  [hammers, screws]     [23, 10]      [10, 5]              1500  [hammers, screws]      [7, 18]      [11, 2]

normalize looks for "records" - i.e. lists of dicts - you have a single dict which is parsed as a single record - hence 1 row. You could instead just create a regular dataframe from the dict:

>>> df = pd.DataFrame(data.items(), columns=["store.id", "sales"])
>>> df
  store.id                                              sales
0        1  {'square footage': 2000, 'sales': {'products':...
1        2  {'square footage': 1500, 'sales': {'products':...

Then normalize the sales:

>>> pd.json_normalize(df["sales"])
   square footage     sales.products sales.2016 sales.2017
0            2000  [hammers, screws]   [23, 10]    [10, 5]
1            1500  [hammers, screws]    [7, 18]    [11, 2]

You can put this back into the dataframe, and drop the existing sales column:

sales = pd.json_normalize(df["sales"])
df[sales.columns] = sales
df = df.drop(columns="sales")

>>> df
  store.id  square footage     sales.products sales.2016 sales.2017
0        1            2000  [hammers, screws]   [23, 10]    [10, 5]
1        2            1500  [hammers, screws]    [7, 18]    [11, 2]

From here, it depends on what you want to do really. You could turn the year columns into rows with .wide_to_long:

>>> pd.wide_to_long(df, stubnames="sales.", i="store.id", j="year")
               square footage     sales.products    sales.
store.id year
1        2016            2000  [hammers, screws]  [23, 10]
2        2016            1500  [hammers, screws]   [7, 18]
1        2017            2000  [hammers, screws]   [10, 5]
2        2017            1500  [hammers, screws]   [11, 2]

We can give the sales. column a new name, and .explode() the lists:

(pd.wide_to_long(df, stubnames="sales.", i="store.id", j="year")
 .rename(columns={"sales.": "sales.sold"})
 .explode(["sales.products", "sales.sold"]))

Which gives you:

               square footage sales.products sales.sold
store.id year
1        2016            2000        hammers         23
         2016            2000         screws         10
2        2016            1500        hammers          7
         2016            1500         screws         18
1        2017            2000        hammers         10
         2017            2000         screws          5
2        2017            1500        hammers         11
         2017            1500         screws          2

If you reset the index - you will have:

  store.id  year  square footage sales.products sales.sold
0        1  2016            2000        hammers         23
1        1  2016            2000         screws         10
2        2  2016            1500        hammers          7
3        2  2016            1500         screws         18
4        1  2017            2000        hammers         10
5        1  2017            2000         screws          5
6        2  2017            1500        hammers         11
7        2  2017            1500         screws          2
2 of 2
2
Not a pro, but I guess it depends on the specific case. If I had some access to the file, it would be fun to play around with it. My initial guess is that if you turn some variables into dummy variables, it would simplify the issue with the nested dictionaries. https://www.geeksforgeeks.org/how-to-create-dummy-variables-in-python-with-pandas/ But I can't tell more without seeing the data.
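For reference, the dummy-variable idea mentioned above looks like this in pandas (hypothetical data): get_dummies replaces a categorical column with one indicator column per category.

```python
import pandas as pd

# Hypothetical store data with a categorical product column
df = pd.DataFrame({"store": [1, 2, 1],
                   "product": ["hammers", "screws", "hammers"]})

# One boolean indicator column per product category, prefixed "product_"
dummies = pd.get_dummies(df, columns=["product"])
# columns: store, product_hammers, product_screws
```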
Reddit
reddit.com โ€บ r/learnpython โ€บ pandas.dataframe.json_normalize optimization
r/learnpython on Reddit: Pandas.DataFrame.json_normalize optimization
December 9, 2022 -

I've been using the json_normalize function in pandas to read through a folder of json files and build a dataframe for the entire folder. It's working today, but the issue is that it takes much longer than I was hoping it would. Are there any optimizations I can look into, or even an alternative to pandas that could normalize json faster?

For context, the json folder holds about 800 MB of json files (each ~2 MB), and it takes roughly 14 minutes to parse through them all and build the dataframe.

Is it also possible the slow piece is concatenating the dataframes together? How would I go about optimizing that?
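One standard way to attack the concatenation cost raised above (a sketch, not from the thread): normalize each file into a frame, collect the frames in a list, and call pd.concat exactly once, since concatenating inside the loop recopies all earlier rows on every iteration.

```python
import json
from pathlib import Path

import pandas as pd

def load_folder(folder):
    # One frame per file, one concat at the end
    frames = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path) as f:
            frames.append(pd.json_normalize(json.load(f)))
    return pd.concat(frames, ignore_index=True)
```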

JSON Editor Online
jsoneditoronline.org
JSON Editor Online: edit JSON, format JSON, query JSON
JSON Editor Online is the original and most copied JSON Editor on the web. Use it to view, edit, format, repair, compare, query, transform, validate, and share your JSON data.
Top answer
1 of 3
62

In the pandas example (below) what do the brackets mean? Is there a logic to be followed to go deeper with the []. [...]

result = json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])

Each string or list of strings in the ['state', 'shortname', ['info', 'governor']] value is a path to an element to include, in addition to the selected rows. The second argument to json_normalize() (record_path, set to 'counties' in the documentation example) tells the function how to select the elements of the input data structure that make up the rows of the output, and the meta paths add further metadata that will be included with each of those rows. Think of these as table joins in a database, if you will.

The input for the US States documentation example has two dictionaries in a list, and both of these dictionaries have a counties key that references another list of dicts:

>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...         'info': {'governor': 'Rick Scott'},
...         'counties': [{'name': 'Dade', 'population': 12345},
...                      {'name': 'Broward', 'population': 40000},
...                      {'name': 'Palm Beach', 'population': 60000}]},
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {'governor': 'John Kasich'},
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> pprint(data[0]['counties'])
[{'name': 'Dade', 'population': 12345},
 {'name': 'Broward', 'population': 40000},
 {'name': 'Palm Beach', 'population': 60000}]
>>> pprint(data[1]['counties'])
[{'name': 'Summit', 'population': 1234},
 {'name': 'Cuyahoga', 'population': 1337}]

Between them there are 5 rows of data to use in the output:

>>> json_normalize(data, 'counties')
         name  population
0        Dade       12345
1     Broward       40000
2  Palm Beach       60000
3      Summit        1234
4    Cuyahoga        1337

The meta argument then names some elements that live next to those counties lists, and those are then merged in separately. The values from the first data[0] dictionary for those meta elements are ('Florida', 'FL', 'Rick Scott'), respectively, and for data[1] the values are ('Ohio', 'OH', 'John Kasich'), so you see those values attached to the counties rows that came from the same top-level dictionary, repeated 3 and 2 times respectively:

>>> data[0]['state'], data[0]['shortname'], data[0]['info']['governor']
('Florida', 'FL', 'Rick Scott')
>>> data[1]['state'], data[1]['shortname'], data[1]['info']['governor']
('Ohio', 'OH', 'John Kasich')
>>> json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich

So, if you pass in a list for the meta argument, then each element in the list is a separate path, and each of those separate paths identifies data to add to the rows in the output.

In your example JSON, there are only a few nested lists to elevate with the first argument the way 'counties' was used above; the only candidate in that data structure is the nested 'authors' key. You'd have to extract each ['_source', 'authors'] path, after which you can add other keys from the parent object to augment those rows.

The meta argument then pulls in the _id key from the outermost objects, followed by the nested ['_source', 'title'] and ['_source', 'journal'] paths.

The record_path argument takes the authors lists as the starting point, these look like:

>>> d['hits']['hits'][0]['_source']['authors']   # this value is None, and is skipped
>>> d['hits']['hits'][1]['_source']['authors']
[{'affiliations': ['Punjabi University'],
  'author_id': '780E3459',
  'author_name': 'munish puri'},
 {'affiliations': ['Punjabi University'],
  'author_id': '48D92C79',
  'author_name': 'rajesh dhaliwal'},
 {'affiliations': ['Punjabi University'],
  'author_id': '7D9BD37C',
  'author_name': 'r s singh'}]
>>> d['hits']['hits'][2]['_source']['authors']
[{'author_id': '7FF872BC',
  'author_name': 'barbara eileen ryan'}]
>>> # etc.

and so gives you the following rows:

>>> json_normalize(d['hits']['hits'], ['_source', 'authors'])
           affiliations author_id          author_name
0  [Punjabi University]  780E3459          munish puri
1  [Punjabi University]  48D92C79      rajesh dhaliwal
2  [Punjabi University]  7D9BD37C            r s singh
3                   NaN  7FF872BC  barbara eileen ryan
4                   NaN  0299B8E9     fraser j harbutt
5                   NaN  7DAB7B72   richard m freeland

and then we can use the third meta argument to add more columns like _id, _source.title and _source.journal, using ['_id', ['_source', 'journal'], ['_source', 'title']]:

>>> json_normalize(
...     d['hits']['hits'],
...     ['_source', 'authors'],
...     ['_id', ['_source', 'journal'], ['_source', 'title']]
... )
           affiliations author_id          author_name       _id   \
0  [Punjabi University]  780E3459          munish puri  7AF8EBC3  
1  [Punjabi University]  48D92C79      rajesh dhaliwal  7AF8EBC3
2  [Punjabi University]  7D9BD37C            r s singh  7AF8EBC3
3                   NaN  7FF872BC  barbara eileen ryan  7521A721
4                   NaN  0299B8E9     fraser j harbutt  7DAEB9A4
5                   NaN  7DAB7B72   richard m freeland  7B3236C5

                                     _source.journal
0  Journal of Industrial Microbiology & Biotechno...
1  Journal of Industrial Microbiology & Biotechno...
2  Journal of Industrial Microbiology & Biotechno...
3                     The American Historical Review
4                     The American Historical Review
5                     The American Historical Review

                                       _source.title  \
0  Development of a stable continuous flow immobi...
1  Development of a stable continuous flow immobi...
2  Development of a stable continuous flow immobi...
3  Feminism and the women's movement : dynamics o...
4  The iron curtain : Churchill, America, and the...
5  The Truman Doctrine and the origins of McCarth...
2 of 3
25

You can also have a look at the library flatten_json, which does not require you to write column hierarchies as in json_normalize:

import pandas as pd
from flatten_json import flatten

data = d['hits']['hits']
dict_flattened = (flatten(record, '.') for record in data)
df = pd.DataFrame(dict_flattened)
print(df)

See https://github.com/amirziai/flatten.

GeeksforGeeks
geeksforgeeks.org โ€บ python โ€บ python-pandas-flatten-nested-json
Python Pandas - Flatten nested JSON - GeeksforGeeks
December 10, 2025 - Converting JSON data into a Pandas DataFrame makes it easier to analyze, manipulate, and visualize. Pandas provides a built-in function- json_normalize(), which efficiently flattens simple to moderately nested JSON data into a flat tabular format.
AskPython
askpython.com โ€บ home โ€บ how to normalize semi-structured json data into a flat table?
How to Normalize semi-structured JSON data into a flat table? - AskPython
February 23, 2023 - We are going to see the usage of record_path to normalize specific columns. Let us create a nested JSON with the help of the key: value pair format and normalize it.
Stack Overflow
stackoverflow.com โ€บ questions โ€บ tagged โ€บ json-normalize
Newest 'json-normalize' Questions - Stack Overflow
I pasted the JSON structure and the python code to normalize that is giving the ERROR message GOAL: Normalize ...
JSON Formatter
jsonformatter.curiousconcept.com
JSON Formatter & Validator
The JSON Formatter & Validator beautifies and debugs JSON data with advanced formatting and validation algorithms.
Stackademic
blog.stackademic.com โ€บ flattening-evolving-json-in-databricks-without-an-explicit-schema-79a9e618aa2d
Flattening Evolving JSON in Databricks - Without an Explicit Schema | by Avinash Narala | Feb, 2026 | Stackademic
February 25, 2026 - One unified item JSON string, regardless of what shape it came in. Some collections return [{...}, {...}]. Others return ["abc", "xyz"]. We normalize everything into Array<Map<string,string>>:
Note.nkmk.me
note.nkmk.me โ€บ home โ€บ python โ€บ pandas
pandas: Convert a list of dictionaries to DataFrame with json_normalize | note.nkmk.me
March 16, 2023 - You can convert a list of dictionaries with shared keys to pandas.DataFrame with pandas.json_normalize().