json_normalize(
   ds, 
   record_path=['subGroups', 'people'], 
   meta=[
           'name', 
           ['subGroups', 'subGroup']   # each meta field needs its own path
   ], 
   errors='ignore'
)

  firstname    name  subGroups.subGroup
0      Tony  groupa                   1
1     Brian  groupa                   1
2      Tony  groupb                   1
3     Brian  groupb                   1
Answer from cs95 on Stack Overflow
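The input `ds` is not shown in the answer; the shape below is an assumption inferred from the output, but it reproduces it exactly — each group dict has a `name` plus a `subGroups` list whose items carry a `subGroup` value and a `people` list:

```python
import pandas as pd

# Hypothetical input reconstructed from the output above; the original `ds`
# was not included in the answer.
ds = [
    {"name": "groupa",
     "subGroups": [{"subGroup": 1,
                    "people": [{"firstname": "Tony"},
                               {"firstname": "Brian"}]}]},
    {"name": "groupb",
     "subGroups": [{"subGroup": 1,
                    "people": [{"firstname": "Tony"},
                               {"firstname": "Brian"}]}]},
]

df = pd.json_normalize(
    ds,
    record_path=["subGroups", "people"],
    meta=["name", ["subGroups", "subGroup"]],  # each meta field needs its own path
    errors="ignore",
)
```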
In the pandas example (below) what do the brackets mean? Is there a logic to be followed to go deeper with the []. [...]

result = json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])

Each string or list of strings in the ['state', 'shortname', ['info', 'governor']] value is a path to an element to include, in addition to the selected rows. The second json_normalize() argument (record_path, set to 'counties' in the documentation example) tells the function how to select the elements of the input data structure that make up the rows of the output, and the meta paths add further metadata to each of those rows. Think of these as table joins in a database, if you will.

The input for the US States documentation example has two dictionaries in a list, and both of these dictionaries have a counties key that references another list of dicts:

>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...         'info': {'governor': 'Rick Scott'},
...         'counties': [{'name': 'Dade', 'population': 12345},
...                      {'name': 'Broward', 'population': 40000},
...                      {'name': 'Palm Beach', 'population': 60000}]},
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {'governor': 'John Kasich'},
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> pprint(data[0]['counties'])
[{'name': 'Dade', 'population': 12345},
 {'name': 'Broward', 'population': 40000},
 {'name': 'Palm Beach', 'population': 60000}]
>>> pprint(data[1]['counties'])
[{'name': 'Summit', 'population': 1234},
 {'name': 'Cuyahoga', 'population': 1337}]

Between them there are 5 rows of data to use in the output:

>>> json_normalize(data, 'counties')
         name  population
0        Dade       12345
1     Broward       40000
2  Palm Beach       60000
3      Summit        1234
4    Cuyahoga        1337

The meta argument then names some elements that live next to those counties lists, and those are then merged in separately. The values from the first data[0] dictionary for those meta elements are ('Florida', 'FL', 'Rick Scott'), respectively, and for data[1] the values are ('Ohio', 'OH', 'John Kasich'), so you see those values attached to the counties rows that came from the same top-level dictionary, repeated 3 and 2 times respectively:

>>> data[0]['state'], data[0]['shortname'], data[0]['info']['governor']
('Florida', 'FL', 'Rick Scott')
>>> data[1]['state'], data[1]['shortname'], data[1]['info']['governor']
('Ohio', 'OH', 'John Kasich')
>>> json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich

So, if you pass in a list for the meta argument, then each element in the list is a separate path, and each of those separate paths identifies data to add to the rows in the output.
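When nested meta paths are flattened into column names, the sep and meta_prefix parameters (both part of the documented json_normalize signature) control the result; a small sketch reusing a trimmed-down copy of the US-states data:

```python
import pandas as pd

# Trimmed copy of the documentation data (one county per state) to show how
# sep and meta_prefix shape the meta column names.
data = [
    {"state": "Florida", "shortname": "FL",
     "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade", "population": 12345}]},
    {"state": "Ohio", "shortname": "OH",
     "info": {"governor": "John Kasich"},
     "counties": [{"name": "Summit", "population": 1234}]},
]

df = pd.json_normalize(
    data, "counties",
    ["state", ["info", "governor"]],
    sep="_",              # nested meta path joined with "_" -> "info_governor"
    meta_prefix="meta_",  # every meta column gets this prefix
)
```

Record columns come first, then the meta columns in the order they were listed, so the columns here are name, population, meta_state, meta_info_governor.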

In your example JSON, there is only one nested list to elevate with the record_path argument, the way 'counties' was used in the documentation example: the nested 'authors' key. You'd extract each ['_source', 'authors'] path, after which you can add other keys from the parent objects to augment those rows.

The meta argument then pulls in the _id key from the outermost objects, followed by the nested ['_source', 'title'] and ['_source', 'journal'] paths.

The record_path argument takes the authors lists as the starting point; these look like:

>>> d['hits']['hits'][0]['_source']['authors']   # this value is None, and is skipped
>>> d['hits']['hits'][1]['_source']['authors']
[{'affiliations': ['Punjabi University'],
  'author_id': '780E3459',
  'author_name': 'munish puri'},
 {'affiliations': ['Punjabi University'],
  'author_id': '48D92C79',
  'author_name': 'rajesh dhaliwal'},
 {'affiliations': ['Punjabi University'],
  'author_id': '7D9BD37C',
  'author_name': 'r s singh'}]
>>> d['hits']['hits'][2]['_source']['authors']
[{'author_id': '7FF872BC',
  'author_name': 'barbara eileen ryan'}]
>>> # etc.

and so gives you the following rows:

>>> json_normalize(d['hits']['hits'], ['_source', 'authors'])
           affiliations author_id          author_name
0  [Punjabi University]  780E3459          munish puri
1  [Punjabi University]  48D92C79      rajesh dhaliwal
2  [Punjabi University]  7D9BD37C            r s singh
3                   NaN  7FF872BC  barbara eileen ryan
4                   NaN  0299B8E9     fraser j harbutt
5                   NaN  7DAB7B72   richard m freeland

and then we can use the third argument, meta, to add more columns such as _id, _source.title and _source.journal, using ['_id', ['_source', 'journal'], ['_source', 'title']]:

>>> json_normalize(
...     d['hits']['hits'],
...     ['_source', 'authors'],
...     ['_id', ['_source', 'journal'], ['_source', 'title']]
... )
           affiliations author_id          author_name       _id   \
0  [Punjabi University]  780E3459          munish puri  7AF8EBC3  
1  [Punjabi University]  48D92C79      rajesh dhaliwal  7AF8EBC3
2  [Punjabi University]  7D9BD37C            r s singh  7AF8EBC3
3                   NaN  7FF872BC  barbara eileen ryan  7521A721
4                   NaN  0299B8E9     fraser j harbutt  7DAEB9A4
5                   NaN  7DAB7B72   richard m freeland  7B3236C5

                                     _source.journal  \
0  Journal of Industrial Microbiology & Biotechno...
1  Journal of Industrial Microbiology & Biotechno...
2  Journal of Industrial Microbiology & Biotechno...
3                     The American Historical Review
4                     The American Historical Review
5                     The American Historical Review

                                       _source.title
0  Development of a stable continuous flow immobi...
1  Development of a stable continuous flow immobi...
2  Development of a stable continuous flow immobi...
3  Feminism and the women's movement : dynamics o...
4  The iron curtain : Churchill, America, and the...
5  The Truman Doctrine and the origins of McCarth...
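One wrinkle worth noting: if a meta key is missing from some of the records, json_normalize raises a KeyError by default; passing errors='ignore' (as the snippet at the top does) fills in NaN instead. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: the second record has no "id" key.
data = [
    {"id": 1, "rows": [{"v": 10}]},
    {"rows": [{"v": 20}]},
]

# errors='raise' (the default) would raise KeyError for the missing meta key;
# errors='ignore' fills NaN for that row instead.
df = pd.json_normalize(data, record_path="rows", meta=["id"], errors="ignore")
```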

You can also have a look at the library flatten_json, which does not require you to write column hierarchies as in json_normalize:

from flatten_json import flatten

data = d['hits']['hits']
dict_flattened = (flatten(record, '.') for record in data)
df = pd.DataFrame(dict_flattened)
print(df)

See https://github.com/amirziai/flatten.
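For intuition, here is a minimal sketch of what such a flattener does — this is not flatten_json's actual implementation, just the idea: recursively walk dicts and lists, joining nested keys (and list indices) with a separator into flat column names.

```python
# Minimal illustrative flattener (an assumption, not flatten_json's code):
# nested dict keys and list indices are joined with `sep` into flat keys.
def flatten(record, sep="."):
    out = {}

    def walk(obj, prefix):
        if isinstance(obj, dict):
            for key, value in obj.items():
                walk(value, f"{prefix}{sep}{key}" if prefix else key)
        elif isinstance(obj, list):
            for i, value in enumerate(obj):
                walk(value, f"{prefix}{sep}{i}" if prefix else str(i))
        else:
            out[prefix] = obj

    walk(record, "")
    return out
```

Each scalar ends up under a dotted path, e.g. `flatten({"a": {"b": 1}, "c": [2, 3]})` yields keys "a.b", "c.0" and "c.1"; feeding one such flat dict per record to pd.DataFrame is what the snippet above does with the real library.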
