json_normalize(
   ds, 
   record_path=['subGroups', 'people'], 
   meta=[
           'name', 
           ['subGroups', 'subGroup']   # each meta field needs its own path
   ], 
   errors='ignore'
)

  firstname    name  subGroups.subGroup
0      Tony  groupa                   1
1     Brian  groupa                   1
2      Tony  groupb                   1
3     Brian  groupb                   1
Answer from cs95 on Stack Overflow
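The input `ds` is not shown in the answer; the shape below is an assumption inferred from the output, but it reproduces it exactly — each group dict has a `name` plus a `subGroups` list whose items carry a `subGroup` value and a `people` list:

```python
import pandas as pd

# Hypothetical input reconstructed from the output above; the original `ds`
# was not included in the answer.
ds = [
    {"name": "groupa",
     "subGroups": [{"subGroup": 1,
                    "people": [{"firstname": "Tony"},
                               {"firstname": "Brian"}]}]},
    {"name": "groupb",
     "subGroups": [{"subGroup": 1,
                    "people": [{"firstname": "Tony"},
                               {"firstname": "Brian"}]}]},
]

df = pd.json_normalize(
    ds,
    record_path=["subGroups", "people"],
    meta=["name", ["subGroups", "subGroup"]],  # each meta field needs its own path
    errors="ignore",
)
```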
In the pandas example (below) what do the brackets mean? Is there a logic to be followed to go deeper with the []. [...]

result = json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])

Each string or list of strings in the ['state', 'shortname', ['info', 'governor']] value is a path to an element to include, in addition to the selected rows. The second json_normalize() argument (record_path, set to 'counties' in the documentation example) tells the function how to select the elements of the input data structure that make up the rows of the output, and the meta paths add further metadata to each of those rows. Think of these as table joins in a database, if you will.

The input for the US States documentation example has two dictionaries in a list, and both of these dictionaries have a counties key that references another list of dicts:

>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...         'info': {'governor': 'Rick Scott'},
...         'counties': [{'name': 'Dade', 'population': 12345},
...                      {'name': 'Broward', 'population': 40000},
...                      {'name': 'Palm Beach', 'population': 60000}]},
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {'governor': 'John Kasich'},
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> pprint(data[0]['counties'])
[{'name': 'Dade', 'population': 12345},
 {'name': 'Broward', 'population': 40000},
 {'name': 'Palm Beach', 'population': 60000}]
>>> pprint(data[1]['counties'])
[{'name': 'Summit', 'population': 1234},
 {'name': 'Cuyahoga', 'population': 1337}]

Between them there are 5 rows of data to use in the output:

>>> json_normalize(data, 'counties')
         name  population
0        Dade       12345
1     Broward       40000
2  Palm Beach       60000
3      Summit        1234
4    Cuyahoga        1337

The meta argument then names some elements that live next to those counties lists, and those are then merged in separately. The values from the first data[0] dictionary for those meta elements are ('Florida', 'FL', 'Rick Scott'), respectively, and for data[1] the values are ('Ohio', 'OH', 'John Kasich'), so you see those values attached to the counties rows that came from the same top-level dictionary, repeated 3 and 2 times respectively:

>>> data[0]['state'], data[0]['shortname'], data[0]['info']['governor']
('Florida', 'FL', 'Rick Scott')
>>> data[1]['state'], data[1]['shortname'], data[1]['info']['governor']
('Ohio', 'OH', 'John Kasich')
>>> json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich

So, if you pass in a list for the meta argument, then each element in the list is a separate path, and each of those separate paths identifies data to add to the rows in the output.
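When nested meta paths are flattened into column names, the sep and meta_prefix parameters (both part of the documented json_normalize signature) control the result; a small sketch reusing a trimmed-down copy of the US-states data:

```python
import pandas as pd

# Trimmed copy of the documentation data (one county per state) to show how
# sep and meta_prefix shape the meta column names.
data = [
    {"state": "Florida", "shortname": "FL",
     "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade", "population": 12345}]},
    {"state": "Ohio", "shortname": "OH",
     "info": {"governor": "John Kasich"},
     "counties": [{"name": "Summit", "population": 1234}]},
]

df = pd.json_normalize(
    data, "counties",
    ["state", ["info", "governor"]],
    sep="_",              # nested meta path joined with "_" -> "info_governor"
    meta_prefix="meta_",  # every meta column gets this prefix
)
```

Record columns come first, then the meta columns in the order they were listed, so the columns here are name, population, meta_state, meta_info_governor.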

In your example JSON, there is only one nested list to elevate with the record_path argument, the way 'counties' was used in the documentation example: the nested 'authors' key. You'd extract each ['_source', 'authors'] path, after which you can add other keys from the parent objects to augment those rows.

The meta argument then pulls in the _id key from the outermost objects, followed by the nested ['_source', 'title'] and ['_source', 'journal'] paths.

The record_path argument takes the authors lists as the starting point; these look like:

>>> d['hits']['hits'][0]['_source']['authors']   # this value is None, and is skipped
>>> d['hits']['hits'][1]['_source']['authors']
[{'affiliations': ['Punjabi University'],
  'author_id': '780E3459',
  'author_name': 'munish puri'},
 {'affiliations': ['Punjabi University'],
  'author_id': '48D92C79',
  'author_name': 'rajesh dhaliwal'},
 {'affiliations': ['Punjabi University'],
  'author_id': '7D9BD37C',
  'author_name': 'r s singh'}]
>>> d['hits']['hits'][2]['_source']['authors']
[{'author_id': '7FF872BC',
  'author_name': 'barbara eileen ryan'}]
>>> # etc.

and so gives you the following rows:

>>> json_normalize(d['hits']['hits'], ['_source', 'authors'])
           affiliations author_id          author_name
0  [Punjabi University]  780E3459          munish puri
1  [Punjabi University]  48D92C79      rajesh dhaliwal
2  [Punjabi University]  7D9BD37C            r s singh
3                   NaN  7FF872BC  barbara eileen ryan
4                   NaN  0299B8E9     fraser j harbutt
5                   NaN  7DAB7B72   richard m freeland

and then we can use the third argument, meta, to add more columns such as _id, _source.title and _source.journal, using ['_id', ['_source', 'journal'], ['_source', 'title']]:

>>> json_normalize(
...     d['hits']['hits'],
...     ['_source', 'authors'],
...     ['_id', ['_source', 'journal'], ['_source', 'title']]
... )
           affiliations author_id          author_name       _id   \
0  [Punjabi University]  780E3459          munish puri  7AF8EBC3  
1  [Punjabi University]  48D92C79      rajesh dhaliwal  7AF8EBC3
2  [Punjabi University]  7D9BD37C            r s singh  7AF8EBC3
3                   NaN  7FF872BC  barbara eileen ryan  7521A721
4                   NaN  0299B8E9     fraser j harbutt  7DAEB9A4
5                   NaN  7DAB7B72   richard m freeland  7B3236C5

                                     _source.journal  \
0  Journal of Industrial Microbiology & Biotechno...
1  Journal of Industrial Microbiology & Biotechno...
2  Journal of Industrial Microbiology & Biotechno...
3                     The American Historical Review
4                     The American Historical Review
5                     The American Historical Review

                                       _source.title
0  Development of a stable continuous flow immobi...
1  Development of a stable continuous flow immobi...
2  Development of a stable continuous flow immobi...
3  Feminism and the women's movement : dynamics o...
4  The iron curtain : Churchill, America, and the...
5  The Truman Doctrine and the origins of McCarth...
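One wrinkle worth noting: if a meta key is missing from some of the records, json_normalize raises a KeyError by default; passing errors='ignore' (as the snippet at the top does) fills in NaN instead. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: the second record has no "id" key.
data = [
    {"id": 1, "rows": [{"v": 10}]},
    {"rows": [{"v": 20}]},
]

# errors='raise' (the default) would raise KeyError for the missing meta key;
# errors='ignore' fills NaN for that row instead.
df = pd.json_normalize(data, record_path="rows", meta=["id"], errors="ignore")
```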

You can also have a look at the library flatten_json, which does not require you to write column hierarchies as in json_normalize:

from flatten_json import flatten

data = d['hits']['hits']
dict_flattened = (flatten(record, '.') for record in data)
df = pd.DataFrame(dict_flattened)
print(df)

See https://github.com/amirziai/flatten.
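For intuition, here is a minimal sketch of what such a flattener does — this is not flatten_json's actual implementation, just the idea: recursively walk dicts and lists, joining nested keys (and list indices) with a separator into flat column names.

```python
# Minimal illustrative flattener (an assumption, not flatten_json's code):
# nested dict keys and list indices are joined with `sep` into flat keys.
def flatten(record, sep="."):
    out = {}

    def walk(obj, prefix):
        if isinstance(obj, dict):
            for key, value in obj.items():
                walk(value, f"{prefix}{sep}{key}" if prefix else key)
        elif isinstance(obj, list):
            for i, value in enumerate(obj):
                walk(value, f"{prefix}{sep}{i}" if prefix else str(i))
        else:
            out[prefix] = obj

    walk(record, "")
    return out
```

Each scalar ends up under a dotted path, e.g. `flatten({"a": {"b": 1}, "c": [2, 3]})` yields keys "a.b", "c.0" and "c.1"; feeding one such flat dict per record to pd.DataFrame is what the snippet above does with the real library.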
