We can use `DataFrame.from_dict` with `orient='index'`:
import pandas as pd

v = {
    "assetPropertyValues": {
        "A001234": {"PV": 1.2345},
        "A001235": {"PV": 1.234678},
        "A001236": {"PV": 1.234678},
    }
}
df = pd.DataFrame.from_dict(v['assetPropertyValues'], orient='index')
Or, when reading from a file with json.load:
import json
import pandas as pd

with open('source.json') as f:
    df = pd.DataFrame.from_dict(
        json.load(f)['assetPropertyValues'],
        orient='index'
    )
df:
PV
A001234 1.234500
A001235 1.234678
A001236 1.234678
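If you'd rather have the asset IDs as a regular column instead of the index, a small follow-up sketch (the column name `assetId` is my own choice, not from the source data):

```python
import pandas as pd

v = {
    "assetPropertyValues": {
        "A001234": {"PV": 1.2345},
        "A001235": {"PV": 1.234678},
        "A001236": {"PV": 1.234678},
    }
}

df = pd.DataFrame.from_dict(v["assetPropertyValues"], orient="index")

# Promote the asset IDs from the index to a regular column;
# "assetId" is just an illustrative name.
df = df.rename_axis("assetId").reset_index()
print(df)
```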
Answer from Henry Ecker on Stack Overflow: Python Nested JSON Import - Convert to table.
Better way to parse insanely complex nested json data
- standard techniques for dealing with nested JSON: json_normalize(), explode(), apply(pd.Series)
- finally some cleanup: drop unwanted rows and replace NaN with empty string
import json
import pandas as pd
js = """{"menu": {
"header": "SVG Viewer",
"items": [
{"id": "Open"},
{"id": "OpenNew", "label": "Open New"},
null,
{"id": "ZoomIn", "label": "Zoom In"},
{"id": "ZoomOut", "label": "Zoom Out"},
{"id": "OriginalView", "label": "Original View"},
null,
{"id": "Quality"},
{"id": "Pause"},
{"id": "Mute"},
null,
{"id": "Find", "label": "Find..."},
{"id": "FindAgain", "label": "Find Again"},
{"id": "Copy"},
{"id": "CopyAgain", "label": "Copy Again"},
{"id": "CopySVG", "label": "Copy SVG"},
{"id": "ViewSVG", "label": "View SVG"},
{"id": "ViewSource", "label": "View Source"},
{"id": "SaveAs", "label": "Save As"},
null,
{"id": "Help"},
{"id": "About", "label": "About Adobe CVG Viewer..."}
]
}}"""
df = pd.json_normalize(json.loads(js)).explode("menu.items").reset_index(drop=True)
df.drop(columns=["menu.items"]).join(df["menu.items"].apply(pd.Series)).dropna(subset=["id"]).fillna("")
|   | menu.header | id | label |
|---|---|---|---|
| 0 | SVG Viewer | Open | |
| 1 | SVG Viewer | OpenNew | Open New |
| 3 | SVG Viewer | ZoomIn | Zoom In |
| 4 | SVG Viewer | ZoomOut | Zoom Out |
| 5 | SVG Viewer | OriginalView | Original View |
| 7 | SVG Viewer | Quality | |
| 8 | SVG Viewer | Pause | |
| 9 | SVG Viewer | Mute | |
| 11 | SVG Viewer | Find | Find... |
| 12 | SVG Viewer | FindAgain | Find Again |
| 13 | SVG Viewer | Copy | |
| 14 | SVG Viewer | CopyAgain | Copy Again |
| 15 | SVG Viewer | CopySVG | Copy SVG |
| 16 | SVG Viewer | ViewSVG | View SVG |
| 17 | SVG Viewer | ViewSource | View Source |
| 18 | SVG Viewer | SaveAs | Save As |
| 20 | SVG Viewer | Help | |
| 21 | SVG Viewer | About | About Adobe CVG Viewer... |
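An alternative sketch that filters out the nulls before normalizing, which avoids the explode/apply(pd.Series) pass entirely (my own variant, not from the answer above; the sample here is a trimmed-down copy of the menu JSON):

```python
import json
import pandas as pd

js = '{"menu": {"header": "SVG Viewer", "items": [{"id": "Open"}, null, {"id": "Find", "label": "Find..."}]}}'

data = json.loads(js)
# Filter the nulls out of the list first, then let json_normalize
# build one row per item, carrying menu.header along as a constant column.
items = [i for i in data["menu"]["items"] if i is not None]
df = pd.json_normalize(items).assign(**{"menu.header": data["menu"]["header"]}).fillna("")
print(df)
```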
utility function
- if you don't want to name columns, but take the first list column
- identify the first column that contains lists
- explode() and apply(pd.Series) on that column
- provide an option to expand all lists
def normalize(js, expand_all=False):
    df = pd.json_normalize(json.loads(js) if isinstance(js, str) else js)
    # get first column that contains lists
    col = df.applymap(type).astype(str).eq("<class 'list'>").all().idxmax()
    # explode list and expand embedded dictionaries
    df = df.explode(col).reset_index(drop=True)
    df = df.drop(columns=[col]).join(df[col].apply(pd.Series), rsuffix=f".{col}")
    # any lists left?
    if expand_all and df.applymap(type).astype(str).eq("<class 'list'>").any(axis=1).all():
        df = normalize(df.to_dict("records"), expand_all=True)
    return df
js = """{ "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": [ { "id": "1001", "type": "Regular" }, { "id": "1002", "type": "Chocolate" }, { "id": "1003", "type": "Blueberry" }, { "id": "1004", "type": "Devil's Food" } ] }, "topping": [ { "id": "5001", "type": "None" }, { "id": "5002", "type": "Glazed" }, { "id": "5005", "type": "Sugar" } ] }"""
normalize(js, expand_all=True)
|   | id | type | name | ppu | id.topping | type.topping | id.batters.batter | type.batters.batter |
|---|---|---|---|---|---|---|---|---|
| 0 | 0001 | donut | Cake | 0.55 | 5001 | None | 1001 | Regular |
| 1 | 0001 | donut | Cake | 0.55 | 5001 | None | 1002 | Chocolate |
| 2 | 0001 | donut | Cake | 0.55 | 5001 | None | 1003 | Blueberry |
| 3 | 0001 | donut | Cake | 0.55 | 5001 | None | 1004 | Devil's Food |
| 4 | 0001 | donut | Cake | 0.55 | 5002 | Glazed | 1001 | Regular |
| 5 | 0001 | donut | Cake | 0.55 | 5002 | Glazed | 1002 | Chocolate |
| 6 | 0001 | donut | Cake | 0.55 | 5002 | Glazed | 1003 | Blueberry |
| 7 | 0001 | donut | Cake | 0.55 | 5002 | Glazed | 1004 | Devil's Food |
| 8 | 0001 | donut | Cake | 0.55 | 5005 | Sugar | 1001 | Regular |
| 9 | 0001 | donut | Cake | 0.55 | 5005 | Sugar | 1002 | Chocolate |
| 10 | 0001 | donut | Cake | 0.55 | 5005 | Sugar | 1003 | Blueberry |
| 11 | 0001 | donut | Cake | 0.55 | 5005 | Sugar | 1004 | Devil's Food |
consider each list independent
- copies the way https://data.page/json/csv works
- this is a limited use case; it does not honor general data modelling principles
def n2(js):
    df = pd.json_normalize(json.loads(js))
    # columns that contain lists
    cols = [i for i, c in df.applymap(type).astype(str).eq("<class 'list'>").all().items() if c]
    # use the list from the first row
    return pd.concat(
        [df.drop(columns=cols)]
        + [
            pd.json_normalize(df.loc[0, c]).pipe(
                lambda d: d.rename(columns={c2: f"{c}.{c2}" for c2 in d.columns})
            )
            for c in cols
        ],
        axis=1,
    ).fillna("")
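The same independent-lists idea in a compact, self-contained form (the toy JSON here is my own, chosen so the two lists have different lengths and you can see the padding):

```python
import json
import pandas as pd

js = """{"name": "Cake",
         "topping": [{"id": "5001"}, {"id": "5002"}, {"id": "5005"}],
         "batter":  [{"id": "1001"}, {"id": "1002"}]}"""

df = pd.json_normalize(json.loads(js))
list_cols = [c for c in df.columns if isinstance(df.loc[0, c], list)]

# Lay each list out side by side instead of taking the cross product;
# shorter lists are padded with empty strings.
out = pd.concat(
    [df.drop(columns=list_cols)]
    + [pd.json_normalize(df.loc[0, c]).add_prefix(f"{c}.") for c in list_cols],
    axis=1,
).fillna("")
print(out)
```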
In Python you can use pandas to do this, but it will repeat the header values for each line, as below.
code
output
Hi all,
Long time programmer here, but new to Python. This will be a long and, I think, complicated issue, so appreciate anyone who reads through it all and has any suggestions. I've looked up different ways to pull this data and don't seem to be making any progress. I'm sure there's a much better way.
I'm writing a program that will connect to our library to pull a list of everything we have checked out, and I want to output a list sorted by due date and whether it has holds or not. I've got the code working to log in and pull a JSON data structure, but I cannot get it to export the data in the correct order. The JSON data is (to me) hideously complex, with some data (due date) in one section and other data in another section. I'm able to pull the fields I want, but keeping them together is proving challenging.
For example, the title and subtitle are in the 'bibs/briefInfo' section with a key of 'title' or 'subtitle'. The due date is in the 'checkouts' section with a key of 'dueDate'. When I loop through them, though, the titles come out in one order, the due dates in another, and the subtitles in yet another.
Because it's a webpage with JSON embedded in it, I used BeautifulSoup to read the page.
I'm wanting to pull the following fields for each book so I can display the info for each book:
title, subtitle, and contentType from the briefInfo section
dueDate from the checkouts section
heldCopies and availableCopies from the availability section
Here's the pertinent section of my code:
soup = BeautifulSoup(index_page.text, 'html.parser')
all_scripts = soup.find_all('script', {"type": "application/json"})
for script in all_scripts:
    jsondata = json.loads(script.text)
    print(jsondata)

output = []
for i in item_generator(jsondata, "bibTitle"):
    ans = {i}
    print(i)
    output.append(ans)
for i in item_generator(jsondata, "dueDate"):
    ans = {i}
    output.append(ans)
print("Subtitle----------------------")
for i in item_generator(jsondata, "subtitle"):
    ans = {i}
    print(i)
    output.append(ans)
print(output)
Here's the JSON output from my print statement so I can see what I'm working with. I tried to format it so it's easier to read. I removed a lot of other elements to keep the size down. Hopefully I didn't break any of the brackets.
{
'app':
{
'coreCssFingerprint': '123123123',
'coreAssets':
{
'cdnHost': 'https://xyz.com',
'cssPath': '/dynamic_stylesheet',
'defaultStylesheet': 'xyz.css'
},
},
'entities':
{
'listItems': {},
'cards': {},
'accounts':
{
'88888888':
{
'barcode': '999999999',
'expiryDate': None,
'id': 88888888,
}
},
'shelves':
{
'88888888':
{
'1111222222':
{
'id': 1111222222,
'metadataId': 'S00A1122334',
'shelf': 'for_later',
'privateItem': True,
'dateAdded': '2023-12-30',
},
}
},
'users':
{
'88888888':
{
'accounts': [88888888],
'status': 'A',
'showGroupingDebug': False,
'avatarUrl': '',
'id': 88888888,
}
},
'eventPrograms': {},
'checkouts':
{
'112233445566778899':
{
'checkoutId': '112233445566778899',
'materialType': 'PHYSICAL',
'dueDate': '2024-08-26',
'metadataId': 'S99Z000000',
'bibTitle': "The Lord of the Rings"
},
'998877665544332211':
{
'checkoutId': '998877665544332211',
'materialType': 'PHYSICAL',
'dueDate': '2024-08-26',
'metadataId': 'S88Y00000',
'bibTitle': 'The Lord of the Rings'
},
},
'eventSeries': {},
'catalogBibs': {},
'bibs':
{
'S88Y00000':
{
'id': 'S88Y00000',
'briefInfo':
{
'superFormats': ['BOOKS', 'MODERN_FORMATS'],
'genreForm': [],
'callNumber': '123.456',
'authors': ['Tolkien, J.R.R.'],
'metadataId': 'S88Y00000',
'jacket':
{
'type': 'hardcover',
'local_url': None
},
'contentType': 'FICTION',
'format': 'BK',
'subtitle': 'The Two Towers',
'title': 'The Lord of the Rings',
'id': 'S88Y00000',
},
'availability':
{
'heldCopies': 0,
'singleBranch': False,
'metadataId': 'S88Y00000',
'statusType': 'AVAILABLE',
'totalCopies': 3,
'availableCopies': 2
}
},
'S77X12345':
{
'id': 'S77X12345',
'briefInfo':
{
'superFormats': ['BOOKS', 'MODERN_FORMATS'],
'genreForm': [],
'callNumber': '123.457',
'authors': ['Tolkien, J.R.R.'],
'metadataId': 'S77X12345',
'jacket':
{
'type': 'hardcover',
'local_url': None
},
'contentType': 'FICTION',
'format': 'BK',
'subtitle': 'The Fellowship of the Ring',
'title': 'The Lord of the Rings',
'id': 'S77X12345',
},
'availability':
{
'heldCopies': 0,
'singleBranch': False,
'metadataId': 'S77X12345',
'statusType': 'AVAILABLE',
'totalCopies': 2,
'availableCopies': 1
}
}
}
}
}
Anyone know of a better way to parse this data? Thanks!
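Rather than pulling each field with a separate generator and hoping the orders line up, one option is to build two DataFrames from the 'checkouts' and 'bibs' sections and merge them on 'metadataId', which appears in both. A minimal sketch against a stripped-down copy of the structure above (the field names come from the posted JSON; the sample values are placeholders):

```python
import pandas as pd

entities = {
    "checkouts": {
        "112233445566778899": {
            "dueDate": "2024-08-26",
            "metadataId": "S99Z000000",
            "bibTitle": "The Lord of the Rings",
        },
    },
    "bibs": {
        "S99Z000000": {
            "briefInfo": {"title": "The Lord of the Rings",
                          "subtitle": "The Two Towers",
                          "contentType": "FICTION"},
            "availability": {"heldCopies": 0, "availableCopies": 2},
        },
    },
}

# One row per checkout, keyed by checkoutId.
checkouts = pd.DataFrame.from_dict(entities["checkouts"], orient="index")

# Flatten briefInfo/availability for each bib and keep its metadataId.
bibs = pd.json_normalize(list(entities["bibs"].values())).assign(
    metadataId=list(entities["bibs"].keys())
)

# Merging on metadataId keeps title, subtitle, and dueDate together per book.
books = checkouts.merge(bibs, on="metadataId")[
    ["briefInfo.title", "briefInfo.subtitle", "briefInfo.contentType",
     "dueDate", "availability.heldCopies", "availability.availableCopies"]
].sort_values("dueDate")
print(books)
```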
I was given the task of flattening a JSON file, which basically means taking every array and turning it into a table (not an actual table; all I need to do is give them a spreadsheet with the mapping). The only thing I need to make sure of is that there is a key linking those tables. There are many layered/nested arrays in the JSON.
Is there any automatic way of achieving this? There isn't any logic involved, just making sure that all layers talk to each other.
You can convert the JSON to a dictionary in Python using json.load.
This dictionary can be converted to a DataFrame using pandas.DataFrame.
You can export this DataFrame as .csv using pandas.DataFrame.to_csv to be consumed in Postgres.
Note: this requires the pandas library to be installed. Alternatively, you can simply install Anaconda (if you are using any other IDE); most frequently used packages come installed with it.
Use the code below.
I have used the PrettyTable module for printing in a table-like structure. See https://www.geeksforgeeks.org/how-to-make-a-table-in-python/ for the table procedure.
Also, all the headers and values will be stored in the headers and values variables.
import json
from prettytable import PrettyTable

value = ['''
{
    "_id": {
        "$Col1": "XXXXXXX2443"
    },
    "col2": false,
    "col3": "359335050111111",
    "startedAt": {
        "$date": 1633309625000
    },
    "endedAt": {
        "$date": 1633310213000
    },
    "col4": "YYYYYYYYYYYYYYYYYY",
    "created_at": {
        "$date": 1633310846935
    },
    "updated_at": {
        "$date": 1633310846935
    },
    "__v": 0
}''']

dictionary = json.loads(value[0])

headers = []
values = []
for key in dictionary:
    head = key
    value = ""
    if isinstance(dictionary[key], dict):
        # flatten one level of nesting into "outer/inner" headers
        for key2 in dictionary[key]:
            head += "/" + key2
            value = dictionary[key][key2]
        headers.append(head)
        values.append(value)
    else:
        value = dictionary[key]
        headers.append(head)
        values.append(value)

print(headers)
print(values)

myTable = PrettyTable(headers)
myTable.add_row(values)
print(myTable)
Output
['_id/$Col1', 'col2', 'col3', 'startedAt/$date', 'endedAt/$date', 'col4', 'created_at/$date', 'updated_at/$date', '__v']
['XXXXXXX2443', False, '359335050111111', 1633309625000, 1633310213000, 'YYYYYYYYYYYYYYYYYY', 1633310846935, 1633310846935, 0]
+-------------+-------+-----------------+-----------------+---------------+--------------------+------------------+------------------+-----+
| _id/$Col1 | col2 | col3 | startedAt/$date | endedAt/$date | col4 | created_at/$date | updated_at/$date | __v |
+-------------+-------+-----------------+-----------------+---------------+--------------------+------------------+------------------+-----+
| XXXXXXX2443 | False | 359335050111111 | 1633309625000 | 1633310213000 | YYYYYYYYYYYYYYYYYY | 1633310846935 | 1633310846935 | 0 |
+-------------+-------+-----------------+-----------------+---------------+--------------------+------------------+------------------+-----+
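For comparison, pandas can produce the same slash-separated headers directly: json_normalize flattens nested dicts, and its sep parameter controls the joiner. A sketch on a shortened copy of the record above:

```python
import json
import pandas as pd

value = '''{"_id": {"$Col1": "XXXXXXX2443"},
            "col2": false,
            "startedAt": {"$date": 1633309625000},
            "__v": 0}'''

# json_normalize flattens nested keys; sep="/" reproduces the
# "_id/$Col1"-style headers that the loop above builds by hand.
df = pd.json_normalize(json.loads(value), sep="/")
print(df.to_string(index=False))
```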