There are many possible solutions. Generally though, you'll probably want to:

  1. Not loop over fields; instead let Pandas split the fields for you
  2. Use an actual missing value
    • But later if you want to represent it differently, you can do that, e.g. using the na_rep parameter to df.style.format
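As a rough illustration of point 2, the sketch below keeps real NaN values in the data and only swaps in a placeholder at display time. It uses DataFrame.to_string(na_rep=...) instead of the df.style.format call mentioned above, purely so that it runs without the Styler's extra dependency; the frame and the "-" placeholder are made-up examples.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: missing cells are stored as NaN, not as a placeholder string.
table = pd.DataFrame({
    "field1": ["value1", np.nan],
    "field2": [np.nan, "value2"],
})

# The placeholder is applied only when rendering; the data itself is unchanged.
rendered = table.to_string(na_rep="-")
print(rendered)

# Missing values are still detectable afterwards.
missing_count = int(table.isna().sum().sum())
```

Because the underlying values stay NaN, methods such as isna, dropna and fillna keep working as expected.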

For the first step, you can look at Split / Explode a column of dictionaries into separate columns with pandas. I'll use Lech Birek's solution (json_normalize) then drop the "id" columns and rename the "value" columns.

headers_mapping = {'1': 'field1', '2': 'field2', '3': 'field3', '4': 'field4'}
(
    pd.json_normalize(df['json_field'])
    .filter(like='value')
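    # removesuffix (Python 3.9+) drops the literal '.value' suffix; rstrip would strip any of those characters
    .rename(columns=lambda label: headers_mapping[label.removesuffix('.value')])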
)
   field1  field2  field3  field4
0  value1  value2     NaN     NaN
1  value1     NaN  value3     NaN
2     NaN     NaN  value3  value4

If you also need to sort the columns, tack this on at the end:

.reindex(columns=headers_mapping.values())
Answer from wjandrea on Stack Overflow
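For readers who want to run the approach end to end, here is a minimal self-contained sketch. The sample rows are an assumption reconstructed from the output table above (the original question's data is not shown here), and the "id" numbers are invented.

```python
import pandas as pd

# Assumed input: one dict per row, keyed by field number, each holding an id and a value.
df = pd.DataFrame({
    "json_field": [
        {"1": {"id": 10, "value": "value1"}, "2": {"id": 20, "value": "value2"}},
        {"1": {"id": 10, "value": "value1"}, "3": {"id": 30, "value": "value3"}},
        {"3": {"id": 30, "value": "value3"}, "4": {"id": 40, "value": "value4"}},
    ]
})

headers_mapping = {"1": "field1", "2": "field2", "3": "field3", "4": "field4"}

wide = (
    pd.json_normalize(df["json_field"].tolist())  # columns like '1.id', '1.value', ...
    .filter(like="value")                         # keep only the '<n>.value' columns
    .rename(columns=lambda label: headers_mapping[label.removesuffix(".value")])
    .reindex(columns=list(headers_mapping.values()))  # fixed column order
)
print(wide)
```

Fields that are absent from a row come out as NaN, which is the "actual missing value" the answer recommends.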
🌐
Stack Overflow
stackoverflow.com › questions › 64916148 › how-to-split-a-json-string-column-in-pandas-spark-dataframe
python - How to split a json string column in pandas/spark dataframe? - Stack Overflow
raw_data = \
    [{'id': 1, 'name': 'NATALIE', 'json_result': '{"0": {"_source": {"person_id": 101, "firstname": "NATALIE", "lastname": "OSHO", "city_name": "WESTON"}}}'}, \
     {'id': 2, 'name': 'MARK', 'json_result': '{"0": {"_source": {"person_id": 102, "firstname": "MARK", "lastname": "BROWN", "city_name": "NEW YORK"}}}'}, \
     {'id': 3, 'name': 'NANCY', 'json_result': '{"0": {"_source": {"person_id": 103, "firstname": "NANCY", "lastname": "GATES", "city_name": "LA"}}}'}]
df = spark.createDataFrame(raw_data)
json_schema = spark.read.json(df.rdd.map(lambda rec: rec.json_result)).schema
df = df.withColumn('json', F.from_json(F.col('json_result'), json_schema)) \
    .select("id", "name", "json.0._source.*")
df.show()
...
I'd like for somebody experienced in pandas to show me a better way but this is what I came up with.
Discussions

python - How can I split a pandas column containing a json into new columns in a given dataframe? - Stack Overflow
I am looking to split a dataframe column that contains a string of a dictionary into separate columns. I've seen a few methods, but I want to avoid splitting the string since there are some More on stackoverflow.com
🌐 stackoverflow.com
How to extract JSON into DataFrame columns?
Whats your current code and can you post a example of the json? More on reddit.com
🌐 r/learnpython
October 17, 2022
Dividing json row data into multiple columns of pandas dataframe - Stack Overflow
While reading data from json to pandas, a multi criteria hotel ratings columns is read as shown below. I have 2 columns in my dataframe Ratings and ReviewID. Since I read the dataframe from a large... More on stackoverflow.com
🌐 stackoverflow.com
February 1, 2020
Help with splitting JSON column from dataframe
Hi there, first time posting. I'm looking for some assistance with dealing with a dataframe which contains a JSON column (I hope I am describing it accurately as such, I'm fairly new to dealing with data of this kind). I have some data I have collected from a pilot experiment. More on forum.posit.co
🌐 forum.posit.co
January 23, 2020
🌐
Stack Overflow
stackoverflow.com › questions › 54971005 › splitting-a-pandas-data-frames-column-containing-json-data-into-multiple-column
python 3.x - Splitting a pandas data frame's column containing json data into multiple columns - Stack Overflow
March 4, 2019 - I loaded and normalized a json data as: json_string = json.loads(data) df_norm = json_normalize(json_string, errors='ignore') Say it has now 2 columns: Group Members A [{'id':'1', '
Top answer
1 of 2
1

I hope I've understood your question well. Try:

from ast import literal_eval
import pandas as pd

df["experimental_properties"] = df["experimental_properties"].apply(
    lambda x: {d["name"]: d["property"] for d in literal_eval(x)}
)
df = pd.concat([df, df.pop("experimental_properties").apply(pd.Series)], axis=1)

print(df)

Prints:

            Boiling Point                                Density
0                115.3 °C                                    NaN
1  91 °C @ Press: 20 Torr                                    NaN
2  58 °C @ Press: 12 Torr  0.8753 g/cm<sup>3</sup> @ Temp: 20 °C
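The following is a self-contained variant of the same idea, offered as a sketch rather than the answerer's exact code. It assumes the column holds strings that are Python literals (lists of dicts), uses abbreviated sample rows, and builds the wide frame from a list of dicts instead of apply(pd.Series), which tends to be slow on large frames.

```python
from ast import literal_eval

import pandas as pd

# Assumed input: each cell is a string holding a Python-literal list of dicts.
df = pd.DataFrame({
    "experimental_properties": [
        "[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]",
        "[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, "
        "{'name': 'Density', 'property': '0.8753 g/cm3', 'sourceNumber': 1}]",
    ]
})

# Parse each string safely, then turn every row into a {name: property} dict.
parsed = df["experimental_properties"].map(literal_eval)
records = [{d["name"]: d["property"] for d in row} for row in parsed]

# One column per property name; names missing from a row become NaN.
wide = pd.DataFrame(records, index=df.index)
print(wide)
```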
2 of 2
0

Is the expected output really what you are looking for? Another way to visualise the data would be to have "name", "property", and "sourceNumber" as column names.

import json
import pandas as pd

data = [
'''[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]''']

# Parse each string into a list of dicts. The strings use single quotes,
# so swap the quote style before json.loads (this breaks if a value
# itself contains an apostrophe).
naiveList = [json.loads(s.replace("'", '"')) for s in data]

# Flatten the list of lists into a single list of dictionaries
newListOfDictionaries = [d for row in naiveList for d in row]

df = pd.DataFrame(newListOfDictionaries)
print(df)

Which gives you

            name                               property  sourceNumber
0  Boiling Point                               115.3 °C             1
1  Boiling Point                 91 °C @ Press: 20 Torr             1
2  Boiling Point                 58 °C @ Press: 12 Torr             1
3        Density  0.8753 g/cm<sup>3</sup> @ Temp: 20 °C             1
🌐
Reddit
reddit.com › r/learnpython › how to extract json into dataframe columns?
r/learnpython on Reddit: How to extract JSON into DataFrame columns?
October 17, 2022

I have a large line JSON file that I am reading through in chunks using pandas read_json.

Everything is going well, except for one field that is coming across in its original JSON form, which is fine, but I need to further parse it into columns.

The field looks like:

{'food': 'apple', 'type': 'fruit'},{'food': 'beef', 'type': 'meat'},{'food': 'ice-cream', 'type': 'desert'}

I'd like to have three columns in the DataFrame, 'Food1', 'Food2', 'Food3', that I would like to populate with data from this field. There are 28 columns before these 3 that read_json is working fine for.

Some rows don't have the above field populated.

But for this record, I'd like the result to be:

col1 col2 .... col28 food1 food2 food3
xxx xxx xxx xxx apple beef ice-cream

There seem to be three issues.

  1. extracting the JSON so I can parse how many food items I have in this record (0-3)

  2. applying this to the entire chunk that I've read from the JSON Lines file

  3. dealing with lines that don't have this field filled in

I could write some code to parse each line individually, but this looks like it will be much slower than using pandas.

I've tried json.loads and ast.literal_eval, and neither seems to be getting me closer to what I'm looking for... help!
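One possible approach to the three issues listed above is sketched below. It assumes that read_json has already parsed the field into a Python list of dicts (or left it missing); the column name "foods", the helper food_names and the sample chunk are hypothetical stand-ins for the poster's data.

```python
import pandas as pd

# Hypothetical chunk: one row with three food items, one row with the field missing.
chunk = pd.DataFrame({
    "col1": ["xxx", "yyy"],
    "foods": [
        [
            {"food": "apple", "type": "fruit"},
            {"food": "beef", "type": "meat"},
            {"food": "ice-cream", "type": "desert"},
        ],
        None,
    ],
})

def food_names(cell):
    """Return the 'food' values from a list of dicts, or [] if the field is missing."""
    if not isinstance(cell, list):
        return []
    return [item.get("food") for item in cell]

# Build a ragged table of names, pad or trim it to exactly three columns, and label them.
names = pd.DataFrame(
    [food_names(cell) for cell in chunk["foods"]], index=chunk.index
).reindex(columns=range(3))
names.columns = ["food1", "food2", "food3"]

result = chunk.drop(columns="foods").join(names)
print(result)
```

The same transformation can be applied to each chunk inside the read_json loop. If the field arrives as a raw string instead of a list, it would need to be parsed first.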

🌐
Pandas
pandas.pydata.org › docs › reference › api › pandas.DataFrame.to_json.html
pandas.DataFrame.to_json — pandas 3.0.2 documentation
orient='table' contains a ‘pandas_version’ field under ‘schema’. This stores the version of pandas used in the latest revision of the schema. ...

>>> from json import loads, dumps
>>> df = pd.DataFrame(
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "columns": [
        "col 1",
        "col 2"
    ],
    "index": [
        "row 1",
        "row 2"
    ],
    "data": [
        [
            "a",
            "b"
        ],
        [
            "c",
            "d"
        ]
    ]
}
🌐
CopyProgramming
copyprogramming.com › howto › pandas-dataframe-split-json-into-columns
Json: Splitting JSON into columns in a Pandas dataframe
June 24, 2023 - Slice pandas dataframe json column ... (str) only change the bibliographic and series position. You need to process the columns individually and join them all together to get the format that you need....
🌐
Swdevnotes
swdevnotes.com › python › 2022 › extract-data-from-json-in-pandas-dataframe
Extract data from JSON in Pandas Dataframe | Software Development Notes
July 24, 2022 - Here are a number of ways to extract all the elements from json objects at once and append the data as columns to the Dataframe. The first loads the JSON data twice once for values and once for keys, this could be improved by defining a function to load the json and return a pandas series.
🌐
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_json.html
pandas.read_json — pandas 3.0.1 documentation
This is because index is also used by DataFrame.to_json() to denote a missing Index name, and the subsequent read_json() operation cannot distinguish between the two. The same limitation is encountered with a MultiIndex and any names beginning with 'level_'. ...

>>> from io import StringIO
>>> df = pd.DataFrame(
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )

Encoding/decoding a Dataframe using 'split' formatted JSON:
Top answer
1 of 4
1

You should pass ignore_index=True to explode. Without it the exploded rows keep duplicated index labels, and the following join, which aligns on the index, produces misaligned or repeated rows.

df = pd.DataFrame(data).explode('countries', ignore_index=True)
df = df.join(pd.json_normalize(df.pop('countries')))
print(df)
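A small runnable sketch of this pattern follows. The sample data is an assumption, trimmed down from the example given in the other answer; .tolist() is added so that json_normalize returns a fresh 0..n-1 index that lines up with the index produced by ignore_index=True.

```python
import pandas as pd

# Assumed sample data: each row holds a year and a list of country dicts.
countries = [
    {"continent": "europe", "country": "Yugoslavia", "population": 4687422},
    {"continent": "asia", "country": "United Korea (former)", "population": 13740000},
]
data = {"year": [1800, 1900], "countries": [countries, countries]}

# One row per country dict; ignore_index=True renumbers the rows 0..3.
df = pd.DataFrame(data).explode("countries", ignore_index=True)

# Expand the dicts into columns and join them back on the matching index.
flat = df.join(pd.json_normalize(df.pop("countries").tolist()))
print(flat)
```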
2 of 4
0

You could try this with explode:

df=df.explode('countries')
#we add to each dictionary the respective value of year with key 'year'
df['countries']=[{**dc,**{'year':y}} for dc,y in zip(df['countries'],df['year'])]
pd.DataFrame(df['countries'].tolist())

Example:

j = [{'continent': 'europe',
 'country': 'Yugoslavia',
 'income': None,
  'life_exp': None,
'population': 4687422},
{'continent': 'asia',
'country': 'United Korea (former)',
'income': None,
'life_exp': None,
'population': 13740000}]
df=pd.DataFrame({'countries':[j,j],'year':[1800,1900]})
print(df)

df=df.explode('countries')
print(df)

#Here we add the key 'year' with the respective year row value to each dictionary
df['countries']=[{**dc,**{'year':y}} for dc,y in zip(df['countries'],df['year'])]
print(df['countries'])

finaldf=pd.DataFrame(df['countries'].tolist())
print(finaldf)

Output:

original df:
                                           countries  year
0  [{'continent': 'europe', 'country': 'Yugoslavi...  1800
1  [{'continent': 'europe', 'country': 'Yugoslavi...  1900


    

df (after explode):
                                           countries  year
0  {'continent': 'europe', 'country': 'Yugoslavia...  1800
0  {'continent': 'asia', 'country': 'United Korea...  1800
1  {'continent': 'europe', 'country': 'Yugoslavia...  1900
1  {'continent': 'asia', 'country': 'United Korea...  1900


df.countries (with year added):
0    {'continent': 'europe', 'country': 'Yugoslavia', 'income': None, 'life_exp': None, 'population': 4687422, 'year': 1800}
0    {'continent': 'asia', 'country': 'United Korea (former)', 'income': None, 'life_exp': None, 'population': 13740000, 'year': 1800}
1    {'continent': 'europe', 'country': 'Yugoslavia', 'income': None, 'life_exp': None, 'population': 4687422, 'year': 1900}
1    {'continent': 'asia', 'country': 'United Korea (former)', 'income': None, 'life_exp': None, 'population': 13740000, 'year': 1900}
Name: countries, dtype: object

finaldf
  continent                country income life_exp  population  year
0    europe             Yugoslavia   None     None     4687422  1800
1      asia  United Korea (former)   None     None    13740000  1800
2    europe             Yugoslavia   None     None     4687422  1900
3      asia  United Korea (former)   None     None    13740000  1900
🌐
Kaggle
kaggle.com › code › jboysen › quick-tutorial-flatten-nested-json-in-pandas
Quick Tutorial: Flatten Nested JSON in Pandas