There are many possible solutions. Generally though, you'll probably want to:
- Not loop over fields; instead let Pandas split the fields for you
- Use an actual missing value
  - But if you later want to represent it differently, you can do that, e.g. using the na_rep parameter to df.style.format

For the first step, you can look at Split / Explode a column of dictionaries into separate columns with pandas. I'll use Lech Birek's solution (json_normalize), then drop the "id" columns and rename the "value" columns.
headers_mapping = {'1': 'field1', '2': 'field2', '3': 'field3', '4': 'field4'}

(
    pd.json_normalize(df['json_field'])
    .filter(like='value')
    # removesuffix, not rstrip: rstrip strips a *set of characters* from the
    # right, which only happens to work here because the keys are digits
    .rename(columns=lambda label: headers_mapping[label.removesuffix('.value')])
)
field1 field2 field3 field4
0 value1 value2 NaN NaN
1 value1 NaN value3 NaN
2 NaN NaN value3 value4
If you also need to sort the columns, tack this on at the end:
.reindex(columns=headers_mapping.values())
Answer from wjandrea on Stack Overflow.
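For completeness, here is a self-contained sketch of that pipeline; the sample json_field data (including the id numbers) is my assumption, shaped to reproduce the output shown:

```python
import pandas as pd

# Hypothetical input mirroring the sample output: each cell maps a field id
# to a dict holding that field's id and value
df = pd.DataFrame({"json_field": [
    {"1": {"id": 101, "value": "value1"}, "2": {"id": 102, "value": "value2"}},
    {"1": {"id": 101, "value": "value1"}, "3": {"id": 103, "value": "value3"}},
    {"3": {"id": 103, "value": "value3"}, "4": {"id": 104, "value": "value4"}},
]})

headers_mapping = {"1": "field1", "2": "field2", "3": "field3", "4": "field4"}

out = (
    pd.json_normalize(df["json_field"])  # one column per nested key: '1.id', '1.value', ...
    .filter(like="value")                # keep only the '<n>.value' columns
    .rename(columns=lambda label: headers_mapping[label.removesuffix(".value")])
    .reindex(columns=list(headers_mapping.values()))  # enforce column order
)
print(out)

# na_rep, as mentioned above; to_string and to_csv also accept it for plain-text output
print(out.to_string(na_rep="-"))
```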
You can try:

import json

import pandas as pd

# apply `json.loads` if necessary
df["json_field"] = df["json_field"].apply(json.loads)

data = []
for d in df["json_field"]:
    dct = {}
    for k, v in d.items():
        dct[f"field{k}"] = v["value"]
    data.append(dct)

out = pd.DataFrame(data)
print(out)
Prints:
field1 field2 field3 field4
0 value1 value2 NaN NaN
1 value1 NaN value3 NaN
2 NaN NaN value3 value4
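The loop above can also be collapsed into a comprehension; a sketch with an assumed input of the same shape:

```python
import pandas as pd

# Assumed input: dicts already parsed, same shape as in the answer above
df = pd.DataFrame({"json_field": [
    {"1": {"value": "value1"}, "2": {"value": "value2"}},
    {"1": {"value": "value1"}, "3": {"value": "value3"}},
]})

# Build one flat dict per row; pandas fills the missing keys with NaN
out = pd.DataFrame(
    [{f"field{k}": v["value"] for k, v in d.items()} for d in df["json_field"]]
)
print(out)
```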
I hope I've understood your question well. Try:

from ast import literal_eval

import pandas as pd

df["experimental_properties"] = df["experimental_properties"].apply(
    lambda x: {d["name"]: d["property"] for d in literal_eval(x)}
)
df = pd.concat([df, df.pop("experimental_properties").apply(pd.Series)], axis=1)
print(df)
Prints:
Boiling Point Density
0 115.3 °C NaN
1 91 °C @ Press: 20 Torr NaN
2 58 °C @ Press: 12 Torr 0.8753 g/cm<sup>3</sup> @ Temp: 20 °C
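A self-contained version of the same approach, with sample experimental_properties strings assumed from the question's data:

```python
from ast import literal_eval

import pandas as pd

# Assumed sample: each cell is a string holding a list of property dicts
df = pd.DataFrame({"experimental_properties": [
    "[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]",
    "[{'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]",
]})

# Parse each string into a {name: property} dict, then spread it into columns
df["experimental_properties"] = df["experimental_properties"].apply(
    lambda x: {d["name"]: d["property"] for d in literal_eval(x)}
)
df = pd.concat([df, df.pop("experimental_properties").apply(pd.Series)], axis=1)
print(df)
```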
Is the expected output really what you are looking for? Another way to visualise the data would be to have "name", "property", and "sourceNumber" as column names.
import json
import pandas as pd

data = [
    '''[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]''',
    '''[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]''',
    '''[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]''',
]

# Initialise a list for the parsed rows
naiveList = []

# Convert each string to JSON (single quotes -> double quotes first)
for i in data:
    tempStringOfData = i.replace("\'", "\"")
    tempJsonData = json.loads(tempStringOfData)
    naiveList.append(tempJsonData)

# Flatten the nested lists into one list of dictionaries
newListOfDictionaries = []
for i in naiveList:
    for j in i:
        newListOfDictionaries.append(j)

df = pd.DataFrame(newListOfDictionaries)
print(df)
Which gives you:
name property sourceNumber
0 Boiling Point 115.3 °C 1
1 Boiling Point 91 °C @ Press: 20 Torr 1
2 Boiling Point 58 °C @ Press: 12 Torr 1
3 Density 0.8753 g/cm<sup>3</sup> @ Temp: 20 °C 1
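As a side note, the quote replacement above breaks if any value contains an apostrophe. Since the strings are Python-style literals, ast.literal_eval parses them directly and avoids that; a sketch:

```python
import ast

import pandas as pd

# Same assumed sample strings as above
data = [
    "[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]",
    "[{'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]",
]

# Parse each string and flatten the resulting lists in one pass
rows = [record for s in data for record in ast.literal_eval(s)]
df = pd.DataFrame(rows)
print(df)
```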
I have a large JSON Lines file that I am reading through in chunks using pandas read_json.
Everything is going well, except for one field that is coming across in its original JSON form, which is fine, but I need to further parse it into columns.
The field looks like:
{'food': 'apple', 'type': 'fruit'},{'food': 'beef', 'type': 'meat'},{'food': 'ice-cream', 'type': 'desert'}
I'd like to have three columns in the DataFrame, 'Food1', 'Food2', 'Food3', that I would like to populate with data from this field - there are 28 columns before these 3 that read_json is working fine for.
Some rows don't have the above field populated.
But for this record, I'd like the result to be:
| col1 | col2 | .... | col28 | food1 | food2 | food3 |
|---|---|---|---|---|---|---|
| xxx | xxx | xxx | xxx | apple | beef | ice-cream |
There seem to be three issues:

1. extracting the JSON so I can parse how many food items I have in this record (0-3)
2. applying this to the entire chunk that I've read from the JSON Lines file
3. dealing with lines that don't have this field filled in
I could write some code to parse each line individually, but this looks like it will be much slower than using pandas.
I've tried json.loads and ast.literal_eval, and neither seems to get me closer to what I'm looking for... help!
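One hedged sketch of an approach: the column names and the parse_foods helper below are mine, and I'm assuming the field arrives as a string of comma-separated dicts, exactly as shown above.

```python
import ast

import pandas as pd

def parse_foods(cell):
    """Parse a string like "{'food': 'apple', ...},{'food': 'beef', ...}"
    into a list of food names; return [] for missing or empty cells."""
    if not isinstance(cell, str) or not cell.strip():
        return []
    # Wrap in brackets so the comma-separated dicts form one valid list literal
    return [d["food"] for d in ast.literal_eval(f"[{cell}]")]

# Assumed sample frame: one populated row, one row with the field missing
df = pd.DataFrame({
    "col1": ["xxx", "yyy"],
    "foods": [
        "{'food': 'apple', 'type': 'fruit'},{'food': 'beef', 'type': 'meat'},{'food': 'ice-cream', 'type': 'desert'}",
        None,
    ],
})

# Spread each list into its own columns; rows with fewer items get NaN
foods = df["foods"].apply(parse_foods).apply(pd.Series)
foods.columns = [f"food{i + 1}" for i in range(foods.shape[1])]
df = df.drop(columns="foods").join(foods)
print(df)
```

Because the parsing is applied column-wise, the same call works unchanged on each chunk read from the JSON Lines file.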
Hi!
I have a CSV file that consists of an id, which is a unique movie, and the keywords for this movie. It looks something like this: 15602,"[{'id': 1495, 'name': 'fishing'}, {'id': 12392, 'name': 'best friend'}, {'id': 179431, 'name': 'duringcreditsstinger'}, {'id': 208510, 'name': 'old men'}]"
I want to split the data so every movie (the id) gets every keyword. But using pd.read_csv, I only get one column with the id and one column with all the keywords, including the keyword id and 'name'. Is there any solution to get only the specific keyword?
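A sketch of one way to do this with ast.literal_eval and explode; the column names are my assumption:

```python
import ast

import pandas as pd

# Assumed shape after read_csv: one id column, one stringified-list column
df = pd.DataFrame({
    "id": [15602],
    "keywords": ["[{'id': 1495, 'name': 'fishing'}, {'id': 12392, 'name': 'best friend'}]"],
})

# Parse the string into a list of dicts, keeping only the keyword names
df["keywords"] = df["keywords"].apply(
    lambda s: [d["name"] for d in ast.literal_eval(s)]
)

# One row per (movie id, keyword) pair
out = df.explode("keywords", ignore_index=True)
print(out)
```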
You should add the ignore_index=True argument to explode to make sure the following join isn't messed up by duplicated index labels:
df = pd.DataFrame(data).explode('countries', ignore_index=True)
df = df.join(pd.json_normalize(df.pop('countries')))
print(df)
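A self-contained sketch of that pattern, with assumed sample data:

```python
import pandas as pd

# Hypothetical sample: each row holds a year and a list of country records
data = {
    "year": [1800, 1900],
    "countries": [
        [{"country": "Yugoslavia", "population": 4687422},
         {"country": "United Korea (former)", "population": 13740000}],
        [{"country": "Yugoslavia", "population": 4687422}],
    ],
}

# ignore_index=True renumbers the exploded rows 0..n-1, so the join below
# aligns each normalized record with exactly one row
df = pd.DataFrame(data).explode("countries", ignore_index=True)
df = df.join(pd.json_normalize(df.pop("countries")))
print(df)
```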
You could try this with explode:

df = df.explode('countries')
# add to each dictionary the respective value of year under the key 'year'
df['countries'] = [{**dc, 'year': y} for dc, y in zip(df['countries'], df['year'])]
pd.DataFrame(df['countries'].tolist())
Example:
j = [{'continent': 'europe',
      'country': 'Yugoslavia',
      'income': None,
      'life_exp': None,
      'population': 4687422},
     {'continent': 'asia',
      'country': 'United Korea (former)',
      'income': None,
      'life_exp': None,
      'population': 13740000}]

df = pd.DataFrame({'countries': [j, j], 'year': [1800, 1900]})
print(df)

df = df.explode('countries')
print(df)

# Add the key 'year' with the respective row's year value to each dictionary
df['countries'] = [{**dc, 'year': y} for dc, y in zip(df['countries'], df['year'])]
print(df['countries'])

finaldf = pd.DataFrame(df['countries'].tolist())
print(finaldf)
Output:
original df:
countries year
0 [{'continent': 'europe', 'country': 'Yugoslavi... 1800
1 [{'continent': 'europe', 'country': 'Yugoslavi... 1900
df(after explode):
countries year
0 {'continent': 'europe', 'country': 'Yugoslavia... 1800
0 {'continent': 'asia', 'country': 'United Korea... 1800
1 {'continent': 'europe', 'country': 'Yugoslavia... 1900
1 {'continent': 'asia', 'country': 'United Korea... 1900
df.countries(with year added):
0 {'continent': 'europe', 'country': 'Yugoslavia', 'income': None, 'life_exp': None, 'population': 4687422, 'year': 1800}
0 {'continent': 'asia', 'country': 'United Korea (former)', 'income': None, 'life_exp': None, 'population': 13740000, 'year': 1800}
1 {'continent': 'europe', 'country': 'Yugoslavia', 'income': None, 'life_exp': None, 'population': 4687422, 'year': 1900}
1 {'continent': 'asia', 'country': 'United Korea (former)', 'income': None, 'life_exp': None, 'population': 13740000, 'year': 1900}
Name: countries, dtype: object
finaldf
continent country income life_exp population year
0 europe Yugoslavia None None 4687422 1800
1 asia United Korea (former) None None 13740000 1800
2 europe Yugoslavia None None 4687422 1900
3 asia United Korea (former) None None 13740000 1900
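An alternative worth knowing: pd.json_normalize can flatten the nested records and carry year along via record_path and meta, without an explicit explode. A sketch on data shaped like the example above:

```python
import pandas as pd

# Same shape as the example: each record holds a year and a list of countries
records = [
    {"year": 1800,
     "countries": [{"continent": "europe", "country": "Yugoslavia", "population": 4687422},
                   {"continent": "asia", "country": "United Korea (former)", "population": 13740000}]},
    {"year": 1900,
     "countries": [{"continent": "europe", "country": "Yugoslavia", "population": 4687422}]},
]

# Flatten each 'countries' list into rows, attaching the parent 'year'
finaldf = pd.json_normalize(records, record_path="countries", meta="year")
print(finaldf)
```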