Answer from Bostjan on Stack Overflow
Top answer (1 of 10, score 73)

I used the following function (details can be found here):

def flatten_data(y):
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            out[name[:-1]] = x  # strip the trailing '_'

    flatten(y)
    return out

Unfortunately, this flattens the whole JSON: with a multi-level JSON (many nested dictionaries), everything gets flattened into a single row with a huge number of columns.

What I used in the end was json_normalize(), specifying the structure I required. A nice example of how to do it that way can be found here.
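For reference, a minimal sketch of the json_normalize() route. The payload below is invented, loosely shaped like the question's data; record_path picks which nested list becomes the rows, while meta names the parent fields carried along as columns:

```python
import pandas as pd

# Hypothetical nested payload, shaped like the question's "virtualmachine" list
data = {
    "count": 2,
    "virtualmachine": [
        {"id": "vm-1", "name": "test-1", "nic": [{"id": "nic-1"}, {"id": "nic-2"}]},
        {"id": "vm-2", "name": "test-2", "nic": [{"id": "nic-3"}]},
    ],
}

# One row per NIC, with the parent VM's id/name repeated on each row
df = pd.json_normalize(
    data["virtualmachine"],
    record_path="nic",
    meta=["id", "name"],
    record_prefix="nic.",
    meta_prefix="vm.",
)
print(df)  # columns: nic.id, vm.id, vm.name; one row per NIC
```

Only the chosen level is expanded, so the column explosion described above never happens.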

2 of 10 (score 20)

Cross-posting (but then adapting further) from https://stackoverflow.com/a/62186053/4355695 : in this repo, https://github.com/ScriptSmith/socialreaper/blob/master/socialreaper/tools.py#L8 , I found an implementation of the list-inclusion comment by @roneo on the answer posted by @Imran.

I've added checks for empty lists and empty dicts, and also print lines that help one see precisely how the function works. You can turn those print statements on by passing crumbs=True in the function's args.

from collections.abc import MutableMapping
def flatten(dictionary, parent_key=False, separator='.', crumbs=False):
    """
    Turn a nested dictionary into a flattened dictionary
    :param dictionary: The dictionary to flatten
    :param parent_key: The string to prepend to dictionary's keys
    :param separator: The string used to separate flattened keys
    :param crumbs: Print a trace of what the function is doing
    :return: A flattened dictionary
    """

    items = []
    for key, value in dictionary.items():
        if crumbs: print('checking:', key)
        new_key = str(parent_key) + separator + key if parent_key else key
        if isinstance(value, MutableMapping):
            if crumbs: print(new_key, ': dict found')
            if not value.items():
                if crumbs: print('Adding key-value pair:', new_key, None)
                items.append((new_key, None))
            else:
                items.extend(flatten(value, new_key, separator, crumbs).items())
        elif isinstance(value, list):
            if crumbs: print(new_key, ': list found')
            if len(value):
                for k, v in enumerate(value):
                    items.extend(flatten({str(k): v}, new_key, separator, crumbs).items())
            else:
                if crumbs: print('Adding key-value pair:', new_key, None)
                items.append((new_key, None))
        else:
            if crumbs: print('Adding key-value pair:', new_key, value)
            items.append((new_key, value))
    return dict(items)

Test it:

ans = flatten({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3], 'e': {'f': [], 'g': {}}}, crumbs=True)
print('\nflattened:',ans)

Output:

checking: a
Adding key-value pair: a 1
checking: c
c : dict found
checking: a
Adding key-value pair: c.a 2
checking: b
c.b : dict found
checking: x
Adding key-value pair: c.b.x 5
checking: y
Adding key-value pair: c.b.y 10
checking: d
d : list found
checking: 0
Adding key-value pair: d.0 1
checking: 1
Adding key-value pair: d.1 2
checking: 2
Adding key-value pair: d.2 3
checking: e
e : dict found
checking: f
e.f : list found
Adding key-value pair: e.f None
checking: g
e.g : dict found
Adding key-value pair: e.g None

flattened: {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'd.0': 1, 'd.1': 2, 'd.2': 3, 'e.f': None, 'e.g': None}

And that does the job I need done: I throw any complicated JSON at this and it flattens it out for me. I added a check to the original code to handle empty lists too.

Credits to https://github.com/ScriptSmith, whose repo I found the initial flatten function in.

Testing the OP's sample JSON, here's the output:

{'count': 13,
 'virtualmachine.0.id': '1082e2ed-ff66-40b1-a41b-26061afd4a0b',
 'virtualmachine.0.name': 'test-2',
 'virtualmachine.0.displayname': 'test-2',
 'virtualmachine.0.securitygroup.0.id': '9e649fbc-3e64-4395-9629-5e1215b34e58',
 'virtualmachine.0.securitygroup.0.name': 'test',
 'virtualmachine.0.securitygroup.0.tags': None,
 'virtualmachine.0.nic.0.id': '79568b14-b377-4d4f-b024-87dc22492b8e',
 'virtualmachine.0.nic.0.networkid': '05c0e278-7ab4-4a6d-aa9c-3158620b6471',
 'virtualmachine.0.nic.1.id': '3d7f2818-1f19-46e7-aa98-956526c5b1ad',
 'virtualmachine.0.nic.1.networkid': 'b4648cfd-0795-43fc-9e50-6ee9ddefc5bd',
 'virtualmachine.0.nic.1.traffictype': 'Guest',
 'virtualmachine.0.hypervisor': 'KVM',
 'virtualmachine.0.affinitygroup': None,
 'virtualmachine.0.isdynamicallyscalable': False}

So you'll see that the 'tags' and 'affinitygroup' keys are also handled and added to the output; the original code was omitting them.
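Once you have such flattened dicts, writing the CSV is one csv.DictWriter away. A minimal sketch with invented keys, taking the union of all keys so rows with differing fields still line up (restval fills the gaps):

```python
import csv
import io

# Suppose the flattener produced one dict per record; keys here are illustrative
rows = [
    {"id": "vm-1", "nic.0.id": "nic-1", "tags": None},
    {"id": "vm-2", "nic.0.id": "nic-3", "hypervisor": "KVM"},
]

# Union of keys across all rows, so no row's fields are silently dropped
fieldnames = sorted({k for row in rows for k in row})

buf = io.StringIO()  # stand-in for a real file opened with newline=''
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Keys absent from a given row come out as empty cells rather than raising an error.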

2021-05-30 : Updated: collections.MutableMapping is changed to collections.abc.MutableMapping

2023-01-11 : edited, added separator arg in second items.extend() call as advised by @MHebes

2024-02-20 : how did that .abc go missing from the import statement?

2025-07-21 : moved crumbs param into function's args

Top answer (1 of 5, score 32)

Please scroll down for the newer, faster solution

This is an older question, but I struggled the entire night to get a satisfactory result for a similar situation, and I came up with this:

import json
import pandas

def cross_join(left, right):
    return left.assign(key=1).merge(right.assign(key=1), on='key', how='outer').drop(columns='key')

def json_to_dataframe(data_in):
    def to_frame(data, prev_key=''):
        if isinstance(data, dict):
            df = pandas.DataFrame()
            for key in data:
                df = cross_join(df, to_frame(data[key], prev_key + '.' + key))
        elif isinstance(data, list):
            df = pandas.DataFrame()
            for i in range(len(data)):
                df = pandas.concat([df, to_frame(data[i], prev_key)])
        else:
            df = pandas.DataFrame({prev_key[1:]: [data]})
        return df
    return to_frame(data_in)

if __name__ == '__main__':
    with open('somefile') as json_file:
        json_data = json.load(json_file)

    df = json_to_dataframe(json_data)
    df.to_csv('data.csv', mode='w')

Explanation:

The cross_join function is a neat way I found to do a cartesian product. (credit: here)
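The constant-key merge trick in isolation, on toy frames (not from the question):

```python
import pandas as pd

# Two small frames; joining on a constant key makes every row pair up
left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"b": ["x", "y", "z"]})

prod = (
    left.assign(key=1)
    .merge(right.assign(key=1), on="key")
    .drop(columns="key")
)
print(prod)  # 2 x 3 = 6 rows, columns a and b
```

Recent pandas (1.2+) can also do this directly with left.merge(right, how='cross').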

The json_to_dataframe function does the logic, using pandas dataframes. In my case the JSON was deeply nested: I wanted to split dictionary key:value pairs into columns, but turn the lists into rows for a column -- hence the concat -- which I then cross join with the upper level, multiplying the number of records so that each value from the list gets its own row while the previous columns stay identical.

The recursiveness creates stacks that cross join with the one below, until the last one is returned.

Then, with the dataframe in table format, it's easy to convert to CSV with the df.to_csv() method.

This should work with deeply nested JSON, being able to normalize all of it into rows by the logic described above.

I hope this will help someone, someday. Just trying to give back to this awesome community.

---------------------------------------------------------------------------------------------

LATER EDIT: NEW SOLUTION

I'm coming back to this because, while the dataframe option kind of worked, it took the app minutes to parse not-so-large JSON data. So I thought of doing what the dataframes do, but by myself:

from copy import deepcopy
import pandas


def cross_join(left, right):
    new_rows = [] if right else left
    for left_row in left:
        for right_row in right:
            temp_row = deepcopy(left_row)
            for key, value in right_row.items():
                temp_row[key] = value
            new_rows.append(deepcopy(temp_row))
    return new_rows


def flatten_list(data):
    for elem in data:
        if isinstance(elem, list):
            yield from flatten_list(elem)
        else:
            yield elem


def json_to_dataframe(data_in):
    def flatten_json(data, prev_heading=''):
        if isinstance(data, dict):
            rows = [{}]
            for key, value in data.items():
                rows = cross_join(rows, flatten_json(value, prev_heading + '.' + key))
        elif isinstance(data, list):
            rows = []
            for item in data:
                rows.extend(flatten_list(flatten_json(item, prev_heading)))
        else:
            rows = [{prev_heading[1:]: data}]
        return rows

    return pandas.DataFrame(flatten_json(data_in))


if __name__ == '__main__':
    json_data = {
        "id": "0001",
        "type": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters":
            {
                "batter":
                    [
                        {"id": "1001", "type": "Regular"},
                        {"id": "1002", "type": "Chocolate"},
                        {"id": "1003", "type": "Blueberry"},
                        {"id": "1004", "type": "Devil's Food"}
                    ]
            },
        "topping":
            [
                {"id": "5001", "type": "None"},
                {"id": "5002", "type": "Glazed"},
                {"id": "5005", "type": "Sugar"},
                {"id": "5007", "type": "Powdered Sugar"},
                {"id": "5006", "type": "Chocolate with Sprinkles"},
                {"id": "5003", "type": "Chocolate"},
                {"id": "5004", "type": "Maple"}
            ],
        "something": []
    }
    df = json_to_dataframe(json_data)
    print(df)

OUTPUT:

      id   type  name   ppu batters.batter.id batters.batter.type topping.id              topping.type
0   0001  donut  Cake  0.55              1001             Regular       5001                      None
1   0001  donut  Cake  0.55              1001             Regular       5002                    Glazed
2   0001  donut  Cake  0.55              1001             Regular       5005                     Sugar
3   0001  donut  Cake  0.55              1001             Regular       5007            Powdered Sugar
4   0001  donut  Cake  0.55              1001             Regular       5006  Chocolate with Sprinkles
5   0001  donut  Cake  0.55              1001             Regular       5003                 Chocolate
6   0001  donut  Cake  0.55              1001             Regular       5004                     Maple
7   0001  donut  Cake  0.55              1002           Chocolate       5001                      None
8   0001  donut  Cake  0.55              1002           Chocolate       5002                    Glazed
9   0001  donut  Cake  0.55              1002           Chocolate       5005                     Sugar
10  0001  donut  Cake  0.55              1002           Chocolate       5007            Powdered Sugar
11  0001  donut  Cake  0.55              1002           Chocolate       5006  Chocolate with Sprinkles
12  0001  donut  Cake  0.55              1002           Chocolate       5003                 Chocolate
13  0001  donut  Cake  0.55              1002           Chocolate       5004                     Maple
14  0001  donut  Cake  0.55              1003           Blueberry       5001                      None
15  0001  donut  Cake  0.55              1003           Blueberry       5002                    Glazed
16  0001  donut  Cake  0.55              1003           Blueberry       5005                     Sugar
17  0001  donut  Cake  0.55              1003           Blueberry       5007            Powdered Sugar
18  0001  donut  Cake  0.55              1003           Blueberry       5006  Chocolate with Sprinkles
19  0001  donut  Cake  0.55              1003           Blueberry       5003                 Chocolate
20  0001  donut  Cake  0.55              1003           Blueberry       5004                     Maple
21  0001  donut  Cake  0.55              1004        Devil's Food       5001                      None
22  0001  donut  Cake  0.55              1004        Devil's Food       5002                    Glazed
23  0001  donut  Cake  0.55              1004        Devil's Food       5005                     Sugar
24  0001  donut  Cake  0.55              1004        Devil's Food       5007            Powdered Sugar
25  0001  donut  Cake  0.55              1004        Devil's Food       5006  Chocolate with Sprinkles
26  0001  donut  Cake  0.55              1004        Devil's Food       5003                 Chocolate
27  0001  donut  Cake  0.55              1004        Devil's Food       5004                     Maple

As per what the above does, well, the cross_join function does pretty much the same thing as in the dataframe solution, but without dataframes, thus being faster.

I added the flatten_list generator to make sure the JSON arrays are all nicely flattened, then provided as a single list of dictionaries, each comprising the key from the previous iteration assigned to one of the list's values. This pretty much mimics the pandas.concat behaviour in this case.
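The generator on its own, applied to a nested list:

```python
def flatten_list(data):
    # Yield the leaves of an arbitrarily nested list, left to right
    for elem in data:
        if isinstance(elem, list):
            yield from flatten_list(elem)
        else:
            yield elem


print(list(flatten_list([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]
```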

The logic in the main function, json_to_dataframe is then the same as before. All that needed to change was having the operations performed by dataframes as coded functions.

Also, in the dataframes solution I was not appending the previous heading to the nested object; unless you are 100% sure you have no conflicts in column names, that is pretty much mandatory.

I hope this helps :).

EDIT: Modified the cross_join function to deal with the case when a nested list is empty, basically maintaining the previous result set unmodified. The output is unchanged even after adding the empty JSON list in the example JSON data. Thank you, @Nazmus Sakib for pointing it out.
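That edge case in isolation, a sketch of the same cross_join showing that an empty right side passes the accumulated rows through untouched:

```python
from copy import deepcopy


def cross_join(left, right):
    # An empty right side leaves the accumulated rows unmodified
    new_rows = [] if right else left
    for left_row in left:
        for right_row in right:
            temp_row = deepcopy(left_row)
            temp_row.update(right_row)
            new_rows.append(temp_row)
    return new_rows


rows = [{"id": "0001"}]
print(cross_join(rows, []))  # [{'id': '0001'}] -- unchanged
print(cross_join(rows, [{"t": "a"}, {"t": "b"}]))  # one row per right-side dict
```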

2 of 5 (score 6)

For the JSON data you have given, you could do this by parsing the JSON structure to just return a list of all the leaf nodes.

This assumes that your structure is consistent throughout; if each entry can have different fields, see the second approach.

For example:

import json
import csv

def get_leaves(item, key=None):
    if isinstance(item, dict):
        leaves = []
        for i in item.keys():
            leaves.extend(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        leaves = []
        for i in item:
            leaves.extend(get_leaves(i, key))
        return leaves
    else:
        return [(key, item)]


with open('json.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    write_header = True

    for entry in json.load(f_input):
        leaf_entries = sorted(get_leaves(entry))

        if write_header:
            csv_output.writerow([k for k, v in leaf_entries])
            write_header = False

        csv_output.writerow([v for k, v in leaf_entries])

If your JSON data is a list of entries in the format you have given, then you should get output as follows:

address_line_1,company_number,country_of_residence,etag,forename,kind,locality,middle_name,month,name,nationality,natures_of_control,notified_on,postal_code,premises,region,self,surname,title,year
Address 1,12345678,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977
Address 1,12345679,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977

If each entry can contain different (or possibly missing) fields, then a better approach would be to use a DictWriter. In this case, all of the entries would need to be processed to determine the complete list of possible fieldnames so that the correct header can be written.

import json
import csv

def get_leaves(item, key=None):
    if isinstance(item, dict):
        leaves = {}
        for i in item.keys():
            leaves.update(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        leaves = {}
        for i in item:
            leaves.update(get_leaves(i, key))
        return leaves
    else:
        return {key : item}


with open('json.txt') as f_input:
    json_data = json.load(f_input)

# First parse all entries to get the complete fieldname list
fieldnames = set()

for entry in json_data:
    fieldnames.update(get_leaves(entry).keys())

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
    csv_output.writeheader()
    csv_output.writerows(get_leaves(entry) for entry in json_data)
Discussions

r/learnpython on Reddit: How to convert JSON to csv in Python?
June 17, 2024

Hey everybody,

I'm trying to convert a huge file (~65 GB) into smaller subsets. The goal is to split the file into smaller subsets, e.g. 1 million tweets per file, and convert this data to CSV format. I currently have working splitting code that splits the ndjson file into smaller ndjson files, but I have trouble converting the data to CSV. The important part is to create columns for each existing variable, so columns named __crawled_url or w1_balanced. There are quite a few nested variables in the data; for example, w1_balanced is contained in the variable theme_topic and needs to be flattened.

Splitting code:

import json
#function to split big ndjson file to multiple smaller files
def split_file(input_file, lines_per_file): #variables that the function calls
    file_count = 0
    line_count = 0
    output_lines = []
    with open(input_file, 'r', encoding="utf8") as infile:
        for line in infile:
            output_lines.append(line)
            line_count += 1
            if line_count == lines_per_file:
                with open(f'1mio_split_{file_count}.ndjson', 'w', encoding="utf8") as outfile:
                    outfile.writelines(output_lines)
                file_count += 1
                line_count = 0
                output_lines = []
        #handle any remaining lines
        if output_lines:
            with open(f'1mio_split_{file_count}.ndjson', 'w',encoding="utf8") as outfile:
                outfile.writelines(output_lines)
#file containing tweets
input_file = input("path to big file:" )
#example filepath: C:/Users/YourName/Documents/tweet.ndjson
#how many lines/tweets should the new file contain?
lines_per_file = int(input ("Split after how many lines?: "))
split_file(input_file, lines_per_file)
print("Splitting done!")

Here are 2 sample lines from the data I use:

[{"__crawled_url":"https://twitter.com/example1","theme_topic":{"w1_balanced":{"label":"__label__a","confidence":0.3981},"w5_balanced":{"label":"__label__c","confidence":1}},"author":"author1","author_userid":"116718988","author_username":"author1","canonical_url":"https://twitter.com/example1","collected_by":"User","collection_method":"tweety 1.0.9.4","collection_time":"2024-05-27T14:40:32","collection_time_epoch":1716813632,"isquoted":false,"isreply":true,"isretweet":false,"language":"de","mentioning/replying":"twitteruser","num_likes":"0","num_retweets":"0","plain_text":"@twitteruser here is an exmaple text 🤔","published_time":"2024-04-18T20:14:51","published_time_epoch":1713471291,"published_time_original":"2024-04-18 20:14:51+00:00","replied_tweet":{"author":"Twitter User","author_userid":"1053198649700827136","author_username":"twitteruser"},"spacy_annotations":{"de_core_news_lg":{"noun_chunks":[{"text":"@twitteruser","start_char":0,"end_char":9},{"text":"more exapmle text","start_char":20,"end_char":34},{"text":"Gel","start_char":40,"end_char":43},{"text":"Haar","start_char":47,"end_char":51}],"named_entities":[{"text":"@twitteruser","start_char":0,"end_char":9,"label_":"MISC"}]},"xx_ent_wiki_sm":{"named_entities":{}},"da_core_news_lg":{"noun_chunks":{},"named_entities":{}},"en_core_web_lg":{"noun_chunks":{},"named_entities":{}},"fr_core_news_lg":{"noun_chunks":{},"named_entities":{}},"it_core_news_lg":{"noun_chunks":{},"named_entities":{}},"pl_core_news_lg":{"named_entities":{}},"es_core_news_lg":{"noun_chunks":{},"named_entities":{}},"fi_core_news_lg":{"noun_chunks":{},"named_entities":{}}},"tweet_id":"1781053802398814682","hashtags":{},"outlinks":{},"quoted_tweet":{"outlinks":{},"hashtags":{},"mentioning/replying":{},"replied_tweet":{}}}]

[{"__crawled_url":"https://twitter.com/example2","theme_topic":{"w1_balanced":{"label":"__label__a","confidence":0.3981},"w5_balanced":{"label":"__label__c","confidence":1}},"author":"author2","author_userid":"116712288","author_username":"author2","canonical_url":"https://twitter.com/example2","collected_by":"User","collection_method":"tweety 1.0.9.4","collection_time":"2024-05-27T14:40:32","collection_time_epoch":1716813632,"isquoted":false,"isreply":true,"isretweet":false,"language":"de","mentioning/replying":"twitteruser","num_likes":"0","num_retweets":"0","plain_text":"@twitteruser here is another exmaple text 🤔","published_time":"2024-04-18T20:14:51","published_time_epoch":1713471291,"published_time_original":"2024-04-18 20:14:51+00:00","replied_tweet":{"author":"Twitter User","author_userid":"1053198649700827136","author_username":"twitteruser"},"spacy_annotations":{"de_core_news_lg":{"noun_chunks":[{"text":"@twitteruser","start_char":0,"end_char":9},{"text":"more exapmle text","start_char":20,"end_char":34},{"text":"Gel","start_char":40,"end_char":43},{"text":"Haar","start_char":47,"end_char":51}],"named_entities":[{"text":"@twitteruser","start_char":0,"end_char":9,"label_":"MISC"}]},"xx_ent_wiki_sm":{"named_entities":{}},"da_core_news_lg":{"noun_chunks":{},"named_entities":{}},"en_core_web_lg":{"noun_chunks":{},"named_entities":{}},"fr_core_news_lg":{"noun_chunks":{},"named_entities":{}},"it_core_news_lg":{"noun_chunks":{},"named_entities":{}},"pl_core_news_lg":{"named_entities":{}},"es_core_news_lg":{"noun_chunks":{},"named_entities":{}},"fi_core_news_lg":{"noun_chunks":{},"named_entities":{}}},"tweet_id":"1781053802398814682","hashtags":{},"outlinks":{},"quoted_tweet":{"outlinks":{},"hashtags":{},"mentioning/replying":{},"replied_tweet":{}}}]

As you can see, the lines contain things like emojis and are in different languages, so encoding="utf8" must be included when opening the file. Here are a few examples of what I tried and the error messages I get. I should also mention that, since every line is its own list, just accessing the elements like with a normal JSON object didn't work.

Thanks a lot for every answer and even reading this post!

#try1
import json
import csv

data = "C:/Users/Sample-tweets.ndjson"
json_data = json.loads(data)
csv_file ="try3.csv"
csv_obj = open(csv_file, "w")
csv_writer = csv.writer(csv_obj)
header = json_data[0].keys()
csv_writer.writerow(header)
for item in json_data:
    csv_writer.writerow(item.values())
csv_obj.close()
#raise JSONDecodeError("Expecting value", s, err.value) from None
#json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)


#try2
import json
import csv

with open('Sample-tweets.ndjson', encoding="utf8") as ndfile:
data = json.load(ndfile)

csv_data = data['emp_details']
data_file = open('try1.csv', 'w', encoding="utf8")
csv_writer = csv.writer(data_file)
count = 0
for data in csv_data:
    if count == 0:
        header = emp.keys()
csv_writer.writerow(header) #spacing error?! can't even run the script 
        count += 1
    csv_writer.writerow(emp.values())
data_file.close()

with open('Sample-tweets.ndjson', encoding="utf8") as ndfile:
jsondata = json.load(ndfile)

data_file = open('try2.csv', 'w', newline='', encoding="uft8")
csv_writer = csv.writer(data_file)

count = 0
for data in ndfile:
if count == 0:
header = data.keys()
csv_writer.writerow(header)
count += 1
csv_writer.writerow(data.values())
data_file.close()
#error message: raise JSONDecodeError("Extra data", s, end)
#json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 1908)



#try3 to see if the auto dictionary works
import json

output_lines=[]
with open('C:/Users/Sample1-tweets.ndjson', 'r', encoding="utf8") as f:
    json_in=f.read()
json_in=json.loads(json_in)
print(json_in[2])
#error message: raise JSONDecodeError("Extra data", s, end)
#json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 1908)
#->same error message as above
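The Extra data errors above come from handing the whole multi-line file to json.loads at once; NDJSON is parsed one json.loads per line. A minimal stdlib sketch (field names invented; the sample lines above are each additionally wrapped in a JSON array, so real code would also index [0] after parsing):

```python
import csv
import io
import json

# Two NDJSON lines; in the real case each line comes from the file instead
ndjson = '{"a": 1, "b": {"c": 2}}\n{"a": 3, "b": {"c": 4}}\n'


def flatten(d, parent=""):
    # Minimal dict flattener with dotted keys, enough for this sketch
    out = {}
    for k, v in d.items():
        key = f"{parent}.{k}" if parent else k
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out


# One json.loads per line avoids the "Extra data" error
rows = [flatten(json.loads(line)) for line in ndjson.splitlines() if line.strip()]

buf = io.StringIO()  # stand-in for open('out.csv', 'w', newline='', encoding='utf8')
writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

The same loop works streaming, writing each flattened line as it is read, which matters at 65 GB.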
Top answer (1 of 1, score 3)

Pandas has a function called json_normalize, which can directly convert a dict into a dataframe. In order to convert a JSON string into a dict you can simply use the json library. A good source I found would be this.

import json
import pandas as pd

# Test string, assuming it is from API
test_string = """{
    "status": "1",
    "msg": "Success",
    "data": {
      "id": "12345",
      "PriceDetail": [
        {
          "item": "Apple",
          "amount": "10",
          "weight": "225",
          "price": "92",
          "bestbeforeendeate": "30/09/2023"
        }
        ]
    }
}"""

# Function converts the api result to the dataframe and appends it to df
def add_new_entry_to_dataframe(df, api_result_string):
    input_parsed = json.loads(api_result_string)
    df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
    df = pd.concat([df, df_with_new_data])
    return df
    

# The dataframe you want to store everything
df = pd.DataFrame()

## Loop where you fetch new data
for i in range(10):
    newly_fetched_result = test_string
    df = add_new_entry_to_dataframe(df, newly_fetched_result)


df = df.reset_index()

# Save as .csv
df.to_csv('output.csv')

print(df)

The output of above code:

item amount weight price bestbeforeendeate
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023

EDIT: I had another look at the problem and thought I'd share another solution, which might be better for you. Instead of building a huge dataframe over time, the code below appends the fetched data directly to the CSV file. The advantage is that all data is already in the CSV if the program crashes or if you terminate it.

# Function converts the json string to a dataframe and appends it directly to the CSV file
def add_json_string_to_csv(api_result_string):
    input_parsed = json.loads(api_result_string)
    df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
    df_with_new_data.to_csv('output.csv', mode='a', header=False)

## Loop where you fetch new data
while (True):
    newly_fetched_result = test_string
    add_json_string_to_csv(newly_fetched_result)