Brave Search

Loading and parsing a JSON file with multiple JSON objects

stackoverflow.com › questions › 12451431 › loading-and-parsing-a-json-file-with-multiple-json-objects

You have a JSON Lines format text file. You need to parse your file line by line:

import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))

Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.

Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.

If you have a file containing individual JSON objects with delimiters in-between, use How do I use the 'json' module to read in one JSON object at a time? to parse out individual objects using a buffered method.

Answer from Martijn Pieters on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 12451431 › loading-and-parsing-a-json-file-with-multiple-json-objects

python - Loading and parsing a JSON file with multiple JSON objects - Stack Overflow

Top answer

1 of 7

316

You have a JSON Lines format text file. You need to parse your file line by line:

import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))

Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.

2 of 7

In case you are using pandas and you will be interested in loading the json file as a dataframe, you can use:

import pandas as pd
df = pd.read_json('file.json', lines=True)

And to convert it into a json array, you can use:

df.to_json('new_file.json')

reddit.com › r/learnpython › reading a large (30.6g) jsonl file

r/learnpython on Reddit: Reading a large (30.6G) JSONL file

April 21, 2021 -

Hi all,

I am working on a project where I have text data stored in a massive (30.6G) json lines file. While I do have 32G of RAM, I would obviously like to avoid loading the entire file into memory.

What is the best way to go about loading a json file like this in without hogging memory?

Top answer

1 of 4

There's no good off-the-shelf solution for this. JSON files are simply not designed for that. There's a couple of "lazy" json parsers or "iterative" parsers, but in the end it comes down to what your data looks like. It's often better / easier to parse out the higher objects yourself. For example if your data is a massive list of lists, you could manually search for the "[]" characters and pass the results into json.loads as a "stream".

2 of 4

Assuming the data is actually separate lines of small data like the json lines format requires you can iterate line by line with open('really_big_file.jsonl', mode='r') as infile: for line in infile: json_line = loads(line) # now do the thing Edit: Obvious question is though what are you going to do with the data - can you actually parse it line-by-line and extract the small amounts you need or at the end of all this do you end up loading the dataset in anyway?

Videos