You have a JSON Lines format text file. You need to parse your file line by line:
import json
data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))
Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.
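Here is a small sketch with made-up sample data that shows the difference. Parsing each line succeeds. Parsing the whole text at once fails with an "Extra data" error, because the decoder finds a second value after the first one ends:

```python
import json

# Made-up JSON Lines sample: two objects, one per line.
text = '{"a": 1}\n{"a": 2}\n'

# Line by line works: each line is a complete JSON value on its own.
records = [json.loads(line) for line in text.splitlines()]
print(records)  # [{'a': 1}, {'a': 2}]

# The text as a whole is not one JSON value, so parsing it in one go fails.
error_msg = None
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    error_msg = exc.msg
print(error_msg)  # Extra data
```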
Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.
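A generator is one way to process records one at a time. The name `iter_jsonl`, the skipping of blank lines, and the error message format are my own assumptions and are not part of the original answer:

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line.

    `lines` can be an open file object, in which case only the current
    line is held in memory.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            # Skip blank lines, e.g. an empty last line in the file.
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the offending line number instead of a bare decode error.
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc
```

Usage would look like `with open('file', encoding='utf-8') as f: for record in iter_jsonl(f): ...`, where each record is handled before the next line is read. The function accepts any iterable of lines rather than a path, so it can also be tested with a plain list.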
If you have a file containing individual JSON objects with delimiters in between, use the approach from "How do I use the 'json' module to read in one JSON object at a time?" to parse out individual objects using a buffered method.
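A buffered approach along those lines can be sketched with `json.JSONDecoder.raw_decode()`. That method parses one value from the start of a string and returns the index where the value ended. The sketch assumes the delimiters are whitespace and that every top-level value is an object or array. A bare top-level number split across two chunks could be decoded too early. `iter_concatenated_json` is a hypothetical name:

```python
import json
from typing import Any, Iterator, TextIO

_decoder = json.JSONDecoder()


def iter_concatenated_json(f: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield JSON values from a stream of values separated by whitespace."""
    buffer = ""
    # Read fixed-size chunks until f.read() returns "" (end of stream).
    for chunk in iter(lambda: f.read(chunk_size), ""):
        buffer += chunk
        while True:
            # raw_decode() does not accept leading whitespace, so strip it first.
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                value, end = _decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # Most likely an incomplete value: read another chunk first.
                break
            yield value
            buffer = buffer[end:]
    if buffer.strip():
        raise ValueError("trailing data is not a complete JSON value")
```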
Answer from Martijn Pieters on Stack Overflow
If you are using pandas and want to load the JSON Lines file as a DataFrame, you can use:
import pandas as pd
df = pd.read_json('file.json', lines=True)
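To convert it into a JSON array, pass orient='records'. The default orient for a DataFrame is 'columns', which writes an object keyed by column name rather than an array:
df.to_json('new_file.json', orient='records')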
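If you don't otherwise need pandas, the same JSON Lines to JSON array conversion can be sketched with the standard `json` module. `jsonl_to_json_array` is a hypothetical helper. Like the pandas version, it loads every record into memory:

```python
import json


def jsonl_to_json_array(src_path: str, dst_path: str) -> None:
    """Rewrite a JSON Lines file as a single JSON array, one element per line."""
    with open(src_path, encoding="utf-8") as src:
        # Skip blank lines; every other line must hold one JSON value.
        records = [json.loads(line) for line in src if line.strip()]
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst)
```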
Your input appears to be a sequence of Python objects; it certainly is not a valid JSON document.
If you have a list of Python dictionaries, then all you have to do is dump each entry into a file separately, followed by a newline:
import json
# JSON_file is the list of dictionaries from the question
with open('output.jsonl', 'w') as outfile:
    for entry in JSON_file:
        json.dump(entry, outfile)
        outfile.write('\n')
By default, the json module outputs JSON without embedded newlines (unless you pass indent), so each entry ends up on a single line.
Assuming your A, B and C names are really strings, that would produce:
{"index": 1, "met": "1043205", "no": "A"}
{"index": 2, "met": "000031043206", "no": "B"}
{"index": 3, "met": "0031043207", "no": "C"}
If you started with a JSON document containing a list of entries, just parse that document first with json.load()/json.loads().
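The two steps can be combined. Here is a sketch of converting a file that holds one JSON array into JSON Lines. `json_array_to_jsonl` and the file paths are illustrative assumptions:

```python
import json


def json_array_to_jsonl(src_path: str, dst_path: str) -> int:
    """Convert a file holding one JSON array into JSON Lines; return the entry count."""
    with open(src_path, encoding="utf-8") as src:
        entries = json.load(src)  # the whole document is parsed first
    if not isinstance(entries, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as out:
        for entry in entries:
            # Without indent=, json.dumps() escapes newlines inside strings,
            # so each entry is written on exactly one line.
            out.write(json.dumps(entry) + "\n")
    return len(entries)
```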
The jsonlines package is made exactly for your use case:
import jsonlines

items = [
    {'a': 1, 'b': 2},
    {'a': 123, 'b': 456},
]
with jsonlines.open('output.jsonl', 'w') as writer:
    writer.write_all(items)
(Yes, I wrote it years after you posted your original question.)
Hi all,
I am working on a project where I have text data stored in a massive (30.6G) json lines file. While I do have 32G of RAM, I would obviously like to avoid loading the entire file into memory.
What is the best way to load a JSON Lines file like this without hogging memory?
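A common pattern for files this size is to stream the file and parse it in fixed-size batches. This is a sketch, not a benchmarked recommendation. Memory use is then bounded by the batch size rather than by the 30.6G file. `iter_batches` and the default batch size of 10,000 are hypothetical choices:

```python
import json
from itertools import islice
from typing import Any, Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int = 10_000) -> Iterator[List[Any]]:
    """Parse JSON Lines input in batches of up to `batch_size` lines.

    Only one batch of parsed records is built at a time, so peak memory
    depends on the batch size, not on the size of the input.
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, batch_size))  # next batch_size raw lines
        if not chunk:
            return
        # Blank lines are skipped, so a batch can hold fewer records.
        yield [json.loads(line) for line in chunk if line.strip()]
```

A call might look like `with open('data.jsonl', encoding='utf-8') as f: for batch in iter_batches(f): ...`, where 'data.jsonl' stands in for the real file. Keeping only the aggregates you need from each batch avoids growing memory over time.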
I'm currently trying to get some data from a URL that returns JSON. I can get the data, but when I print it, it shows as one long line of text, which isn't what I want. I want the text split into multiple lines, like when you add \n to strings. (I know it's normally not good to catch bare Exceptions; it's just there while I get the other part to work. Also, the entire def is in a class.)
Here is what I currently have. I haven't worked much with JSON data before, which is why I'm stuck on what exactly to do.
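import json
import time

import requests


def info(self):  # defined inside a class in the original code
    try:
        url = [url]  # placeholder: the actual URL was omitted in the question
        response = requests.get(url)
        x = json.loads(response.text)
        lore = str(x['data'][input_champion]['lore'])  # input_champion is set elsewhere
        print('Getting champion info, please wait')
        time.sleep(5)
        print(f'lore: {lore}')
        time.sleep(0.5)
    except Exception as e:
        time.sleep(5)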
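Two standard-library options fit what the question asks for. This sketch uses a made-up response dictionary because the real URL and champion name were not given. `json.dumps(..., indent=4)` spreads the whole JSON structure over multiple lines. `textwrap.fill()` breaks one long string, such as the lore, into lines of a given width:

```python
import json
import textwrap

# Hypothetical stand-ins for the real API response and champion name.
input_champion = "ExampleChampion"
x = {"data": {input_champion: {"lore": "A long paragraph of lore text. " * 8}}}

# Option 1: pretty-print the whole JSON structure across multiple lines.
pretty = json.dumps(x, indent=4)
print(pretty)

# Option 2: wrap one long string to at most 60 characters per line.
lore = x["data"][input_champion]["lore"]
wrapped = textwrap.fill(lore, width=60)
print(f"lore:\n{wrapped}")
```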