You have a JSON Lines format text file. You need to parse your file line by line:
import json
data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))
Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.
Note that because the file contains one JSON document per line, you are saved the headache of trying to parse it all in one go or of figuring out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.
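A minimal sketch of that per-line approach (io.StringIO stands in for the open file, and the records are made up for illustration):

```python
import io
import json

# Simulate a JSON Lines file; in practice you'd use open('file.jsonl').
f = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')

ids = []
for line in f:
    record = json.loads(line)   # parse one object per line
    ids.append(record["id"])    # process it, then let the record go

print(ids)  # [1, 2, 3]
```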
If you have a file containing individual JSON objects with delimiters in-between, use How do I use the 'json' module to read in one JSON object at a time? to parse out individual objects using a buffered method.
(The answer above is from Martijn Pieters on Stack Overflow.)
If you are using pandas and want to load the JSON file as a dataframe, you can use:
import pandas as pd
df = pd.read_json('file.json', lines=True)
And to convert it into a json array, you can use:
df.to_json('new_file.json')
You will go crazy if you try to parse a JSON file line by line. The json module has helper methods to read file objects or strings directly, i.e. the load and loads methods: load takes a file object (as shown below) for a file that contains JSON data, while loads takes a string that contains JSON data.
Option 1 (preferred):
import json
with open('test.json', 'r') as jf:
    weatherData = json.load(jf)
print(weatherData)
Option 2:
import json
with open('test.json', 'r') as jf:
    weatherData = json.loads(jf.read())
print(weatherData)
If you are looking for higher-performance JSON parsing, check out ujson.
In the first snippet, you try to parse it line by line. You should parse it all at once; the easiest way is json.load(jsonFile), which takes a file object directly. (Passing the file object to json.loads is a mistake, since loads expects a string.) So the correct way to parse it:

import json

with open('test.json', 'r') as jsonFile:
    weatherData = json.load(jsonFile)

Storing the JSON on a single line is fine, as it's more concise, but it isn't required for this to work.

In the second snippet, your problem is that you print it as a unicode string, and the u'string here' notation is Python-specific. Valid JSON uses double quotation marks.
I have been working on Project Euler problem 8 ( https://projecteuler.net/problem=8 ), which gives a 1000-digit number. I am trying to import this data into Python as easily as possible, so I copied the number into a .txt and wrapped it in double-quotes. The number is still on 20 lines.
When I try to parse the file into a Python string using json.load(), I get an error that there is an invalid control character at the end of each line. I did some research and found that sometimes converting to a raw string (starting the JSON with an r does this) will allow the number to be parsed, but then I get the error that no JSON object could be decoded. I do not fully understand the difference between json.load() and json.loads(), but I know that json.loads() also does not work; it raises an error that a string or buffer was expected.
My code to parse the string is as follows:
import json
number = json.load(open("ProjectEuler8Number.txt", "r"))
Is there any way to parse a multi-line JSON into a single-line string in Python?
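Since JSON forbids raw control characters inside strings, one fix is to strip the newlines from the file contents before parsing. A sketch (the short digit string below is a stand-in for the real 20-line, 1000-digit file contents):

```python
import json

# Stand-in for open("ProjectEuler8Number.txt").read():
# a double-quoted number split across several lines.
raw = '"7316\n7172\n5338"'

# Remove the newlines so the string becomes a single-line JSON value.
number = json.loads(raw.replace('\n', ''))
print(number)  # '731671725338'
```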
I found out about YAML, but would knowledge of reading and writing JSON transfer over fairly easily?
My dictionary keys would each be several sentences, for now. If it grows larger, I don't want it to be too hard to read when editing it manually.
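For the readability concern, note that json.dumps can pretty-print with indent, which keeps even sentence-length keys easy to edit by hand. A sketch with made-up keys:

```python
import json

settings = {
    "This key is a whole sentence describing the first setting.": True,
    "This key is another long sentence for the second setting.": 42,
}

# indent=2 puts each key on its own line, which stays readable
# when the file is edited by hand.
text = json.dumps(settings, indent=2, ensure_ascii=False)
print(text)
```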
Get rid of all of the backslashes and all of the "Pythonic" quoting in the settings file. Works fine if the file is just:
{
"user":"username",
"password":"passwd"
}
Note also that JSON strings are quoted with double quotes, not single quotes. See JSON spec here:
http://www.json.org/
>>> s = """
{
"user":"username",
"password":"passwd"
}
"""
>>> json.loads(s)
{'password': 'passwd', 'user': 'username'}
json doesn't consider \ to be a line-continuation character.
Just read each line and construct a json object at this time:
with open(file_path) as f:
    for line in f:
        j_content = json.loads(line)
This way, you load a proper, complete JSON object from each line (provided there is no \n inside a JSON value or in the middle of an object), and you avoid memory issues because each object is created only when needed.
There is also this answer:
https://stackoverflow.com/a/7795029/671543

with open(file_path) as f:
    contents = f.read()
data = [json.loads(line) for line in contents.strip().split('\n')]
Why don't you build it as a dictionary, set the variables, and then use the json library to turn it into JSON?
import json
json_serial = "123"
my_json = {
    'settings': {
        'serial': json_serial,
        'status': '2',
        'version': '3',
    },
    'config': {
        'active': '4',
        'version': '5',
    },
}
print(json.dumps(my_json))
If you absolutely insist on generating JSON with string concatenation -- and, to be clear, you absolutely shouldn't -- the only way to be entirely certain that your output is valid JSON is to generate the substrings being substituted with a JSON generator. That is:
'''"settings" : {
"serial" : {serial},
"version" : {version}
}'''.format(serial=json.dumps("5"), version=json.dumps(1))
But don't. Really, really don't. The answer by @davidejones is the Right Thing for this scenario.
Note: Line separated json is now supported in read_json (since 0.19.0):
In [31]: pd.read_json('{"a":1,"b":2}\n{"a":3,"b":4}', lines=True)
Out[31]:
a b
0 1 2
1 3 4
or with a file/filepath rather than a json string:
pd.read_json(json_file, lines=True)
It's going to depend on the size of your DataFrames which is faster, but another option is to use str.join to smash your multi-line "JSON" (note: it's not valid JSON) into valid JSON, and then use read_json:
In [11]: '[%s]' % ','.join(test.splitlines())
Out[11]: '[{"a":1,"b":2},{"a":3,"b":4}]'
For this tiny example this is slower; at around 100 rows it's similar, with significant gains if it's larger...
In [21]: %timeit pd.read_json('[%s]' % ','.join(test.splitlines()))
1000 loops, best of 3: 977 µs per loop
In [22]: %timeit l=[ json.loads(l) for l in test.splitlines()]; df = pd.DataFrame(l)
1000 loops, best of 3: 282 µs per loop
In [23]: test_100 = '\n'.join([test] * 100)
In [24]: %timeit pd.read_json('[%s]' % ','.join(test_100.splitlines()))
1000 loops, best of 3: 1.25 ms per loop
In [25]: %timeit l = [json.loads(l) for l in test_100.splitlines()]; df = pd.DataFrame(l)
1000 loops, best of 3: 1.25 ms per loop
In [26]: test_1000 = '\n'.join([test] * 1000)
In [27]: %timeit l = [json.loads(l) for l in test_1000.splitlines()]; df = pd.DataFrame(l)
100 loops, best of 3: 9.78 ms per loop
In [28]: %timeit pd.read_json('[%s]' % ','.join(test_1000.splitlines()))
100 loops, best of 3: 3.36 ms per loop
Note: of that time, the join is surprisingly fast.
If you are trying to save memory, then reading the file a line at a time will be much more memory efficient:
with open('test.json') as f:
    data = pd.DataFrame(json.loads(line) for line in f)
Also, if you import simplejson as json, the compiled C extensions included with simplejson are much faster than the pure-Python json module.
Hi all,
I am working on a project where I have text data stored in a massive (30.6G) json lines file. While I do have 32G of RAM, I would obviously like to avoid loading the entire file into memory.
What is the best way to go about loading a json file like this in without hogging memory?
There are several problems with the logic of your code.
ss = s.read()
reads the entire file s into a single string. The next line
for line in ss:
iterates over each character in that string, one by one. So on each loop line is a single character. In
line = ss[7:]
you are getting the entire file contents apart from the first 7 characters (in positions 0 through 6, inclusive) and replacing the previous content of line with that. And then
T.append(json.loads(line))
attempts to convert that to JSON and store the resulting object into the T list.
Here's some code that does what you want. We don't need to read the entire file into a string with .read, or into a list of lines with .readlines, we can simply put the file handle into a for loop and that will iterate over the file line by line.
We use a with statement to open the file, so that it will get closed automatically when we exit the with block, or if there's an IO error.
import json
table = []
with open('simple.json', 'r') as f:
    for line in f:
        table.append(json.loads(line[7:]))

for row in table:
    print(row)
output
{'color': '33ef', 'age': '55', 'gender': 'm'}
{'color': '3444', 'age': '56', 'gender': 'f'}
{'color': '3999', 'age': '70', 'gender': 'm'}
We can make this more compact by building the table list in a list comprehension:
import json
with open('simple.json', 'r') as f:
    table = [json.loads(line[7:]) for line in f]

for row in table:
    print(row)
If you use Pandas you can simply write
df = pd.read_json(f, lines=True)
As per the docs, with lines=True:
Read the file as a json object per line.
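For a file too large for memory, recent pandas versions also accept chunksize together with lines=True, yielding DataFrames piece by piece. A sketch (StringIO stands in for a real file path, and the chunk size of 2 is arbitrary):

```python
import io
import pandas as pd

# StringIO stands in for a large JSON Lines file on disk.
jsonl = io.StringIO('{"a":1,"b":2}\n{"a":3,"b":4}\n{"a":5,"b":6}\n')

reader = pd.read_json(jsonl, lines=True, chunksize=2)  # yields DataFrames
total_rows = 0
for chunk in reader:          # each chunk holds at most 2 rows here
    total_rows += len(chunk)

print(total_rows)  # 3
```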
If your data is exactly in that format, we can edit it into valid JSON.
import json
source = '''\
{
"A":0,
"B":2
}{
"A":3,
"B":4
}{
"C":5,
"D":6
}
'''
fixed = '[' + source.replace('}{', '},{') + ']'
lst = json.loads(fixed)
print(lst)
output
[{'A': 0, 'B': 2}, {'A': 3, 'B': 4}, {'C': 5, 'D': 6}]
This relies on each record being separated by '}{'. If that's not the case, we can use regex to do the search & replace operation.
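A sketch of that regex variant (with the caveat that this simple pattern would also match a literal '}{' inside a string value, so it assumes none appears there):

```python
import json
import re

# Concatenated objects, possibly separated by whitespace or newlines.
source = '{"A":0,\n"B":2}  {"A":3}\n{"C":5}'

# Insert commas between back-to-back objects, however they are spaced.
fixed = '[' + re.sub(r'\}\s*\{', '},{', source) + ']'
records = json.loads(fixed)
print(records)  # [{'A': 0, 'B': 2}, {'A': 3}, {'C': 5}]
```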
Add [ and ] around your input and try this:
import json
with open('data.json') as data_file:
    data = json.load(data_file)

print(data)
This code prints the following:
[{'A': 0, 'B': 2}, {'A': 3, 'B': 4}]
when I put this data into the file:
[
{
"A":0,
"B":2
},{
"A":3,
"B":4
}
]
If you can't edit the file data.json, you can read the string from the file, add [ and ] around it, and call json.loads().
Update: Oh, I see that I also added a comma separator between the JSON objects, which the initial input doesn't have, so my code doesn't work for it as-is. But maybe it is better to modify the generator of this file to add the comma separator?
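For the original comma-less input, the standard library's json.JSONDecoder.raw_decode offers an alternative that needs no text surgery at all: it parses one value and reports the index where it stopped, so concatenated objects can be consumed one at a time. A sketch:

```python
import json

source = '{"A":0,"B":2}{"A":3,"B":4}{"C":5,"D":6}'

decoder = json.JSONDecoder()
objects = []
pos = 0
while pos < len(source):
    # Parse one value starting at pos; get the index just past it.
    obj, pos = decoder.raw_decode(source, pos)
    objects.append(obj)
    # Skip any whitespace between objects.
    while pos < len(source) and source[pos].isspace():
        pos += 1

print(objects)  # [{'A': 0, 'B': 2}, {'A': 3, 'B': 4}, {'C': 5, 'D': 6}]
```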