Here is a Python solution to your problem.
Don't forget to change in_file_path to the location of your big JSON file.
import json

in_file_path = 'path/to/file.json'  # Change me!

with open(in_file_path, 'r') as in_json_file:
    # Read the file and convert it to a list of objects
    json_obj_list = json.load(in_json_file)

for json_obj in json_obj_list:
    filename = json_obj['_id'] + '.json'
    with open(filename, 'w') as out_json_file:
        # Save each obj to its respective file,
        # pretty-printed thanks to `indent=4`
        json.dump(json_obj, out_json_file, indent=4)
Side note: I ran this in Python 3; it should work in Python 2 as well.
Answer from Stefan Collier on Stack Overflow.
I ran into this problem today as well and did some research. I just want to share the resulting Python snippet, which also lets you customise the length of the split files (thanks to this slicing method).
import os
import json
from itertools import islice

def split_json(data_path, file_name, size_split=1000):
    """Split a big JSON file into chunks.

    data_path  : str, e.g. "data_folder"
    file_name  : str, e.g. "data_file" (exclude ".json")
    size_split : int, number of items per chunk
    """
    with open(os.path.join(data_path, file_name + ".json"), "r") as f:
        whole_file = json.load(f)
    split = len(whole_file) // size_split
    for i in range(split + 1):
        out_name = file_name + "_" + str(split + 1) + "_" + str(i + 1) + ".json"
        with open(os.path.join(data_path, out_name), "w") as f:
            json.dump(dict(islice(whole_file.items(), i * size_split, (i + 1) * size_split)), f)
Update: Then, when you need to combine them together again, use the following code:
json_all = dict()
split = 4  # this is the 1-based actual number of splits
for i in range(1, split + 1):
    with open(os.path.join("data_folder", "data_file_" + str(split) + "_" + str(i) + ".json"), 'r') as f:
        json_i = json.load(f)
    json_all.update(json_i)
Use these commands at the Linux command prompt. To split the file into fixed-size pieces (named xaa, xab, and so on):

split -b 53750k <your-file>

To join the pieces back together later:

cat xa* > <your-file>
Refer to this link: https://askubuntu.com/questions/28847/text-editor-to-edit-large-4-3-gb-plain-text-file
Whether Python or Node will be better for the task is a matter of opinion, and we are not allowed to voice opinions on Stack Overflow. You have to decide for yourself which you have more experience with and which you want to work with: Python or Node.
If you go with Node, there are modules that do streaming JSON parsing and can help with the task, e.g.:
- https://www.npmjs.com/package/JSONStream
- https://www.npmjs.com/package/stream-json
- https://www.npmjs.com/package/json-stream
If you go with Python, there are streaming JSON parsers here as well:
- https://github.com/kashifrazzaqui/json-streamer
- https://github.com/danielyule/naya
- http://www.enricozini.org/blog/2011/tips/python-stream-json/
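As a stdlib-only alternative to the streaming libraries above, a sketch of the same idea using json.JSONDecoder.raw_decode, which parses one element of a top-level array at a time from a growing buffer (the function name and buffering scheme are made up for this illustration):

```python
import json

def iter_json_array(path, chunk_size=65536):
    # Stream top-level elements out of a JSON array file one at a time,
    # so the whole array never has to fit in memory at once.
    # Caveat: bare numbers split across chunk boundaries could be
    # mis-parsed; this is safe when the elements are objects or strings.
    decoder = json.JSONDecoder()
    with open(path, "r") as f:
        buf = ""
        # Read until we see the opening '[' of the array
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            buf = (buf + chunk).lstrip()
            if buf:
                break
        if buf[0] != "[":
            raise ValueError("expected a top-level JSON array")
        buf = buf[1:]
        while True:
            buf = buf.lstrip().lstrip(",").lstrip()
            if buf.startswith("]"):
                return
            try:
                obj, end = decoder.raw_decode(buf)
            except ValueError:
                # Incomplete element: pull in more of the file and retry
                chunk = f.read(chunk_size)
                if not chunk:
                    return
                buf += chunk
                continue
            yield obj
            buf = buf[end:]
```

Each yielded object can then be dumped to its own file without ever holding the full array in memory.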
I haven't had much to do with JSON files so far, but now I need them for data hoarding. I have a 150 GB JSON file here and can't open it because there's not enough free disk space and RAM on my laptop or other computers.
So I have to split the file into several pieces (preferably 1 GB files) and open and view them one after the other. How can I do this on Windows?
Google spits out ancient results (mostly for Linux) which, as usual, contain contradictory information.
Maybe it is possible with a Python script. I don't care if the first and last lines of the new files look slightly different from the original JSON file.
I won't teach you how to do file I/O and will assume you can do that yourself.
Once you have loaded the original file as a dict with the json module, do:
>>> org = {"one": "Some data", "two": "Some data"}
>>> dicts = [{k: v} for k, v in org.items()]
>>> dicts
[{'one': 'Some data'}, {'two': 'Some data'}]
which will give you a list of dictionaries that you can dump to a file (or separate files named after the keys), if you wish.
After loading the JSON file you can treat it as a dictionary in Python, then save the contents to separate files by looping through it as you would with a normal Python dictionary. Here is an example related to what you want to achieve:
import json

Data = {"one": "Some data", "two": "Some data"}
for key in Data:
    name = key + '.json'
    with open(name, 'w') as out_file:
        # json.dump handles quoting/escaping correctly,
        # unlike hand-built string formatting
        json.dump({key: Data[key]}, out_file)
To split a JSON file with many records into chunks of a desired size, I simply use:
jq -c '.[0:1000]' mybig.json
which works like python slicing.
See the docs here: https://stedolan.github.io/jq/manual/
Array/String Slice: .[10:15]
The .[10:15] syntax can be used to return a subarray of an array or substring of a string. The array returned by .[10:15] will be of length 5, containing the elements from index 10 (inclusive) to index 15 (exclusive). Either index may be negative (in which case it counts backwards from the end of the array), or omitted (in which case it refers to the start or end of the array).
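jq's slice can be mirrored in plain Python when jq is not available; a minimal sketch that writes successive fixed-size slices of a top-level array to numbered files (the function name and output naming are made up, and the array is loaded fully into memory):

```python
import json

def dump_chunks(in_path, prefix, size=1000):
    # Mirror jq's .[start:stop]: write successive `size`-element slices
    # of a top-level JSON array to prefix_0.json, prefix_1.json, ...
    with open(in_path) as f:
        data = json.load(f)
    for i in range(0, len(data), size):
        with open("{}_{}.json".format(prefix, i // size), "w") as f:
            json.dump(data[i:i + size], f)
```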
Using jq, one can split an array into its components using the filter:
.[]
The question then becomes what is to be done with each component. If you want to direct each component to a separate file, you could (for example) use jq with the -c option, and filter the result into awk, which can then allocate the components to different files. See e.g. Split JSON File Objects Into Multiple Files
Performance considerations
One might think that the overhead of calling jq+awk would be high compared to calling Python, but both jq and awk are lightweight compared to python+json, as these timings suggest (using Python 2.7.10):
time (jq -c .[] input.json | awk '{print > "doc00" NR ".json";}')
user 0m0.005s
sys 0m0.008s
time python split.py
user 0m0.016s
sys 0m0.046s
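The split.py used in the comparison above is not shown; a minimal equivalent that matches the awk naming scheme ("doc00" NR ".json", one file per element) might look like this (the function name is made up):

```python
import json

def split_per_doc(in_path):
    # One output file per top-level array element, numbered from 1
    # to match awk's NR-based names: doc001.json, doc002.json, ...
    with open(in_path) as f:
        docs = json.load(f)
    for i, doc in enumerate(docs, start=1):
        with open("doc00%d.json" % i, "w") as f:
            json.dump(doc, f)
```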
I can't figure out how to do that. I've already tried multiple solutions and googled. Does anyone have a script for doing that? Thanks!
You already have a list; the commas are put there by Python to delimit the values only when printing the list.
Just access element 2 directly:
print(ting[2])
This prints:
[1379962800000, 125.539504822835]
Each of the entries in item['values'] (so ting) is a list of two float values, so you can address each of those with index 0 and 1:
>>> print(ting[2][0])
1379962800000
>>> print(ting[2][1])
125.539504822835
To get a list of all the second values, you could use a list comprehension:
second_vals = [t[1] for t in ting]
When you load the data with json.loads, it is already parsed into a real list that you can slice and index as normal. If you want the data starting with the third element, just use ting[2:]. (If you just want the third element by itself, just use ting[2].)
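A minimal sketch of the idea (the data here is made up to stand in for ting):

```python
import json

# json.loads returns a real Python list: index and slice it as normal
raw = '[[1, 1.5], [2, 2.5], [3, 3.5]]'
ting = json.loads(raw)
third = ting[2]                       # third element: [3, 3.5]
from_third = ting[2:]                 # everything from the third element on
second_vals = [t[1] for t in ting]    # all second values: [1.5, 2.5, 3.5]
```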
Hi!
I have a huge JSON file containing company data that I want to split into several smaller files based on their companyId. The JSON file looks like this:
[
  {
    "companyId": "123456789",
    "name": "Foobar Ltd.",
    // more company data
  },
  // etc.
]
Ideally, I want to split this based on the first X characters of companyId, so that companies sharing the first part of their companyId end up together in separate smaller files:

companyId 123456789 => 1234.json
companyId 234567890 => 2345.json
// etc.
I could write a Perl script to do this for me, but I was wondering if it's at all possible with a one-liner, without too much "outside of bash" if that makes sense, or at least without having to rely on Perl, Python, etc. The only progress I have made so far is this:
cat huge.json | jq '.[]' | jq '.companyId'
...which outputs the companyId, and I could probably get the X first characters from that, but where is the rest of the JSON record?
Thanks in advance!
EDIT: Specified that I don't want to use Perl (or similar tools), because I want to do this as "minimal" as possible.
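If stepping outside pure bash turns out to be acceptable after all, the grouping is short in Python; a sketch under the assumption that the array fits in memory (the function name and output naming follow the 1234.json scheme from the question):

```python
import json
from collections import defaultdict

def split_by_prefix(in_path, prefix_len=4):
    # Group companies by the first prefix_len characters of companyId
    # and write each group to <prefix>.json
    with open(in_path) as f:
        companies = json.load(f)
    groups = defaultdict(list)
    for company in companies:
        groups[company["companyId"][:prefix_len]].append(company)
    for prefix, group in groups.items():
        with open(prefix + ".json", "w") as f:
            json.dump(group, f)
```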
Use an iteration grouper; the itertools module's recipes list includes the following (shown here for Python 3, where izip_longest is named zip_longest):

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
This lets you iterate over your tweets in groups of 5000:
for i, group in enumerate(grouper(input_tweets, 5000)):
    with open('outputbatch_{}.json'.format(i), 'w') as outputfile:
        json.dump(list(group), outputfile)

Note that the last batch will be padded with fillvalue (None) entries if the total number of tweets isn't an exact multiple of 5000.
I think your first thought is good. Just iterate over all the tweets you have, save them in a temporary array, and keep an index that you increment by one for every tweet. Whenever the current index modulo 5000 equals 0, call a method that converts the tweets to string format and saves them to a file with the index in the filename. When you reach the end of the tweets, do the same with the final remainder.
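The approach described above can be sketched like this (the function name, batch size, and file naming are made up for this illustration):

```python
import json

def dump_tweet_batches(tweets, batch_size=5000, prefix="outputbatch"):
    # Collect tweets in a temporary list, flush to a numbered file
    # every batch_size tweets, and flush the remainder at the end.
    batch, file_index = [], 0
    for tweet in tweets:
        batch.append(tweet)
        if len(batch) == batch_size:
            with open("{}_{}.json".format(prefix, file_index), "w") as f:
                json.dump(batch, f)
            batch, file_index = [], file_index + 1
    if batch:
        with open("{}_{}.json".format(prefix, file_index), "w") as f:
            json.dump(batch, f)
```

Unlike the grouper recipe, this never pads the last batch with None entries.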