Here is a Python solution to your problem. Don't forget to change in_file_path to the location of your big JSON file.
import json

in_file_path = 'path/to/file.json'  # Change me!

with open(in_file_path, 'r') as in_json_file:
    # Read the file and convert it to a dictionary
    json_obj_list = json.load(in_json_file)

for json_obj in json_obj_list:
    filename = json_obj['_id'] + '.json'
    with open(filename, 'w') as out_json_file:
        # Save each obj to its respective file,
        # pretty-printed thanks to `indent=4`
        json.dump(json_obj, out_json_file, indent=4)
Side note: I ran this in Python 3; it should work in Python 2 as well.
(Answer from Stefan Collier on Stack Overflow.)
I ran into this problem today as well and did some research. I just want to share the resulting Python snippet, which also lets you customise the length of the split files (thanks to this slicing method).
import os
import json
from itertools import islice

def split_json(data_path, file_name, size_split=1000):
    """Split a big JSON file into chunks.

    data_path : str, e.g. "data_folder"
    file_name : str, e.g. "data_file" (exclude ".json")
    """
    with open(os.path.join(data_path, file_name + ".json"), "r") as f:
        whole_file = json.load(f)
    split = len(whole_file) // size_split
    for i in range(split + 1):
        out_name = file_name + "_" + str(split + 1) + "_" + str(i + 1) + ".json"
        with open(os.path.join(data_path, out_name), 'w') as f:
            # Each chunk is a dict of up to `size_split` items
            json.dump(dict(islice(whole_file.items(), i * size_split, (i + 1) * size_split)), f)
Update: Then, when you need to combine them together again, use the following code:
json_all = dict()
split = 4  # the total number of split files (the writer's split + 1)
for i in range(1, split + 1):
    with open(os.path.join("data_folder", "data_file_" + str(split) + "_" + str(i) + ".json"), 'r') as f:
        json_i = json.load(f)
        json_all.update(json_i)
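A quick self-contained sanity check of the islice chunking idea used above (the sample data and the chunk size of 4 are made up for the demo):

```python
import json
from itertools import islice

# Build a sample dict standing in for the big JSON file
whole = {str(i): i * i for i in range(10)}

# Split into chunks of up to 4 items each, as dicts
size = 4
chunks = []
for i in range(len(whole) // size + 1):
    chunk = dict(islice(whole.items(), i * size, (i + 1) * size))
    if chunk:
        chunks.append(chunk)

# Recombine and verify nothing was lost
merged = {}
for c in chunks:
    merged.update(c)
assert merged == whole
print(len(chunks))  # → 3
```

Each chunk is itself a valid JSON object, so dumping and reloading the chunks individually round-trips cleanly.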
Use this command at the Linux command prompt to split the file into 53750 KB pieces:
split -b 53750k <your-file>
and to join the pieces back together:
cat xa* > <your-file>
Refer to this link: https://askubuntu.com/questions/28847/text-editor-to-edit-large-4-3-gb-plain-text-file
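As a minimal demonstration of the split/cat round trip (the file size and names here are arbitrary; `xa*` matches split's default output prefix):

```shell
# Work in a scratch directory
tmp=$(mktemp -d)

# Create a ~5000-byte stand-in for the big file
head -c 5000 /dev/zero | tr '\0' 'x' > "$tmp/big.json"

# Split into 1 KB pieces; the "x" prefix mirrors split's default (xaa, xab, ...)
split -b 1k "$tmp/big.json" "$tmp/x"

# Recombine in name order and verify the result is byte-identical
cat "$tmp"/xa* > "$tmp/rejoined.json"
cmp "$tmp/big.json" "$tmp/rejoined.json" && echo OK
```

Note that `split -b` cuts at byte boundaries, so individual pieces are generally not valid JSON on their own; only the recombined file is.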
Answering the question whether Python or Node will be better for the task would be an opinion and we are not allowed to voice our opinions on Stack Overflow. You have to decide yourself what you have more experience in and what you want to work with - Python or Node.
If you go with Node, there are some modules that can help you with that task, that do streaming JSON parsing. E.g. those modules:
- https://www.npmjs.com/package/JSONStream
- https://www.npmjs.com/package/stream-json
- https://www.npmjs.com/package/json-stream
If you go with Python, there are streaming JSON parsers here as well:
- https://github.com/kashifrazzaqui/json-streamer
- https://github.com/danielyule/naya
- http://www.enricozini.org/blog/2011/tips/python-stream-json/
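If you'd rather avoid extra dependencies, the standard library's json.JSONDecoder.raw_decode can pull elements out of a top-level JSON array one at a time. This sketch still reads the whole text into memory (a true streaming parser like the modules above avoids even that), and the helper name is my own:

```python
import json

def iter_json_array(text):
    """Yield elements of a top-level JSON array one at a time."""
    decoder = json.JSONDecoder()
    idx = text.index('[') + 1
    while True:
        # Skip whitespace and commas between elements
        while idx < len(text) and text[idx] in ' \t\r\n,':
            idx += 1
        if idx >= len(text) or text[idx] == ']':
            return
        # raw_decode returns the parsed object and the index just past it
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

print(list(iter_json_array('[{"a": 1}, {"a": 2}]')))  # → [{'a': 1}, {'a': 2}]
```

Because it yields one object at a time, you can write each object (or each batch of objects) to its own file without ever materialising the full list of parsed elements.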
I haven't had much to do with JSON files so far, but now I need them for data hoarding. I have a 150 GB JSON file here and can't open it because there's not enough free HDD space and RAM on my laptop and computers.
So I have to split the file into several pieces (ideally 1 GB files) and open and view them one after the other. How can I do this on Windows?
Google spits out ancient results (and mostly for Linux) which, as usual, contain contradictory information.
Maybe it is possible with a Python script; I don't care if the first and last lines of each new file look slightly different than in the original JSON file.
I can't figure out how to do that. I've already tried multiple solutions and googled. Does anyone have a script for doing that? Thanks!
You can try:
import json

dct = {
    "client_id": {"0": "abc123", "1": "def456"},
    "client_name": {"0": "companyA", "1": "companyB"},
    "revenue": {"0": "54,786", "1": "62,754"},
    "rate": {"0": "4", "1": "5"},
}

tmp = {}
for k, v in dct.items():
    for kk, vv in v.items():
        tmp.setdefault(kk, {}).update({k: vv})

for i, v in enumerate(tmp.values(), 1):
    with open(f"File{i}.json", "w") as f_out:
        json.dump(v, f_out, indent=4)
This creates two files File1.json, File2.json:
{
    "client_id": "abc123",
    "client_name": "companyA",
    "revenue": "54,786",
    "rate": "4"
}
and
{
    "client_id": "def456",
    "client_name": "companyB",
    "revenue": "62,754",
    "rate": "5"
}
EDIT: To create output dictionary:
dct = {
    "client_id": {"0": "abc123", "1": "def456"},
    "client_name": {"0": "companyA", "1": "companyB"},
    "revenue": {"0": "54,786", "1": "62,754"},
    "rate": {"0": "4", "1": "5"},
}

tmp = {}
for k, v in dct.items():
    for kk, vv in v.items():
        tmp.setdefault(kk, {}).update({k: vv})

out = {}
for i, v in enumerate(tmp.values(), 1):
    out[f"File{i}"] = v

print(out)
Prints:
{
    "File1": {
        "client_id": "abc123",
        "client_name": "companyA",
        "revenue": "54,786",
        "rate": "4",
    },
    "File2": {
        "client_id": "def456",
        "client_name": "companyB",
        "revenue": "62,754",
        "rate": "5",
    },
}
You can use the json package to read your JSON file and process it in a for loop:
import json

with open('json_data.json') as json_file:
    data = json.load(json_file)

new_data = {}
for key, dico in data.items():
    for num, value in dico.items():
        # Create the inner dict the first time each index ("0", "1", ...) appears
        new_data.setdefault(num, {})[key] = value
Your new_data dictionary should look like the following:
{
    "0": {
        "client_id": "abc123",
        "client_name": "companyA",
        "revenue": "54,786",
        "rate": "4"
    },
    "1": {
        "client_id": "def456",
        "client_name": "companyB",
        "revenue": "62,754",
        "rate": "5"
    }
}
Then to save the file you can do something like:
with open('json_data_0.json', 'w') as outfile:
    json.dump(new_data["0"], outfile)
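To write out every index rather than just "0", the same dump can be looped over the whole dictionary (the filenames are my assumption):

```python
import json

# Stand-in for the new_data built above
new_data = {
    "0": {"client_id": "abc123", "rate": "4"},
    "1": {"client_id": "def456", "rate": "5"},
}

# One output file per top-level key
for num, record in new_data.items():
    with open(f'json_data_{num}.json', 'w') as outfile:
        json.dump(record, outfile, indent=4)
```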
Use an iteration grouper; the itertools module recipes list includes the following (in Python 3, use zip_longest instead of izip_longest):
from itertools import izip_longest  # zip_longest in Python 3

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
This lets you iterate over your tweets in groups of 5000:
for i, group in enumerate(grouper(input_tweets, 5000)):
    with open('outputbatch_{}.json'.format(i), 'w') as outputfile:
        json.dump(list(group), outputfile)
Note that the final group is padded with the fillvalue (None), which you may want to filter out before dumping.
I think your first thought is good. Iterate over all the tweets, collect them in a temporary list, and keep a counter that you increment by one for each tweet. Whenever the counter modulo 5000 equals 0, call a method that serialises the collected tweets to a string and saves it to a file with the counter in the filename. When you reach the end of the tweets, do the same with the remaining batch.
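A sketch of that counter-and-modulo batching (the batch size is shrunk from 5000 to 5 for the demo, and the filenames are made up):

```python
import json

def save_batch(batch, index):
    # Write one batch of tweets as a JSON array
    with open(f'tweets_batch_{index}.json', 'w') as f:
        json.dump(batch, f)

tweets = [{"id": i} for i in range(12)]  # stand-in for the real tweets

batch, batch_index = [], 0
for i, tweet in enumerate(tweets, 1):
    batch.append(tweet)
    if i % 5 == 0:  # flush every 5 tweets (5000 in the answer above)
        save_batch(batch, batch_index)
        batch, batch_index = [], batch_index + 1
if batch:  # don't forget the final partial batch
    save_batch(batch, batch_index)
```

Unlike the grouper recipe, this never pads the last batch, so no fillvalue filtering is needed.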
To split a JSON file with many records into chunks of a desired size, I simply use:
jq -c '.[0:1000]' mybig.json
which works like Python slicing; adjust the indices to take successive chunks.
See the docs here: https://stedolan.github.io/jq/manual/
Array/String Slice: .[10:15]
The .[10:15] syntax can be used to return a subarray of an array or substring of a string. The array returned by .[10:15] will be of length 5, containing the elements from index 10 (inclusive) to index 15 (exclusive). Either index may be negative (in which case it counts backwards from the end of the array), or omitted (in which case it refers to the start or end of the array).
Using jq, one can split an array into its components using the filter:
.[]
The question then becomes what is to be done with each component. If you want to direct each component to a separate file, you could (for example) use jq with the -c option, and filter the result into awk, which can then allocate the components to different files. See e.g. Split JSON File Objects Into Multiple Files
Performance considerations
One might think that the overhead of calling jq+awk would be high compared to calling python, but both jq and awk are lightweight compared to python+json, as suggested by these timings (using Python 2.7.10):
time (jq -c .[] input.json | awk '{print > "doc00" NR ".json";}')
user    0m0.005s
sys     0m0.008s

time python split.py
user    0m0.016s
sys     0m0.046s
The original file is not valid JSON, whereas json.dump creates a file with valid JSON. My suggestion would be to convert the line items to JSON one at a time when writing to the file.
Replace this:
for i in range(total + 1):
    json.dump(ll[i * size_of_the_split:(i + 1) * size_of_the_split],
              open(json_file + "\\split50k" + str(i + 1) + ".json", 'w',
                   encoding='utf8'),
              ensure_ascii=False, indent=True)
with this:
for i in range(len(ll)):
    if i % size_of_the_split == 0:
        if i != 0:
            file.close()
        file = open(json_file + "\\split50k" + str(i + 1) + ".json", 'w', encoding='utf8')
    # json.dumps (not str) keeps each written line valid JSON
    file.write(json.dumps(ll[i], ensure_ascii=False) + '\n')
file.close()
Try using json.loads(line) when reading the file:
with open(os.path.join(json_file, 'test.json'), 'r', encoding='utf-8') as f1:
    ll = [json.loads(line) for line in f1.readlines()]
    # The rest
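Putting both fixes together, a complete sketch (the paths, the split50k naming, and the 50 000 chunk size are assumptions carried over from the snippets above):

```python
import json
import os

def split_jsonl(in_path, out_dir, size_of_the_split=50000):
    """Split a JSON Lines file into chunk files that are
    themselves valid JSON Lines (one json.dumps per record)."""
    with open(in_path, 'r', encoding='utf-8') as f1:
        ll = [json.loads(line) for line in f1]
    for i in range(0, len(ll), size_of_the_split):
        out_path = os.path.join(out_dir, 'split50k' + str(i // size_of_the_split + 1) + '.json')
        with open(out_path, 'w', encoding='utf-8') as out:
            for item in ll[i:i + size_of_the_split]:
                out.write(json.dumps(item, ensure_ascii=False) + '\n')
```

Because every output line goes through json.dumps, each chunk file can be re-read with the same json.loads-per-line loop shown above.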