First, check whether the file is valid JSON, for example with an online JSON validator.
Once the file is valid JSON, you can use the code below to read it into a DataFrame:
import json
import pandas as pd
with open("training.json") as datafile:
    data = json.load(datafile)
dataframe = pd.DataFrame(data)
Hope this helps.
Answer from Ravi.Dudi on Stack Overflow
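The validation step above can also be done locally instead of on a validator site. A minimal sketch, assuming the content is small enough to hold in memory; `is_valid_json` is a hypothetical helper name, not something from the original answer:

```python
import json

def is_valid_json(text):
    """Return (True, None) if text parses as JSON, else (False, error message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.lineno and exc.colno locate the position where parsing failed
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return True, None

ok, error = is_valid_json('{"a": 1, "b": [1, 2, 3]}')
bad_ok, bad_error = is_valid_json("{'a': 1}")  # single quotes are not valid JSON
```

To check a file, read its contents into a string first and pass that string to the helper.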
read_json() can't work because of the newline after "pqr". You can either fix that line or reformat the whole thing into valid JSON. I'm doing the latter here by replacing the newline after each object with a comma and surrounding the whole thing with brackets to form a proper JSON array:
import pandas as pd
from io import StringIO
with open('temp.txt') as f:
    content = f.read().strip()  # drop a trailing newline so no stray comma is added
df = pd.read_json(StringIO('[' + content.replace('}\n', '},') + ']'))
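The bracket-wrapping trick can be tried on an in-memory string before pointing it at a real file. This is only a sketch: the sample text is made up, and `concatenated_objects_to_frame` is a hypothetical helper name. It assumes every object ends with `}` immediately followed by a newline, and that no string value contains that sequence.

```python
from io import StringIO
import pandas as pd

def concatenated_objects_to_frame(text):
    """Turn newline-separated JSON objects into a JSON array and parse it."""
    # strip() removes a trailing newline that would otherwise become a trailing comma
    wrapped = "[" + text.strip().replace("}\n", "},") + "]"
    return pd.read_json(StringIO(wrapped))

sample = '{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n'
df = concatenated_objects_to_frame(sample)
```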
You can use:
import json
import pandas as pd
with open('json_example.json') as json_data:
    data = json.load(json_data)
df = pd.DataFrame.from_dict(data, orient='index').reset_index().rename(columns={'index': 'Sentence', 0: 'Label'})
Easy way that I remember
import pandas as pd
import json
with open("./data.json", "r") as f:
    data = json.load(f)
df = pd.DataFrame({"Sentence": data.keys(), "Label": data.values()})
With read_json
To read straight from the file using read_json, you can use something like:
pd.read_json("./data.json", lines=True)\
.T\
.reset_index()\
.rename(columns={"index": "Sentence", 0: "Labels"})
Explanation
It's a little dirty, but as you probably noticed, lines=True alone isn't sufficient, so the code above transposes the result to give you:
| (index) | 0 |
|---|---|
| Text1 | 4 |
| Text2 | 1 |
| TextN | 123 |
Resetting the index then moves it into a column named "index", and the final step renames the columns.
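To see the intermediate shapes, the same chain can be run step by step on a small in-memory document. The sample keys and values below are invented for illustration, and the text is wrapped in `StringIO` because recent pandas versions expect a file-like object rather than a raw JSON string.

```python
from io import StringIO
import pandas as pd

raw = '{"Text1": 4, "Text2": 1, "TextN": 123}'

# Step 1: lines=True yields a single row with one column per key
wide = pd.read_json(StringIO(raw), lines=True)

# Step 2: transposing turns the keys into the index and puts the values in column 0
tall = wide.T

# Step 3: move the index into a column, then give both columns readable names
df = tall.reset_index().rename(columns={"index": "Sentence", 0: "Labels"})
```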
Doing it with pure Python is interesting. It's incredibly flexible. It's time consuming. Is it silly?
I'm generally up for doing things the native way just because it's clean. But am I being silly not abstracting it away with some package? I was using a flavor of SQL I rarely touch the other day and was told "now with JSON support" and it actually wasn't terrible. SQL isn't exactly a bastion of exclusively new thinking. If we've already eliminated actual JavaScript for dealing with its JSON, why stop there? Am I becoming a "back in the good ole days when we used horses" type of ass?
Perhaps the file you are reading contains multiple JSON objects rather than a single JSON object or array, which is what json.load(json_file) and pd.read_json('review.json') expect. These methods are meant to read files that contain a single JSON document.
From what I have seen of the Yelp dataset, your file probably contains something like:
{"review_id":"xxxxx","user_id":"xxxxx","business_id":"xxxx","stars":5,"date":"xxx-xx-xx","text":"xyxyxyxyxx","useful":0,"funny":0,"cool":0}
{"review_id":"yyyy","user_id":"yyyyy","business_id":"yyyyy","stars":3,"date":"yyyy-yy-yy","text":"ababababab","useful":0,"funny":0,"cool":0}
....
....
and so on.
So it is important to realize that this is not a single JSON document but multiple JSON objects in one file, one per line.
To read this data into a pandas DataFrame, the following solution should work:
import json
import pandas as pd
with open('review.json') as json_file:
    data = json_file.readlines()
# The line below may take at least 8-10 minutes for 4-5 million rows. It converts each string in the list into a parsed JSON object (a dict).
data = list(map(json.loads, data))
pd.DataFrame(data)
Assuming the data is fairly large, your machine will take a considerable amount of time to load it into a DataFrame.
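A variation on the approach above, as a sketch only: parsing the lines lazily avoids building a full list of raw lines first, and skipping blank lines avoids a `json.loads` error on an empty line. `iter_json_lines` is a hypothetical helper and the sample records are made up; with a real file you would pass the open file object instead of the `StringIO`.

```python
import json
from io import StringIO
import pandas as pd

def iter_json_lines(lines):
    """Yield one parsed object per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for an open file; note the blank line in the middle
sample = StringIO(
    '{"review_id": "a1", "stars": 5}\n'
    '\n'
    '{"review_id": "b2", "stars": 3}\n'
)
df = pd.DataFrame(iter_json_lines(sample))
```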
If you don't want to use a for-loop, the following should do the trick:
import pandas as pd
df = pd.read_json("foo.json", lines=True)
This will handle the case where your json file looks similar to this:
{"foo": "bar"}
{"foo": "baz"}
{"foo": "qux"}
And will turn it into a DataFrame consisting of a single column, foo, with three rows.
You can read more in the pandas docs.
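For files too large to load comfortably in one go, `read_json` also accepts a `chunksize` argument together with `lines=True`, in which case it returns an iterator of DataFrames instead of a single one. A small sketch on in-memory data; the chunk size of 2 is arbitrary and the `StringIO` stands in for a real file path:

```python
from io import StringIO
import pandas as pd

text = '{"foo": "bar"}\n{"foo": "baz"}\n{"foo": "qux"}\n'

# With chunksize set, read_json returns a reader that yields DataFrames
reader = pd.read_json(StringIO(text), lines=True, chunksize=2)
chunks = [chunk for chunk in reader]  # each chunk holds at most 2 rows

# Combine the pieces; in practice you might filter or aggregate each chunk first
df = pd.concat(chunks, ignore_index=True)
```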
The problem is the {} around the contents of your file: pandas treats the first level of the JSON as the columns, so it uses just Browser History as a column. You can use this code to solve your problem:
import json
import pandas as pd
with open('BrowserHistory.json', encoding='cp850') as f:
    df = pd.DataFrame(json.load(f)['Browser History'])
print(df)
Because your objects are in a list at the second level down of your JSON, you can't read it directly into a dataframe using read_json. Instead, you could read the json into a variable, and then create the dataframe from that:
import pandas as pd
import json
with open("BrowserHistory.json") as f:
    js = json.load(f)
df = pd.DataFrame(js['Browser History'])
df
# favicon_url page_transition ... client_id time_usec
# 0 https://www.google.com/favicon.ico LINK ... cliendid 1620386529857946
# 1 https://www.google.com/favicon.ico LINK ... cliendid 1620386514845201
# 2 https://www.google.com/favicon.ico LINK ... cliendid 1620386499014063
# 3 https://ssl.gstatic.com/ui/v1/icons/mail/rfr/g... LINK ... cliendid 1620386492788783
Note that you may need to specify the file encoding in the open call, e.g.:
f = open("BrowserHistory.json", encoding="utf8")
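An alternative sketch for the same nested layout uses `pd.json_normalize` with `record_path` to pull the list out from under the top-level key. The sample dictionary below is invented and much smaller than a real export; only the `Browser History` key and a few field names are taken from the answers above, and `title` is a made-up field.

```python
import pandas as pd

# Stand-in for the result of json.load() on the exported file
js = {
    "Browser History": [
        {"title": "Google", "page_transition": "LINK", "time_usec": 1620386529857946},
        {"title": "Mail", "page_transition": "LINK", "time_usec": 1620386514845201},
    ]
}

# Index into the dict first, then build the frame from the list of records
df_direct = pd.DataFrame(js["Browser History"])

# Or let pandas walk to the list via record_path
df_normalized = pd.json_normalize(js, record_path="Browser History")
```

Both calls give one row per history entry; `json_normalize` becomes more useful when the records themselves contain nested objects that should be flattened into columns.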
You can use eval to evaluate the string to a list of dicts and then pass it directly to the DataFrame constructor.
>>> import pandas as pd
>>> pd.DataFrame(eval("[{'code': '8', 'name': 'Human'}, {'code': '11', 'name': 'Orc'}]"))
  code   name
0    8  Human
1   11    Orc
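Similarly, you can use ast.literal_eval in place of eval. It only evaluates Python literals, so it is the safer choice when the string comes from an untrusted source.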
>>> import ast
>>> pd.DataFrame(ast.literal_eval("[{'code': '8', 'name': 'Human'}, {'code': '11', 'name': 'Orc'}]"))
  code   name
0    8  Human
1   11    Orc
It appears that OP is parsing a Python object and not a JSON object, so read_json is not appropriate.
If one is actually ingesting Python structs, then eval is appropriate.
The eval method will not work if you have JSON booleans, because Python does not parse true or false as True/False.
Also, your json_string is not valid JSON. Single quotes are not delimiters in JSON. So you must also change that (the comments indicate this).
If one did need to have pandas.read_json process a valid JSON string, then per the Pandas GitHub issue that caused this to break, you should wrap json_string in a StringIO so that it may be read akin to a file.
import pandas as pd
from io import StringIO
json_string = '[{"code": "8", "name": "Human"}, {"code": "11", "name": "Orc"}]'
df = pd.read_json(StringIO(json_string))
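The difference between a JSON string and a Python literal can be checked directly. In this sketch the `playable` field is made up to show the boolean case; the rest follows the example above.

```python
import ast
import json
from io import StringIO
import pandas as pd

json_text = '[{"name": "Human", "playable": true}, {"name": "Orc", "playable": false}]'

# json.loads understands JSON booleans and maps them to True/False
records = json.loads(json_text)

# ast.literal_eval treats the text as Python, where true/false are not literals
try:
    ast.literal_eval(json_text)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False

# read_json takes the same text once it is wrapped in a file-like object
df = pd.read_json(StringIO(json_text))
```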