The json.load() method (without "s" in "load") can read a file directly:
import json
with open('strings.json') as f:
    d = json.load(f)
print(d)
You were using the json.loads() method, which is used for string arguments only.
The error you get with json.loads is a totally different problem. In that case, there is some invalid JSON content in that file. For that, I would recommend running the file through a JSON validator.
There are also ways to repair invalid JSON automatically; see, for example, the question "How do I automatically fix an invalid JSON string?".
Answer from ubomb on Stack Overflow (python - Reading JSON from a file)
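Before reaching for an external validator, Python itself can report where a document stops being valid JSON: json.JSONDecodeError (Python 3.5+) carries `lineno`, `colno` and `msg` attributes. A minimal sketch; the helper name `find_json_error` and the sample strings are hypothetical:

```python
import json


def find_json_error(text):
    """Return (lineno, colno, msg) for the first JSON syntax error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions of the offending character
        return err.lineno, err.colno, err.msg
    return None


good = '{"a": 1}'
bad = '{"a": 1,\n "b": }'  # the value for "b" is missing

print(find_json_error(good))  # None
print(find_json_error(bad))   # (2, 7, 'Expecting value')
```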
Here is code that works fine for me:
import json
with open("test.json") as json_file:
    json_data = json.load(json_file)
print(json_data)
with this data in test.json:
{
"a": [1,3,"asdf",true],
"b": {
"Hello": "world"
}
}
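You may want to wrap your json.load call in a try/except block, because invalid JSON raises an exception (json.JSONDecodeError, a subclass of ValueError, in Python 3.5+) that otherwise ends the script with a traceback.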
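A minimal sketch of that try/except, assuming Python 3.5+ (for json.JSONDecodeError). The helper name `load_json_or_none` and the temporary demo file are hypothetical; the demo writes the data above with a trailing comma so that it is invalid:

```python
import json
import os
import tempfile


def load_json_or_none(path):
    """Load JSON from path; return None (and report why) if the content is invalid JSON."""
    with open(path, encoding="utf-8") as json_file:
        try:
            return json.load(json_file)
        except json.JSONDecodeError as err:
            # JSONDecodeError is a subclass of ValueError and knows where parsing failed
            print(f"{path}: invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
            return None


# Demo: a temporary file holding the sample data plus a trailing comma (invalid JSON)
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    tmp.write('{"a": [1, 3, "asdf", true], "b": {"Hello": "world"},}')
    bad_path = tmp.name

result = load_json_or_none(bad_path)
os.remove(bad_path)
```

Returning None is just one choice; re-raising a more descriptive error may suit a larger program better.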
Yes, the "s" stands for string. The json.loads function does not take a file path; it takes the file contents as a string. See the documentation.
Simple example:
import json
with open("file.json") as f:
    data = json.load(f)  # ok
    # data = json.loads(f)  # not ok: f is a file object, not a string (raises TypeError)
text = '{"a": 1, "b": 2}'  # a string with JSON-encoded data
data = json.loads(text)
To add a simple example to what everyone has explained:
json.load()
json.load can deserialize a file directly, i.e. it accepts a file object. For example:
# open a JSON file for reading and print its content using json.load
with open("/xyz/json_data.json", "r") as content:
    print(json.load(content))
will output:
{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}
If I use json.loads on the file object instead:
# you cannot use json.loads on a file object
with open("json_data.json", "r") as content:
    print(json.loads(content))
I would get this error (this is the Python 2 message; Python 3 says the JSON object must be str, bytes or bytearray):
TypeError: expected string or buffer
json.loads()
json.loads() deserializes a string.
So to use json.loads, I have to pass it the content of the file, obtained with the read() method. Using content.read() with json.loads() returns the parsed content of the file:
with open("json_data.json", "r") as content:
    print(json.loads(content.read()))
Output:
{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}
That's because content.read() returns a string, i.e. <type 'str'>.
If I use json.load() with content.read(), I get an error:
with open("json_data.json", "r") as content:
    print(json.load(content.read()))
This gives:
AttributeError: 'str' object has no attribute 'read'
So now you know: json.load deserializes a file and json.loads deserializes a string.
Another example:
sys.stdin is a file object, so if test.py does print(json.load(sys.stdin)), I get the actual JSON data:
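The four combinations walked through above can be checked in one runnable sketch, assuming Python 3. An in-memory io.StringIO stands in for open("json_data.json") so no file on disk is needed; the helper `make_file` is hypothetical:

```python
import io
import json


def make_file():
    # In-memory file-like object standing in for open("json_data.json")
    return io.StringIO('{"event": {"name": "Sufiyan Ghori"}}')


# json.load takes a file-like object (anything with a .read() method)
from_file = json.load(make_file())

# json.loads takes the text itself, so read() the file first
from_string = json.loads(make_file().read())

# The two mismatched combinations fail with the errors described above
try:
    json.loads(make_file())          # file object passed where a string is expected
except TypeError as err:
    loads_error = err

try:
    json.load(make_file().read())    # a str has no .read() method
except AttributeError as err:
    load_error = err

print(from_file == from_string)  # True
```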
cat json_data.json | ./test.py
{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}
If I want to use json.loads(), I would do print(json.loads(sys.stdin.read())) instead.
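A small sketch of the stdin case, assuming Python 3. The function `read_json` is hypothetical; it accepts any text stream and falls back to sys.stdin, so the demo can pass an io.StringIO in place of a real pipe:

```python
import io
import json
import sys


def read_json(stream=None):
    """Parse JSON from a text stream; defaults to sys.stdin (e.g. `cat json_data.json | ./test.py`)."""
    if stream is None:
        stream = sys.stdin
    # Same result as json.loads(stream.read())
    return json.load(stream)


# io.StringIO stands in for sys.stdin so the sketch runs without piping a file
fake_stdin = io.StringIO('{"event": {"id": "5206c7e2-da67-42da-9341-6ea403c632c7"}}')
data = read_json(fake_stdin)
```

Run as a script, calling read_json() with no argument would read whatever is piped in.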
Can someone explain the difference between using load() and loads() with the JSON library? And which, if either, is the preferred method?
I'm writing a simple script where I want the JSON data from a URL parsed out into a list. Both of these options seem to work:
import json
import urllib2
url = "string to url"
response = urllib2.urlopen(url)
data = json.load(response)
or
import json
import urllib2
url = "string to url"
response = urllib2.urlopen(url)
data = json.loads(response.read())
I know that there are other libraries available for parsing out JSON data, but for the time being I'm working only with the json and urllib2 libraries.
Any insight into which one should be used?
Thanks
load() loads JSON from a file or file-like object
loads() loads JSON from a given string or unicode object
It's in the documentation
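For the question's use case both options give the same result: in CPython, json.load(fp) is implemented as json.loads(fp.read(), ...). A sketch assuming Python 3.6+, where urllib2 has become urllib.request and json.load/json.loads accept bytes. io.BytesIO stands in for the binary file-like object returned by urllib.request.urlopen(url), and the payload is hypothetical:

```python
import io
import json

# Hypothetical response body; io.BytesIO stands in for urllib.request.urlopen(url)
payload = b'[{"name": "a"}, {"name": "b"}]'

# Option 1: hand the file-like response straight to json.load
data_from_load = json.load(io.BytesIO(payload))

# Option 2: read the bytes yourself and parse them with json.loads
data_from_loads = json.loads(io.BytesIO(payload).read())

print(data_from_load == data_from_loads)  # True
```

Option 1 is a little shorter; option 2 is handy if you also want to keep or log the raw text.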
The "s" is an abbreviation for "string": "dumps" is read as "dump string" and "loads" as "load string". Without the "s", these functions work with a file-like object instead. This convention is scattered throughout Python and even third-party packages.
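The naming convention in a short sketch: the "s" variants work with in-memory values, the plain variants with file-like objects (io.StringIO stands in for a real file). pickle is included only as one standard-library module that follows the same pattern; its "string" form is bytes:

```python
import io
import json
import pickle

record = {"a": [1, 3, "asdf", True], "b": {"Hello": "world"}}

# dumps / loads: "s" = string, so they produce and consume str values
text = json.dumps(record)
assert json.loads(text) == record

# dump / load: no "s", so they write to and read from file-like objects
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind before reading back
assert json.load(buffer) == record

# pickle uses the same naming (dumps returns bytes rather than str)
blob = pickle.dumps(record)
assert pickle.loads(blob) == record
```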
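If your data was JSON-encoded twice (the JSON document is itself stored as a JSON string), json.loads(json.loads(string)) gives you the dictionary. I did this and it worked; you can check it out. The first call doesn't just return the same string: it decodes the outer string literal (e.g. removes the escaping \\ characters), and the second call parses the result.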
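A minimal sketch of the double-decoding described above; the payload is hypothetical and built by encoding a dict twice:

```python
import json

original = {"a": 1}
# Hypothetical double-encoded payload: a JSON string whose content is itself JSON
double_encoded = json.dumps(json.dumps(original))
print(double_encoded)  # "{\"a\": 1}"

once = json.loads(double_encoded)   # decodes the outer string literal -> str '{"a": 1}'
twice = json.loads(once)            # parses the inner JSON -> dict
```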
OK, first you should pretty-print your object so that you can read it:
>>> from pprint import pprint
>>> output = [{'in_reply_to_status_id_str': None, 'in_reply_to_screen_name': None, 'retweeted': False, 'in_reply_to_status_id': None, 'contributors': None, 'favorite_count': 0, 'in_reply_to_user_id': None, 'coordinates': None, 'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', 'geo': None, 'retweet_count': 0, 'text': 'Tweeting a url \nhttp://t.co/QDVYv6bV90', 'created_at': 'Mon Sep 01 19:36:25 +0000 2014', 'entities': {'symbols': [], 'user_mentions': [], 'urls': [{'expanded_url': 'http://www.isthereanappthat.com', 'display_url': 'isthereanappthat.com', 'url': 'http://t.co/QDVYv6bV90', 'indices': [16, 38]}], 'hashtags': []}, 'id_str': '506526005943865344', 'in_reply_to_user_id_str': None, 'truncated': False, 'favorited': False, 'lang': 'en', 'possibly_sensitive': False, 'id': 506526005943865344, 'user': {'profile_text_color': '333333', 'time_zone': None, 'entities': {'description': {'urls': []}}, 'url': None, 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'protected': False, 'default_profile_image': True, 'utc_offset': None, 'default_profile': True, 'screen_name': 'KickzWatch', 'follow_request_sent': False, 'following': False, 'profile_background_color': 'C0DEED', 'notifications': False, 'description': '', 'profile_sidebar_border_color': 'C0DEED', 'geo_enabled': False, 'verified': False, 'friends_count': 40, 'created_at': 'Mon Sep 01 16:29:18 +0000 2014', 'is_translator': False, 'profile_sidebar_fill_color': 'DDEEF6', 'statuses_count': 4, 'location': '', 'id_str': '2784389341', 'followers_count': 4, 'favourites_count': 0, 'contributors_enabled': False, 'is_translation_enabled': False, 'lang': 'en', 'profile_image_url': 'http://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png', 'profile_image_url_https': 'https://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png', 
... 'id': 2784389341, 'profile_use_background_image': True, 'listed_count': 0, 'profile_background_tile': False, 'name': 'Maktub Destiny', 'profile_link_color': '0084B4'}, 'place': None}]
>>> pprint(output)
[{'contributors': None,
'coordinates': None,
'created_at': 'Mon Sep 01 19:36:25 +0000 2014',
'entities': {'hashtags': [],
'symbols': [],
'urls': [{'display_url': 'isthereanappthat.com',
'expanded_url': 'http://www.isthereanappthat.com',
'indices': [16, 38],
'url': 'http://t.co/QDVYv6bV90'}],
'user_mentions': []},
'favorite_count': 0,
'favorited': False,
'geo': None,
'id': 506526005943865344,
'id_str': '506526005943865344',
'in_reply_to_screen_name': None,
'in_reply_to_status_id': None,
'in_reply_to_status_id_str': None,
'in_reply_to_user_id': None,
'in_reply_to_user_id_str': None,
'lang': 'en',
'place': None,
'possibly_sensitive': False,
'retweet_count': 0,
'retweeted': False,
'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
'text': 'Tweeting a url \nhttp://t.co/QDVYv6bV90',
'truncated': False,
'user': {'contributors_enabled': False,
'created_at': 'Mon Sep 01 16:29:18 +0000 2014',
'default_profile': True,
'default_profile_image': True,
'description': '',
'entities': {'description': {'urls': []}},
'favourites_count': 0,
'follow_request_sent': False,
'followers_count': 4,
'following': False,
'friends_count': 40,
'geo_enabled': False,
'id': 2784389341,
'id_str': '2784389341',
'is_translation_enabled': False,
'is_translator': False,
'lang': 'en',
'listed_count': 0,
'location': '',
'name': 'Maktub Destiny',
'notifications': False,
'profile_background_color': 'C0DEED',
'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
'profile_background_tile': False,
'profile_image_url': 'http://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png',
'profile_image_url_https': 'https://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png',
'profile_link_color': '0084B4',
'profile_sidebar_border_color': 'C0DEED',
'profile_sidebar_fill_color': 'DDEEF6',
'profile_text_color': '333333',
'profile_use_background_image': True,
'protected': False,
'screen_name': 'KickzWatch',
'statuses_count': 4,
'time_zone': None,
'url': None,
'utc_offset': None,
'verified': False}}]
From this you can see that output is a list containing a single dict. To access it you need:
>>> first_elem = output[0]
You will also see that the hashtags key in first_elem is nested in a second-level dict under the key entities:
>>> entities = first_elem['entities']
>>> pprint(entities)
{'hashtags': [],
'symbols': [],
'urls': [{'display_url': 'isthereanappthat.com',
'expanded_url': 'http://www.isthereanappthat.com',
'indices': [16, 38],
'url': 'http://t.co/QDVYv6bV90'}],
'user_mentions': []}
Now you are able to access hashtags:
>>> entities['hashtags']
[]
It just happens to be an empty list.
To convert to JSON, note the comment:
>>> import json
>>> # Make sure output is the list object not a string representing the object
>>> json_string = json.dumps(output)
>>> jason = json.loads(json_string)
>>> jason[0]['entities']['hashtags']
[]
I think your problem is that you turned output into a string before passing it to json.dumps, so json.loads gives you that string back rather than the original list of dicts.
And @Dan's answer is correct: this is not valid JSON. It is, however, a valid Python dict (inside a list), and I'm assuming you got it from Twitter using Python and then printed it.
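Both points can be reproduced with a trimmed-down, hypothetical version of output. str(output) is Python syntax (single quotes, False, None), so json.loads rejects it; dumping that string only wraps it in a JSON string literal; dumping the list itself gives real JSON:

```python
import json

# Trimmed-down, hypothetical stand-in for the tweet list above
output = [{"retweeted": False, "geo": None, "entities": {"hashtags": []}}]

# str(output) is a Python literal, not JSON, so json.loads raises JSONDecodeError
python_repr = str(output)
try:
    json.loads(python_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

# Dumping the string only wraps it in a JSON string; loads hands the same string back
round_trip_of_string = json.loads(json.dumps(python_repr))

# Dumping the list itself produces JSON that loads back into a list of dicts
round_trip_of_list = json.loads(json.dumps(output))
print(round_trip_of_list[0]["entities"]["hashtags"])  # []
```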