It would be the files listed here: https://open.fda.gov/apis/drug/event/download/
I have been watching YouTube videos on how to bulk download, but I haven't figured it out.
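One possible approach, sketched under assumptions: openFDA publishes a downloads manifest at https://api.fda.gov/download.json, and the drug event file URLs appear to live under a nested "partitions" list. The exact keys below are my reading of that page and should be verified against the live manifest before relying on them.

```python
import json
import urllib.request

def partition_urls(manifest):
    """Collect the drug/event partition file URLs from a parsed manifest.

    Assumes the manifest nests them under results -> drug -> event -> partitions,
    each partition holding a "file" URL; verify against the real download.json.
    """
    partitions = manifest["results"]["drug"]["event"]["partitions"]
    return [p["file"] for p in partitions]

def bulk_download(dest_dir="."):
    # Fetch and parse the manifest, then download each partition file.
    with urllib.request.urlopen("https://api.fda.gov/download.json") as resp:
        manifest = json.loads(resp.read().decode())
    for url in partition_urls(manifest):
        filename = url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, f"{dest_dir}/{filename}")
```

Calling bulk_download() would pull every partition into the destination directory; the full dataset is large, so you may want to filter the URL list (e.g. by quarter) first.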
Get data from the URL and then call json.loads, e.g.:
Python 3 example:
import urllib.request, json

with urllib.request.urlopen("http://maps.googleapis.com/maps/api/geocode/json?address=google") as url:
    data = json.loads(url.read().decode())
    print(data)
Python 2 example:
import urllib, json

url = "http://maps.googleapis.com/maps/api/geocode/json?address=google"
response = urllib.urlopen(url)
data = json.loads(response.read())
print data
The output would result in something like this:
{
"results" : [
{
"address_components" : [
{
"long_name" : "Charleston and Huff",
"short_name" : "Charleston and Huff",
"types" : [ "establishment", "point_of_interest" ]
},
{
"long_name" : "Mountain View",
"short_name" : "Mountain View",
"types" : [ "locality", "political" ]
},
{
...
I'll take a guess that you actually want to get data from the URL:
from urllib.request import urlopen

jsonurl = urlopen(url)
text = json.loads(jsonurl.read())  # <-- read from it
Or, check out the JSON decoder in the requests library:
import requests

r = requests.get('someurl')
print(r.json())  # if the response body is JSON, this parses it for you automatically
I'm trying to download JSON files from a URL. When I attempt to open the saved JSON files in Python, I get the error:
"raise JSONDecodeError("Expecting value", s, err.value) from None"
Whenever I try opening one of the JSON URLs in the browser, I see this:
"SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data"
Below is a simplified version of my code. Is there a way to download JSON files correctly?
import time
import urllib.request
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

def read_url(url):
    urls = []
    psfiles = []
    url = url.replace(" ", "%20")
    req = Request(url)
    a = urlopen(req).read()
    soup = BeautifulSoup(a, 'html.parser')
    x = soup.find_all('a')
    for i in x:
        file_name = i.extract().get_text()
        url_new = url + file_name
        url_new = url_new.replace(" ", "%20")
        if file_name[-1] == '/' and file_name[0] != '.':
            read_url(url_new)
        if url_new.endswith('json'):
            urls.append(url_new)
    for i in urls:
        psfile = i.replace('url', '')
        psfiles.append(psfile)
    for j in range(len(psfiles)):
        urllib.request.urlretrieve("url", "path to directory" + psfiles[j])

if __name__ == '__main__':
    while True:
        read_url("url")
        time.sleep(1800)
Ok, so I had a lot of trouble interacting with the site. I decided to just go with the webbrowser library.
import webbrowser

chrome_path = "C:xxx\\Google\\Chrome\\Application\\chrome.exe"
webbrowser.register('chrome', None, webbrowser.BackgroundBrowser(chrome_path))
url = 'http://testsite/csv?date=2019-07-18'
webbrowser.get('chrome').open(url)
Setting Chrome to download files automatically populates my download folder, from where I can automate everything else :)
You need to read the data out of the object that urlopen returns.
Try
import json
import urllib.request

with urllib.request.urlopen("http://test.com/csv?date=2019-07-17") as f:
    jsonl = f.read()
data = json.loads(jsonl)
Don't use Beautiful Soup to process a JSON HTTP response. Use something like requests:
import requests

url = "https://www.daraz.pk/womens-kurtas-shalwar-kameez/?pathInfo=womens-kurtas-shalwar-kameez&page=2&YII_CSRF_TOKEN=31eb0a5d28f4dde909d3233b5a0c23bd03348f69&more_products=true"
header = {'x-requested-with': 'XMLHttpRequest'}
t = requests.get(url, headers=header)
newDictionary = t.json()
print(newDictionary)
A Beautiful Soup object can't be parsed with json.loads() that way.
If some of those JSON values contain HTML, you can use Beautiful Soup to parse those string values individually. For example, if your JSON has a key called content holding HTML, you can parse it like so:
BeautifulSoup(newDictionary['content'], "lxml")
You may need to experiment with different parsers if you have fragmentary HTML.
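To illustrate that split (JSON envelope first, then HTML inside a value), here is a minimal sketch with a made-up payload. The "content" key and the fragment are invented for the example, and it uses the stdlib "html.parser" backend so no lxml install is required:

```python
import json
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Made-up JSON payload whose "content" value holds an HTML fragment.
payload = '{"id": 1, "content": "<div><p>Hello <b>world</b></p></div>"}'

record = json.loads(payload)  # 1) parse the JSON envelope
soup = BeautifulSoup(record["content"], "html.parser")  # 2) parse the HTML value
print(soup.get_text())  # -> Hello world
```

Swapping "html.parser" for "lxml" works the same way if lxml is installed, and is more forgiving with badly broken fragments.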
The following is an example of how to use various JSON data that has been loaded as an object with json.loads().
Working example - tested with Python 2.6.9, 2.7.10, 3.3.5, and 3.5.0
import json
json_data = '''
{
"array": [
1,
2,
3
],
"boolean": true,
"null": null,
"number": 123,
"object": {
"a": "b",
"c": "d",
"e": "f"
},
"string": "Hello World"
}
'''
data = json.loads(json_data)
list_0 = [
data['array'][0],
data['array'][1],
data['array'][2],
data['boolean'],
data['null'],
data['number'],
data['object']['a'],
data['object']['c'],
data['object']['e'],
data['string']
]
print('''
array value 0 {0}
array value 1 {1}
array value 2 {2}
boolean value {3}
null value {4}
number value {5}
object value a value {6}
object value c value {7}
object value e value {8}
string value {9}
'''.format(*list_0))
Output
array value 0 1
array value 1 2
array value 2 3
boolean value True
null value None
number value 123
object value a value b
object value c value d
object value e value f
string value Hello World
For example: https://www.reddit.com/r/learnpython/about.json
Trying to fetch it with urllib or requests always returns HTTP error 429 (Too Many Requests).
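Reddit's API rules ask clients to send a descriptive, unique User-Agent, and requests arriving with the default urllib/requests User-Agent are commonly rejected with 429. A hedged sketch (the User-Agent string below is a placeholder you should replace with your own); it builds the request without sending it, so the headers can be inspected offline:

```python
import requests

url = "https://www.reddit.com/r/learnpython/about.json"
# Placeholder User-Agent; Reddit asks for something descriptive and unique.
headers = {"User-Agent": "my-json-downloader/0.1 (contact: example@example.com)"}

# Prepare the request without sending it, to show what will go on the wire.
prepared = requests.Request("GET", url, headers=headers).prepare()
print(prepared.headers["User-Agent"])

# To actually fetch and parse the JSON:
# data = requests.get(url, headers=headers).json()
```

If a custom User-Agent still yields 429, you are likely being rate limited and should slow down or use the official API with authentication.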