Use the urllib library:
import urllib.parse
finalurl = baseUrl + urllib.parse.quote(searchterm)
You can use quote_plus() to produce + instead of %20. To undo the encoding, use:
urllib.parse.unquote(str)
In Python 2, use urllib.quote() and urllib.unquote() respectively.
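As a rough illustration of the calls mentioned above, here is a minimal Python 3 sketch. The base URL and search term are made-up values for the example, not something from the original answer.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

base_url = "https://example.com/search/"  # hypothetical base URL
search_term = "retro video games"         # hypothetical search term

encoded = quote(search_term)            # spaces become %20
encoded_plus = quote_plus(search_term)  # spaces become +

final_url = base_url + encoded

# unquote() reverses quote(); unquote_plus() also turns + back into spaces
decoded = unquote(encoded)
decoded_plus = unquote_plus(encoded_plus)

print(final_url)
```

Note that unquote() leaves a literal + untouched, so strings produced by quote_plus() should be decoded with unquote_plus().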
python - Convert spaces to %20 in list - Stack Overflow
My URL has spaces in it and Python won't let me make a request
Encode spaces in query parameter with %20 instead of +
qgis - Inserting %20 instead of a space as a string - Geographic Information Systems Stack Exchange
I would recommend using urllib.parse module and its quote() function.
https://docs.python.org/3.6/library/urllib.parse.html#urllib.parse.quote
Example for Python 3:
from urllib.parse import quote
text_encoded = quote(t.text)
Note: quote_plus() won't work in your case, because it replaces spaces with a plus character (+) rather than %20.
Use the str.replace() method, as described here: http://www.tutorialspoint.com/python/string_replace.htm
So for t.text, it would be t.text.replace(" ", "%20"). This only converts spaces; any other characters are left as they are.
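To make the difference between the two approaches concrete, here is a small sketch comparing str.replace() with urllib.parse.quote(). The sample string is invented for the example; passing safe="" is a choice made here so that every reserved character gets encoded.

```python
from urllib.parse import quote

# Hypothetical value containing spaces, an ampersand and a percent sign
text = "Tom & Jerry 100%"

replaced = text.replace(" ", "%20")  # only the spaces are converted
quoted = quote(text, safe="")        # every unsafe character is percent-encoded

print(replaced)
print(quoted)
```

If the text can only ever contain letters, digits and spaces, both give the same result; otherwise quote() is the safer option.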
I am scraping Wikipedia for information about retro video games. One of the things I want to grab from a page is the main image in the infobox. Here's the code for it. I am using
from urllib.request import urlopen
import json
import re

def getImage(href, name):
    default = 'n/a'
    query = 'https://en.wikipedia.org/w/api.php?action=query&prop=images&format=json&titles=' + href
    client = urlopen(query)
    data = json.loads(client.read())
    pids = data['query']['pages']
    pid = 0
    for page in pids:
        pid = page
    images = pids[pid]['images']
    # Search for the first image that matches the title or 'box art'
    # Use regular expressions for this
    for image in images:
        title = image['title'] #My Title has spaces and belongs in a URL!!!
        if (re.search('box art', title) != None) or (re.search(name, title) != None):
            newQ = 'https://en.wikipedia.org/w/api.php?action=query&titles=' + title + '&prop=imageinfo&iiprop=url'
            client = urlopen(newQ) #This is where I'm getting the error
            data = json.loads(client.read())['query']['pages']
            pid = 0
            for page in data:
                pid = page
            img_url = data[pid]['imageinfo']['url']
            return img_url
    return default

#Lets search for Super Smash Bros
href = 'Super_Smash_Bros._Melee'
name = 'Super Smash Bros. Melee'
#I expect this function to print the URL for the box art of SSBM
print(getImage(href, name))

The file names of the images located through the Wikimedia API have spaces in them, and in order to find the URL of the box art, I have to make another query with the File name, but these files have spaces in them.
Go ahead and try the query yourself: https://en.wikipedia.org/w/api.php?action=query&titles=%27File:Super%20Smash%20Bros%20Melee%20box%20art.png%27&prop=imageinfo&iiprop=url
Since the query works in my browser, I do not believe my query is wrong. Here's the error python is giving me
PS C:\Users\Happy Customer\documents> python3 test.py
Traceback (most recent call last):
  File "test.py", line 70, in <module>
    print(getimg(link, name))
  File "test.py", line 57, in getimg
    client = urlopen(newQ)
##... unimportant nonsense....##
    raise InvalidURL(f"URL can't contain control characters. {url!r} "
http.client.InvalidURL: URL can't contain control characters. '/w/api.php?action=query&titles=File:Super Smash Bros Melee box art.png&prop=imageinfo&iiprop=url' (found at least ' ')

What am I to do if the file name has spaces in it and I need to make a REST API call?!
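One possible way around the error is to let urllib.parse build the query string, so the title is percent-encoded before urlopen() sees it. The sketch below only constructs the URL and makes no network request. The helper name build_imageinfo_url is hypothetical, and adding format=json is an assumption made here (the second query in the question omits it, but the code parses the response as JSON).

```python
from urllib.parse import urlencode, quote

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_imageinfo_url(title):
    """Return an imageinfo query URL with the title percent-encoded.

    Hypothetical helper for illustration; not part of the original code.
    """
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",  # assumption: ask for JSON so json.loads() can parse it
    }
    # quote_via=quote encodes spaces as %20 instead of +
    return API_ENDPOINT + "?" + urlencode(params, quote_via=quote)

url = build_imageinfo_url("File:Super Smash Bros Melee box art.png")
print(url)
```

The resulting string contains no raw spaces, so it can be passed to urlopen() in place of newQ.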
You're close. You just need to wrap %20 in single quotes (which denote strings) instead of double quotes (which denote field names).
So you could use:
replace("url", ' ', '%20')
Note that this replaces all spaces in each string so you may need to be mindful, especially in cases where you might have a space at the end of a string.
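Spaces are not the only characters that are not "URL safe".
If you are looking to URL-encode your strings, try (for Python 2):
import urllib
value = urllib.quote(value)
This will handle spaces and a whole lot more. Apply it to the piece you are inserting into the URL (a path segment or a query value) rather than to a complete URL, because quote() also encodes the ":" after the scheme. Note that urllib.urlencode() expects a dict or a sequence of key/value pairs and raises a TypeError when given a plain string.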
This post has a quick table of the different characters you need to encode.
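For readers on Python 3, here is a hedged sketch of the same idea applied to only the path of a URL, so the scheme and host are left alone. The helper name encode_path and the sample URL are made up for the example, and it assumes the path has not been percent-encoded already (otherwise the % signs would be encoded a second time).

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_path(url):
    """Percent-encode the path component of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # quote() keeps "/" by default, so the path separators survive
    safe_path = quote(parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )

raw = "https://example.com/files/box art (final).png"  # hypothetical URL
encoded = encode_path(raw)
print(encoded)
```

Here the parentheses are encoded along with the spaces, which a plain space-to-%20 replacement would miss.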
To follow up on @WeaselFox's answer: since Python 3.5, urllib.parse.urlencode accepts a quote_via keyword argument, so you can now do this:
import requests
import urllib.parse
payload = {'key1': 'value 1', 'key2': 'value 2'}
headers = {'Content-Type': 'application/json;charset=UTF-8'}
# quote_via=urllib.parse.quote encodes spaces as %20 instead of +
params = urllib.parse.urlencode(payload, quote_via=urllib.parse.quote)
r = requests.get("http://example.com/service", params=params, headers=headers,
                 auth=("admin", "password"))
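To see what quote_via changes, the following sketch compares the default urlencode() output with the quote_via=quote variant for the same payload. It uses only the standard library and sends no request.

```python
from urllib.parse import urlencode, quote

payload = {'key1': 'value 1', 'key2': 'value 2'}

# Default behaviour: urlencode() uses quote_plus(), so spaces become +
default_query = urlencode(payload)

# With quote_via=quote, spaces become %20
percent_query = urlencode(payload, quote_via=quote)

print(default_query)
print(percent_query)
```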
Python 2.7
Override urllib.quote_plus with urllib.quote.
urllib.urlencode uses urllib.quote_plus to encode the data, so replacing it makes urlencode emit %20 for spaces.
Code:
import requests
import urllib
urllib.quote_plus=urllib.quote # A fix for urlencoder to give %20
payload = {'key1': 'value 1', 'key2': 'value 2'}
headers = {'Content-Type': 'application/json;charset=UTF-8'}
param = urllib.urlencode(payload) #encodes the data
r = requests.get("http://example.com/service", params=param, headers=headers,
                 auth=("admin", "password"))
Output
The output for param = urllib.urlencode(payload):
'key2=value%202&key1=value%201'