The data is UTF-8 bytes escaped with URL (percent) encoding, so decode it with urllib.parse.unquote(), which transparently converts the percent-encoded data to bytes and then decodes those bytes as UTF-8 text:
from urllib.parse import unquote
url = unquote(url)
Demo:
>>> from urllib.parse import unquote
>>> url = 'example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0'
>>> unquote(url)
'example.com?title=правовая+защита'
The Python 2 equivalent is urllib.unquote(), but this returns a bytestring, so you'd have to decode manually:
from urllib import unquote
url = unquote(url).decode('utf8')
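To make the two steps explicit in Python 3 as well, here is a minimal sketch that uses urllib.parse.unquote_to_bytes() to stop at the byte stage before decoding. The variable names and the shortened sample string are illustrative only.

```python
from urllib.parse import unquote, unquote_to_bytes

encoded = "title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE"

# Step 1: percent-decoding yields raw bytes.
raw = unquote_to_bytes(encoded)

# Step 2: the bytes are UTF-8, so decode them to text.
text = raw.decode("utf-8")

# unquote() performs both steps in one call (UTF-8 is its default encoding).
same = unquote(encoded)
print(text)
```

For valid UTF-8 input both routes give the same string; they differ only in how malformed bytes are handled, since unquote() replaces them by default while bytes.decode() raises.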
Answer from Martijn Pieters on Stack Overflow
If you are using Python 3, you can use urllib.parse.unquote:
url = """example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0"""
import urllib.parse
urllib.parse.unquote(url)
gives:
'example.com?title=правовая+защита'
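Note that the + in the output is left untouched, because unquote() only handles percent escapes. If the string is form-style query data where + stands for a space, something along these lines may be closer to what you want. This is a sketch using urllib.parse.unquote_plus() and parse_qs(); splitting on "?" by hand is a simplification chosen for this example.

```python
from urllib.parse import parse_qs, unquote_plus

url = ("example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F"
       "+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0")

# unquote_plus() also turns '+' into a space.
decoded = unquote_plus(url)

# parse_qs() decodes the query part and returns a dict of value lists.
query = url.partition("?")[2]
params = parse_qs(query)

print(decoded)
print(params)
```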
These Python one-liners do what you want:
Python2
$ alias urldecode='python -c "import sys, urllib as ul; \
print ul.unquote_plus(sys.argv[1])"'
$ alias urlencode='python -c "import sys, urllib as ul; \
print ul.quote_plus(sys.argv[1])"'
Python3
$ alias urldecode='python3 -c "import sys, urllib.parse as ul; \
print(ul.unquote_plus(sys.argv[1]))"'
$ alias urlencode='python3 -c "import sys, urllib.parse as ul; \
print(ul.quote_plus(sys.argv[1]))"'
Example
$ urldecode 'q+werty%3D%2F%3B'
q werty=/;
$ urlencode 'q werty=/;'
q+werty%3D%2F%3B
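If you would rather call the same logic from a Python script than from a shell alias, the pair can be wrapped as below. The function names urlencode and urldecode are just illustrative wrappers mirroring the aliases, not part of the standard library.

```python
from urllib.parse import quote_plus, unquote_plus


def urlencode(text: str) -> str:
    """Encode text the way the urlencode alias does (spaces become '+')."""
    return quote_plus(text)


def urldecode(text: str) -> str:
    """Decode text the way the urldecode alias does ('+' becomes a space)."""
    return unquote_plus(text)


# Round trip using the example values from above.
print(urlencode("q werty=/;"))
print(urldecode("q+werty%3D%2F%3B"))
```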
References
- Urlencode and urldecode from a command line
sed
Try the following command line:
$ sed 's@+@ @g;s@%@\\x@g' file | xargs -0 printf "%b"
or the following alternative using echo -e:
$ sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' file | xargs echo -e
Note: The above syntax may not convert + to spaces, and can eat all the newlines.
You can define it as an alias and add it to your shell rc files:
$ alias urldecode='sed "s@+@ @g;s@%@\\\\x@g" | xargs -0 printf "%b"'
Then, whenever you need it, simply run:
$ echo "http%3A%2F%2Fwww" | urldecode
http://www
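If losing newlines is a concern, a line-by-line filter is one way around it. The following is a rough Python sketch, not a drop-in replacement for the sed pipeline: decode_stream is a hypothetical helper name, and it assumes + should be treated as a space.

```python
import io
from urllib.parse import unquote_plus


def decode_stream(src, dst):
    """Decode each line from src and write it to dst, keeping line endings."""
    for line in src:
        # unquote_plus() leaves the trailing newline as it is.
        dst.write(unquote_plus(line))


# Demonstration with in-memory streams instead of real files.
source = io.StringIO("http%3A%2F%2Fwww\noil+and+gas\n")
output = io.StringIO()
decode_stream(source, output)
print(output.getvalue(), end="")
```

To use it as a shell filter, call decode_stream(sys.stdin, sys.stdout) after importing sys.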
Bash
When scripting, you can use the following syntax:
input="http%3A%2F%2Fwww"
decoded=$(printf '%b' "${input//%/\\x}")
However, the syntax above won't handle pluses (+) correctly, so you have to replace them with spaces via sed or, as suggested by @isaac, use the following syntax:
decoded=$(input=${input//+/ }; printf "${input//%/\\x}")
You can also use the following urlencode() and urldecode() functions:
urlencode() {
  # urlencode <string>
  local length="${#1}"
  for (( i = 0; i < length; i++ )); do
    local c="${1:i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf "$c" ;;
      *) printf '%%%02X' "'$c" ;;
    esac
  done
}
urldecode() {
  # urldecode <string>
  local url_encoded="${1//+/ }"
  printf '%b' "${url_encoded//%/\\x}"
}
Note that the urldecode() function above assumes the data contains no backslashes.
A similar version by Joel can be found at: https://github.com/sixarm/urldecode.sh
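The backslash caveat arises because printf '%b' interprets backslash escapes in addition to the \x sequences created from the percent signs. For comparison, here is a sketch of a decoder that only touches %XX sequences, so backslashes in the input pass through unchanged. It is written in Python rather than Bash, and the function name and regular expression are this example's own choices.

```python
import re

# Matches one percent escape such as %2F and captures the two hex digits.
_PERCENT = re.compile(rb"%([0-9A-Fa-f]{2})")


def urldecode(text: str) -> str:
    """Decode '+' and %XX only; everything else is passed through as is."""
    data = text.replace("+", " ").encode("utf-8")
    # Replace each escape with the single byte it stands for.
    data = _PERCENT.sub(lambda m: bytes([int(m.group(1), 16)]), data)
    return data.decode("utf-8", errors="replace")


print(urldecode(r"C:\new+dir%2Ffile"))
```

In practice urllib.parse.unquote_plus() already behaves this way; the hand-written version is only meant to show where the shell approach differs.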
bash + xxd
Bash function with xxd tool:
urlencode() {
  # printf '%s' is used so that a literal % in the input is not parsed as a format
  local length="${#1}"
  for (( i = 0; i < length; i++ )); do
    local c="${1:i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      *) printf '%s' "$c" | xxd -p -c1 | while read x; do printf "%%%s" "$x"; done ;;
    esac
  done
}
Found in cdown's gist; also on Stack Overflow.
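The byte-by-byte idea behind the xxd variant can be sketched in Python as follows. This is an illustration under the assumption that the input should be encoded as UTF-8; the UNRESERVED set copies the [a-zA-Z0-9.~_-] pattern from the shell functions, and the function name is hypothetical.

```python
from urllib.parse import quote

# Bytes that are emitted unchanged, matching the shell pattern [a-zA-Z0-9.~_-].
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.~_-"
)


def urlencode(text: str) -> str:
    """Percent-encode every UTF-8 byte that is not in UNRESERVED."""
    parts = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            parts.append(chr(byte))
        else:
            parts.append(f"%{byte:02X}")
    return "".join(parts)


print(urlencode("q werty=/;"))
# The standard library gives the same result for this input.
print(quote("q werty=/;", safe=""))
```

Unlike quote_plus(), this encodes a space as %20 rather than +, which matches what the shell functions above produce.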
PHP
Using PHP you can try the following command:
$ echo oil+and+gas | php -r 'echo urldecode(fgets(STDIN));' # Or: php://stdin
oil and gas
or just:
php -r 'echo urldecode("oil+and+gas");'
Use -R instead of -r to process multi-line input line by line.
Perl
In Perl you can use URI::Escape.
decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")
Or to process a file:
perl -i -MURI::Escape -pe '$_ = uri_unescape($_)' file
awk
Try anon's solution:
awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..
Note: Parameter -n is specific to GNU awk.
Try Stéphane Chazelas's urlencode solution:
awk -v RS='&#[0-9]+;' -v ORS= '1;RT{printf("%%%02X", substr(RT,3))}'
See: Using awk printf to urldecode text.
decoding file names
If you need to remove URL encoding from file names, use the deurlname tool from renameutils (e.g. deurlname *.*).
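If renameutils is not available, a similar effect can be approximated with a short Python script. This is only a sketch and does not reproduce deurlname's exact behaviour: decode_file_names is a hypothetical helper, and the rules for skipping names (path separators, NUL bytes, existing targets) are assumptions made here for safety.

```python
from pathlib import Path
from urllib.parse import unquote


def decode_file_names(directory):
    """Rename percent-encoded entries in directory; return (old, new) pairs."""
    renamed = []
    # sorted() materialises the listing before any renaming starts.
    for path in sorted(Path(directory).iterdir()):
        new_name = unquote(path.name)
        if new_name == path.name:
            continue  # nothing to decode
        if "/" in new_name or "\0" in new_name or new_name in ("", ".", ".."):
            continue  # decoded name would not be a valid single file name
        target = path.with_name(new_name)
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append((path.name, new_name))
    return renamed
```

Calling decode_file_names(".") would rename entries in the current directory, so it is worth trying on a copy first.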
See also:
- Can wget decode uri file names when downloading in batch?
- How to remove URI encoding from file names?
Related:
- How to decode URL-encoded string in shell? at SO
- How can I encode and decode percent-encoded strings on the command line? at Ask Ubuntu