URL encoding, also known as percent-encoding, is a method used to encode special characters in Uniform Resource Identifiers (URIs) so they can be safely transmitted over the internet. It ensures that characters not allowed in URLs—such as spaces, reserved symbols, or non-ASCII characters—are converted into a standard format using the percent sign (%) followed by two hexadecimal digits representing the character’s byte value.
Why URL Encoding Is Important
Prevents parsing errors: Characters like ?, &, #, /, and spaces have special meanings in URLs. Encoding them avoids misinterpretation by browsers and servers.
Supports non-ASCII characters: URLs are based on ASCII, so characters outside that range (e.g., €, ©, ш) must be encoded using UTF-8 and then percent-encoded.
Enables data transmission: When sending data via query strings (e.g., in web forms), special characters must be encoded to ensure integrity and correct interpretation.
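As a rough illustration of the parsing problem, the sketch below uses Python's standard `urllib.parse` module. The parameter names `q` and `page` and the value `rock&roll` are made up for the example.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form values; the "&" inside the search term is data, not a separator.
params = {"q": "rock&roll", "page": "1"}

# Naive concatenation leaves the "&" unencoded, so a parser splits the value.
naive = "q=" + params["q"] + "&page=" + params["page"]
naive_parsed = parse_qs(naive)

# urlencode() percent-encodes the "&" as %26, so the value survives intact.
encoded = urlencode(params)
encoded_parsed = parse_qs(encoded)

print(naive, naive_parsed)
print(encoded, encoded_parsed)
```

In the naive string the search term is cut off at the first `&`; in the encoded string it round-trips unchanged.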
Common Encoding Examples
| Character | Purpose in URL | Encoded Form |
| --- | --- | --- |
| Space | Separates words | %20 or + (in query strings) |
| & | Separates query parameters | %26 |
| ? | Starts query string | %3F |
| # | Indicates fragment/anchor | %23 |
| @ | Separates user/password from domain | %40 |
| + | Represents space in query strings | %2B |
| % | Indicates encoding | %25 |
How It Works
Encoding: Replaces unsafe or reserved characters with %XX, where XX is the hexadecimal representation of the character’s UTF-8 byte value.
Decoding: Reverses the process to restore the original character.
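The encoding and decoding steps can be sketched by hand in Python. This is a minimal illustration, not a replacement for a library routine: the helper names `percent_encode` and `percent_decode` are hypothetical, and the sketch encodes everything outside the RFC 3986 unreserved set.

```python
# Unreserved characters per RFC 3986: letters, digits, "-", ".", "_", "~".
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Emit unreserved ASCII as-is and every other UTF-8 byte as %XX."""
    out = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        out.append(char if char in UNRESERVED else f"%{byte:02X}")
    return "".join(out)

def percent_decode(text: str) -> str:
    """Turn each %XX back into a byte, then decode the bytes as UTF-8."""
    raw = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            raw.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            raw.extend(text[i].encode("utf-8"))
            i += 1
    return raw.decode("utf-8")

print(percent_encode("a b€"))          # a%20b%E2%82%AC
print(percent_decode("a%20b%E2%82%AC"))  # a b€
```

A multi-byte character such as € becomes one %XX triplet per UTF-8 byte, which is why it expands to three triplets.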
Tools & Functions
Online Tools: Use free tools like meyerweb.com/eric/tools/dencoder/ or jam.dev/utilities/url-encoder to encode/decode URLs instantly.
JavaScript: Use encodeURIComponent() for individual values (e.g., query parameters), or encodeURI() for entire URIs.
Python: Use urllib.parse.quote() or urllib.parse.quote_plus() for encoding.
PHP: Use rawurlencode() for proper encoding.
⚠️ Note: While + is often used to represent a space in query strings (legacy compatibility), %20 is the encoding defined by RFC 3986 and works in every part of a URL. Always use %20 in paths, and prefer it wherever a + could be mistaken for a literal plus sign.
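For the Python functions mentioned above, a short sketch of how the two space conventions differ. The sample value `a b+c` is arbitrary, and `safe=""` is passed so that nothing beyond the unreserved characters is left unencoded.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "a b+c"  # arbitrary sample with both a space and a literal "+"

as_percent = quote(value, safe="")  # a%20b%2Bc
as_form = quote_plus(value)         # a+b%2Bc

# unquote() leaves "+" alone; unquote_plus() treats "+" as a space.
plain_decoded = unquote("a+b")      # a+b
form_decoded = unquote_plus("a+b")  # a b

print(as_percent, as_form, plain_decoded, form_decoded)
```

Decoding has to match the convention used for encoding: a `+` produced by `quote_plus()` only turns back into a space under `unquote_plus()`.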
Security Considerations
Double encoding (encoding twice) can bypass filters in security systems and is exploited in attacks like path traversal, XSS, and SQL injection.
Examples include CVE-2001-0333 (IIS directory traversal) and CVE-2004-1939 (XSS via double-encoded slashes).
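To see why double encoding slips past a filter, consider the sketch below. The filter `naive_filter_allows` is hypothetical and deliberately flawed: it decodes its input once and then looks for `../`, which is the pattern this class of attack relies on.

```python
from urllib.parse import quote, unquote

payload = "../"
once = quote(payload, safe="")  # ..%2F
twice = quote(once, safe="")    # ..%252F  (the "%" itself becomes %25)

def naive_filter_allows(raw: str) -> bool:
    """Hypothetical filter: decode once, then reject anything containing '../'."""
    return "../" not in unquote(raw)

blocked_single = not naive_filter_allows(once)  # single encoding is caught
passed_double = naive_filter_allows(twice)      # double encoding gets through

# If a later component decodes again, the traversal sequence reappears.
after_second_decode = unquote(unquote(twice))

print(once, twice, blocked_single, passed_double, after_second_decode)
```

The usual mitigation is to decode exactly once at a single, well-defined point and validate the result there, rather than letting several layers decode independently.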
Standards
Percent-encoding is defined in RFC 3986; RFC 3987 extends it to internationalized identifiers (IRIs).
UTF-8 is the standard character encoding used for URL encoding.
In summary, URL encoding ensures that data is correctly interpreted across systems, enabling reliable communication over the web. Always use UTF-8 and %XX format for safe, standardized encoding.
What are all the reasons behind url encoding?
I can't understand why we need to encode certain characters in a URL (I mean percent-encoding). I did a lot of Google searching and still can't figure out exactly what the point is. Could anyone help me out with this?
I would always encode in UTF-8. From the Wikipedia page on percent encoding:
The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values. This requirement was introduced in January 2005 with the publication of RFC 3986. URI schemes introduced before this date are not affected.
It seems like because there were other accepted ways of doing URL encoding in the past, browsers attempt several methods of decoding a URI, but if you're the one doing the encoding you should use UTF-8.
IRI (RFC 3987) is a newer standard that extends the URI/URL standards (RFC 3986 and older) rather than replacing them. URI/URL do not natively support Unicode (well, RFC 3986 adds provisions for future URI/URL-based protocols to support it, but does not update past RFCs). The "%uXXXX" scheme is a non-standard extension to allow Unicode in some situations, but is not universally implemented. IRI, on the other hand, fully supports Unicode, and requires that text be encoded as UTF-8 and then percent-encoded when an IRI is mapped to a URI.
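A small Python sketch of why the choice of character encoding matters. The sample character `é` and the comparison against Latin-1 are my own illustration, not part of the answers above.

```python
from urllib.parse import quote, unquote

text = "é"

utf8_form = quote(text)                        # %C3%A9 (two UTF-8 bytes)
latin1_form = quote(text, encoding="latin-1")  # %E9    (one Latin-1 byte)

# Decoding the Latin-1 form as UTF-8 (the default) cannot recover the character;
# unquote() substitutes U+FFFD because its default error mode is "replace".
wrong = unquote(latin1_form)
right = unquote(latin1_form, encoding="latin-1")

print(utf8_form, latin1_form, repr(wrong), right)
```

The same character yields different percent-encoded bytes depending on the charset, so encoder and decoder have to agree; UTF-8 is the safe default.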
From Wikipedia (emphasis and link added):
When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.
So, real percent-encoding uses %20, while form data in URLs uses a modified form that encodes spaces as +. You're therefore most likely to see + only in the query string, after the ?.
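In Python, the form-style convention is what `urllib.parse.urlencode()` produces by default. A brief sketch, with made-up field names `q` and `tag`:

```python
from urllib.parse import urlencode, parse_qs, quote

fields = {"q": "blue light", "tag": "c++"}  # hypothetical form fields

# Default: application/x-www-form-urlencoded style, spaces become "+".
form_style = urlencode(fields)                   # q=blue+light&tag=c%2B%2B

# Passing quote_via=quote switches to %20 for spaces.
percent_style = urlencode(fields, quote_via=quote)  # q=blue%20light&tag=c%2B%2B

# parse_qs() understands both spellings of a space.
from_form = parse_qs(form_style)
from_percent = parse_qs(percent_style)

print(form_style, percent_style, from_form == from_percent)
```

Either way, a literal `+` in the data is sent as `%2B`, which is what keeps it distinguishable from a space.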
This confusion is because URLs are still 'broken' to this day.
From a blog post:
Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs have had a very well-defined structure since the first specification in 1994.
We can extract detailed information about the "http://www.google.com" URL:
+---------------+-------------------+
| Part          | Data              |
+---------------+-------------------+
| Scheme        | http              |
| Host          | www.google.com    |
+---------------+-------------------+
If we look at a more complex URL such as:
"https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"
we can extract the following information:
+-------------------+---------------------+
| Part              | Data                |
+-------------------+---------------------+
| Scheme            | https               |
| User              | bob                 |
| Password          | bobby               |
| Host              | www.lunatech.com    |
| Port              | 8080                |
| Path              | /file;p=1           |
| Path parameter    | p=1                 |
| Query             | q=2                 |
| Fragment          | third               |
+-------------------+---------------------+
https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
\___/   \_/ \___/ \______________/ \__/\_______/ \_/ \___/
  |      |    |          |          |      | \_/  |    |
Scheme User Password    Host       Port  Path |   | Fragment
        \_____________________________/       | Query
                       |               Path parameter
                   Authority
The reserved characters are different for each part.
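The same breakdown can be reproduced with Python's `urllib.parse.urlparse()`. In this sketch the URL is written out from the values in the table above (user `bob`, password `bobby`, host `www.lunatech.com`).

```python
from urllib.parse import urlparse

# URL assembled from the table: user "bob", password "bobby", host "www.lunatech.com".
parts = urlparse("https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third")

print(parts.scheme)    # https
print(parts.username)  # bob
print(parts.password)  # bobby
print(parts.hostname)  # www.lunatech.com
print(parts.port)      # 8080
print(parts.path)      # /file
print(parts.params)    # p=1
print(parts.query)     # q=2
print(parts.fragment)  # third
```

One difference from the table: `urlparse()` reports the path parameter separately, so `path` is `/file` and `params` is `p=1`, whereas `urlsplit()` would keep `/file;p=1` together as the path.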
For HTTP URLs, a space in a path fragment part has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in the path fragment part can be left unencoded.
Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".
This means that the "blue+light blue" string has to be encoded differently in the path and query parts:
"http://example.com/blue+light%20blue?blue%2Blight+blue".
From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.
This boils down to:
You should have %20 before the ? and + after.
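The "blue+light blue" example can be reproduced in Python by encoding the path and the query separately. This is a sketch using `urllib.parse`; passing `safe="+"` for the path is my choice to mirror the rule that `+` may stay unencoded there.

```python
from urllib.parse import quote, quote_plus

text = "blue+light blue"

# Path: space must become %20, a literal "+" may stay as-is.
path_part = quote(text, safe="+")  # blue+light%20blue

# Query (form style): space becomes "+", so a literal "+" must become %2B.
query_part = quote_plus(text)      # blue%2Blight+blue

url = f"http://example.com/{path_part}?{query_part}"
print(url)  # http://example.com/blue+light%20blue?blue%2Blight+blue
```

Each component is encoded before the URL is assembled, which is the practical consequence of the point above: a finished URL cannot be encoded correctly as a single string.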