ELI5: Why do URLs have %20 in them to represent a space?
URL encoding the space character: + or %20? - Stack Overflow
encoding - The origin on why '%20' is used as a space in URLs - Stack Overflow
How to remove %20 in urls??
From Wikipedia (emphasis and link added):
When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.
So, real percent-encoding uses %20, while form data in URLs uses a modified form that writes a space as +. That is why you are most likely to see + only in the query string, after the ?.
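The difference is easy to reproduce. A minimal sketch, assuming a JavaScript runtime that provides the standard `encodeURIComponent` and `URLSearchParams` (browsers or Node.js); the parameter name `q` and the sample value are made up for illustration:

```javascript
// Sample value containing a space (hypothetical, for illustration only).
const value = 'hello world';

// Generic percent-encoding: the space becomes %20.
const percentEncoded = encodeURIComponent(value);

// Form-style (application/x-www-form-urlencoded) serialization:
// URLSearchParams writes the space as "+".
const formEncoded = new URLSearchParams({ q: value }).toString();

console.log(percentEncoded); // hello%20world
console.log(formEncoded);    // q=hello+world
```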
This confusion is because URLs are still 'broken' to this day.
From a blog post:
Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs actually have a very well-defined structure since the first specification in 1994.
We can extract detailed information about the "http://www.google.com" URL:
+---------------+-------------------+
| Part          | Data              |
+---------------+-------------------+
| Scheme        | http              |
| Host          | www.google.com    |
+---------------+-------------------+
If we look at a more complex URL such as:
"https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"
we can extract the following information:
+-------------------+---------------------+
| Part              | Data                |
+-------------------+---------------------+
| Scheme            | https               |
| User              | bob                 |
| Password          | bobby               |
| Host              | www.lunatech.com    |
| Port              | 8080                |
| Path              | /file;p=1           |
| Path parameter    | p=1                 |
| Query             | q=2                 |
| Fragment          | third               |
+-------------------+---------------------+

https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
\___/   \_/ \___/ \______________/ \__/\_______/ \_/ \___/
  |      |    |          |          |      | \_/  |    |
Scheme User Password    Host       Port  Path |   | Fragment
        \_____________________________/       | Query
                       |               Path parameter
                   Authority

The reserved characters are different for each part.
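The same breakdown can be checked programmatically. A minimal sketch, assuming a runtime with the standard `URL` class (browsers or Node.js); the `parts` object and its key names are only illustrative. Note that `URL` does not expose the path parameter as a separate field, so `;p=1` stays inside the path:

```javascript
// Parse the example URL from the text with the standard URL class.
const url = new URL('https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third');

// Illustrative object mirroring the rows of the table above.
const parts = {
  scheme: url.protocol.replace(/:$/, ''), // "https:" -> "https"
  user: url.username,
  password: url.password,
  host: url.hostname,
  port: url.port,
  path: url.pathname,                     // the path parameter ";p=1" is part of the path
  query: url.search.replace(/^\?/, ''),   // drop the leading "?"
  fragment: url.hash.replace(/^#/, ''),   // drop the leading "#"
};

console.log(parts);
```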
For HTTP URLs, a space in a path fragment part has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in the path fragment part can be left unencoded.
Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".
This means that the "blue+light blue" string has to be encoded differently in the path and query parts:
"http://example.com/blue+light%20blue?blue%2Blight+blue".
From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.
This boils down to:
You should have %20 before the ? and + after.
Source
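The "blue+light blue" example from the quoted post can be rebuilt in code. This is a sketch rather than a complete encoder: the helper names `encodePathSegment` and `encodeQueryComponent` are hypothetical, and both simply adjust the output of the standard `encodeURIComponent`:

```javascript
const text = 'blue+light blue';

// Path segment: the space must become %20, while "+" may stay literal.
// encodeURIComponent escapes "+" as %2B, so it is restored here.
function encodePathSegment(s) {
  return encodeURIComponent(s).replace(/%2B/g, '+');
}

// Query component (form style): the space becomes "+",
// and a literal "+" stays escaped as %2B.
function encodeQueryComponent(s) {
  return encodeURIComponent(s).replace(/%20/g, '+');
}

const built =
  'http://example.com/' + encodePathSegment(text) + '?' + encodeQueryComponent(text);

console.log(built); // http://example.com/blue+light%20blue?blue%2Blight+blue
```

Each component is encoded before the URL is assembled, which is the practical consequence of the point that a fully constructed URL cannot be encoded without knowing its structure.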
It's called percent-encoding. Some characters can't appear literally in a URI because they have a special meaning there (for example #, which introduces the URL fragment), so they are represented with characters that can (# becomes %23).
Here's an excerpt from that same article:
When a character from the reserved set (a "reserved character") has special meaning (a "reserved purpose") in a certain context, and a URI scheme says that it is necessary to use that character for some other purpose, then the character must be percent-encoded. Percent-encoding a reserved character involves converting the character to its corresponding byte value in ASCII and then representing that value as a pair of hexadecimal digits. The digits, preceded by a percent sign ("%") which is used as an escape character, are then used in the URI in place of the reserved character. (For a non-ASCII character, it is typically converted to its byte sequence in UTF-8, and then each byte value is represented as above.)
The space character's character code is 32:
> ' '.charCodeAt(0)
32
Which is 20 in base-16:
> ' '.charCodeAt(0).toString(16)
"20"
Tack a percent sign in front of it and you get %20.
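The same steps generalize to any character, including non-ASCII ones that are first converted to UTF-8 bytes. A minimal sketch, assuming the standard `TextEncoder` is available; `percentEncodeAll` is a hypothetical helper that escapes every byte, even those that would not need it, purely to show the mechanism:

```javascript
// Percent-encode every UTF-8 byte of a string:
// "%" followed by the byte value as two uppercase hex digits.
function percentEncodeAll(s) {
  const bytes = new TextEncoder().encode(s); // UTF-8 bytes of the string
  return Array.from(
    bytes,
    (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeAll(' '));      // %20
console.log(percentEncodeAll('#'));      // %23
console.log(percentEncodeAll('\u00e9')); // %C3%A9 (e with acute accent, two UTF-8 bytes)
```

In practice `encodeURIComponent` does the same thing but leaves unreserved characters such as letters and digits untouched.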
URLs have strict syntactic rules: / is a special path separator, spaces are not allowed, and every character must come from a limited subset of ASCII. To embed arbitrary characters in a URL despite these restrictions, their bytes can be percent-encoded. The byte 0x20 represents a space in ASCII (and in most other encodings), hence %20 is the URL-encoded version of a space.
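As an illustration of embedding "forbidden" characters, here is a small round trip using the standard `encodeURIComponent` and `decodeURIComponent`. The file name and the `example.com` link are made up for the example:

```javascript
// A hypothetical name containing both a space and a "/" (illustrative only).
const name = 'AC/DC live';

// Encode it so it can sit inside a single path segment:
// "/" becomes %2F and the space becomes %20.
const segment = encodeURIComponent(name);
const link = 'https://example.com/files/' + segment;

// Decoding the segment recovers the original text.
const roundTrip = decodeURIComponent(segment);

console.log(link);      // https://example.com/files/AC%2FDC%20live
console.log(roundTrip); // AC/DC live
```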
Can anyone help me figure out how to remove %20 from URLs?
Example:
Currently my link looks like this: https://localhost/Django%20Tutorial%/hello%20world/
But I want it to look like this: https://localhost/DjangoTutorial/helloworld/
Sorry if this is a basic question! I just can't figure it out. I googled it but did not find a correct answer.
We are all so blessed to have a community like this on Reddit. There are many really good people who spend their time helping us out when we are stuck on something! Thank you so much for doing that!
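The %20 shows up because the names used to build the link contain spaces, so one way to get the desired form is to strip the spaces before the URL is built. A rough sketch of that idea, written in JavaScript to match the snippets earlier on this page (the same approach applies in Python); `toPathSegment` is a hypothetical helper and the titles are taken from the example link:

```javascript
// Remove all whitespace from a title so it needs no %20 in a URL.
function toPathSegment(title) {
  return title.trim().replace(/\s+/g, '');
}

// Build the path from the human-readable names.
const titles = ['Django Tutorial', 'hello world'];
const path = '/' + titles.map(toPathSegment).join('/') + '/';

console.log(path); // /DjangoTutorial/helloworld/
```

Replacing whitespace with a hyphen instead of removing it is a common variation that keeps the words readable.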