This might be a bit more efficient with the same outcome:
function escapeXml(unsafe) {
    return unsafe.replace(/[<>&'"]/g, function (c) {
        switch (c) {
            case '<': return '&lt;';
            case '>': return '&gt;';
            case '&': return '&amp;';
            case '\'': return '&apos;';
            case '"': return '&quot;';
        }
    });
}
Answer from hgoebl on Stack Overflow
HTML encoding is simply replacing the &, ", ', < and > characters with their entity equivalents. Order matters: if you don't replace the & characters first, you'll double-encode the & that the other replacements introduce:
if (!String.prototype.encodeHTML) {
    String.prototype.encodeHTML = function () {
        return this.replace(/&/g, '&amp;')
                   .replace(/</g, '&lt;')
                   .replace(/>/g, '&gt;')
                   .replace(/"/g, '&quot;')
                   .replace(/'/g, '&apos;');
    };
}
As @Johan B.W. de Vries pointed out, this will cause problems if it is applied to tag names. To clarify, I assumed it was being used for values only.
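To make the ordering point concrete, here is a minimal sketch that compares the two orders on the same input. The helper names encodeAmpFirst and encodeAmpLast are hypothetical and exist only for this illustration; quotes are left out to keep it short.

```javascript
// Hypothetical helpers, named only for this illustration.

// Correct order: & is escaped before any entity is introduced.
function encodeAmpFirst(s) {
    return s.replace(/&/g, '&amp;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;');
}

// Wrong order: the & inside the freshly written &lt; and &gt;
// is escaped a second time by the final replace.
function encodeAmpLast(s) {
    return s.replace(/</g, '&lt;')
            .replace(/>/g, '&gt;')
            .replace(/&/g, '&amp;');
}

const input = 'a < b & c';
console.log(encodeAmpFirst(input)); // a &lt; b &amp; c
console.log(encodeAmpLast(input));  // a &amp;lt; b &amp; c
```

The second result would be displayed as the literal text &lt; instead of a < sign.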
Conversely, if you want to decode HTML entities [1], make sure you decode &amp; to & after everything else so that you don't double-decode any entities:
if (!String.prototype.decodeHTML) {
    String.prototype.decodeHTML = function () {
        return this.replace(/&apos;/g, "'")
                   .replace(/&quot;/g, '"')
                   .replace(/&gt;/g, '>')
                   .replace(/&lt;/g, '<')
                   .replace(/&amp;/g, '&');
    };
}
[1] Just the basics, not including &copy; to © or other such things.
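The same ordering concern applies in reverse when decoding. The sketch below uses two hypothetical helpers, decodeAmpLast and decodeAmpFirst, to show what happens to text that originally contained the literal characters &lt; and was then encoded once.

```javascript
// Hypothetical helpers, named only for this illustration.

// Correct order: &amp; is decoded last.
function decodeAmpLast(s) {
    return s.replace(/&lt;/g, '<')
            .replace(/&gt;/g, '>')
            .replace(/&amp;/g, '&');
}

// Wrong order: decoding &amp; first creates a new &lt;,
// which the next replace then decodes a second time.
function decodeAmpFirst(s) {
    return s.replace(/&amp;/g, '&')
            .replace(/&lt;/g, '<')
            .replace(/&gt;/g, '>');
}

// The literal text "&lt;" after one round of encoding.
const encoded = '&amp;lt;';
console.log(decodeAmpLast(encoded));  // &lt;
console.log(decodeAmpFirst(encoded)); // <
```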
As far as libraries are concerned, Underscore.js (or Lodash, if you prefer) provides an _.escape method that performs this encoding.
» npm install xml-escape
Four things:
1. replace returns the updated string, so you have to use the return value.
2. When the first argument is a string, it only replaces the first occurrence; to replace all of them, you have to give a regular expression with the g flag.
3. Character entities end with ; (e.g., &amp;, not &amp).
4. " is &quot;, not &apos;; and ' is &apos;, not &quot;.
For example:
function encodeMe(myString) {
    // Reassign the same variable each time; replace never modifies its input.
    myString = myString.replace(/&/g, "&amp;");
    myString = myString.replace(/</g, "&lt;");
    myString = myString.replace(/>/g, "&gt;");
    myString = myString.replace(/"/g, "&quot;");
    myString = myString.replace(/'/g, "&apos;");
    return myString;
}
or, of course, as one long chained statement:
function encodeMe(myString) {
    return myString.replace(/&/g, "&amp;")
                   .replace(/</g, "&lt;")
                   .replace(/>/g, "&gt;")
                   .replace(/"/g, "&quot;")
                   .replace(/'/g, "&apos;");
}
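One note on &apos;: it is one of XML's five predefined entities, so it is fine in XML. HTML 4 did not define it (HTML5 does), so use &#39; instead if the output may be parsed as older HTML.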
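The first two points in the list above are easy to check in a console. This is a small sketch using an arbitrary sample string, not code from the question.

```javascript
const text = 'a & b & c';

// A string pattern replaces only the first occurrence.
const firstOnly = text.replace('&', '&amp;');

// A regular expression with the g flag replaces every occurrence.
const all = text.replace(/&/g, '&amp;');

console.log(firstOnly); // a &amp; b & c
console.log(all);       // a &amp; b &amp; c

// replace returns a new string; the original is left untouched.
console.log(text);      // a & b & c
```

In environments that support it, String.prototype.replaceAll with a string pattern is another way to replace every occurrence.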
You're missing a ; at the end of each entity, and I think you have the last two replacements switched around.
You also want those replaces to have the global flag.
function encodeMe(myString) {
    var r = myString.replace(/&/g, "&amp;")
                    .replace(/</g, "&lt;")
                    .replace(/>/g, "&gt;")
                    .replace(/"/g, "&quot;")
                    .replace(/'/g, "&apos;");
    return r;
}
The correct answer is to double-encode the text: first with JavascriptEncode, then with XmlAttributeEncode. The rationale is that everything within an XML/HTML attribute should be XML attribute encoded. The browser's parser will interpret the text as an XML attribute and decode it accordingly. The browser then passes the decoded text to the JavaScript interpreter, so the text must also be properly JavaScript encoded to prevent a security vulnerability.
This double encoding will not produce invalid results, because the browser also decodes the text twice (two separate interpreters are involved). Here is an example of the correct encoding:
string unsafeText = "Hello <unsafe> ');alert('xss');alert('";
string javaEncoded = AntiXss.JavascriptEncode(unsafeText, false);
ENCODED_STRING = AntiXss.XmlAttributeEncode(javaEncoded);
<input type="button" onclick="alert('[ENCODED_STRING]');"
value="Click me" />
While double encoding is the only correct way to do this, I'd like to note that using only JavaScript encoding will usually yield a correct result. The constraint is that the attribute's text is wrapped in quotes.
JavaScript encoding uses the same whitelist (except for the space character) as HTML/XML attribute encoding. The difference between them is how unsafe characters are encoded. JavaScript encoding writes them as \xXX and \uXXXX (such as \u01A3), while XML attribute encoding writes them as &#XX; and &#XXXX; (such as &#x01A3;). After text has been JavaScript encoded, only two characters are left that the XML attribute encoder will encode again, namely the space character and the backslash character. Those two characters only cause a problem when the attribute's text isn't wrapped in quotes.
Note, however, that using only XML attribute encoding in this scenario will NOT yield a correct result.
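For readers without the AntiXss library at hand, the two-step idea can be sketched in plain JavaScript. The functions jsStringEncode and xmlAttributeEncode below are hypothetical stand-ins, not AntiXss's actual implementation, and their whitelists are assumptions: the first escapes everything except ASCII letters and digits, the second escapes only the five XML special characters.

```javascript
// Hypothetical stand-in for a JavaScript string encoder (assumed whitelist:
// ASCII letters and digits only). Everything else becomes \xXX or \uXXXX.
function jsStringEncode(s) {
    return s.replace(/[^A-Za-z0-9]/g, function (c) {
        const code = c.charCodeAt(0);
        const hex = code.toString(16).toUpperCase();
        return code < 256
            ? '\\x' + hex.padStart(2, '0')
            : '\\u' + hex.padStart(4, '0');
    });
}

// Hypothetical stand-in for an XML attribute encoder: writes the five
// XML special characters as decimal numeric character references.
function xmlAttributeEncode(s) {
    return s.replace(/[&<>"']/g, function (c) {
        return '&#' + c.charCodeAt(0) + ';';
    });
}

const unsafeText = "Hello <unsafe> ');alert('xss');alert('";

// Step 1: make the text safe inside a JavaScript string literal.
// Step 2: make that result safe inside a quoted attribute value.
const encodedString = xmlAttributeEncode(jsStringEncode(unsafeText));

const html = '<input type="button" onclick="alert(\'' + encodedString +
             '\');" value="Click me" />';
console.log(html);
```

With these assumed whitelists, the second step finds nothing left to change, which matches the observation above that JavaScript encoding alone usually suffices inside a quoted attribute. Applying both steps remains the safer default.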
Install the onclick handler in a separate <script> tag.
<input type="button" id="clickMeButton" value="Click me" />
...
<script type="text/javascript">
...
document.getElementById('clickMeButton').onclick = function () {
alert([ENCODED STRING HERE using AntiXss.JavascriptEncode]);
}
...
</script>