By definition, CDATA section content is taken as such, not parsed even for character references like ™. See What does <![CDATA[]]> in XML mean?
Independently of this, ™ is undefined, though commonly interpreted by browsers as denoting the trade mark character. Correct references for trade mark character are ™ and ™.
If the document encoding is UTF-8, you should enter the character “” as such. Inside CDATA sections, it’s really the only way.
Answer from Jukka K. Korpela on Stack OverflowCDATA stands for Character Data and it means that the data in between these strings includes data that could be interpreted as XML markup, but should not be.
The key differences between CDATA and comments are:
- As Richard points out, CDATA is still part of the document, while a comment is not.
- In CDATA you cannot include the string
]]>(CDEnd), while in a comment--is invalid. - Parameter Entity references are not recognized inside of comments.
This means given these four snippets of XML from one well-formed document:
<!ENTITY MyParamEntity "Has been expanded">
<!--
Within this comment I can use ]]>
and other reserved characters like <
&, ', and ", but %MyParamEntity; will not be expanded
(if I retrieve the text of this node it will contain
%MyParamEntity; and not "Has been expanded")
and I can't place two dashes next to each other.
-->
<![CDATA[
Within this Character Data block I can
use double dashes as much as I want (along with <, &, ', and ")
*and* %MyParamEntity; will be expanded to the text
"Has been expanded" ... however, I can't use
the CEND sequence. If I need to use CEND I must escape one of the
brackets or the greater-than sign using concatenated CDATA sections.
]]>
<description>An example of escaped CENDs</description>
<!-- This text contains a CEND ]]> -->
<!-- In this first case we put the ]] at the end of the first CDATA block
and the > in the second CDATA block -->
<data><![CDATA[This text contains a CEND ]]]]><![CDATA[>]]></data>
<!-- In this second case we put a ] at the end of the first CDATA block
and the ]> in the second CDATA block -->
<alternative><![CDATA[This text contains a CEND ]]]><![CDATA[]>]]></alternative>
A CDATA section is "a section of element content that is marked for the parser to interpret as only character data, not markup."
Syntactically, it behaves similarly to a comment:
<exampleOfAComment>
<!--
Since this is a comment
I can use all sorts of reserved characters
like > < " and &
or write things like
<foo></bar>
but my document is still well-formed!
-->
</exampleOfAComment>
... but it is still part of the document:
<exampleOfACDATA>
<![CDATA[
Since this is a CDATA section
I can use all sorts of reserved characters
like > < " and &
or write things like
<foo></bar>
but my document is still well formed!
]]>
</exampleOfACDATA>
Try saving the following as a .xhtml file (not .html) and open it using FireFox (not Internet Explorer) to see the difference between the comment and the CDATA section; the comment won't appear when you look at the document in a browser, while the CDATA section will:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" >
<head>
<title>CDATA Example</title>
</head>
<body>
<h2>Using a Comment</h2>
<div id="commentExample">
<!--
You won't see this in the document
and can use reserved characters like
< > & "
-->
</div>
<h2>Using a CDATA Section</h2>
<div id="cdataExample">
<![CDATA[
You will see this in the document
and can use reserved characters like
< > & "
]]>
</div>
</body>
</html>
Something to take note of with CDATA sections is that they have no encoding, so there's no way to include the string ]]> in them. Any character data which contains ]]> will have to - as far as I know - be a text node instead. Likewise, from a DOM manipulation perspective you can't create a CDATA section which includes ]]>:
var myEl = xmlDoc.getElementById("cdata-wrapper");
myEl.appendChild(xmlDoc.createCDATASection("This section cannot contain ]]>"));
This DOM manipulation code will either throw an exception (in Firefox) or result in a poorly structured XML document: http://jsfiddle.net/9NNHA/
You have to break your data into pieces to conceal the ]]>.
Here's the whole thing:
<![CDATA[]]]]><![CDATA[>]]>
The first <![CDATA[]]]]> has the ]]. The second <![CDATA[>]]> has the >.
You cannot escape a CDATA end sequence. Production rule 20 of the XML specification is quite clear:
[20] CData ::= (Char* - (Char* ']]>' Char*))
EDIT: This product rule literally means "A CData section may contain anything you want BUT the sequence ']]>'. No exception.".
EDIT2: The same section also reads:
Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "
<" and "&". CDATA sections cannot nest.
In other words, it's not possible to use entity reference, markup or any other form of interpreted syntax. The only parsed text inside a CDATA section is ]]>, and it terminates the section.
Hence, it is not possible to escape ]]> within a CDATA section.
EDIT3: The same section also reads:
2.7 CDATA Sections
[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]
Then there may be a CDATA section anywhere character data may occur, including multiple adjacent CDATA sections inplace of a single CDATA section. That allows it to be possible to split the ]]> token and put the two parts of it in adjacent CDATA sections.
ex:
<![CDATA[Certain tokens like ]]> can be difficult and <invalid>]]>
should be written as
<![CDATA[Certain tokens like ]]]]><![CDATA[> can be difficult and <valid>]]>