Apache Commons Lang

Apache provides the Commons Lang library.

StringEscapeUtils.unescapeXml(xml)

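(commons-lang; since Commons Lang 3.6 this class is deprecated in favor of org.apache.commons.text.StringEscapeUtils from Apache Commons Text)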

Answer from Bozho on Stack Overflow

Another answer:

Here's a simple method to unescape XML. It handles the predefined XML entities and decimal numerical entities (&#nnnn;). Modifying it to handle hex entities (&#xhhhh;) should be simple.

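import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The enclosing class name is illustrative; the answer showed only the two methods.
public final class XmlUnescaper
{
    // [^&;] keeps a stray '&' from swallowing the next real entity (e.g. "a & b &amp; c")
    private static final Pattern XML_ENTITY_REGEX = Pattern.compile( "&(#?)([^&;]+);" );

    public static String unescapeXML( final String xml )
    {
        // Before Java 9, Matcher.appendReplacement accepts only a StringBuffer, not a StringBuilder
        StringBuffer unescapedOutput = new StringBuffer( xml.length() );

        Matcher m = XML_ENTITY_REGEX.matcher( xml );
        Map<String,String> builtinEntities = null;
        String entity;
        String hashmark;
        String ent;
        int code;
        while ( m.find() ) {
            ent = m.group(2);
            hashmark = m.group(1);
            if ( (hashmark != null) && (hashmark.length() > 0) ) {
                try {
                    code = Integer.parseInt( ent );
                    // toChars also handles code points above 0xFFFF, which a (char) cast truncates
                    entity = new String( Character.toChars( code ) );
                } catch ( IllegalArgumentException e ) {
                    // not a valid decimal reference (e.g. hex &#x41;) - leave it unchanged
                    entity = m.group();
                }
            } else {
                //must be a non-numerical entity
                if ( builtinEntities == null ) {
                    builtinEntities = buildBuiltinXMLEntityMap();
                }
                entity = builtinEntities.get( ent );
                if ( entity == null ) {
                    //not a known entity - ignore it
                    entity = "&" + ent + ';';
                }
            }
            // quoteReplacement keeps '$' and '\' in the text from being read as group references
            m.appendReplacement( unescapedOutput, Matcher.quoteReplacement( entity ) );
        }
        m.appendTail( unescapedOutput );

        return unescapedOutput.toString();
    }

    private static Map<String,String> buildBuiltinXMLEntityMap()
    {
        Map<String,String> entities = new HashMap<String,String>(10);
        entities.put( "lt", "<" );
        entities.put( "gt", ">" );
        entities.put( "amp", "&" );
        entities.put( "apos", "'" );
        entities.put( "quot", "\"" );
        return entities;
    }
}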
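The answer says extending this to hex references (&#xhhhh;) should be simple. A minimal sketch of that extension follows, assuming Java 9+ (for Map.of and Matcher.replaceAll(Function)); the class name XmlEntityDecoder is hypothetical. It decodes decimal references, hex references (XML 1.0 only allows a lowercase "x"), and the five predefined entities in a single pass, and leaves anything it does not recognize untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; a sketch, not a library API.
public final class XmlEntityDecoder {

    // The five entities predefined by XML.
    private static final Map<String, String> PREDEFINED = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "apos", "'", "quot", "\"");

    // Group 1: decimal digits, group 2: hex digits, group 3: a predefined entity name.
    private static final Pattern ENTITY = Pattern.compile(
            "&(?:#([0-9]+)|#x([0-9a-fA-F]+)|(lt|gt|amp|apos|quot));");

    public static String unescape(String xml) {
        Matcher m = ENTITY.matcher(xml);
        // Each match is decoded on its own and never re-scanned, so "&amp;lt;" becomes "&lt;".
        // quoteReplacement stops '$' or '\' in the decoded text from being treated specially.
        return m.replaceAll(r -> Matcher.quoteReplacement(
                decode(r.group(1), r.group(2), r.group(3), r.group())));
    }

    private static String decode(String dec, String hex, String name, String whole) {
        if (name != null) {
            return PREDEFINED.get(name);
        }
        try {
            int codePoint = (dec != null) ? Integer.parseInt(dec) : Integer.parseInt(hex, 16);
            // toChars produces a surrogate pair for code points above 0xFFFF.
            return new String(Character.toChars(codePoint));
        } catch (IllegalArgumentException e) {
            // Overflowing digits or an invalid code point: keep the reference as written.
            return whole;
        }
    }
}
```

For example, XmlEntityDecoder.unescape("&#65;&#x42; &lt;") yields "AB <". Using one alternation regex instead of several chained replace calls is what prevents double unescaping.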

Unbescape
unbescape.org
unbescape: powerful, fast and easy escape/unescape operations for Java
unbescape is a Java library aimed at performing fully-featured and high-performance escape and unescape operations for: HTML (HTML5 and HTML 4) XML (XML 1.0 and XML 1.1) JavaScript · JSON · URI / URL (both paths and query parameters) CSS (both identifiers and string literals) CSV (Comma-Separated ...

Apache Commons
commons.apache.org › proper › commons-lang › javadocs › api-2.6 › org › apache › commons › lang › StringEscapeUtils.html
StringEscapeUtils (Commons Lang 2.6 API)
January 10, 2011 - Note that unicode characters greater than 0x7f are currently escaped to their numerical \u equivalent. This may change in future releases. ... Unescapes a string containing XML entity escapes to a string containing the actual Unicode characters corresponding to the escapes.

GitHub
github.com › unbescape › unbescape
GitHub - unbescape/unbescape: Advanced yet easy to use escaping library for Java
Support for both XML 1.0 and XML 1.1 escape/unescape operations. No support for DTD-defined or user-defined entities. Only the five predefined XML character entities are supported: &lt;, &gt;, &amp;, &quot; and &apos;. Automatic escaping of allowed control characters. ... Support for the JavaScript ...

Apache Commons
commons.apache.org › proper › commons-text › javadocs › api-release › org › apache › commons › text › StringEscapeUtils.html
StringEscapeUtils (Apache Commons Text 1.9 API)
Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports only HTML 3.0 entities. ... Escapes the characters in a String using XML entities.

Blogger
javarevisited.blogspot.com › 2012 › 09 › how-to-replace-escape-xml-special-characters-java-string.html
How to replace escape XML special characters in Java String - Example
Output
Original unescaped XML String: Java & HTML
Escaped XML String in Java: Java &amp; HTML
Original unescaped XML String: Java > HTML
Escaped XML String : Java &gt; HTML
Original unescaped XML String: Java < HTML
Escaped XML String: Java &lt; HTML
Original unescaped XML String: Java " HTML
Escaped XML String: Java &quot; HTML
Original unescaped XML String: Java ' HTML
Escaped XML String: Java &apos; HTML

Tabnine
tabnine.com › home page › code › java › org.xwiki.xml.xmlutils
org.xwiki.xml.XMLUtils.unescape java code examples | Tabnine
origin: org.xwiki.platform/xwiki-platform-xml-script · Unescape encoded special XML characters. Only &gt;, &lt; &amp;, " and ' are unescaped, since they are the only ones that affect the resulting markup.

Tabnine
tabnine.com › home page › code › java › org.json.xml
org.json.XML.unescape java code examples | Tabnine
if (ja != null) { ja.put(token instanceof String ? keepStrings ? XML.unescape((String)token) :XML.stringToValue((String)token) : token);

Experts Exchange
experts-exchange.com › questions › 20669375 › Java-code-for-XML-Escape-Unescape.html
Solved: Java code for XML Escape/Unescape | Experts Exchange
July 5, 2003 - Hi, I need some Java implementation of the interface below. It has to do escaping/unescaping of the 5 XML special characters <>"&' in some string. I want to do that with a regular expression if possible. public class XMLEscape { public static String escape(String val){ } public static String unescape(String val){ } } should escape ><"&' with &gt; &lt; &quot; &amp; &apos; and vice versa, like XMLEscape.escape("if ((a>b)&&(c<b))") returns if ((a&gt;b)&amp;&amp;(c&lt;b
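One way to fill in the asker's XMLEscape interface with regular expressions is sketched below, assuming Java 9+ for Map.of. Both directions use a single regex pass with a lookup table rather than a chain of String.replace calls, because chained replacements can re-process their own output (for example, unescaping &amp; before &lt; turns "&amp;lt;" into "<" instead of "&lt;").

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the asker's XMLEscape interface; not taken from any library.
public class XMLEscape {

    private static final Map<String, String> ESCAPES = Map.of(
            "<", "&lt;", ">", "&gt;", "&", "&amp;", "\"", "&quot;", "'", "&apos;");
    private static final Map<String, String> UNESCAPES = Map.of(
            "&lt;", "<", "&gt;", ">", "&amp;", "&", "&quot;", "\"", "&apos;", "'");

    // Any one of the five special characters.
    private static final Pattern SPECIAL = Pattern.compile("[<>&\"']");
    // Any one of the five predefined entities.
    private static final Pattern ENTITY = Pattern.compile("&(?:lt|gt|amp|quot|apos);");

    public static String escape(String val) {
        return replace(SPECIAL, ESCAPES, val);
    }

    public static String unescape(String val) {
        return replace(ENTITY, UNESCAPES, val);
    }

    // Copies the input, swapping each match for its table entry; output is never re-scanned.
    private static String replace(Pattern p, Map<String, String> table, String val) {
        Matcher m = p.matcher(val);
        StringBuilder out = new StringBuilder(val.length());
        int last = 0;
        while (m.find()) {
            out.append(val, last, m.start()).append(table.get(m.group()));
            last = m.end();
        }
        return out.append(val, last, val.length()).toString();
    }
}
```

With this sketch, XMLEscape.escape("if ((a>b)&&(c<b))") returns "if ((a&gt;b)&amp;&amp;(c&lt;b))", matching the expected output in the question, and unescape reverses it.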

Apache Commons
commons.apache.org › proper › commons-lang › javadocs › api-3.8.1 › index.html
StringEscapeUtils (Apache Commons Lang 3.8.1 API)

Apache Commons
commons.apache.org › proper › commons-lang › apidocs › org › apache › commons › lang3 › StringEscapeUtils.html
StringEscapeUtils (Apache Commons Lang 3.21.0-SNAPSHOT API)
Unescapes any Java literals found in the String. ... Deprecated. Unescapes any Json literals found in the String. ... Deprecated. Unescapes a string containing XML entity escapes to a string containing the actual Unicode characters corresponding to the escapes.

JSON Formatter
jsonformatter.org › xml-unescape
Best XML Unescape
Online XML Unescape characters tool to escape ampersand, quote and all special characters.

GeeksforGeeks
geeksforgeeks.org › java › escaping-xml-special-characters-in-java-string
Escaping XML Special Characters in Java String - GeeksforGeeks
August 21, 2025 - Program to escape XML Special Characters !!
Unescaped String: DataStructures & Java
Escaped String: DataStructures &amp; Java
Unescaped String: DataStructures > Java
Escaped String: DataStructures &gt; Java
Unescaped String: DataStructures < Java
Escaped String: DataStructures &lt; Java
Unescaped ...

Code Beautify
codebeautify.org › xml-escape-unescape
XML Escape and XML Unescape Online Tool
This tool saves your time and helps to unescape eXtensible Markup Language data. It can also load plain XML data from a URL to unescape.

Java Guides
javaguides.net › p › xml-escape-unescape-online-tool.html
XML Escape / Unescape Online Tool
August 23, 2023 - Escape XML: Converts special XML characters into their escape sequences. Unescape XML: Converts XML escape sequences back to their original characters.

Top answer:

Using Apache Commons Text (which supersedes the StringEscapeUtils class deprecated in Commons Lang 3), here is a class that replaces only the HTML-specific entities:

import org.apache.commons.text.translate.AggregateTranslator;
import org.apache.commons.text.translate.CharSequenceTranslator;
import org.apache.commons.text.translate.EntityArrays;
import org.apache.commons.text.translate.LookupTranslator;
import org.apache.commons.text.translate.NumericEntityUnescaper;


public class HtmlEscapeUtils {

  /**
   * Same as UNESCAPE_HTML4 but without EntityArrays.BASIC_UNESCAPE,
   * so &quot; &amp; &lt; &gt; are left in place.
   * @see org.apache.commons.text.StringEscapeUtils#UNESCAPE_HTML4
   */
  public static final CharSequenceTranslator UNESCAPE_HTML_SPECIFIC =
      new AggregateTranslator(
          new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE),
          new LookupTranslator(EntityArrays.HTML40_EXTENDED_UNESCAPE),
          // Caution: this also decodes numeric references such as &#60; into a literal '<'
          new NumericEntityUnescaper());


  /**
   * @see org.apache.commons.text.StringEscapeUtils#unescapeHtml4(String)
   * @param input HTML String with e.g. &quot; &amp; &auml;
   * @return XML String with the HTML 4 entities replaced, while the XML entities remain (e.g. &quot; and &amp;)
   */
  public static final String unescapeHtmlToXml(final String input) {
    return UNESCAPE_HTML_SPECIFIC.translate(input);
  }

}

Another answer:

The list of all HTML named character references is available at http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json

If you can tolerate the occasional mistake, you could just go over that file and replace all named character references that are not allowed in stand-alone XML with the corresponding numeric character reference.
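A minimal sketch of that text-level replacement, assuming Java 9+. The NAMED table is a hypothetical four-entry excerpt; a real version would load every entry from entities.json. Names XML already predefines (lt, gt, amp, quot, apos) and unknown names are kept as written. The sketch does no parsing at all, so it has exactly the weaknesses of the simple approach.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the naive named-to-numeric conversion.
public final class HtmlNamedToNumeric {

    // Hypothetical excerpt of the HTML named character references.
    private static final Map<String, String> NAMED = Map.of(
            "nbsp", "\u00A0", "auml", "\u00E4", "copy", "\u00A9", "eacute", "\u00E9");

    private static final Pattern NAMED_REF = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    public static String toXml(String html) {
        Matcher m = NAMED_REF.matcher(html);
        StringBuilder out = new StringBuilder(html.length());
        int last = 0;
        while (m.find()) {
            String text = NAMED.get(m.group(1));
            out.append(html, last, m.start());
            if (text == null) {
                // Predefined XML entity or a name not in the table: keep it as written.
                out.append(m.group());
            } else {
                // One decimal numeric reference per code point, e.g. &nbsp; -> &#160;
                text.codePoints().forEach(cp -> out.append("&#").append(cp).append(';'));
            }
            last = m.end();
        }
        return out.append(html, last, html.length()).toString();
    }
}
```

For example, toXml("a&nbsp;b &amp; &auml;") gives "a&#160;b &amp; &#228;", which a standalone XML parser accepts without a DTD.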

That simple approach can run into problems though if your input is HTML, not XHTML:

<script>var y=1, lt = 3, x = y&lt; alert(x);</script>

contains a script element whose content is not encoded using entities, so naively replacing the &lt; here will break the script. Other elements such as <xmp> and <style> have similar problems, as do CDATA sections inside foreign (SVG and MathML) elements.

If you need a really faithful conversion, or if your HTML is messy, your best bet might be to parse the HTML to a DOM using something like nu.validator and then serialize the DOM to valid XML (see the Stack Overflow question "How to pretty print XML from Java?").

Even if your input is XHTML, you might need to worry about character sequences that look like entities in CDATA sections. Again, parse and re-render might be your best option.
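For well-formed XML or XHTML, the JDK's own DOM parser is enough to show why parsing beats pattern matching: it expands entity and character references in ordinary text but copies CDATA content verbatim. A minimal sketch follows (the class name ParseToUnescape is hypothetical). It does not cover messy HTML, and XHTML that uses HTML-only names such as &nbsp; without a loaded DTD is rejected as not well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: unescape by parsing instead of by regex.
public final class ParseToUnescape {

    // Returns the text content of the root element of a well-formed XML string.
    public static String textOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to avoid XXE; the JDK's built-in parser supports this feature.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getTextContent concatenates text and CDATA children; references are already expanded.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }
}
```

Here textOf("<r>a &amp;lt; b <![CDATA[&lt;]]> &#65;</r>") yields "a &lt; b &lt; A": the escaped text and the CDATA section produce the same characters, something a regex working on the raw string cannot distinguish.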

Stack Overflow
stackoverflow.com › questions › 6012746 › how-to-unescape-non-standard-characters-in-xml-in-java
How to unescape non-standard characters in XML in Java? - Stack Overflow
I realize a similar question has been asked before, and the solution is to use StringEscapeUtils.unescape(). However, per the method description: Supports only the five basic XML entities (gt,...