🌐
Blogger
javarevisited.blogspot.com › 2012 › 09 › how-to-replace-escape-xml-special-characters-java-string.html
How to replace escape XML special characters in Java String - Example
There are two approaches to replacing XML or HTML special characters in a Java String: write your own function to replace the XML special characters, or use an open source library that has already implemented it.
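The first approach (writing your own function) could look roughly like the sketch below. The class name `XmlEscaper` and the method name `escape` are made up for illustration. The sketch only handles the five predefined XML entities and does nothing about characters that are illegal in XML.

```java
// Hypothetical helper, not taken from any library: escapes the five XML special characters.
final class XmlEscaper {
    private XmlEscaper() {}

    /** Replaces &, <, >, " and ' with their predefined XML entities in a single pass. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Tom & Jerry <cat> \"mouse\""));
    }
}
```

Doing the replacement in one pass avoids the classic mistake of chaining `replace` calls in the wrong order and escaping the `&` of an entity that was just produced.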
🌐
Stack Overflow
stackoverflow.com › questions › 21558446 › remove-escape-characters-from-xml-string-in-java
remove escape characters from XML string in java - Stack Overflow
Or use JAXB: move all the XML to beans and remove the unneeded chars from the String properties. ... let me test the XSLT way. The JAXB way is too far a stretch for the app I am working on. It's all legacy, and Complex Schema failed several times to translate ...
Discussions

xml - Java use replaceAll with Escape Characters String - Stack Overflow
Good night Stack Overflow! Tonight I'm trying to remove the "header" from an XML I've parsed as a string and use replaceAll to remove the following: More on stackoverflow.com
🌐 stackoverflow.com
Best way to encode text data for XML in Java? - Stack Overflow
What is the recommended way of encoding strings for an XML output in Java. The strings might contain characters like "&", "<", etc. ... As others have mentioned, using an XML library is the easiest way. If you do want to escape yourself, you could look into StringEscapeUtils from the Apache ... More on stackoverflow.com
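As an illustration of the "use an XML library" advice, here is a minimal sketch using the JDK's own StAX writer, which escapes text content for you. The class and method names (`StaxEscapeDemo`, `element`) are invented for this example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class: lets the StAX writer do the escaping instead of doing it by hand.
final class StaxEscapeDemo {
    private StaxEscapeDemo() {}

    /** Writes a single element with the given text content and returns the XML as a string. */
    static String element(String name, String text) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            writer.writeStartElement(name);
            writer.writeCharacters(text); // special characters in the text are escaped here
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(element("note", "Fish & chips < 5"));
    }
}
```

The caller never touches `&amp;` or `&lt;` directly, so there is no risk of escaping twice or forgetting a character.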
🌐 stackoverflow.com
Escaping special character when generating an XML in Java - Stack Overflow
I am trying to develop an XML export feature to give my application users to export their data in an XML format. I have got this feature ready and working until it started failing for some cases. T... More on stackoverflow.com
🌐 stackoverflow.com
removing invalid XML characters from a string in java - Stack Overflow
I was converting a JSONObject to XML which escaped control chars from "\u0001" to "". This code perfectly removed it. 2021-07-29T11:09:49.897Z+00:00 ... Vanja D. Over a year ago · Only this solution removes special characters before deserialization (i.e. More on stackoverflow.com
🌐 stackoverflow.com
🌐
GeeksforGeeks
geeksforgeeks.org › java › escaping-xml-special-characters-in-java-string
Escaping XML Special Characters in Java String - GeeksforGeeks
August 21, 2025 - These special characters are also referred to as XML Metacharacters. By the process of escaping, we would be replacing these characters with alternate strings to give the literal result of special characters. To use StringEscapeUtils, add the Apache Commons Text dependency: ... // Java program to escape all the five XML special characters import org.apache.commons.text.StringEscapeUtils; public class GeeksForGeeks { public static void main(String[] args) { System.out.println("Program to escape XML Special Characters !!"); // Escape & character String unescapedXMLString = "DataStructures & Java
🌐
javaspring
javaspring.net › blog › java-utility-to-remove-all-xml-escape-characters-using-java
Java Utility to Remove All XML Escape Characters — javaspring.net
In this code, we define a method removeXmlEscapes that takes an input string and replaces all the XML escape sequences with their original characters using the replaceAll method. Another approach is to use a Map to store the mapping of escaped ...
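A Map-based variant of such a `removeXmlEscapes` method might look like the sketch below. It is not the code from the linked article. The class name `XmlUnescaper` is an assumption, only the five predefined entities are handled, and it needs Java 9 or later (`Map.of`, `appendReplacement` with a `StringBuilder`).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns the five predefined XML entities back into plain characters.
final class XmlUnescaper {
    private static final Map<String, String> ENTITIES = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");
    private static final Pattern ENTITY = Pattern.compile("&(amp|lt|gt|quot|apos);");

    private XmlUnescaper() {}

    /** Replaces each entity exactly once, so "&amp;lt;" becomes "&lt;" and not "<". */
    static String removeXmlEscapes(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the replacement literal
            m.appendReplacement(out, Matcher.quoteReplacement(ENTITIES.get(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeXmlEscapes("a &lt; b &amp;&amp; c &gt; d"));
    }
}
```

A chain of `replaceAll` calls can unescape the same text twice if `&amp;` is handled first. Scanning once with a single pattern avoids that.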
🌐
Breizheurope Finistère
breizheurope-finistere.eu › post › java-utility-to-remove-all-xml-escape-characters-using-java
Breizheurope-finistere
October 19, 2024 - Java provides powerful string manipulation tools to tackle this challenge. One popular solution is using the StringEscapeUtils class from the Apache Commons Lang library. This class offers the unescapeXml method, specifically designed for removing XML escape characters.
🌐
Araqev
araqev.com › home › how can you use a java utility to remove all xml escape characters?
How Can You Use a Java Utility to Remove All XML Escape Characters?
March 22, 2025 - One effective approach to removing XML escape characters in Java involves utilizing regular expressions or string manipulation methods. Libraries such as Apache Commons Lang provide utility functions that can simplify the process.
🌐
A Girl Among Geeks
agirlamonggeeks.com › home › how can i use java to remove all xml escape characters efficiently?
How Can I Use Java to Remove All XML Escape Characters Efficiently?
July 4, 2025 - These sequences, such as `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&apos;`, must be converted back to their original characters to process or display the content correctly. Below are several effective approaches to remove all XML escape characters in Java: Using Apache Commons Text: The StringEscapeUtils class provides utility methods to unescape XML entities.
🌐
JSON Formatter
jsonformatter.org › xml-escape
Best XML Escape characters tool
Escapes or unescapes an XML file removing traces of offending characters that could be wrongfully interpreted as markup.
Find elsewhere
Top answer
1 of 3
56

You can use the Apache Commons Text library to escape a string.

org.apache.commons.text.StringEscapeUtils

// For XML 1.0
String escapedXml10 = StringEscapeUtils.escapeXml10("the data might contain & or ! or % or ' or # etc");

// For XML 1.1
String escapedXml11 = StringEscapeUtils.escapeXml11("the data might contain & or ! or % or ' or # etc");

But what you are looking for is a way to convert any string into a valid XML tag name. For ASCII characters, an XML tag name must begin with one of _:a-zA-Z, followed by any number of characters from _:a-zA-Z0-9.-

I believe no library does this for you, so you have to implement your own function that converts an arbitrary string to match this pattern, or alternatively store the string as the value of an attribute.

<property name="no more need to be encoded, it should be handled by XML library">0.0</property>
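If you do go the route of converting strings into tag names, a minimal sketch based on the ASCII rule above could be the following. `XmlNames` and `toAsciiXmlName` are hypothetical names, and the replacement character `_` is an arbitrary choice.

```java
// Hypothetical helper: squeezes an arbitrary string into the ASCII tag-name pattern
// described above (start: _:a-zA-Z, rest: _:a-zA-Z0-9.-).
final class XmlNames {
    private XmlNames() {}

    static String toAsciiXmlName(String raw) {
        // Replace every character outside the allowed set with an underscore
        String cleaned = raw.replaceAll("[^_:a-zA-Z0-9.\\-]", "_");
        // Digits, '.' and '-' are allowed inside a name but not as its first character
        if (cleaned.isEmpty() || !cleaned.substring(0, 1).matches("[_:a-zA-Z]")) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(toAsciiXmlName("unit price"));
        System.out.println(toAsciiXmlName("2nd value"));
    }
}
```

The mapping is lossy: different inputs can collapse to the same name, and non-ASCII letters are replaced too. That is one more reason the attribute approach shown above is usually the safer design.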
2 of 3
1
public class RssParser {
int length;
    URL url;
URLConnection urlConn;
NodeList nodeList;
Document doc;
Node node;
Element firstEle;
NodeList titleList;
Element ele;
NodeList txtEleList;
String retVal, urlStrToParse, rootNodeName;

public RssParser(String urlStrToParse, String rootNodeName){
    this.urlStrToParse = urlStrToParse;
    this.rootNodeName = rootNodeName;

    url=null;
    urlConn=null;
    nodeList=null;
    doc=null;
    node=null;
    firstEle=null;
    titleList=null;
    ele=null;
    txtEleList=null;
    retVal=null;
            doc = null;
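    try {
        url = new URL(this.urlStrToParse);
        // URL of the feed to parse
        urlConn = url.openConnection();

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();

        // Read the whole response, then escape ampersands so bare '&' characters do not break the parser.
        // Note: this also re-escapes entities that were already valid, such as &lt;.
        String s = isToString(urlConn.getInputStream());
        s = s.replace("&", "&amp;");
        // Prepend an XML declaration (assumes the feed does not already start with one)
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
        sb.append("\n"+s);
        System.out.println("STR: \n"+sb.toString());
        s = sb.toString();

        // Parse the cleaned string: the connection's stream was already consumed by isToString
        doc = db.parse(new org.xml.sax.InputSource(new java.io.StringReader(s)));
        nodeList = doc.getElementsByTagName(this.rootNodeName);
        // These are the nodes that contain the other inner element nodes
        length = nodeList.getLength();
        firstEle = doc.getDocumentElement();
    }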
    }
    catch (ParserConfigurationException pce) {
        System.out.println("Could not Parse XML: " + pce.getMessage());
    }
    catch (SAXException se) {
        System.out.println("Could not Parse XML: " + se.getMessage());
    }
    catch (IOException ioe) {
        System.out.println("Invalid XML: " + ioe.getMessage());
    }
    catch(Exception e){
        System.out.println("Error: "+e.toString());
    }
}


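public String isToString(InputStream in) throws IOException {
    // Collect all bytes first so that multi-byte characters split across reads decode correctly
    java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
    byte[] b = new byte[512];
    for (int i; (i = in.read(b)) != -1;) {
        out.write(b, 0, i);
    }
    // Assumes the feed is UTF-8 encoded
    return out.toString("UTF-8");
}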

public String getVal(int i, String param){
    node =nodeList.item(i);
    if(node.getNodeType() == Node.ELEMENT_NODE)
    {
        System.out.println("Param: "+param);
        titleList = firstEle.getElementsByTagName(param);
        if(firstEle.hasAttribute("id"))
        System.out.println("hasAttrib----------------");
        else System.out.println("Has NOTNOT      NOT");
        System.out.println("titleList: "+titleList.toString());
        ele = (Element)titleList.item(i);
        System.out.println("ele: "+ele);
        // Guard against a missing element or an element without a text child
        if (ele == null || ele.getFirstChild() == null)
            return null;
        txtEleList = ele.getChildNodes();
        retVal = txtEleList.item(0).getNodeValue();
        if (retVal == null)
            return null;
        System.out.println("retVal: "+retVal);
    }
return retVal;
}
}
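The parser above escapes every `&`, which also double-escapes references that were already valid. A narrower sketch escapes only ampersands that do not already start a reference and then parses from the string. The names `LenientXml`, `escapeBareAmpersands` and `parse` are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: repairs bare ampersands before handing the text to the DOM parser.
final class LenientXml {
    // An '&' that is NOT followed by "name;", "#digits;" or "#xHEX;"
    private static final String BARE_AMPERSAND =
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)";

    private LenientXml() {}

    static String escapeBareAmpersands(String xml) {
        return xml.replaceAll(BARE_AMPERSAND, "&amp;");
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(escapeBareAmpersands(xml))));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<t>A & B</t>").getDocumentElement().getTextContent());
    }
}
```

Named references that the document never declares (for example `&nbsp;` in a plain XML feed) are left untouched by this pattern and will still be rejected by the parser.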
Top answer
1 of 9
94

Java's regex supports supplementary characters, so you can specify those high ranges with two UTF-16 encoded chars, or, even easier, use \x to specify any valid code point.

Here is the pattern for removing characters that are illegal in XML 1.0:

// XML 1.0
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
String xml10pattern = "[^"
                    + "\u0009\r\n"
                    + "\u0020-\uD7FF"
                    + "\uE000-\uFFFD"
                    + "\\x{10000}-\\x{10FFFF}"
                    + "]";

Most people will want the XML 1.0 version.

Here is the pattern for removing characters that are illegal in XML 1.1:

// XML 1.1
// [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
String xml11pattern = "[^"
                    + "\u0001-\uD7FF"
                    + "\uE000-\uFFFD"
                    + "\\x{10000}-\\x{10FFFF}"
                    + "]+";

You will need to use String.replaceAll(...) and not String.replace(...).

String illegal = "Hello, World!\0";
String legal = illegal.replaceAll(xml10pattern, "");
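Put together as something you can compile, the XML 1.0 variant might look like this sketch. The class and method names (`XmlCharFilter`, `stripIllegalXml10`) are invented here, and every code point is written with `\x{...}` so that no backslash escaping is left to the Java compiler.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes every character that XML 1.0 does not allow.
final class XmlCharFilter {
    // Allowed: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static final Pattern ILLEGAL_XML10 = Pattern.compile(
            "[^\\x{9}\\x{A}\\x{D}\\x{20}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]");

    private XmlCharFilter() {}

    static String stripIllegalXml10(String text) {
        // Precompiled pattern, so repeated calls do not recompile the regex
        return ILLEGAL_XML10.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripIllegalXml10("Hello,\u0000 World!\u0001"));
    }
}
```

Compiling the pattern once is worthwhile if you clean many strings, since `String.replaceAll` recompiles it on every call.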
2 of 9
13

All the answers so far only replace the characters themselves. But sometimes an XML document contains invalid XML entity sequences, which also cause errors. For example, if you have &#2; in your XML, a Java XML parser will throw Illegal character entity: expansion character (code 0x2 at ....

Here is a simple java program that can replace those invalid entity sequences.

  public final Pattern XML_ENTITY_PATTERN = Pattern.compile("\\&\\#(?:x([0-9a-fA-F]+)|([0-9]+))\\;");

  /**
   * Remove problematic xml entities from the xml string so that you can parse it with java DOM / SAX libraries.
   */
  String getCleanedXml(String xmlString) {
    Matcher m = XML_ENTITY_PATTERN.matcher(xmlString);
    Set<String> replaceSet = new HashSet<>();
    while (m.find()) {
      String group = m.group(1);
      int val;
      if (group != null) {
        val = Integer.parseInt(group, 16);
        if (isInvalidXmlChar(val)) {
          replaceSet.add("&#x" + group + ";");
        }
      } else if ((group = m.group(2)) != null) {
        val = Integer.parseInt(group);
        if (isInvalidXmlChar(val)) {
          replaceSet.add("&#" + group + ";");
        }
      }
    }
    String cleanedXmlString = xmlString;
    for (String replacer : replaceSet) {
      // Literal replacement: the entity text is not meant to be interpreted as a regex
      cleanedXmlString = cleanedXmlString.replace(replacer, "");
    }
    return cleanedXmlString;
  }

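  private boolean isInvalidXmlChar(int val) {
    // Valid XML 1.0 characters: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    if (val == 0x9 || val == 0xA || val == 0xD ||
            val >= 0x20 && val <= 0xD7FF ||
            val >= 0xE000 && val <= 0xFFFD ||
            val >= 0x10000 && val <= 0x10FFFF) {
      return false;
    }
    return true;
  }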
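The same idea can be done in a single pass with `Matcher.replaceAll(Function)`, available since Java 9. This is only a sketch, and the names `XmlEntityCleaner` and `removeInvalidReferences` are assumptions. The validity ranges are those of XML 1.0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: drops numeric character references that point at illegal XML 1.0 characters.
final class XmlEntityCleaner {
    private static final Pattern NUMERIC_REFERENCE =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    private XmlEntityCleaner() {}

    static String removeInvalidReferences(String xml) {
        return NUMERIC_REFERENCE.matcher(xml).replaceAll(match -> {
            boolean hex = match.group(1) != null;
            String digits = hex ? match.group(1) : match.group(2);
            // Keep valid references as they are, replace invalid ones with nothing
            return isValidXml10(digits, hex ? 16 : 10)
                    ? Matcher.quoteReplacement(match.group())
                    : "";
        });
    }

    private static boolean isValidXml10(String digits, int radix) {
        int cp;
        try {
            cp = Integer.parseInt(digits, radix);
        } catch (NumberFormatException tooLarge) {
            return false; // value does not fit in an int, so it cannot be a code point
        }
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(removeInvalidReferences("a&#2;b&#x41;c"));
    }
}
```

Because each match is decided on the spot, there is no intermediate set of strings to replace afterwards, and very long digit runs do not throw.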
🌐
Code Beautify
codebeautify.org › xml-escape-unescape
XML Escape and XML Unescape Online Tool
Copy, Paste and Escape. XML Escape is very unique tool to escape plain xml.
🌐
Apache Commons
commons.apache.org › proper › commons-lang › javadocs › api-3.8.1 › index.html
StringEscapeUtils (Apache Commons Lang 3.8.1 API)
🌐
Javapractices
javapractices.com › topic › TopicAction.do
Java Practices->Escape special characters
If a query string is placed in an HREF attribute, then even a URL encoded query string is often not of valid form. This is because URLEncoder produces valid HTTP, but it doesn't in general produce text which is a valid HTML attribute - the ampersand character needs to be replaced by the corresponding character entity &amp;. Here is an example of a utility class which escapes special characters for HTML, XML, regular expressions, and so on. package hirondelle.web4j.util; import java.net.URLEncoder; import java.io.UnsupportedEncodingException; import java.text.CharacterIterator; import java.text
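To make the HREF point concrete, here is a small sketch: URL-encode each parameter first, then escape the finished URL for use inside an HTML attribute. `HrefBuilder` and `hrefValue` are invented names, and `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a URL with a query string that is safe to put inside href="...".
final class HrefBuilder {
    private HrefBuilder() {}

    /** params is a list of {key, value} pairs, kept in the given order. */
    static String hrefValue(String path, String[][] params) {
        StringBuilder url = new StringBuilder(path);
        char separator = '?';
        for (String[] pair : params) {
            url.append(separator)
               .append(URLEncoder.encode(pair[0], StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(pair[1], StandardCharsets.UTF_8));
            separator = '&';
        }
        // Second layer: the '&' separators must become &amp; inside an HTML attribute
        return url.toString()
                  .replace("&", "&amp;")
                  .replace("\"", "&quot;")
                  .replace("<", "&lt;");
    }

    public static void main(String[] args) {
        String[][] params = { {"q", "fish & chips"}, {"page", "2"} };
        System.out.println(hrefValue("/search", params));
    }
}
```

The two layers are independent: the `&` inside the value is handled by URL encoding, while the `&` between parameters is handled by the attribute escaping.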
🌐
GeeksforGeeks
origin.geeksforgeeks.org › escaping-xml-special-characters-in-java-string
Escaping XML Special Characters in Java String | GeeksforGeeks
February 22, 2021 - These special characters are also referred to as XML Metacharacters. By the process of escaping, we would be replacing these characters with alternate strings to give the literal result of special characters. ... <GeeksForGeeks> Data Structures & Java </GeeksForGeeks> // is invalid XML because '&' is a reserved character // in XML that is used to start an entity reference.
🌐
Urban Fresh Produce
urbanfreshproduce.ca › post › java-utility-to-remove-all-xml-escape-characters-using-java
java utility to remove all xml escape characters using java
October 19, 2024 - One popular solution is using the StringEscapeUtils class from the Apache Commons Lang library. This class offers the unescapeXml method, specifically designed for removing XML escape characters.
🌐
Stack Exchange
codereview.stackexchange.com › questions › 234915 › escaping-invalid-xml-characters-e-g-for-the-java-dom-api
Escaping invalid XML characters (e.g. for the Java DOM API) - Code Review Stack Exchange
January 1, 2020 - // should not be reviewed String string = "text#text##text#0;text" + '\u0000' + "text<text&text#"; Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument(); Element element = document.createElement("element"); element.appendChild(document.createTextNode(escapeInvalidXmlCharacters(string))); document.appendChild(element); TransformerFactory.newInstance().newTransformer().transform(new DOMSource(document), new StreamResult(new File("test.xml"))); // creates <?xml version="1.0" encoding="UTF-8" standalone="no"?><element>text##text####text##0;text#0;text&lt;text&amp;text##</element> document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("test.xml")); System.out.println(unescapeInvalidXmlCharacters(document.getDocumentElement().getTextContent()).equals(string)); // prints true