Firstly, as others have stated, you shouldn't use any XML processing. Just read the file as a text file.
Secondly, your umlaut characters showing up as '�' is due to an incorrect charset (encoding) being used. The charset error may be in your code, or it may be the XML file.
The original XML file contains encoding="windows-1252", but it's unusual for XML to be encoded in anything other than UTF-8, so I suspect the file is really a UTF-8 file and the encoding it claims to use is not correct.
Try forcing UTF-8 when reading the file. It's good practice, regardless, to specify the charset when converting bytes to text:
String xml = new String(
Files.readAllBytes(xmlFile.toPath(), StandardCharsets.UTF_8));
Answer from VGR on Stack OverflowFirstly, as others have stated, you shouldn't use any XML processing. Just read the file as a text file.
Secondly, your umlaut characters showing up as '�' is due to an incorrect charset (encoding) being used. The charset error may be in your code, or it may be the XML file.
The original XML file contains encoding="windows-1252", but it's unusual for XML to be encoded in anything other than UTF-8, so I suspect the file is really a UTF-8 file and the encoding it claims to use is not correct.
Try forcing UTF-8 when reading the file. It's good practice, regardless, to specify the charset when converting bytes to text:
String xml = new String(
Files.readAllBytes(xmlFile.toPath(), StandardCharsets.UTF_8));
try this :
String xmlToString=FileUtils.readFileToString(new File("/file/path/file.xml"));
You need to have Commons-io jar for this.
java - XML Document to String - Stack Overflow
java convert string to xml and parse node - Stack Overflow
I created an XML Strings Translator Tool
How to convert a String into XmlString in Java 7
Videos
Assuming doc is your instance of org.w3c.dom.Document:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(doc), new StreamResult(writer));
String output = writer.getBuffer().toString().replaceAll("\n|\r", "");
Use the Apache XMLSerializer
here's an example: http://www.informit.com/articles/article.asp?p=31349&seqNum=3&rl=1
you can check this as well
http://www.netomatix.com/XmlFileToString.aspx
I'd use Java's XML document libraries. It's a bit of a mess, but works.
String xml = "<response>\n" +
"<returnCode>-2</returnCode>\n" +
"<error>\n" +
"<errorCode>100</errorCode>\n" +
"<errorMessage>ERROR HERE!!!</errorMessage>\n" +
"</error>\n" +
"</response>";
Document doc = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.parse(new InputSource(new StringReader(xml)));
NodeList errNodes = doc.getElementsByTagName("error");
if (errNodes.getLength() > 0) {
Element err = (Element)errNodes.item(0);
System.out.println(err.getElementsByTagName("errorMessage")
.item(0)
.getTextContent());
} else {
// success
}
I would probably use an XML parser to convert it into XML using DOM, then get the text. This has the advantage of being robust and coping with any unusual situations such as a line like this, where something has been commented out:
<!-- commented out <errorMessage>ERROR HERE!!!</errorMessage> -->
If you try and parse it yourself then you might fall foul of things like this. Also it has the advantage that if the requirements expand, then its really easy to change your code.
http://docs.oracle.com/cd/B28359_01/appdev.111/b28394/adx_j_parser.htm