The Java runtime library supports validation. Last time I checked this was the Apache Xerces parser under the covers. You should probably use a javax.xml.validation.Validator.
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import java.net.URL;
import org.xml.sax.SAXException;
//import java.io.File; // if you use File
import java.io.IOException;
...
URL schemaFile = new URL("http://host:port/filename.xsd");
// webapp example xsd:
// URL schemaFile = new URL("http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd");
// local file example:
// File schemaFile = new File("/location/to/localfile.xsd"); // etc.
Source xmlFile = new StreamSource(new File("web.xml"));
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlFile);
System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
System.out.println(xmlFile.getSystemId() + " is NOT valid reason:" + e);
} catch (IOException e) {}
The schema factory constant is the string http://www.w3.org/2001/XMLSchema which defines XSDs. The above code validates a WAR deployment descriptor against the URL http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd but you could just as easily validate against a local file.
You should not use the DOMParser to validate a document (unless your goal is to create a document object model anyway). This will start creating DOM objects as it parses the document - wasteful if you aren't going to use them.
Answer from McDowell on Stack OverflowVideos
The Java runtime library supports validation. Last time I checked this was the Apache Xerces parser under the covers. You should probably use a javax.xml.validation.Validator.
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import java.net.URL;
import org.xml.sax.SAXException;
//import java.io.File; // if you use File
import java.io.IOException;
...
URL schemaFile = new URL("http://host:port/filename.xsd");
// webapp example xsd:
// URL schemaFile = new URL("http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd");
// local file example:
// File schemaFile = new File("/location/to/localfile.xsd"); // etc.
Source xmlFile = new StreamSource(new File("web.xml"));
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlFile);
System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
System.out.println(xmlFile.getSystemId() + " is NOT valid reason:" + e);
} catch (IOException e) {}
The schema factory constant is the string http://www.w3.org/2001/XMLSchema which defines XSDs. The above code validates a WAR deployment descriptor against the URL http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd but you could just as easily validate against a local file.
You should not use the DOMParser to validate a document (unless your goal is to create a document object model anyway). This will start creating DOM objects as it parses the document - wasteful if you aren't going to use them.
Here's how to do it using Xerces2. A tutorial for this, here (req. signup).
Original attribution: blatantly copied from here:
import org.apache.xerces.parsers.DOMParser;
import java.io.File;
import org.w3c.dom.Document;
public class SchemaTest {
public static void main (String args[]) {
File docFile = new File("memory.xml");
try {
DOMParser parser = new DOMParser();
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setProperty(
"http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
"memory.xsd");
ErrorChecker errors = new ErrorChecker();
parser.setErrorHandler(errors);
parser.parse("memory.xml");
} catch (Exception e) {
System.out.print("Problem parsing the file.");
}
}
}
You can check if an XML document is well-formed using the following code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setErrorHandler(new SimpleErrorHandler());
// the "parse" method also validates XML, will throw an exception if misformatted
Document document = builder.parse(new InputSource("document.xml"));
The SimpleErrorHandler class referred to in the above code is as follows:
public class SimpleErrorHandler implements ErrorHandler {
public void warning(SAXParseException e) throws SAXException {
System.out.println(e.getMessage());
}
public void error(SAXParseException e) throws SAXException {
System.out.println(e.getMessage());
}
public void fatalError(SAXParseException e) throws SAXException {
System.out.println(e.getMessage());
}
}
This came from this website, which provides various methods for validating XML with Java. Note also that this method loads an entire DOM tree into memory, see comments for alternatives if you want to save on RAM.
What you are asking is how to verify that a piece of content is well-formed XML document. This is easily done by simply letting an XML parser (try to) parse content in question -- if there are issues, parser will report an error by throwing exception. There really isn't anything more to that; so all you need is to figure out how to parse an XML document.
About the only thing to beware is that some libs that claim to be XML parsers are not really proper parsers, in that they actually might not verify things that XML parser must do (as per XML specification) -- in Java, Javolution is an example of something that does little to no checking; VTD-XML and XPP3 do some verification (but not all required checks). And at the other end of spectrum, Xerces and Woodstox check everything that specification mandates. Xerces is bundled with JDK; and most web service frameworks bundle Woodstox in addition.
Since the accepted answer already shows how to parse content into a DOM document (which starts with parsing), that might be enough.
The only caveat is that this requires that you have 3-5x as much memory available as raw size of the input document. To get around this limitation you could use a streaming parser, such as Woodstox (which implements standard Stax API). If so, you would create an XMLStreamReader, and just call reader.next() as long as reader.hasNext() returns true.