Actually Java supports 4 methods to parse XML out of the box:
DOM Parser/Builder: The whole XML structure is loaded into memory and you can use the well known DOM methods to work with it. DOM also allows you to write to the document with Xslt transformations. Example:
Copypublic static void parse() throws ParserConfigurationException, IOException, SAXException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);
DocumentBuilder builder = factory.newDocumentBuilder();
File file = new File("test.xml");
Document doc = builder.parse(file);
// Do something with the document here.
}
SAX Parser: Solely to read a XML document. The Sax parser runs through the document and calls callback methods of the user. There are methods for start/end of a document, element and so on. They're defined in org.xml.sax.ContentHandler and there's an empty helper class DefaultHandler.
Copypublic static void parse() throws ParserConfigurationException, SAXException {
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
SAXParser saxParser = factory.newSAXParser();
File file = new File("test.xml");
saxParser.parse(file, new ElementHandler()); // specify handler
}
StAx Reader/Writer: This works with a datastream oriented interface. The program asks for the next element when it's ready just like a cursor/iterator. You can also create documents with it. Read document:
Copypublic static void parse() throws XMLStreamException, IOException {
try (FileInputStream fis = new FileInputStream("test.xml")) {
XMLInputFactory xmlInFact = XMLInputFactory.newInstance();
XMLStreamReader reader = xmlInFact.createXMLStreamReader(fis);
while(reader.hasNext()) {
reader.next(); // do something here
}
}
}
Write document:
Copypublic static void parse() throws XMLStreamException, IOException {
try (FileOutputStream fos = new FileOutputStream("test.xml")){
XMLOutputFactory xmlOutFact = XMLOutputFactory.newInstance();
XMLStreamWriter writer = xmlOutFact.createXMLStreamWriter(fos);
writer.writeStartDocument();
writer.writeStartElement("test");
// write stuff
writer.writeEndElement();
}
}
JAXB: The newest implementation to read XML documents: Is part of Java 6 in v2. This allows us to serialize java objects from a document. You read the document with a class that implements a interface to javax.xml.bind.Unmarshaller (you get a class for this from JAXBContext.newInstance). The context has to be initialized with the used classes, but you just have to specify the root classes and don't have to worry about static referenced classes. You use annotations to specify which classes should be elements (@XmlRootElement) and which fields are elements(@XmlElement) or attributes (@XmlAttribute, what a surprise!)
Copypublic static void parse() throws JAXBException, IOException {
try (FileInputStream adrFile = new FileInputStream("test")) {
JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
Unmarshaller um = ctx.createUnmarshaller();
RootElementClass rootElement = (RootElementClass) um.unmarshal(adrFile);
}
}
Write document:
Copypublic static void parse(RootElementClass out) throws IOException, JAXBException {
try (FileOutputStream adrFile = new FileOutputStream("test.xml")) {
JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
Marshaller ma = ctx.createMarshaller();
ma.marshal(out, adrFile);
}
}
Examples shamelessly copied from some old lecture slides ;-)
Edit: About "which API should I use?". Well it depends - not all APIs have the same capabilities as you see, but if you have control over the classes you use to map the XML document JAXB is my personal favorite, really elegant and simple solution (though I haven't used it for really large documents, it could get a bit complex). SAX is pretty easy to use too and just stay away from DOM if you don't have a really good reason to use it - old, clunky API in my opinion. I don't think there are any modern 3rd party libraries that feature anything especially useful that's missing from the STL and the standard libraries have the usual advantages of being extremely well tested, documented and stable.
Answer from Voo on Stack OverflowActually Java supports 4 methods to parse XML out of the box:
DOM Parser/Builder: The whole XML structure is loaded into memory and you can use the well known DOM methods to work with it. DOM also allows you to write to the document with Xslt transformations. Example:
Copypublic static void parse() throws ParserConfigurationException, IOException, SAXException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);
DocumentBuilder builder = factory.newDocumentBuilder();
File file = new File("test.xml");
Document doc = builder.parse(file);
// Do something with the document here.
}
SAX Parser: Solely to read a XML document. The Sax parser runs through the document and calls callback methods of the user. There are methods for start/end of a document, element and so on. They're defined in org.xml.sax.ContentHandler and there's an empty helper class DefaultHandler.
Copypublic static void parse() throws ParserConfigurationException, SAXException {
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
SAXParser saxParser = factory.newSAXParser();
File file = new File("test.xml");
saxParser.parse(file, new ElementHandler()); // specify handler
}
StAx Reader/Writer: This works with a datastream oriented interface. The program asks for the next element when it's ready just like a cursor/iterator. You can also create documents with it. Read document:
Copypublic static void parse() throws XMLStreamException, IOException {
try (FileInputStream fis = new FileInputStream("test.xml")) {
XMLInputFactory xmlInFact = XMLInputFactory.newInstance();
XMLStreamReader reader = xmlInFact.createXMLStreamReader(fis);
while(reader.hasNext()) {
reader.next(); // do something here
}
}
}
Write document:
Copypublic static void parse() throws XMLStreamException, IOException {
try (FileOutputStream fos = new FileOutputStream("test.xml")){
XMLOutputFactory xmlOutFact = XMLOutputFactory.newInstance();
XMLStreamWriter writer = xmlOutFact.createXMLStreamWriter(fos);
writer.writeStartDocument();
writer.writeStartElement("test");
// write stuff
writer.writeEndElement();
}
}
JAXB: The newest implementation to read XML documents: Is part of Java 6 in v2. This allows us to serialize java objects from a document. You read the document with a class that implements a interface to javax.xml.bind.Unmarshaller (you get a class for this from JAXBContext.newInstance). The context has to be initialized with the used classes, but you just have to specify the root classes and don't have to worry about static referenced classes. You use annotations to specify which classes should be elements (@XmlRootElement) and which fields are elements(@XmlElement) or attributes (@XmlAttribute, what a surprise!)
Copypublic static void parse() throws JAXBException, IOException {
try (FileInputStream adrFile = new FileInputStream("test")) {
JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
Unmarshaller um = ctx.createUnmarshaller();
RootElementClass rootElement = (RootElementClass) um.unmarshal(adrFile);
}
}
Write document:
Copypublic static void parse(RootElementClass out) throws IOException, JAXBException {
try (FileOutputStream adrFile = new FileOutputStream("test.xml")) {
JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
Marshaller ma = ctx.createMarshaller();
ma.marshal(out, adrFile);
}
}
Examples shamelessly copied from some old lecture slides ;-)
Edit: About "which API should I use?". Well it depends - not all APIs have the same capabilities as you see, but if you have control over the classes you use to map the XML document JAXB is my personal favorite, really elegant and simple solution (though I haven't used it for really large documents, it could get a bit complex). SAX is pretty easy to use too and just stay away from DOM if you don't have a really good reason to use it - old, clunky API in my opinion. I don't think there are any modern 3rd party libraries that feature anything especially useful that's missing from the STL and the standard libraries have the usual advantages of being extremely well tested, documented and stable.
Java supports two methods for XML parsing out of the box.
SAXParser
You can use this parser if you want to parse large XML files and/or don't want to use a lot of memory.
http://download.oracle.com/javase/6/docs/api/javax/xml/parsers/SAXParserFactory.html
Example: http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
DOMParser
You can use this parser if you need to do XPath queries or need to have the complete DOM available.
http://download.oracle.com/javase/6/docs/api/javax/xml/parsers/DocumentBuilderFactory.html
Example: http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
What are some well maintained XML libraries?
Best XML parser for Java - Stack Overflow
What is the best way to parse and munipulate large XML files in Java
Best Java JSON Parser: Gson or Jackson?
Jackson, but mostly because I tend to use it from Spring MVC and it just works without any extra effort.
More on reddit.comVideos
I'm looking for XML libraries that will be supported but google search turned up a graveyard of old or abandoned libraries. Does anyone have any current recommendations that won't be dropped?
I think you should not consider any specific parser implementation. Java API for XML Processing lets you use any conforming parser implementation in a standard way. The code should be much more portable, and when you realise that a specific parser has grown too old, you can replace it with another without changing a line of your code (if you do it correctly).
Basically there are three ways of handling XML in a standard way:
- SAX This is the simplest API. You read the XML by defining a Handler class that receives the data inside elements/attributes when the XML gets processed in a serial way. It is faster and simpler if you only plan to read some attributes/elements and/or write some values back (your case).
- DOM This method creates an object tree which lets you modify/access it randomly so it is better for complex XML manipulation and handling.
- StAX This is in the middle of the path between SAX and DOM. You just write code to pull the data from the parser you are interested in when it is processed.
Forget about proprietary APIs such as JDOM or Apache ones (i.e. Apache Xerces XMLSerializer) because will tie you to a specific implementation that can evolve in time or lose backwards compatibility, which will make you change your code in the future when you want to upgrade to a new version of JDOM or whatever parser you use. If you stick to Java standard API (using factories and interfaces) your code will be much more modular and maintainable.
There is no need to say that all (I haven't checked all, but I'm almost sure) of the parsers proposed comply with a JAXP implementation so technically you can use all, no matter which.
Here is a nice comparision on DOM, SAX, StAX & TrAX (Source: http://download.oracle.com/docs/cd/E17802_01/webservices/webservices/docs/1.6/tutorial/doc/SJSXP2.html )
Feature StAX SAX DOM TrAX
API Type Pull,streaming Push,streaming In memory tree XSLT Rule
Ease of Use High Medium High Medium
XPath Capability No No Yes Yes
CPU & Memory Good Good Varies Varies
Forward Only Yes Yes No No
Read XML Yes Yes Yes Yes
Write XML Yes No Yes Yes
CRUD No No Yes No