Two examples with expat and libxml2. The second one is, IMHO, much easier to use since it creates a tree in memory, a data structure which is easy to work with. expat, on the other hand, does not build anything (you have to do it yourself), it just allows you to call handlers at specific events during the parsing. But expat may be faster (I didn't measure).

With expat, reading an XML file and displaying the elements indented:

/* 
   A simple test program to parse XML documents with expat
   <http://expat.sourceforge.net/>. It just displays the element
   names.

   On Debian, compile with:

   gcc -Wall -o expat-test expat-test.c -lexpat

   Inspired from <http://www.xml.com/pub/a/1999/09/expat/index.html> 
*/

#include <expat.h>
#include <stdio.h>
#include <stdlib.h>             /* malloc() and free() */
#include <string.h>

/* Keep track of the current level in the XML tree */
int             Depth;

#define MAXCHARS 1000000

void
start(void *data, const char *el, const char **attr)
{
    int             i;

    for (i = 0; i < Depth; i++)
        printf("  ");

    printf("%s", el);

    for (i = 0; attr[i]; i += 2) {
        printf(" %s='%s'", attr[i], attr[i + 1]);
    }

    printf("\n");
    Depth++;
}               /* End of start handler */

void
end(void *data, const char *el)
{
    Depth--;
}               /* End of end handler */

int
main(int argc, char **argv)
{

    char           *filename;
    FILE           *f;
    size_t          size;
    char           *xmltext;
    XML_Parser      parser;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        return (1);
    }
    filename = argv[1];
    parser = XML_ParserCreate(NULL);
    if (parser == NULL) {
        fprintf(stderr, "Parser not created\n");
        return (1);
    }
    /* Tell expat to use functions start() and end() each time it encounters
     * the start or end of an element. */
    XML_SetElementHandler(parser, start, end);
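    f = fopen(filename, "r");
    if (f == NULL) {
        fprintf(stderr, "Cannot open %s\n", filename);
        return (1);
    }
    xmltext = malloc(MAXCHARS);
    if (xmltext == NULL) {
        fprintf(stderr, "Out of memory\n");
        return (1);
    }
    /* Slurp the XML file in the buffer xmltext */
    size = fread(xmltext, sizeof(char), MAXCHARS, f);
    fclose(f);
    /* Pass the number of bytes actually read: the buffer is not
     * NUL-terminated, so strlen() must not be used on it. */
    if (XML_Parse(parser, xmltext, (int) size, XML_TRUE) ==
        XML_STATUS_ERROR) {
        fprintf(stderr,
            "Cannot parse %s, file may be too large or not well-formed XML\n",
            filename);
        return (1);
    }
    free(xmltext);
    XML_ParserFree(parser);
    fprintf(stdout, "Successfully parsed %zu characters in file %s\n", size,
        filename);
    return (0);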
}
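
Since expat builds nothing itself, here is a rough sketch of what "doing it yourself" could look like: two handlers with the same shape as start() and end() above, which assemble a tree of element names. The struct node and struct builder types and the tree_* names are made up for this illustration, attributes and text are ignored, and allocation failures simply abort through assert().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tree node: only the element name and the links are kept. */
struct node {
    char        *name;
    struct node *parent;
    struct node *first_child;
    struct node *last_child;
    struct node *next;          /* next sibling */
};

/* Hypothetical builder state, handed to the handlers as user data. */
struct builder {
    struct node *root;          /* first element seen */
    struct node *current;       /* innermost element still open */
};

/* Same shape as an expat start-element handler: append a new child. */
void
tree_start(void *data, const char *el, const char **attr)
{
    struct builder *b = data;
    struct node    *n = calloc(1, sizeof *n);

    (void) attr;                /* attributes are ignored in this sketch */
    assert(n != NULL);          /* sketch: abort on allocation failure */
    n->name = malloc(strlen(el) + 1);
    assert(n->name != NULL);
    strcpy(n->name, el);
    n->parent = b->current;
    if (b->current == NULL) {
        b->root = n;
    } else if (b->current->last_child == NULL) {
        b->current->first_child = b->current->last_child = n;
    } else {
        b->current->last_child->next = n;
        b->current->last_child = n;
    }
    b->current = n;
}

/* Same shape as an expat end-element handler: step back to the parent. */
void
tree_end(void *data, const char *el)
{
    struct builder *b = data;

    (void) el;
    assert(b->current != NULL);
    b->current = b->current->parent;
}

/* Free a node, its following siblings and all their descendants. */
void
tree_free(struct node *n)
{
    while (n != NULL) {
        struct node *next = n->next;

        tree_free(n->first_child);
        free(n->name);
        free(n);
        n = next;
    }
}
```

To use it with expat, you would declare a zero-initialised struct builder, pass its address with XML_SetUserData(), and register the two functions with XML_SetElementHandler() in place of start() and end(). After parsing, the tree hangs off the root field and is released with tree_free().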

With libxml2, a program which displays the name of the root element and the names of its children:

/*
   Simple test with libxml2 <http://xmlsoft.org>. It displays the name
   of the root element and the names of all its children (not
   descendants, just children).

   On Debian, compile with:
   gcc -Wall -o read-xml2 $(xml2-config --cflags) read-xml2.c \
                    $(xml2-config --libs)

*/

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

int
main(int argc, char **argv)
{
    xmlDoc         *document;
    xmlNode        *root, *first_child, *node;
    char           *filename;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s filename.xml\n", argv[0]);
        return 1;
    }
    filename = argv[1];

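    document = xmlReadFile(filename, NULL, 0);
    if (document == NULL) {
        fprintf(stderr, "Cannot parse %s\n", filename);
        return 1;
    }
    root = xmlDocGetRootElement(document);
    if (root == NULL) {
        fprintf(stderr, "%s has no root element\n", filename);
        xmlFreeDoc(document);
        return 1;
    }
    fprintf(stdout, "Root is <%s> (%i)\n", root->name, root->type);
    first_child = root->children;
    /* Children are not only elements: text nodes (for example the
     * whitespace between elements) are listed too. */
    for (node = first_child; node; node = node->next) {
        fprintf(stdout, "\t Child is <%s> (%i)\n", node->name, node->type);
    }
    fprintf(stdout, "...\n");
    xmlFreeDoc(document);
    return 0;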
}
Answer from bortzmeyer on Stack Overflow
Top answer
1 of 10
79

2 of 10
45

How about one written in pure assembler :-) Don't forget to check out the benchmarks.

🌐
GitHub
github.com › ooxi › xml.c
GitHub - ooxi/xml.c: Simple XML subset parser comparable to glib's Markup parser, but without any dependencies in one self contained file. · GitHub
Similar to the GLib Markup parser, which also just parses an xml subset, xml.c is a simple, small and self contained xml parser in one file. Ideal for embedding into other projects without the need for big external dependencies.
Starred by 216 users
Forked by 74 users
Languages   C 78.9% | C++ 15.0% | CMake 5.2% | Shell 0.9%
🌐
Free
lars.ruoff.free.fr › xmlcpp
Free C/C++ XML Parser Libraries
February 15, 2012 - Fully standard compliant C++ code. Multiplatform. High flexibility. You can control many aspects of file parsing and DOM tree building via parsing options. Lacks validation, DTD processing, XML namespaces, proper handling of encoding. Lacks UTF-16/32 parsing. The website includes a short documentation page including some code samples that illustrate the use of the library.
🌐
SourceForge
xmlparselib.sourceforge.net
Standard XML Parsing Library
XML-Parse library is a lightweight set of re-usable functions for general purpose parsing, checking, and creating xml files. It can support stream-oriented, SAX or DOM parsing styles, and includes an optional xsd schema validator and graphical schema generator.
🌐
Apache
xerces.apache.org › xerces-c
Xerces-C++ XML Parser
Xerces-C++ is a validating XML parser written in a portable subset of C++. Xerces-C++ makes it easy to give your application the ability to read and write XML data. A shared library is provided for parsing, generating, manipulating, and validating XML documents using the DOM, SAX, and SAX2 APIs.
🌐
Expat
libexpat.github.io
Welcome to Expat! · Expat XML parser
Welcome to Expat, a stream-oriented XML parser library written in C.
🌐
Microsoft Learn
learn.microsoft.com › en-us › archive › msdn-magazine › 2007 › april › xmllite-a-small-and-fast-xml-parser-for-native-c
XmlLite: A Small And Fast XML Parser For Native C++ | Microsoft Learn
XmlLite provides a powerful XML parser for your native C++ applications. It emphasizes performance, is aware of the system resources it uses, and supports a great deal of flexibility in controlling these characteristics.
🌐
W3C
dev.w3.org › XInclude-Test-Suite › libxml2-2.4.24 › libxml2-2.4.24 › doc › xml.html
The XML C library for Gnome - w3.org - W3C
This document describes libxml, the XML C library developed for the Gnome project. XML is a standard for building tag-based structured documents/data. ... Libxml exports Push (progressive) and Pull (blocking) type parser interfaces for both XML and HTML.
🌐
The Free Country
thefreecountry.com › sourcecode › xml.shtml
Free XML Parser/Generator Libraries | thefreecountry.com
August 20, 2021 - Libxml2 is a C XML parser library, with an assortment of bindings/wrappers for other languages if you don't use C (eg, C++, Perl, Delphi/Pascal, Ruby, PHP, Java, Rexx, AppleScript, etc). It implements the XML standard, namespaces, XML Base, XML Inclusions (XInclude) 1.0, XML Catalogs Working Draft, Canonical XML Version 1.0, W3C XML Schemas Part 2, etc.
🌐
SourceForge
sxmlc.sourceforge.net
SXMLC - Simple XML C parser
It is just intended to give C developers an API for XML handling as simple and flexible as possible, being fast and memory-efficient. It can be compiled as a stand-alone library to be included in other applications, possibly written in other languages, or be included directly in C projects (add 21 Kb to your executable).
🌐
Oracle
docs.oracle.com › database › 121 › ADXDK › adx_c_parser.htm
20 Using the XML Parser for C
This chapter explains how to use the Extensible Markup Language (XML) parser for C.
🌐
Stanford
web.stanford.edu › dept › itss › docs › oracle › 10gR2 › appdev.102 › b14252 › adx_c_parser.htm
Using the XML Parser for C
You can find the specification at the following URL: ... Oracle XML parser for C checks if an XML document is well-formed, and optionally validates it against a DTD.
🌐
SourceForge
sourceforge.net › home › open source software › software development › xml parsers
Best Open Source Windows XML Parsers 2026
NunniMCAX is a minimal (19KB) C library for parsing XML....
🌐
GitHub
github.com › capmar › sxml
GitHub - capmar/sxml: Small XML parser in C
Small XML parser in C. Contribute to capmar/sxml development by creating an account on GitHub.
Starred by 91 users
Forked by 11 users
Languages   C 100.0%
Top answer
1 of 6
766

Just like with standard library containers, what library you should use depends on your needs. Here's a convenient flowchart:

So the first question is this: What do you need?

I Need Full XML Compliance

OK, so you need to process XML. Not toy XML, real XML. You need to be able to read and write all of the XML specification, not just the low-lying, easy-to-parse bits. You need Namespaces, DocTypes, entity substitution, the works. The W3C XML Specification, in its entirety.

The next question is: Does your API need to conform to DOM or SAX?

I Need Exact DOM and/or SAX Conformance

OK, so you really need the API to be DOM and/or SAX. It can't just be a SAX-style push parser, or a DOM-style retained parser. It must be the actual DOM or the actual SAX, to the extent that C++ allows.

You have chosen:

Xerces

That's your choice. It's pretty much the only C++ XML parser/writer that has full (or as near as C++ allows) DOM and SAX conformance. It also has XInclude support, XML Schema support, and a plethora of other features.

It has no real dependencies. It uses the Apache license.

I Don't Care About DOM and/or SAX Conformance

You have chosen:

LibXML2

LibXML2 offers a C-style interface (if that really bothers you, go use Xerces), though the interface is at least somewhat object-based and easily wrapped. It provides a lot of features, like XInclude support (with callbacks so that you can tell it where it gets the file from), an XPath 1.0 recognizer, RelaxNG and Schematron support (though the error messages leave a lot to be desired), and so forth.

It does have a dependency on iconv, but it can be configured without that dependency. Though that does mean that you'll have a more limited set of possible text encodings it can parse.

It uses the MIT license.

I Do Not Need Full XML Compliance

OK, so full XML compliance doesn't matter to you. Your XML documents are either fully under your control or are guaranteed to use the "basic subset" of XML: no namespaces, entities, etc.

So what does matter to you? The next question is: What is the most important thing to you in your XML work?

Maximum XML Parsing Performance

Your application needs to take XML and turn it into C++ data structures as fast as this conversion can possibly happen.

You have chosen:

RapidXML

This XML parser is exactly what it says on the tin: rapid XML. It doesn't even deal with pulling the file into memory; how that happens is up to you. What it does deal with is parsing that into a series of C++ data structures that you can access. And it does this about as fast as it takes to scan the file byte by byte.

Of course, there's no such thing as a free lunch. Like most XML parsers that don't care about the XML specification, RapidXML doesn't touch namespaces, DocTypes, entities (with the exception of character entities and the 5 predefined XML ones), and so forth. So basically nodes, elements, attributes, and such.
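
To make "the basic XML ones" concrete: XML predefines five entities (&lt;, &gt;, &amp;, &apos; and &quot;). The following is only a sketch of what minimal entity handling in a subset parser might look like; it is not taken from RapidXML, the function name is made up, and numeric character references are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Decode the five predefined XML entities from src into dst.
 * dst must have room for strlen(src) + 1 bytes (decoding never makes
 * the text longer). Any other '&' sequence is copied unchanged.
 */
void
decode_predefined_entities(const char *src, char *dst)
{
    static const struct { const char *name; char ch; } table[] = {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' },
        { "&apos;", '\'' }, { "&quot;", '"' }
    };
    const size_t count = sizeof table / sizeof table[0];

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        size_t i;

        for (i = 0; i < count; i++) {
            size_t len = strlen(table[i].name);

            if (strncmp(src, table[i].name, len) == 0) {
                *dst++ = table[i].ch;
                src += len;
                break;
            }
        }
        if (i == count)         /* no entity matched: copy one byte */
            *dst++ = *src++;
    }
    *dst = '\0';
}
```

Anything beyond this (entities declared in a DocType, for instance) is where the fully compliant parsers above earn their keep.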

Also, it is a DOM-style parser. So it does require that you read all of the text in. However, what it doesn't do is copy any of that text (usually). The way RapidXML gets most of its speed is by referring to strings in-place. This requires more memory management on your part (you must keep that string alive while RapidXML is looking at it).
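
The idea of referring to strings in place can be sketched in plain C. This is an illustration of the general technique, not RapidXML's code: the struct and function names are hypothetical, it only understands the shape "<name>text</name>", and it does not check that the end tag matches the start tag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result: both pointers point INTO the caller's buffer. */
struct element_view {
    char *name;
    char *text;
};

/*
 * Parse "<name>text</name>" without copying anything. The buffer is
 * modified: '>' and the second '<' are overwritten with NUL bytes so
 * that name and text become ordinary C strings inside buf.
 * Returns 0 on success, -1 (buffer untouched) if the shape is wrong.
 */
int
parse_in_place(char *buf, struct element_view *out)
{
    char *gt, *lt;

    assert(buf != NULL && out != NULL);
    if (buf[0] != '<')
        return -1;
    gt = strchr(buf, '>');              /* end of the start tag */
    if (gt == NULL)
        return -1;
    lt = strchr(gt + 1, '<');           /* start of the end tag */
    if (lt == NULL || lt[1] != '/')
        return -1;
    *gt = '\0';                         /* terminates the name */
    *lt = '\0';                         /* terminates the text */
    out->name = buf + 1;
    out->text = gt + 1;
    return 0;
}
```

Nothing is allocated, which is where the speed comes from, but the view is only valid while the buffer is: free or overwrite the buffer and name and text dangle. That is the memory management burden mentioned above.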

RapidXML's DOM is bare-bones. You can get string values for things. You can search for attributes by name. That's about it. There are no convenience functions to turn attributes into other values (numbers, dates, etc). You just get strings.

One other downside with RapidXML is that it is painful for writing XML. It requires you to do a lot of explicit memory allocation of string names in order to build its DOM. It does provide a kind of string buffer, but that still requires a lot of explicit work on your end. It's certainly functional, but it's a pain to use.

It uses the MIT licence. It is a header-only library with no dependencies.

  • There is a RapidXML "GitHub patch" that allows it to also work with namespaces.

I Care About Performance But Not Quite That Much

Yes, performance matters to you. But maybe you need something a bit less bare-bones. Maybe something that can handle more Unicode, or doesn't require so much user-controlled memory management. Performance is still important, but you want something a little less direct.

You have chosen:

PugiXML

Historically, this served as inspiration for RapidXML. But the two projects have diverged, with Pugi offering more features, while RapidXML is focused entirely on speed.

PugiXML offers Unicode conversion support, so if you have some UTF-16 docs around and want to read them as UTF-8, Pugi will provide. It even has an XPath 1.0 implementation, if you need that sort of thing.

But Pugi is still quite fast. Like RapidXML, it has no dependencies and is distributed under the MIT License.

Reading Huge Documents

You need to read documents that are measured in the gigabytes in size. Maybe you're getting them from stdin, being fed by some other process. Or you're reading them from massive files. Or whatever. The point is, what you need is to not have to read the entire file into memory all at once in order to process it.

You have chosen:

LibXML2

Xerces's SAX-style API will work in this capacity, but LibXML2 is here because it's a bit easier to work with. A SAX-style API is a push-API: it starts parsing a stream and just fires off events that you have to catch. You are forced to manage context, state, and so forth. Code that reads a SAX-style API is a lot more spread out than one might hope.

LibXML2's xmlReader object is a pull-API. You ask to go to the next XML node or element; you aren't told. This allows you to store context as you see fit, to handle different entities in a way that's much more readable in code than a bunch of callbacks.
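
Here is a toy illustration of the pull style, in C. It does not use libxml2 (whose real entry points, such as xmlTextReaderRead(), are not shown here); the pull_reader type and the function names are invented for this sketch, and it only recognises start and end tags, skipping everything else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum event_type { EV_START, EV_END, EV_EOF };

/* Hypothetical pull reader over an in-memory, NUL-terminated string. */
struct pull_reader {
    const char *pos;
};

struct event {
    enum event_type type;
    char            name[32];   /* tag content, truncated if longer */
};

/* Advance to the next tag and describe it in *ev (EV_EOF when done). */
void
pull_next(struct pull_reader *r, struct event *ev)
{
    const char *lt, *gt;
    size_t      len;

    assert(r != NULL && ev != NULL);
    ev->name[0] = '\0';
    lt = strchr(r->pos, '<');
    gt = lt ? strchr(lt, '>') : NULL;
    if (gt == NULL) {
        ev->type = EV_EOF;
        return;
    }
    ev->type = (lt[1] == '/') ? EV_END : EV_START;
    lt += (ev->type == EV_END) ? 2 : 1;     /* skip "<" or "</" */
    len = (size_t) (gt - lt);
    if (len >= sizeof ev->name)
        len = sizeof ev->name - 1;
    memcpy(ev->name, lt, len);
    ev->name[len] = '\0';
    r->pos = gt + 1;
}

/* The caller drives the loop and keeps its state in local variables. */
int
max_depth(const char *xml)
{
    struct pull_reader r = { xml };
    struct event       ev;
    int                depth = 0, deepest = 0;

    for (pull_next(&r, &ev); ev.type != EV_EOF; pull_next(&r, &ev)) {
        if (ev.type == EV_START) {
            if (++depth > deepest)
                deepest = depth;
        } else {
            depth--;
        }
    }
    return deepest;
}
```

Note how max_depth() keeps depth in an ordinary local variable and reads top to bottom. With a push API the same bookkeeping has to live in globals or user data shared between callbacks.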

Alternatives

Expat

Expat is a well-known parser written in C that uses a stream-oriented push API: you register handler functions and it calls them as it parses. It was written by James Clark.

Its current status is active. The most recent version is 2.2.9, which was released on 2019-09-25.

LlamaXML

It is an implementation of a StAX-style API. It is a pull-parser, similar to LibXML2's xmlReader parser.

But it hasn't been updated since 2005. So again, Caveat Emptor.

XPath Support

XPath is a system for querying elements within an XML tree. It's a handy way of effectively naming an element or collection of elements by common properties, using a standardized syntax. Many XML libraries offer XPath support.

There are effectively three choices here:

  • LibXML2: It provides full XPath 1.0 support. Again, it is a C API, so if that bothers you, there are alternatives.
  • PugiXML: It comes with XPath 1.0 support as well. As above, it's more of a C++ API than LibXML2, so you may be more comfortable with it.
  • TinyXML: It does not come with XPath support, but there is the TinyXPath library that provides it. TinyXML is undergoing a conversion to version 2.0, which significantly changes the API, so TinyXPath may not work with the new API. Like TinyXML itself, TinyXPath is distributed under the zLib license.

Just Get The Job Done

So, you don't care about XML correctness. Performance isn't an issue for you. Streaming is irrelevant. All you want is something that gets XML into memory and allows you to stick it back onto disk again. What you care about is API.

You want an XML parser that's going to be small, easy to install, trivial to use, and small enough to be irrelevant to your eventual executable's size.

You have chosen:

TinyXML

I put TinyXML in this slot because it is about as braindead simple to use as XML parsers get. Yes, it's slow, but it's simple and obvious. It has a lot of convenience functions for converting attributes and so forth.

Writing XML is no problem in TinyXML. You just new up some objects, attach them together, send the document to a std::ostream, and everyone's happy.

There is also something of an ecosystem built around TinyXML, with a more iterator-friendly API, and even an XPath 1.0 implementation layered on top of it.

TinyXML uses the zLib license, which is more or less the MIT License with a different name.

2 of 6
22

There is another approach to handling XML that you may want to consider, called XML data binding. Especially if you already have a formal specification of your XML vocabulary, for example, in XML Schema.

XML data binding allows you to use XML without actually doing any XML parsing or serialization. A data binding compiler auto-generates all the low-level code and presents the parsed data as C++ classes that correspond to your application domain. You then work with this data by calling functions, and working with C++ types (int, double, etc) instead of comparing strings and parsing text (which is what you do with low-level XML access APIs such as DOM or SAX).
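
To give a feel for the difference, here is a hand-written C stand-in for the kind of code a binding compiler would generate for you (real tools emit C++ classes from a schema; this is neither that nor any tool's actual output). The struct person type, the person_from_attrs() name and the <person name="..." age="..."/> vocabulary are all invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical typed view of <person name="..." age="..."/>. */
struct person {
    char name[64];
    int  age;
};

/*
 * Fill *out from a NULL-terminated name/value array (the layout expat
 * passes to start handlers). Returns 0 on success, -1 if an attribute
 * is missing or malformed.
 */
int
person_from_attrs(const char **attr, struct person *out)
{
    int    have_name = 0, have_age = 0;
    size_t i;

    assert(attr != NULL && out != NULL);
    for (i = 0; attr[i] != NULL; i += 2) {
        const char *value = attr[i + 1];

        if (strcmp(attr[i], "name") == 0) {
            if (strlen(value) >= sizeof out->name)
                return -1;              /* name too long for the field */
            strcpy(out->name, value);
            have_name = 1;
        } else if (strcmp(attr[i], "age") == 0) {
            char *end;
            long  n;

            errno = 0;
            n = strtol(value, &end, 10);
            if (errno != 0 || end == value || *end != '\0'
                || n < 0 || n > 200)
                return -1;              /* not a plausible integer age */
            out->age = (int) n;
            have_age = 1;
        }
    }
    return (have_name && have_age) ? 0 : -1;
}
```

The rest of the application then works with p.age as an int and never compares strings. With data binding, this conversion layer is generated from the schema instead of written and maintained by hand.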

See, for example, an open-source XML data binding implementation that I wrote, CodeSynthesis XSD and, for a lighter-weight, dependency-free version, CodeSynthesis XSD/e.

🌐
GitHub
github.com › ziord › cxml
GitHub - ziord/cxml: C XML Minimalistic Library (CXML) - An XML library for C with a focus on simplicity and ease of use. · GitHub
C XML Minimalistic Library (CXML) - An XML library for C with a focus on simplicity and ease of use. - ziord/cxml
Starred by 50 users
Forked by 3 users
Languages   C 99.3% | CMake 0.7%
🌐
Oracle
docs.oracle.com › cd › B10464_01 › web.904 › b12099 › adx23pac.htm
24 Using XML Parser for C
It is also available for download from the OTN site: http://otn.oracle.com/tech/xml · It is located in $ORACLE_HOME/xdk/c/parser.
🌐
Cprogramming
cboard.cprogramming.com › c-programming › 110390-parsing-xml-c-without-3rd-party-libs.html
Parsing XML with C without 3rd party libs
December 18, 2008 - And Welcome to the C forum. ... View Forum Posts Woof, woof! ... There are a few ansi XML parser libraries, what's wrong with using them? ie lib mini-xml (libmxml). It's a lot of work to tap out a standards compliant XML parser, even then you'd probably just be introducing new bugs.