Two examples, one with expat and one with libxml2. The second is, IMHO, much easier to use since it builds a tree in memory, a data structure which is easy to work with. expat, on the other hand, does not build anything (you have to do it yourself); it just calls your handlers at specific events during parsing. But expat may be faster (I didn't measure).
With expat, reading an XML file and displaying the elements indented:
/*
   A simple test program to parse XML documents with expat
   <http://expat.sourceforge.net/>. It just displays the element
   names.

   On Debian, compile with (the library goes after the source file):
   gcc -Wall -o expat-test expat-test.c -lexpat

   Inspired from <http://www.xml.com/pub/a/1999/09/expat/index.html>
*/
#include <expat.h>
#include <stdio.h>
#include <stdlib.h>

/* Keep track of the current level in the XML tree */
int Depth;

#define MAXCHARS 1000000

void
start(void *data, const char *el, const char **attr)
{
    int i;

    for (i = 0; i < Depth; i++)
        printf(" ");
    printf("%s", el);
    /* attr is a NULL-terminated list of name/value pairs */
    for (i = 0; attr[i]; i += 2) {
        printf(" %s='%s'", attr[i], attr[i + 1]);
    }
    printf("\n");
    Depth++;
}                               /* End of start handler */

void
end(void *data, const char *el)
{
    Depth--;
}                               /* End of end handler */

int
main(int argc, char **argv)
{
    char *filename;
    FILE *f;
    size_t size;
    char *xmltext;
    XML_Parser parser;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        return (1);
    }
    filename = argv[1];
    parser = XML_ParserCreate(NULL);
    if (parser == NULL) {
        fprintf(stderr, "Parser not created\n");
        return (1);
    }
    /* Tell expat to use functions start() and end() each time it encounters
     * the start or end of an element. */
    XML_SetElementHandler(parser, start, end);
    f = fopen(filename, "r");
    if (f == NULL) {
        fprintf(stderr, "Cannot open %s\n", filename);
        XML_ParserFree(parser);
        return (1);
    }
    xmltext = malloc(MAXCHARS);
    if (xmltext == NULL) {
        fprintf(stderr, "Cannot allocate %d bytes\n", MAXCHARS);
        fclose(f);
        XML_ParserFree(parser);
        return (1);
    }
    /* Slurp the XML file in the buffer xmltext */
    size = fread(xmltext, sizeof(char), MAXCHARS, f);
    fclose(f);
    /* Pass the number of bytes actually read: the buffer is not
     * NUL-terminated, so strlen() cannot be used on it. */
    if (XML_Parse(parser, xmltext, (int) size, XML_TRUE) ==
        XML_STATUS_ERROR) {
        fprintf(stderr,
                "Cannot parse %s, file may be too large or not well-formed XML\n",
                filename);
        free(xmltext);
        XML_ParserFree(parser);
        return (1);
    }
    free(xmltext);
    XML_ParserFree(parser);
    fprintf(stdout, "Successfully parsed %zu characters in file %s\n", size,
            filename);
    return (0);
}
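Since expat leaves tree building to you, here is a minimal sketch of what "doing it yourself" might look like. The names struct node, struct builder, tree_start, tree_end, count_children and free_tree are all made up for this illustration and are not part of expat. The two handlers have the same signatures as start() and end() above, so they could be registered the same way; the sketch itself does not call expat, so it compiles without the library. Attributes and text content are ignored.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tree node: expat provides nothing like this. */
struct node {
    char *name;
    struct node *parent;
    struct node *first_child;
    struct node *last_child;
    struct node *next;          /* next sibling */
};

/* Hypothetical builder state, meant to be handed to the handlers
 * through their first (user data) argument. */
struct builder {
    struct node *root;
    struct node *current;       /* innermost element still open */
};

/* Start handler: create a node and attach it under the open element. */
void
tree_start(void *data, const char *el, const char **attr)
{
    struct builder *b = data;
    struct node *n = calloc(1, sizeof *n);

    (void) attr;                /* attributes are ignored in this sketch */
    if (n == NULL)
        abort();
    n->name = malloc(strlen(el) + 1);
    if (n->name == NULL)
        abort();
    strcpy(n->name, el);
    n->parent = b->current;
    if (b->current == NULL) {
        b->root = n;            /* first element seen is the root */
    } else if (b->current->last_child == NULL) {
        b->current->first_child = b->current->last_child = n;
    } else {
        b->current->last_child->next = n;
        b->current->last_child = n;
    }
    b->current = n;
}

/* End handler: the parent becomes the open element again. */
void
tree_end(void *data, const char *el)
{
    struct builder *b = data;

    (void) el;
    if (b->current != NULL)
        b->current = b->current->parent;
}

/* Count the direct children of a node. */
int
count_children(const struct node *n)
{
    int count = 0;
    const struct node *c;

    for (c = n->first_child; c; c = c->next)
        count++;
    return count;
}

/* Free a node and everything below it. */
void
free_tree(struct node *n)
{
    struct node *c, *next;

    if (n == NULL)
        return;
    for (c = n->first_child; c; c = next) {
        next = c->next;
        free_tree(c);
    }
    free(n->name);
    free(n);
}
```

With expat, one would presumably pass a struct builder with XML_SetUserData(parser, &b) and register the handlers with XML_SetElementHandler(parser, tree_start, tree_end); after XML_Parse() returns, b.root would hold the tree. This is roughly the bookkeeping that libxml2 does for you.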
With libxml2, a program which displays the name of the root element and the names of its children:
/*
   Simple test with libxml2 <http://xmlsoft.org>. It displays the name
   of the root element and the names of all its children (not
   descendants, just children).

   On Debian, compile with:
   gcc -Wall -o read-xml2 $(xml2-config --cflags) read-xml2.c \
       $(xml2-config --libs)
*/
#include <stdio.h>
#include <libxml/parser.h>

int
main(int argc, char **argv)
{
    xmlDoc *document;
    xmlNode *root, *first_child, *node;
    char *filename;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s filename.xml\n", argv[0]);
        return 1;
    }
    filename = argv[1];

    document = xmlReadFile(filename, NULL, 0);
    if (document == NULL) {
        fprintf(stderr, "Cannot read or parse %s\n", filename);
        return 1;
    }
    root = xmlDocGetRootElement(document);
    if (root == NULL) {
        fprintf(stderr, "%s has no root element\n", filename);
        xmlFreeDoc(document);
        return 1;
    }
    fprintf(stdout, "Root is <%s> (%i)\n", root->name, root->type);
    /* Children include text nodes (e.g. whitespace), not only elements */
    first_child = root->children;
    for (node = first_child; node; node = node->next) {
        fprintf(stdout, "\t Child is <%s> (%i)\n", node->name, node->type);
    }
    fprintf(stdout, "...\n");
    xmlFreeDoc(document);
    return 0;
}
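The program above stops at the children of the root. Going down to all descendants is a short recursion over the same children/next links. The sketch below uses a made-up struct demo_node as a stand-in for libxml2's xmlNode, keeping only the fields the walk needs (name, children, next), so that it compiles without the library. The names walk and print_node are also invented for this example.

```c
#include <stdio.h>

/* Hypothetical stand-in for xmlNode, reduced to three fields that
 * carry the same names as in libxml2. */
struct demo_node {
    const char *name;
    struct demo_node *children; /* first child */
    struct demo_node *next;     /* next sibling */
};

/* Depth-first walk over a node, its following siblings and all their
 * descendants. Calls visit (if not NULL) on each node with its depth
 * and returns the number of nodes seen. */
int
walk(const struct demo_node *node, int depth,
     void (*visit)(const struct demo_node *, int))
{
    int count = 0;

    for (; node; node = node->next) {
        if (visit != NULL)
            visit(node, depth);
        count += 1 + walk(node->children, depth + 1, visit);
    }
    return count;
}

/* Example visitor: print the name indented by two spaces per level. */
void
print_node(const struct demo_node *node, int depth)
{
    printf("%*s<%s>\n", depth * 2, "", node->name);
}
```

With libxml2 itself, the same loop would take an xmlNode * and start from xmlDocGetRootElement(document). You would probably also want to skip nodes whose type is not XML_ELEMENT_NODE, since whitespace between elements shows up as text nodes.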
Answer from bortzmeyer on Stack Overflow
How about one written in pure assembler :-) Don't forget to check out the benchmarks.
A simple C XML parser - Stack Overflow
GNU C library for parsing of XML files
Any C++ XML Parsers that are...C++?
Fast Haskell: Competing with C at parsing XML
Tada! We matched Hexml, in pure Haskell, using safe accessor functions. We provided a SAX API which is very fast, and a simple demonstration DOM parser with a familiar API which is also quite fast.
No. You didn't match it. As pointed out in many comments, you compare Hexml's DOM API to your SAX API.
Even if you replied that "It also talks about implementing a DOM API later in another section", that is intentionally misleading.
It seems like you intentionally drop the distinction between SAX and DOM, then claim that your library matched another library. How could it match if it is not even the same kind of interface (DOM vs. SAX)?
Then you tack on a DOM API at the end of the article, mumbling that "I could do better if I wanted to, but this is good enough", as if to mislead the reader into thinking that the DOM API is just a bonus. When, in fact, that's the whole point of the API you are comparing to.
Libxml2
The Expat parser is the best I've come across - I use it in my C++ code in preference to the various C++ parsers - but it is written in C. Very easy to use and embed in your application. So I don't see why in your question you say:
(other than lib eXpat)
do you have something against it?