You can try xml2js. It's a simple XML-to-JavaScript object converter: it turns your XML into a plain JS object so you can access its content with ease (a short example follows the list below).
Here are some other options:
- libxmljs
- xml-stream
- xmldoc
- cheerio – implements a subset of core jQuery for XML (and HTML)
I have used xml2js and it has worked fine for me. The rest you might have to try out for yourself.
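For illustration, here's a minimal sketch of xml2js's callback API; the XML string and names are made up for the example:

```js
const xml2js = require('xml2js');

const xml = '<root><item id="1">Hello</item></root>';

xml2js.parseString(xml, (err, result) => {
  if (err) throw err;
  // with the default options, child elements become arrays,
  // attributes land under "$", and text content under "_"
  console.log(result.root.item[0]._);    // "Hello"
  console.log(result.root.item[0].$.id); // "1"
});
```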
This answer is aimed at developers on Windows. You want to pick an XML parsing module that does NOT depend on node-expat: node-expat requires node-gyp, and node-gyp requires you to install Visual Studio on your machine. If your machine is a Windows Server, you definitely don't want to install Visual Studio on it.
So, which XML parsing module to pick?
Save yourself a lot of trouble and use either xml2js or xmldoc. They depend on sax.js, a pure JavaScript solution that doesn't require node-gyp (a minimal xmldoc sketch follows the update note below).
Both libxmljs and xml-stream require node-gyp. Don't pick these unless you already have Visual Studio installed or you don't mind going down that road.
Update 2015-10-24: it seems somebody found a solution to use node-gyp on Windows without installing VS: https://github.com/nodejs/node-gyp/issues/629#issuecomment-138276692
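For reference, here's roughly what xmldoc usage looks like; the XML snippet and element names are invented for the example:

```js
const { XmlDocument } = require('xmldoc');

// parse the whole document in one shot (fine for small/medium XML)
const doc = new XmlDocument('<jobs><job><title>Engineer</title></job></jobs>');

// childNamed returns the first matching child element
const job = doc.childNamed('job');
console.log(job.childNamed('title').val); // "Engineer"
```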
I have a large XML file (around 2–3 GB) and I want to parse it within a few seconds using Node.js. I tried packages like xml-flow and xml-stream, but they take 20–30 minutes to finish.
Is there any faster way to do this in Node.js or should I use a different language/tool?
Context:
I'm building a job distribution system. During client onboarding, we ask clients to provide a feed URL (usually a .xml or .xml.gz file) containing millions of <job> nodes — sometimes the file is 2–3 GB or more.
I don't want to fully process or store the feed at this stage. Instead, we just need to:
- Count the number of <job> nodes
- Extract all unique field names used inside the <job> nodes
- Display this info in real time to help map client fields to our internal DB structure
This should ideally happen in a few seconds, not minutes. But even with streaming parsers like xml-flow or sax, the analysis is taking 20–30 minutes.
I stream the file using gzip decompression (zlib) and process it as it downloads, so I'm not waiting for the full download. The actual slowdown comes from traversing millions of nodes, especially when different job entries have different or optional fields.
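For reference, this is roughly the streaming pattern described above, as a minimal sketch using sax piped behind a gunzip stream. The feed URL is hypothetical, the record element is assumed to be <job> as in the question, and error handling is omitted:

```js
const https = require('https');
const zlib = require('zlib');
const sax = require('sax');

const parser = sax.createStream(true); // strict mode
let jobCount = 0;
const fieldNames = new Set();
let depthInsideJob = 0; // > 0 while we're somewhere inside a <job>

parser.on('opentag', (node) => {
  if (node.name === 'job') {
    jobCount++;
    depthInsideJob = 1;
  } else if (depthInsideJob > 0) {
    fieldNames.add(node.name); // collect field names nested under <job>
    depthInsideJob++;
  }
});

parser.on('closetag', () => {
  if (depthInsideJob > 0) depthInsideJob--;
});

parser.on('end', () => {
  console.log(`jobs: ${jobCount}`);
  console.log('unique fields:', [...fieldNames]);
});

// hypothetical feed URL; decompress and parse while downloading
https.get('https://example.com/feed.xml.gz', (res) => {
  res.pipe(zlib.createGunzip()).pipe(parser);
});
```

This keeps memory flat regardless of file size, but throughput is bounded by the parser itself, which is why a pure-JS tokenizer like sax can still take many minutes on a multi-gigabyte feed.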