Nowadays, there is at least one better tool, called slimit:

SlimIt is a JavaScript minifier written in Python. It compiles JavaScript into more compact code so that it downloads and runs faster.

SlimIt also provides a library that includes a JavaScript parser, lexer, pretty printer and a tree visitor.

Demo:

Imagine we have the following JavaScript code:

$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});

And now we need to get email, phone and name values from the data object.

The idea here is to instantiate a slimit parser, visit all nodes, keep only the assignment nodes (slimit represents each key: value pair of an object literal as an ast.Assign node), and put them into a dictionary:

from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor


data = """
$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});
"""

parser = Parser()
tree = parser.parse(data)
fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
          for node in nodevisitor.visit(tree)
          if isinstance(node, ast.Assign)}

print(fields)

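It prints the following. Note that the outer type, url and data properties are collected too, that a nested object such as data maps to an empty string, and that string values keep their JavaScript quotes: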

{'name': "'XYZ'", 
 'url': "'http://www.example.com'", 
 'type': '"POST"', 
 'phone': "'9999999999'", 
 'data': '', 
 'email': "'abc@g.com'"}
Answer from alecxe on Stack Overflow
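The values printed above still carry their JavaScript quotes. Here is a minimal sketch of one way to strip them. It assumes every non-empty value is a simple string literal that is also a valid Python literal, which holds for this example. JS-only escapes or template literals would break it. The dictionary is copied from the output above, so the snippet runs without slimit:

```python
import ast

# The dictionary printed above, copied verbatim (values still include their JS quotes).
fields = {'name': "'XYZ'",
          'url': "'http://www.example.com'",
          'type': '"POST"',
          'phone': "'9999999999'",
          'data': '',
          'email': "'abc@g.com'"}

# ast.literal_eval turns "'XYZ'" into 'XYZ'. Empty values (nested objects such
# as data, which the comprehension above mapped to '') are skipped.
clean = {key: ast.literal_eval(value) for key, value in fields.items() if value}

print(clean['email'], clean['phone'], clean['name'])
```

ast.literal_eval is used rather than slicing off the first and last character because it also decodes escape sequences that JavaScript and Python share.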
GitHub
github.com › PiotrDabkowski › pyjsparser
GitHub - PiotrDabkowski/pyjsparser: Fast JavaScript parser for Python. · GitHub
Fast JavaScript parser - manual translation of esprima.js to python.
Starred by 255 users
Forked by 40 users
Languages   JavaScript 82.5% | Python 17.5%
Another answer to the same Stack Overflow question as alecxe's slimit answer:

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

The ANTLR site provides many grammars, including one for JavaScript.

As it happens, ANTLR can target Python, so you can generate the lexer (recognizer) and parser from that grammar and call them directly from Python (good luck).

Discussions

JavaScript parser for Python
Nothing comes to mind off the top of my head, but if you can't find anything like this, it does sound like something you could use a regex search to achieve.
r/learnpython
June 2, 2022
Parsing Javascript In Python - Stack Overflow
I usually use Beautiful Soup to parse html that I need, but I came across some Javascript that I would like to get from here.
stackoverflow.com
December 16, 2016
javascript - Parse JS file via Python - Stack Overflow
have a question, how can I properly get the value for field 'data' from the following JS file via Python? Tried do it like parsing a json, but for json.load it's in incorrect format. So will be tha...
stackoverflow.com
How can I parse Javascript variables using python? - Stack Overflow
The problem: A website I am trying to gather data from uses Javascript to produce a graph. I'd like to be able to pull the data that is being used in the graph, but I am not sure where to start. For...
stackoverflow.com
Reddit
reddit.com › r/webscraping › how do you parse a javascript script with python?
r/webscraping on Reddit: How do you parse a javascript script with python?
May 21, 2022

Dear webscrapers,

I'm scraping this website by intercepting the API with Insomnia and requests. The problem is, I don't get JSON back but a JavaScript script containing code and multiple JSON objects with the data I want. Does anybody know how I can parse the script to get whatever I want?

Any help appreciated, have been stuck on this for some time ^^'
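For a response like the one described in this post, one stdlib-only approach (a sketch, not something suggested in the thread) is to scan the script for { or [ and let json.JSONDecoder.raw_decode try to decode a complete JSON value starting at each one. It assumes the embedded data is strict JSON, with double-quoted keys and no trailing commas. The extract_json_values helper and the sample script are hypothetical:

```python
import json


def extract_json_values(script: str) -> list:
    """Return every top-level JSON object or array embedded in a JS script.

    Hypothetical helper: tries json.JSONDecoder.raw_decode at each '{' or '['
    and skips past whatever it successfully decodes.
    """
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    while True:
        # Find the next candidate start of a JSON object or array.
        starts = [i for i in (script.find('{', pos), script.find('[', pos)) if i != -1]
        if not starts:
            break
        start = min(starts)
        try:
            value, end = decoder.raw_decode(script, start)
        except json.JSONDecodeError:
            # Not valid JSON here (e.g. a function body); try the next character.
            pos = start + 1
            continue
        results.append(value)
        pos = end  # raw_decode returns the index just past the decoded value
    return results


# Hypothetical script text of the kind described in the post.
script = '''
window.config = {"page": 1, "items": [{"id": 7}, {"id": 8}]};
function init() { render(window.config); }
var totals = [10, 20, 30];
'''
print(extract_json_values(script))
```

Because decoding resumes after each match, nested arrays and objects are returned only as part of their parent. Index expressions such as arr[0] also decode as JSON arrays, so the results may need filtering.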

PyPI
pypi.org › project › esprima
esprima · PyPI
>>> import esprima
>>> program = 'const answer = 42'
>>> esprima.tokenize(program)
[{ type: "Keyword", value: "const" }, { type: "Identifier", value: "answer" }, { type: "Punctuator", value: "=" }, { type: "Numeric", value: "42" }]
>>> esprima.parseScript(program)
{
    body: [
        {
            kind: "const",
            declarations: [
                {
                    init: { raw: "42", type: "Literal", value: 42 },
                    type: "VariableDeclarator",
                    id: { type: "Identifier", name: "answer" }
                }
            ],
            type: "VariableDeclaration"
        }
    ],
    type: "Program",
    sourceType: "script"
}

For more information, please read the complete documentation. ...
Author: German M. Bravo (Kronuz)
Tags: esprima, ecmascript, javascript, parser, ast

pip install esprima
Published   Aug 24, 2018
Version   4.0.1
Python Programming
pythonprogramming.net › javascript-dynamic-scraping-parsing-beautiful-soup-tutorial
Scraping Dynamic Javascript Text
Upon having done that, we can see the javascript data! Look at you shinin! ...

import bs4 as bs
import dryscrape

sess = dryscrape.Session()
sess.visit('https://pythonprogramming.net/parsememcparseface/')
source = sess.body()
soup = bs.BeautifulSoup(source, 'lxml')
js_test = soup.find('p', class_='jstest')
print(js_test.text)
GitHub
github.com › differentmatt › filbert
GitHub - differentmatt/filbert: JavaScript parser of Python · GitHub
parse_dammit(input, options) takes the same arguments and returns the same syntax tree as the parse function in filbert.js, but never raises an error, and will do its best to parse syntactically invalid code in as meaningful a way as it can. It'll insert identifier nodes with name "✖" as placeholders in places where it can't make sense of the input. Depends on filbert.js, because it uses the same tokenizer. Python3 is the target language.
Starred by 139 users
Forked by 26 users
Languages   JavaScript 82.5% | HTML 17.1% | CSS 0.4%
GitHub
github.com › Nykakin › chompjs
GitHub - Nykakin/chompjs: Parsing JavaScript objects into Python data structures · GitHub
The chompjs library was designed to bypass this limitation, and it allows you to scrape such JavaScript objects into proper Python dictionaries:

>>> import chompjs
>>>
>>> chompjs.parse_js_object("{'a': 'b'}")
{'a': 'b'}
>>> chompjs.parse_js_object('{a: "b"}')
{'a': 'b'}
>>> chompjs.parse_js_object('{"a": [1, 2, 3,]}')
{'a': [1, 2, 3]}
>>> chompjs.parse_js_object('{"a": .99}')
{'a': 0.99}
Starred by 221 users
Forked by 12 users
Languages   C 54.9% | Python 45.1%
Tchut-Tchut Blog
beenje.github.io › blog › posts › parsing-javascript-rendered-pages-in-python-with-pyppeteer
Parsing JavaScript rendered pages in Python with pyppeteer | Tchut-Tchut Blog
June 2, 2018 - Pyppeteer allows you to do the same from Python. So there is no magic. You just let Chromium load and render the page with the latest JavaScript and browser features.
LinkedIn
linkedin.com › pulse › chatgpt-javascript-parser-bipin-patwardhan
ChatGPT and JavaScript parser
August 6, 2023 - You can install it using `pip install pyparsing`. This code snippet defines a basic grammar for JavaScript using pyparsing and then parses the sample JavaScript code provided in the `js_code` variable.
Readthedocs
openerp-web-v7.readthedocs.io › en › stable
py.js, a Python expressions parser and evaluator — py.js 0.6 documentation
To evaluate a Python expression, simply call py.eval(). py.eval() takes a mandatory Python expression parameter, as a string, and an optional evaluation context (namespace for the expression’s free variables), and returns a javascript value:
Reddit
reddit.com › r/learnpython › javascript parser for python
r/learnpython on Reddit: JavaScript parser for Python
June 2, 2022

Hello, I'm looking for a framework that will help me read a JS file locally and edit it. I want to be able to find a specific function inside the file and edit it, adding new lines of code at the beginning or at the end.
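A real parser such as slimit or esprima (both listed on this page) is the robust way to do this. Short of that, here is a naive stdlib sketch of the edit described in the post. It finds function name(...) { with a regex, counts braces to locate the end of the body, and splices a line in at each end. The wrap_function_body helper and the sample source are hypothetical. Brace counting ignores strings, comments and regex literals, so code with unbalanced braces in those places will confuse it:

```python
import re


def wrap_function_body(source: str, name: str, prologue: str, epilogue: str) -> str:
    """Return source with prologue added as the first line and epilogue as
    the last line of `function name(...) { ... }`.

    Hypothetical helper: braces are counted naively, so braces inside strings,
    comments or regex literals will confuse it.
    """
    match = re.search(r'function\s+' + re.escape(name) + r'\s*\([^)]*\)\s*\{', source)
    if match is None:
        raise ValueError(f"function {name!r} not found")
    open_brace = match.end() - 1  # index of the '{' that opens the body

    # Walk forward until the brace that opened the body is closed again.
    depth = 0
    for i in range(open_brace, len(source)):
        if source[i] == '{':
            depth += 1
        elif source[i] == '}':
            depth -= 1
            if depth == 0:
                close_brace = i
                break
    else:
        raise ValueError(f"no closing brace for function {name!r}")

    body = source[open_brace + 1:close_brace]
    return (source[:open_brace + 1] + '\n' + prologue + body
            + epilogue + '\n' + source[close_brace:])


js = """function greet(name) {
  console.log('hi ' + name);
}
"""
print(wrap_function_body(js, 'greet', "  console.log('start');", "  console.log('end');"))
```

The layout assumes the body already starts and ends with a newline, as in the sample. A one-line body still yields valid JavaScript, just less tidily indented.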

Plain English
python.plainenglish.io › use-python-crawler-to-parse-javascript-object-6ca1ceeb067e
Use Python Crawler to Parse JavaScript Object | by Richard | Python in Plain English
November 7, 2023 - This format is called JavaScript Object. It looks a lot like a Python dictionary, and also looks a lot like JSON. However, if this format is used in Python, whether it is directly used for dictionary parsing or JSON parsing, an error will be reported, as shown in the following figure:
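Here is a small sketch of that error and of a naive workaround. It assumes the only incompatibility is unquoted keys. The workaround quotes bare identifiers that follow { or , and precede a colon, then parses the result as JSON. The regex is illustrative only and will misfire if such a pattern appears inside a string value. A dedicated tool such as chompjs, listed above, handles many more cases:

```python
import json
import re

# A JavaScript object literal: valid JS, but not valid JSON (keys are unquoted).
js_object = '{name: "XYZ", phone: "9999999999", tags: ["a", "b"], active: true}'

try:
    json.loads(js_object)
except json.JSONDecodeError as exc:
    print("json.loads failed:", exc.msg)

# Naive fix (an assumption, not a general solution): wrap bare identifier keys
# that come right after '{' or ',' and right before ':' in double quotes.
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_$][\w$]*)(\s*:)')
as_json = BARE_KEY.sub(r'\1"\2"\3', js_object)

print(json.loads(as_json))
```

JavaScript's true maps to Python's True here because it is also a JSON literal. Single-quoted strings or trailing commas would still need separate handling.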
Javawithus
javawithus.com › en › how-to-parse-an-html-page-with-javascript-in-python-3
How to parse an html page with JavaScript in python 3?
March 15, 2021 - To get static data from html, javascript text, you can use the appropriate parsers, such as BeautifulSoup, slimit. Example: How can I use Beautiful Soup to search for a keyword if this word is in the script tag? To get information from a web page whose elements javascript dynamically generates, you can use a web browser. To manage different browsers from Python, selenium webdriver helps: example showing the GUI.
Top answer

If your format really is just one or more var foo = [JSON array or object literal];, you can just write a dotall regex to extract them, then parse each one as JSON. For example:

>>> import json, re
>>> j = '''var line1=
[["Wed, 12 Jun 2013 01:00:00 +0000",22.4916114807,"2 sold"],
["Fri, 14 Jun 2013 01:00:00 +0000",27.4950008392,"2 sold"],
["Sun, 16 Jun 2013 01:00:00 +0000",19.5499992371,"1 sold"],
["Tue, 18 Jun 2013 01:00:00 +0000",17.25,"1 sold"],
["Sun, 23 Jun 2013 01:00:00 +0000",15.5420341492,"2 sold"],
["Thu, 27 Jun 2013 01:00:00 +0000",8.79045295715,"3 sold"],
["Fri, 28 Jun 2013 01:00:00 +0000",10,"1 sold"]];'''
>>> values = re.findall(r'var.*?=\s*(.*?);\s*$', j, re.DOTALL | re.MULTILINE)
>>> for value in values:
...     print(json.loads(value))
...
[['Wed, 12 Jun 2013 01:00:00 +0000', 22.4916114807, '2 sold'],
 ['Fri, 14 Jun 2013 01:00:00 +0000', 27.4950008392, '2 sold'],
 ['Sun, 16 Jun 2013 01:00:00 +0000', 19.5499992371, '1 sold'],
 ['Tue, 18 Jun 2013 01:00:00 +0000', 17.25, '1 sold'],
 ['Sun, 23 Jun 2013 01:00:00 +0000', 15.5420341492, '2 sold'],
 ['Thu, 27 Jun 2013 01:00:00 +0000', 8.79045295715, '3 sold'],
 ['Fri, 28 Jun 2013 01:00:00 +0000', 10, '1 sold']]

Of course this makes a few assumptions:

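  • A semicolon at the end of the line must be an actual statement separator, not the middle of a string. This should be safe because ordinary JS string literals can't span lines. ES2015 template literals (backtick strings) can, though, so the assumption is weaker for modern code.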
  • The code actually does have semicolons at the end of each statement, even though they're optional in JS. Most JS code has those semicolons, but it obviously isn't guaranteed.
  • The array and object literals really are JSON-compatible. This definitely isn't guaranteed; for example, JS can use single-quoted strings, but JSON can't. But it does work for your example.
  • Your format really is this well-defined. For example, if there might be a statement like var line2 = [[1]] + line1; in the middle of your code, it's going to cause problems.

Note that if the data might contain JavaScript literals that aren't all valid JSON, but are all valid Python literals (which isn't likely, but isn't impossible, either), you can use ast.literal_eval on them instead of json.loads. But I wouldn't do that unless you know this is the case.
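To illustrate the difference, here is a small sketch with a hypothetical JS literal that uses single-quoted strings. json.loads rejects it, while ast.literal_eval accepts it because it is also a valid Python literal. A literal containing JavaScript's true or null fails the other way round:

```python
import ast
import json

# Valid JavaScript, and also a valid Python literal, but not valid JSON.
js_literal = "[['Wed, 12 Jun 2013 01:00:00 +0000', 22.4916114807, '2 sold']]"

try:
    json.loads(js_literal)
except json.JSONDecodeError:
    print("json.loads rejects single-quoted strings")

# literal_eval only evaluates literals (no names or calls), so unlike eval()
# it will not execute arbitrary code found in the page.
value = ast.literal_eval(js_literal)
print(value[0][1])

# JS keywords such as true/null are not Python literals, so this raises ValueError.
try:
    ast.literal_eval("[true, null]")
except ValueError:
    print("literal_eval rejects true/null")
```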

Another answer

Okay, so there are a few ways to do it, but I ended up simply using a regular expression to find everything between line1= and ;

# Read page data as a string
pageData = sock.read()
# Set p as a regular expression
p = re.compile('(?<=line1=)(.*)(?=;)')
# Find all instances of the regular expression in pageData
parsed = p.findall(pageData)
# Evaluate the list as Python code => turn it into a list in Python
newParsed = eval(parsed[0])

Regex is nice when the data is consistently formatted, but is this method better (EDIT: or worse!) than any of the other answers here?

EDIT: I ultimately used the following:

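import json
import re

# Read page data as a string (sock is assumed to be a file-like response,
# e.g. from urlopen(); in Python 3 it returns bytes, and UTF-8 is assumed)
pageData = sock.read().decode('utf-8')
# Set p as a regular expression: the non-greedy (.*?) stops at the first ';'
# and re.DOTALL lets the array span several lines
p = re.compile('(?<=line1=)(.*?)(?=;)', re.DOTALL)
# Find all instances of the regular expression in pageData
parsed = p.findall(pageData)
# Load as JSON instead of using eval to prevent risky execution of unknown code
newParsed = json.loads(parsed[0])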
PyPI
pypi.org › project › calmjs.parse
calmjs.parse · PyPI
As the Calmjs project provides a framework that produces and consumes these module definitions, the ability to have a comprehensive understanding of given JavaScript sources is a given. This goal was originally achieved using slimit, a JavaScript minifier library that also provided a comprehensive parser class that was built using Python Lex-Yacc (i.e.
pip install calmjs.parse
Published   Nov 08, 2025
Version   1.3.4
LinuxQuestions.org
linuxquestions.org › questions › programming-9 › parsing-javascript-to-json-using-python-3-a-4175542790
[SOLVED] Parsing Javascript To JSON Using Python 3
May 16, 2015 - this is a very specific request, and for that i apologise, but i am at a loss for what to do.. for a javascript project i am working on i want to be
npm
npmjs.com › package › dt-python-parser
dt-python-parser - npm
In addition, several auxiliary methods are provided, for example, to filter comments of type # and """ in Python statements. ... Tip: The current Parser is the Javascript language version, if necessary, you can try to compile the Grammar file to other target languages
      » npm install dt-python-parser
    
Published   Apr 07, 2022
Version   0.9.0