python parse html javascript

stackoverflow.com › questions › 27114673 › parsing-javascript-code-in-html-source

I suggest you take a look at BeautifulSoup. It can help you extract JavaScript code from an HTML file (but not parse or run it):

source = """<html>...</html>"""

from bs4 import BeautifulSoup

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(source, "html.parser")
# Text of the first <script> element (raises IndexError if the page has none)
js_code = soup.find_all("script")[0].text

Then you can use a JavaScript interpreter to run the code and get the variables. There are several out there, like this one or this one. Just Google it.

Answer from Victor on Stack Overflow
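When the script only assigns a literal to a variable, a full JavaScript interpreter may not be needed. The following is a minimal sketch, assuming the value is a JSON-compatible literal; the `js_code` sample, the variable name `config`, and the helper `extract_json_var` are made up for illustration and are not part of the answer above.

```python
import json
import re

# Hypothetical script text, standing in for js_code extracted from a <script> tag
js_code = 'var config = {"user": "alice", "items": [1, 2, 3]};'


def extract_json_var(js, name):
    """Return the JSON-compatible literal assigned to `name`, or None if absent."""
    match = re.search(r"\b(?:var|let|const)\s+" + re.escape(name) + r"\s*=\s*", js)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever follows it (here the trailing semicolon)
    value, _end = json.JSONDecoder().raw_decode(js, match.end())
    return value


config = extract_json_var(js_code, "config")
print(config["items"])  # [1, 2, 3]
```

This only works for values that are valid JSON (double-quoted keys, no trailing commas, no function calls); anything else raises `json.JSONDecodeError` and would still need a real interpreter.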

Python

docs.python.org › 3 › library › html.parser.html

html.parser — Simple HTML and XHTML parser

Encountered an end tag : h1 Encountered an end tag : body Encountered an end tag : html ... Feed some text to the parser. It is processed insofar as it consists of complete elements; incomplete data is buffered until more data is fed or close() is called.
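As a rough illustration of the event-driven interface this snippet describes, the sketch below subclasses `HTMLParser` to collect the text of inline script elements. The class name `ScriptCollector` and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text of every inline <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Only keep text that appears between <script> and </script>
        if self.in_script:
            self.scripts[-1] += data


sample = "<html><body><h1>Hi</h1><script>var x = 1;</script></body></html>"
collector = ScriptCollector()
collector.feed(sample)
collector.close()  # flush anything still buffered
print(collector.scripts)  # ['var x = 1;']
```

The parser only hands back the script source as text; it does not execute it.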

Stack Overflow

stackoverflow.com › questions › 27114673 › parsing-javascript-code-in-html-source

python - parsing JavaScript code in HTML source - Stack Overflow

Top answer

1 of 2

2 of 2

-1

I think you need to add the function so the computer can tell whether it is JavaScript or Python. Use this:

<script type="text/javascript">  <!-- or python --></script>

Discussions

python - How to parse html that includes javascript code - Stack Overflow

How does one parse html documents which make heavy use of javascript? I know there are a few libraries in python which can parse static xml/html files and I'm basically looking for a programme or l...

November 7, 2011

How to parse JavaScript code in html source in Python? - Stack Overflow

I am trying to web scrape some data inside a JavaScript tag in a HTML source. The situation: I can get to the appropriate tag. But inside that tag, there is a big stri...

Parsing html from a javascript rendered url with python object - Stack Overflow

I have successfully parsed the data that I want from the first page using some code from the following url: https://impythonist.wordpress.com/2015/01/06/ultimate-guide-for-scraping-javascript-rendered-web-pages

Parsing HTML using Python - Stack Overflow

I'm looking for an HTML Parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects. If I have a document of the form: Heading

Tomassetti

tomassetti.me › home › parsing html: a guide to select the right library

Parsing HTML: a guide to select the right library

September 21, 2017 - They are both quite powerful, but the first will be more familiar to users of JavaScript, while the other is more pythonic.

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')

# it finds all nodes satisfying the regular expression
# and having the matching id
soup.find_all(href=re.compile("elsie"), id='link1')
# [<a class="sister" href="http://example.com/elsie" id="link1">three</a>]

# CSS selectors
soup.select("p > a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

PyPI

pypi.org › project › AdvancedHTMLParser

AdvancedHTMLParser

Codementor

codementor.io › community › html parser — developer tools

HTML Parser — Developer Tools | Codementor

October 14, 2021 - ... <script type='text/javascript' src='js/bootstrap.js'></script> <script type='text/javascript' src='js/custom.js'></script> ... To scan the HTML soup for script tags, we can use the find_all helper:
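A short sketch of what that `find_all` scan might look like, assuming BeautifulSoup 4 is installed. The third (inline) script element and the variable names are added here for illustration and do not come from the Codementor article.

```python
from bs4 import BeautifulSoup

# Sample markup; the two external scripts mirror the snippet above,
# the inline one is an assumption added for contrast
html_doc = """
<html><head>
<script type='text/javascript' src='js/bootstrap.js'></script>
<script type='text/javascript' src='js/custom.js'></script>
<script>console.log("inline");</script>
</head><body></body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")
scripts = soup.find_all("script")

# External scripts carry a src attribute; inline scripts carry their code as text
external = [tag["src"] for tag in scripts if tag.has_attr("src")]
inline = [tag.string for tag in scripts if not tag.has_attr("src")]

print(external)  # ['js/bootstrap.js', 'js/custom.js']
print(inline)    # ['console.log("inline");']
```

External scripts would still have to be downloaded separately before their contents can be inspected.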

Theautomatic

theautomatic.net › home › scraping data from a javascript webpage with python

Scraping data from a JavaScript webpage with Python - Open Source Automation

May 4, 2020 - Scraping data from a JavaScript-rendered website with Python and requests_html. requests_html is an alternative to Selenium and PhantomJS.

Stack Overflow

stackoverflow.com › questions › 7064109 › how-to-parse-html-that-includes-javascript-code

python - How to parse html that includes javascript code - Stack Overflow

Top answer

1 of 3

You can use Selenium with python as detailed here

Example:

import os
import xmlrpc.client  # Python 3 name for Python 2's xmlrpclib

# Make an object to represent the XML-RPC server.
server_url = "http://localhost:8080/selenium-driver/RPC2"
app = xmlrpc.client.ServerProxy(server_url)

# Bump timeout a little higher than the default 5 seconds
app.setTimeout(15)

# Launch Firefox through the batch file (Windows "start" command)
os.system('start run_firefox.bat')

# Each call is forwarded to the Selenium server; print its reply
print(app.open('http://localhost:8080/AUT/000000A/http/www.amazon.com/'))
print(app.verifyTitle('Amazon.com: Welcome'))
print(app.verifySelected('url', 'All Products'))
print(app.select('url', 'Books'))
print(app.verifySelected('url', 'Books'))
print(app.verifyValue('field-keywords', ''))
print(app.type('field-keywords', 'Python Cookbook'))
print(app.clickAndWait('Go'))
print(app.verifyTitle('Amazon.com: Books Search Results: Python Cookbook'))
print(app.verifyTextPresent('Python Cookbook', ''))
print(app.verifyTextPresent('Alex Martellibot, David Ascher', ''))
print(app.testComplete())

2 of 3

From Mozilla Gecko FAQ:

Q. Can you invoke the Gecko engine from a Unix shell script? Could you send it HTML and get back a web page that might be sent to the printer?

A. Not really supported; you can probably get something close to what you want by writing your own application using Gecko's embedding APIs, though. Note that it's currently not possible to print without a widget on the screen to render to.

Embedding Gecko in a program that outputs what you want may be way too heavy, but at least your output will be as good as it gets.

Stack Overflow

stackoverflow.com › questions › 55984356 › how-to-parse-javascript-code-in-html-source-in-python

How to parse JavaScript code in html source in Python? - Stack Overflow

Top answer

1 of 1

You could just run a regex over the raw response text, without locating the script tag first:

import re
import requests

r = requests.get('https://www.khanacademy.org/computing/computer-programming/programming/drawing-basics/pt/making-drawings-with-code')
# Capture everything after contentId":" up to the next double quote
p = re.compile(r'contentId":"((?:(?!").)*)')
# Take the first match (raises IndexError if the page has none)
i = p.findall(r.text)[0]
print(i)
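To see what the pattern does without making a network request, here is a small offline sketch. The `page_text` fragment and the `x1a2b3c` value are made up, and the negated character class is offered as an alternative spelling, not as the answer's own code.

```python
import re

# Made-up fragment standing in for r.text
page_text = '<script>{"title":"Intro","contentId":"x1a2b3c","kind":"Article"}</script>'

# Pattern from the answer: consume characters one by one
# as long as the next character is not a double quote
tempered = re.compile(r'contentId":"((?:(?!").)*)')

# Shorter spelling that behaves the same for values without newlines
negated = re.compile(r'contentId":"([^"]*)')

# search() returns None when nothing matches, which avoids the
# IndexError that findall(...)[0] raises on an empty result
match = tempered.search(page_text)
content_id = match.group(1) if match else None
print(content_id)  # x1a2b3c
```

Both patterns stop at the first closing quote, so a value containing an escaped quote would be cut short.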

Tchut-Tchut Blog

beenje.github.io › blog › posts › parsing-javascript-rendered-pages-in-python-with-pyppeteer

Parsing JavaScript rendered pages in Python with pyppeteer | Tchut-Tchut Blog

June 2, 2018 - Pyppeteer allows you to do the same from Python. So there is no magic. You just let Chromium load and render the page with the latest JavaScript and browser features. This is super powerful. The first time you run pyppeteer, it even downloads a recent version of Chromium. So no initial setup is required. Pyppeteer is based on asyncio. This is hidden by requests-html that gives you a simple interface but of course less flexibility.

JanBask Training

janbasktraining.com › community › python-python › parse-an-html-string-with-js

Parse an HTML string with JS | JanBask Training Community

November 3, 2025 - Instead of treating that string as plain text, JavaScript provides native tools to convert it into DOM elements so you can interact with it just like other HTML elements on the page.

Stack Overflow

stackoverflow.com › questions › 47495643 › parsing-html-from-a-javascript-rendered-url-with-python-object

Parsing html from a javascript rendered url with python object - Stack Overflow

Top answer

1 of 1

How about using Selenium and PhantomJS instead of PyQt?
You can install Selenium by running "pip install selenium". On a Mac, you can install PhantomJS by running "brew install phantomjs"; on Windows use choco instead of brew, and on Ubuntu use apt-get.

from selenium import webdriver
from bs4 import BeautifulSoup

base_url = "https://uk.reuters.com"
first_page = "/business/markets/index/.FTSE?sortBy=&sortDir=&pn=1"

# webdriver.PhantomJS exists only in Selenium 3.x; it was removed in Selenium 4
browser = webdriver.PhantomJS()

try:
    next_page = first_page
    while next_page:
        # PARSE THE HTML (after the browser has run the page's JavaScript)
        browser.get(base_url + next_page)
        soup = BeautifulSoup(browser.page_source, "lxml")
        row_data = soup.find("div", attrs={"class": "column1 gridPanel grid8"})

        # PARSE ALL ROW DATA
        stripe_rows = row_data.find_all("tr", attrs={"class": "stripe"})
        non_stripe_rows = row_data.find_all("tr", attrs={"class": ""})
        print(len(stripe_rows), len(non_stripe_rows))

        # GO TO THE NEXT PAGE (stop when there is no "next" button)
        next_button = soup.find("li", attrs={"class": "next"})
        next_page = next_button.find("a")["href"] if next_button else None
finally:
    # DONT FORGET THIS!!
    browser.quit()

I know the code above is not efficient (it feels too slow to me), but I think it will give you the results you want. Also, if the web page you want to scrape does not use JavaScript, neither PhantomJS nor Selenium is necessary; you can use the requests module. I used PhantomJS and Selenium in this answer to show the contrast with PyQt.

Medium

medium.com › @datajournal › how-to-parse-html-with-python-94495c11bc96

How to Parse HTML in Python: Top Libraries Tutorial | Medium

October 14, 2024 - You can use BeautifulSoup for easy navigation of the HTML structure and combine it with requests to fetch data from dynamic websites. If you need to interact with websites that use JavaScript to load content, you might need to use tools like Selenium or Playwright to first render the page and then parse the HTML.

ScrapingBee

scrapingbee.com › blog › python-html-parsers

How to parse HTML in Python: A step-by-step guide for beginners | ScrapingBee

January 16, 2026 - If you've ever tried to pull data ... HTML in Python. The web runs on HTML, and turning messy markup into clean, structured data is one of those rites of passage every dev goes through sooner or later. This guide walks you through the whole thing, step by step: fetching pages, parsing them properly, and doing it in a way that won't make websites hate you. We'll start simple, then jump into a real-world setup using ScrapingBee, which quietly handles the messy stuff like JavaScript rendering, ...

DEV Community

dev.to › sm0ke › html-parser-extact-html-information-with-ease-308m

HTML Parser - Extract HTML information with ease - DEV Community

November 30, 2024 -

from bs4 import BeautifulSoup as bs

# Load the HTML content
html_file = open('index.html', 'r')
html_content = html_file.read()
html_file.close() # clean up

# Initialize the BS object
soup = bs(html_content,'html.parser')

# At this point, we can interact with the HTML
# elements stored in memory using all helpers offered by BS library

At this point, we have the DOM tree loaded in the BeautifulSoup object. Let's scan the DOM tree for Javascript files, the script nodes:

Requests

requests.readthedocs.io › projects › requests-html › en › latest

requests-HTML v0.3.4 documentation

>>> r = session.get('http://python-requests.org/')
>>> r.html.find('a', containing='kenneth')
[<Element 'a' href='http://kennethreitz.com/pages/open-projects.html'>, <Element 'a' href='http://kennethreitz.org/'>, <Element 'a' href='https://twitter.com/kennethreitz' class=('twitter-follow-button',) data-show-count='false'>, <Element 'a' class=('reference', 'internal') href='dev/contributing/#kenneth-reitz-s-code-style'>]

Let’s grab some text that’s rendered by JavaScript:

Kennethreitz

requests-html.kennethreitz.org

Requests-HTML: HTML Parsing for Humans (writing Python 3)! — requests-HTML v0.3.4 documentation

This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. ... Full JavaScript support! CSS Selectors (a.k.a jQuery-style, thanks to PyQuery). XPath Selectors, for the faint at heart. Mocked user-agent (like a real web browser). Automatic following of redirects. ... The Requests experience you know and love, with magical parsing abilities. ... Only Python 3.6 is supported.

ZenRows

zenrows.com › homepage › tutorial › how to parse html with python (using the top 6 parsers)

How to Parse HTML With Python (Using The Top 6 Parsers) - ZenRows

October 7, 2024 - Though a reliable tool, it has limitations when dealing with unstructured HTML and does not support JavaScript-rendered content. Its smaller community and infrequent updates can also limit its usefulness for complex scraping projects. No external dependencies (built into Python's standard library). Lightweight and fast for basic tasks. Good control over the parsing process through an event-driven method.

Python Programming

pythonprogramming.net › javascript-dynamic-scraping-parsing-beautiful-soup-tutorial

Scraping Dynamic Javascript Text

To simulate this, I have added the following code to the parsememcparseface page:

<p>Javascript (dynamic data) test:</p>
<p class='jstest' id='yesnojs'>y u bad tho?</p>
<script>
document.getElementById('yesnojs').innerHTML = 'Look at you shinin!';
</script>

The code basically takes regular paragraph tags, with the class of jstest, and initially returns the text y u bad tho?. After this, however, there is some javascript defined that will subsequently update that jstest paragraph data to be Look at you shinin!.
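The point of that test markup can be checked with a static parser, which reads the HTML but never runs the script. This is a minimal sketch using only the standard library; the class name `TextById` is made up, and it assumes the target element contains no nested tags.

```python
from html.parser import HTMLParser

page = """<p>Javascript (dynamic data) test:</p>
<p class='jstest' id='yesnojs'>y u bad tho?</p>
<script>
document.getElementById('yesnojs').innerHTML = 'Look at you shinin!';
</script>"""


class TextById(HTMLParser):
    """Record the text inside the element carrying a given id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.capturing = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.wanted_id:
            self.capturing = True

    def handle_endtag(self, tag):
        # Simplification: stop at the first closing tag (flat elements only)
        self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.text += data


parser = TextById("yesnojs")
parser.feed(page)
parser.close()
print(parser.text)  # y u bad tho?
```

The script's replacement text never shows up, because nothing here executes JavaScript; getting the updated text requires a tool that renders the page in a browser engine.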

Roborabbit

roborabbit.com › blog › top-5-python-html-parser

Top 5 Python HTML Parsers

requests-html is a Python library that intends to make parsing HTML as simple and intuitive as possible. It is built on top of requests, extending the HTTP-making library with HTML parsing abilities.

Stack Overflow

stackoverflow.com › questions › 11709079 › parsing-html-using-python

Parsing HTML using Python - Stack Overflow

Top answer

1 of 8

291

So that I can ask it to get me the content/text in the div tag with class='container' contained within the body tag, Or something similar.

from bs4 import BeautifulSoup

html = """<html>...</html>"""  # the HTML code you've written above
parsed_html = BeautifulSoup(html, "html.parser")
# Text of the first div with class "container" inside <body>
print(parsed_html.body.find('div', attrs={'class':'container'}).text)

You don't need performance descriptions I guess - just read how BeautifulSoup works. Look at its official documentation.

2 of 8

114

I guess what you're looking for is pyquery:

pyquery: a jquery-like library for python.

An example of what you want may be like:

from pyquery import PyQuery

html = """<html>...</html>"""  # your HTML code
pq = PyQuery(html)
tag = pq('div#id')  # or: tag = pq('div.class')
print(tag.text())

And it uses the same selectors as Firefox's or Chrome's inspect element. For example:

The inspected element selector is 'div#mw-head.noprint'. So in pyquery, you just need to pass this selector:

pq('div#mw-head.noprint')