I suggest you take a look at the BeautifulSoup - it can help you extract JavaScript code from an HTML file (but not parse/run it):
source = """<html>...</html>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(source)
js_code = soup.find_all("script")[0].text
Then you can use some JavaScript interpreter to run the code and get the variables - there are some out there like this one or this one. Just Google it.
Answer from Victor on Stack OverflowI suggest you take a look at the BeautifulSoup - it can help you extract JavaScript code from an HTML file (but not parse/run it):
source = """<html>...</html>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(source)
js_code = soup.find_all("script")[0].text
Then you can use some JavaScript interpreter to run the code and get the variables - there are some out there like this one or this one. Just Google it.
I think you need to add the fuction so the computer can read if it is javascript and python, use this:
script type="text/javascript"> <!-------or python----></script>
python - How to parse html that includes javascript code - Stack Overflow
How to parse JavaScript code in html source in Python? - Stack Overflow
Parsing html from a javascript rendered url with python object - Stack Overflow
Parsing HTML using Python - Stack Overflow
You can use Selenium with python as detailed here
Example:
import xmlrpclib
# Make an object to represent the XML-RPC server.
server_url = "http://localhost:8080/selenium-driver/RPC2"
app = xmlrpclib.ServerProxy(server_url)
# Bump timeout a little higher than the default 5 seconds
app.setTimeout(15)
import os
os.system('start run_firefox.bat')
print app.open('http://localhost:8080/AUT/000000A/http/www.amazon.com/')
print app.verifyTitle('Amazon.com: Welcome')
print app.verifySelected('url', 'All Products')
print app.select('url', 'Books')
print app.verifySelected('url', 'Books')
print app.verifyValue('field-keywords', '')
print app.type('field-keywords', 'Python Cookbook')
print app.clickAndWait('Go')
print app.verifyTitle('Amazon.com: Books Search Results: Python Cookbook')
print app.verifyTextPresent('Python Cookbook', '')
print app.verifyTextPresent('Alex Martellibot, David Ascher', '')
print app.testComplete()
From Mozilla Gecko FAQ:
Q. Can you invoke the Gecko engine from a Unix shell script? Could you send it HTML and get back a web page that might be sent to the printer?
A. Not really supported; you can probably get something close to what you want by writing your own application using Gecko's embedding APIs, though. Note that it's currently not possible to print without a widget on the screen to render to.
Embedding Gecko in a program that outputs what you want may be way too heavy, but at least your output will be as good as it gets.
So that I can ask it to get me the content/text in the div tag with class='container' contained within the body tag, Or something similar.
try:
from BeautifulSoup import BeautifulSoup
except ImportError:
from bs4 import BeautifulSoup
html = #the HTML code you've written above
parsed_html = BeautifulSoup(html)
print(parsed_html.body.find('div', attrs={'class':'container'}).text)
You don't need performance descriptions I guess - just read how BeautifulSoup works. Look at its official documentation.
I guess what you're looking for is pyquery:
pyquery: a jquery-like library for python.
An example of what you want may be like:
from pyquery import PyQuery
html = # Your HTML CODE
pq = PyQuery(html)
tag = pq('div#id') # or tag = pq('div.class')
print tag.text()
And it uses the same selectors as Firefox's or Chrome's inspect element. For example:
The inspected element selector is 'div#mw-head.noprint'. So in pyquery, you just need to pass this selector:
pq('div#mw-head.noprint')
