python parse html table example

stackoverflow.com › questions › 6325216 › parse-html-table-to-python-list

You should use some HTML parsing library like lxml:

from lxml import etree
s = """<table>
  <tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>d</td><td>e</td><td>f</td></tr>
  <tr><td>g</td><td>h</td><td>i</td></tr>
</table>
"""
table = etree.HTML(s).find("body/table")
rows = iter(table)
headers = [col.text for col in next(rows)]
for row in rows:
    values = [col.text for col in row]
    print(dict(zip(headers, values)))

prints (on Python 3.7+, where dicts keep insertion order)

{'Event': 'a', 'Start Date': 'b', 'End Date': 'c'}
{'Event': 'd', 'Start Date': 'e', 'End Date': 'f'}
{'Event': 'g', 'Start Date': 'h', 'End Date': 'i'}

Answer from Sven Marnach on Stack Overflow

Stack Overflow

Parse HTML table to Python list? - Stack Overflow

Top answer

1 of 4

2 of 4

Hands down the easiest way to parse an HTML table is to use pandas.read_html(), which accepts both URLs and HTML content.

import pandas as pd
url = r'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tables = pd.read_html(url) # Returns list of all tables on page
sp500_table = tables[0] # Select table of interest

As of pandas version 1.5.0, read_html() can preserve hyperlinks with the extract_links argument. Cells in the selected section of the table are then returned as (text, link) tuples.
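A minimal sketch of the extract_links option on an in-memory table, assuming pandas 1.5.0 or later and a parser backend such as lxml are installed. The sample markup and the example.com links are made up for illustration. Wrapping the markup in StringIO is a precaution, since recent pandas releases warn when given a literal HTML string.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table; the links are placeholders.
html = """<table>
  <tr><th>Event</th><th>Page</th></tr>
  <tr><td>a</td><td><a href="https://example.com/a">details</a></td></tr>
  <tr><td>b</td><td><a href="https://example.com/b">details</a></td></tr>
</table>"""

# extract_links="body" asks for links in body cells only, so the
# column names stay plain strings.
tables = pd.read_html(StringIO(html), extract_links="body")
df = tables[0]

# Every body cell is a (text, link) tuple; link is None when the
# cell has no <a> tag. Split the tuples into plain values.
event_text = df["Event"].map(lambda cell: cell[0])
page_links = df["Page"].map(lambda cell: cell[1])

print(list(event_text))
print(list(page_links))
```

Splitting the tuples right after loading keeps the rest of the code working with ordinary string columns.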

Python

docs.python.org › 3 › library › html.parser.html

html.parser — Simple HTML and XHTML parser

Encountered a start tag: html
Encountered a start tag: head
Encountered a start tag: title
Encountered some data : Test
Encountered an end tag : title
Encountered an end tag : head
Encountered a start tag: body
Encountered a start tag: h1
...
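The same start-tag, data, and end-tag callbacks can be used to pull a table into Python lists with no third-party packages. The sketch below is one possible approach, not code from the Python docs: the TableParser class and its attributes are hypothetical names, and it assumes a simple table without nested tables, rowspan, or colspan.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


html = """<table>
  <tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>d</td><td>e</td><td>f</td></tr>
</table>"""

parser = TableParser()
parser.feed(html)
parser.close()

# Treat the first row as the header and zip it with each later row.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

For messy real-world pages, a dedicated library such as lxml or Beautiful Soup is usually more forgiving than a hand-written event handler.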

Discussions

Any way to parse HTML tables?

You can probably find a JavaScript routine that converts HTML table data to JSON, then run that JavaScript in a URL action. I saw this ability to run JavaScript here and tried it out yesterday: https://www.reddit.com/r/shortcuts/comments/9hpplv/is_it_possible_to_read_the_contents_of_an_xml_file/?st=JNYO7T87&sh=bf71ea97 I also just saw that Pretty Print does this: https://www.reddit.com/r/shortcuts/comments/9mk9br/pretty_print_dictionary/?st=JNYOAYQW&sh=0a9cdf3d

r/shortcuts

November 1, 2018

How to scrap a html table with bs4?

See how to write CSV files: https://realpython.com/python-csv/ Although I would loop over each tr and, inside that loop, over each td:

for tr in soup.find_all("tr"):
    for td in tr.find_all("td"):
        print(td.text)  # handle each cell's text here

You might also want to consider pandas here: https://www.geeksforgeeks.org/convert-html-table-into-csv-file-in-python/

r/learnpython

October 23, 2020
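The tr/td loop described in that thread can be combined with the csv module roughly as follows. This is a sketch under assumptions: it needs the beautifulsoup4 package, the sample markup is invented, and an in-memory buffer stands in for a real output file so the result can be inspected directly.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample table.
html = """<table>
  <tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>d</td><td>e</td><td>f</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# Stand-in for open("table.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)

for tr in soup.find_all("tr"):
    # Include <th> cells so the header row is written too.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:  # skip rows that contain no cells
        writer.writerow(cells)

csv_text = buffer.getvalue()
print(csv_text)
```

Letting csv.writer build each line means cells that contain commas or quotes are escaped correctly, which joining strings by hand would not do.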

Guide to Scrape HTML Table Using Python

r/Python

October 29, 2022

How to read a table in beautiful soup, and parse the elements

tables = soup.findAll('table')[0].findAll('tr')

findAll('table') gets you a list of tables, and [0] indexes the first item in that list, so this line returns the rows of the first table. You could loop through each table:

for table in soup.findAll('table'):
    rows = table.findAll('tr')

or use pandas, which has a read_html function. You could also try:

tables = pandas.read_html(page.content)

It returns a list of DataFrames, one per table found.

r/learnpython

January 5, 2022

Videos

youtube.com

How to Extract Tables from HTML and Webpages using Python