You should use an HTML parsing library such as lxml:
from lxml import etree

s = """<table>
<tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
<tr><td>a</td><td>b</td><td>c</td></tr>
<tr><td>d</td><td>e</td><td>f</td></tr>
<tr><td>g</td><td>h</td><td>i</td></tr>
</table>
"""
# etree.HTML() wraps the fragment in <html><body>, so the table is at body/table
table = etree.HTML(s).find("body/table")
rows = iter(table)
# The first row holds the <th> header cells
headers = [col.text for col in next(rows)]
for row in rows:
    values = [col.text for col in row]
    print(dict(zip(headers, values)))
This prints (on Python 3.7+, where dicts keep insertion order):
{'Event': 'a', 'Start Date': 'b', 'End Date': 'c'}
{'Event': 'd', 'Start Date': 'e', 'End Date': 'f'}
{'Event': 'g', 'Start Date': 'h', 'End Date': 'i'}
Answer from Sven Marnach on Stack Overflow (top answer, 1 of 4; score 90)
Answer 2 of 4 (score 79)
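Hands down, the easiest way to parse an HTML table is pandas.read_html(), which accepts URLs, file-like objects, and HTML strings (recent pandas versions expect a literal HTML string to be wrapped in io.StringIO). It needs a parser backend installed: lxml, or BeautifulSoup with html5lib.
import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tables = pd.read_html(url)  # Returns a list of DataFrames, one per table on the page
sp500_table = tables[0]     # Select the table of interest
As of pandas 1.5.0, read_html() can preserve hyperlinks via the extract_links argument ("header", "footer", "body", or "all"). Cells in the selected sections are then returned as (text, href) tuples.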
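As a rough illustration of extract_links, the sketch below parses a small inline table instead of a live page. The table contents and the example.com URL are made up for the demonstration, and pandas 1.5.0 or later with a parser backend (such as lxml) is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table; the link target is a placeholder.
html = """<table>
<tr><th>Site</th></tr>
<tr><td><a href="https://example.com">Example</a></td></tr>
</table>"""

# extract_links="body" turns body cells into (text, href) tuples;
# header cells are left as plain strings.
tables = pd.read_html(StringIO(html), extract_links="body")
cell = tables[0].iloc[0, 0]
print(cell)
```

Cells in the selected section that contain no link should come back with None as the second tuple element, so downstream code needs to handle both cases.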
Any way to parse HTML tables?
You can probably find a JavaScript routine that converts HTML table data to JSON, then run that JavaScript in a URL action. I saw this ability to run JavaScript here and tried it out yesterday: https://www.reddit.com/r/shortcuts/comments/9hpplv/is_it_possible_to_read_the_contents_of_an_xml_file/?st=JNYO7T87&sh=bf71ea97 I have since seen that Pretty Print also does this: https://www.reddit.com/r/shortcuts/comments/9mk9br/pretty_print_dictionary/?st=JNYOAYQW&sh=0a9cdf3d
How to scrap a html table with bs4?
See how to write CSV files: https://realpython.com/python-csv/ That said, I would loop over each tr and, inside that loop, over each td:
for tr in soup.find_all("tr"):
    for td in tr.find_all("td"):
        print(td.text)  # replace with your own handling of td.text
You might also want to consider pandas here: https://www.geeksforgeeks.org/convert-html-table-into-csv-file-in-python/
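The nested tr/td loop suggested above could be combined with the csv module roughly as follows. This is only a sketch: the sample HTML is invented, th cells are included so the header row survives, and the output goes to an in-memory buffer, not to a file.

```python
import csv
from io import StringIO

from bs4 import BeautifulSoup

# Hypothetical sample input standing in for a downloaded page.
html = """<table>
<tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
<tr><td>a</td><td>b</td><td>c</td></tr>
<tr><td>d</td><td>e</td><td>f</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
buffer = StringIO()
writer = csv.writer(buffer, lineterminator="\n")

for tr in soup.find_all("tr"):
    # Collect header and data cells of this row, in document order.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:  # skip rows that contain no cells
        writer.writerow(cells)

csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the buffer can be swapped for open("out.csv", "w", newline="") used as a context manager.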
How to read a table in beautiful soup, and parse the elements
tables = soup.findAll('table')[0].findAll('tr')
findAll('table') gets you a list of tables, and [0] indexes the first item in that list. You could loop through each table:
for table in soup.findAll('table'):
Or use pandas, which has a read_html function. You could also try:
tables = pandas.read_html(page.content)
It returns a list of DataFrames, one per table found.
Which Python library is best for beginners to parse HTML?
BeautifulSoup is the best starting point. It's simple, well-documented, and quite forgiving of broken HTML. Pair it with the lxml parser for a balance of speed and flexibility.
scrapingbee.com
scrapingbee.com › blog › python-html-parsers
How to parse HTML in Python: A step-by-step guide for beginners ...
What's the fastest Python library for parsing HTML?
lxml is generally among the fastest options, since it wraps the C libraries libxml2 and libxslt, and it supports XPath for precise queries. It's well suited to large-scale or performance-sensitive scraping projects.
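A minimal sketch of the XPath style mentioned above, assuming lxml is installed. The sample table is invented, and the relative XPath expressions are one reasonable choice, not the only one.

```python
from lxml import html

# Hypothetical sample table.
source = """<table>
<tr><th>Event</th><th>Start Date</th></tr>
<tr><td>a</td><td>b</td></tr>
<tr><td>d</td><td>e</td></tr>
</table>"""

doc = html.fromstring(source)
rows = doc.xpath(".//tr")  # every row below the parsed element

# The first row supplies the column names; the rest are data rows.
headers = [th.text_content().strip() for th in rows[0].xpath("./th")]
records = [
    dict(zip(headers, (td.text_content().strip() for td in row.xpath("./td"))))
    for row in rows[1:]
]
print(records)
```

text_content() is used in place of .text so that cells containing nested markup (links, bold text) still yield their full text.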
Can I parse XML or JSON with the same tools?
BeautifulSoup and lxml can handle XML, but JSON requires Python's built-in json module. Many sites provide JSON APIs, which are often easier to use than scraping HTML.
Scott Rome
srome.github.io › Parsing-HTML-Tables-in-Python-with-BeautifulSoup-and-pandas
Parsing HTML Tables in Python with BeautifulSoup and pandas
May 30, 2016 - In the next bit of code, we define a website that is simply the HTML for a table. We load it into BeautifulSoup and parse it, returning a pandas DataFrame of the contents. As you can see, we grab all the tr elements from the table, then grab the td elements one at a time. We call the “get_text()” method on each td element (called a column in each iteration) and put the result into our Python object representing a table (it will eventually be a pandas DataFrame).
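The approach described above (collect the tr elements, read each td with get_text(), then build a DataFrame) might look roughly like this. It is a sketch under assumptions, not the blog post's actual code: the sample HTML and variable names are invented, and the header is taken from the th cells.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical "website" that is just the HTML for one table.
html = """<table>
<tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
<tr><td>a</td><td>b</td><td>c</td></tr>
<tr><td>d</td><td>e</td><td>f</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

header = [th.get_text(strip=True) for th in table.find_all("th")]
rows = []
for tr in table.find_all("tr"):
    columns = tr.find_all("td")
    if not columns:
        continue  # the header row has only th cells
    rows.append([column.get_text(strip=True) for column in columns])

df = pd.DataFrame(rows, columns=header)
print(df)
```

All values arrive as strings; pd.to_numeric or pd.to_datetime can be applied per column afterwards if typed data is needed.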
PyPI
pypi.org › project › html-table-parser-python3
html-table-parser-python3 · PyPI
pip install html-table-parser-python3
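For comparison, tables can also be extracted with only the standard library's html.parser module. The sketch below is independent of the package listed above and does not reproduce its API; the TableParser class is a hypothetical name, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect each table as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


parser = TableParser()
parser.feed("<table><tr><th>Event</th><th>Start Date</th></tr>"
            "<tr><td>a</td><td>b</td></tr></table>")
print(parser.tables)
```

This relies on explicit closing tags; for messy real-world HTML, a forgiving parser such as BeautifulSoup or lxml is usually the safer choice.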
TutorialsPoint
tutorialspoint.com › article › how-to-parse-html-pages-to-fetch-html-tables-with-python
How to Parse HTML pages to fetch HTML tables with Python?
November 9, 2020 -
... Status code: {response.status_code}")
        return []

    # Parse HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all tables
    tables = soup.find_all('table')
    extracted_tables = []
    for i, table in enumerate(tables):
        try:
            # Convert to DataFrame
            df = pd.read_html(str(table))[0]
            extracted_tables.append({
                'table_index': i + 1,
                'dataframe': df,
                'shape': df.shape
            })
        except Exception as e:
            print(f"Error processing table {i + 1}: {e}")
    return extracted_tables

# Example usage
url = "https://www.tutorialspoint.com/python/python_basic_operators.htm"
all_tables = extract_tables_from_url(url)
print(f"Successfully extracted {len(all_tables)} tables")
for table_info in all_tables[:2]:  # Show first 2 tables
    print(f"\nTable {table_info['table_index']} - Shape: {table_info['shape']}")
    print(table_info['dataframe'].head(3))
Tchut-Tchut Blog
beenje.github.io › blog › posts › parsing-html-tables-in-python-with-pandas
Parsing HTML Tables in Python with pandas | Tchut-Tchut Blog
March 27, 2018 -
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-17-7e6b50c9f1f3> in <module>()
----> 1 pd.read_html('https://httpbin.org/basic-auth/myuser/mypasswd')

~/miniconda3/envs/jupyter/lib/python3.6/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)
    913         thousands=thousands, attrs=attrs, encoding=encoding,
    914         decimal=decimal, converters=converters, na_values=na_values,
-->
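The traceback above suggests read_html() was pointed directly at a URL protected by HTTP basic auth. One common workaround, sketched here and not necessarily the one the post goes on to use, is to fetch the page yourself with requests and pass the HTML text to pandas. The function names below are hypothetical.

```python
from io import StringIO

import pandas as pd
import requests


def tables_from_html(html_text):
    """Parse every <table> in an HTML string into a list of DataFrames."""
    # StringIO avoids the literal-string deprecation in newer pandas versions.
    return pd.read_html(StringIO(html_text))


def read_tables_with_auth(url, username, password):
    """Fetch a page behind HTTP basic auth, then hand the HTML to pandas."""
    response = requests.get(url, auth=(username, password), timeout=30)
    response.raise_for_status()  # surface 401/403 here, not inside pandas
    return tables_from_html(response.text)


# Offline check of the parsing half with a tiny inline table.
demo = tables_from_html(
    "<table><tr><th>A</th><th>B</th></tr><tr><td>1</td><td>2</td></tr></table>"
)
print(demo[0])
```

Splitting the fetch from the parse also makes it easy to add headers, cookies, or a session later without touching the pandas side.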
ScrapingBee
scrapingbee.com › blog › python-html-parsers
How to parse HTML in Python: A step-by-step guide for beginners | ScrapingBee
January 16, 2026 - Want to parse HTML in Python right away? Here's the fastest working setup: one version using plain old requests for static pages, and another using ScrapingBee for the real world, where sites throw JavaScript and anti-bot nonsense at you. ...
import requests
from bs4 import BeautifulSoup

# Fetch the page directly
url = "https://example.com"
html = requests.get(url).text

# Parse the HTML with BeautifulSoup + lxml
soup = BeautifulSoup(html, "lxml")

# Extract the title and all links
print(soup.title.get_text())
for link in soup.select("a[href]"):
    print(link["href"])
Zyte
zyte.com › home › blog › how to extract data from html table
How to extract data from an HTML table - Zyte #1 Web Scraping Service
September 13, 2022 - One such method is available in the popular Python library pandas: read_html(). It accepts numerous arguments that let you customize how the table is parsed. You can call it with a URL, a file, or an actual HTML string. For example, you might do it like this:
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_html.html
pandas.read_html — pandas 3.0.3 documentation - PyData |
(From the description of the match parameter:) The default value will return all tables contained on a page. This value is converted to a regular expression so that there is consistent behavior between Beautiful Soup and lxml.
flavor: {"lxml", "html5lib", "bs4"} or list-like, optional. The parsing engine (or list of parsing engines) to use.
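To make the match behavior described above concrete, here is a small sketch with two invented tables. It assumes pandas with a parser backend installed and leaves flavor at its default.

```python
from io import StringIO

import pandas as pd

# Two hypothetical tables on one "page".
html = """
<table><tr><th>City</th><th>Population</th></tr>
<tr><td>Oslo</td><td>700000</td></tr></table>
<table><tr><th>Ticker</th><th>Price</th></tr>
<tr><td>ABC</td><td>12.5</td></tr></table>
"""

# Default match: every table on the page is returned.
all_tables = pd.read_html(StringIO(html))

# match is treated as a regular expression; only tables whose text
# matches it are returned.
price_tables = pd.read_html(StringIO(html), match="Ticker")

print(len(all_tables), len(price_tables))
```

If no table matches, read_html() raises ValueError, so code that filters on match may want to catch that.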
ScraperAPI
scraperapi.com › home › blog › how to scrape html tables using python
How To Scrape HTML Tables Using Python
March 31, 2026 - Let’s enter the table’s URL (https://datatables.net/examples/styling/stripe.html) in our browser and inspect the page to see what’s happening under the hood. This is why this is a great page to practice scraping tabular data with Python. There’s a clear <table> tag pair opening and closing the table and all the relevant data is inside the <tbody> tag.
DEV Community
dev.to › chrisgreening › effortlessly-scrape-html-tables-into-python-using-pdreadhtml-559p
Effortlessly scrape HTML tables into Python using pd.read_html! - DEV Community
August 24, 2023 - Here's a step-by-step guide to using this function to get tables from a webpage right into our Python environments: Import pandas: First let's import pandas into our script: ... Specify the source and call pd.read_html: Determine where pd.read_html should look for the HTML content. It could be a URL or a string containing HTML code. For this example let's pull some tables off of the Python Wiki page:
Substack
substack.com › home › post › p-151645890
How-To Parse HTML Tables in Python Using Pandas
November 15, 2024 - This method relies on the lxml, BeautifulSoup, and html5lib libraries to parse the HTML page, so make sure to install them if you haven't done so already. ... Next, identify a website you want to extract the data from. Let's use the List of video games featuring Mario Wikipedia entry as an example.
import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_video_games_featuring_Mario'
tables = pd.read_html(url)
print(len(tables))
# CONTINUE YOUR ANALYSIS HERE
GitHub
github.com › fmilthaler › HTMLParser
GitHub - fmilthaler/HTMLParser: Python class to scrap and parse a webpage (using requests, BeautifulSoup4), mainly for converting tables to pandas.DataFrame · GitHub
Here we scrape a page from Wikipedia, parse it for tables, and convert the first table found into a pandas.DataFrame.
from htmlparser import HTMLParser
import pandas
# Here we scrape a page from Wikipedia, parse it for tables, and convert the ...
Author fmilthaler