You should use some HTML parsing library like lxml:

from lxml import etree
s = """<table>
  <tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>d</td><td>e</td><td>f</td></tr>
  <tr><td>g</td><td>h</td><td>i</td></tr>
</table>
"""
table = etree.HTML(s).find("body/table")  # etree.HTML wraps the fragment in <html><body>
rows = iter(table)
headers = [col.text for col in next(rows)]  # the first row holds the column names
for row in rows:
    values = [col.text for col in row]
    print(dict(zip(headers, values)))

prints

{'Event': 'a', 'Start Date': 'b', 'End Date': 'c'}
{'Event': 'd', 'Start Date': 'e', 'End Date': 'f'}
{'Event': 'g', 'Start Date': 'h', 'End Date': 'i'}
Answer from Sven Marnach on Stack Overflow
Discussions

python - Fastest, easiest, and best way to parse an HTML table? - Stack Overflow
All of these are reasonably tolerant of poorly formed HTML. ... I suggest loading the document with an XML parser like DOMDocument::loadHTMLFile that is bundled with PHP and then use XPath to grep the data you need. More on stackoverflow.com
🌐 stackoverflow.com
Parse HTML Table with some attribute tags from Text in Flat File to Table
I have been playing with this data set for quite some time and just can't get it right, so I wanted to reach out and see if I could get some assistance from a wiz in the community. 1. This is mocked up data that sort of matches some of the characteristics that I'm dealing with 2. Not all tags ... More on community.alteryx.com
🌐 community.alteryx.com
September 27, 2024
[Help] HTML Table Parsing via HTML::TableExtract
my ($table) = $table_extract->tables; Documentation for method tables says: "Return table objects for all tables that matched. Returns an empty list if no tables matched." This is how $table becomes undefined. IOW, there are no tables in the HTML. for my $row ($table->rows) { Can't call method "rows" on an undefined value at parse_table.pl line 51 (#1) That is the consequence of $table being undefined. If you want real help instead of messing around another half year with no progress, make it possible for us to run the code, including the input like you did last time, see http://sscce.org Did you see my previous answer ? More on reddit.com
🌐 r/perl
December 30, 2021
code golf - HTML Table Parser - Code Golf Stack Exchange
Input a strict subset of HTML string representation as defined below. Output parsed table, while any cells who span multiple rows or columns, record its value on the top left cell, an... More on codegolf.stackexchange.com
🌐 codegolf.stackexchange.com
PyPI
pypi.org › project › html-table-parser-python3
html-table-parser-python3 · PyPI
Its purpose is to parse HTML tables without the help of external modules. Everything I use is part of Python 3. Instead of installing this module, you can just copy the class located in parse.py into your own code. Probably best shown by example using pyenv for convenience: ... The parser returns a nested list of tables containing rows containing cells as strings.
» pip install html-table-parser-python3
Published   Dec 06, 2022
Version   0.3.1
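
The "tables containing rows containing cells" shape described above can be reproduced with the standard library alone. The following is a minimal sketch built on `html.parser.HTMLParser`, not the package's own code: the names `TableCollector` and `parse_tables` are made up for illustration, and nested tables are assumed not to occur.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell strings.

    Hypothetical helper for illustration. Assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _finish_cell(self):
        # Store the accumulated text of the open cell, if there is one.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerate a missing </td> before the next cell
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag == "tr" and self._row is not None:
            self._finish_cell()
            self.tables[-1].append(self._row)
            self._row = None


def parse_tables(html):
    """Return every table in `html` as nested lists of strings."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Calling `parse_tables(s)` on a document with one table returns a one-element list whose single item is that table's rows, header row included.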
Encodian
support.encodian.com › hc › en-gb › articles › 11505625014685-Utility-Parse-HTML-Table
Utility - Parse HTML Table – Encodian Customer Help
September 2, 2025 - The 'Utility - Parse HTML Table' action for Power Automate parses an HTML table to JSON. The action will automatically locate HTML tables contained within HTML documents.
Scott Rome
srome.github.io › Parsing-HTML-Tables-in-Python-with-BeautifulSoup-and-pandas
Parsing HTML Tables in Python with BeautifulSoup and pandas
May 30, 2016 - We initialize the parser object and grab the table using our code above: If you had looked at the URL above, you’d have seen that we were parsing QB stats from the 2015 season off of FantasyPros.com. Our data has been prepared in such a way that we can immediately start an analysis. As you can see, this code may find its way into some scraper scripts once football season starts again, but it’s perfectly capable of scraping any page with an HTML table.
Python
docs.python.org › 3 › library › html.parser.html
html.parser — Simple HTML and XHTML parser
Source code: Lib/html/parser.py This module defines a class HTMLParser which serves as the basis for parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML. Example HTML Parser...
Zyte
zyte.com › home › blog › how to extract data from html table
How to extract data from an HTML table - Zyte #1 Web Scraping Service
September 13, 2022 - Then you parse the table with BeautifulSoup extracting text content from each cell and storing the file in JSON ...

def main(url):
    content = download_page(url)
    soup = BeautifulSoup(content, 'html.parser')
    result = {}
    for row in soup.table.find_all('tr'):
        row_header = row.th.get_text()
        row_cell = row.td.get_text()
        result[row_header] = row_cell
    with open('book_table.json', 'w') as storage_file:
        storage_file.write(json.dumps(result))
Divhunt
divhunt.com › tools › extract-tables-from-html
HTML Table Parser - Extract Tables From HTML | Divhunt
Free HTML table parser API to extract all tables with headers and rows from raw HTML. Convert tables to JSON for data processing. Try it now.
Published   July 5, 2023
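
Turning extracted headers and rows into JSON, as described above, takes only a few lines once the table is available as nested lists. This is a rough sketch, not the API of any tool listed here: `rows_to_json` is a hypothetical name, and it assumes the first row holds the headers.

```python
import json


def rows_to_json(rows):
    """Turn [header_row, *data_rows] into a JSON array of objects.

    Hypothetical helper: assumes the first row contains the column names.
    Cells beyond the number of headers are dropped by zip().
    """
    if not rows:
        return "[]"
    headers, *data = rows
    records = [dict(zip(headers, row)) for row in data]
    return json.dumps(records, indent=2)
```

For example, `rows_to_json([["Event", "Start Date"], ["a", "b"]])` yields a JSON array with one object whose keys are `Event` and `Start Date`.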
GitHub
github.com › Tomas2D › puppeteer-table-parser
GitHub - Tomas2D/puppeteer-table-parser: Scrape and parse HTML tables with the Puppeteer table parser.
All data came from the HTML page, which you can find in test/assets/1.html. Basic example (the simple table where we want to parse three columns without editing)

import { tableParser } from 'puppeteer-table-parser'

await tableParser(page, {
  selector: 'table',
  allowedColNames: {
    'Car Name': 'car',
    'Horse Powers': 'hp',
    'Manufacture Year': 'year',
  },
});
Starred by 21 users
Forked by 3 users
Languages   TypeScript 93.5% | HTML 5.7% | Shell 0.8%
Niko, doko?
nikodoko.com › posts › html-table-parsing
Parsing HTML Table Fragments
HTML parser job: it is simply ensuring that inputs are turned into viable HTML documents. In my case, the specification was also dictating the generation of the tbody token, as the parsing algorithm is supposed to automatically generate a tbody while encountering a tr in a table (outside of ...
ScraperAPI
scraperapi.com › home › blog › how to scrape html tables using python
How To Scrape HTML Tables Using Python
March 31, 2026 - Once we have the response, we can process the table data using BeautifulSoup:

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')
employee_list = []

# Find and process the table
table = soup.find('table', class_='stripe')

# Extract data from all rows
for employee_data in table.find_all('tbody'):
    rows = employee_data.find_all('tr')
    for row in rows:
        cells = row.find_all('td')
        employee_list.append({
            'Name': cells[0].text,
            'Position': cells[1].text,
            'Office': cells[2].text,
            'Age': cells[3].text,
            'Start date': cells[4].text,
            'salary': cells[5].text
        })

# Save the data to a JSON file
with open('employee_data.json', 'w') as json_file:
    json.dump(employee_list, json_file, indent=2)
Reddit
reddit.com › r/perl › [help] html table parsing via html::tableextract
r/perl on Reddit: [Help] HTML Table Parsing via HTML::TableExtract
December 30, 2021 -

Not too long ago, I posted this thread, and received feedback indicating that a wiser approach to my problem involved the use of one or more Perl modules. Towards that end, I have the following:

#!/usr/bin/perl -w

# For reference: https://metacpan.org/pod/HTML::TableExtract

use strict;
use warnings;
use diagnostics;

use HTML::TableExtract;

my $headers = ['Guest ID', 'Password'];

my $table_extract = HTML::TableExtract->new(headers => $headers);

$table_extract->parse_file('sample.html');

my ($table) = $table_extract->tables;


for my $row ($table->rows) {
    print join(" ", @$row), "\n";
}

Which works as expected, i.e. reads an HTML file, and parses two (2) regions of interest. What I'd like to be able to do, though, is pull the HTML directly from the web, rather than have to store it as a file, and then parse it with this:

$table_extract->parse($HTML);

instead of:

$table_extract->parse_file('sample.html');.

(I use LWP::UserAgent; to pass credentials, and retrieve the page, FYI). Here's the error I get:

Can't call method "rows" on an undefined value at parse_table.pl line 51 (#1)

It isn't clear to me what's breaking? This:

print($table_extract->parse_file('sample.html'));

returns this:

HTML::TableExtract=HASH(0x55912db53110)HTTP::Response=HASH(0x55912e2a28c8)HTML::TableExtract=HASH(0x55912db53110)

But this:

print(my ($table) = $table_extract->tables);

returns this:

Use of uninitialized value in print at parse_table.pl line 49 (#1)
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.

So I guess that's where the problem starts.

Any suggestions on how to further debug/remedy this?

Top answer
1 of 3
5

JavaScript (Node.js), 175 bytes

x=>x.replace(/<t.(?: c.*?(\d+)")?(?: .*?(\d+)")?>(\w*)/g,(t,c=t<'<te'||!--y,r=1,v)=>{for(i=0;c;++i)if(!(X[~y]?.[i]+1))for(j=1,--c;+r+--j;u[i]=v,v='')u=X[~y-j]||=[]},X=y=[])&&X

Attempt This Online!

2 of 3
3

Charcoal, 144 bytes

SθSθ≔⁰ηW›Lθ⁸«≔⁰ζF∧›Lθ⁹⪪✂θ⁷±χ¹</td><td«≔E⊞O⪪κwspan=ω∨Σλ¹ε≔✂κ⊕⌕κ>Lκ¹κ≔⁺η§ε¹δF⁻δLυ⊞υ⟦⟧F✂υηδ¹«W∧‹ζLλ¬⁼§λζ⁰≦⊕ζF⁻⁺ζ§ε⁰Lλ⊞λ⁰F§ε⁰«§≔λ⁺ζμκ≔ωκ»»»≦⊕ηSθ»⭆¹υ

Try it online! Link is to verbose version of code. Explanation:

Sθ

Skip over the initial <table>.

Sθ

Read the first line of the table body.

≔⁰η

Start at (0-indexed) row 0.

W›Lθ⁸«

Repeat until </table> is reached.

≔⁰ζ

Start at column 0.

F∧›Lθ⁹⪪✂θ⁷±χ¹</td><td«

Loop over the cells of the table, excluding the leading <td and trailing </td>.

≔E⊞O⪪κwspan=ω∨Σλ¹ε

Extract the rowspan and colspan. (This depends on the text not containing digits.)

≔✂κ⊕⌕κ>Lκ¹κ

Extract the text.

≔⁺η§ε¹δ

Get the height necessary to include this rowspan.

F⁻δLυ⊞υ⟦⟧

Extend the table to that height if necessary.

F✂υηδ¹«

Loop over each row in the rowspan.

W∧‹ζLλ¬⁼§λζ⁰≦⊕ζ

Increase the column until it's not a used cell. (This only makes a difference on the first row, in which case the column advances past the previous cell and any cells from rowspans in previous rows.)

F⁻⁺ζ§ε⁰Lλ⊞λ⁰

Extend the row to the width necessary to include the colspan.

F§ε⁰«

Loop for every colspan.

§≔λ⁺ζμκ

Set the cell to the current text.

≔ωκ

Clear the current text.

»»»≦⊕η

Advance to the next row.

Sθ

Read the next line of the table.

»⭆¹υ

Pretty-print the final table, as the default output would confuse empty cells with the double-spacing between rows.
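
The steps walked through above (extend the grid to the rowspan height, skip slots that are already used, widen the row for the colspan, write the text once and leave the rest empty) can be sketched in Python. This is an illustration of the same idea, not a port of either golfed answer: `expand_spans` is a hypothetical name, and the input is assumed to be already split into `(text, rowspan, colspan)` tuples per row rather than raw HTML.

```python
def expand_spans(rows):
    """Lay out (text, rowspan, colspan) cells on a rectangular grid.

    Hypothetical helper. The text goes in the top-left slot of each span;
    every other covered slot holds an empty string.
    """
    grid = []  # grid[r][c] is None while a slot is still free
    for r, row in enumerate(rows):
        col = 0
        for text, rowspan, colspan in row:
            # Extend the grid to the height this rowspan needs.
            while len(grid) < r + rowspan:
                grid.append([])
            # Skip slots already taken by earlier cells or by spans from rows above.
            while col < len(grid[r]) and grid[r][col] is not None:
                col += 1
            for dr in range(rowspan):
                line = grid[r + dr]
                # Widen the row so the whole colspan fits.
                line.extend([None] * (col + colspan - len(line)))
                for dc in range(colspan):
                    line[col + dc] = text if dr == 0 and dc == 0 else ""
            col += colspan
    # Pad to a rectangle and turn never-filled slots into empty strings.
    width = max((len(line) for line in grid), default=0)
    return [[cell or "" for cell in line] + [""] * (width - len(line)) for line in grid]
```

With `[[("a", 2, 1), ("b", 1, 2)], [("c", 1, 1), ("d", 1, 1)]]` as input, `"a"` occupies the first column of both rows, so `"c"` and `"d"` shift one column to the right in the second row.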

Medium
medium.com › @jasonschvach › stop-using-beautifulsoup-to-parse-html-table-tags-discover-the-power-of-pandas-381bb8878695
Stop Using BeautifulSoup to Parse HTML Table Tags: Discover the Power of Pandas | by Jason Schvach | Medium
April 11, 2023 - For this example, we will use a simple HTML page containing a <table> tag:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.basketball-reference.com/leagues/NBA_2023.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'id':'per_game-team'})
header = [th.text.strip() for th in table.find('thead').find_all('th')]
data = []
for row in table.find_all('tr'):
    rowData = [td.text.strip() for td in row.find_all(['th','td'])]
    if len(rowData) == len(header):
        data.append(rowData)
df = pd.DataFrame(data, columns=header)
Alteryx Community
community.alteryx.com › t5 › Alteryx-Designer-Desktop-Discussions › Parse-HTML-Table-with-some-attribute-tags-from-Text-in-Flat-File › td-p › 1323246
Solved: Parse HTML Table with some attribute tags from Tex... - Alteryx Community
September 30, 2024 - TH - Table Header, TR - Table Row, TD - Table Data = cell. TD opens a cell and /TD closes it. Now that you know where a table row ends, you know where the next row starts. So what you need to do is create flags for each of the rows, get the values that are inside the ><, and then with the Cross Tab or Summarize tool you could concatenate the rows and use Text to Columns to get the 8 columns.
MetaCPAN
metacpan.org › pod › HTML::TableExtract
HTML::TableExtract - Perl module for extracting the content contained in tables within an HTML document, either as text or encoded element trees. - metacpan.org
HTML::TableExtract is a subclass of HTML::Parser that serves to extract the information from tables of interest contained within an HTML document. The information from each extracted table is stored in table objects.
Jsontotable
jsontotable.org › html-to-table
HTML to Table Converter - Extract & Convert HTML Tables Online | Free HTML Table Parser | JSON to Table Converter
Convert HTML tables to structured data efficiently with this powerful online HTML table converter. Extract table data from HTML documents, validate table structure, handle multiple tables, and export to Excel or PDF formats.