stackoverflow.com › questions › 6325216 › parse-html-table-to-python-list

You should use some HTML parsing library like lxml:

from lxml import etree

s = """<table>
  <tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>d</td><td>e</td><td>f</td></tr>
  <tr><td>g</td><td>h</td><td>i</td></tr>
</table>
"""
# etree.HTML() wraps the fragment in <html><body>, so the table sits at body/table
table = etree.HTML(s).find("body/table")
rows = iter(table)
# The first row holds the <th> header cells
headers = [col.text for col in next(rows)]
for row in rows:
    values = [col.text for col in row]
    print(dict(zip(headers, values)))

prints (on Python 3.7+, where dicts keep insertion order)

{'Event': 'a', 'Start Date': 'b', 'End Date': 'c'}
{'Event': 'd', 'Start Date': 'e', 'End Date': 'f'}
{'Event': 'g', 'Start Date': 'h', 'End Date': 'i'}

Answer from Sven Marnach on Stack Overflow
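
If lxml is not installed and the table markup happens to be well-formed (every tag closed, as in the sample above), the same header-to-dict idea can be sketched with the standard library's xml.etree.ElementTree. This is an illustrative variant, not part of the answer above; the function name table_to_dicts is made up for the example, and real-world HTML that is not well-formed will raise a ParseError here.

```python
import xml.etree.ElementTree as ET


def table_to_dicts(markup):
    """Return one dict per data row, keyed by the text of the header cells.

    Assumes `markup` is a single well-formed <table> element whose first
    <tr> holds the headers (hypothetical helper, stdlib only).
    """
    table = ET.fromstring(markup)      # raises ET.ParseError on sloppy HTML
    rows = iter(table.iter("tr"))      # also finds rows inside <thead>/<tbody>
    headers = [cell.text for cell in next(rows)]
    return [dict(zip(headers, (cell.text for cell in row))) for row in rows]


sample = (
    "<table>"
    "<tr><th>Event</th><th>Start Date</th></tr>"
    "<tr><td>a</td><td>b</td></tr>"
    "</table>"
)
print(table_to_dicts(sample))
```

For markup that a browser would accept but an XML parser would not, a tolerant HTML parser such as lxml remains the safer choice.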

Parse HTML table to Python list? - Stack Overflow

Top answer

2 of 4

Hands down the easiest way to parse an HTML table is to use pandas.read_html() - it accepts both URLs and HTML.

import pandas as pd
url = r'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tables = pd.read_html(url) # Returns list of all tables on page
sp500_table = tables[0] # Select table of interest

As of pandas version 1.5.0, read_html() can preserve hyperlinks with the extract_links argument. Table elements will be tuples.

reddit.com › r/shortcuts › any way to parse html tables?

r/shortcuts on Reddit: Any way to parse HTML tables?

November 1, 2018 -

My grades are presented as a vanilla HTML table and converting the site to text makes it very hard to get values from the table. Is there any shortcut that parses an HTML table into a dictionary?

Top answer

1 of 2

You can probably find some JavaScript routine that can convert HTML table data to JSON, then run the JavaScript in a URL action. I saw this ability to run JavaScript here and tried it out yesterday: https://www.reddit.com/r/shortcuts/comments/9hpplv/is_it_possible_to_read_the_contents_of_an_xml_file/?st=JNYO7T87&sh=bf71ea97 But I just saw that Pretty Print also does this: https://www.reddit.com/r/shortcuts/comments/9mk9br/pretty_print_dictionary/?st=JNYOAYQW&sh=0a9cdf3d

2 of 2

No, but you can use: Get contents of URL. Make HTML from rich text. Then you can use match text to pull out what you want.

Discussions

python - Fastest, easiest, and best way to parse an HTML table? - Stack Overflow

All of these are reasonably tolerant of poorly formed HTML. ... I suggest loading the document with an XML parser like DOMDocument::loadHTMLFile that is bundled with PHP and then use XPath to grep the data you need. More on stackoverflow.com

stackoverflow.com

Parse HTML Table with some attribute tags from Text in Flat File to Table

I have been playing with this data set for quite some time and just can't get it right, so I wanted to reach out and see if I could get some assistance from a wiz in the community. 1. This is mocked up data that sort of matches some of the characteristics that I'm dealing with 2. Not all tags ... More on community.alteryx.com

community.alteryx.com

September 27, 2024

[Help] HTML Table Parsing via HTML::TableExtract

my ($table) = $table_extract->tables;

Documentation for method tables says: "Return table objects for all tables that matched. Returns an empty list if no tables matched." This is how $table becomes undefined. IOW, there are no tables in the HTML.

for my $row ($table->rows) {

Can't call method "rows" on an undefined value at parse_table.pl line 51 (#1)

That is the consequence of $table being undefined.

If you want real help instead of messing around another half year with no progress, make it possible for us to run the code, including the input like you did last time, see http://sscce.org

Did you see my previous answer? More on reddit.com

r/perl

December 30, 2021

code golf - HTML Table Parser - Code Golf Stack Exchange

Input a strict subset of HTML string representation as defined below. Output parsed table, while any cells who span multiple rows or columns, record its value on the top left cell, an... More on codegolf.stackexchange.com

codegolf.stackexchange.com

Videos

02:57

YouTube

How to Parse HTML-Like Text into a Table - Power Query Challenge ...

June 29, 2025

11:16

YouTube

Parsing HTML Tables with Python to a Dictionary - YouTube

April 11, 2022

16:58

YouTube

How to Parse HTML Tables to JSON With Python - YouTube

February 9, 2022

youtube.com

How to Extract Tables from HTML and Webpages using Python

10:48

YouTube

Get data from HTML tables in Power Automate - YouTube

October 10, 2023

20:53

YouTube

How to use Power Automate to parse a HTML Table and convert to ...

June 2, 2021

PyPI

pypi.org › project › html-table-parser-python3

html-table-parser-python3 · PyPI

Its purpose is to parse HTML tables without the help of external modules. Everything I use is part of Python 3. Instead of installing this module, you can just copy the class located in parse.py into your own code. Probably best shown by example using pyenv for convenience: ... The parser returns nested lists of tables containing rows containing cells as strings.

      » pip install html-table-parser-python3

Published Dec 06, 2022

Version 0.3.1

Homepage https://github.com/schmijos/html-table-parser-python3
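
To make the "tables containing rows containing cells as strings" shape concrete, here is a minimal sketch built only on the standard library's html.parser.HTMLParser. It is not the code shipped in html-table-parser-python3; the class name TableCollector is invented for illustration, and the sketch assumes tables are not nested inside one another.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every table as a list of rows, each row a list of cell strings.

    Hypothetical example class; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # tables -> rows -> cell strings
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_data(self, data):
        # Text only counts while a <td>/<th> is open
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None


parser = TableCollector()
parser.feed(
    "<table><tr><th>Name</th><th>Age</th></tr>"
    "<tr><td>Ann</td><td> 31 </td></tr></table>"
)
print(parser.tables)
```

Indexing follows the nesting: parser.tables[0][1][0] is the first cell of the second row of the first table.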

Encodian

support.encodian.com › hc › en-gb › articles › 11505625014685-Utility-Parse-HTML-Table

Utility - Parse HTML Table – Encodian Customer Help

September 2, 2025 - The 'Utility - Parse HTML Table' action for Power Automate parses an HTML table to JSON. The action will automatically locate HTML tables contained within HTML documents.

Scott Rome

srome.github.io › Parsing-HTML-Tables-in-Python-with-BeautifulSoup-and-pandas

Parsing HTML Tables in Python with BeautifulSoup and pandas

May 30, 2016 - We initialize the parser object and grab the table using our code above: If you had looked at the URL above, you’d have seen that we were parsing QB stats from the 2015 season off of FantasyPros.com. Our data has been prepared in such a way that we can immediately start an analysis. As you can see, this code may find its way into some scraper scripts once football season starts again, but it’s perfectly capable of scraping any page with an HTML table.

Python

docs.python.org › 3 › library › html.parser.html

html.parser — Simple HTML and XHTML parser

Source code: Lib/html/parser.py This module defines a class HTMLParser which serves as the basis for parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML. Example HTML Parser...

Zyte

zyte.com › home › blog › how to extract data from html table

How to extract data from an HTML table - Zyte #1 Web Scraping Service

September 13, 2022 - Then you parse the table with BeautifulSoup extracting text content from each cell and storing the file in JSON ...

import json

from bs4 import BeautifulSoup

def main(url):
    # download_page() is not shown in this excerpt
    content = download_page(url)
    soup = BeautifulSoup(content, 'html.parser')
    result = {}
    # Each row pairs one header cell (<th>) with one data cell (<td>)
    for row in soup.table.find_all('tr'):
        row_header = row.th.get_text()
        row_cell = row.td.get_text()
        result[row_header] = row_cell
    with open('book_table.json', 'w') as storage_file:
        storage_file.write(json.dumps(result))

Divhunt

divhunt.com › tools › extract-tables-from-html

HTML Table Parser - Extract Tables From HTML | Divhunt

Free HTML table parser API to extract all tables with headers and rows from raw HTML. Convert tables to JSON for data processing. Try it now.

Published July 5, 2023

GitHub

github.com › VastBlast › html-table-parser-node

GitHub - VastBlast/html-table-parser-node: Node.js library that allows parsing HTML tables into multi-level objects · GitHub

HtmlTableParser is a Node.js library that allows parsing HTML tables into JavaScript objects.

Author VastBlast

Stack Overflow

stackoverflow.com › questions › 4893298 › fastest-easiest-and-best-way-to-parse-an-html-table

python - Fastest, easiest, and best way to parse an HTML table? - Stack Overflow

Top answer

1 of 5

For your general problem: try lxml.html from the lxml package (think of it as the stdlib's xml.etree on steroids: the same XML API, but with HTML support, XPath, XSLT, etc.)

A quick example for your specific case:

from lxml import html

tree = html.parse('http://www.datamystic.com/timezone/time_zones.html')
table = tree.findall('//table')[1]
data = [
           [td.text_content().strip() for td in row.findall('td')] 
           for row in table.findall('tr')
       ]

This will give you a nested list: each sub-list corresponds to a row in the table and contains the data from the cells. The sneakily inserted advertisement rows are not filtered out yet, but it should get you on your way. (and by the way: lxml is fast!)
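
One possible way to drop such stray rows from the nested list, sketched under the assumption (not verified against that page) that advertisement and header rows yield a different number of td cells than the real data rows. The helper name keep_regular_rows and the sample data are made up for illustration.

```python
from collections import Counter


def keep_regular_rows(data):
    """Keep only rows whose cell count matches the most common row width.

    Assumption: unwanted rows (ads, spacers, <th>-only header rows that
    come back empty) differ in length from genuine data rows.
    """
    widths = Counter(len(row) for row in data if row)
    if not widths:
        return []
    expected = widths.most_common(1)[0][0]
    return [row for row in data if len(row) == expected]


# Hypothetical scrape result: two data rows, one ad row, one empty header row
scraped = [[], ["UTC", "0"], ["Advertisement"], ["CET", "+1"]]
print(keep_regular_rows(scraped))
```

If the unwanted rows have the same width as the data rows, this heuristic will not help and a content-based filter is needed instead.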

BUT: More specifically for your particular use case: there are better ways to get at timezone database information than scraping that particular webpage (aside: note that the web page actually mentions that you are not allowed to copy its contents). There are even existing libraries that already use this information, see for example python-dateutil.

2 of 5

Avoid regular expressions for parsing HTML; they're simply not appropriate for it. You want a DOM parser like BeautifulSoup for sure...

A few other alternatives

SimpleHTMLDom PHP
Hpricot & Nokogiri Ruby
Web::Scraper Perl/CPAN

All of these are reasonably tolerant of poorly formed HTML.
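
A small Python illustration of why a regular expression tends to go wrong where a parser does not. The markup, the pattern, and the CellText class are invented for this example, and the standard library's html.parser stands in for the parsers named above.

```python
import re
from html.parser import HTMLParser

markup = '<tr><td class="n">1</td><td><b>bold</b> text</td></tr>'

# Naive pattern: misses the cell that carries an attribute and
# returns raw inner markup instead of text for the other one.
naive = re.findall(r"<td>(.*?)</td>", markup)


class CellText(HTMLParser):
    """Gather the text of each <td>/<th>, ignoring attributes and inner tags."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._open = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.cells.append("")
            self._open = True

    def handle_data(self, data):
        if self._open:
            self.cells[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._open = False


parser = CellText()
parser.feed(markup)
print(naive)         # one cell, with tags still embedded
print(parser.cells)  # both cells, text only
```

The pattern could be patched for this one input, but each new variation in the markup (attributes, nesting, unclosed tags) needs another patch, which is the maintenance problem the answer warns about.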

GitHub

github.com › Tomas2D › puppeteer-table-parser

GitHub - Tomas2D/puppeteer-table-parser: Scrape and parse HTML tables with the Puppeteer table parser.

All data came from the HTML page, which you can find in test/assets/1.html. Basic example (the simple table where we want to parse three columns without editing):

import { tableParser } from 'puppeteer-table-parser'

await tableParser(page, {
  selector: 'table',
  allowedColNames: {
    'Car Name': 'car',
    'Horse Powers': 'hp',
    'Manufacture Year': 'year',
  },
});

Starred by 21 users

Forked by 3 users

Niko, doko?

nikodoko.com › posts › html-table-parsing

Parsing HTML Table Fragments

HTML parser job: it is simply ensuring that inputs are turned into viable HTML documents. In my case, the specification was also dictating the generation of the tbody token, as the parsing algorithm is supposed to automatically generate a tbody while encountering a tr in a table (outside of ...

ScraperAPI

scraperapi.com › home › blog › how to scrape html tables using python

How To Scrape HTML Tables Using Python

March 31, 2026 - Once we have the response, we can process the table data using BeautifulSoup:

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')
employee_list = []

# Find and process the table
table = soup.find('table', class_='stripe')

# Extract data from all rows
for employee_data in table.find_all('tbody'):
    rows = employee_data.find_all('tr')
    for row in rows:
        cells = row.find_all('td')
        employee_list.append({
            'Name': cells[0].text,
            'Position': cells[1].text,
            'Office': cells[2].text,
            'Age': cells[3].text,
            'Start date': cells[4].text,
            'salary': cells[5].text
        })

# Save the data to a JSON file
with open('employee_data.json', 'w') as json_file:
    json.dump(employee_list, json_file, indent=2)

Alteryx

community.alteryx.com › home › participate › discussions › alteryx one

Parse HTML Table with some attribute tags from Text in Flat File to Table - Alteryx

Top answer

1 of 1

You can also use the RegEx Tool to tokenize the data between each set of relevant tags: You may need to modify the workflow to suit your needs, but this framework should give you the right foundation. Hope this helps and Happy Solving!

reddit.com › r/perl › [help] html table parsing via html::tableextract

r/perl on Reddit: [Help] HTML Table Parsing via HTML::TableExtract

December 30, 2021 -

Not too long ago, I posted this thread, and received feedback indicating that a wiser approach to my problem involved the use of one or more Perl modules. Towards that end, I have the following:

#!/usr/bin/perl -w

# For reference: https://metacpan.org/pod/HTML::TableExtract

use strict;
use warnings;
use diagnostics;

use HTML::TableExtract;

my $headers = ['Guest ID', 'Password'];

my $table_extract = HTML::TableExtract->new(headers => $headers);

$table_extract->parse_file('sample.html');

my ($table) = $table_extract->tables;


for my $row ($table->rows) {
    print join(" ", @$row), "\n";
}

Which works as expected, i.e. reads an HTML file, and parses two (2) regions of interest. What I'd like to be able to do, though, is pull the HTML directly from the web, rather than have to store it as a file, and then parse it with this:

$table_extract->parse($HTML);

instead of:

$table_extract->parse_file('sample.html');

(I use LWP::UserAgent; to pass credentials, and retrieve the page, FYI). Here's the error I get:

Can't call method "rows" on an undefined value at parse_table.pl line 51 (#1)

It isn't clear to me what's breaking? This:

print($table_extract->parse_file('sample.html'));

returns this:

HTML::TableExtract=HASH(0x55912db53110)HTTP::Response=HASH(0x55912e2a28c8)HTML::TableExtract=HASH(0x55912db53110)

But this:

print(my ($table) = $table_extract->tables);

returns this:

Use of uninitialized value in print at parse_table.pl line 49 (#1)
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.

So I guess that's where the problem starts.

Any suggestions on how to further debug/remedy this?

Top answer

1 of 1

Stack Exchange

codegolf.stackexchange.com › questions › 272965 › html-table-parser

code golf - HTML Table Parser - Code Golf Stack Exchange

Top answer

1 of 3

JavaScript (Node.js), 175 bytes

x=>x.replace(/<t.(?: c.*?(\d+)")?(?: .*?(\d+)")?>(\w*)/g,(t,c=t<'<te'||!--y,r=1,v)=>{for(i=0;c;++i)if(!(X[~y]?.[i]+1))for(j=1,--c;+r+--j;u[i]=v,v='')u=X[~y-j]||=[]},X=y=[])&&X

Attempt This Online!

2 of 3

Charcoal, 144 bytes

ＳθＳθ≔⁰ηＷ›Ｌθ⁸«≔⁰ζＦ∧›Ｌθ⁹⪪✂θ⁷±χ¹</td><td«≔Ｅ⊞Ｏ⪪κwspan=ω∨Σλ¹ε≔✂κ⊕⌕κ>Ｌκ¹κ≔⁺η§ε¹δＦ⁻δＬυ⊞υ⟦⟧Ｆ✂υηδ¹«Ｗ∧‹ζＬλ¬⁼§λζ⁰≦⊕ζＦ⁻⁺ζ§ε⁰Ｌλ⊞λ⁰Ｆ§ε⁰«§≔λ⁺ζμκ≔ωκ»»»≦⊕ηＳθ»⭆¹υ

Try it online! Link is to verbose version of code. Explanation:

Ｓθ

Skip over the initial <table>.

Ｓθ

Read the first line of the table body.

≔⁰η

Start at (0-indexed) row 0.

Ｗ›Ｌθ⁸«

Repeat until </table> is reached.

≔⁰ζ

Start at column 0.

Ｆ∧›Ｌθ⁹⪪✂θ⁷±χ¹</td><td«

Loop over the cells of the table, excluding the leading <td and trailing </td>.

≔Ｅ⊞Ｏ⪪κwspan=ω∨Σλ¹ε

Extract the rowspan and colspan. (This depends on the text not containing digits.)

≔✂κ⊕⌕κ>Ｌκ¹κ

Extract the text.

≔⁺η§ε¹δ

Get the height necessary to include this rowspan.

Ｆ⁻δＬυ⊞υ⟦⟧

Extend the table to that height if necessary.

Ｆ✂υηδ¹«

Loop over each row in the rowspan.

Ｗ∧‹ζＬλ¬⁼§λζ⁰≦⊕ζ

Increase the column until it's not a used cell. (This only makes a difference on the first row, in which case the column advances past the previous cell and any cells from rowspans in previous rows.)

Ｆ⁻⁺ζ§ε⁰Ｌλ⊞λ⁰

Extend the row to the width necessary to include the colspan.

Ｆ§ε⁰«

Loop for every colspan.

§≔λ⁺ζμκ

Set the cell to the current text.

≔ωκ

Clear the current text.

»»»≦⊕η

Advance to the next row.

Ｓθ

Read the next line of the table.

»⭆¹υ

Pretty-print the final table, as the default output would confuse empty cells with the double-spacing between rows.
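
The row-by-row procedure explained above can be sketched in ungolfed Python. This is an illustration rather than a translation of either golfed answer: the input is assumed to be already split into (text, colspan, rowspan) tuples instead of an HTML string, None replaces Charcoal's 0 as the "unused cell" marker, and the function name expand_spans is made up.

```python
def expand_spans(rows):
    """Lay spanned cells out on a rectangular grid.

    rows: list of table rows, each a list of (text, colspan, rowspan).
    The text is stored in the top-left cell of its span; the rest of the
    span is filled with empty strings.
    """
    grid = []
    for r, cells in enumerate(rows):
        col = 0
        for text, colspan, rowspan in cells:
            # Extend the grid to the height this rowspan needs
            while len(grid) < r + rowspan:
                grid.append([])
            # Skip columns already claimed by rowspans from earlier rows
            while col < len(grid[r]) and grid[r][col] is not None:
                col += 1
            for dr in range(rowspan):
                target = grid[r + dr]
                # Extend the row to the width this colspan needs
                while len(target) < col + colspan:
                    target.append(None)
                for dc in range(colspan):
                    target[col + dc] = text if dr == 0 and dc == 0 else ""
            col += colspan
    # Cells never covered by any span become empty strings
    return [[cell or "" for cell in row] for row in grid]


table = [
    [("a", 1, 2), ("b", 2, 1)],   # "a" spans two rows, "b" spans two columns
    [("c", 1, 1), ("d", 1, 1)],
]
print(expand_spans(table))
```

In the sample, "c" lands in the second column of the second row because the first column is still occupied by the rowspan of "a".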

Medium

medium.com › @jasonschvach › stop-using-beautifulsoup-to-parse-html-table-tags-discover-the-power-of-pandas-381bb8878695

Stop Using BeautifulSoup to Parse HTML Table Tags: Discover the Power of Pandas | by Jason Schvach | Medium

April 11, 2023 - For this example, we will use a simple HTML page containing a <table> tag:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.basketball-reference.com/leagues/NBA_2023.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'id':'per_game-team'})
header = [th.text.strip() for th in table.find('thead').find_all('th')]
data = []
for row in table.find_all('tr'):
    rowData = [td.text.strip() for td in row.find_all(['th','td'])]
    if len(rowData) == len(header):
        data.append(rowData)
df = pd.DataFrame(data, columns=header)

Alteryx Community

community.alteryx.com › t5 › Alteryx-Designer-Desktop-Discussions › Parse-HTML-Table-with-some-attribute-tags-from-Text-in-Flat-File › td-p › 1323246

Solved: Parse HTML Table with some attribute tags from Tex... - Alteryx Community

September 30, 2024 - TH = table header, TR = table row, TD = table data (a cell). The opening TD tag marks where a cell's placeholder starts and the closing /TD tag marks where it ends. Once you know where a table row ends, you know where the next row starts. So what you need to do is create flags for each of the rows, get the values that sit between > and <, and then concatenate the rows with the Cross Tab or Summarize tool; Text to Columns then gives you the 8 columns.

MetaCPAN

metacpan.org › pod › HTML::TableExtract

HTML::TableExtract - Perl module for extracting the content contained in tables within an HTML document, either as text or encoded element trees. - metacpan.org

HTML::TableExtract is a subclass of HTML::Parser that serves to extract the information from tables of interest contained within an HTML document. The information from each extracted table is stored in table objects.

Jsontotable

jsontotable.org › html-to-table

HTML to Table Converter - Extract & Convert HTML Tables Online | Free HTML Table Parser | JSON to Table Converter

Convert HTML tables to structured data efficiently with this powerful online HTML table converter. Extract table data from HTML documents, validate table structure, handle multiple tables, and export to Excel or PDF formats.