As @nosklo pointed out here, you are looking for anchor tags and the links in their href attributes. A parse tree is organized by the HTML elements themselves, so you find text by searching for those elements specifically. For URLs, this would look like the following (using the lxml library in Python 3.6):

from lxml import etree
from io import StringIO
import requests

# Set explicit HTMLParser
parser = etree.HTMLParser()

page = requests.get('https://URL.COM')

# Decode the page content from bytes to string
html = page.content.decode("utf-8")

# Create your etree from a StringIO object, which behaves like
# a file handle
tree = etree.parse(StringIO(html), parser=parser)

# Call this function and pass in your tree
def get_links(tree):
    # This will get the anchor tags <a href...>
    refs = tree.xpath("//a")
    # Get the url from the ref
    links = [link.get('href', '') for link in refs]
    # Return only the links that end with .com.br
    return [link for link in links if link.endswith('.com.br')]


# Example call
links = get_links(tree)
Answer from C.Nivs on Stack Overflow
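The same idea (walk the parsed elements, pick out the anchor tags, read their href attribute) can be sketched with only the standard library. This is a minimal illustration, not the lxml approach from the answer: `LinkCollector`, `get_links_stdlib`, and the sample HTML are hypothetical names and data made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def get_links_stdlib(html, suffix=".com.br"):
    """Return the hrefs found in html that end with the given suffix."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if link.endswith(suffix)]


# Made-up sample page, so the example runs without a network call
sample = """
<html><body>
  <a href="https://example.com.br">kept</a>
  <a href="https://example.org/page">dropped</a>
  <a name="no-href">ignored</a>
</body></html>
"""

print(get_links_stdlib(sample))
```

In real use you would pass in the text of a requests response instead of the sample string. The suffix check mirrors the filter in the answer and has the same limitation: a URL with a path after the domain will not match.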
🌐
Kennethreitz
requests-html.kennethreitz.org
Requests-HTML: HTML Parsing for Humans (writing Python 3)! — requests-HTML v0.3.4 documentation
Returns a generator of Responses or Requests. ... Send a given PreparedRequest. ... Requests-HTML intends to make parsing HTML (e.g.
Discussions

python - Is there a way to parse out HTML in a response from requests.get()? - Stack Overflow
I'm using the requests package to get data from an API and see some HTML elements in the response data such as , , and \', among a bunch of other elements. The return value for response.encoding is utf-8 if that helps. I'd like to parse out all the HTML values and just have a simple ... More on stackoverflow.com
🌐 stackoverflow.com
python - Parsing HTML with requests and BeautifulSoup - Stack Overflow
I'm not sure if I'm approaching this correctly. I'm using requests to make a GET: con = s.get(url) when I call con.content, the whole page is there. But when I pass con into BS: soup = BeautifulS... More on stackoverflow.com
🌐 stackoverflow.com
Get html using Python requests? - Stack Overflow
I am trying to teach myself some basic web scraping. Using Python's requests module, I was able to grab html for various websites until I tried this: More on stackoverflow.com
🌐 stackoverflow.com
Steps for requests-html to parse more than one tag/class in python
You definitely want r/python for this - r/css is for questions about styling 👍 More on reddit.com
🌐 r/css
October 7, 2020
🌐
Fernandomc
fernandomc.com › posts › using-requests-to-get-and-post
Python Requests and Beautiful Soup - Playing with HTTP Requests, HTML Parsing and APIs – Fernando Medina Corey
May 26, 2018 - So now that we have this requests data in r.text how do we start working with it? Well, let’s start by looking at what it is. ... Well, we at least know that we’re dealing with a string. So we have all the built-in Python string methods like .split(), .replace() and others. But these honestly aren’t going to save us a ton of time if we have to parse through a bunch of HTML ...
🌐
Medium
medium.com › @tubelwj › requests-html-an-html-parsing-library-in-python-8d182d13ecd2
Requests-HTML: An HTML parsing library in Python | by Gen. Devin DL. | Medium
September 17, 2024 - In web crawling, the `requests_html` library can help users quickly scrape and parse web-page content. Suppose you’re developing a web crawler that needs to scrape all article titles from a website; the `requests_html` library can be used ...
🌐
GitHub
github.com › psf › requests-html
GitHub - psf/requests-html: Pythonic HTML Parsing for Humans™
>>> from requests_html import AsyncHTMLSession
>>> asession = AsyncHTMLSession()
>>> async def get_pythonorg():
...     r = await asession.get('https://python.org/')
...     return r
...
>>> async def get_reddit():
...     r = await asession.get('https://reddit.com/')
...     return r
...
>>> async def get_google():
...     r = await asession.get('https://google.com/')
...     return r
...
>>> results = asession.run(get_pythonorg, get_reddit, get_google)
>>> results # check the requests all returned a 200 (success) code
[<Response [200]>, <Response [200]>, <Response [200]>]
>>> # Each item in the results list is a response object and can be interacted with as such
>>> for result in results:
...     print(result.html.url)
...
Starred by 13.8K users
Forked by 1K users
Languages   Python 99.7% | Makefile 0.3%
🌐
Codegive
codegive.com › blog › python_requests_parse_html.php
Python Requests Parse HTML (2026): Master Web Scraping & Data Extraction Like a Pro!
April 9, 2026 - We'll use requests for fetching and Beautiful Soup (from bs4 library) for parsing, as it's generally considered more beginner-friendly. First, ensure you have the necessary libraries installed: ... The requests library makes it straightforward to download the content of a web page.

import requests

def fetch_html_content(url):
    """ Fetches the HTML content of a given URL. """
    try:
        # Send a GET request to the URL
        response = requests.get(url)
        # Raise an HTTPError for bad responses (4xx or 5xx)
        response.raise_for_status()
        # Return the HTML content as a string
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error fetching URL {url}: {e}")
        return None
🌐
Apify
blog.apify.com › how-to-parse-html-in-python
How to parse HTML in Python
July 1, 2025 - We parse the HTML content because after parsing, it becomes a tree-like structure. This makes it easy for us to navigate through the tree using built-in methods.

import requests
from bs4 import BeautifulSoup

# Make an HTTP Request
target_url = 'https://crawler-test.com/'
response_data = requests.get(target_url)

# Parse the HTML content
soup = BeautifulSoup(response_data.text, 'html.parser')
Find elsewhere
🌐
JC Chouinard
jcchouinard.com › accueil › web scraping with python and requests-html (with example)
Web Scraping With Python and Requests-HTML (with Example) - JC Chouinard
June 21, 2023 - To parse the HTML of the Requests-HTML object with BeautifulSoup, pass the response.html.raw_html attribute to the BeautifulSoup object. # requests-html beautifulsoup from bs4 import BeautifulSoup from requests_html import HTMLSession url = ...
🌐
Medium
medium.com › @datajournal › web-scraping-with-requests-html-015e202970a0
Web Scraping With Python & Requests-HTML in 2025 | Medium
February 23, 2025 - Master web scraping with Python's requests-HTML: send HTTP requests, render JavaScript, parse HTML, and store data effortlessly.
🌐
LabEx
labex.io › tutorials › python-how-to-parse-response-content-from-a-python-requests-call-398048
How to parse response content from a Python requests call | LabEx
For parsing HTML, Python's BeautifulSoup library is an excellent tool. In this step, we'll learn how to extract information from HTML responses. First, let's install BeautifulSoup and its HTML parser: ... import requests from bs4 import ...
🌐
Requests
requests.readthedocs.io › projects › requests-html › en › latest
requests-HTML v0.3.4 documentation
Search the Element (multiple times) for the given parse template. ... The text content of the Element or HTML. xpath(selector: str, *, clean: bool = False, first: bool = False, _encoding: str = None) → Union[List[str], List[requests_html.Element], str, requests_html.Element]¶
🌐
Kishstats
kishstats.com › python › 2019 › 02 › 17 › python-requests-pulling-data.html
Pulling Data with Requests - KishStats
Next, we can create a new Python script to import requests and setup a variable for our target URL:

import requests

url = 'https://www.cia.gov/library/publications/the-world-factbook/rankorder/2004rank.html'
response = requests.get(url)
html = response.text
print(html)  # will print out the full source of the webpage

Once we use requests to fetch the HTML for us, we need some way of parsing it to get the specific data that we want.
🌐
PyPI
pypi.org › project › requests-html
requests-html · PyPI
The Requests experience you know and love, with magical parsing abilities. Async Support · Make a GET request to ‘python.org’, using Requests:

>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://python.org/')

Try async and get some sites at the same time:

>>> from requests_html import AsyncHTMLSession
>>> asession = AsyncHTMLSession()
>>> async def get_pythonorg():
...

» pip install requests-html
Published   Feb 17, 2019
Version   0.10.0
🌐
Sling Academy
slingacademy.com › article › python-requests-module-how-to-parse-html-responses
Python Requests module: How to parse HTML responses - Sling Academy
try:
    response = requests.get('https://example.com/nonexistent', timeout=5)
    response.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print(f'HTTP Error: {errh}')
except requests.exceptions.ConnectionError as errc:
    print(f'Error Connecting: {errc}')
except requests.exceptions.Timeout as errt:
    print(f'Timeout Error: {errt}')
except requests.exceptions.RequestException as err:
    print(f'OOps: Something Else: {err}')

Python’s Requests module paired with BeautifulSoup makes it simple to fetch and parse HTML content.
🌐
Twilio
twilio.com › blog › web-scraping-and-parsing-html-in-python-with-beautiful-soup
Web Scraping and Parsing HTML in Python with Beautiful Soup
November 22, 2019 - Before moving on, you will need ... of Python 3 and pip installed. Make sure you create and activate a virtual environment before installing any dependencies. You'll need to install the Requests library for making HTTP requests to get data from the web page, and Beautiful Soup for parsing through the HTML...
🌐
Medium
medium.com › @datajournal › how-to-parse-html-with-python-94495c11bc96
How to Parse HTML in Python: Top Libraries Tutorial | Medium
October 14, 2024 - The lxml library is another powerful tool for parsing HTML and XML in Python. It is known for its speed and accuracy. If performance is a priority, lxml might be a better choice than BeautifulSoup. ...

from lxml import html
import requests

# Fetch the HTML content
url = "https://example.com"
response = requests.get(url)

# Parse the HTML content using lxml
tree = html.fromstring(response.content)

# Extract the title of the webpage
title = tree.findtext('.//title')
print("Page Title:", title)
🌐
LearnDataSci
learndatasci.com › tutorials › ultimate-guide-web-scraping-w-python-requests-and-beautifulsoup
Ultimate Guide to Web Scraping with Python Part 1: Requests and BeautifulSoup – LearnDataSci
With Python's requests (pip install requests) library we're getting a web page by using get() on the URL. The response r contains many things, but using r.content will give us the HTML. Once we have the HTML we can then parse it for the data we're interested in analyzing.
Top answer
1 of 4
31

The server in question is giving you a gzipped response. The server is also very broken; it sends the following headers:

$ curl -D - -o /dev/null -s -H 'Accept-Encoding: gzip, deflate' http://www.wrcc.dri.edu/WRCCWrappers.py?sodxtrmts+028815+por+por+pcpn+none+mave+5+01+F
HTTP/1.1 200 OK
Date: Tue, 06 Jan 2015 17:46:49 GMT
Server: Apache
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd"><html xmlns="http: //www.w3.org/1999/xhtml" lang="en-US">
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 3659
Content-Type: text/html

The <!DOCTYPE..> line there is not a valid HTTP header. As such, the remaining headers past Server are ignored. Why the server interjects that is unclear; in all likelihood WRCCWrappers.py is a CGI script that doesn't output headers but does include a double newline after the doctype line, duping the Apache server into inserting additional headers there.

As such, requests also doesn't detect that the data is gzip-encoded. The data is all there, you just have to decode it. Or you could if it wasn't rather incomplete.

The work-around is to tell the server not to bother with compression:

headers = {'Accept-Encoding': 'identity'}
r = requests.get(url, headers=headers)

and an uncompressed response is returned.
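If you cannot change the request headers, decoding the body yourself is another option. The following is only a sketch under assumptions: `decode_body` is a hypothetical helper, the gzip check relies on the standard two-byte gzip magic number, and the payload is made-up sample data rather than the actual server response. In practice the raw bytes would come from something like `r.content`.

```python
import gzip
import zlib

# Every gzip stream starts with these two bytes
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw):
    """Hypothetical helper: gunzip raw if it looks gzipped, else return it unchanged."""
    if not raw.startswith(GZIP_MAGIC):
        return raw
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    # A decompress object returns whatever it can decode, so an
    # incomplete stream yields partial output instead of an exception.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decompressor.decompress(raw)


# Made-up payload standing in for a response body
original = b"<html><body>Monthly Average of Precipitation</body></html>"
compressed = gzip.compress(original)

print(decode_body(compressed) == original)
print(decode_body(b"plain text"))
```

Using a decompress object rather than `gzip.decompress` is a deliberate choice here, since the latter raises an error on a truncated stream.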

Incidentally, on Python 2 the HTTP header parser is not so strict and manages to declare the doctype a header:

>>> pprint(dict(r.headers))
{'<!doctype html public "-//w3c//dtd xhtml 1.0 transitional//en" "dtd/xhtml1-transitional.dtd"><html xmlns="http': '//www.w3.org/1999/xhtml" lang="en-US">',
 'connection': 'Keep-Alive',
 'content-encoding': 'gzip',
 'content-length': '3659',
 'content-type': 'text/html',
 'date': 'Tue, 06 Jan 2015 17:42:06 GMT',
 'keep-alive': 'timeout=5, max=100',
 'server': 'Apache',
 'vary': 'Accept-Encoding'}

and the content-encoding information survives, so there requests decodes the content for you, as expected.
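The Python 3 behaviour can be reproduced offline, at least roughly. Python 3's `http.client` parses response headers with the standard `email` package, so feeding a similar header block to `email.message_from_string` gives an approximation of what happens. This is a sketch, not a trace of what requests does internally, and the header block below is abbreviated from the curl output above.

```python
from email import message_from_string

# Header block modelled on the curl output, with the stray doctype line
raw_headers = (
    "Date: Tue, 06 Jan 2015 17:46:49 GMT\r\n"
    "Server: Apache\r\n"
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"DTD/xhtml1-transitional.dtd"><html xmlns="http: //www.w3.org/1999/xhtml" lang="en-US">\r\n'
    "Vary: Accept-Encoding\r\n"
    "Content-Encoding: gzip\r\n"
    "Content-Length: 3659\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)

message = message_from_string(raw_headers)

# Header parsing stops at the first line that is not a valid header,
# so everything from the doctype line onward is treated as body text
print(message.keys())
print(message["Content-Encoding"])
print([type(defect).__name__ for defect in message.defects])
```

Only the Date and Server headers are recognised, and the Content-Encoding lookup returns None, which is why the gzip encoding goes undetected.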

2 of 4
14

The HTTP headers for this URL have now been fixed.

>>> import requests
>>> print requests.__version__
2.5.1
>>> r = requests.get('http://www.wrcc.dri.edu/WRCCWrappers.py?sodxtrmts+028815+por+por+pcpn+none+mave+5+01+F')
>>> r.text[:100]
u'\n<!DOCTYPE html>\n<HTML>\n<HEAD><TITLE>Monthly Average of Precipitation, Station id: 028815</TITLE></H'
>>> r.headers
{'content-length': '3672', 'content-encoding': 'gzip', 'vary': 'Accept-Encoding', 'keep-alive': 'timeout=5, max=100', 'server': 'Apache', 'connection': 'Keep-Alive', 'date': 'Thu, 12 Feb 2015 18:59:37 GMT', 'content-type': 'text/html; charset=utf-8'}