The server in question is giving you a gzipped response. The server is also very broken; it sends the following headers:

$ curl -D - -o /dev/null -s -H 'Accept-Encoding: gzip, deflate' http://www.wrcc.dri.edu/WRCCWrappers.py?sodxtrmts+028815+por+por+pcpn+none+mave+5+01+F
HTTP/1.1 200 OK
Date: Tue, 06 Jan 2015 17:46:49 GMT
Server: Apache
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 3659
Content-Type: text/html

The <!DOCTYPE..> line there is not a valid HTTP header, so every header after Server is ignored. Why the server inserts it is unclear; in all likelihood WRCCWrappers.py is a CGI script that doesn't output proper headers but does print a double newline after the doctype line, duping Apache into treating that line as a header and inserting its own additional headers after it.
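
To see why the headers after the DOCTYPE line disappear, here is a minimal sketch that feeds the header block from the curl output above to the standard library's email.parser, which Python 3's http.client uses for header parsing. The parser stops at the first line that is neither a "Name: value" header nor a continuation line, records a defect, and treats the rest as body. The parse_header_block helper is illustrative, not part of any library.

```python
from email.parser import Parser

# Header block from the curl output above (status line omitted). The DOCTYPE
# line is not a valid "Name: value" header.
RAW_HEADERS = (
    "Date: Tue, 06 Jan 2015 17:46:49 GMT\r\n"
    "Server: Apache\r\n"
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">\r\n'
    "Vary: Accept-Encoding\r\n"
    "Content-Encoding: gzip\r\n"
    "Content-Length: 3659\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)


def parse_header_block(raw: str):
    """Parse a raw header block with email.parser, similar to what http.client does."""
    return Parser().parsestr(raw)


msg = parse_header_block(RAW_HEADERS)
print(msg.keys())               # ['Date', 'Server']
print(msg["Content-Encoding"])  # None: the gzip header ended up in the "body"
print(msg.defects)              # records a MissingHeaderBodySeparatorDefect
```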

As a result, requests doesn't detect that the data is gzip-encoded either. The compressed data is all there, and you could decode it yourself, if it weren't rather incomplete.
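
If you do end up holding the raw gzip bytes (for instance r.content when the Content-Encoding header was lost), a sketch like the following checks for the gzip magic number and decompresses the body itself. The helper name maybe_gunzip is hypothetical. It uses zlib.decompressobj instead of gzip.decompress so that a truncated stream, like the incomplete one here, yields whatever could be recovered rather than raising EOFError.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def maybe_gunzip(body: bytes) -> bytes:
    """Return body gunzipped if it starts with the gzip magic number, else unchanged.

    A zlib decompressobj is used instead of gzip.decompress so that a truncated
    stream returns the bytes recovered so far rather than raising EOFError.
    Corrupt (not merely truncated) data still raises zlib.error.
    """
    if not body.startswith(GZIP_MAGIC):
        return body
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decoder.decompress(body)


# Usage: a complete stream round-trips, a truncated one yields a prefix.
original = b"<html>" + b"precipitation data " * 200 + b"</html>"
compressed = gzip.compress(original)
print(maybe_gunzip(compressed) == original)                 # True
partial = maybe_gunzip(compressed[: len(compressed) // 2])
print(original.startswith(partial))                         # True
```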

The work-around is to tell the server not to bother with compression:

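import requests
url = 'http://www.wrcc.dri.edu/WRCCWrappers.py?sodxtrmts+028815+por+por+pcpn+none+mave+5+01+F'
headers = {'Accept-Encoding': 'identity'}  # ask the server not to compress the body
r = requests.get(url, headers=headers)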

and an uncompressed response is returned.
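
The same workaround using only the standard library might look like this sketch (the helper names are illustrative, and the 2015 URL from the curl example may no longer respond the same way). Note that http.client already sends Accept-Encoding: identity when no Accept-Encoding header is given, so setting it explicitly mainly documents the intent.

```python
import urllib.request

# URL from the curl example above; the server may behave differently today.
URL = ("http://www.wrcc.dri.edu/WRCCWrappers.py"
       "?sodxtrmts+028815+por+por+pcpn+none+mave+5+01+F")


def build_uncompressed_request(url: str) -> urllib.request.Request:
    """Build a GET request that explicitly asks for an uncompressed body."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})


def fetch_uncompressed(url: str, timeout: float = 10.0) -> bytes:
    """Fetch url and return the raw body bytes (needs network access)."""
    with urllib.request.urlopen(build_uncompressed_request(url), timeout=timeout) as resp:
        return resp.read()


req = build_uncompressed_request(URL)
# urllib.request stores header names via str.capitalize(), hence "Accept-encoding".
print(req.get_header("Accept-encoding"))  # identity
```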

Incidentally, on Python 2 the HTTP header parser is less strict and treats the doctype line as a header:

>>> pprint(dict(r.headers))
{'<!doctype html public "-//w3c//dtd xhtml 1.0 transitional//en" "dtd/xhtml1-transitional.dtd"><html xmlns="http': '//www.w3.org/1999/xhtml" lang="en-US">',
 'connection': 'Keep-Alive',
 'content-encoding': 'gzip',
 'content-length': '3659',
 'content-type': 'text/html',
 'date': 'Tue, 06 Jan 2015 17:42:06 GMT',
 'keep-alive': 'timeout=5, max=100',
 'server': 'Apache',
 'vary': 'Accept-Encoding'}

and the Content-Encoding header survives, so on Python 2 requests decodes the content for you, as expected.
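
The odd header name in that output comes from splitting the DOCTYPE line at its first colon, the one inside "http://". The sketch below emulates that lenient behaviour; it is an illustration, not the actual Python 2 parser code, and lenient_parse is a hypothetical name.

```python
def lenient_parse(lines):
    """Split each line at its first colon and lower-case the name.

    Illustration only: it mimics the effect seen in the Python 2 output above,
    not the actual Python 2 header-parsing code.
    """
    headers = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if sep:  # any line containing a colon is accepted as a header
            headers[name.strip().lower()] = value.strip()
    return headers


lines = [
    "Server: Apache",
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">',
    "Content-Encoding: gzip",
]
for name, value in lenient_parse(lines).items():
    print(repr(name), "->", repr(value))
```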

Answer from Martijn Pieters on Stack Overflow

Another answer:

The HTTP headers for this URL have now been fixed.

>>> import requests
>>> print requests.__version__
2.5.1
>>> r = requests.get('http://www.wrcc.dri.edu/WRCCWrappers.py?sodxtrmts+028815+por+por+pcpn+none+mave+5+01+F')
>>> r.text[:100]
u'\n<!DOCTYPE html>\n<HTML>\n<HEAD><TITLE>Monthly Average of Precipitation, Station id: 028815</TITLE></H'
>>> r.headers
{'content-length': '3672', 'content-encoding': 'gzip', 'vary': 'Accept-Encoding', 'keep-alive': 'timeout=5, max=100', 'server': 'Apache', 'connection': 'Keep-Alive', 'date': 'Thu, 12 Feb 2015 18:59:37 GMT', 'content-type': 'text/html; charset=utf-8'}
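
With the headers fixed, r.text is plain HTML. As a small follow-up, this sketch pulls the page title out with the standard library's html.parser; the sample string reuses the start of r.text shown above, closed off by hand because the transcript truncates it at 100 characters. TitleParser and extract_title are illustrative names; for anything beyond a quick lookup, a full parser such as BeautifulSoup is more convenient.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names lower-cased, so <TITLE> arrives as "title".
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)  # text may arrive in several pieces

    @property
    def title(self):
        return "".join(self._chunks).strip()


def extract_title(html_text: str) -> str:
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


sample = ("\n<!DOCTYPE html>\n<HTML>\n<HEAD><TITLE>Monthly Average of Precipitation, "
          "Station id: 028815</TITLE></HEAD>\n</HTML>")
print(extract_title(sample))  # Monthly Average of Precipitation, Station id: 028815
```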

requests-HTML v0.3.4 documentation (requests.readthedocs.io)
Change response encoding and replace it by a HTMLResponse. ... Pass in all the coroutines you want to run, it will wrap each one in a task, run it and wait for the result. Return a list with all results, this is returned in the same order coros are passed in. ... Send a given PreparedRequest. ... Requests-HTML intends to make parsing HTML (e.g.

Python getting HTML content via 'requests' returns partial response - Stack Overflow (stackoverflow.com)
Some sites use the User-Agent header to tell what kind of user is visiting (desktop or mobile) and serve the response accordingly (as is probably the case here) ... You can use Python's mechanize module to mimic a browser and fool a website (this comes in handy when the site is using some sort of authentication ...

Strange HTML code after parsing via requests. What is it and how to deal? (r/learnpython, May 5, 2023)
Erm... I get this. I think you're looking at the window.__INIT_CONFIG__ variable at the bottom.

Scraping Using API, website still returns html output instead of JSON data (r/learnpython, October 22, 2022)
[SOLVED] Thanks everyone here for helping. 🙏 u/Brian and I have provided the solution as below. Also, big thanks to…

Couldn't get the whole html using requests.get(url) (r/learnpython, August 20, 2020)
You can look at the Network Tab in your developer tools to see the HTTP requests being made when you search. https://i.imgur.com/8ma0Z9Y.jpg The URL it fetches the data from is massive - I cut out some of the unneeded params https://redsky.target.com/v2/plp/search/?count=96&default_purchasability_filter=true&keyword=horizon+organic+whole+milk&offset=0&pricing_store_id=1771&scheduled_delivery_store_id=1771&store_ids=1771%2C1768%2C1113%2C3374%2C1792&include_sponsored_search_v2=true&excludes=available_to_promise_qualitative%2Cavailable_to_promise_location_qualitative&key=ff457966e64d5e877fdbad070f276d18ecec4a01 You can open this URL directly and see the JSON response: https://i.imgur.com/zIDpJVz.png keyword is your search term. count=96 is the amount of results to get (96 is the max per request) - you can use the offset= to get the next "batch" / "page" The key seems to be hardcoded - and it is contained in the original html. {"apiKey":{"name":"x-api-key","value":"ff457966e64d5e877fdbad070f276d18ecec4a01"} Not sure if the other params are important - the stores ones seem to be and seem to be hardcoded too. They may change depending on if you mess around with the search settings.

Python requests.Response Object (w3schools.com)
The requests.Response() Object contains the server's response to the HTTP request. ...

Web Scraping With Python and Requests-HTML (with Example) - JC Chouinard (jcchouinard.com, June 21, 2023)
RuntimeError: Cannot use HTMLSession within an existing event loop. Here, I will make an example with Hamlet Batista’s amazing intro to Python post. Just to make sure that there is no error, I will add a try and except statement to return an error in case the code doesn’t work. We will store the response in a variable called response.

import requests
from requests_html import HTMLSession

url = "https://www.searchenginejournal.com/introduction-to-python-seo-spreadsheets/342779/"

try:
    session = HTMLSession()
    response = session.get(url)
except requests.exceptions.RequestException as e:
    print(e)

Python Requests - accessing web resources via HTTP (zetcode.com, July 20, 2019)
For more complex HTML documents, consider using a library like Beautiful Soup instead of regular expressions for more robust parsing. The Response object contains a server's response to an HTTP request. Its status_code attribute returns the HTTP status code of the response, such as 200 or 404.

Requests-HTML: An HTML parsing library in Python | by Gen. Devin DL. | Medium (medium.com, September 17, 2024)
When performing web scraping and web-page parsing, Python’s `requests` and `BeautifulSoup` libraries are commonly used tools. The `requests_html` …

requests_html — requests-HTML v0.3.4 documentation (html.python-requests.org)
Try increasing timeout") html = HTML(url=self.url, html=content.encode(DEFAULT_ENCODING), default_encoding=DEFAULT_ENCODING) self.__dict__.update(html.__dict__) self.page = page return result class HTMLResponse(requests.Response): """An HTML-enabled :class:`requests.Response <requests.Response>` object.

GitHub - psf/requests-html: Pythonic HTML Parsing for Humans™ (github.com)
>>> from requests_html import AsyncHTMLSession
>>> asession = AsyncHTMLSession()
>>> async def get_pythonorg():
...     r = await asession.get('https://python.org/')
...     return r
...
>>> async def get_reddit():
...     r = await asession.get('https://reddit.com/')
...     return r
...
>>> async def get_google():
...     r = await asession.get('https://google.com/')
...     return r
...
>>> results = asession.run(get_pythonorg, get_reddit, get_google)
>>> results # check the requests all returned a 200 (success) code
[<Response [200]>, <Response [200]>, <Response [200]>]
>>> # Each item in the results list is a response object and can be interacted with as such
>>> for result in results:
...     print(result.html.url)
...

Requests-HTML: HTML Parsing for Humans (writing Python 3)! — requests-HTML v0.3.4 documentation (requests-html.kennethreitz.org)
Returns a generator of Responses or Requests. ... Send a given PreparedRequest. ... Requests-HTML intends to make parsing HTML (e.g.

requests-HTML v0.3.4 documentation (docs.python-requests.org)
Receives a Response. Returns a generator of Responses or Requests. ... Send a given PreparedRequest. ... Requests-HTML intends to make parsing HTML (e.g.

How to Get HTML With HTTP Requests in Python | Delft Stack (delftstack.com, March 11, 2025)
The HTML content can be accessed using response.text, which contains the raw HTML as a string. Finally, we print the HTML content to the console. This method is efficient and easy to use, making it a go-to choice for many developers. Another built-in option for making HTTP requests in Python is the urllib library.

Python requests Library: How to Make HTTP Requests with Python (mimo.org)
Make a request: Use requests.get() for GET requests or requests.post() for POST requests. Check the response: The function returns a Response object. Check response.status_code to see if it was successful (200 means OK). Access the content: Use response.text for HTML/text or response.json() ...

Web Scraping With Python & Requests-HTML in 2025 | Medium (medium.com, February 23, 2025)
To solve this, requests-HTML offers a method called render(), which allows you to execute JavaScript in the background and fetch the rendered content. If you’re using Jupyter notebooks, you can use arender() for asynchronous rendering. Here’s an example of how to render JavaScript content:

# Render JavaScript content
response.html.render()
# Now you can extract the data
content = response.html.find('h1', first=True)
print(content.text)

response.text - Python requests - GeeksforGeeks (geeksforgeeks.org, April 15, 2025)
In Python’s requests library, the response.text attribute allows developers to access the content of the response returned by an HTTP request. This content is always returned as a Unicode string, making it easy to read and manipulate.
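
How response.text gets its string: requests decodes r.content using the charset from the Content-Type header, falling back to ISO-8859-1 for text/* types that declare none (and to a detected encoding in most other cases). The sketch below is a simplified illustration of the header-driven part only; the helper names are hypothetical and there is no encoding detection. It also shows why the original WRCC response (Content-Type: text/html) and the fixed one (text/html; charset=utf-8) would be decoded with different charsets.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "ISO-8859-1") -> str:
    """Return the charset parameter of a Content-Type value, or default if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default  # lower-cased, e.g. "utf-8"


def body_to_text(body: bytes, content_type: str) -> str:
    """Decode body with the header's charset (simplified: no encoding detection)."""
    return body.decode(charset_from_content_type(content_type), errors="replace")


print(charset_from_content_type("text/html"))                              # ISO-8859-1
print(charset_from_content_type("text/html; charset=utf-8"))               # utf-8
print(body_to_text("Zażółć".encode("utf-8"), "text/html; charset=utf-8"))  # Zażółć
```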

r/learnpython on Reddit: Strange HTML code after parsing via requests. What is it and how to deal? (reddit.com, May 5, 2023)

Hello everybody.

I have the following code, which extracts HTML from a Polish site:

import requests

url = "https://www.olx.pl/oferty/uzytkownik/nzuCv/"
response = requests.get(url)
print(response.text)

While the page shows normal HTML in the browser (https://imgur.com/a/6W44Jgm), and the response is encoded as UTF-8, what Python/PyCharm prints doesn't start with a DOCTYPE at all. What is it, and how do I get normal HTML code?

Here are a few lines from the very beginning of the response:

\"parentId\":453,\"name\":\"Poradniki i albumy\",\"normalizedName\":\"poradniki-i-albumy\",\"position\":7,\"viewType\":\"list\",\"iconName\":\"\",\"level\":3,\"displayOrder\":7,\"children\":[],\"path\":\"muzyka-edukacja\\u002Fksiazki\\u002Fporadniki-i-albumy\",\"type\":\"goods\",\"isAdding\":true,\"isSearch\":false,\"isOfferSeek\":false,\"privateBusiness\":true,\"photosMax\":8},\"1161\":{\"id\":1161,\"label\":\"komiksy\",\"parentId\":453,\"name\":\"Komiksy\",\"normalizedName\":\"komiksy\",\"position\":4,\"viewType\":\"list\",\"iconName\":\"\",\"level\":3,\"displayOrder\":4,\"children\":[],\"path\":\"muzyka-edukacja\\u002Fksiazki\\u002Fkomiksy\",\"type\":\"goods\",\"isAdding\":true,\"isSearch\":false,\"isOfferSeek\":false,\"privateBusiness\":true,\"photosMax\":8},\"1163\":{\"id\":1163,\"label\":\"dla-dzieci\",\"parentId\":453,\"name\":\"Dla dzieci\",\"normalizedName\":\"dla-dzieci\",\"position\":3,\"viewType\":\"list\",\"iconName\":\"\",\"level\":3,\"displayOrder\":3,\"children\":[],\"path\":\"muzyka-edukacja\\u002Fksiazki\\u002Fdla-dzieci\",\"type\":\"goods\",\"isAdding\":true,\"isSearch\":false,\"isOfferSeek\":false,\"privateBusiness\":true,\"photosMax\":8},\"1165\":{\"id\":1165,\"label\":\"czasopisma\",\"parentId\":453,\"name\":\"Czasopisma\",\"normalizedName\":\"czasopisma\",\"position\":2,\"viewType\":\"list\",\"iconName\":\"\",\"level\":3,\"displayOrder\":2,\"children\":[],\"path\":\"muzyka-edukacja\\u002Fksiazki\\u002Fczasopisma\",
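
The escaped fragment above looks like JSON that was serialized a second time into a JavaScript string, which fits the reply mentioning a window.__INIT_CONFIG__ variable. Assuming the page embeds it as window.__INIT_CONFIG__ = "..." (a hypothetical layout; the real markup may differ), recovering the data takes two json.loads calls: one for the string literal and one for the JSON inside it. This is only a sketch; the poster's own fix, requesting the JSON endpoint directly, is usually more robust.

```python
import json
import re

# Hypothetical layout: JSON serialized a second time into a JavaScript string,
# e.g. window.__INIT_CONFIG__ = "{\"1161\":{\"label\":\"komiksy\"}}";
STATE_RE = re.compile(r'window\.__INIT_CONFIG__\s*=\s*("(?:[^"\\]|\\.)*")')


def extract_embedded_state(page: str):
    """Return the Python object stored in the (assumed) window.__INIT_CONFIG__ string."""
    match = STATE_RE.search(page)
    if match is None:
        raise ValueError("window.__INIT_CONFIG__ not found")
    json_text = json.loads(match.group(1))  # decode the string literal -> JSON text
    return json.loads(json_text)            # decode the JSON text -> dicts/lists


# \\u002F in the page becomes \u002F in the JSON text, and finally "/".
page = r'<script>window.__INIT_CONFIG__ = "{\"1161\":{\"label\":\"komiksy\",\"path\":\"muzyka-edukacja\\u002Fksiazki\\u002Fkomiksy\"}}";</script>'
state = extract_embedded_state(page)
print(state["1161"]["path"])  # muzyka-edukacja/ksiazki/komiksy
```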

UPDATE:

It appears that the HTML is generated dynamically, so it's worth using the browser's Network tab to find the endpoints the front end calls. Based on that endpoint, I was able to request JSON.

Full code here: https://pastebin.com/QbtRBJgb

Because the HTML is generated dynamically, it is not possible to do this with requests alone.

It is possible with Selenium, but a browser window pops up, which is a bit annoying. Code here: https://pastebin.com/U4t8VcVf

How to parse response content from a Python requests call | LabEx (labex.io)
For parsing HTML, Python's BeautifulSoup library is an excellent tool. In this step, we'll learn how to extract information from HTML responses. First, let's install BeautifulSoup and its HTML parser: ...

import requests
from bs4 import BeautifulSoup

## Make a request to a webpage
url = "https://www.example.com"
response = requests.get(url)

## Check if the request was successful
if response.status_code == 200:
    ## Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')
    ## Extract the page title
    title = soup.title.text
    print(f"Page title: {title}")
    ## Extract all paragraphs
    par

requests-html · PyPI (pypi.org)
>>> from requests_html import AsyncHTMLSession
>>> asession = AsyncHTMLSession()
>>> async def get_pythonorg():
...     r = await asession.get('https://python.org/')
>>> async def get_reddit():
...     r = await asession.get('https://reddit.com/')
>>> ...

pip install requests-html
Published: Feb 17, 2019 · Version: 0.10.0

PyTutorial | Python Requests Response Object Guide (pytutorial.com, February 11, 2026)
The Requests Response object is your gateway to the web. It packages the server's reply into an easy-to-use Python object. You learned to check status codes, read headers, and extract content as text or JSON.