Rather than build your own using sockets etc I would use httplib Thus would get the data from the http server and parse the headers into a dictionary e.g.
import httplib
conn = httplib.HTTPConnection("www.python.org")
conn.request("GET", "/index.html")
r1 = conn.getresponse()
headers = r1.getheaders()
print(headers)
gives
[('content-length', '16788'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_wsgi/2.5 Python/2.5.2'), ('last-modified', 'Mon, 15 Feb 2010 07:30:46 GMT'), ('etag', '"105800d-4194-47f9e9871d580"'), ('date', 'Mon, 15 Feb 2010 21:34:18 GMT'), ('content-type', 'text/html')]
and methods for put to send a dictionary as part of a request.
Answer from mmmmmm on Stack OverflowLet's say I have a POST request that looks something like:
Host: www.domain.com Cookie: sessions=103232-34324432-524542....34134141 Content-Length: 55
... goes on in this format
I would like a script to quickly transform that into the following format:
headers = {}
headers['Host'] = 'www.domain.com'
headers['Cookie'] = 'sessions=103232-34324432-524542....34134141' headers['Content-Length'] = '55' ... for every line in a file. Or perhaps simply a function capable of interpreting it.
I have found this snippet online:
dict([[h.partition(':')[0], h.partition(':')[2]] for h in rawheaders.split('\n')])I think it might contains some clues.
Rather than build your own using sockets etc I would use httplib Thus would get the data from the http server and parse the headers into a dictionary e.g.
import httplib
conn = httplib.HTTPConnection("www.python.org")
conn.request("GET", "/index.html")
r1 = conn.getresponse()
headers = r1.getheaders()
print(headers)
gives
[('content-length', '16788'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_wsgi/2.5 Python/2.5.2'), ('last-modified', 'Mon, 15 Feb 2010 07:30:46 GMT'), ('etag', '"105800d-4194-47f9e9871d580"'), ('date', 'Mon, 15 Feb 2010 21:34:18 GMT'), ('content-type', 'text/html')]
and methods for put to send a dictionary as part of a request.
In case you don't find any library solving the problem, here's a naive, untested solution:
def fold(header):
line = "%s: %s" % (header[0], header[1])
if len(line) < 998:
return line
else: #fold
lines = [line]
while len(lines[-1]) > 998:
split_this = lines[-1]
#find last space in longest chunk admissible
split_here = split_this[:998].rfind(" ")
del lines[-1]
lines = lines + [split_this[:split_here]),
split_this[split_here:])] #this may still be too long
#hence the while on lines[-1]
return "\n".join(lines)
def dict2header(data):
return "\n".join((fold(header) for header in data.items()))
def header2dict(data):
data = data.replace("\n ", " ").splitlines()
headers = {}
for line in data:
split_here = line.find(":")
headers[line[:split_here]] = line[split_here:]
return headers
Get a header with Python and convert in JSON (requests - urllib2 - json) - Stack Overflow
Sending header requests via dictionary in python - Stack Overflow
How to print out http-response header in Python - Stack Overflow
Requests: Pretty print http response headers
There are more than a couple ways to encode headers as JSON, but my first thought would be to convert the headers attribute to an actual dictionary instead of accessing it as requests.structures.CaseInsensitiveDict
import requests, json
r = requests.get("https://www.python.org/")
rh = json.dumps(r.headers.__dict__['_store'])
print rh
{'content-length': ('content-length', '45474'), 'via': ('via', '1.1 varnish'), 'x-cache': ('x-cache', 'HIT'), 'accept-ranges': ('accept-ranges', 'bytes'), 'strict-transport-security': ('strict-transport-security', 'max-age=63072000; includeSubDomains'), 'vary': ('vary', 'Cookie'), 'server': ('server', 'nginx'), 'x-served-by': ('x-served-by', 'cache-iad2132-IAD'), 'x-cache-hits': ('x-cache-hits', '1'), 'date': ('date', 'Wed, 02 Jul 2014 14:13:37 GMT'), 'x-frame-options': ('x-frame-options', 'SAMEORIGIN'), 'content-type': ('content-type', 'text/html; charset=utf-8'), 'age': ('age', '1483')}
Depending on exactly what you want on the headers you can specifically access them after this, but this will give you all the information contained in the headers, if in a slightly different format.
If you prefer a different format, you can also convert your headers to a dictionary:
import requests, json
r = requests.get("https://www.python.org/")
print json.dumps(dict(r.headers))
{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-at50-ATL", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:15 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "951"}
If you are only interested in the header, make a head request. convert the CaseInsensitiveDict in a dict object and then convert it to json.
import requests
import json
r = requests.head('https://www.python.org/')
rh = dict(r.headers)
json.dumps(rh)
Update: Based on comment of OP, that only the response headers are needed. Even more easy as written in below documentation of Requests module:
We can view the server's response headers using a Python dictionary:
Copy>>> r.headers
{
'content-encoding': 'gzip',
'transfer-encoding': 'chunked',
'connection': 'close',
'server': 'nginx/1.0.4',
'x-runtime': '148ms',
'etag': '"e1ca502697e5c9317743dc078f67693f"',
'content-type': 'application/json'
}
And especially the documentation notes:
The dictionary is special, though: it's made just for HTTP headers. According to RFC 7230, HTTP Header names are case-insensitive.
So, we can access the headers using any capitalization we want:
and goes on to explain even more cleverness concerning RFC compliance.
The Requests documentation states:
Using Response.iter_content will handle a lot of what you would otherwise have to handle when using Response.raw directly. When streaming a download, the above is the preferred and recommended way to retrieve the content.
It offers as example:
Copy>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
But also offers advice on how to do it in practice by redirecting to a file etc. and using a different method:
Using Response.iter_content will handle a lot of what you would otherwise have to handle when using Response.raw directly
Here's how you get just the response headers using the requests library like you mentioned (implementation in Python3):
Copyimport requests
url = "https://www.google.com"
response = requests.head(url)
print(response.headers) # prints the entire header as a dictionary
print(response.headers["Content-Length"]) # prints a specific section of the dictionary
It's important to use .head() instead of .get() otherwise you will retrieve the whole file/page like the rest of the answers mentioned.
If you wish to retrieve a URL that requires authentication you can replace the above response with this:
Copyresponse = requests.head(url, auth=requests.auth.HTTPBasicAuth(username, password))
Edit: For post request:
import requests
import json
url = "https://jsonplaceholder.typicode.com/posts/"
payload = {
"userId": 10,
"id": 901,
"title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
"body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}
headers = {
'content-type': "application/json",
'cache-control': "no-cache",
'postman-token': "c71c65a6-07f4-a2a4-a6f8-dca3fd706a7a"
}
response = requests.request("POST", url, data=json.dumps(payload), headers=headers)
print(type(response.json()))
class 'dict'
You can use something like this:
import requests
url = "https://api.icndb.com/jokes/random"
headers = {
'cache-control': "no-cache",
'postman-token': "77047c8b-caed-2b2c-ab33-dbddf52a7a9f"
}
response = requests.request("GET", url, headers=headers)
print(type(response.json()))
class 'dict'
import json
response_data = json.loads(response.text)
result = response_data.get('result')
you need to deserialize response.text to have it as dict and then user .get with respective key. result is a key in above example. and response was the response of a url call.