Using Python, OAuth, and GitHub to request pages from Advent of Code
I benchmarked Python's top HTTP clients (requests, httpx, aiohttp, etc.) and open sourced it
python - How to download and write a file from Github using Requests - Stack Overflow
The Day 1 problem was easy, but it sent me off on a tangent: I wanted to retrieve the mass input data directly from the website.
import requests

URL = "https://adventofcode.com/2019/day/1/input"
r = requests.get(URL)
for x in r.iter_lines():
    print(x)

This fails with: "Puzzle inputs differ by user. Please log in to get your puzzle input."
I use my GitHub account to log in to Advent of Code, so I guess I need to use OAuth somehow, but I'm not clear on the approach to take.
GitHub's Settings → Developer Settings has three things, "GitHub Apps", "OAuth Apps", and "Personal Access Tokens", but none looks obviously like what I need.
The OAuth 1 Authentication section of the requests docs doesn't look like anything I set up on the GitHub side (YOUR_APP_KEY, YOUR_APP_SECRET, USER_OAUTH_TOKEN, USER_OAUTH_TOKEN_SECRET).
I feel like I must be mixing up different protocols and different usages. Can anyone point me directly at what I should set up to do this?
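For what it's worth, a common approach (my own assumption, not something stated in this thread): Advent of Code keeps you logged in with a "session" cookie rather than an OAuth token you mint yourself, so after logging in via GitHub you can copy that cookie's value from your browser's developer tools and attach it with requests — no OAuth app needed. A minimal sketch (the cookie value is a placeholder):

```python
import requests

# Placeholder: paste the value of your "session" cookie from the browser's
# developer tools after logging in to adventofcode.com via GitHub.
SESSION_COOKIE = "paste-your-session-cookie-here"
URL = "https://adventofcode.com/2019/day/1/input"

session = requests.Session()
session.cookies.set("session", SESSION_COOKIE)

# Build the request without sending it, just to show the cookie is attached;
# calling session.get(URL) would then return your personal puzzle input.
prepared = session.prepare_request(requests.Request("GET", URL))
print(prepared.headers["Cookie"])  # session=paste-your-session-cookie-here
```

With a real cookie value, `session.get(URL).text` should return the puzzle input instead of the login error.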
Hey folks
I’ve been working on a Python-heavy project that fires off tons of HTTP requests… and I started wondering:
Which HTTP client should I actually be using?
So I went looking for up-to-date benchmarks comparing requests, httpx, aiohttp, urllib3, and pycurl.
And... I found almost nothing. A few GitHub issues, some outdated blog posts, but nothing that benchmarks them all in one place — especially not including TLS handshake timings.
What My Project Does
This project benchmarks Python's most popular HTTP libraries — requests, httpx, aiohttp, urllib3, and pycurl — across key performance metrics like:
- Requests per second
- Total request duration
- Average connection time
- TLS handshake latency (where supported)
It runs each library multiple times with randomized order to minimize bias, logs results to CSV, and provides visualizations with pandas + seaborn.
GitHub repo: 👉 https://github.com/perodriguezl/python-http-libraries-benchmark
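The randomized-order idea described above can be sketched like this (my own minimal sketch, not the repo's actual code; each client is represented as a zero-argument callable that performs one request):

```python
import random
import time

def benchmark(clients, runs=5):
    """clients: dict mapping a name to a zero-arg callable that does one request.
    Runs each client `runs` times in a shuffled order to minimize ordering bias,
    and returns the mean wall-clock duration per client."""
    results = {name: [] for name in clients}
    order = [name for name in clients for _ in range(runs)]
    random.shuffle(order)  # interleave libraries so no one benefits from warm-up
    for name in order:
        start = time.perf_counter()
        clients[name]()
        results[name].append(time.perf_counter() - start)
    return {name: sum(ts) / len(ts) for name, ts in results.items()}
```

In a real run the callables would be e.g. `lambda: requests.get(url)` versus an httpx or pycurl equivalent.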
Target Audience
This is for developers, backend engineers, researchers or infrastructure teams who:
- Work with high-volume HTTP traffic (APIs, microservices, scrapers)
- Want to understand how different clients behave in real scenarios
- Are curious about TLS overhead or latency under concurrency
It’s production-oriented in that the benchmark simulates realistic usage (not just toy code), and could help you choose the best HTTP client for performance-critical systems.
Comparison to Existing Alternatives
I looked around but couldn’t find an open source benchmark that:
- Includes all five libraries in one place
- Measures TLS handshake times
- Randomizes test order across multiple runs
- Outputs structured data + visual analytics
Most comparisons out there are outdated or incomplete — this project aims to fill that gap and provide a transparent, repeatable tool.
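The "structured data" half of that can be illustrated with a small sketch (again an illustration of the idea, not the repo's actual code): results go to CSV so pandas/seaborn can consume them later.

```python
import csv
import io

def results_to_csv(results):
    """results: list of dicts like {"library": "aiohttp", "run": 1, "seconds": 0.123}.
    Returns CSV text that pandas.read_csv (and then seaborn) can consume."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["library", "run", "seconds"])
    writer.writeheader()
    writer.writerows(results)
    return out.getvalue()
```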
Update: adding results
Results after running more than 130 benchmarks.
https://ibb.co/fVmqxfpp
https://ibb.co/HpbxKwsM
https://ibb.co/V0sN9V4x
https://ibb.co/zWZ8crzN
Best reqs/sec (almost 10 times faster than the most popular library, requests): aiohttp
Best total response time (surprisingly): httpx
Fastest connection time: aiohttp
Best TLS handshake: pycurl
The content of the file in question is included in the returned data. You are getting the full GitHub view of that file, not just the contents.
If you want to download just the file, you need to use the Raw link at the top of the page, which will be (for your example):
https://raw.githubusercontent.com/someguy/brilliant/master/somefile.txt
Note the change in domain name, and the blob/ part of the path is gone.
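That transformation is mechanical enough to script; a small helper (my own sketch, using naive string surgery that is fine for typical `user/repo/blob/branch/path` URLs):

```python
def to_raw_url(blob_url: str) -> str:
    """Turn a github.com .../blob/... page URL into its raw-content equivalent.

    Naive string replacement: swaps the domain and drops the first "/blob/"
    path segment. Good enough for normal URLs, though a repo literally named
    "blob" would trip it up.
    """
    return blob_url.replace(
        "https://github.com/", "https://raw.githubusercontent.com/", 1
    ).replace("/blob/", "/", 1)

print(to_raw_url("https://github.com/someguy/brilliant/blob/master/somefile.txt"))
# https://raw.githubusercontent.com/someguy/brilliant/master/somefile.txt
```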
To demonstrate this with the requests GitHub repository itself:
>>> import requests
>>> r = requests.get('https://github.com/kennethreitz/requests/blob/master/README.rst')
>>> 'Requests:' in r.text
True
>>> r.headers['Content-Type']
'text/html; charset=utf-8'
>>> r = requests.get('https://raw.githubusercontent.com/kennethreitz/requests/master/README.rst')
>>> 'Requests:' in r.text
True
>>> r.headers['Content-Type']
'text/plain; charset=utf-8'
>>> print(r.text)
Requests: HTTP for Humans
=========================
.. image:: https://travis-ci.org/kennethreitz/requests.png?branch=master
[... etc. ...]
You need to request the raw version of the file, from https://raw.githubusercontent.com.
See the difference:
https://raw.githubusercontent.com/django/django/master/setup.py vs. https://github.com/django/django/blob/master/setup.py
Also, you should probably add a / between your directory and the filename:
>>> from os import getcwd
>>> getcwd() + 'foo.txt'
'/Users/burhanfoo.txt'
>>> import os
>>> os.path.join(getcwd(), 'foo.txt')
'/Users/burhan/foo.txt'
Hello. Good day everyone.
I am trying to reverse engineer a major website's API using pure HTTP requests. I chose Python's requests module as my go-to tool because I'm familiar with Python. But I'm wondering: how good is requests at staying undetected and mimicking a browser? If it's a no-go, could you suggest a technology that is light on bandwidth, uses only HTTP requests without loading a browser driver, and is stealthy?
Thanks
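To illustrate the header side of this (a sketch under stated assumptions: the header values below are illustrative, not taken from any particular site), the usual first step is to send browser-like headers on a Session. One caveat worth knowing: many anti-bot systems also fingerprint the TLS handshake itself (e.g. JA3), which the requests/urllib3 stack cannot change, so headers alone may not be enough on heavily protected sites.

```python
import requests

# Browser-like headers; values are illustrative assumptions, not magic.
session = requests.Session()
session.headers.update({
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0.0.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
})

# Build (without sending) a request to inspect what would go over the wire.
req = session.prepare_request(requests.Request("GET", "https://example.com/api"))
print(req.headers["User-Agent"])
```

Calling `session.get(...)` with those headers set is then the lightweight, no-browser approach the question asks about, within the limits noted above.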