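In your first loop, you read the whole file, and the file pointer is not reset afterwards. Read the header and the remaining rows in a single pass instead:
import numpy as np
import csv
# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r'c:\MyData\BYLCsv.csv', newline='') as f:
    readdata = csv.reader(f)
    header = next(readdata)  # the first row is the header
    data = list(readdata)    # the remaining rows
print(header)
for row in data:
    print(row)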
Answer from Daniel on Stack Overflow
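Try doing everything in a single for loop:
import numpy as np
import csv
# Raw string so the backslashes in the Windows path are not treated as escapes
readdata = csv.reader(open(r'c:\MyData\BYLCsv.csv'))
data = []
for row in readdata:
    print(row)
    data.append(row)
for row in data:
    print(row)
# If the first row of the file is a header, split it off
Header = data[0]
data.pop(0)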
You are trying to iterate twice over the readdata iterator, but an iterator can be consumed only once.
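The single-pass behaviour can be seen in a small sketch. It uses an in-memory file (io.StringIO) with made-up contents in place of the CSV file on disk, so the sample data and variable names are assumptions for illustration only:

```python
import csv
import io

# Hypothetical in-memory CSV standing in for the file on disk.
sample = io.StringIO("name,score\nann,1\nbob,2\n")

reader = csv.reader(sample)
first_pass = list(reader)   # consumes every row, header included
second_pass = list(reader)  # the reader is exhausted, so nothing is left

print(first_pass)
print(second_pass)

# Rewinding the underlying file and creating a new reader allows another pass.
sample.seek(0)
third_pass = list(csv.reader(sample))
print(third_pass == first_pass)
```

Storing the rows in a list on the first pass, as the answers above do, avoids the need to rewind at all.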
How do I rotate headers to avoid detection?
Maintain a list of realistic header sets and randomly select one per request. Vary the User-Agent, Accept-Language, and Referer values. Combine this with proxy rotation for better results.
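A minimal sketch of that rotation, assuming a small hand-maintained list. The header values, the HEADER_SETS name, and the pick_headers helper are all illustrative placeholders, not values known to pass any particular site's checks:

```python
import random

# Hypothetical header sets; in practice these would be copied from real browsers.
HEADER_SETS = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.google.com/",
    },
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
        "Accept-Language": "en-GB,en;q=0.8",
        "Referer": "https://duckduckgo.com/",
    },
]


def pick_headers(rng=random):
    """Return a copy of one randomly chosen header set."""
    return dict(rng.choice(HEADER_SETS))


headers = pick_headers()
print(headers["User-Agent"])
# The chosen set would then be passed per request, e.g.
# requests.get(url, headers=headers)
```

Keeping each set internally consistent (a User-Agent paired with the Accept-Language and Referer a real browser would plausibly send) is the point of choosing whole sets rather than mixing individual values.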
Why do my headers work in testing but fail in production?
This is usually because you're sending too many requests. In testing, your low number of requests doesn't trigger extra anti-bot checks. In production, more requests trigger TLS fingerprinting. Even with perfect headers, a different TLS fingerprint will get you blocked. ScrapFly solves this by using real browser TLS profiles.
How often do I need to update my header patterns?
For sites with strong protection like Cloudflare or Datadome, you may need to update your headers every few weeks or even days. These systems change their rules all the time. This is a lot of work to maintain, and ScrapFly handles it for you.
According to the API, the headers can all be passed in with requests.get():
import requests
r = requests.get("http://www.example.com/", headers={"Content-Type":"text"})
This answer taught me that you can set headers for an entire session:
s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})
# Both 'x-test' and 'x-test2' are sent
s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})
Bonus: Sessions also handle cookies
It's all metadata for the Foobar module.
The first one is the docstring of the module, which is already explained in Peter's answer.
How do I organize my modules (source files)? (Archive)
The first line of each file should be #!/usr/bin/env python. This makes it possible to run the file as a script invoking the interpreter implicitly, e.g. in a CGI context.
Next should be the docstring with a description. If the description is long, the first line should be a short summary that makes sense on its own, separated from the rest by a newline.
All code, including import statements, should follow the docstring. Otherwise, the docstring will not be recognized by the interpreter, and you will not have access to it in interactive sessions (i.e. through obj.__doc__) or when generating documentation with automated tools.
Import built-in modules first, followed by third-party modules, followed by any changes to the path and your own modules. Especially, additions to the path and names of your modules are likely to change rapidly: keeping them in one place makes them easier to find.
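The point about docstring placement can be checked with the standard library's ast module. This is a small sketch; the two source strings and the module_docstring helper are made up for illustration:

```python
import ast

# Two hypothetical module sources: docstring first vs. import first.
docstring_first = '"""Summarize the bars."""\nimport os\n'
import_first = 'import os\n"""Summarize the bars."""\n'


def module_docstring(source):
    """Return the module docstring as documentation tools would see it, or None."""
    return ast.get_docstring(ast.parse(source))


print(module_docstring(docstring_first))  # the summary string
print(module_docstring(import_first))     # None: the string is just an expression
```

In the second case the string literal is still valid Python, but it is an ordinary expression statement rather than the module's docstring.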
Next should be authorship information. This information should follow this format:
__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley", "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "[email protected]"
__status__ = "Production"
Status should typically be one of "Prototype", "Development", or "Production". __maintainer__ should be the person who will fix bugs and make improvements if imported. __credits__ differs from __author__ in that __credits__ includes people who reported bug fixes, made suggestions, etc. but did not actually write the code.
Here you have more information, listing __author__, __authors__, __contact__, __copyright__, __license__, __deprecated__, __date__ and __version__ as recognized metadata.
I strongly favour minimal file headers, by which I mean just:
#!/usr/bin/env python # [1]
"""\
This script foos the given bars [2]
Usage: myscript.py BAR1 BAR2
"""
import os  # standard library, [3]
import sys

import requests  # 3rd party packages

import mypackage  # local source
[1] The hashbang if, and only if, this file should be able to be directly executed, i.e. run as myscript.py or myscript or maybe even python myscript.py. (The hashbang isn't used in the last case, but providing it gives users the choice of executing it either way.) The hashbang should not be included if the file is a module, intended just to be imported by other Python files.
[2] Module docstring
[3] Imports, grouped in the standard way, i.e. three groups of imports, with a single blank line between them. Within each group, imports are sorted. The final group, imports from local source, can either be absolute imports as shown, or explicit relative imports.
Everything else is a waste of time - both for the author and for subsequent maintainers. It wastes the precious visual space at the top of the file with information that is better tracked elsewhere, and is easy to get out of date and become actively misleading.
If you have legal disclaimers or licensing info, it goes into a separate file. It does not need to infect every source code file. Your copyright should be part of this. People should be able to find it in your LICENSE file, not random source code.
Metadata such as authorship and dates is already maintained by your source control. There is no need to add a less-detailed, erroneous, and out-of-date version of the same info in the file itself.
I don't believe there is any other data that everyone needs to put into all their source files. You may have some particular requirement to do so, but such things apply, by definition, only to you. They have no place in “general headers recommended for everyone”.
Python has a thing called a docstring (and PEP 8 gives some conventions on how to write Python code in general), delimited by either triple single quotes ''' or triple double quotes """, which is well suited to multi-line comments:
'''
File name: test.py
Author: Peter Test
Date created: 4/20/2013
Date last modified: 4/25/2013
Python Version: 2.7
'''
You may also use special variables later (when programming a module) that are dedicated to holding such information:
__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
"Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "[email protected]"
__status__ = "Production"
More details in answer here.
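These names are ordinary module-level variables that tools and readers look up by convention. A rough sketch of reading them back, using a throwaway module built from a string (the module name demo and its contents are hypothetical):

```python
import types

# Hypothetical module source carrying a few metadata names.
source = '''
__author__ = "Rob Knight"
__version__ = "1.0.1"
__status__ = "Production"
'''

# Build a module object in memory instead of importing a file from disk.
demo = types.ModuleType("demo")
exec(source, demo.__dict__)

# getattr with a default copes with names the module does not define.
for name in ("__author__", "__version__", "__status__", "__license__"):
    print(name, "=", getattr(demo, name, "<not set>"))
```

With a real package, the same getattr lookup would be applied to the imported module object.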
My Opinion
I use this format as I am learning: "This is more for my own sanity than a necessity."
I like consistency, so I start my files like so.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# =============================================================================
# Created By : Jeromie Kirchoff
# Created Date: Mon August 18 18:54:00 PDT 2018
# =============================================================================
"""The Module Has Been Built for..."""
# =============================================================================
# Imports
# =============================================================================
from ... import ...
<more code...>
- First line is the Shebang
  - And I know "There's no reason for most Python files to have a shebang line", but for me, I feel it lets the user know that I wrote this explicitly for python3, as on my Mac I have both python2 & python3.
- Line 2 is the encoding, again just for clarification
  - As some of us forget when we are dealing with multiple sources (APIs, databases, emails etc.)
- Line 3 is more of my own visual representation of the max 80 char.
  - I know, "Oh, gawd why?!?" Again, this allows me to keep my code within 80 chars for visual representation & readability.
- Lines 4 & 5 are just my own way of keeping track, as when working in a big group, keeping who wrote it on hand is helpful and saves a bit of time looking through your GitHub. Not relevant, again, just things I picked up for my sanity.
- Line 7 is your Docstring that is required at the top of each Python file per Flake8.
Again, this is just my preference. In a working environment you have to win everyone over to change the de facto behaviour. I could go on and on about this but we all know about it, at least in the workplace.
Header Block
- What is a header block?
- Is it just comments at the top of your code or is it something which prints when the program runs?
- Or something else?
So in this context of a university setting:
Header block or comments
Header comments appear at the top of a file. These lines typically include the filename, author, date, version number, and a description of what the file is for and what it contains. For class assignments, headers should also include such things as course name, number, section, instructor, and assignment number.
- Is it just comments at the top of your code or is it something which prints when the program runs? Or something else?
Well, this can be interpreted differently by your professor, showcase it and ask!
"If you never ask, The answer is ALWAYS No."
ie:
# Course: CS108
# Laboratory: A13
# Date: 2018/08/18
# Username: JayRizzo
# Name: Jeromie Kirchoff
# Description: My First Project Program.
If you are looking for Overkill:
or the python way using "Module Level Dunder Names"
Standard Module Level Dunder Names
__author__ = 'Jeromie Kirchoff'
__copyright__ = 'Copyright 2018, Your Project'
__credits__ = ['Jeromie Kirchoff', 'Victoria Mackie']
__license__ = 'MSU' # Makin' Shi* Up!
__version__ = '1.0.1'
__maintainer__ = 'Jeromie Kirchoff'
__email__ = '[email protected]'
__status__ = 'Prototype'
Add Your Own Custom Names:
__course__ = 'CS108'
__teammates__ = ['Jeromie Kirchoff']
__laboratory__ = 'A13'
__date__ = '2018/08/18'
__username__ = 'JayRizzo'
__description__ = 'My First Project Program.'
Then just add a little code to print if the instructor would like.
print('# ' + '=' * 78)
print('Author: ' + __author__)
print('Teammates: ' + ', '.join(__teammates__))
print('Copyright: ' + __copyright__)
print('Credits: ' + ', '.join(__credits__))
print('License: ' + __license__)
print('Version: ' + __version__)
print('Maintainer: ' + __maintainer__)
print('Email: ' + __email__)
print('Status: ' + __status__)
print('Course: ' + __course__)
print('Laboratory: ' + __laboratory__)
print('Date: ' + __date__)
print('Username: ' + __username__)
print('Description: ' + __description__)
print('# ' + '=' * 78)
End RESULT
Every time the program gets called it will show the list.
$ python3 custom_header.py
# ==============================================================================
Author: Jeromie Kirchoff
Teammates: Jeromie Kirchoff
Copyright: Copyright 2018, Your Project
Credits: Jeromie Kirchoff, Victoria Mackie
License: MSU
Version: 1.0.1
Maintainer: Jeromie Kirchoff
Email: [email protected]
Status: Prototype
Course: CS108
Laboratory: A13
Date: 2018/08/18
Username: JayRizzo
Description: My First Project Program.
# ==============================================================================
Notes: If you expand your program, just set this once in the __init__.py and you should be all set, but again, check with the professor.
If you would like the script, check out my GitHub.
Maybe this can help (if you don't want to use numpy):
headers = ['foo', 'bar', 'baz', 'other']
l = len(headers)
arr = [["xxx" for i in range(l)] for j in range(l)]
# adding top row
arr = [headers] + arr
# adding first column
headers_mod = ['Title'] + headers
new_arr = [[headers_mod[i]]+arr[i] for i in range(l+1)]
for i in new_arr:
    print(*i)
gives you the output as:
Title foo bar baz other
foo xxx xxx xxx xxx
bar xxx xxx xxx xxx
baz xxx xxx xxx xxx
other xxx xxx xxx xxx
Otherwise, when dealing with array manipulations in Python, try going with numpy or pandas, as they provide better operations, such as options for choosing an axis, transposing, etc.
numpy is excellent for tables, but for a labeled table like this pandas might be better for your needs.
Solution using numpy:
import numpy as np
# The header that i want to add
headers = ['foo', 'bar', 'baz', 'other']
ll = len(headers)+1
data = [['xxx' for _ in range(ll)] for j in range(ll)]
data = np.array(data, dtype=object)
data[0,0] = 'Title'
data[0,1:] = headers
data[1:,0] = headers
print(data)
prints
[['Title' 'foo' 'bar' 'baz' 'other']
 ['foo' 'xxx' 'xxx' 'xxx' 'xxx']
 ['bar' 'xxx' 'xxx' 'xxx' 'xxx']
 ['baz' 'xxx' 'xxx' 'xxx' 'xxx']
 ['other' 'xxx' 'xxx' 'xxx' 'xxx']]
Setting dtype to object allows your array to mix strings and other data types you might want to use. If your data is just strings then you can use 'UN' as the dtype, where N is the longest string you plan to use. (Numpy, when making an all string array automatically picks your longest string as the maximum length for the strings, which is fine unless your strings are all shorter than the headers you plan to add.)
Alternate version of the above code:
import numpy as np
# The header that i want to add
headers = ['foo', 'bar', 'baz', 'other']
# Add Title to headers to simplify later assignment
headers = ['Title'] + headers
ll = len(headers)
data = [['xxx' for _ in range(ll)] for j in range(ll)]
# dtype=object keeps 'Title' and 'other' from being truncated to 3 characters
data = np.array(data, dtype=object)
data[0,:] = headers
data[:,0] = headers
print(data)
pandas, on the other hand, is explicitly designed to handle headers
import numpy as np, pandas as pd
# The header that i want to add
headers = ['foo', 'bar', 'baz', 'other']
ll = len(headers) + 1
data = [['xxx' for _ in range(ll)] for j in range(ll)]
data = np.array(data)
data = pd.DataFrame(data[1:,1:], columns=headers, index=headers)
data.columns.name = 'Title'
data.loc['foo','bar'] = 'yes'
print(data)
print('')
print(data['bar'])
print('')
print(data.loc['foo',:])
prints
Title  foo  bar  baz  other
foo    xxx  yes  xxx    xxx
bar    xxx  xxx  xxx    xxx
baz    xxx  xxx  xxx    xxx
other  xxx  xxx  xxx    xxx

foo      yes
bar      xxx
baz      xxx
other    xxx
Name: bar, dtype: object

Title
foo      xxx
bar      yes
baz      xxx
other    xxx
Name: foo, dtype: object
Hey everyone. I am trying to create a web scraping program. However, I am facing some difficulties with getting valid request headers. These headers are initialized randomly by JavaScript scripts when the website is loaded. I am able to get these scripts, but I want to be able to run them in my script just like a browser would. What can I use to do this? I was looking at requests-html. What do you guys think of this? Is there something better than this? Below is a piece of code I was trying in order to run the script, but it didn't work. My method is probably wrong, but I want to know how to run a script that is returned when I do a GET request.
from requests_html import HTMLSession
sess = HTMLSession()
r = sess.get('url_of_javascript_function')
r.html.render()