You need to get the string of each individual link, not of the whole result set.
Loop over the set and fetch .string per element:
[link.string for link in item.find_all('a')]
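A minimal self-contained sketch of that comprehension, using invented markup in place of the question's `item` (the `<div>` and link text here are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for `item` from the question.
html = '<div id="item"><a href="/a">First link</a><a href="/b">Second link</a></div>'
item = BeautifulSoup(html, 'html.parser').find('div', id='item')

# find_all() returns a ResultSet (a list of Tags); take .string per element.
strings = [link.string for link in item.find_all('a')]
print(strings)  # ['First link', 'Second link']
```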
Answer from Martijn Pieters on Stack Overflow
While attempting to iterate over a Beautiful Soup ResultSet, an exception is thrown: TypeError: 'ResultSet' object cannot be interpreted as an integer
You can extract tag attributes like a dictionary, and access the text with the .text property.
for state in state_list:
    print(state['value'].split(".")[0], state.text)
Since iteration is the most probable solution, we can use a simple one-liner to convert the ResultSet into a dictionary or a list of lists.
state_dict = {state.text: state['value'] for state in state_list}
state_lists = [[state.text, state['value']] for state in state_list]
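A runnable sketch of the dict comprehension above, assuming `state_list` came from `find_all('option')` on a `<select>` (the state markup here is invented for illustration):

```python
from bs4 import BeautifulSoup

# Invented <select> markup standing in for the question's state list.
html = '''<select name="state">
  <option value="AL.1">Alabama</option>
  <option value="AK.2">Alaska</option>
</select>'''
state_list = BeautifulSoup(html, 'html.parser').find_all('option')

# Attribute access works like a dict lookup; .text gives the visible label.
states = {state.text: state['value'] for state in state_list}
print(states)  # {'Alabama': 'AL.1', 'Alaska': 'AK.2'}
```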
It has to do with targeting the specific parts of your HTML. Would something like this work?
import requests
from bs4 import BeautifulSoup

response = requests.get('https://books.toscrape.com/')
# all html & css content
soup = BeautifulSoup(response.content, 'html.parser')
categories = soup.find('ul', class_='nav nav-list').find('li').find('ul').find_all('a')
category_names = []
for i in categories:
    if i:
        category_names.append(i.text.strip())
print(category_names)
You can get rid of the unwanted characters by simply stripping the text of the elements.
Solution
So you can use the strip parameter of get_text():
i.get_text(strip=True)
Example:
import requests
from bs4 import BeautifulSoup

url = 'https://books.toscrape.com/'
response = requests.get(url)
# all html & css content
soup = BeautifulSoup(response.text, 'lxml')
categories = soup.select('ul.nav.nav-list li a')
category_names = []
for i in categories:
    category_names.append(i.get_text(strip=True))
print(category_names)
Output
['Books', 'Travel', 'Mystery', 'Historical Fiction', 'Sequential Art', 'Classics', 'Philosophy', 'Romance', 'Womens Fiction', 'Fiction', 'Childrens', 'Religion', 'Nonfiction', 'Music', 'Default', 'Science Fiction', 'Sports and Games', 'Add a comment', 'Fantasy', 'New Adult', 'Young Adult', 'Science', 'Poetry', 'Paranormal', 'Art', 'Psychology', 'Autobiography', 'Parenting', 'Adult Fiction', 'Humor', 'Horror', 'History', 'Food and Drink', 'Christian Fiction', 'Business', 'Biography', 'Thriller', 'Contemporary', 'Spirituality', 'Academic', 'Self Help', 'Historical', 'Christian', 'Suspense', 'Short Stories', 'Novels', 'Health', 'Politics', 'Cultural', 'Erotica', 'Crime']
You may also take a look at your selector; it could be more specific:
soup.select('ul.nav.nav-list li a')
I tried to extract the movie names from the IMDB chart at https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm, using the BeautifulSoup and requests modules.

movies = bs.find('tbody', class_='lister-list').find_all('tr')
title = movies.find('td', class_='titleColumn').a.text

I got the following error:

AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Please help me.
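find_all() returns a ResultSet (a list of rows), so find() must be called on each row, not on the list itself. A self-contained sketch of the fix, using minimal invented markup in place of the real IMDB page:

```python
from bs4 import BeautifulSoup

# Minimal invented markup imitating the IMDB chart's table structure.
html = '''<tbody class="lister-list">
  <tr><td class="titleColumn"><a>Movie One</a></td></tr>
  <tr><td class="titleColumn"><a>Movie Two</a></td></tr>
</tbody>'''
bs = BeautifulSoup(html, 'html.parser')

movies = bs.find('tbody', class_='lister-list').find_all('tr')  # ResultSet
# Iterate over the ResultSet; each `movie` is a single <tr> Tag.
titles = [movie.find('td', class_='titleColumn').a.text for movie in movies]
print(titles)  # ['Movie One', 'Movie Two']
```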
I'm using Beautiful Soup in this exercise, and attempting to iterate over a ResultSet, which I have learned is not possible. At least not with my current code. The purpose of iterating over the ResultSet would be to gather the url name of each image, and write the url name as the filename when downloading the images. Is there a better way to achieve this?
The exercise is called Image Site Downloader, and it comes from Automate the Boring Stuff with Python (ATBS), Chapter 12. From the book:
Write a program that goes to a photo-sharing site like Flickr or Imgur, searches for a category of photos, and then downloads all the resulting images. You could write a program that works with any photo site that has a search feature.
#!python3
# image_downloader.py -- input a keyword and automatically download
# images from https://unsplash.com/
# sys.argv[1] = keyword to search for
import requests
import os
import sys
import bs4
import lxml

def make_photo_dir():
    os.makedirs(f'{sys.argv[1]}', exist_ok=True)

def get_site_html_contents() -> bs4.element.ResultSet:
    res = requests.get('https://unsplash.com/s/photos/' + sys.argv[1])
    res.raise_for_status()  # crash program if url not valid
    return bs4.BeautifulSoup(res.text, "lxml").select('.YVj9w')

def download_images(html_contents: bs4.element.ResultSet) -> None:
    make_photo_dir()  # create dir using command line argument
    for url in len(range(html_contents)):
        image_url = html_contents.get('src')
        res = requests.get(image_url)
        res.raise_for_status()
        image_file = open(os.path.join(sys.argv[1], os.path.basename(url)), 'wb')
        for chunk in res.iter_content(100000):
            image_file.write(chunk)
        image_file.close()

def main():
    download_images(get_site_html_contents())

if __name__ == "__main__":
    main()

Updated and tested download_images():
def download_images(self) -> None:
    '''
    create dir to store photos using search keyword as the name. iterate over
    beautiful soup result set. use 'src' to pass url into image_url
    variable. use raise_for_status() to crash the program if url is not valid.
    except the MissingSchema exception. download photos into dir with the keyword +
    the index as the filename, + '.jpeg' as the file type.
    '''
    os.makedirs(self.keyword, exist_ok=True)  # make dir to store photos
    for index, url in enumerate(self.parse_site_html_contents()):
        image_url = url.get('src')
        try:
            res = requests.get(image_url)
            res.raise_for_status()
        except requests.exceptions.MissingSchema:
            continue
        image_file = open(os.path.join(self.keyword, self.keyword + str(index) + '.jpeg'), 'wb')
        for chunk in res.iter_content(100000):
            image_file.write(chunk)
        image_file.close()

If you have an iterable in Python, to make a list, one can simply call the list() built-in:
list(cursor.fetchall())
Note that an iterable is often just as useful as a list, and potentially more efficient as it can be lazy.
Your original code fails as it doesn't make too much sense. You loop over the rows and enumerate them, so you get (0, first_row), (1, second_row), etc... - this means you are building up a list of the nth item of each nth row, which isn't what you wanted at all.
This code shows some problems - firstly, list() without any arguments is generally better replaced with an empty list literal ([]), as it's easier to read.
Next, you are trying to loop by index, this is a bad idea in Python. Loop over values, themselves, not indices you then use to get values.
Also note that when you do need to build a list of values like this, a list comprehension is the best way to do it, rather than creating a list, then appending to it.
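The points above can be sketched with the stdlib's sqlite3 module (the table and data here are invented for illustration):

```python
import sqlite3

# In-memory database standing in for the question's real connection.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE tbl (id INTEGER, name TEXT)')
cursor.executemany('INSERT INTO tbl VALUES (?, ?)', [(1, 'a'), (2, 'b')])

cursor.execute('SELECT * FROM tbl')
rows = list(cursor.fetchall())  # a plain list of row tuples

# Loop over values, not indices, and prefer a comprehension over append():
names = [name for _, name in rows]
print(names)  # ['a', 'b']
```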
When I used the answer from Sudhakar Ayyar, the result was a list of lists, as opposed to the list of tuples created by .fetchall(). This was still not what I wanted. With a small change to his code, I was able to get a simple list with all the data from the SQL query:

cursor = connect_db()
query = "SELECT * FROM `tbl`"
cursor.execute(query)
result = cursor.fetchall()  # result = ((1,), (3,), (4,),) -- a tuple of 1-tuples
final_result = [i[0] for i in result]
Additionally, the last two lines can be combined into:
final_result = [i[0] for i in cursor.fetchall()]
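A runnable sketch of that combined one-liner, using an in-memory sqlite3 table in place of the question's real database connection (the table and values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE tbl (val INTEGER)')
cursor.executemany('INSERT INTO tbl VALUES (?)', [(1,), (3,), (4,)])

cursor.execute('SELECT * FROM tbl')
# fetchall() yields 1-tuples like (1,); i[0] flattens them into plain values.
final_result = [i[0] for i in cursor.fetchall()]
print(final_result)  # [1, 3, 4]
```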