What does the line del taglist[:] do in the code below?
import urllib
from bs4 import BeautifulSoup

taglist = list()
url = raw_input("Enter URL: ")
count = int(raw_input("Enter count:"))
position = int(raw_input("Enter position:"))
for i in range(count):
    print "Retrieving:", url
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)
    tags = soup('a')
    for tag in tags:
        taglist.append(tag)
    url = taglist[position-1].get('href', None)
    del taglist[:]
print "Retrieving:", url
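As a minimal sketch of what the slice deletion does on its own: del taglist[:] removes every element from the existing list object, so each pass of the loop starts with an empty taglist and taglist[position-1] indexes into the links of the current page only. Without it, the tags from earlier pages would stay at the front of the list and the same link would be picked every time. The names alias, old and alias2 below are made up for illustration and are not part of the original program.

```python
# Slice deletion empties the existing list object in place.
taglist = ['a', 'b', 'c']
alias = taglist          # a second name bound to the same list object

del taglist[:]           # remove every element; the list object itself survives

assert taglist == []
assert alias == []       # the alias sees the change: it is the same object
assert alias is taglist

# Rebinding the name, by contrast, creates a new list and leaves the old one alone.
old = ['a', 'b', 'c']
alias2 = old
old = []
assert alias2 == ['a', 'b', 'c']
```

In the program above only one name refers to the list, so taglist = [] at that point in the loop would have the same visible effect; the in-place form matters only when other references to the list exist.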
The question is: "Write a Python program that expands on http://www.pythonlearn.com/code/urllinks.py. The program will use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link, repeat the process a number of times, and report the last name you find."
Sample problem: Start at http://python-data.dr-chuck.net/known_by_Fikret.html
Find the link at position 3 (the first name is 1). Follow that link. Repeat this process 4 times. The answer is the last name that you retrieve.
Sequence of names: Fikret Montgomery Mhairade Butchi Anayah
Last name in sequence: Anayah
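The follow-the-link process described above can be sketched without touching the network. This is only an illustration under stated assumptions: it uses Python 3's standard html.parser instead of BeautifulSoup, and the PAGES dictionary, the AnchorCollector class and the follow_links function are hypothetical stand-ins, not part of the assignment or of any library.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.hrefs.append(dict(attrs).get('href'))


# Hypothetical in-memory pages standing in for real URLs.
PAGES = {
    'start.html': '<a href="a.html">A</a><a href="b.html">B</a>',
    'a.html': '<a href="start.html">Start</a><a href="b.html">B</a>',
    'b.html': '<a href="a.html">A</a><a href="start.html">Start</a>',
}


def follow_links(url, count, position, fetch=PAGES.get):
    """Follow the link at `position` (1-based) `count` times; return all pages visited."""
    visited = [url]
    for _ in range(count):
        parser = AnchorCollector()       # fresh collector per page, so no stale links
        parser.feed(fetch(url))
        url = parser.hrefs[position - 1]
        visited.append(url)
    return visited


path = follow_links('start.html', count=3, position=2)
assert path == ['start.html', 'b.html', 'start.html', 'b.html']
```

Creating a new AnchorCollector on every pass plays the same role as del taglist[:] in the original program: the list of links always describes the page just fetched.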