2 Posted Topics
I am building a crawler+parser in Python. It has to run for roughly 20 hours. How can I modify the code so that execution pauses (before the next urllib2.urlopen) when the internet connection drops, and AUTOMATICALLY resumes with the same variable values once the connection is back …
Parsing a page with 8000+ URLs with BeautifulSoup. This is the page: [CODE]http://www.thehindubusinessline.com/cgi-bin/bl2002.pl?mainclass=03[/CODE] This is my code: [CODE]
from urllib2 import URLError, urlopen
import re
from BeautifulSoup import BeautifulSoup, SoupStrainer

def gethtml(address):
    try:
        raw = urlopen(address)
        raw = raw.read()
    except URLError:
        raw = 'Error occurred'
    return raw

dat = gethtml("http://www.thehindubusinessline.com/cgi-bin/bl2002.pl?mainclass=03")
print 'got html'
a_tag = SoupStrainer('a')
html_atag = BeautifulSoup(dat, parseOnlyThese=a_tag)
print …
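The snippet above uses the old BeautifulSoup 3 API (`parseOnlyThese`). The same link-only parse in the current bs4 package (BeautifulSoup 4, where the keyword is `parse_only`) might look like the sketch below; `extract_links` is an illustrative helper name, not from the original post:

```python
from bs4 import BeautifulSoup, SoupStrainer


def extract_links(html):
    """Collect href values from a page, parsing only <a> tags.

    A sketch using bs4: SoupStrainer('a') tells the parser to build
    soup only from anchor tags, which keeps memory and parse time
    down on pages with thousands of links.
    """
    only_a_tags = SoupStrainer('a')
    soup = BeautifulSoup(html, 'html.parser', parse_only=only_a_tags)
    return [a.get('href') for a in soup.find_all('a') if a.get('href')]
```

For example, `extract_links('<a href="x">1</a><p>no</p><a href="y">2</a>')` yields `['x', 'y']`.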
amrutraj