python - urlopen() not working after using urljoin


I am trying to open a URL and search its HTML code for a word using urlopen() (it is just a crawler). It does not work when I call it on the result of urljoin. Is there a way I can do this?

Here is my code:

    while len(urls) > 0:
        htmltext = urlopen(urls[0]).read()
        soup = BeautifulSoup(htmltext)
        for tag in soup.findAll('a', href=True):
            tag['href'] = urljoin(url1, tag['href'])
            #in_code=tag['href'].read()
            in_code = urlopen(tag['href'])
            print(in_code)
            #print(tag['href'])
            htmlcode = tag['href'].find('student')
            if htmlcode > 0:
                file.write(tag['href']+'\n')
        urls.pop()
    file.close()

This is the error I get:

    c:\python27\crawler>webcrw.py
    Traceback (most recent call last):
      File "c:\python27\crawler\webcrw.py", line 21, in <module>
        in_code = urlopen(tag['href'])
      File "c:\python27\lib\urllib.py", line 86, in urlopen
        return opener.open(url)
      File "c:\python27\lib\urllib.py", line 207, in open
        return getattr(self, name)(url)
      File "c:\python27\lib\urllib.py", line 344, in open_http
        h.endheaders(data)
      File "c:\python27\lib\httplib.py", line 954, in endheaders
        self._send_output(message_body)
      File "c:\python27\lib\httplib.py", line 814, in _send_output
        self.send(msg)
      File "c:\python27\lib\httplib.py", line 776, in send
        self.connect()
      File "c:\python27\lib\httplib.py", line 757, in connect
        self.timeout, self.source_address)
      File "c:\python27\lib\socket.py", line 553, in create_connection
        for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    IOError: [Errno socket error] [Errno 11004] getaddrinfo failed
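
The "getaddrinfo failed" error means the host name lookup for the URL passed to urlopen() failed. The post does not show which link triggered it, so the following is only a minimal sketch under an assumption: that some joined links are not fetchable http(s) URLs (for example a mailto: link, or a result with no host because the base URL has no scheme). The helper names resolve_link and page_contains and the example.com addresses are hypothetical, not part of the original code. The sketch also searches the fetched HTML for the word, whereas the code above calls find() on the URL string itself.

```python
try:
    # Python 3
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen
except ImportError:
    # Python 2.7, as used in the question
    from urlparse import urljoin, urlparse
    from urllib import urlopen


def resolve_link(base_url, href):
    """Return an absolute http(s) URL for href, or None if it is not fetchable."""
    absolute = urljoin(base_url, href)
    parts = urlparse(absolute)
    # Skip mailto:, javascript:, and results that have no host at all.
    if parts.scheme not in ('http', 'https') or not parts.netloc:
        return None
    return absolute


def page_contains(url, word):
    """Fetch url and report whether word occurs in its HTML.

    Returns False (and prints the reason) when the request fails,
    so one bad link does not stop the whole crawl.
    """
    try:
        html = urlopen(url).read()
    except IOError as exc:  # covers name-lookup and connection failures
        print('skipping %s: %s' % (url, exc))
        return False
    return word.encode('utf-8') in html
```

Inside the loop this could be used as `link = resolve_link(url1, tag['href'])`, followed by `if link and page_contains(link, 'student'): file.write(link + '\n')`. Printing the joined URL before opening it is also a quick way to see which link fails to resolve.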

