web scraping - How to scrape a link from an HTML class using Python


I am attempting to grab a link from a website: the sound of a word. The website is http://dictionary.reference.com/browse/would?s=t

I am using the following code, but the link keeps coming up blank. This is weird because I can use a similar setup to pull data for a stock. The idea is to build a program that gives the sound of a word and then asks for its spelling; it is pretty much for kids. I need to go through a list of words and get their links in the dictionary, but I am having trouble getting the link to print out. I'm using urllib and re in the code below.

import urllib
import re

words = ["would", "your", "apple", "orange"]

for word in words:
    urll = "http://dictionary.reference.com/browse/" + word + "?s=t"  # produces the link
    htmlfile = urllib.urlopen(urll)
    htmltext = htmlfile.read()
    regex = '<a class="speaker" href =>(.+?)</a>'  # pattern meant to match the tag
    pattern = re.compile(regex)
    link = re.findall(pattern, htmltext)
    print "the link for the word", word, link  # should print the link

This is the expected output for the word: http://static.sfdict.com/staticrep/dictaudio/w02/w0245800.mp3

You should fix the regular expression so that it grabs what is inside the href attribute value:

<a class="speaker" href="(.*?)" 
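
To see why the original pattern came back empty, here is a minimal offline sketch comparing the two patterns. The sample_html string is a made-up stand-in for the page markup (the real page may well differ), and the snippet uses Python 3 syntax, unlike the Python 2 code in the question.

```python
import re

# Hypothetical markup standing in for the real page; the actual HTML may differ.
sample_html = (
    '<div><a class="speaker" '
    'href="http://static.sfdict.com/staticrep/dictaudio/w02/w0245800.mp3">'
    'listen</a></div>'
)

# Pattern from the question: it looks for the literal text 'href =>',
# which never occurs in the markup, so nothing matches.
broken = re.compile(r'<a class="speaker" href =>(.+?)</a>')

# Corrected pattern: capture whatever sits between the quotes of href.
fixed = re.compile(r'<a class="speaker" href="(.*?)"')

broken_links = broken.findall(sample_html)
fixed_links = fixed.findall(sample_html)

print(broken_links)
print(fixed_links)
```

Note that even the corrected pattern only matches when class comes immediately before href and "speaker" is the only class name, which is one reason a parser tends to be more robust.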

Note that you should consider switching from regex to an HTML parser, such as BeautifulSoup.

Here is how you can apply BeautifulSoup in this case:

import urllib

from bs4 import BeautifulSoup

words = ["would", "your", "apple", "orange"]

for word in words:
    urll = "http://dictionary.reference.com/browse/" + word + "?s=t"  # produces the link
    htmlfile = urllib.urlopen(urll)

    soup = BeautifulSoup(htmlfile, "html.parser")
    # collect the href of every <a> element with class "speaker"
    links = [link["href"] for link in soup.select("a.speaker")]

    print(word, links)
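
If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library's html.parser module instead. This is a swapped-in alternative, not the approach shown above: SpeakerLinkParser and sample_html are hypothetical names and data made up for illustration, the real page markup may differ, and the snippet uses Python 3.

```python
from html.parser import HTMLParser


class SpeakerLinkParser(HTMLParser):
    """Collects href values of <a> tags whose class list includes 'speaker'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Attribute values can be None (e.g. a bare attribute), so fall back to "".
        classes = (attr_map.get("class") or "").split()
        if "speaker" in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


# Hypothetical markup: href comes before class and there is an extra class name,
# a layout the fixed-order regex would not match.
sample_html = (
    '<a href="http://static.sfdict.com/staticrep/dictaudio/w02/w0245800.mp3" '
    'class="audio speaker">listen</a>'
    '<a class="other" href="/ignored">skip</a>'
)

parser = SpeakerLinkParser()
parser.feed(sample_html)
speaker_links = parser.links
print(speaker_links)
```

Because the parser looks at attributes by name, attribute order and additional class names do not affect the result.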
