web scraping - How to scrape a link from an HTML class using Python
I am attempting to grab a link from a website: the link to the sound of a word. The website is http://dictionary.reference.com/browse/would?s=t
I am using the following code, but the link keeps coming up blank. This is weird, because I can use a similar setup to pull stock data. The idea is to build a program that plays the sound of a word and then asks for its spelling. It is for kids, pretty much. I need to go through a list of words and get the sound links from the dictionary, but I am having trouble getting the link to print out. I'm using urllib and re in the code below.
    import urllib
    import re

    words = ["would", "your", "apple", "orange"]

    for word in words:
        urll = "http://dictionary.reference.com/browse/" + word + "?s=t"  # produces the link
        htmlfile = urllib.urlopen(urll)
        htmltext = htmlfile.read()
        regex = '<a class="speaker" href =>(.+?)</a>'  # pattern for the tag
        pattern = re.compile(regex)
        link = re.findall(pattern, htmltext)
        print "the link for the word", word, link  # should print the link
This is the expected output for a word: http://static.sfdict.com/staticrep/dictaudio/w02/w0245800.mp3
You should fix your regular expression so that it grabs everything inside the href attribute value:
<a class="speaker" href="(.*?)"
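To see why the original pattern comes back empty while the corrected one matches, here is a minimal offline sketch. The `SAMPLE_HTML` string and the helper name `find_speaker_links` are assumptions made for illustration; the real page markup may differ.

```python
import re

# Hypothetical sample markup, modelled on the anchor tag described in the question.
SAMPLE_HTML = (
    '<a class="speaker" '
    'href="http://static.sfdict.com/staticrep/dictaudio/w02/w0245800.mp3">audio</a>'
)

# Original pattern: looks for the literal text `href =>`, which never occurs.
BROKEN = re.compile(r'<a class="speaker" href =>(.+?)</a>')

# Corrected pattern: captures whatever sits between the quotes of href (non-greedy).
SPEAKER_HREF = re.compile(r'<a class="speaker" href="(.*?)"')


def find_speaker_links(html):
    """Return every href value captured by the corrected pattern."""
    return SPEAKER_HREF.findall(html)


print(BROKEN.findall(SAMPLE_HTML))      # empty list: the pattern cannot match
print(find_speaker_links(SAMPLE_HTML))  # list with the .mp3 URL
```

The pattern still depends on `class` coming directly before `href` with exactly one space in between, so any change in attribute order or spacing would make it miss the tag.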
Note that you should consider switching from regex to an HTML parser, like BeautifulSoup.
Here is how you can apply BeautifulSoup in your case:
    import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead
    from bs4 import BeautifulSoup

    words = ["would", "your", "apple", "orange"]

    for word in words:
        urll = "http://dictionary.reference.com/browse/" + word + "?s=t"  # produces the link
        htmlfile = urllib.urlopen(urll)
        soup = BeautifulSoup(htmlfile, "html.parser")
        # collect the href of every <a> tag that has the "speaker" class
        links = [link["href"] for link in soup.select("a.speaker")]
        print(word, links)
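If installing bs4 is not an option, the same parser-based idea can be sketched with the standard library's `html.parser` module on Python 3. This is a swapped-in technique, not the BeautifulSoup approach above, and the class name `SpeakerLinkParser`, the helper `extract_speaker_links`, and the sample markup are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class SpeakerLinkParser(HTMLParser):
    """Collects href values of <a> tags whose class list includes "speaker"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attr_map.get("class") or "").split()
        if "speaker" in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def extract_speaker_links(html):
    """Parse an HTML string and return the speaker links found in it."""
    parser = SpeakerLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: href comes before class, which the regex would not match.
SAMPLE_HTML = (
    '<div><a href="http://static.sfdict.com/staticrep/dictaudio/w02/w0245800.mp3" '
    'class="speaker audio">listen</a><a href="/other">other</a></div>'
)

print(extract_speaker_links(SAMPLE_HTML))
```

Because the parser works on parsed attributes and not on raw text, it does not care about attribute order, extra classes, or spacing inside the tag.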