web scraping - How to scrape a link from an HTML class using Python

September 15, 2010

I am attempting to grab a link from a website: the link to the sound of a word. The website is http://dictionary.reference.com/browse/would?s=t

I am using the following code, but the link keeps coming up blank. This is weird because I can use a similar setup to pull data for a stock. The idea is to build a program that plays the sound of a word and then asks for its spelling, pretty much for kids. I need to go through a list of words and get their links from the dictionary, but I am having trouble getting the link to print out. I'm using urllib and re in the code below.

import urllib
import re

words = ["would", "your", "apple", "orange"]

for word in words:
    urll = "http://dictionary.reference.com/browse/" + word + "?s=t"  # produces the link
    htmlfile = urllib.urlopen(urll)
    htmltext = htmlfile.read()
    regex = '<a class="speaker" href =>(.+?)</a>'  # puts in the tag
    pattern = re.compile(regex)
    link = re.findall(pattern, htmltext)
    print "the link for word", word, link  # should print the link

This is the expected output for the word "would": http://static.sfdict.com/staticrep/dictaudio/w02/w0245800.mp3

You should fix your regular expression so that it grabs what is inside the href attribute value:

<a class="speaker" href="(.*?)"
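To see why the corrected pattern matters, here is a minimal sketch that runs both patterns against a hard-coded HTML fragment. The `sample_html` string is a hypothetical stand-in for the page markup (the real page may differ), and the sketch is written for Python 3 rather than the Python 2 used in the question.

```python
import re

# Hypothetical fragment shaped like the anchor tag the question targets;
# the real page markup is an assumption here.
sample_html = (
    '<div><a class="speaker" '
    'href="http://static.sfdict.com/staticrep/dictaudio/w02/w0245800.mp3">'
    'listen</a></div>'
)

# Pattern from the question: it looks for the literal text 'href =>',
# which never occurs in the markup, so nothing is captured.
old_links = re.findall(r'<a class="speaker" href =>(.+?)</a>', sample_html)

# Corrected pattern: non-greedy capture of the text between the quotes
# of the href attribute.
new_links = re.findall(r'<a class="speaker" href="(.*?)"', sample_html)

print(old_links)
print(new_links)
```

The corrected pattern still depends on the exact attribute order and spacing in the tag, which is one reason a parser tends to be more robust.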

Note that you should consider switching from regex to an HTML parser, such as BeautifulSoup.

Here is how you can apply BeautifulSoup in this case:

import urllib
from bs4 import BeautifulSoup

words = ["would", "your", "apple", "orange"]

for word in words:
    urll = "http://dictionary.reference.com/browse/" + word + "?s=t"  # produces the link
    htmlfile = urllib.urlopen(urll)

    # parse the page and collect the href of every <a class="speaker"> tag
    soup = BeautifulSoup(htmlfile, "html.parser")
    links = [link["href"] for link in soup.select("a.speaker")]

    print(word, links)
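If installing bs4 is not an option, roughly the same parser-based idea can be sketched with the standard library's `html.parser` module in Python 3. This is a swapped-in alternative, not the approach from the answer above; `SpeakerLinkParser` and `sample_html` are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class SpeakerLinkParser(HTMLParser):
    """Collects href values from <a> tags whose class list includes 'speaker'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # A class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        if "speaker" in classes and attributes.get("href"):
            self.links.append(attributes["href"])


# Hypothetical markup: attribute order and extra classes differ from what
# the regex expects, yet the parser still finds the link.
sample_html = (
    '<a href="/other">skip me</a>'
    '<a href="http://static.sfdict.com/staticrep/dictaudio/w02/w0245800.mp3" '
    'class="audio speaker">listen</a>'
)

parser = SpeakerLinkParser()
parser.feed(sample_html)
print(parser.links)
```

Unlike the regular expression, this does not care whether `href` comes before or after `class`, or whether the tag carries additional classes.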
