html - How to scrape data from a table using a loop to get all td data using Python


I am trying to scrape data from a website, and I'm having a hard time getting the data. I can get the player names, and that's about it at this point. I have been trying different things and keep coming up short. Here is the sample code I'm trying to go through. Note that there are 2 tables (one for each team), and the class for each player row alternates between "even" and "odd" (or "odd" and "even"). An example of the HTML is below, followed by my Python script. I have labeled the parts I want. I am using Python 2.7.

    <table id="nbagiteamstats" cellpadding="0" cellspacing="0">
      <thead class="nbagiclippers">
        <tr>
          <th colspan="17">los angeles clippers (1-0)</th> <!-- I want the team name -->
        </tr>
      </thead>
      <tbody><tr colspan="17">
        <td colspan="17" class="nbagiboxcat"><span>field goals</span><span>rebounds</span></td>
      </tr>
      <tr>
        <td class="nbagiteamhdrstatsnobord" colspan="1">&nbsp;</td>
        <td class="nbagiteamhdrstats">pos</td>
        <td class="nbagiteamhdrstats">min</td>
        <td class="nbagiteamhdrstats">fgm-a</td>
        <td class="nbagiteamhdrstats">3pm-a</td>
        <td class="nbagiteamhdrstats">ftm-a</td>
        <td class="nbagiteamhdrstats">+/-</td>
        <td class="nbagiteamhdrstats">off</td>
        <td class="nbagiteamhdrstats">def</td>
        <td class="nbagiteamhdrstats">tot</td>
        <td class="nbagiteamhdrstats">ast</td>
        <td class="nbagiteamhdrstats">pf</td>
        <td class="nbagiteamhdrstats">st</td>
        <td class="nbagiteamhdrstats">to</td>
        <td class="nbagiteamhdrstats">bs</td>
        <td class="nbagiteamhdrstats">ba</td>
        <td class="nbagiteamhdrstats">pts</td>
      </tr>
      <tr class="odd">
        <td id="nbagiboxnme" class="b"><a href="/playerfile/paul_pierce/index.html">p. pierce</a></td> <!-- I want the player name -->
        <td class="nbagiposition">f</td> <!-- I want the position -->
        <td>14:16</td> <!-- I want this -->
        <td>1-4</td> <!-- I want this -->
        <td>1-2</td> <!-- I want this -->
        <td>2-2</td> <!-- I want this -->
        <td>+12</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>0</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>3</td> <!-- I want this -->
        <td>2</td> <!-- I want this -->
        <td>0</td> <!-- I want this -->
        <td>0</td> <!-- I want this -->
        <td>0</td> <!-- I want this -->
        <td>5</td> <!-- I want this -->
      </tr>
      <tr class="even">
        <td id="nbagiboxnme" class="b"><a href="/playerfile/blake_griffin/index.html">b. griffin</a></td> <!-- I want this -->
        <td class="nbagiposition">f</td> <!-- I want this -->
        <td>26:19</td> <!-- I want this -->
        <td>5-14</td> <!-- I want this -->
        <td>0-1</td> <!-- I want this -->
        <td>1-1</td> <!-- I want this -->
        <td>+14</td> <!-- I want this -->
        <td>0</td> <!-- I want this -->
        <td>5</td> <!-- I want this -->
        <td>5</td> <!-- I want this -->
        <td>2</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>11</td> <!-- I want this -->
      </tr>
      <tr class="odd">
        <td id="nbagiboxnme" class="b"><a href="/playerfile/deandre_jordan/index.html">d. jordan</a></td> <!-- I want this -->
        <td class="nbagiposition">c</td> <!-- I want this -->
        <td>26:27</td> <!-- I want this -->
        <td>6-7</td> <!-- I want this -->
        <td>0-0</td> <!-- I want this -->
        <td>3-5</td> <!-- I want this -->
        <td>+19</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>11</td> <!-- I want this -->
        <td>12</td> <!-- I want this -->
        <td>0</td> <!-- I want this -->
        <td>1</td> <!-- I want this -->
        <td>0</td> <!-- I want this -->
        <td>2</td> <!-- I want this -->
        <td>3</td> <!-- I want this -->
        <td>0</td> <!-- I want this -->
        <td>15</td> <!-- I want this -->
      </tr>
      <!-- and so on; the rows keep alternating class: odd, even, odd -->
      <!-- note there are two tables, one for each team -->
      <!-- this is the table id >>> <table id="nbagiteamstats" cellpadding="0" cellspacing="0"> -->

I know this is long, but I wanted to give an example of the classes switching. Here is my Python script. I plan to use a dictionary to save the data once I can scrape it successfully.

    import urllib
    import urllib2
    from bs4 import BeautifulSoup
    import re

    gamesforday = ['/games/20151002/denlac/gameinfo.html']
    for game in gamesforday:
        url = "http://www.nba.com/" + game
        page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(page)
        for tr in soup.find_all('table id="nbagiteamstats'):
            tds = tr.find_all('td')
            print tds

Here is a solution. Note that I have a different version of BeautifulSoup, not the one that comes from bs4, but the logic should not be far off. I am still on Python 2.7 (on Windows, in case that matters).

You will need to fix some nuances for the player sections that are not displayed above, but I think you'll be able to handle that part :-)

    import urllib
    import urllib2
    # from bs4 import BeautifulSoup
    from BeautifulSoup import BeautifulSoup
    import re

    gamesforday = ['/games/20151002/denlac/gameinfo.html']
    for game in gamesforday:
        url = "http://www.nba.com/" + game
        page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(page)

        # fetch the tables we are interested in
        tables = soup.findAll(id="nbagiteamstats")
        for table in tables:
            team_name = table.thead.tr.th.text
            # keep only the odd/even class rows (tr), i.e. the player rows
            rows = [x for x in table.findAll('tr') if x.get('class', None) in ['odd', 'even']]
            for player in rows:
                # search the row's cols based on 'id'
                player_name = player.find('td', attrs={'id': 'nbagiboxnme'}).text
                # search the row's cols based on 'class'
                player_position = player.find('td', attrs={'class': 'nbagiposition'}).text
                # search for td where class is not defined
                player_numbers = [x.text for x in player.findAll('td', attrs={'class': None})]
                print player_name, player_position, player_numbers
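As a side note on why the loop in the question prints nothing: the first argument of `find_all` is treated as a tag name, so `'table id="nbagiteamstats'` is looked up as a tag with that literal name. Below is a minimal sketch of the difference. It assumes Python 3 with bs4 installed (not the Python 2.7 setup used in this thread), and the inline `html` fragment is a trimmed, made-up stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical fragment modelled on the sample HTML in the question.
html = """
<table id="nbagiteamstats">
  <thead><tr><th colspan="17">los angeles clippers (1-0)</th></tr></thead>
  <tbody>
    <tr><td class="nbagiteamhdrstats">pos</td><td class="nbagiteamhdrstats">min</td></tr>
    <tr class="odd"><td id="nbagiboxnme" class="b">p. pierce</td><td class="nbagiposition">f</td><td>14:16</td></tr>
    <tr class="even"><td id="nbagiboxnme" class="b">b. griffin</td><td class="nbagiposition">f</td><td>26:19</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The whole string is taken as a tag name, so nothing can match it.
no_match = soup.find_all('table id="nbagiteamstats')

# Tag name and attribute passed separately: this finds the table.
tables = soup.find_all("table", id="nbagiteamstats")

# bs4 returns 'class' as a list, so test for membership instead of equality.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in tables[0].find_all("tr")
    if set(tr.get("class", [])) & {"odd", "even"}
]

print(len(no_match), len(tables))
print(rows)
```

The header row carries no `odd`/`even` class, so it is skipped and only the two player rows end up in `rows`.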

With bs4 (BeautifulSoup4, as I learned), some modifications had to be made. You will still have to handle some things yourself, but this extracts most of the data you want:

    import urllib
    import urllib2
    from bs4 import BeautifulSoup
    import re

    gamesforday = ['/games/20151002/denlac/gameinfo.html']
    for game in gamesforday:
        url = "http://www.nba.com/" + game
        page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(page, "html.parser")

        # fetch the tables we are interested in
        tables = soup.find_all(id="nbagiteamstats")
        for table in tables:
            team_name = table.thead.tr.th.text
            # collect the odd/even class rows (tr): all odd rows first, then all even rows
            rows = table.find_all(attrs={'class': 'odd'})
            rows.extend(table.find_all(attrs={'class': 'even'}))

            for player in rows:
                # search the row's cols based on 'id'
                player_name = player.find('td', attrs={'id': 'nbagiboxnme'}).text
                # search the row's cols based on 'class'
                player_position = player.find('td', attrs={'class': 'nbagiposition'}).text
                # search for td where class is not defined
                player_numbers = [x.text for x in player.find_all('td', attrs={'class': None})]
                print player_name, player_position, player_numbers
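The question mentions saving the data in a dictionary once it is scraped. One possible shape, sketched below, is a dict keyed by team name whose values are lists of per-player dicts. The helper names `player_record` and `add_player` are made up for illustration, and `STAT_LABELS` is copied by hand from the header row of the sample table, on the assumption that `player_numbers` always comes back in that column order.

```python
# Column labels from the sample header row, after the name and position cells.
# Assumption: the unclassed <td> cells always follow this order.
STAT_LABELS = ["min", "fgm-a", "3pm-a", "ftm-a", "+/-", "off", "def", "tot",
               "ast", "pf", "st", "to", "bs", "ba", "pts"]


def player_record(name, position, numbers):
    """Pair each scraped value with its column label (hypothetical helper)."""
    record = {"name": name, "pos": position}
    record.update(zip(STAT_LABELS, numbers))
    return record


def add_player(stats, team_name, name, position, numbers):
    """Append one player's record under the team's key (hypothetical helper)."""
    stats.setdefault(team_name, []).append(player_record(name, position, numbers))


stats = {}
# Values copied from the p. pierce row of the sample HTML.
add_player(stats, "los angeles clippers (1-0)", "p. pierce", "f",
           ["14:16", "1-4", "1-2", "2-2", "+12", "1", "0", "1",
            "1", "3", "2", "0", "0", "0", "5"])

pierce = stats["los angeles clippers (1-0)"][0]
print(pierce["min"], pierce["pts"])
```

Inside the scraping loop, the `print` statement could be replaced by a call such as `add_player(stats, team_name, player_name, player_position, player_numbers)`. The values stay as strings here; converting entries like `1-4` into made/attempted integers is left as a separate step.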
