Selenium Scrape in Python
Sorry for the basic question, I'm still trying to learn. I'm trying to figure out a smart way of scraping stock data from the following HTML, using Selenium 2 and Python (there are multiple <tr> rows like the following on the page):

<a name="line209"></a><tr align="right" class="odd" nowrap>
<a name="line210"></a><td>&nbsp;</td>
<a name="line211"></a><td align="left"><strong>
<a name="line212"></a>bac n</strong></td>
<a name="line213"></a><td>+</td>
<a name="line214"></a><td>17.45</td>
<a name="line215"></a><td>17.49</td>
<a name="line216"></a><td><strong>17.47</strong></td>
<a name="line217"></a><td><strong><font class="fontgreen">
<a name="line218"></a>0.14 (0.81%)</font></strong></td>
<a name="line219"></a><td>81,974,096</td>
<a name="line220"></a><td align="middle"></td>
<a name="line221"></a><td>&nbsp;</td>
<a name="line222"></a></tr>

From the above code, I need to extract:
- bac n
- +
- 17.45
- 17.49
- 17.47
- 0.14 (0.81%)
- 81,974,096
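OK, the following code does what I want it to do. In the spirit of learning, I'd like to make it more efficient. I hope you can help:

# driver, quotetype and insertdata are defined elsewhere in the script
def getdata():
    tickerdata = []
    tickercounter = 0
    # Header and filler cell texts that should not be stored
    ignoretext = ['symbol', 't', 'bid', 'ask', 'last', ' ', '', 'change', 'volume', 'fsi', 'buy sell ']
    if quotetype == "summary":
        numdatapoints = 9
    elif quotetype == "detail":
        numdatapoints = 21
    # Note: this XPath matches whole tables, so tds holds every cell in a table
    for tr in driver.find_elements_by_xpath("//table[contains(@class, 'tablestyle2')]"):
        tds = tr.find_elements_by_tag_name('td')
        for td in tds:
            if td.text not in ignoretext:
                # A full set of data points means one ticker is complete
                if len(tickerdata) == numdatapoints:
                    insertdata(tickerdata, tickercounter)
                    tickerdata = []
                    tickercounter += 1
                tickerdata.append(td.text)
    # Store the last ticker collected
    insertdata(tickerdata, tickercounter)

Thanks in advance!!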
Load the HTML into a string variable called html.
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# Print the text of every <td> cell on the page
tags = soup.find_all('td')
for tag in tags:
    print(tag.get_text())

BeautifulSoup is one of many ways to parse the data. You could also use pure Python functions if you understand the basics of finding strings in Python.
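For anyone who wants to try the idea without installing anything, here is a minimal sketch that groups the cell text by row using only the standard library's html.parser. It swaps in an event-based parser rather than BeautifulSoup or manual string searching. The names RowParser, parse_rows and sample are made up for illustration. The sample is the row from the question with the view-source anchors left out, and it assumes a wrapping table and that the spacer cells hold non-breaking spaces.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect the text of every non-empty <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # one list of cell strings per <tr>
        self._row = None    # cells of the <tr> being read, or None
        self._cell = None   # text fragments of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse whitespace (including non-breaking spaces)
            text = " ".join("".join(self._cell).split())
            if text:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return a list of rows, each a list of non-empty cell texts."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Assumed sample: the question's row wrapped in a table
sample = """
<table class="tablestyle2">
<tr align="right" class="odd" nowrap>
<td>&nbsp;</td>
<td align="left"><strong>
bac n</strong></td>
<td>+</td>
<td>17.45</td>
<td>17.49</td>
<td><strong>17.47</strong></td>
<td><strong><font class="fontgreen">
0.14 (0.81%)</font></strong></td>
<td>81,974,096</td>
<td align="middle"></td>
<td>&nbsp;</td>
</tr>
</table>
"""

for row in parse_rows(sample):
    print(row)
```

Because each row comes back as its own list, there is no need to count data points to know where one ticker ends. Header rows would still have to be skipped, for example by checking the first cell against a list of known header texts.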