python - BeautifulSoup not parsing every tag of the HTML
I'm having a problem with BeautifulSoup not parsing the HTML I received. I tried both the lxml and html5lib parsers and had the same problem.
html = '<td style="vertical-align: top">1</td> <td style="vertical-align: top"><span class="ui-icon country flg-fr"></span>\t</td><td class="pn"><a class="player-link" href="/players/25604">hugo lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">, gk </span></td> <td class="shotstotal ">0\t</td><td class="shotontarget ">0\t</td><td class="keypasstotal ">0\t</td><td class="passsuccessinmatch ">88\t</td><td class="duelaerialwon ">0\t</td><td class="touches ">35\t</td><td class="rating ">6.24</td> <td style="text-align: left"><span class="incident-wrapper"></span></td> '
parsed_html = BeautifulSoup(html, 'html5lib')

ipdb> BeautifulSoup(html, 'html5lib')
<html><head></head><body>1 <span class="ui-icon country flg-fr"></span> <a class="player-link" href="/players/25604">hugo lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">, gk </span> 0 0 0 88 0 35 6.24 <span class="incident-wrapper"></span> </body></html>
It is working for me. I executed the following code (using beautifulsoup4==4.4.1):
from bs4 import BeautifulSoup

html = """
<td style="vertical-align: top">1</td>
<td style="vertical-align: top"><span class="ui-icon country flg-fr"></span>\t</td>
<td class="pn"><a class="player-link" href="/players/25604">hugo lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">, gk </span></td>
<td class="shotstotal ">0\t</td>
<td class="shotontarget ">0\t</td>
<td class="keypasstotal ">0\t</td>
<td class="passsuccessinmatch ">88\t</td>
<td class="duelaerialwon ">0\t</td>
<td class="touches ">35\t</td>
<td class="rating ">6.24</td>
<td style="text-align: left"><span class="incident-wrapper"></span></td>
"""

parsed_html = BeautifulSoup(html, 'html5lib')
# Note: this prints the original input string, not parsed_html.
print(html)
and I got the following HTML printed:
<td style="vertical-align: top">1</td>
<td style="vertical-align: top"><span class="ui-icon country flg-fr"></span> </td>
<td class="pn"><a class="player-link" href="/players/25604">hugo lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">, gk </span></td>
<td class="shotstotal ">0 </td>
<td class="shotontarget ">0 </td>
<td class="keypasstotal ">0 </td>
<td class="passsuccessinmatch ">88 </td>
<td class="duelaerialwon ">0 </td>
<td class="touches ">35 </td>
<td class="rating ">6.24</td>
<td style="text-align: left"><span class="incident-wrapper"></span></td>
I don't see anything missing.
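One way to check what a parser actually kept is to inspect the parsed tree rather than the input string, for example by counting the `td` elements it contains. The sketch below is only an illustration under a few assumptions that are not in the original posts: it uses a shortened, hypothetical version of the row, and it uses Python's built-in `html.parser` backend, which does not restructure a bare fragment into a full HTML document.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical fragment in the same shape as the row above:
# bare <td> cells with no enclosing <table> or <tr>.
html = (
    '<td class="pn"><a class="player-link" href="/players/25604">hugo lloris</a></td>'
    '<td class="touches ">35\t</td>'
    '<td class="rating ">6.24</td>'
)

# Assumption: the built-in "html.parser" backend is used here instead of
# html5lib or lxml, so the fragment is kept as written.
soup = BeautifulSoup(html, "html.parser")

# Inspect the parsed tree, not the original string.
cells = soup.find_all("td")
values = [td.get_text(strip=True) for td in cells]

print(len(cells))
print(values)
```

If html5lib is required, wrapping the fragment in `<table><tr>` and `</tr></table>` before parsing should also keep the cells, since the HTML parsing rules ignore a `td` start tag that appears outside a table context.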