xml - XPath for an <article> HTML5 tag
I'm using the =IMPORTXML function in Google Sheets to scrape information from different sites. I'm having a hard time getting the text inside an <article> tag using XPath.
Here is the source data:
<div id="blog-post-body-ad" class="ad"> </div> <article class="blog-post-body"> <p>fox's <em>x-men </em>drama <em>hellfire </em>is making change @ top.</p> <p>writers evan katz , manny coto, co-created drama, exiting, <em>the hollywood reporter </em>has learned. out patrick mckay , john d. payne, came the story drama alongside katz , coto , set pen script. search under way new writer.</p> <p>the changes come <em>hellfire </em>is on slower development track, insiders say. <em>hellfire, </em>which was <a href="http://www.hollywoodreporter.com/live-feed/fox-nears-deal-x-men-813542">considered live-action <em>x-men</em></a>, follows young special agent learns power-hungry woman extraordinary abilities working clandestine society of millionaires — known "the hellfire club" — take on world.</p> <p> <div class="embedded-content" data-nid="832221" data-nodetype="blog" data-template="readmore"> <script type="application/json"> { "nid": 832221, "type": "blog", "title": "marvel sets 'legion' pilot noah hawley @ fx, readying 'hellfire' fox", "path": "http://www.hollywoodreporter.com/live-feed/marvel-legion-noah-hawley-fx-832221", "relative-path": "/live-feed/marvel-legion-noah-hawley-fx-832221" } </script> </div></p> <p>sources <em>x-men </em>drama not go pilot season remains on slower track. change comes katz , coto shifting focus fox's <em><a href="http://www.hollywoodreporter.com/live-feed/fox-greenlights-prison-break-event-856203" target="_blank">24: legacy</a>, </em>which received formal pilot order friday during fox's time in front of press @ television critics association's winter press tour. new take on 24 feature entirely new cast diverse lead fox has high hopes reboot franchise new era.</p> <p>the change @ top should not worry diehard fans of <em>x-men </em>franchise. sources fox remains committed <em>hellfire </em>and wants right <em>x-men </em>franchise remains valuable asset company. 
should <em>hellfire</em> go series , network renew batman prequel <em>gotham, </em>the network have dramas both comic book powerhouses dc comics , marvel — first broadcast network , insiders love see on schedule.</p> <p> </p> <footer class="blog-post-tags"> <a href="/topic/tv-development" data-tracklabel="story - bottom tags tv development">tv development</a> </footer> </article> <div class="blog-post-footer-ad">
Using Google Chrome > Inspect > Copy XPath, I get:
//*[@id="page-content"]/div[1]/article
When I try this in Google Sheets, it gives me a parsing error.
I also tried the solution from another question on Stack Overflow, but it is not working for me:
=importxml(c2,"//article[contains(concat('', normalize-space(@class), ''), '')//div[@class='blog-post-body']]")
What I'm trying to achieve is to get the text inside the <article> tag. A big plus would be getting the text of the <article> without (that is, excluding) the <div class="embedded-content"> in the middle of the article.
This works for this one article:
=concatenate(importxml("http://www.hollywoodreporter.com/live-feed/foxs-x-men-spinoff-showrunners-856338","//p[3] | //p[4] | //p[5] | //p[6] "))
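The goal described above (the article's paragraph text, minus the embedded-content block) can be sketched outside of Sheets to make the selection logic concrete. The Python below is only an illustration under several assumptions: `SAMPLE` is a shortened, hypothetical stand-in for the source data, made well-formed so the standard library's XML parser accepts it, and `article_paragraphs` is a name introduced here. Because ElementTree's XPath subset has no `not()`, the exclusion is done in a loop. A roughly equivalent XPath 1.0 expression would be `//article[@class='blog-post-body']/p[not(.//div[@class='embedded-content'])]`; whether IMPORTXML accepts it for this particular page is untested.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the page markup (well-formed XML).
SAMPLE = """<div id="page-content">
  <article class="blog-post-body">
    <p>First <em>paragraph</em> text.</p>
    <p><div class="embedded-content"><script type="application/json">{"nid": 1}</script></div></p>
    <p>Last paragraph.</p>
    <footer class="blog-post-tags"><a href="/topic/tv-development">tv development</a></footer>
  </article>
</div>"""


def article_paragraphs(markup):
    """Return the text of each <p> directly under the article,
    skipping any <p> that wraps the embedded-content <div>."""
    root = ET.fromstring(markup)
    article = root.find(".//article[@class='blog-post-body']")
    if article is None:
        return []
    texts = []
    for p in article.findall("p"):
        # Skip the paragraph that only carries the embedded widget.
        if p.find(".//div[@class='embedded-content']") is not None:
            continue
        # Flatten nested tags such as <em> and collapse whitespace.
        text = " ".join("".join(p.itertext()).split())
        if text:
            texts.append(text)
    return texts


paragraphs = article_paragraphs(SAMPLE)
print(paragraphs)
```

Selecting by the article's class rather than by position (`//p[3] | //p[4] ...`) avoids depending on how many paragraphs a given article happens to have, and the footer is left out because it is not a `<p>`.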