xml - XPath for an <article> HTML5 tag

July 15, 2012

I'm using the =IMPORTXML function in Google Spreadsheets to scrape information from different sites. I'm having a bad time trying to get the text inside an <article> tag using XPath.

Here is the source data.

<div id="blog-post-body-ad" class="ad">     </div>      <article class="blog-post-body">         <p>fox&#39;s <em>x-men </em>drama <em>hellfire </em>is making change @ top.</p> <p>writers evan katz , manny coto, co-created drama, exiting, <em>the hollywood reporter </em>has learned. out patrick mckay , john d. payne, came the story drama alongside katz , coto , set pen script. search under way new writer.</p> <p>the changes come <em>hellfire </em>is on slower development track, insiders say. <em>hellfire, </em>which was&nbsp;<a href="http://www.hollywoodreporter.com/live-feed/fox-nears-deal-x-men-813542">considered live-action&nbsp;<em>x-men</em></a>, follows young special agent learns power-hungry woman extraordinary abilities working clandestine society of millionaires &mdash; known &quot;the hellfire club&quot; &mdash; take on world.</p> <p>     <div class="embedded-content" data-nid="832221" data-nodetype="blog" data-template="readmore">       <script type="application/json">         {           "nid": 832221,           "type": "blog",           "title": "marvel sets &#039;legion&#039; pilot noah hawley @ fx, readying &#039;hellfire&#039; fox",           "path": "http://www.hollywoodreporter.com/live-feed/marvel-legion-noah-hawley-fx-832221",           "relative-path": "/live-feed/marvel-legion-noah-hawley-fx-832221"         }       </script>     </div></p> <p>sources <em>x-men </em>drama not go pilot season remains on slower track. change comes katz , coto shifting focus fox&#39;s <em><a href="http://www.hollywoodreporter.com/live-feed/fox-greenlights-prison-break-event-856203" target="_blank">24: legacy</a>, </em>which received formal pilot order friday during fox&#39;s time in front of press @ television critics association&#39;s winter press tour. new take on 24 feature entirely new cast diverse lead fox has high hopes reboot franchise new era.</p> <p>the change @ top should not worry diehard fans of <em>x-men </em>franchise. 
sources fox remains committed <em>hellfire </em>and wants right <em>x-men </em>franchise remains valuable asset company. should <em>hellfire</em> go series , network renew batman prequel <em>gotham, </em>the network have dramas both comic book powerhouses dc comics , marvel &mdash; first broadcast network , insiders love see on schedule.</p> <p>&nbsp;</p>          <footer class="blog-post-tags">                             <a href="/topic/tv-development" data-tracklabel="story - bottom tags tv development">tv development</a>                     </footer>     </article>      <div class="blog-post-footer-ad">

Using Google Chrome > Inspect > Copy XPath, I get:

//*[@id="page-content"]/div[1]/article

When I try it in Google Sheets, it gives me a parsing error.

I also tried a solution from another question on Stack Overflow, but it is not working for me:

=importxml(c2,"//article[contains(concat('', normalize-space(@class), ''), '')//div[@class='blog-post-body']]")
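For context, the formula above looks like a variant of the common XPath 1.0 idiom for matching a single class token, `contains(concat(' ', normalize-space(@class), ' '), ' token ')`, but with the space padding and the token missing (possibly lost when the post was copied). The sketch below mirrors what that idiom computes in plain Python; the helper names `normalize_space` and `has_class_token` are hypothetical and are not part of Sheets or of any XPath library.

```python
import re


def normalize_space(value):
    """Mimic XPath 1.0 normalize-space(): trim and collapse whitespace runs."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


def has_class_token(class_attr, token):
    """Mirror contains(concat(' ', normalize-space(@class), ' '), ' token ').

    Padding both sides with spaces makes the test match whole class names
    only, so "blog-post-body" does not match "blog-post-body-ad".
    """
    return f" {token} " in f" {normalize_space(class_attr)} "


if __name__ == "__main__":
    for attr in ("blog-post-body", "  ad   blog-post-body ", "blog-post-body-ad"):
        print(repr(attr), has_class_token(attr, "blog-post-body"))
```

With an empty token and no padding, as in the formula as posted, `contains(..., '')` is true for every string, so that predicate would not narrow the match at all.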

What I'm trying to achieve is to get the text inside the <article> tag. A big plus would be the text of the <article> without (excluding) the <div class="embedded-content"> in the middle of the article.

This works for this one article:

=concatenate(importxml("http://www.hollywoodreporter.com/live-feed/foxs-x-men-spinoff-showrunners-856338","//p[3] | //p[4] | //p[5] | //p[6] "))
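To make the goal concrete outside of Sheets, here is a minimal sketch in Python using only the standard library. It assumes a simplified, well-formed stand-in for the page source (the `SAMPLE` markup, the `visible_text` and `article_text` helpers, and the `SKIP_CLASSES` set are all hypothetical names introduced for illustration). Real HTML is often not well-formed XML, so an HTML parser may be needed in practice. The XPath 1.0 expression in the comment is the equivalent selection; whether IMPORTXML accepts it has not been tested here.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the page in the question (assumption).
SAMPLE = """
<div id="page-content">
  <article class="blog-post-body">
    <p>First <em>paragraph</em> of text.</p>
    <p><div class="embedded-content"><script type="application/json">{"nid": 832221}</script></div></p>
    <p>Second paragraph.</p>
    <footer class="blog-post-tags"><a href="/topic/tv-development">tv development</a></footer>
  </article>
</div>
"""

# Subtrees whose class attribute equals one of these values are skipped.
SKIP_CLASSES = {"embedded-content"}

# Equivalent XPath 1.0 selection (ElementTree cannot evaluate this form):
#   //article[@class='blog-post-body']
#       //text()[not(ancestor::div[@class='embedded-content'])]


def visible_text(element, skip_classes=SKIP_CLASSES):
    """Return non-empty text chunks under element, skipping excluded subtrees."""
    if element.get("class") in skip_classes:
        return []
    parts = []
    if element.text and element.text.strip():
        parts.append(element.text.strip())
    for child in element:
        parts.extend(visible_text(child, skip_classes))
        # Tail text belongs to the parent, so keep it even if child was skipped.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


def article_text(markup):
    """Return the article's text as one string, or '' if no article is found."""
    root = ET.fromstring(markup)
    article = root.find(".//article[@class='blog-post-body']")
    if article is None:
        return ""
    return " ".join(visible_text(article))


if __name__ == "__main__":
    print(article_text(SAMPLE))
```

Unlike the positional `//p[3] | //p[4] | ...` formula, this selects by element and class, so it does not depend on how many paragraphs a given article has.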
