javascript - Get YouTube video ID from URL with Python and Regex
I'm trying to retrieve the video ID part of a YouTube URL that is part of an HTML anchor element, using a regex:

    <a href="http://www.youtube.com/watch?v=nc2blnl0wte">some text</a>

I have looked around for solutions and found one JavaScript solution that takes the video ID from the URL like so:
    /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig

I'd like to use it in Python because it supports every variant of YouTube's URLs. I implemented it in my Python script like this:

    string = re.sub(r'https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:[\'"][^<>]*>|<\/a>))[?=&+%\w.-]*', r'\1', string)

It makes no replacements. I removed the / delimiters and the /ig flags from the JavaScript regex, but it still can't pick out the video ID. Once I can pick out the ID, I can change the regex around to remove the anchor element.
What have I done wrong, and what is the solution? Thanks.
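For reference, here is a minimal sketch of how the JavaScript literal maps onto Python, assuming the pattern from the question. The names YOUTUBE_RE, bare and linked are illustrative only. The surrounding slashes are dropped, the i flag becomes re.IGNORECASE, and the g flag has no counterpart because re.sub() already replaces every match. Running it on a bare URL and on the anchor from the question suggests the translation itself is not the problem: the bare URL is replaced by its ID, while the linked one is left untouched.

```python
import re

# The pattern from the question, translated from a JavaScript /.../ig literal:
# - the / delimiters are dropped (and \/ needs no escaping in Python),
# - the i flag becomes re.IGNORECASE,
# - the g flag is not needed, because re.sub() replaces every match by default.
YOUTUBE_RE = re.compile(
    r"""https?://(?:[0-9A-Z-]+\.)?(?:youtu\.be/|youtube(?:-nocookie)?\.com\S*[^\w\s-])"""
    r"""([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*""",
    re.IGNORECASE,
)

bare = "Watch http://www.youtube.com/watch?v=nc2blnl0wte now"
linked = '<a href="http://www.youtube.com/watch?v=nc2blnl0wte">some text</a>'

print(YOUTUBE_RE.sub(r"\1", bare))    # Watch nc2blnl0wte now
print(YOUTUBE_RE.sub(r"\1", linked))  # printed unchanged: the URL sits inside an anchor
```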
I don't think the (?!...) group near the end of the pattern (the one right after (?=[^\w-]|$)) is supposed to be a negative lookahead:

    https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*

I believe it should be a non-capturing group (i.e., ?! should be ?:).
    >>> import re
    >>> html = '<a href="http://www.youtube.com/watch?v=nc2blnl0wte">some text</a>'
    >>> pattern = re.compile(r"""https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?:[?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*""", re.IGNORECASE)
    >>> re.search(pattern, html).groups()
    ('nc2blnl0wte',)

Edit: Notice that I had to use re.IGNORECASE. This is because the regex, as-is, won't match the www in www.youtube.com; you would need to change [0-9A-Z-] to [0-9a-zA-Z-]. However, ignoring case is safer, since then you don't have to worry about the case of any other text in the URL.
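The session above can be wrapped into a small reusable function. This is a sketch assuming the answer's modified pattern (with (?:...) in place of (?!...)); extract_video_id and ANCHORED_YOUTUBE_RE are hypothetical names chosen for illustration.

```python
import re

# The answer's pattern: the final (?!...) is now a non-capturing (?:...), so the
# URL must be followed by a closing quote and '>' (inside a start tag) or by </a>.
# re.IGNORECASE is needed because the subdomain class [0-9A-Z-] is uppercase only.
ANCHORED_YOUTUBE_RE = re.compile(
    r"""https?://(?:[0-9A-Z-]+\.)?(?:youtu\.be/|youtube(?:-nocookie)?\.com\S*[^\w\s-])"""
    r"""([\w-]{11})(?=[^\w-]|$)(?:[?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*""",
    re.IGNORECASE,
)


def extract_video_id(html):
    """Return the 11-character ID of the first linked YouTube URL, or None.

    extract_video_id is a hypothetical helper, used only for illustration.
    """
    match = ANCHORED_YOUTUBE_RE.search(html)
    return match.group(1) if match else None


html = '<a href="http://www.youtube.com/watch?v=nc2blnl0wte">some text</a>'
print(extract_video_id(html))  # nc2blnl0wte
```

Because the trailing group is no longer optional, this version only matches URLs that sit inside an anchor; a bare URL in plain text, such as http://youtu.be/nc2blnl0wte, yields None.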
Edit 2: As for the negative lookahead, it means you will never be able to get a match when the URL is followed by a closing quote or by the closing of the anchor tag (">blah blah blah</a>).
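To make the difference concrete, here is a tiny sketch, independent of the YouTube pattern, contrasting a negative lookahead with a non-capturing group on a toy string; the "id" text is just a stand-in for the matched video ID.

```python
import re

# (?!...) is a zero-width negative lookahead: the match fails if what follows
# matches its contents, and it consumes nothing either way.
# (?:...) is an ordinary non-capturing group: its contents must match and are consumed.
print(re.search(r"id(?!</a>)", "id</a>"))               # None: "</a>" follows, so the lookahead rejects it
print(re.search(r"id(?!</a>)", "id and more").group())  # id
print(re.search(r"id(?:</a>)", "id</a>").group())       # id</a>
```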