C# - Regex gets stuck with this call
I'm working on a movie scraper/auto-downloader that iterates over my current movie collection, finds new recommendations, and downloads the new goods.
There is a part that scrapes IMDb metadata, and it seems to get stuck in one spot. I can't figure out why: it has run the same code on different IMDb pages just fine (this is the 29th iteration, on a new page).
I'm using C#.
The code:
// requires: using System.Text.RegularExpressions;
private string Match(string regex, string html, int i = 1)
{
    return new Regex(regex, RegexOptions.Multiline).Match(html).Groups[i].Value.Trim();
}
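Whatever the root cause turns out to be, a scraper can be kept from hanging forever by giving the regex a match timeout. The sketch below is not the asker's code: it assumes .NET Framework 4.5 or later (where the `Regex(string, RegexOptions, TimeSpan)` constructor and `RegexMatchTimeoutException` exist), and the class name `RegexHelpers`, the method name `SafeMatch`, and the 2-second limit are all hypothetical choices for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexHelpers
{
    // Hypothetical variant of the question's helper: same pattern/options,
    // but the match is abandoned after 2 seconds instead of running forever.
    public static string SafeMatch(string pattern, string html, int group = 1)
    {
        try
        {
            var regex = new Regex(pattern, RegexOptions.Multiline, TimeSpan.FromSeconds(2));
            // Groups[n] returns an empty, unsuccessful group if n does not exist,
            // so a failed match yields an empty string here.
            return regex.Match(html).Groups[group].Value.Trim();
        }
        catch (RegexMatchTimeoutException)
        {
            // Skip this page rather than blocking the whole scraper.
            return string.Empty;
        }
    }
}
```

Usage: `RegexHelpers.SafeMatch(@"<title>.*?\((\d{4})\).*?</title>", "<title>Foo (2015) - IMDb</title>")` returns `"2015"`. Returning an empty string on timeout mirrors how the original helper already returns an empty string when nothing matches; logging the URL in the `catch` block would make the problem page easy to find.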
Contents of the regex parameter string:
<title>.*?\\(.*?(\\d{4}).*?\\).*?</title>
Contents of the html parameter string: a big paste, literally the HTML string representation of http://www.imdb.com/title/tt4422748/combined
If you're in Chrome, you can view it with:
view-source:http://www.imdb.com/title/tt4422748/combined
I have paused execution in Visual Studio and stepped forward, but it just keeps running and hangs (it doesn't let me step; it just runs). If I hit pause again, it returns to the same spot with the same parameter values (and no, I'm not calling it in an infinite loop). I'm pretty new to regex, so any help is appreciated!
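Use of .*
Using .* says you want to match everything, yet possibly nothing. Each use of it gives the engine another point at which it can backtrack, so when the pattern doesn't match it tries a huge number of different ways to split the text, becomes unresponsive, and appears to lock up.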
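One way to act on this, offered here as a sketch rather than as part of the original answer, is to replace each `.*?` in the question's pattern with a negated character class that cannot run past `<` (the closing tag) or `)`. That leaves the engine far fewer places to split the text. The class name `TitleYear` and the 2-second timeout are hypothetical, and the pattern assumes the year sits in the first parenthesized part of the title.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleYear
{
    // Hypothetical tighter form of <title>.*?\(.*?(\d{4}).*?\).*?</title>:
    // [^<(]* stops at the first '(', [^<)]* stays inside the parentheses,
    // and [^<]* cannot cross into the rest of the document.
    private static readonly Regex YearInTitle = new Regex(
        @"<title>[^<(]*\([^<)]*?(\d{4})[^<)]*\)[^<]*</title>",
        RegexOptions.None,
        TimeSpan.FromSeconds(2));

    public static string Extract(string html)
    {
        Match m = YearInTitle.Match(html);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For example, `TitleYear.Extract("<title>Sherlock (TV Series 2010-2017) - IMDb</title>")` returns `"2010"`, the first four-digit run inside the parentheses, just like the lazy original. A title such as `Foo (Bar) (2015)` will not match, which is the price of the tighter classes.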
Does the person designing the pattern not know whether there is going to be text in the title or not? I bet 99% of the time the title has text, so why use .* at all? How about at least .+ instead?
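The difference between the two quantifiers can be seen in isolation: `.*` accepts an empty title, while `.+` demands at least one character. The class and method names below are hypothetical and exist only for this small illustration.

```csharp
using System.Text.RegularExpressions;

public static class StarVsPlus
{
    // ".*" means zero or more characters, so "<title></title>" matches.
    public static bool MatchesStar(string s) => Regex.IsMatch(s, "<title>(.*)</title>");

    // ".+" means one or more characters, so an empty title does not match.
    public static bool MatchesPlus(string s) => Regex.IsMatch(s, "<title>(.+)</title>");
}
```

Both return `true` for `"<title>X</title>"`; only `MatchesStar` returns `true` for `"<title></title>"`.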
If you want the text between delimiters, use this:
title\>(?<title>[^<]+)\</title
Then extract the matched text through the named group "title" instead of Groups[0]. Groups[1] will also hold the actual match text, if one loathes named match captures.
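A minimal sketch of that extraction follows. It uses the answer's pattern with the leading `<` added and the unnecessary backslashes before `<` and `>` dropped. `TitleExtractor` and `GetTitle` are hypothetical names, and `RegexOptions.IgnoreCase` is an assumption to tolerate `<TITLE>`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleExtractor
{
    // [^<]+ stops at the next '<', so the match cannot wander past </title>.
    private static readonly Regex TitleRegex =
        new Regex(@"<title>(?<title>[^<]+)</title>", RegexOptions.IgnoreCase);

    public static string GetTitle(string html)
    {
        Match m = TitleRegex.Match(html);
        // The only group is named, so .NET also numbers it 1:
        // m.Groups["title"] and m.Groups[1] hold the same text.
        return m.Success ? m.Groups["title"].Value.Trim() : string.Empty;
    }
}
```

For instance, `TitleExtractor.GetTitle("<html><head><title>Foo (2015) - IMDb</title></head></html>")` returns `"Foo (2015) - IMDb"`. The year can then be pulled from that short string, where backtracking is cheap.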
Answer for regex haters:
Use the HTML Agility Pack.