Regex for URL in Python
    import re

    url1 = 'http://.www.youtube.com/watch?v=tktzob2vjuk&index=1&list=plqmh7e11v6ozwbtsynq1yyznar709udqx'
    #url2 = 'www.ssa.gov/cgi-bin/popularnames.cgi'

    def verify(url):
        # re.search() returns None when nothing matches
        x = re.search(r'((^https|http|ftp):)?(/?/?www)\.[a-zA-Z0-9]+\.[a-zA-Z]{2,3}\/[-a-zA-Z0-9?=&%#./]*', url)
        if x:
            print(x.group())
        else:
            print("not valid")

    verify(url1)
Shouldn't this URL be invalid, since there's a dot before www?
My output shows:

    www.youtube.com/watch?v=tktzob2vjuk&index=1&list=plqmh7e11v6ozwbtsynq1yyznar709udqx
Let's break down the regex:
    (                     # begin group
      (^https|http|ftp):  # protocol (and https must be at the start)
    )?                    # end optional group
    (                     # start group
      /?/?                # optional slashes
      www                 # www
    )                     # end group
    ...
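The breakdown above is almost valid Python as written, because the re.VERBOSE flag lets a pattern span several lines with comments. As a sketch, here is the question's full pattern in that form (the part elided with "..." is filled in from the original regex); the prints confirm where the match actually begins.

```python
import re

url1 = 'http://.www.youtube.com/watch?v=tktzob2vjuk&index=1&list=plqmh7e11v6ozwbtsynq1yyznar709udqx'

# The question's pattern laid out with re.VERBOSE so each piece carries a comment.
# In verbose mode whitespace is ignored and '#' starts a comment, except inside
# a character class such as [-a-zA-Z0-9?=&%#./].
URL_PATTERN = re.compile(r"""
    (                       # begin optional group
      (^https|http|ftp):    # protocol; the ^ binds only to the https alternative
    )?                      # end optional group
    (                       # begin group
      /?/?                  # optional slashes
      www                   # literal www: the only prefix the pattern requires
    )                       # end group
    \.[a-zA-Z0-9]+          # .domain
    \.[a-zA-Z]{2,3}         # .tld (2 or 3 letters)
    /                       # slash before the path
    [-a-zA-Z0-9?=&%#./]*    # path and query characters
""", re.VERBOSE)

m = URL_PATTERN.search(url1)
print(m.start())           # 8: search() skipped the leading 'http://.'
print(m.group(1))          # None: the optional protocol group never took part
print(m.group())
```

The first print shows the match starts at index 8, so the `http://.` prefix is simply not part of the match, and the protocol group is `None`.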
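From the above, you can see that both the protocol and the slashes are optional, so the regex only requires www to appear somewhere, regardless of what comes at the start. Because re.search() scans the whole string for a match, it skips over `http://.` and matches from www onward, which is exactly the output you got.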
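One way to make the dot invalid is to stop the regex from skipping anything: tie the slashes to the protocol so they can't appear on their own, and require the pattern to cover the entire string. The sketch below assumes you still want to require www and accept the same path characters as the question; `URL_RE` is just an illustrative name.

```python
import re

# One possible tightening (an assumption about what should count as valid):
# the scheme and its "//" form a single optional unit, and fullmatch()
# requires the pattern to cover the whole string, so nothing can be skipped.
URL_RE = re.compile(
    r"(?:(?:https?|ftp)://)?"               # optional scheme, always followed by //
    r"www\.[a-zA-Z0-9]+\.[a-zA-Z]{2,3}/"    # www.<name>.<tld>/
    r"[-a-zA-Z0-9?=&%#./]*"                 # path/query characters, as in the question
)


def verify(url):
    """Return True if the entire url matches URL_RE."""
    return URL_RE.fullmatch(url) is not None


url1 = 'http://.www.youtube.com/watch?v=tktzob2vjuk&index=1&list=plqmh7e11v6ozwbtsynq1yyznar709udqx'
url2 = 'www.ssa.gov/cgi-bin/popularnames.cgi'

for url in (url1, url2):
    print(url, '->', 'valid' if verify(url) else 'not valid')
```

With this version url1 is reported as not valid and url2 as valid. `re.fullmatch()` needs Python 3.4 or later; on older versions, `re.match()` with a trailing `\Z` in the pattern has the same effect.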