Regex for URL in Python


    import re

    url1 = 'http://.www.youtube.com/watch?v=tktzob2vjuk&index=1&list=plqmh7e11v6ozwbtsynq1yyznar709udqx'
    # url2 = 'www.ssa.gov/cgi-bin/popularnames.cgi'

    def verify(url):
        try:
            x = re.search(r'((^https|http|ftp):)?(/?/?www)\.[a-zA-Z0-9]+\.[a-zA-Z]{2,3}\/[-a-zA-Z0-9?=&%#./]*', url)
            print(x.group())
        except:
            # re.search returned None, so x.group() raised an error
            print("not valid")

    verify(url1)

Shouldn't this URL be invalid, since there is a dot before www?

My output shows:

www.youtube.com/watch?v=tktzob2vjuk&index=1&list=plqmh7e11v6ozwbtsynq1yyznar709udqx 

Let's break down the regex:

    (                    # begin group
      (^https|http|ftp): # protocol (only https must be at the start)
    )?                   # end optional group
    (                    # start group
      /?/?               # optional slashes
      www                # www
    )                    # end group
    ...

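From the above, you can see that both the protocol and the slashes are optional, so the regex only requires www to appear somewhere in the string, regardless of what's at the start. Because re.search scans the whole string for the first position where the pattern fits, it skips the leading "http://." and starts the match at www.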
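One possible way to make the check reject the URL with the stray dot is to anchor the pattern at both ends and group the protocol alternatives, so the anchor applies to all of them. The sketch below is only an illustration under those assumptions: the names ORIGINAL, STRICT_URL and is_valid are hypothetical, and the pattern keeps the question's narrow idea of a URL (www, one host label, a 2-3 letter TLD) rather than being a general URL validator.

```python
import re

url1 = 'http://.www.youtube.com/watch?v=tktzob2vjuk&index=1&list=plqmh7e11v6ozwbtsynq1yyznar709udqx'
url2 = 'www.ssa.gov/cgi-bin/popularnames.cgi'

# The pattern from the question, unchanged in meaning.
ORIGINAL = re.compile(
    r'((^https|http|ftp):)?(/?/?www)\.[a-zA-Z0-9]+\.[a-zA-Z]{2,3}\/[-a-zA-Z0-9?=&%#./]*'
)

# Hypothetical stricter variant (an assumption, not the asker's code):
# anchored at both ends, with the protocol alternatives grouped together.
STRICT_URL = re.compile(
    r'^(?:(?:https?|ftp)://)?'   # optional protocol, must sit at the very start
    r'www\.[a-zA-Z0-9]+'         # www. followed by one host label
    r'\.[a-zA-Z]{2,3}'           # top-level domain of 2 or 3 letters
    r'/[-a-zA-Z0-9?=&%#./]*$'    # path and query, running to the end
)

def is_valid(url):
    """Return True only if the whole string fits STRICT_URL."""
    return STRICT_URL.match(url) is not None

# The unanchored search skips 'http://.' and starts matching at 'www'.
print(ORIGINAL.search(url1).start())  # 8

print(is_valid(url1))  # False: the dot before www breaks the match
print(is_valid(url2))  # True
```

With the anchors in place, nothing can be skipped before www, so the leading "http://." can no longer be ignored.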

