Parsing a message with various special characters and splitting it into a list (re and regex), Python 2.7
I am trying to parse a message that I receive, which uses the following delimiters (without quotes):
- delimiter1: "@@@" - followed by a message
- delimiter2: "!!!" - a signal (no message follows)
- delimiter3: "---" - followed by a message
- delimiter4: "###" - followed by a message
- delimiter5: "$$$" - followed by a message
Here is what I have so far:

    import re

    mystring = '@@@useradd---userfirstadded###userremoved!!!$$$message'
    # Split on any of the five delimiters (the delimiters themselves are discarded)
    result = re.split(r'@@@|!!!|---|###|\$\$\$', mystring)
    print result

My result so far:

    ['', 'useradd', 'userfirstadded', 'userremoved', '', 'message']

I want the following result printed to the console:

    ['@@@useradd','---userfirstadded','###userremoved','!!!','$$$message']

Is this possible using re.split, or do I need to use re.findall or something better? I have been playing around with re.split and the delimiters, but maybe you have a lot more experience using this functionality in Python.
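On the re.split part of the question: if the pattern is wrapped in a capturing group, re.split keeps the matched delimiters in its result, and each delimiter can then be glued back onto the text that follows it. Below is a minimal sketch using only the standard re module; `split_keep_delimiters` and `DELIMITERS` are hypothetical names, and the code assumes that any text appearing before the first delimiter should be kept as its own item.

```python
import re

# Any of the five delimiters; the capturing group makes re.split keep them.
DELIMITERS = re.compile(r'(@@@|!!!|---|###|\$\$\$)')


def split_keep_delimiters(message):
    """Hypothetical helper: split `message` so that each delimiter stays
    attached to the text that follows it."""
    parts = DELIMITERS.split(message)
    # parts looks like [before, delim1, text1, delim2, text2, ...]:
    # delimiters sit at odd indices and their text at the next index.
    result = [parts[i] + parts[i + 1] for i in range(1, len(parts), 2)]
    # Keep any text that appeared before the first delimiter.
    if parts[0]:
        result.insert(0, parts[0])
    return result


mystring = '@@@useradd---userfirstadded###userremoved!!!$$$message'
print(split_keep_delimiters(mystring))
# ['@@@useradd', '---userfirstadded', '###userremoved', '!!!', '$$$message']
```

Like the original attempt, this still splits on every occurrence of a full delimiter pattern, so a message that itself contains "---" would be broken up at that point.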
EDIT: Solution #1 using re (from @thefourtheye):
Here is the code:
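
    import re

    mystring = '@@@useradd---userfirstadd%ed###this username@!!!$$$hey whats how you??@@@useradd$$$this email @gmail.com!!!'
    # Match either the bare "!!!" signal, or three delimiter characters
    # followed by a run of word characters, spaces or '^'
    result = re.findall(r'!!!|(?:@|-|#|\$){3}[\w ^]+', mystring)
    print result
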
The printed result is as follows:

    ['@@@useradd', '---userfirstadd', '###this username', '!!!', '$$$hey whats how you', '@@@useradd', '$$$this email ', '!!!']

EDIT: New specifications:
Everything works as specified above, more or less, using the answer below that @thefourtheye suggested. Is there possibly a way to extend the function to allow one or two of the delimiter characters inside a message, or, better still, to let the user type an email address containing the @ symbol or a dollar amount with $, etc.? If this isn't possible, I can add a space before and after the delimiters, or possibly use a separate set of delimiters for the message or for a different type of message. Any suggestions?
Summary: I want to accept characters until a full delimiter pattern (e.g. @@@) is hit, and otherwise accept every possible character, including characters that appear in the delimiter patterns (e.g. a single @ inside the text should not split the string). Is this possible?
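The requirement in the summary ("accept characters until a full delimiter pattern") maps onto a "tempered" token, `(?:(?!DELIMS).)*`, which consumes one character at a time as long as no complete delimiter starts there. The following is a minimal sketch of that technique with the standard re module; `TOKEN` and `parse_message` are hypothetical names, and it assumes text before the first delimiter can be ignored (finditer simply skips it). Unlike the split-based answers, it returns each delimiter and its message as a separate pair.

```python
import re

# A delimiter followed by everything up to (not including) the next full
# delimiter pattern, or a lone "!!!" signal that carries no message.
TOKEN = re.compile(
    r'(@@@|---|###|\$\$\$)((?:(?!@@@|---|###|\$\$\$|!!!).)*)|(!!!)',
    re.DOTALL,
)


def parse_message(message):
    """Hypothetical helper: return a list of (delimiter, text) pairs."""
    pairs = []
    for match in TOKEN.finditer(message):
        if match.group(3):  # the "!!!" signal
            pairs.append((match.group(3), ''))
        else:
            pairs.append((match.group(1), match.group(2)))
    return pairs


s = ('@@@useradd---userfirstadd%ed###this username@!!!$$$hey whats how you??'
     '@@@useradd$$$this email @gmail.com!!!')
for delimiter, text in parse_message(s):
    print(delimiter + text)
```

A single @ or $ inside a message (as in "this email @gmail.com") is kept, because only a complete three-character delimiter ends the message.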
EDIT: Solution #2 using regex (from @hwnd):
The regex module is not installed with Python 2.7, so if you are using that version you need to download and install the package. These are the explicit directions I followed, so you can do the same:
- Go to https://pypi.python.org/pypi/regex; at the bottom of the page there are download links. Click on regex-2015.03.18-cp27-none-win32.whl for Windows operating systems running Python 2.7 (otherwise try the other files until the install works for you).
- Browse to the directory where you downloaded the .whl file. Shift+right-click anywhere in that directory, click "Open command window here", and type "pip install regex-2015.03.18-cp27-none-win32.whl"; it should report that regex was successfully installed.
- You should now be able to use regex!
Here is the code:
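
    import regex

    mystring = '@@@useradd---userfirstadd%ed###this username@!!!$$$hey whats how you??@@@useradd$$$this email @gmail.com!!!'
    # Split on "!!!" (captured, so it is kept) or, via a zero-width lookahead, just
    # before any three delimiter characters; (?v1) selects regex's VERSION1 behaviour.
    # filter(None, ...) drops the None and empty pieces (Python 2's filter returns a list).
    result = filter(None, regex.split(r'(?v1)(!!!)|\s*(?=(?:@|\$|#|-){3})', mystring))
    print result
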
The printed result is as follows:

    ['@@@useradd', '---userfirstadd%ed', '###this username@', '!!!', '$$$hey whats how you??', '@@@useradd', '$$$this email @gmail.com', '!!!']

EDIT: Since you want to retain the characters between the delimiter patterns, you can do this using the regex module, splitting on "!!!" and using a lookahead for the other, zero-width matches.

    >>> import regex
    >>> s = '@@@useradd---userfirstadd%ed###this username@!!!$$$hey whats how you??@@@useradd$$$this email @gmail.com!!!'
    >>> filter(None, regex.split(r'(?v1)(!!!)|\s*(?=(?:@|\$|#|-){3})', s))
    ['@@@useradd', '---userfirstadd%ed', '###this username@', '!!!', '$$$hey whats how you??', '@@@useradd', '$$$this email @gmail.com', '!!!']

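The third-party regex module is needed on Python 2.7 because the standard re.split there does not split on zero-width matches. Since Python 3.7, re.split does support patterns that can match an empty string, so the same lookahead idea should work with the standard library alone. A minimal sketch, assuming Python 3.7 or later; it reuses the pattern above without the `(?v1)` flag and replaces `filter(None, ...)` with a list comprehension, because `filter` returns an iterator in Python 3.

```python
import re

s = ('@@@useradd---userfirstadd%ed###this username@!!!$$$hey whats how you??'
     '@@@useradd$$$this email @gmail.com!!!')

# Split on "!!!" (captured, so it is kept) or on optional whitespace that is
# immediately followed by three delimiter characters (a zero-width split).
parts = re.split(r'(!!!)|\s*(?=(?:@|\$|#|-){3})', s)

# Splits that did not match "!!!" contribute None for the capture group, and
# adjacent splits contribute empty strings; drop both.
result = [part for part in parts if part]
print(result)
```

On Python versions older than 3.7, the zero-width alternative never splits the string, which is why the answer above relies on the regex module instead.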