Parsing a message with various special characters and splitting into a list (re and regex), Python 2.7


I am trying to parse a message that is received with the following delimiters (without the quotes):

  • delimiter1: "@@@" - followed by a message
  • delimiter2: "!!!" - a signal
  • delimiter3: "---" - followed by a message
  • delimiter4: "###" - followed by a message
  • delimiter5: "$$$" - followed by a message

This is what I have so far:

    import re

    mystring = '@@@useradd---userfirstadded###userremoved!!!$$$message'
    # Split on any of the five delimiters; only "$" needs escaping.
    result = re.split(r'@@@|!!!|---|###|\$\$\$', mystring)
    print result

My result so far:

['', 'useradd', 'userfirstadded', 'userremoved', '', 'message'] 

I want this result printed to the console:

['@@@useradd','---userfirstadded','###userremoved','!!!','$$$message'] 

Is this possible using re.split, or do I need to use re.findall or something a lot better? I have been playing with re.split and the delimiters, as you can see, but maybe you guys have a lot more experience using this functionality within Python.
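On the re.split part of the question, here is a minimal sketch that assumes only the five delimiters listed above. Wrapping the alternation in a capturing group makes re.split return the delimiters as well as the text between them, so each delimiter can be glued back onto the text that follows it. The names `parts` and `result` are only illustrative.

```python
import re

mystring = '@@@useradd---userfirstadded###userremoved!!!$$$message'

# The capturing group makes re.split keep each delimiter in the output.
# parts alternates: text, delimiter, text, delimiter, ...
parts = re.split(r'(@@@|!!!|---|###|\$\$\$)', mystring)

# parts[0] is whatever came before the first delimiter (empty here).
# Pair every delimiter with the text that follows it.
result = [delim + text for delim, text in zip(parts[1::2], parts[2::2])]

print(result)
# ['@@@useradd', '---userfirstadded', '###userremoved', '!!!', '$$$message']
```

The "!!!" signal comes out on its own because the text between "!!!" and "$$$" is empty.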

Edited: solution #1 using re (from @thefourtheye):

Here is the code:

    import re

    mystring = '@@@useradd---userfirstadd%ed###this username@!!!$$$hey whats how you??@@@useradd$$$this email @gmail.com!!!'
    # Match "!!!" on its own, or three delimiter characters followed by
    # word characters, spaces or "^".
    result = re.findall(r'!!!|(?:@|-|#|\$){3}[\w ^]+', mystring)
    print result

The result is printed as follows:

['@@@useradd', '---userfirstadd', '###this username', '!!!', '$$$hey whats how you', '@@@useradd', '$$$this email ', '!!!'] 
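Note that "%ed", "??" and "@gmail.com" are missing from that output. The character class `[\w ^]` only admits word characters, spaces and "^", so a message stops at the first other character. The short sketch below, run on a shortened sample string, shows what the pattern leaves behind. It also shows that `(?:@|-|#|\$){3}` accepts any mix of three delimiter characters, not just three identical ones. The variable names are illustrative only.

```python
import re

pattern = re.compile(r'!!!|(?:@|-|#|\$){3}[\w ^]+')

sample = '---userfirstadd%ed###this username@!!!'
matched = pattern.findall(sample)

# Whatever the pattern did not consume is silently dropped by findall;
# removing the matches makes the lost text visible.
leftover = pattern.sub('', sample)

# A mixed run such as "@-#" also counts as a delimiter for this pattern.
mixed = pattern.findall('@-#oops')

print(matched)   # ['---userfirstadd', '###this username', '!!!']
print(leftover)  # %ed@
print(mixed)     # ['@-#oops']
```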

Edited: new specifications:

Everything works as specified above, and more, using the answer below that @thefourtheye suggested. It would be better if the function could also allow one or two of the delimiter characters inside a message, for example if a user wanted to type an email address in the message and use the @ symbol, or a dollar amount with $. If this isn't possible, I can require a space before and after the delimiters, or possibly use @@@ to separate messages that use the delimiter characters, or use a different type of message. Any suggestions?

Summary: add the functionality of accepting all characters until hitting a delimiter pattern (i.e. @@@); otherwise accept every possible character in the string, including the characters that make up a delimiter pattern (e.g. an @ that is not part of @@@ would not split the string). Is this possible?

Edited: solution #2 using regex (from @hwnd):

The regex module is not installed with Python 2.7, if that is what you are using. You need to download and install the package. These are the explicit directions I took, so you can do the same.

  1. Go to https://pypi.python.org/pypi/regex; at the bottom of the page there are download links. Click on regex-2015.03.18-cp27-none-win32.whl for Windows operating systems running Python 2.7 (otherwise try the other ones until you find one that installs successfully for you).
  2. Browse to the download directory of the .whl file you downloaded. Shift+right-click anywhere in the directory, click on "Open command window here", and type "pip install regex-2015.03.18-cp27-none-win32.whl"; it should say "Successfully installed!"
  3. You are now able to use regex!

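Here is the code:

    import regex

    mystring = '@@@useradd---userfirstadd%ed###this username@!!!$$$hey whats how you??@@@useradd$$$this email @gmail.com!!!'
    # (?v1) enables splitting on zero-width matches; filter(None, ...)
    # drops the None and empty-string entries that the split produces.
    result = filter(None, regex.split(r'(?v1)(!!!)|\s*(?=(?:@|\$|#|-){3})', mystring))
    print result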

The result is printed as follows:

['@@@useradd', '---userfirstadd%ed', '###this username@', '!!!', '$$$hey whats how you??', '@@@useradd', '$$$this email @gmail.com', '!!!'] 

Edit: Since you want to retain all characters between the delimiter patterns, you can do this using the regex module, splitting on "!!!" and using a lookahead for the other, zero-width, matches.

    >>> import regex
    >>> s = '@@@useradd---userfirstadd%ed###this username@!!!$$$hey whats how you??@@@useradd$$$this email @gmail.com!!!'
    >>> filter(None, regex.split(r'(?v1)(!!!)|\s*(?=(?:@|\$|#|-){3})', s))
    ['@@@useradd', '---userfirstadd%ed', '###this username@', '!!!', '$$$hey whats how you??', '@@@useradd', '$$$this email @gmail.com', '!!!']
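If installing the regex module is not an option, roughly the same result can be had with the standard re module. The following is a sketch under stated assumptions, not the method from either answer. It spells out the four message delimiters in full and lets a message run until the next full delimiter or "!!!", so a lone "@" or "$" stays part of the message. The names `DELIMS`, `TOKEN` and `split_message` are made up for this example.

```python
import re

# Assumption: only these exact three-character sequences act as delimiters.
DELIMS = r'@@@|---|###|\$\$\$'

# "!!!" stands alone. Any other delimiter is followed by every character
# up to, but not including, the next full delimiter or "!!!".
# The negative lookahead is checked before each character is consumed.
TOKEN = re.compile(r'!!!|(?:%s)(?:(?!%s|!!!).)*' % (DELIMS, DELIMS), re.DOTALL)

def split_message(text):
    """Return the delimiter-prefixed pieces of text, in order."""
    return TOKEN.findall(text)

s = ('@@@useradd---userfirstadd%ed###this username@!!!'
     '$$$hey whats how you??@@@useradd$$$this email @gmail.com!!!')
result = split_message(s)
print(result)
```

On this input the sketch gives the same list as the regex version above. It differs in two ways: any text before the first delimiter is discarded, and whitespace in front of a delimiter is kept as part of the preceding message instead of being stripped.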
