C# - Efficient way to tokenize conditionally
Given a user input string:
"mainframes/pl/ sql; software testing/pl/sql/project management/"
what is an efficient way to tokenize the string such that '/' is retained if it is part of "pl/ sql", but not otherwise, giving the tokens:
"mainframes", "pl/ sql", "software testing", "pl/sql", "project management"
This is because users may accidentally enter the '/' character as a separator.
If the order of the tokens isn't important, this might work:
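    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    public IEnumerable<string> Tokenise()
    {
        var input = "mainframes/pl/ sql; software testing/pl/sql/project management/";
        var results = new List<string>();

        // Collect every PL/SQL variant first (any whitespace around the slash, any casing).
        foreach (Match match in Regex.Matches(input, @"pl\s*/\s*sql", RegexOptions.IgnoreCase))
        {
            results.Add(match.Value);
        }

        // Strip those matches out so their '/' is not treated as a separator.
        input = Regex.Replace(input, @"pl\s*/\s*sql", string.Empty, RegexOptions.IgnoreCase);

        // Split what is left; ';' is included because the sample input also uses it as a separator.
        results.AddRange(input
            .Split(new[] { '/', ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.Trim())
            .Where(token => token.Length > 0));

        return results;
    }

This starts by searching for PL/SQL tokens (accounting for differences in whitespace and capitalisation), strips them out of the input string, and then performs a simple split on the remaining separator characters, trimming each piece. The downside is that the order of the tokens differs from the input string: for the sample input the result is "pl/ sql", "pl/sql", "mainframes", "software testing", "project management".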
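If the order does matter, one possible variation is to do the whole job with a single regular expression whose first alternative is the PL/SQL pattern and whose second alternative is "any run of non-separator characters". Because Regex.Matches returns matches from left to right, the tokens come back in input order. This is only a sketch: the class name SkillTokenizer, the method name Tokenize, and the decision to treat ';' as a separator alongside '/' are assumptions made for illustration, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names here are illustrative only.
public static class SkillTokenizer
{
    // Alternation order matters: the PL/SQL pattern is tried first at each
    // position, so its '/' is consumed as part of the token. Otherwise the
    // second alternative takes a run of characters up to the next separator.
    // Assumption: both '/' and ';' act as separators.
    private static readonly Regex TokenPattern = new Regex(
        @"pl\s*/\s*sql|[^/;]+",
        RegexOptions.IgnoreCase);

    public static List<string> Tokenize(string input)
    {
        return TokenPattern.Matches(input)
            .Cast<Match>()                 // matches arrive in left-to-right order
            .Select(m => m.Value.Trim())   // drop whitespace left around separators
            .Where(t => t.Length > 0)      // skip whitespace-only pieces
            .ToList();
    }
}
```

Calling SkillTokenizer.Tokenize on the sample input yields "mainframes", "pl/ sql", "software testing", "pl/sql", "project management", in that order. One limitation of this sketch: a token that merely begins with the PL/SQL pattern (for example "pl/sqlserver") would be split into "pl/sql" and "server", so stricter boundary checks may be needed for real input.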