ruby - Parse and transform text for article reproduction
I have an input string like this one:
if {decided|planned|wish} {to go|gonna} {camping|have outdoor rest|fishing|hunting}, {may like|need|just need|may use} sleeping bag [product name]. {it|this sleeping bag} {is intended|is ideal} [season] , {designed|sewed|made} [type] {type|form-factor}.
Now I need to do two things:
- put values in place of the square brackets (e.g. [product name] becomes hard wear mountain)
- take a random word from each set of curly brackets and paste it in (e.g. {decided|planned|wish} becomes planned)
So the output string should be like this one:
if wish to go fishing, may like sleeping bag hard wear mountain. this sleeping bag is ideal winter season , designed cocoon form-factor.
I know how to resolve problem #1, but I have no idea how to solve problem #2. Also, there can be nested curly brackets, e.g.: {some word|{some word2|{some word3|some word5}}|some word4}.
So I need a regular expression in Ruby, or maybe another approach, to solve this problem.
Suppose this is our text:
    text =
      'if {decided|planned|wish} {to go|gonna} {camping|have outdoor rest|fishing|hunting}, {may like|need|just need|may use} sleeping bag [product name]. {it|this sleeping bag} {is intended|is ideal} [season] , {designed|sewed|made} [type] {type|form-factor}. {it is|{really|{not so|all that}}|certainly} great bag.'
Notice that I've added nested braces in the last sentence.
First, make the replacements specified in a hash:

    h = { '[product name]'=>'hard wear mountain', '[season]'=>'fall', '[type]'=>'underpaid workers' }

as follows:

    r = /
        \[   # match left bracket
        .+?  # match >= 1 characters non-greedily (stop at 1st right bracket)
        \]   # match right bracket
        /x
    str = text.gsub(r, h)
This returns:
"if {decided|planned|wish} {to go|gonna} {camping|have outdoor rest|fishing|hunting}, {may like|need|just need|may use} sleeping bag hard wear mountain. {it|this sleeping bag} {is intended|is ideal} fall , {designed|sewed|made} underpaid workers {type|form-factor}. {it is|{really|{not so|all that}}|certainly} great bag."
Every string s = [...] is replaced by h[s] if h has a key s. Otherwise h[s] is nil, which gsub treats as an empty string, so s is deleted rather than left in place.
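To see the hash form of gsub in isolation, here is a small self-contained check (a sketch assuming a reasonably recent Ruby; the '[color]' placeholder and the tmpl string are made up for illustration). It shows that a bracketed placeholder with no matching key is removed and, as a swapped-in alternative not used in the answer, how a block with Hash#fetch would leave such placeholders untouched instead.

```ruby
# Placeholder values; '[color]' is deliberately absent (hypothetical key).
h = { '[product name]' => 'hard wear mountain', '[season]' => 'fall' }
r = /\[.+?\]/  # same pattern as above, written without the /x comments

tmpl = 'bag [product name] for [season] in [color]'

# Hash form: a match that is not a key looks up nil, which becomes "".
with_hash = tmpl.gsub(r, h)

# Block form: fall back to the matched text itself when the key is missing.
with_fetch = tmpl.gsub(r) { |s| h.fetch(s, s) }

p with_hash   # => "bag hard wear mountain for fall in "
p with_fetch  # => "bag hard wear mountain for fall in [color]"
```

Which behaviour is preferable depends on whether a missing value should silently vanish or stay visible in the generated article.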
Now make the remaining replacements, beginning with the innermost {...|...|...} blocks and working outward until no more replacements can be made:
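    old = str
    loop do
      # replace each innermost {a|b|...} block with one of its alternatives, at random
      new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
        a = s[1..-2].split('|')   # drop the braces, split on '|'
        a[rand(a.size)]
      end
      break if new == old         # nothing left to replace
      old = new
    end
    old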
This returns, for example (the choices are random):
"if decided gonna fishing, need sleeping bag hard wear mountain. this sleeping bag is intended fall , sewed underpaid workers form-factor. it is great bag."
The idea here is to perform a sequence of replacements, each time of strings of the form '{...|...|...}' where the ...'s don't contain a left brace and therefore do not contain a nested block. To show the steps, the following performs sequential random replacements (which may, of course, differ from those above).
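The loop can be wrapped in a method and tried on the nested example from the question. This is a sketch: the method name resolve_braces, the BLOCK constant and the injectable rng argument are hypothetical additions, and Array#sample(random:) stands in for a[rand(a.size)], which picks uniformly in the same way.

```ruby
# Same pattern as in the answer: a {...|...} block whose parts contain no '{'.
BLOCK = /\{[^{]+?(?:\|[^{}]+?)+\}/

# Resolve innermost blocks repeatedly until the string stops changing.
# rng is injectable so a run can be reproduced (an assumption, not in the answer).
def resolve_braces(str, rng = Random.new)
  old = str
  loop do
    new = old.gsub(BLOCK) do |s|
      choices = s[1..-2].split('|')   # strip the braces, split on bars
      choices.sample(random: rng)     # uniform pick, like choices[rand(choices.size)]
    end
    return new if new == old          # fixed point: no blocks left
    old = new
  end
end

nested = '{some word|{some word2|{some word3|some word5}}|some word4}'
out = resolve_braces(nested, Random.new(1))
p out  # one of the five leaf words
```

Passing a seeded Random makes a run repeatable, which can help when testing generated articles; omit it to get a fresh choice each time.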
1st round of replacements
    str # as above
    old = str
    new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
      a = s[1..-2].split('|')
      a[rand(a.size)]
    end
    new == old #=> false
Now new equals:
"if planned gonna hunting, need sleeping bag hard wear mountain. it is ideal fall , made underpaid workers type. {it is|{really|all that}|certainly} great bag."
Notice that the non-nested brace blocks have been resolved, and that the nested block
{it is|{really|{not so|all that}}|certainly}
has been reduced by one nesting level, to
{it is|{really|all that}|certainly}
as {not so|all that} has been replaced with all that. The random replacement in that block was done as follows:
    s0 = '{not so|all that}'
    s1 = s0[1..-2]      #=> "not so|all that"
    a = s1.split('|')   #=> ["not so", "all that"]
    a[rand(a.size)]     #=> a[rand(2)] => a[1] => "all that"
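Why only {not so|all that} changed inside the nested block during the first round can be checked directly with String#scan, which lists every substring the pattern matches. This is an illustrative snippet, not part of the original answer; the variable names are arbitrary.

```ruby
block = '{it is|{really|{not so|all that}}|certainly}'
re = /\{[^{]+?(?:\|[^{}]+?)+\}/

# Only the innermost block qualifies: the outer two contain a '{'.
p block.scan(re)  # => ["{not so|all that}"]

# One gsub pass replaces it with one of its two alternatives, at random.
once = block.gsub(re) { |s| a = s[1..-2].split('|'); a[rand(a.size)] }
p once  # either "{it is|{really|not so}|certainly}" or "{it is|{really|all that}|certainly}"
```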
2nd round of replacements
    old = new
    new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
      a = s[1..-2].split('|')
      a[rand(a.size)]
    end
    new == old #=> false
Now new equals:
"if planned gonna hunting, need sleeping bag hard wear mountain. it is ideal fall , made underpaid workers type. {it is|all that|certainly} great bag."
3rd round of replacements
    old = new
    new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
      a = s[1..-2].split('|')
      a[rand(a.size)]
    end
    new == old #=> false
Now new equals:
"if planned gonna hunting, need sleeping bag hard wear mountain. it is ideal fall , made underpaid workers type. it is great bag."
We are finished, but we won't know that until we try again and find that new == old #=> true.
4th round of replacements
    old = new
    new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
      a = s[1..-2].split('|')
      a[rand(a.size)]
    end
    new == old #=> true
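Because each round peels off exactly one level from the most deeply nested block (and resolves every non-nested block in the first round), the number of gsub passes is the deepest nesting level plus one final pass that confirms nothing changed, which matches the four rounds shown here. A hypothetical helper that counts the passes makes this visible:

```ruby
RE = /\{[^{]+?(?:\|[^{}]+?)+\}/

# Count gsub passes until the string stops changing (hypothetical helper).
def passes_until_stable(str)
  passes = 0
  loop do
    passes += 1
    new = str.gsub(RE) { |s| s[1..-2].split('|').sample }
    return passes if new == str  # this pass changed nothing
    str = new
  end
end

# Three nesting levels: 3 passes that change the string + 1 that confirms it.
p passes_until_stable('{it is|{really|{not so|all that}}|certainly} great bag.')  # => 4
```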