ruby - Parse and transform text for articles reproduction -


I have an input string like this one:

if {decided|planned|wish} {to go|gonna} {camping|have outdoor rest|fishing|hunting}, {may like|need|just need|may use} sleeping bag [product name]. {it|this sleeping bag} {is intended|is ideal} [season] , {designed|sewed|made} [type] {type|form-factor}.

Now I need to do two things:

  1. Put values in place of the square brackets (e.g. [product name] becomes hard wear mountain)
  2. Take a random word from each set of curly brackets and paste it in (e.g. {decided|planned|wish} becomes planned)

So the output string is something like this one:

if wish to go fishing, may like sleeping bag hard wear mountain. this sleeping bag is ideal winter season , designed cocoon form-factor.

I know how to solve problem #1, but I have no idea about problem #2. Also, there can be nested curly brackets, e.g.: {some word|{some word2|{some word3|some word5}}|some word4}.

So I need a regular expression in Ruby, or maybe another approach to solve this problem.

Suppose our text is:

text = 

'if {decided|planned|wish} {to go|gonna} {camping|have outdoor rest|fishing|hunting}, {may like|need|just need|may use} sleeping bag [product name]. {it|this sleeping bag} {is intended|is ideal} [season] , {designed|sewed|made} [type] {type|form-factor}. {it is|{really|{not so|all that}}|certainly} great bag.'

Notice that I've added nested braces in the last sentence.

First, obtain the replacements specified by a hash:

h = { '[product name]' => 'hard wear mountain',
      '[season]'       => 'fall',
      '[type]'         => 'underpaid workers' }

as follows:

r = /
    \[  # match a left bracket
    .+? # match >= 1 characters non-greedily (stop at the 1st right bracket)
    \]  # match a right bracket
    /x

str = text.gsub(r, h)

returning:

"if {decided|planned|wish} {to go|gonna} {camping|have outdoor rest|fishing|hunting}, {may like|need|just need|may use} sleeping bag hard wear mountain. {it|this sleeping bag} {is intended|is ideal} fall , {designed|sewed|made} underpaid workers {type|form-factor}. {it is|{really|{not so|all that}}|certainly} great bag."

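Every string s = [...] is replaced by h[s] if h has a key s. If h has no key s, h[s] is nil and gsub replaces the match with an empty string, so every bracketed placeholder in the text needs an entry in h.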
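To make the behaviour of the hash form of gsub concrete, here is a minimal sketch. The sample text and the [color] placeholder are made up for illustration, and the block form with Hash#fetch is just one way to leave unknown placeholders untouched, not something the text above requires.

```ruby
# Hypothetical sample text; '[color]' deliberately has no entry in the hash.
text = 'bag [product name] for [season], [color] edition'
h = { '[product name]' => 'hard wear mountain', '[season]' => 'fall' }
r = /\[.+?\]/

# Hash form: a match that is not a key is replaced by an empty string.
with_hash = text.gsub(r, h)

# Block form: fall back to the matched text itself when the key is missing.
with_block = text.gsub(r) { |s| h.fetch(s, s) }

puts with_hash
puts with_block
```

The block form is handy while the hash is still incomplete, because leftover placeholders stay visible in the output.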

Now make the brace replacements, beginning with the innermost {...|...|...} blocks and working outward until no more replacements are made:

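old = str
loop do
  # replace every brace block that contains no nested block
  new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
    a = s[1..-2].split('|')  # strip the braces, split the alternatives
    a[rand(a.size)]          # pick one alternative at random
  end
  break if new == old        # nothing left to replace
  old = new
end
old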

returning (for example, since the choices are random):

"if decided gonna fishing, need sleeping bag hard wear mountain. this sleeping bag is intended fall , sewed underpaid workers form-factor. it is great bag."

The idea here is to perform a sequence of replacements, each time on strings of the form '{...|...|...}' where the ...'s don't contain a left brace and therefore don't contain a nested block. To show the steps, the following walks through the sequential random replacements (which may of course be different from what I have above).
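The loop can also be wrapped in a method, which makes it easier to reuse and to test. This is only a sketch: the names resolve_braces and BRACE_BLOCK and the rng keyword are my own, and passing a seeded Random is an assumption I added so that runs can be repeated. The regular expression is the same one used in the loop.

```ruby
# Matches a brace block whose alternatives contain no nested '{'.
BRACE_BLOCK = /\{[^{]+?(?:\|[^{}]+?)+\}/

# Repeatedly resolves innermost brace blocks until the string stops changing.
# rng is a hypothetical parameter: pass Random.new(seed) for repeatable output.
def resolve_braces(str, rng: Random.new)
  current = str
  loop do
    nxt = current.gsub(BRACE_BLOCK) do |s|
      a = s[1..-2].split('|')   # drop the braces, split on '|'
      a[rng.rand(a.size)]       # random alternative
    end
    return nxt if nxt == current
    current = nxt
  end
end

puts resolve_braces('{it is|{really|{not so|all that}}|certainly} great bag.')
```

With the same seed, for example resolve_braces(s, rng: Random.new(1)) called twice, the two results are identical, which is useful when checking the output by hand.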

1st round of replacements

str # as above
old = str
new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
  a = s[1..-2].split('|')
  a[rand(a.size)]
end
new == old #=> false

Now new equals:

"if planned gonna hunting, need sleeping bag hard wear mountain. it is ideal fall , made underpaid workers type. {it is|{really|all that}|certainly} great bag."

Notice that all the non-nested brace blocks have been resolved, and that the nested block:

{it is|{really|{not so|all that}}|certainly} 

has had its nesting reduced by one level:

{it is|{really|all that}|certainly} 

as {not so|all that} has been replaced by all that. The random replacement within a block is done as follows:

s0 = '{not so|all that}'
s1 = s0[1..-2]     #=> "not so|all that"
a = s1.split('|')  #=> ["not so", "all that"]
a[rand(a.size)]    #=> a[rand(2)] => a[1] => "all that"
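As a side note, Array#sample does the same job as a[rand(a.size)]. The short sketch below uses it in place of the indexing shown above; the seed value 42 is arbitrary and only there to show that a seeded generator gives the same pick each time.

```ruby
a = '{not so|all that}'[1..-2].split('|')

# Equivalent in effect to a[rand(a.size)]: one random element of the array.
pick = a.sample

# With an explicitly seeded generator the pick is repeatable.
seeded = a.sample(random: Random.new(42))
again  = a.sample(random: Random.new(42))

puts pick
puts seeded == again
```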

2nd round of replacements

old = new
new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
  a = s[1..-2].split('|')
  a[rand(a.size)]
end
new == old #=> false

Now new equals:

"if planned gonna hunting, need sleeping bag hard wear mountain. it is ideal fall , made underpaid workers type. {it is|all that|certainly} great bag."

3rd round of replacements

old = new
new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
  a = s[1..-2].split('|')
  a[rand(a.size)]
end
new == old #=> false

Now new equals:

"if planned gonna hunting, need sleeping bag hard wear mountain. it is ideal fall , made underpaid workers type. it is great bag."

We are finished, but we won't know that until we try once more and find that new == old #=> true.

4th round of replacements

old = new
new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
  a = s[1..-2].split('|')
  a[rand(a.size)]
end
new == old #=> true
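To watch the rounds without re-running each step by hand, here is a small sketch that records the string after every round that changes something. The names trace_rounds and BRACE and the rng keyword are mine, chosen for illustration; the regular expression and the selection logic are the ones used in the rounds above.

```ruby
BRACE = /\{[^{]+?(?:\|[^{}]+?)+\}/

# Returns the input followed by the string after each round that changed it.
def trace_rounds(str, rng: Random.new)
  steps = [str]
  loop do
    nxt = steps.last.gsub(BRACE) do |s|
      a = s[1..-2].split('|')
      a[rng.rand(a.size)]
    end
    break if nxt == steps.last   # the confirming round: nothing changed
    steps << nxt
  end
  steps
end

steps = trace_rounds('{it is|{really|{not so|all that}}|certainly} great bag.')
steps.each_with_index { |s, i| puts "#{i}: #{s}" }
```

For a block nested three levels deep, as in this example, the list holds the original string plus three changed versions; the fourth round only confirms that nothing is left to replace.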
