bash - keep some lines of a file according to some conditions
I have a file of this kind:
k1 bla started
k1 bla finished
k2 blu finished
k3 bli started
k3 bli died_skipped_permanently
k4 blo started
k5 ble started
k5 ble died_skipped_permanently
k6 blou started
k6 blou started
From this, I want to obtain a file where, for each name in column 1 that has a line with "finished" or "died_skipped_permanently", only the line containing that information is present, and not the other ones (with "started" or other things). Moreover, if 2 lines are identical (like the ones for k6), I want to print only one.
With the example, the output would be:
k1 bla finished
k2 blu finished
k3 bli died_skipped_permanently
k4 blo started
k5 ble died_skipped_permanently
k6 blou started
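I can't simply delete the "started" lines with
grep -v started
because for some names, like k4 in the example, that is the only line present, and I want to know whether it started (or not), so I need to keep that info.
I have a file with the names from column 1, obtained with: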
awk '{print $1}' file | sort | uniq > names # 7,752 lines
I was thinking of a loop of this kind:
for each name present in the file "names", do:
if one of the lines $line for that name contains "finished" or "died_skipped_permanently", print that line to the output and don't print the others; else, keep the lines containing the name. Delete identical lines.
That's the idea, but I don't know how to do this. I would appreciate any help.
We can use the fact that "started" is lexicographically greater than both "finished" and "died_skipped_permanently", and use
sort filename | awk '!seen[$1,$2]++'
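As a quick check, the pipeline can be tried on the sample data from the question. This is only a sketch: the file names sample.txt and filtered.txt are placeholders, and LC_ALL=C is an addition (not part of the command above) that forces plain byte-order sorting regardless of the current locale.

```shell
# Recreate the sample input from the question (file name is a placeholder).
cat > sample.txt <<'EOF'
k1 bla started
k1 bla finished
k2 blu finished
k3 bli started
k3 bli died_skipped_permanently
k4 blo started
k5 ble started
k5 ble died_skipped_permanently
k6 blou started
k6 blou started
EOF

# Sort the lines, then keep only the first line seen for each
# (column 1, column 2) pair. LC_ALL=C is an assumption added here
# to make the sort order independent of the locale.
LC_ALL=C sort sample.txt | awk '!seen[$1,$2]++' > filtered.txt
cat filtered.txt
```

On this input, filtered.txt should contain the six lines listed as the desired output in the question.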
Because "started" is lexicographically the greatest of the three, the "started" line will appear after the "finished" or "died_skipped_permanently" line for the same name once the sort is done. The awk code then wades through the sorted lines and prints a line only if it hasn't seen that combination of fields 1 and 2 before.
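The sort-based approach depends on every non-final status sorting after "finished" and "died_skipped_permanently". If other status words can occur (the question only hints at "other things", so this is an assumption), a two-pass awk that checks the status explicitly may be safer. The sketch below is a different technique from the one above; the "aborted" status, the k7 rows, and the file names sample2.txt and filtered2.txt are hypothetical and used only for illustration.

```shell
# Small input: rows from the question plus a hypothetical "aborted"
# status, which sorts before "finished" and would defeat the sort trick.
printf '%s\n' \
    'k1 bla started' \
    'k1 bla finished' \
    'k4 blo started' \
    'k6 blou started' \
    'k6 blou started' \
    'k7 bly aborted' \
    'k7 bly finished' > sample2.txt

# The same file is read twice.
# Pass 1 (NR == FNR): remember every column-1 name that has a final status.
# Pass 2: for those names skip non-final lines; drop exact duplicate lines.
awk '
    function terminal(s) { return s == "finished" || s == "died_skipped_permanently" }
    NR == FNR { if (terminal($3)) done[$1] = 1; next }
    ($1 in done) && !terminal($3) { next }
    !seen[$0]++
' sample2.txt sample2.txt > filtered2.txt
cat filtered2.txt
```

Unlike the sorted pipeline, this keeps the lines in their original order, and for a name without a final status it keeps every distinct line rather than just one.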