bash - keep some lines of a file according to some conditions
I have a file of this kind:

    k1 bla started
    k1 bla finished
    k2 blu finished
    k3 bli started
    k3 bli died_skipped_permanently
    k4 blo started
    k5 ble started
    k5 ble died_skipped_permanently
    k6 blou started
    k6 blou started

From this, I want to obtain a file where, for each name in column 1, if there is a line with finished or died_skipped_permanently, only the line containing this information is present, and not the other ones (with started or other things). Moreover, if two lines are identical (like the ones for k6), I want to print only one.
With the example, the output would be:

    k1 bla finished
    k2 blu finished
    k3 bli died_skipped_permanently
    k4 blo started
    k5 ble died_skipped_permanently
    k6 blou started

I can't simply delete the started lines with grep -v started, because for some names (k4 in the example) that is the only line present, and I want to know whether they have started (or not), so I need to keep that information.
I have a file with the names from column 1, obtained with:

    awk '{print $1}' file | sort | uniq > names # 7,752 lines

I was thinking of a loop of this kind:
    for each name present in the file "names", do:
        if one of the lines for that name contains finished or died_skipped_permanently, print that line in the output and don't print the others.
        else, keep the lines containing the name.
        delete the lines that are identical.

Here is the idea, but I don't know how to do this. I would appreciate any help.
We can use the fact that started is lexicographically greater than both finished and died_skipped_permanently, and use

    sort filename | awk '!seen[$1,$2]++'

Because started is lexicographically the greatest of the three, the started line will appear after the finished or died_skipped_permanently line for the same name once the sort is done. The awk code wades through the sorted lines and prints a line only if it hasn't seen that combination of fields 1 and 2 before.
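As a quick check, the pipeline above can be run against the sample input from the question. This is only a sketch: the temporary file and the variable names input and result are made up for the illustration, and LC_ALL=C is an added precaution (not part of the original command) so that sort compares plain bytes regardless of the current locale.

```shell
# Recreate the sample input from the question in a temporary file
# (the file and variable names here are illustrative, not from the question).
input=$(mktemp)
cat > "$input" <<'EOF'
k1 bla started
k1 bla finished
k2 blu finished
k3 bli started
k3 bli died_skipped_permanently
k4 blo started
k5 ble started
k5 ble died_skipped_permanently
k6 blou started
k6 blou started
EOF

# LC_ALL=C is an assumption added here: it forces byte-order comparison, so
# "died_skipped_permanently" and "finished" sort before "started".
# awk then prints only the first line it meets for each (field 1, field 2) pair.
result=$(LC_ALL=C sort "$input" | awk '!seen[$1,$2]++')

printf '%s\n' "$result"
rm -f "$input"
```

Note that the output comes out in sorted order rather than in the original file order. The trick also depends on the wording of the statuses: any other status that happens to sort before started would win over it in the same way, and one that sorts after started would be dropped whenever a started line exists for the same name.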