database - awk - print only the first line of duplicates and the line below it
I have a large database file that needs manipulation. I need to avoid duplicates in field 1 (fields are delimited by '|') for:
--
title1 | title2 |t3 |title4|title5
----------|----------|-----|------|---------------
--
data1 | same | | | blah blah
eligible | x1
data1 | same | | blah | blah
eligible | x2
data1 | same | | blah | blah blah
eligible | x2
data2 | same | | | blah blah
eligible | y1
data2 | same | | blah | blah
eligible | y2
data2 | same | | blah | blah blah blah blah
eligible | y2
data3 | same | | | blah blah
eligible | z1
data3 | same | | blah | blah
eligible | z2
data3 | same | | blah | blah blah blah blah
eligible | z2
The code I'm using is:
BEGIN { FS = "|" } { count[$1]++; if (count[$1] == 1) first[$1] = $0; if (count[$1] > 1) print first[$1] NR==1; }
but it gives me this output:
--
title1 | title2 |t3 |title4|title5
----------|----------|-----|------|---------------
--
data1 | same | | | blah blah
eligible | x1
data2 | same | | | blah blah
data3 | same | | | blah blah
I'd prefer this output:
--
title1 | title2 |t3 |title4|title5
----------|----------|-----|------|---------------
--
data1 | same | | | blah blah
eligible | x1
data2 | same | | | blah blah
eligible | y1
data3 | same | | | blah blah
eligible | z1
I don't care about the title block; I just need the data shown as outlined. Sorry for the amateurish explanation; any solution is appreciated. I'm a novice when it comes to Linux command-line scripting, so if anyone can explain why my attempt is wrong, that would be appreciated too. I'm not limited to awk and can use any command for the solution; I just wanted to try a solution in awk.
You could try this:
awk -F'|' '(printed != 0 && /eligible/) { print; printed = 0 } (!seen[$1] && $1 !~ /eligible/) { print; printed = 1; seen[$1] = 1 }'
There's probably a better way, though.
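Another way to express the same idea, as a sketch (the function name `first_of_each` and the sample lines are made up for illustration): make the keep-or-skip decision once on each data line, and let the `eligible` line that follows inherit it. One reason the question's code loses the `eligible` lines is that it deduplicates every line on field 1 independently, and every continuation line has the same field 1 (`eligible `), so all but the first `eligible` line look like duplicates. This assumes each data line is followed by exactly one line whose first field contains `eligible`.

```shell
#!/bin/sh
# Sketch: keep the first data line for each field-1 key, plus the
# "eligible" line directly below it. Assumes every data line is followed
# by exactly one line whose first field contains "eligible".
first_of_each() {
  awk -F'|' '
    $1 ~ /eligible/ { if (keep) print; next }  # continuation line: reuse the decision made for the line above
    { keep = !seen[$1]++; if (keep) print }     # data line: keep it only if field 1 is new
  ' "$@"
}

# Sample input abbreviated from the question.
first_of_each <<'EOF'
data1 | same | | | blah blah
eligible | x1
data1 | same | | blah | blah
eligible | x2
data2 | same | | | blah blah
eligible | y1
data2 | same | | blah | blah
eligible | y2
EOF
```

On the real file, call `first_of_each yourfile` (placeholder name). The header lines also pass through, since each has a new field 1, except the second `--`, which is a repeat.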
ETA: There's an awk tutorial here, and several others around the web, along with books. Basically, an awk program is a series of patterns and blocks of code; each block runs on every record (a line, by default) that matches its pattern.
awk '
  /foo/         { }   # block runs on lines that contain "foo" anywhere
  ($1 == "bar") { }   # lines whose first field is "bar"
  ($NF ~ /baz/) { }   # lines whose last field contains "baz"
  (NF == 1)     { }   # lines with only 1 field
  (NR == 10)    { }   # only on the 10th line
'
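A runnable sketch of those patterns on three made-up lines (the function name `pattern_demo` and the input are invented for illustration). The rules are not exclusive like an if/else chain: every rule whose pattern matches runs, so the second line triggers two blocks.

```shell
#!/bin/sh
# Each rule prints a label when its pattern matches the current line.
pattern_demo() {
  awk '
    /foo/         { print NR ": contains foo" }
    ($1 == "bar") { print NR ": first field is bar" }
    ($NF ~ /baz/) { print NR ": last field contains baz" }
    (NF == 1)     { print NR ": only one field" }
  '
}

printf 'foo bar\nbar baz\nqux\n' | pattern_demo
# Expected output:
# 1: contains foo
# 2: first field is bar
# 2: last field contains baz
# 3: only one field
```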
If there's no pattern, the block runs on every line:
awk '{ print $NF }'   # print the last field of every line
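The same kind of no-pattern block, applied to lines shaped like the question's, shows a detail worth knowing: with `-F'|'` each field keeps the spaces around the delimiter, so on `eligible | x1` the last field is ` x1`, not `x1`. A small sketch (the function name `show_last_field` is made up); the two `sub()` calls are one common way to trim, not something the answer above relies on.

```shell
#!/bin/sh
# No pattern, so the block runs on every line. With -F'|' fields keep
# the spaces around the delimiter; the two sub() calls trim them.
show_last_field() {
  awk -F'|' '{
    last = $NF
    sub(/^ +/, "", last)   # drop leading spaces
    sub(/ +$/, "", last)   # drop trailing spaces
    print "[" $NF "] -> [" last "]"
  }'
}

printf 'data1 | same | | | blah blah\neligible | x1\n' | show_last_field
# Expected output:
# [ blah blah] -> [blah blah]
# [ x1] -> [x1]
```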
If there's a pattern but no block, matching lines are printed unchanged:
awk '/foo/'   # same as grep foo
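A bare pattern can be any expression, not only a regex. The well-known dedup idiom `!seen[$1]++` is a pattern with no block: it is true only the first time a given field 1 appears. A sketch on made-up lines shaped like the question's (the function name `dedup_field1` is hypothetical); it keeps only the first `eligible` line, since they all share the same field 1.

```shell
#!/bin/sh
# Pattern only, no block: print a line the first time its field 1 appears.
dedup_field1() {
  awk -F'|' '!seen[$1]++'
}

printf 'data1 | a\neligible | x1\ndata2 | b\neligible | y1\n' | dedup_field1
# Expected output (the "eligible | y1" line is dropped because its
# field 1 matches the one on "eligible | x1"):
# data1 | a
# eligible | x1
# data2 | b
```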
A block labeled BEGIN is run before any input is processed; a block labeled END is run after all input is processed:
awk 'BEGIN { t = 0 } { t += $NF } END { print t }'   # print the total of the last column
But since uninitialized variables are treated as 0 in arithmetic, you can skip the initialization:
awk '{ t += $NF } END { print t }'
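Putting BEGIN and END together on `|`-separated input: setting `FS` in a BEGIN block, as the question's code does, has the same effect as `-F'|'`, and END prints the total once every line has been read. A sketch with made-up numbers (the function name `sum_last_field` is hypothetical).

```shell
#!/bin/sh
# BEGIN sets the field separator before the first line is read;
# END runs once, after the last line. t is never initialized, so it acts as 0.
sum_last_field() {
  awk 'BEGIN { FS = "|" } { t += $NF } END { print t }'
}

printf 'a | 1\nb | 2\nc | 3\n' | sum_last_field
# Expected output: 6
```

One caveat of skipping the initialization: on empty input `t` is never assigned, so `print t` prints an empty line, whereas `print t + 0` prints 0.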
Some versions of awk require a semicolon (;) or a newline between pattern/block pairs.
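A sketch of the two separators with the same pair of rules (function names and input are made up): one program uses a semicolon, the other a newline, and both behave the same.

```shell
#!/bin/sh
# The same two rules, separated by a semicolon and by a newline.
one_line() { awk '/a/ { print "A" }; /b/ { print "B" }'; }
two_lines() {
  awk '
    /a/ { print "A" }
    /b/ { print "B" }
  '
}

printf 'a\nb\nab\n' | one_line
printf 'a\nb\nab\n' | two_lines
# Each prints A, B, A, B on separate lines ("ab" matches both rules).
```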