database - awk - print only first line of duplicates and the line below it

April 15, 2014

I have a large database file that needs manipulation. I need to avoid duplicates in field 1, delimited by '|', for:

-- title1 | title2   |t3   |title4|title5
----------|----------|-----|------|---------------
-- data1   | same     |     |      |  blah blah
eligible  | x1
data1   | same     |     | blah |  blah
eligible  | x2
data1   | same     |     | blah |  blah blah
eligible  | x2
data2   | same     |     |      |  blah blah
eligible  | y1
data2   | same     |     | blah |  blah
eligible  | y2
data2   | same     |     | blah |  blah blah blah blah
eligible  | y2
data3   | same     |     |      |  blah blah
eligible  | z1
data3   | same     |     | blah |  blah
eligible  | z2
data3   | same     |     | blah |  blah blah blah blah
eligible  | z2

The code I am using is:

BEGIN { FS = "|" } { count[$1]++; if (count[$1] == 1) first[$1] = $0; if (count[$1] > 1) print first[$1] NR==1; }

But it gives me this output:

-- title1 | title2   |t3   |title4|title5
----------|----------|-----|------|---------------
-- data1   | same     |     |      |  blah blah
eligible  | x1
data2   | same     |     |      |  blah blah
data3   | same     |     |      |  blah blah

I would prefer output like this:

-- title1 | title2   |t3   |title4|title5
----------|----------|-----|------|---------------
-- data1   | same     |     |      |  blah blah
eligible  | x1
data2   | same     |     |      |  blah blah
eligible  | y1
data3   | same     |     |      |  blah blah
eligible  | z1

I don't care about the title block; I just need the data shown as outlined. Sorry for the amateurish explanation; any solution is appreciated. I am a novice when it comes to Linux command-line scripting, so if someone could explain why my attempt is wrong, that would be appreciated. I am not limited to awk and can use any command as a solution; I just wanted to try solving it with awk.

You could try this:

awk -F\| '(printed!=0 && /eligible/) {print; printed=0;} (!seen[$1] && $1 !~ /eligible/) { print; printed = 1; seen[$1] = 1;  }'

There is probably a better way, though.
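To see what the two rules do, here is a minimal sketch that runs the command on a shortened copy of the sample data. It assumes that every data row sits on its own line and is followed by a separate "eligible" line, which is how the command above treats the input; the shell variable names input and result are only for illustration.

```shell
# Shortened sample input (assumption: each data row is followed by its own "eligible" line).
input='data1   | same     |     |      |  blah blah
eligible  | x1
data1   | same     |     | blah |  blah
eligible  | x2
data2   | same     |     |      |  blah blah
eligible  | y1
data2   | same     |     | blah |  blah
eligible  | y2'

result=$(printf '%s\n' "$input" | awk -F'|' '
    # An "eligible" line directly after a freshly printed data row: print it too.
    (printed != 0 && /eligible/)    { print; printed = 0 }
    # First time this field-1 value shows up on a non-"eligible" line.
    (!seen[$1] && $1 !~ /eligible/) { print; printed = 1; seen[$1] = 1 }
')

printf '%s\n' "$result"
```

On this input it should print the first data1 row with its x1 line and the first data2 row with its y1 line. The seen array remembers every field-1 value already printed, and the printed flag lets exactly one following "eligible" line through.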

ETA: There's an awk tutorial here, and several others around the web, along with books. Basically, an awk program is a series of patterns and blocks of code, where each block is run on every record (line, by default) that matches its pattern.

awk '/foo/          { lines that contain "foo" anywhere }
     ($1 == "bar")  { lines whose first field is "bar" }
     ($NF ~ /baz/)  { lines whose last field contains "baz" }
     (NF == 1)      { lines with only 1 field }
     (NR == 10)     { only on the 10th line }'
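The patterns above have placeholder descriptions instead of real code. A runnable version might look like the sketch below, where the four-line sample input and the printed labels are made up for illustration, and the line-number test uses 4 rather than 10 so that the short sample can trigger it.

```shell
# Made-up sample input: four short lines.
sample='foo one
bar two baz
single
x y'

result=$(printf '%s\n' "$sample" | awk '
    /foo/          { print "has foo: " $0 }
    ($1 == "bar")  { print "first is bar: " $0 }
    ($NF ~ /baz/)  { print "last has baz: " $0 }
    (NF == 1)      { print "one field: " $0 }
    (NR == 4)      { print "line 4: " $0 }
')

printf '%s\n' "$result"
```

Note that the second input line matches two patterns, so both of those blocks run for it, in the order they appear in the program.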

If there's no pattern, the block runs on every line.

awk '{print $NF}'   # print last field of every line

If there's no block, only a pattern, matching lines are printed unchanged:

awk '/foo/'      # same as grep foo

A block labeled BEGIN is run before any input is processed; a block labeled END is run after all input has been processed.

awk 'BEGIN { t = 0 } {t += $NF} END { print t }'   # print total of last column

But uninitialized variables are treated as 0 in arithmetic, so you can skip the initialization:

awk '{t += $NF} END {print t}'
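As a quick check, the sketch below runs both forms on a made-up three-line input and should give the same total. It also shows one case where the difference is visible: with no input at all, an uninitialized variable prints as an empty string, so adding 0 in the print statement is a common way to force a number. The variable names are only for illustration.

```shell
# Made-up input: the last column holds the numbers 1, 2 and 3.
data='a 1
b 2
c 3'

with_init=$(printf '%s\n' "$data" | awk 'BEGIN { t = 0 } { t += $NF } END { print t }')
without_init=$(printf '%s\n' "$data" | awk '{ t += $NF } END { print t }')

# With empty input t is never assigned; "t + 0" makes the END block print 0, not a blank line.
empty_total=$(awk '{ t += $NF } END { print t + 0 }' </dev/null)

printf '%s %s %s\n' "$with_init" "$without_init" "$empty_total"
```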

Some versions of awk require a semicolon (;) or a newline between pattern/block pairs.
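A minimal sketch of the explicit form, with a semicolon separating two pattern/block pairs on one line; the two-line input and the printed labels are made up for illustration.

```shell
# Two pattern/block pairs on one line, separated by an explicit semicolon.
result=$(printf 'a\nb\n' | awk 'NR == 1 { print "first: " $0 } ; END { print "lines: " NR }')

printf '%s\n' "$result"
```

Writing the semicolon (or putting each pair on its own line) costs nothing and avoids depending on how a particular awk parses adjacent pairs.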
