hiveql - In hive, how to do a calculation among 2 rows? -

May 15, 2013

i have table.

   +------------------------------------------------------------+    |     ks      |      time     |     val1      |    val2      |     +-------------+---------------+---------------+--------------+    |           |       1       |       1       |      1       |    |     b       |       1       |       3       |      5       |    |           |       2       |       6       |      7       |    |     b       |       2       |      10       |     12       |    |           |       4       |       6       |      7       |    |     b       |       4       |      20       |     26       |    +------------------------------------------------------------+

what want each row,

ks |  time |  val1 | val1 of next ts of same ks  |

to clear, result of above example should be,

   +------------------------------------------------------------+    |     ks      |      time     |     val1      |   next.val1  |     +-------------+---------------+---------------+--------------+    |           |       1       |       1       |       6      |    |     b       |       1       |       3       |       10     |    |           |       2       |       6       |       6      |    |     b       |       2       |      10       |       20     |    |           |       4       |       6       |      null    |    |     b       |       4       |      20       |      null    |    +------------------------------------------------------------+

(i need same next value2 well)

i tried lot come hive query this, still no luck. able write query in sql mentioned here (quassnoi's answer), couldn't create equivalent in hive because hive doesn't support subqueries in select.

can please me achieve this?

thanks in advance.

edit:

query tried was,

select ks, time, val1, next[0] next.val1 (select ks, time, val1        coalesce(        (        select val1, time        mytable mi        mi.val1 > m.val1 , mi.ks = m.ks        order time        limit 1        ), cast(0 bigint)) next  mytable m order time) t2;

your query seems quite similar "year ago" reporting ubiquitous in financial reporting. think left outer join looking for.

we join table mytable itself, naming 2 instances of same table m , n. every entry in first table m attempt find matching record in n same ks value incremented value of time. if record not exist, column values n null.

select      m.ks,      m.time,     m.val1,      n.val1 next_val1,     m.val2,      n.val2 next_val2      mytable m left outer join     mytable n on (     m.ks = n.ks ,      m.time + 1 = n.time );

returns following.

ks  time  val1  next_val1  val2  next_val2   1     1     6          1     7   2     6     6          7     7   3     6     null       7     null b   1     3     10         5     12 b   2     10    20         12    26 b   3     20    null       26    null

hope helps.

Search This Blog

New Mian

hiveql - In hive, how to do a calculation among 2 rows? -

Comments

Post a Comment

Popular posts from this blog

android - java.net.UnknownHostException(Unable to resolve host “URL”: No address associated with hostname) -

jquery - How can I dynamically add a browser tab? -

keyboard - C++ GetAsyncKeyState alternative -