dataset - Filtering data in R (complex) -

April 15, 2012

I have a dataset with 7 million records.

I need to filter the data to show only 9000 of these.

The first field, dmg, is the primary key and takes the format 1-apr-123456. There are 12 occurrences of each dmg value.

Another column, o_y, takes a value of 0 or 1. It is usually 0, and is 1 on 900 occasions.

I would like to return all rows that share a dmg value where at least one of the records has an o_y value of 1.

I recommend using data.table for this (fread in data.table is quite handy for reading in a large data set, provided you have enough RAM).
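As a rough sketch of the reading step, assuming the data lives in a CSV file with a header row naming the dmg and o_y columns. The file below is a tiny temporary stand-in written only for illustration, and csv.path is a made-up name:

```r
library(data.table)

# Hypothetical stand-in for the real file: a tiny CSV written to a temp path.
csv.path <- tempfile(fileext = ".csv")
writeLines(c("dmg,o_y",
             "1-apr-123456,0",
             "1-apr-123456,1",
             "2-apr-654321,0"),
           csv.path)

# fread detects the separator and column types and returns a data.table.
dt <- fread(csv.path)
str(dt)
```

With the real data, csv.path would simply point at the 7 million row file.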

I am not sure the following is the best way to do it in data.table but, at least, it should get you started. Hopefully someone else will come along and list the idiomatic data.table way to do this. Here is what I can think of right now:

Assume your data.table is called dt and has two columns, dmg and o_y. Use o_y as the key for dt and subset dt for o_y == 1 (dt[.(1)] in data.table syntax). Then find the corresponding dmg values. The unique set of these dmg values is keys.with.ones. This can be done succinctly as follows:

setkey(dt, o_y)
keys.with.ones <- unique(dt[.(1)][["dmg"]])

Next, we need to extract the rows corresponding to these values of dmg. For this we change the key of dt to dmg and extract the rows matching the keys found above:

setkey(dt, dmg)
dt.filtered <- dt[.(keys.with.ones)]
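To see the two steps end to end, here is a small sketch on made-up toy data. The dmg values "A", "B" and "C" are simplified placeholders rather than the real 1-apr-123456 format, and only "B" and "C" ever have o_y equal to 1:

```r
library(data.table)

# Toy data (made up): three dmg values, three rows each.
dt <- data.table(
  dmg = rep(c("A", "B", "C"), each = 3),
  o_y = c(0, 0, 0,  0, 1, 0,  1, 1, 0)
)

# Step 1: key on o_y, join on the value 1, collect the distinct dmg values.
setkey(dt, o_y)
keys.with.ones <- unique(dt[.(1)][["dmg"]])

# Step 2: re-key on dmg and pull every row for those dmg values.
setkey(dt, dmg)
dt.filtered <- dt[.(keys.with.ones)]
print(dt.filtered)
```

All three rows for "B" and all three rows for "C" come back, including the ones where o_y is 0, while "A" is dropped entirely.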

And we are done. :)

Please refer to ?data.table to figure out a better method if possible, and let us know.
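For what it is worth, here are two other ways this might be written without setting keys. This is only a sketch on the same kind of made-up toy data, not a benchmarked recommendation, and the names dt.in and dt.grp are invented for the example:

```r
library(data.table)

# Toy data (made up): only "B" and "C" ever have o_y == 1.
dt <- data.table(
  dmg = rep(c("A", "B", "C"), each = 3),
  o_y = c(0, 0, 0,  0, 1, 0,  1, 1, 0)
)

# Option 1: find the dmg values that ever have o_y == 1, then filter with %in%.
dt.in <- dt[dmg %in% unique(dmg[o_y == 1])]

# Option 2: keep whole groups in which any row has o_y == 1.
# Groups where the condition is FALSE return NULL and are dropped.
dt.grp <- dt[, if (any(o_y == 1)) .SD, by = dmg]
```

Both return the same six rows here. With several hundred thousand small groups, the grouped version may well be slower than the %in% filter, so it is worth timing on the real data.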
