dataset - Filtering data in R (complex)
I have a dataset with 7 million records. I need to filter the data to show only 9000 of these.
The first field, dmg, is the primary key and takes the format 1-apr-123456. There are 12 occurrences of each dmg value.
Another column, o_y, takes a value of 0 or 1. It is usually 0, and is 1 on only 900 occasions.
I would like to return all rows that share the same dmg value where at least one of those records has an o_y value of 1.
I recommend using data.table for doing this (fread in data.table is quite handy for reading in a large data set, if you have enough RAM).
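A minimal sketch of reading the data with fread. The file here is a tiny made-up CSV written to a temporary path so that the example runs on its own; the real file name and its layout are not given in the question, so treat csv.path and the sample rows as assumptions.

```r
library(data.table)

# Hypothetical input: write a small CSV to a temporary file so the example is self-contained.
csv.path <- tempfile(fileext = ".csv")
writeLines(c("dmg,o_y",
             "1-apr-000001,0",
             "1-apr-000001,1",
             "1-apr-000002,0"),
           csv.path)

# fread() detects the separator and column types and returns a data.table.
dt <- fread(csv.path)
str(dt)
```

For the real 7-million-row file, the same fread call applies with the actual path in place of csv.path.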
I am not sure the following is the best way to do it in data.table but, at least, it should get you started. Hopefully, someone else will come along and list the idiomatic data.table way to do this. Here is what I can think of right now:
Assume your data.table is called dt and has 2 columns, dmg and o_y. Use o_y as the key of dt, and subset dt for o_y == 1 (dt[.(1)] in data.table syntax). Then find the corresponding dmg values. The unique set of these dmg values is keys.with.ones. This can be done succinctly as follows:
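# key on o_y so that dt[.(1)] selects the rows where o_y == 1
setkey(dt, o_y)
# in current data.table versions dt[.(1), dmg] is already a plain vector, so no [["dmg"]] is needed
keys.with.ones <- unique(dt[.(1), dmg])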
Next, we need to extract the rows corresponding to these values of dmg. For this, change the key of dt to dmg and extract the rows corresponding to the keys above:
setkey(dt, dmg)
dt.filtered <- dt[.(keys.with.ones)]
And you are done. :)
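To check the two keyed steps end to end, here is a small sketch on made-up data: three dmg values with 12 rows each, of which only the first and third contain a 1. The sample values and row positions are assumptions chosen for illustration, not taken from the real data.

```r
library(data.table)

# Made-up data: three dmg values, 12 rows each, all o_y initially 0.
dt <- data.table(
  dmg = rep(c("1-apr-000001", "1-apr-000002", "1-apr-000003"), each = 12),
  o_y = 0L
)
# Row 5 belongs to the first dmg value, row 30 to the third.
dt[c(5L, 30L), o_y := 1L]

# Step 1: key on o_y, then collect the dmg values that have at least one 1.
setkey(dt, o_y)
keys.with.ones <- unique(dt[.(1L), dmg])

# Step 2: re-key on dmg and pull every row for those dmg values.
setkey(dt, dmg)
dt.filtered <- dt[.(keys.with.ones)]

# Count of rows kept per dmg value.
dt.filtered[, .N, by = dmg]
```

Every row of a dmg group is kept as soon as one of its rows has o_y equal to 1, and groups without any 1 are dropped entirely.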
Please refer to ?data.table to figure out a better method if possible, and let us know.
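As one possible alternative that avoids re-keying the table, the same filter can be written as a single expression. This is only a sketch on made-up data, not benchmarked on 7 million rows, and dt.in and dt.grp are names introduced here for illustration.

```r
library(data.table)

# Made-up data: two dmg values, only the first has an o_y of 1.
dt <- data.table(
  dmg = rep(c("1-apr-000001", "1-apr-000002"), each = 3),
  o_y = c(0L, 1L, 0L, 0L, 0L, 0L)
)

# Option A: find the dmg values that have a 1, then keep rows whose dmg is among them.
dt.in <- dt[dmg %in% dt[o_y == 1, unique(dmg)]]

# Option B: group by dmg and return the group's rows only if any o_y is 1.
dt.grp <- dt[, if (any(o_y == 1)) .SD, by = dmg]
```

Option A does one vector scan plus a membership test, while Option B evaluates an expression once per dmg group, so with many small groups Option A is likely the cheaper of the two.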