Filtering data in R (complex)
I have a dataset with 7 million records.
I need to filter the data to show only 9000 of these.
The first field, dmg, is like a primary key and takes the format 1-apr-123456. There are 12 occurrences of each dmg value.
Another column, o_y, takes a value of 0 or 1. It is usually 0, and is 1 on 900 occasions.
I would like to return all rows that share a dmg value where at least 1 of the records with that dmg value has an o_y value of 1.
I recommend using data.table for this (fread in data.table is quite handy for reading in a large data set, provided you have enough RAM).
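As a rough sketch of the reading step: the file name and contents below are made up (a tiny temporary CSV stands in for the real 7-million-row file), and only the dmg and o_y column names come from the question.

```r
library(data.table)

# Hypothetical stand-in for the real file: a tiny CSV written to a temp path.
csv.path <- tempfile(fileext = ".csv")
writeLines(c("dmg,o_y",
             "1-apr-000001,0",
             "1-apr-000001,1",
             "1-apr-000002,0"), csv.path)

# fread detects the separator and column types and returns a data.table.
dt <- fread(csv.path)
str(dt)
```

With the real data you would pass the path of your own file to fread instead of csv.path.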
I am not sure the following is the best way to do this in data.table but, at least, it should get you started. Hopefully, someone else will come along and list the idiomatic data.table way for this. Here is what I can think of right now:
Assume your data.table is called dt and has 2 columns, dmg and o_y. Use o_y as the key for dt and subset dt for o_y == 1 (dt[.(1)] in data.table syntax). Then find the corresponding dmg values. The unique set of these dmg values will be your keys.with.ones. All of this is succinctly done as follows:
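    library(data.table)
    # key on o_y, then collect the dmg values that have at least one o_y == 1
    setkey(dt, o_y)
    keys.with.ones <- unique(dt[.(1), dmg])

Next, we need to extract the rows corresponding to these values of dmg. For this we need to change the key of dt to dmg and extract the rows corresponding to the keys above:

    # re-key on dmg and pull every row for those dmg values
    setkey(dt, dmg)
    dt.filtered <- dt[.(keys.with.ones)]

And you are done. :)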
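To check the two steps on something small, here is a sketch on made-up toy data (the three dmg values and the o_y pattern are invented for illustration). It assumes a current data.table version, where dt[.(1), dmg] returns the dmg values as a plain vector.

```r
library(data.table)

# Toy data (invented for illustration): three dmg groups, two rows each.
dt <- data.table(
  dmg = rep(c("1-apr-000001", "1-apr-000002", "1-apr-000003"), each = 2),
  o_y = c(0, 1, 0, 0, 1, 1)
)

# Step 1: key on o_y and collect the dmg values that have at least one 1.
setkey(dt, o_y)
keys.with.ones <- unique(dt[.(1), dmg])

# Step 2: re-key on dmg and pull every row for those dmg values.
setkey(dt, dmg)
dt.filtered <- dt[.(keys.with.ones)]

print(dt.filtered)
```

The group 1-apr-000002 never has o_y equal to 1, so none of its rows should appear in dt.filtered, while both rows of each of the other two groups should.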
Please refer to ?data.table to figure out a better method if possible, and let us know.
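For what it is worth, two other ways to write the same filter without setting keys are sketched below. This is not necessarily the better method asked for above, I have not benchmarked either on 7 million rows, and the toy data is invented for illustration.

```r
library(data.table)

# Toy data (invented for illustration): three dmg groups, two rows each.
dt <- data.table(
  dmg = rep(c("1-apr-000001", "1-apr-000002", "1-apr-000003"), each = 2),
  o_y = c(0, 1, 0, 0, 1, 1)
)

# Variant A: find the dmg values with a 1, then keep every row with such a dmg.
by.subset <- dt[dmg %in% dt[o_y == 1, unique(dmg)]]

# Variant B: per dmg group, return the group's rows (.SD) only if any o_y is 1.
by.group <- dt[, if (any(o_y == 1)) .SD, by = dmg]

print(by.subset)
print(by.group)
```

Variant A keeps the original row order, while Variant B returns rows group by group, so compare them by content rather than by position.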