Rewriting slow R function in C++ & Rcpp -


i have line of r code:

croppeddna <- completedna[,apply(completedna,2,function(x) any(c(false,x[-length(x)]!=x[-1])))] 

what identify sites (cols) in matrix of dna sequences (1 row = 1 seq) not universal (informative) , subsets them matrix make new 'cropped matrix' i.e. rid of columns in values same. big dataset takes 6 seconds. don't know if can faster in c++ (still beginner in c++) me try. idea use rcpp, loop through columns of charactermatrix, pull out column (the site) charactervector check if same. if same, record column number/index, continue columns. @ end make new charactermatrix includes columns. important keep rownames , column names in th "r version" of matrix i.e. if column goes, should colname.

i've been writing 2 minutes, far have (not finished):

#include <rcpp.h> #include <vector> using namespace rcpp; // [[rcpp::export]] charactermatrix reduce_sequences(charactermatrix completedna) {   std::vector<bool> informativesites;    for(int = 0; < completedna.ncol(); i++)   {     charactervector bpsite = completedna(,i);     if(all(bpsite == bpsite[1])     {       informativesites.push_back(i);     }   } charactermatrix cutdna = completedna(,informativesites); return cutdna; } 

am going right way this? there easier way. understanding need std::vector because it's easy grow them (since don't know in advance how many cols going want keep). indexing need +1 informativesites vector @ end (because r indexes 1 , c++ 0)?

thanks, ben w.

sample data:

set.seed(123) z <- matrix(sample(c("a", "t", "c", "g", "n", "-"), 3*398508, true), 3, 398508) 

op's solution:

system.time(y1 <- z[,apply(z,2,function(x) any(c(false,x[-length(x)]!=x[-1])))]) #    user  system elapsed  #   4.929   0.043   4.976  

a faster version using base r:

system.time(y2 <- (z[, colsums(z[-1,] != z[-nrow(z), ]) > 0])) #    user  system elapsed  #   0.087   0.011   0.098  

the results identical:

identical(y1, y2) # [1] true 

it's possible c++ beat it, necessary?


Comments

Popular posts from this blog

jquery - How can I dynamically add a browser tab? -

node.js - Getting the socket id,user id pair of a logged in user(s) -

keyboard - C++ GetAsyncKeyState alternative -