Rewriting slow R function in C++ & Rcpp -

August 15, 2014

i have line of r code:

croppeddna <- completedna[,apply(completedna,2,function(x) any(c(false,x[-length(x)]!=x[-1])))]

what identify sites (cols) in matrix of dna sequences (1 row = 1 seq) not universal (informative) , subsets them matrix make new 'cropped matrix' i.e. rid of columns in values same. big dataset takes 6 seconds. don't know if can faster in c++ (still beginner in c++) me try. idea use rcpp, loop through columns of charactermatrix, pull out column (the site) charactervector check if same. if same, record column number/index, continue columns. @ end make new charactermatrix includes columns. important keep rownames , column names in th "r version" of matrix i.e. if column goes, should colname.

i've been writing 2 minutes, far have (not finished):

#include <rcpp.h> #include <vector> using namespace rcpp; // [[rcpp::export]] charactermatrix reduce_sequences(charactermatrix completedna) {   std::vector<bool> informativesites;    for(int = 0; < completedna.ncol(); i++)   {     charactervector bpsite = completedna(,i);     if(all(bpsite == bpsite[1])     {       informativesites.push_back(i);     }   } charactermatrix cutdna = completedna(,informativesites); return cutdna; }

am going right way this? there easier way. understanding need std::vector because it's easy grow them (since don't know in advance how many cols going want keep). indexing need +1 informativesites vector @ end (because r indexes 1 , c++ 0)?

thanks, ben w.

sample data:

set.seed(123) z <- matrix(sample(c("a", "t", "c", "g", "n", "-"), 3*398508, true), 3, 398508)

op's solution:

system.time(y1 <- z[,apply(z,2,function(x) any(c(false,x[-length(x)]!=x[-1])))]) #    user  system elapsed  #   4.929   0.043   4.976

a faster version using base r:

system.time(y2 <- (z[, colsums(z[-1,] != z[-nrow(z), ]) > 0])) #    user  system elapsed  #   0.087   0.011   0.098

the results identical:

identical(y1, y2) # [1] true

it's possible c++ beat it, necessary?

Search This Blog

New Mian

Rewriting slow R function in C++ & Rcpp -

Comments

Post a Comment

Popular posts from this blog

android - java.net.UnknownHostException(Unable to resolve host “URL”: No address associated with hostname) -

jquery - How can I dynamically add a browser tab? -

keyboard - C++ GetAsyncKeyState alternative -