r - How to know if a regression model generated by random forests is good? (MSE and %Var(y))


I tried to use random forests for regression. The original data is a data frame of 218 rows and 9 columns. The first 8 columns hold categorical values (each can be A, B, C, or D), and the last column, v9, holds numerical values that range from 10.2 to 999.87.

When I ran random forests on the training set, which is a randomly selected 2/3 of the original data, I got the following results.

> r = randomForest(v9 ~ ., data = trainingdata, mytree = 4, ntree = 1000, importance = TRUE, do.trace = 100)
     |      Out-of-bag   |
Tree |      MSE  %Var(y) |
 100 | 6.927e+04    98.98 |
 200 | 6.874e+04    98.22 |
 300 | 6.822e+04    97.48 |
 400 | 6.812e+04    97.34 |
 500 | 6.839e+04    97.73 |
 600 | 6.852e+04    97.92 |
 700 | 6.826e+04    97.54 |
 800 | 6.815e+04    97.39 |
 900 | 6.803e+04    97.21 |
1000 | 6.796e+04    97.11 |

I do not know whether the high variance percentage means the model is good or not. Also, since the MSE is high, I suspect the regression model is not very good. Any idea how to read the results above? Do they mean that the model is not good?
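One way to read the last line of the trace is to turn it into quantities on the scale of v9. The sketch below uses only the two numbers printed for 1000 trees. It assumes that the %Var(y) column is the out-of-bag MSE expressed as a percentage of the variance of v9, which is my understanding of how randomForest prints its regression trace; the variable names are made up for the illustration.

```r
# Values copied from the 1000-tree line of the trace above
oob_mse <- 6.796e4
pct_var <- 97.11

# Typical out-of-bag error, in the same units as v9
oob_rmse <- sqrt(oob_mse)

# Assumption: %Var(y) = 100 * MSE / Var(y), so Var(y) can be backed out
var_y <- oob_mse / (pct_var / 100)
sd_y  <- sqrt(var_y)

# Share of the variance explained under that assumption
pct_explained <- 100 - pct_var

print(round(c(oob_rmse = oob_rmse, sd_y = sd_y, pct_explained = pct_explained), 2))
```

Under that assumption the out-of-bag error (about 261) is almost as large as the spread of v9 itself, so the forest does little better than always predicting the mean.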

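As @joran said, %Var(y) relates the model's error to the total variance of Y. In the do.trace output it is the out-of-bag MSE expressed as a percentage of Var(Y), so a value near 100 means the forest explains very little of the variance (about 3% here), which suggests the model is not good. After fitting, apply the model to the validation data (the remaining 1/3):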

rfestimated = predict(r, newdata = validationdata)

It is interesting to check the residuals:

qqnorm((rfestimated - validationdata$v9)/sd(rfestimated - validationdata$v9))
qqline((rfestimated - validationdata$v9)/sd(rfestimated - validationdata$v9))

The estimated versus observed values:

plot(validationdata$v9, rfestimated) 

And the RMSE:

rmse <- sqrt(mean((rfestimated - validationdata$v9)^2))
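An RMSE is easier to judge next to a baseline. A minimal sketch, assuming the baseline is "always predict the mean of the observed values": the helper names rmse_of and r_squared are hypothetical, and the two vectors are made-up numbers standing in for validationdata$v9 and rfestimated.

```r
# Hypothetical helpers, not part of the randomForest package
rmse_of <- function(observed, predicted) sqrt(mean((predicted - observed)^2))
r_squared <- function(observed, predicted) {
  1 - sum((observed - predicted)^2) / sum((observed - mean(observed))^2)
}

# Made-up values standing in for validationdata$v9 and rfestimated
observed  <- c(120.5, 310.0, 45.2, 870.3, 502.8, 15.9)
predicted <- c(150.0, 280.4, 90.1, 799.6, 540.2, 60.3)

# Error of the model versus error of always predicting the mean
model_rmse    <- rmse_of(observed, predicted)
baseline_rmse <- rmse_of(observed, rep(mean(observed), length(observed)))
r2            <- r_squared(observed, predicted)

print(c(model = model_rmse, baseline = baseline_rmse, r2 = r2))
```

If the model RMSE is close to the baseline RMSE (r2 near 0), the model adds little over the mean; the further below the baseline it falls, the more of the variation in v9 it captures on the validation data.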

I hope this helps!

