r - How to know if a regression model generated by random forests is good? (MSE and %Var(y))
I tried to use random forests for regression. The original data is a data frame of 218 rows and 9 columns. The first 8 columns hold categorical values (each can be either A, B, C, or D), and the last column, v9, holds numerical values that range from 10.2 to 999.87.
When I used random forests on the training set, which represents 2/3 of the original data and was randomly selected, I got the following results.
> r=randomForest(v9~.,data=trainingdata,mytree=4,ntree=1000,importance=TRUE,do.trace=100)
     |      Out-of-bag   |
Tree |      MSE  %Var(y) |
 100 | 6.927e+04    98.98 |
 200 | 6.874e+04    98.22 |
 300 | 6.822e+04    97.48 |
 400 | 6.812e+04    97.34 |
 500 | 6.839e+04    97.73 |
 600 | 6.852e+04    97.92 |
 700 | 6.826e+04    97.54 |
 800 | 6.815e+04    97.39 |
 900 | 6.803e+04    97.21 |
1000 | 6.796e+04    97.11 |
I do not know whether a high variance percentage means the model is good or not. Also, since the MSE is high, I suspect the regression model is not good. Any idea how to read the results above? Do they mean that the model is not good?
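One way to read the last row of the trace is to put the MSE back on the scale of v9 and relate it to the spread of the response. This is only a minimal sketch: it uses nothing but the two numbers from the 1000-tree row, and it assumes that the %Var(y) column is the out-of-bag MSE expressed as a percentage of Var(y). The variable names are made up for illustration.

```r
# Values copied from the last row (1000 trees) of the trace above.
oob_mse      <- 6.796e4
pct_of_var_y <- 97.11   # ASSUMPTION: this column is 100 * MSE / Var(y)

# Typical size of a prediction error, in the units of v9.
oob_rmse <- sqrt(oob_mse)

# Variance of v9 implied by the two columns, under the assumption above.
implied_var_y <- oob_mse / (pct_of_var_y / 100)

# Share of the variance that would then count as explained by the model.
pct_explained <- 100 - pct_of_var_y

cat(sprintf("OOB RMSE:                     %.1f\n", oob_rmse))
cat(sprintf("Implied sd of v9:             %.1f\n", sqrt(implied_var_y)))
cat(sprintf("Approx. %% variance explained: %.2f\n", pct_explained))
```

Under that reading, the error is roughly 260 units on a response that spans 10.2 to 999.87, which is almost as large as the implied standard deviation of v9 itself (about 265).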
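Like @joran said, %Var relates the model's error to the total variance of Y, which tells you how much of that variance the random forest explains. In the do.trace output the column appears to be the out-of-bag MSE as a percentage of Var(y), so values near 100 mean that very little of the variance is explained. After fitting, apply the model to the validation data (the remaining 1/3):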
rfestimated = predict(r, newdata=validationdata)
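The hold-out procedure described here (a random 2/3 split, a fit on the training rows, predictions for the remaining third) can be sketched end to end. The data below is simulated to mimic the shape described in the question, the names toydata, train_rows, fit and estimated are hypothetical, and lm() is used purely as a placeholder model so that the sketch runs without extra packages; it is not the random forest. The point to note is that predict() receives the unseen rows through the newdata argument.

```r
set.seed(42)  # make the random split reproducible

# Hypothetical stand-in for the original data: 218 rows, eight categorical
# columns (v1..v8) with levels A-D, and a numeric response v9.
n <- 218
lv <- c("A", "B", "C", "D")
toydata <- as.data.frame(
  lapply(1:8, function(i) factor(sample(lv, n, replace = TRUE), levels = lv))
)
names(toydata) <- paste0("v", 1:8)
toydata$v9 <- runif(n, min = 10.2, max = 999.87)

# Randomly pick 2/3 of the rows for training; the rest is kept for validation.
train_rows     <- sample(n, size = round(2 * n / 3))
trainingdata   <- toydata[train_rows, ]
validationdata <- toydata[-train_rows, ]

# Placeholder model (NOT a random forest), fitted on the training rows only.
fit <- lm(v9 ~ ., data = trainingdata)

# Predictions for rows the model has not seen: pass them as newdata.
estimated <- predict(fit, newdata = validationdata)

# One prediction per validation row.
stopifnot(length(estimated) == nrow(validationdata))
```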
It is interesting to check the residuals:
qqnorm((rfestimated - validationdata$v9)/sd(rfestimated-validationdata$v9))
qqline((rfestimated-validationdata$v9)/sd(rfestimated-validationdata$v9))
The estimated versus the observed values:
plot(validationdata$v9, rfestimated)
And the RMSE:
rmse <- (sum((rfestimated-validationdata$v9)^2)/length(validationdata$v9))^(1/2)
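An RMSE on its own is hard to judge, so it may help to compare it with a naive baseline that always predicts the mean of the training response. The sketch below is only an illustration: the observed and predicted vectors and the training mean are invented numbers, and the helper name rmse_of is hypothetical.

```r
# Hypothetical observed and predicted values for a small validation set.
observed   <- c(120.5, 845.2, 333.0, 56.7, 910.4, 402.8)
predicted  <- c(300.1, 610.9, 410.3, 250.6, 700.2, 380.0)
train_mean <- 450  # hypothetical mean of v9 in the training set

# Root mean squared error between observed and estimated values.
rmse_of <- function(obs, est) sqrt(mean((est - obs)^2))

model_rmse    <- rmse_of(observed, predicted)
baseline_rmse <- rmse_of(observed, rep(train_mean, length(observed)))

# Below 1: the model beats the constant prediction. Near or above 1: it does not.
rmse_ratio <- model_rmse / baseline_rmse

# Share of the baseline's squared error that the model removes.
pseudo_r2 <- 1 - rmse_ratio^2

cat(sprintf("model RMSE: %.1f, baseline RMSE: %.1f, ratio: %.2f, pseudo R2: %.2f\n",
            model_rmse, baseline_rmse, rmse_ratio, pseudo_r2))
```

A ratio close to 1 (pseudo R2 close to 0) would suggest the model does little better than guessing the mean, whatever the absolute size of the RMSE.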
I hope this helps!