Goodness of fit functions in R (curve fitting)

Accepted answer
Score: 28

Just the first part of that question could fill entire books. Some quick choices (see the short sketch after the list):

  • lm() for standard linear models
  • glm() for generalised linear models (eg for logistic regression)
  • rlm() from package MASS for robust linear models
  • lmrob() from package robustbase for robust linear models
  • loess() for non-linear / non-parametric models
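
A minimal sketch of these calls on simulated data; the toy data frame d, the response, and the object names are made up for illustration:

    ## Toy data: a noisy linear relationship
    set.seed(1)
    d <- data.frame(x = runif(100, 0, 10))
    d$y <- 2 + 0.5 * d$x + rnorm(100)

    fit_lm    <- lm(y ~ x, data = d)                      # standard linear model
    fit_glm   <- glm(y ~ x, data = d, family = gaussian)  # GLM (use family = binomial for logistic regression)
    fit_loess <- loess(y ~ x, data = d)                   # local, non-parametric regression

    ## Robust alternatives need their packages installed:
    ## library(MASS);       fit_rlm   <- rlm(y ~ x, data = d)
    ## library(robustbase); fit_lmrob <- lmrob(y ~ x, data = d)

    summary(fit_lm)   # coefficients, R-squared, F statistic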

Then there are domain-specific models, e.g. for time series, micro-econometrics, mixed effects and much more. Several of the Task Views, e.g. Econometrics, discuss this in more detail. As for goodness of fit, that too is something one could easily spend an entire book discussing.

Score: 11

The workhorses of canonical curve fitting in R are lm(), glm() and nls(). To me, goodness-of-fit is a subproblem in the larger problem of model selection. In fact, using goodness-of-fit incorrectly (e.g., via stepwise regression) can give rise to seriously misspecified models (see Harrell's book "Regression Modeling Strategies"). Rather than discussing the issue from scratch, I recommend Harrell's book for lm and glm. Venables and Ripley's bible is terse, but still worth reading. "Extending the Linear Model with R" by Faraway is comprehensive and readable. nls is not covered in these sources, but "Nonlinear Regression with R" by Ritz & Streibig fills the gap and is very hands-on.
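
By way of illustration, here is a hedged sketch of that model-selection view: a few pre-specified candidate models compared with AIC and an F test, rather than a stepwise search. The simulated data and object names are my own, not taken from any of the books above:

    set.seed(42)
    d <- data.frame(x = runif(80, 0, 5))
    d$y <- exp(0.4 * d$x) + rnorm(80, sd = 0.3)

    m_lin  <- lm(y ~ x, data = d)                # straight line
    m_quad <- lm(y ~ x + I(x^2), data = d)       # quadratic
    m_nls  <- nls(y ~ a * exp(b * x), data = d,
                  start = list(a = 1, b = 0.3))  # explicit nonlinear form

    AIC(m_lin, m_quad, m_nls)   # lower AIC = better trade-off of fit vs. complexity
    anova(m_lin, m_quad)        # F test for the nested linear models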

Score: 8

The nls() function (http://sekhon.berkeley.edu/stats/html/nls.html) is pretty standard for nonlinear least-squares curve fitting. Chi-squared (the sum of the squared residuals) is the metric that is minimised in that case, but it is not normalised, so you can't readily use it to determine how good the fit is. The main thing you should ensure is that your residuals are normally distributed. Unfortunately I'm not sure of an automated way to do that.
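
A small sketch of what that looks like in practice; the data, model form and starting values below are invented for illustration. deviance() returns the residual sum of squares that nls() minimised:

    set.seed(7)
    d <- data.frame(x = seq(0, 10, length.out = 50))
    d$y <- 3 * exp(-0.4 * d$x) + rnorm(50, sd = 0.1)

    fit <- nls(y ~ a * exp(-b * x), data = d, start = list(a = 2, b = 0.3))

    deviance(fit)           # residual sum of squares minimised by nls()
    sum(residuals(fit)^2)   # the same number, computed by hand

    ## Eyeball the normality of the residuals:
    qqnorm(residuals(fit)); qqline(residuals(fit))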

Score: 6

The Quick-R site has a reasonably good summary of basic functions used for fitting models and testing the fits, along with sample R code:

Score: 3

The main thing you should ensure is that your residuals are normally distributed. Unfortunately I'm not sure of an automated way to do that.

qqnorm() could probably be modified to find the correlation between the sample quantiles and the theoretical quantiles. Essentially, this would just be a numerical interpretation of the normal quantile plot. Perhaps providing several values of the correlation coefficient for different ranges of quantiles could be useful. For example, if the correlation coefficient is close to 1 for the middle 97% of the data and much lower at the tails, this tells us the distribution of residuals is approximately normal, with some funniness going on in the tails.
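
One hedged way to turn that idea into code is to use qqnorm(..., plot.it = FALSE) and cor(); the stand-in residuals and the 97% cut-offs below are assumptions for illustration:

    set.seed(3)
    res <- rnorm(200)                   # stand-in for residuals(fit)

    q <- qqnorm(res, plot.it = FALSE)   # theoretical (x) and sample (y) quantiles
    cor(q$x, q$y)                       # close to 1 suggests approximate normality

    ## Restrict to the middle 97% to see whether the tails are the problem:
    keep <- res > quantile(res, 0.015) & res < quantile(res, 0.985)
    cor(q$x[keep], q$y[keep])

    ## A formal test built on a similar idea:
    shapiro.test(res)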

Score: 2

Best to keep it simple and see if linear methods work "well enough". You can judge your goodness of fit GENERALLY by looking at the R-squared AND the F statistic together, never separately. Adding variables to your model that have no bearing on your dependent variable can increase R², so you must also consider the F statistic.

You should also compare your model to other nested, or simpler, models. Do this using a log-likelihood ratio test, so long as the dependent variables are the same.

The Jarque–Bera test is good for testing the normality of the residual distribution.
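
A brief sketch putting those pieces together on simulated data; the variable names are made up, and the lrtest() and jarque.bera.test() calls assume the lmtest and tseries packages respectively:

    set.seed(9)
    d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
    d$y <- 1 + 2 * d$x1 + rnorm(100)

    small <- lm(y ~ x1, data = d)        # simpler nested model
    big   <- lm(y ~ x1 + x2, data = d)   # adds a variable with no real effect

    summary(big)$r.squared               # R-squared alone can only go up...
    summary(big)$fstatistic              # ...so read it together with the F statistic

    anova(small, big)                    # classical F test of the nested models
    ## library(lmtest);  lrtest(small, big)                    # likelihood-ratio test
    ## library(tseries); jarque.bera.test(residuals(big))      # residual normality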
