Bay Area useR Group (R Programming Language) Discussion Forum

Difference in Values Calculated in R and SAS

Mark B.
user 80821562
San Francisco, CA
Post #: 1
Hello all,

I am new, so please be kind. Last year I used SAS for a statistical analysis involving a Kruskal-Wallis test. This year I drank the "Open Source Kool-Aid" and am doing the same analysis in R. I ran a Kruskal-Wallis test on the same data set in both SAS and R and got these summary results:

Kruskal-Wallis Chi-Square = 11.809 and p = 0.0006

Kruskal-Wallis Chi-Square = 12.260 and p = 0.0022

Does anyone know how R and SAS do this calculation differently? I believe the difference is too large to be just rounding error.

Any ideas?
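
One quick check on the two reported results is to see which degrees of freedom each p-value implies. The sketch below is in Python only so that it is self-contained; the helper name chi2_sf is made up for illustration, and it covers only df = 1 and df = 2, where the chi-square upper tail has a simple closed form.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms for df 1 and 2 only)."""
    if df == 1:
        # A chi-square with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("closed form only implemented for df = 1 or 2")

# The two statistics reported in the post, tried against both df values.
for stat in (11.809, 12.260):
    for df in (1, 2):
        print(f"chi-square = {stat}, df = {df}: p = {chi2_sf(stat, df):.4f}")
```

With these formulas, 11.809 on 1 degree of freedom gives p of about 0.0006, and 12.260 on 2 degrees of freedom gives p of about 0.0022, matching the two reported results. That would suggest the two runs saw a different number of groups (2 versus 3), though the post does not state the number of groups, so this is an inference rather than a confirmed fact.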


chris b.
user 10302866
San Carlos, CA
Post #: 6
Please post the name of the procedure you used in SAS and in R, and consider posting the data itself. First verify that both procedures are using the identical number of observations. You should also verify that the calculation formulas are identical. Given that the expected value of a chi-square equals its degrees of freedom, the difference you have is about 1. So my first speculation is that you don't have the same number of observations in each dataset. Kruskal-Wallis is relatively easy to compute by hand, so you could do the calculation by hand and determine which procedure is giving the correct value.
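
For anyone who wants to try the hand calculation, here is a minimal sketch of the textbook Kruskal-Wallis H statistic with midranks and the usual tie correction. It is written in plain Python so it can serve as an independent check; the function name kruskal_wallis and the sample data are made up for illustration, and this is not a claim about the exact code path inside either SAS or R.

```python
import math
from collections import Counter

def kruskal_wallis(groups):
    """Return (H, df) for a list of samples, using midranks and the tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)

    # Midrank of each distinct value: the average of the 1-based positions it occupies.
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2.0  # positions i+1 .. j
        i = j

    # H = 12 / (n (n + 1)) * sum(R_k^2 / n_k) - 3 (n + 1), where R_k is a group's rank sum.
    rank_term = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied sets of size t.
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1

# Made-up example: three groups of three observations, no ties.
samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h, df = kruskal_wallis(samples)
print(round(h, 3), df)            # 7.2 2
print(round(math.exp(-h / 2), 4)) # p-value for df = 2 only: 0.0273
```

Note that the degrees of freedom are the number of groups minus one, so an extra group (or a dropped one) changes both H and the reference distribution.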
Mark B.
user 80821562
San Francisco, CA
Post #: 2
Hello Chris B. and other interested readers,

The name of the procedure in SAS was NPAR1WAY.
The name of the function in R was kruskal.test.

When I was preparing the data to upload to this site, I stripped out all the other variables that were not needed for this comparison. When I then ran the reduced data set in SAS and R, they produced the same Kruskal-Wallis Chi-Square and p values. The problem was solved.

The difference was that I had used a different missing value indicator: "NA" this time and "." last time. So SAS was handling the missing values differently from R. Now that I am using "NA", there is no difference in the calculated values.
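
As an illustration of how a missing value indicator can change the result, here is a small sketch. It assumes the unrecognized indicator ends up in the grouping variable and is treated as an ordinary group label; the post does not say which variable carried the missing values, so that part is a guess. The function name group_values and the data are made up.

```python
def group_values(labels, values, missing_tokens):
    """Group values by label, skipping rows whose label is a recognized missing token."""
    groups = {}
    for label, value in zip(labels, values):
        if label in missing_tokens:
            continue  # row dropped as missing
        groups.setdefault(label, []).append(value)
    return groups

# Made-up data: two real groups plus two rows whose group is coded "." for missing.
labels = ["a", "a", "a", "b", "b", "b", ".", "."]
values = [1.2, 3.4, 2.2, 5.1, 4.8, 6.0, 2.9, 3.3]

# A tool that only knows "NA" keeps "." as a third group.
only_na = group_values(labels, values, {"NA"})
# A tool that also knows "." drops those rows.
both = group_values(labels, values, {"NA", "."})

for name, groups in (("only NA recognized", only_na), ("NA and . recognized", both)):
    n_obs = sum(len(g) for g in groups.values())
    print(f"{name}: {len(groups)} groups, {n_obs} observations, df = {len(groups) - 1}")
```

Comparing the number of groups and observations each program reports, before comparing the test statistics, is a cheap way to catch this kind of mismatch.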

Chris, it was your suggestions that caused me to do the test that assisted me in solving the problem. So, thanks!

