I am continuing to study “What happens when we amalgamate rubric data?” Part I is here. This part will consider how to treat rubric data as a sample from a larger population. The same assumptions as in Part I apply: “Questions of accuracy and sampling will be ignored. Student work will be assumed correctly categorized. Issues of inter-rater reliability and the like will be assumed solved and simple random samples will be assumed to have been taken.” I will use a classical approach and avoid the modern logit and probit approaches, which I will leave for Part III. The statistical package R will be used.
A first question would be “How well does the sample data represent the population?” This is a question of confidence intervals. The R function multinomialCI (from the MultinomialCI package), based on a paper by Cristina P. Sison and Joseph Glaz, for this sample data,
gives these confidence intervals,
which are depicted on this chart.
The function treats the data as simply multinomial and does not use the ordinal aspect of the data. For such a small sample, n=39, the error is quite large. For instance, the rubric data put the share of the student population in the developing category somewhere between 18 and 50 percent.
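A minimal sketch of the interval calculation, assuming the MultinomialCI package is installed. The category labels and counts are hypothetical stand-ins chosen only to total n = 39; they are not the data from this post.

```r
# Hypothetical rubric counts; NOT the data from this post (they only total n = 39).
counts <- c(Beginning = 5, Developing = 13, Proficient = 15, Exemplary = 6)
n <- sum(counts)

# Observed proportion in each category.
proportions <- counts / n

# Sison-Glaz simultaneous 95% intervals, computed only if the package is available.
if (requireNamespace("MultinomialCI", quietly = TRUE)) {
  ci <- MultinomialCI::multinomialCI(unname(counts), alpha = 0.05)
  colnames(ci) <- c("lower", "upper")
  rownames(ci) <- names(counts)
  print(round(cbind(proportion = proportions, ci), 2))
} else {
  message("Install the MultinomialCI package to compute the intervals.")
}
```

The intervals are simultaneous, so they are meant to cover all category proportions at once rather than one category at a time.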
A second question that could be asked is “How do these students compare to other students?” First I would like to compare before and after data. It might be possible to obtain data on the same students at an earlier time. Here at SOU we compare end of freshman year writing to capstone writing for specific students. To do this we can use the Wilcoxon signed-rank test, the paired counterpart of the Wilcoxon rank-sum test. This is a non-parametric statistical test that takes advantage of the data’s ordinal character. This is the data.
Note the added column, the improvement, which holds the After score minus the Before score. Using this R command: wilcox.test(badata$Before, badata$After, paired = TRUE, alternative = "less") I got a p-value of .00002, which is strong evidence that scores improved.
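Here is a small sketch of the paired comparison, assuming a data frame shaped like badata. The scores and the Improvement column name are hypothetical, made up for illustration. With paired = TRUE, wilcox.test runs the signed-rank test on the differences.

```r
# Hypothetical before/after rubric scores (1-4) for the same 12 students.
badata <- data.frame(
  Before = c(1, 2, 1, 2, 3, 1, 2, 2, 1, 3, 2, 1),
  After  = c(2, 3, 3, 3, 4, 2, 3, 4, 2, 4, 3, 3)
)

# Improvement is the After score minus the Before score.
badata$Improvement <- badata$After - badata$Before

# alternative = "less" tests whether Before scores tend to be lower than After.
# exact = FALSE requests the normal approximation, since rubric scores are full of ties.
result <- wilcox.test(badata$Before, badata$After,
                      paired = TRUE, alternative = "less", exact = FALSE)
print(result$p.value)
```

A small p-value here would point toward improvement. Students whose score did not change contribute a zero difference, which the test discards.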
Finally, it is possible to compare two populations with sample rubric data. This can be done with the Wilcoxon rank-sum test. The method essentially ranks all the data together and checks whether one sample tends to hold the higher ranks. This is the R command: wilcox.test(badata$Y2014, badata$Y2015, paired = FALSE, alternative = "less", na.action = na.omit) Using this data,
R gave an approximate p-value of .48, so the data show no evidence of improvement from 2014 to 2015.
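The two-sample comparison can be sketched the same way. The scores below are hypothetical, and the two cohorts are kept as separate vectors of different lengths rather than as padded columns of one data frame.

```r
# Hypothetical rubric scores (1-4) from two independent cohorts of different sizes.
y2014 <- c(2, 3, 1, 2, 4, 3, 2, 3, 2, 1)
y2015 <- c(3, 2, 2, 1, 3, 4, 2, 3)

# paired = FALSE gives the rank-sum test: all scores are ranked together and
# the ranks held by each cohort are compared.
# alternative = "less" tests whether 2014 scores tend to be lower than 2015 scores.
result <- wilcox.test(y2014, y2015,
                      paired = FALSE, alternative = "less", exact = FALSE)
print(result$statistic)
print(result$p.value)
```

A large p-value means only that the samples give no evidence of a shift; it does not show that the two populations are the same.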
All this is fairly basic and pro forma. It leaves out how to discover the effects of other variables, such as GPA or major, and there are better ways of doing all of it. I have spent my winter break immersed in Statistical Rethinking by Richard McElreath. This is a wonderful book and it opened my eyes to the world of Bayesian modeling. I am attempting to build reliable models of rubric data using the software that comes with the text. The process has been exciting and fulfilling. I will report my progress in Part III of this series.