## Rubrics as Data – Part IV

Rubric data is ordered categorical data and, as such, cannot be used to find averages or other numerical statistics. See this post for details. And yet our (my) instincts are that measures of human behavior must lie on some sort of numerical continuum and be normally distributed. This is the premise of the following exploration.

The idea is that the underlying behavior being measured could ideally be rated on a continuous scale, and that our rubric criteria divide that scale into intervals. Thus, in these figures the underlying behavior is depicted as the blue normal curve, and our rubric divides the number line into intervals by establishing cut-points, the black dots.

Possible Histograms with Different Cut-points

The histogram thus represents the frequencies in each interval. As can be seen, a rubric assigning D, C, B and A could break up the line unevenly in many different ways. The question becomes: can we recover the parameters of the blue curve from a given histogram? The answer will be yes, if we can tolerate some error.

My research scheme was therefore as follows. After fixing a normal curve (mean [mu] and standard deviation [sd]), define a set of cut-points and generate a histogram. Use the cut-points to get numerical values representing the categories. Use the R function fitdistr() to recover mu and sd. Repeat 10,000 times. Then change the cut-points and try again.

Since the scale is arbitrary, I chose its unit to be the standard deviation of the given normal distribution, which I fixed to be one. I used a mu of 2.5 since I have been addressing a rubric scale of 1, 2, 3 and 4. The problem became: how does varying the cut-points affect the recovery of the mean? The answer was that we can get fairly close.

The key was devising a way to attach numerical values to each category. Let d1, d2 and d3 be the cut-points. Then the numerical values of the two inner bars were assigned to be (d1+d2)/2 and (d2+d3)/2. Define le as the “limit” of the left end bar. In point of fact, the continuum extends infinitely to the left, but I set le to be d1-sd*(1.5), one and a half standard deviations to the left of d1. The value for the leftmost bar became (le+d1)/2. Similarly, I set re to be d3+sd*(1.5), and the value for the rightmost bar to be (d3+re)/2.
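The value assignment just described can be written out as a small sketch. This is Python rather than the R I actually used, and the function name `category_values`, the `tail` parameter, and the example cut-points (1.5, 2.5, 3.5) are illustrative assumptions, not taken from my program.

```python
def category_values(d1, d2, d3, sd=1.0, tail=1.5):
    """Return one representative number per category, left to right (D, C, B, A)."""
    le = d1 - tail * sd  # artificial "limit" of the leftmost bar
    re = d3 + tail * sd  # artificial "limit" of the rightmost bar
    return [
        (le + d1) / 2,   # leftmost bar
        (d1 + d2) / 2,   # inner bar between d1 and d2
        (d2 + d3) / 2,   # inner bar between d2 and d3
        (d3 + re) / 2,   # rightmost bar
    ]


# Hypothetical, evenly spaced cut-points centered on 2.5
print(category_values(1.5, 2.5, 3.5))  # [0.75, 2.0, 3.0, 4.25]
```

Note that each end bar's value sits 0.75 standard deviations beyond its cut-point, whatever the cut-points are.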

Other details. The histogram represents a sample size of 100. I used the R function fitdistr() to find the parameters for a normal distribution since I knew where the data came from. For that reason I did not do any normality checks. fitdistr() actually gives error bounds, but I preferred to just do 10,000 samples.
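For readers who want to try the experiment, here is a rough Python sketch of the simulation loop. It is not my R program: it does not call fitdistr(), and instead relies on the fact that the maximum-likelihood estimate of a normal mean is simply the sample mean. The function names, the seed, and the example cut-points are assumptions made for illustration.

```python
import random
from bisect import bisect_right
from statistics import fmean


def category_values(cuts, sd=1.0, tail=1.5):
    """Representative value for each of the four categories."""
    d1, d2, d3 = cuts
    le, re = d1 - tail * sd, d3 + tail * sd
    return [(le + d1) / 2, (d1 + d2) / 2, (d2 + d3) / 2, (d3 + re) / 2]


def recovered_mu(cuts, mu=2.5, sd=1.0, n=100, rng=random):
    """Draw n normal scores, bin them by the cut-points, and return the
    mean of the category values (the ML estimate of a normal mean)."""
    values = category_values(cuts, sd)
    # bisect_right gives the index (0..3) of the interval a draw falls in
    scores = [values[bisect_right(cuts, rng.gauss(mu, sd))] for _ in range(n)]
    return fmean(scores)


def run_experiment(cuts, reps=10_000, seed=1):
    """Average the recovered mu over many simulated histograms."""
    rng = random.Random(seed)
    return fmean(recovered_mu(cuts, rng=rng) for _ in range(reps))


# Hypothetical cut-points; change them and rerun to see the effect on mu
print(round(run_experiment((1.5, 2.5, 3.5)), 2))
```

With cut-points placed symmetrically around 2.5 the recovered mean should sit very close to 2.5; the interesting cases are the lopsided ones.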

The following table has the results of the experiments.  Note that even with a fairly wide variation of cut-points, we still get close to the underlying mu of 2.5.

Table of Recovered Mu with Varied Cut-points

From the table, it looks like a reasonable scheme could be devised to compute a mean for the underlying distribution on an arbitrary scale. First, devise a set of cut-points using a scale that you want for the standard deviation. Calculate representative numbers for each category. Pass the information to fitdistr() and report mu with the error. For example, here is the result for some data I have been looking at.
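As a sketch of that scheme applied to one set of rubric counts, the Python below computes the normal maximum-likelihood estimates directly rather than calling fitdistr(). The counts and cut-points are made up for illustration, and the standard error sd/sqrt(n) is the usual large-sample formula; I believe it is what fitdistr() reports for a normal fit, but treat that as an assumption.

```python
from math import sqrt


def fit_from_counts(counts, cuts, sd_scale=1.0, tail=1.5):
    """Return (mu, sd, standard error of mu) for rubric counts D, C, B, A."""
    d1, d2, d3 = cuts
    le, re = d1 - tail * sd_scale, d3 + tail * sd_scale
    values = [(le + d1) / 2, (d1 + d2) / 2, (d2 + d3) / 2, (d3 + re) / 2]
    n = sum(counts)
    mu = sum(c * v for c, v in zip(counts, values)) / n
    # Maximum-likelihood variance divides by n, not n - 1
    var = sum(c * (v - mu) ** 2 for c, v in zip(counts, values)) / n
    sd = sqrt(var)
    return mu, sd, sd / sqrt(n)


# Hypothetical counts for 100 students and hypothetical cut-points
mu, sd, se = fit_from_counts([8, 30, 50, 12], (1.5, 2.5, 3.5))
print(f"mu = {mu:.2f} +/- {se:.2f}")
```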

The cut-points were calculated assuming the expected result that 70% of our seniors would lie in the two rightmost categories and that 7.5% would be in each of the outermost categories. When simulated, this yielded a mu of 2.56. In other words, the crude method I used to calculate values for the categories yielded a slightly high average. From the table, I think it is fair to say that in cases 4, 6 and 7 our students did not meet our expectations and in case 5 they exceeded our expectations.
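One way to turn such expectations into cut-points is to read them off as normal quantiles. The Python sketch below assumes the same mu of 2.5 and sd of 1 as the simulations, and reads the expectations as category shares of 7.5%, 22.5%, 62.5% and 7.5% from left to right; both the function names and that reading are mine, offered as an illustration rather than as the exact computation in my R program.

```python
from statistics import NormalDist


def cut_points(proportions, mu=2.5, sd=1.0):
    """Cut-points that split a normal curve into the given category shares."""
    dist = NormalDist(mu, sd)
    cuts, cumulative = [], 0.0
    for p in proportions[:-1]:  # the last category needs no upper cut-point
        cumulative += p
        cuts.append(dist.inv_cdf(cumulative))
    return cuts


def expected_score(proportions, cuts, sd=1.0, tail=1.5):
    """Long-run average of the category values under the expected shares."""
    d1, d2, d3 = cuts
    le, re = d1 - tail * sd, d3 + tail * sd
    values = [(le + d1) / 2, (d1 + d2) / 2, (d2 + d3) / 2, (d3 + re) / 2]
    return sum(p * v for p, v in zip(proportions, values))


expected = [0.075, 0.225, 0.625, 0.075]  # D, C, B, A shares (my reading)
cuts = cut_points(expected)
print([round(c, 3) for c in cuts])
print(round(expected_score(expected, cuts), 3))
```

If this reading is right, the expected score lands a little above 2.5, in line with the simulated 2.56.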

In sum, this exploration turned out to be a pleasant diversion from my intention to model rubric data.  At best the results offer a way of quantifying the results with one number that has more justification than just taking an average of rubric numbers.

A note on the method. I built my R program to read from and write to a CSV file. This allowed me to set the experimental values on a line with comments and get back, on the same line, the results, including the file name of the associated graph.