## Rubrics as Data – Part II

I am continuing to study the question, “What happens when we amalgamate rubric data?” Part I is here. This part considers how to treat rubric data as a sample from a larger population. The same assumptions as in Part I apply: “Questions of accuracy and sampling will be ignored. Student work will be assumed correctly categorized. Issues of inter-rater reliability and the like will be assumed solved and simple random samples will be assumed to have been taken.” I will use a classical approach and avoid the modern logit and probit orientations, which I will leave for Part III. The statistical package R will be used.

A first question would be “How well does the sample data represent the population?” – a question of confidence intervals. The R function, MultinomialCI, based on a paper by Cristina P. Sison and Joseph Glaz for this sample data,

Summary Rubric Data

gives these confidence intervals,

Confidence Interval Table

which are depicted on this chart.

Sample Rubric Data Histogram with Error Bars.

The function treats the data as simply multinomial without using the ordinal aspect of the data. For such a small sample, n=39, the error is quite large. For instance, the rubric data estimate that the share of the student population in the developing category is between 18 and 50 percent.
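As a rough sketch of the kind of call involved, the R snippet below computes simultaneous Sison-Glaz intervals for a vector of category counts. The counts are hypothetical (chosen only so that they total 39; they are not the data in the table above), and the snippet assumes the `multinomialCI()` function from the MultinomialCI package, which it checks for before using.

```r
# Hypothetical counts for the four rubric levels (n = 39); NOT the post's data.
counts <- c(beginning = 3, developing = 13, accomplished = 16, exemplary = 7)
n <- sum(counts)
props <- counts / n

if (requireNamespace("MultinomialCI", quietly = TRUE)) {
  # Sison-Glaz simultaneous 95% intervals: one row per category, lower and upper limits.
  ci <- MultinomialCI::multinomialCI(counts, alpha = 0.05)
  print(round(cbind(proportion = props, lower = ci[, 1], upper = ci[, 2]), 2))
} else {
  # Fall back to the plain sample proportions if the package is not installed.
  message("Package MultinomialCI is not installed; showing sample proportions only.")
  print(round(props, 2))
}
```

With a sample this small, the intervals should come out wide for every category, which is the point made above.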

A second question that could be asked is “How do these students compare to other students?” First I would like to compare before and after data. It might be possible to obtain data on the same students at an earlier time. Here at SOU we compare end of freshman year writing to capstone writing for specific students. To do this we can use the Wilcoxon signed-rank test, the paired-data counterpart of the rank-sum test. This is a non-parametric statistical test that takes advantage of the data’s ordinal character. This is the data.

Before and After Rubric Data Table

Note the added improvement column, which holds the After score minus the Before score. Using this R command: `wilcox.test(badata$Before, badata$After, paired = TRUE, alternative = "less")` I got a p-value of .00002, which indicates that there was improvement.
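Here is a minimal, self-contained sketch of the paired test. The data frame reuses the name `badata` and the column names from the command above, but the ten scores are made up for illustration (1 = beginning through 4 = exemplary) and are not the SOU data. The argument `exact = FALSE` is my addition, since ties and zero differences rule out an exact p-value.

```r
# Hypothetical before/after rubric scores for ten students; NOT the post's data.
badata <- data.frame(
  Before = c(1, 2, 2, 1, 3, 2, 1, 2, 3, 2),
  After  = c(2, 3, 2, 3, 4, 3, 2, 4, 3, 3)
)
badata$Improvement <- badata$After - badata$Before

# Paired test: alternative = "less" asks whether Before scores tend to be lower than After.
result <- wilcox.test(badata$Before, badata$After,
                      paired = TRUE, alternative = "less", exact = FALSE)
print(result$p.value)
```

With `paired = TRUE`, `wilcox.test` works on the within-student differences, so the order of the rows matters: each Before score must sit beside the same student's After score.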

Finally, it is possible to compare two populations with sample rubric data. This can be done with the Wilcoxon Rank-Sum test. The method essentially ranks all the data and sees whether one population tends to hold the higher ranks. This is the R command: `wilcox.test(badata$Y2014, badata$Y2015, paired = FALSE, alternative = "less", na.action = na.omit)`. Using this data,

Unpaired Rubric Data

R gave an approximate p-value of .48. The data give no evidence of improvement from 2014 to 2015.
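The unpaired comparison can be sketched the same way. The two score vectors below are hypothetical stand-ins for the 2014 and 2015 samples (the names `y2014` and `y2015` are mine), and they deliberately have different lengths, since unpaired samples need not match in size.

```r
# Hypothetical rubric scores for two separate cohorts; NOT the post's data.
y2014 <- c(2, 3, 3, 1, 4, 2, 3, 3, 2, 4)
y2015 <- c(3, 2, 3, 4, 1, 3, 2, 3)

# Rank-sum test: pools and ranks all scores, then asks whether 2014 tends to rank lower.
res <- wilcox.test(y2014, y2015,
                   paired = FALSE, alternative = "less", exact = FALSE)
print(res$p.value)
```

A large p-value here only means the samples do not show a shift; it does not demonstrate that the two cohorts are the same.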

All this is fairly basic and pro forma. It leaves out how to discover the effects of other variables, such as GPA or major, and there are better ways of doing all of it. I have spent my winter break immersed in Statistical Rethinking by Richard McElreath. This is a wonderful book and it opened my eyes to the world of Bayesian modeling. I am attempting to build reliable models for rubric data using the software that comes with the text. The process has been exciting and fulfilling. I will report my progress in Part III of this series.

Posted in Math Explorations, Teaching

## A Thought on Living Forever

This came to me as I was reading a profile of Derek Parfit, recently deceased, in the New Yorker, specifically this quote.

My death will break the more direct relations between my present experiences and future experiences, but it will not break various other relations.

When we die our immediate relationships are broken, yet our influence on other people, and for that matter on the material universe, lives on through the memories people have of us and the artifacts we created: for example, the words in this blog, the furniture I have made, the stone house I built.

It occurred to me that as teachers we are creating memories of ourselves in others whether we like it or not. These memories will become part of our students’ persons, generally a very small part, and will be passed on marginally to their children. This is consoling in one way but should also imbue us teachers with a sense of responsibility. I recall this sentence by Gilbert Highet,

It is a serious thing to interfere with another man’s life.

Posted in Teaching

## Erroneous Proof Changed History

Well, maybe. I learned of the existence of an alternate model for quantum behavior as I followed the story of a NASA test propulsion system that defies Newton’s Third Law. The idea, named Bohmian mechanics, is based on something called pilot-wave theory and has been around since the time of Louis de Broglie in the 1920s. Ever since, it has popped up occasionally, usually because of some counter-intuitive results predicted by the standard Copenhagen model.

In 1932 John von Neumann, the famous and formidable mathematician, “proved” that a result of Bohmian mechanics was impossible, thus putting the theory to bed for, it turns out, 30 or so years, at which time John Stewart Bell found errors in von Neumann’s proof. A few more details to this story can be found here.

The title of this post expresses in some way a triviality. Any action in the present changes the course of events in the future – see every time-traveling science fiction movie. It’s just that we cannot know the magnitude of the resultant ripples in time. Yet, if the alternate theory of quantum behavior had been studied and deepened over that gap of more than 30 years, might we not have other wonderful inventions like the electromagnetic drive (if it works)?

Posted in Math and Me, Rants

## Making Space in My Brain

On the Wednesday after election Tuesday, I took a decisive step. My mental health was in jeopardy. I would fight to eliminate a habit – a preoccupying, distracting, futile fifty-year-old habit.

My students tell me that I have never grown up. I guess not. Childlike, I am still trying to make sense of my world. I read about economics to understand how money makes the world go round. I built my own house and others and gained a sense of the physical world and how things are put together. I read chemistry to understand what makes matter matter. I read about cosmology and quantum science to understand my place in the universe. I obsessively read about politics and government to understand how people work together to thrive, or not. This last, no more.

NPR news will no longer wake me in the morning or play in the background as I eat breakfast or wash the dishes.  Books on tape and silence will reign in my Camry.  I will scan headlines but read no further.  The opinion page will be skipped.  I will resist to the best of my capacity political discussions with friends and strangers alike.  I have suffered through too many dysfunctional wrong-headed presidencies.  I must fight my way out of this depressive obsession with politics.

This effort will create (has created) space in my brain that I need to fill with non-political thoughts. The saying goes “The devil makes work for idle hands.” It is also true that “the devil makes work for idle minds.” I need new things to think about.

So I am memorizing lines to a play. I am being very intentional about thinking about the next day’s lectures as I go to bed. I am learning a new (for me) way of doing statistics and plan to do all the R exercises in the amazing Statistical Rethinking by Richard McElreath. I am ordering more math books to read. I am studying more Go games and reading more about playing bridge. I already feel freer.

The last stanza of The Jefferson Airplane’s White Rabbit seems appropriate to the times. But I will feed my head not with LSD but by learning and using new ideas, by disciplining my thinking, and as always taking care of the ones I love.

Posted in Rants, Teaching

## Rubrics as Data – Part I

Rubrics, the detailed categorization of student work, have become a common teaching and grading tool.  Here is one designed to assess college-level writing.  The idea is to provide specificity and a sense of progress.  The student and the teacher have detailed descriptions of the various levels of accomplishment so that expectations and grading assessments are better understood and ways to improve easily seen. Rubrics are a highly effective tool for teacher-student communication.

The question to be addressed here is, “What happens when we amalgamate rubric data?”  The idea is that we want to use rubrics to get a sense of the overall student performance  of a class or a cohort.  In this post, questions of accuracy and sampling will be ignored.  Student work will be assumed correctly categorized.  Issues of inter-rater reliability and the like will be assumed solved and simple random samples will be assumed to have been taken.

Consider this data we gathered for an institutional assessment of student writing.  We took a random sample of thirty-nine instances of senior writing.  A particular attribute resulted in this frequency table and this bar graph.

Summary Rubric Data

What do we make of it? We have four categories that measure increasing competency. We ignore the possibility that the very names of the categories influenced the assessments. The scale – beginning, developing, accomplished, and exemplary – is an ordinal scale. In this case, a person rated accomplished is presumed to have higher skills than one rated developing or beginning, for example. The statistical descriptor for this type of information is ordinal categorical data. I note at this point that my reading of Analysis of Ordinal Categorical Data, 2nd edition, by Alan Agresti informs these views, and it is not at all impossible that I misinterpreted what I read.

At this stage a natural step would be to replace the descriptive categories – beginning, developing, accomplished, and exemplary – with numbers, say 1, 2, 3 and 4.  Oops, we have just taken our first step onto a slick downward sloping glacier.  Lose our balance and we will end up in the rocks far below.

East Side of Mt. Shasta (*)

The numbers 1, 2, 3, 4 do preserve the ordering. 3 is “better” than 2, which is “better” than 1, etc. But they deceive. Is a person whose level is 4 twice as skilled as a person rated a 2? In terms of the original categories, is a person rated “exemplary” twice as skilled as a person rated “developing”? Or is the difference in competency between the 2 and 3 levels the same as the difference in competency between the 3 and 4 levels? In other words, is the difference in ability between a person rated “developing” and a person rated “accomplished” the same as the difference in ability between a person rated “accomplished” and a person rated “exemplary”? By translating statements from numerical ratings back to the descriptive ratings, we get nonsense. What happened? We started to treat the numerical categories as if the numbers had their normal meaning, for example 4-3 = 3-2 = 1. Same difference numerically, near nonsense in the original language of the categories. In statistics talk, we have ordinal data, not interval data. The categories are not equally spaced and a natural zero does not exist.

Let’s take another step onto the glacier.  The average score is 2.73.  This treats the data like regular numbers with consistent intervals and ratios which they do not have.  For instance maybe “exemplary” skills are five times “better” than “beginning” skills whatever that means.  Thinking like this is particularly hard to resist here in academia.  We do it all the time.  We take ordinal data, A, A-, B+, etc, assign numbers for the grades, 4, 3.7, 3.3, etc, weight by the number of units taken and calculate GPA to two decimal places. We use this faux number to make financial aid, scholarship, athletic eligibility, and hiring decisions.

Can anything be done? Dr. Agresti gives a few ways to assign numbers to the ordered categories that use the underlying proportions or assumptions about the data. This approach allows one to more easily speak of odds ratios and yields possibilities for sophisticated analysis. Instead let’s go back to the data. Would the calculation of a median help us characterize the data? With so few categories, the median is not much help. The median in the above data is “accomplished.” So half of the students are “accomplished” or lower and half are “accomplished” or higher. Not a lot of new information there. The mode is easy to see and it might be helpful to say the largest proportion of the students are rated “accomplished.”
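For readers who want to check a median and mode without assigning numbers to the categories, here is a small R sketch. The counts are hypothetical (they total 39 but are not the frequency table above); the median is taken as the first category at which the cumulative proportion reaches one half.

```r
# Hypothetical counts in rubric order (n = 39); NOT the post's data.
counts <- c(beginning = 3, developing = 13, accomplished = 16, exemplary = 7)

# Cumulative proportions only make sense because the categories are ordered.
cum_props <- cumsum(counts) / sum(counts)

# Median category: the first level whose cumulative proportion is at least 0.5.
median_level <- names(counts)[which(cum_props >= 0.5)[1]]

# Modal category: the level with the largest count.
mode_level <- names(counts)[which.max(counts)]

print(c(median = median_level, mode = mode_level))
```

Both summaries use only the ordering of the categories, so they stay the same no matter what numbers one might later attach to the levels.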

Going back to the numerical assignment of categories, maybe we just didn’t choose the ‘right’ scale.  For instance our sense of the difference of skills when we actually measured them might have made us think that exemplary students are really, really good and the rating scale should be 1,2,3,5.  The result would be this bar graph with a big hole in it.

New Scale – 1,2,3,5

This might make sense if we have some reason to think we have a really good sub-population, but our experience tells us that the population of students usually has a continuum of skills. So the hole is an artifact. Maybe half of the 5-rated students are really 4’s and half the 3-rated students should have been rated 4’s. If so we get this bar graph.

Numerical Scale with Hole “Filled” In

We are on a slippery slope, manipulating the data post hoc.  We will need a lengthy justification for any particular choice among the large variety of possibilities.
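To see how much the choice of scale drives the result, the sketch below computes the "average score" for the same set of counts under the 1, 2, 3, 4 scale and under the 1, 2, 3, 5 scale. The counts are hypothetical (n = 39, not the post's data), so the two averages will not match the 2.73 quoted earlier.

```r
# Hypothetical counts in rubric order (n = 39); NOT the post's data.
counts <- c(beginning = 3, developing = 13, accomplished = 16, exemplary = 7)

# Two candidate numeric codings of the same four ordered categories.
scale_a <- c(1, 2, 3, 4)
scale_b <- c(1, 2, 3, 5)

# The "average score" is just a count-weighted mean of whichever codes were chosen.
mean_a <- weighted.mean(scale_a, counts)
mean_b <- weighted.mean(scale_b, counts)

print(round(c(scale_1234 = mean_a, scale_1235 = mean_b), 2))
```

Nothing about the students changed between the two lines of output; only the labels did, which is why any particular coding needs its own justification.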

Yet there must be something there.  Another way to think about the rubric data is as a rough approximation of an underlying latent (to use Dr. Agresti’s term) variable.  It is possible if we could measure a trait with microscopic precision that we could get an x-axis with meaningful units.  As with most measurements of complex human traits we would assume the distribution would be normal.  Our bar graphs would turn into histograms and would estimate the situation according to this rough sketch.

Normal Curve Overlaying a Histogram

A common objection to the use of an x-axis that extends to infinity is that the skill measured is limited and bounded below. Yet it is not too hard to think of a very, very, very low skilled student or, for that matter, a student with astonishingly high skills – not infinite but far enough out there to establish a good approximation for a normal curve.

This idea can work quite well for symmetric mound-shaped histograms as this example shows.

Symmetric Histogram with Normal Curve

Here the category percentages match the latent variable distribution.

What happens with less symmetrical data?  We get a less accurate match as in this example.

Less Symmetrical Histogram with Normal Curve

The diagram challenges us to explain the lack of symmetry.  Was it the rubric?  Was it our interpretation of the rubric?  Maybe the population really displays that asymmetry?
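One way to put a number on "how well does the normal curve match the bars" is sketched below. It is only an illustration under strong assumptions that are mine, not the post's: the categories are coded 1 to 4 (the very move questioned above), each bar is taken to cover one unit centered on its code, the curve uses the mean and standard deviation of those codes, and the counts are hypothetical.

```r
# Hypothetical counts in rubric order (n = 39); NOT the post's data.
counts <- c(beginning = 3, developing = 13, accomplished = 16, exemplary = 7)
scores <- 1:4
n <- sum(counts)

# Mean and standard deviation of the numeric codes (an assumption, see lead-in).
mu <- sum(scores * counts) / n
sigma <- sqrt(sum(counts * (scores - mu)^2) / (n - 1))

# Assumed bar edges at 1.5, 2.5, 3.5; the outer bars absorb the tails.
cuts <- c(-Inf, 1.5, 2.5, 3.5, Inf)
implied <- diff(pnorm(cuts, mean = mu, sd = sigma))
observed <- counts / n

# Compare observed category shares with the shares the normal curve implies.
print(round(rbind(observed, implied), 2))
```

Large gaps between the two rows would correspond to the poorer match seen in the less symmetrical example.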

Enough of the slippery slope. Forcing a numerical scale on our ordinal categorical data and using these numbers to generate numbers like mean and standard deviation should be questioned. It could lead us to an erroneous conclusion and at minimum should make us humbly uncertain. If the data is bell-shaped, less so. Part II will explore how to draw statistically valid conclusions from aggregate sample rubric data.

#### (*)  I have direct knowledge of a slippery slope on Mt. Shasta.  Many years ago I climbed through sand and talus up the west side of the mountain to a ridge overlooking one of the depicted snow fields.  I took one step out onto the slick packed surface and started sliding – no ice ax, no crampons.  There happened to be a rock outcropping about 150 feet below me. I managed to control my slide and avoided tumbling head over heels.  Those rocks stopped me from plunging 2000 feet.  I had to carefully kick-step my way back up to the ridge.  If the outcropping hadn’t been there, this post would not have been written.

Posted in Math Explorations, Rants, Teaching, Uncategorized

## Chasing Infinity – 3D Version

The frequency of my blog posts has been diminishing more and more, or should I say, the wavelength has been getting longer. I attribute this mostly to my accession as math department chair. I just haven’t had the time and/or mental energy to do many math explorations even in the summer. I am also playing less golf. End of excuses.

Four years ago, I explored what happens at “infinity” for plane curves. This summer I was curious about what happens at infinity for three-dimensional surfaces. The original idea was to project a curve onto a sphere by placing a plane containing the curve tangent to a sphere, projecting a line from a point on the curve through the center of the sphere, and noting where the line intersected the surface of the sphere, effectively sketching the curve on the surface of the sphere like so,

Project a Parabola onto a 3D sphere

The sketch on the sphere looks like this.  Note the two symmetric curves.  This was just to make the later 2D projection easier.

Parabola Sketched on 3D Sphere

Now project the curve on the sphere onto a plane.

Parabola Sketched on 3D Sphere Projected onto the Projective Plane

The disk is called the projective plane and opposite points on the edge are identified – are the same. To get a good look at what happens at infinity just rotate the sphere and project.
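The construction can be sketched numerically. The details below are my assumptions rather than the post's actual script (which was written in Mathematica): a unit sphere centered at the origin, the plane z = 1 tangent at the top, and a straight-down projection of the upper hemisphere onto the disk. A point (x, y, 1) on the plane lies on the line through the center that meets the sphere at ±(x, y, 1)/√(x² + y² + 1), which accounts for the two symmetric curves; dropping the z coordinate gives the point in the disk.

```r
# Map a point (x, y) of the tangent plane z = 1 into the unit disk:
# central projection onto the unit sphere, then drop the z coordinate.
# (Assumed setup; see the lead-in.)
to_disk <- function(x, y) {
  r <- sqrt(x^2 + y^2 + 1)
  cbind(u = x / r, v = y / r)
}

# Sample the parabola y = x^2, including points very far out.
x <- c(-1e6, -2, -1, 0, 1, 2, 1e6)
disk <- to_disk(x, x^2)
print(signif(disk, 3))
```

Both far ends of the parabola land near (0, 1) on the rim of the disk, which is one way to see the curve heading to a single point at infinity.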

Parabola on 3D Sphere Rotated and Projected on Projective Plane

So why not try this one dimension up? Take a 3D figure, place it tangent to a 4D sphere, “sketch” the figure on the surface of a 4D sphere, and then project it in 3D. In theory this is not hard – just add a dimension. So I wrote a Mathematica script, built a crude 4D rotator and took a look. Thus a paraboloid in 3D.

Paraboloid in 3D

Now projected onto the surface of a 4D sphere and projected in 3D.

Paraboloid Projected on 4D Sphere

This is what I expected. Since any vertical cross-section of the paraboloid is a parabola, all the cross-sections should “go” to the same point at infinity as shown. In this case, no rotation was needed. Note the distortion of the grid lines. They will help later to figure out what is going on.

A natural next shape to explore is the hyperbolic paraboloid or saddle.

Hyperbolic Paraboloid in 3D

Unrotated it looks like this.

Hyperbolic Paraboloid Projected on 4D Sphere

Those two points at infinity should be able to be identified. Remember a line’s “ends” are the same point at infinity in the projective plane. The figure should be able to be rotated so they are (look like) the same point.  This took some experimenting but here it is.

Hyperbolic Paraboloid Projected on 4D Sphere Rotated

That is cool.  A cone is next.

Cone in 3D

This looks a little odd.  The important idea is that the surface extends to infinity along radial lines extending to different points of infinity.

Cone Projected on 4D Sphere

The grid lines can be seen bunching up on the rim at all the different points of infinity. No matter how I try, I shouldn’t be able to rotate the figure so that any of the points identify. I couldn’t. Here is an attempt.

Cone Projected on 4D Sphere Rotated

I explored a few other surfaces to test my understanding.  Here is a plane.

Plane in 3D

Unrotated, all the points at infinity can be seen on the rim.

Rotating just resulted in distortion.

Plane Projected on 4D Sphere Rotated

A parabolic surface looks like this in 3D.

Parabolic Surface in 3D

Projected in 4D like this.

Parabolic Surface Projected on 4D Sphere

The infinity point shown is where all the parabolas converge, but all the horizontal grid lines should also converge.  I managed to rotate the figure to show this.  The point indicated is the “horizontal” infinity.

Parabolic Surface Projected on 4D Sphere Rotated

Lastly, here is a cubic surface.

Cubic Surface in 3D

Unrotated in 4D it looks like this.

Cubic Surface projected on 4D sphere

Those points should be able to be identified.

Cubic Surface Projected on 4D Sphere Rotated

At the end of this exercise I was just beginning to get a feel for rotating objects in four dimensions.  The shapes are aesthetically interesting and sometimes unexpected.

Posted in Cool Ideas, Math Explorations

## Outcome Switching – What Golfers Know

Last Friday a rare event occurred at our golf course.  My foursome all sank long putts for birdies on the second hole.  What a great feeling and I could go on and on in detail but I won’t.  Anyway we get back to the clubhouse eager to relate our story and all we hear is  a collective, tepid “That’s interesting.”  What is up with that?

My theory is that golfers understand that rare events happen often. They can’t predict which ones but they know they happen – a line drive tee shot hits the flagpole and drops in, a drive headed out-of-bounds hits a tree and lands in the middle of the fairway, or a ball hits a sprinkler head and bounds over the green. They have seen it all. Rare events happen.

Andrew Gelman points us to an article on Vox that discusses outcome switching in medical research. The idea is this: you design a study to test if a drug works. It doesn’t, according to the statistical measures you set up, so you look for other outcomes. Maybe it worked on a sub-population, or patients lost weight, or something, and then you publish results for that outcome instead. You switched outcomes after the fact. Of course you are going to find something. Rare events happen. Golfers know that and don’t get too excited. Apparently some medical researchers don’t and do.
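A back-of-the-envelope calculation shows why. Suppose, as a simplifying assumption of mine, that a drug does nothing and that each outcome examined is an independent test at the usual 0.05 level. The chance that at least one of k outcomes looks "significant" is then 1 − 0.95^k, which the R sketch below evaluates for a few values of k.

```r
# Assumed setup: no real effect, independent outcomes, each tested at alpha = 0.05.
alpha <- 0.05
k <- c(1, 5, 10, 20)

# Probability that at least one of k null outcomes falls below alpha.
p_any <- 1 - (1 - alpha)^k

print(data.frame(outcomes = k, chance_of_a_finding = round(p_any, 2)))
```

With twenty outcomes to choose from, the chance of at least one spurious finding is roughly 64 percent. Real outcomes are usually correlated, so the exact figure would differ, but the direction of the effect is the same.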

Posted in Popular Press Explorations, Rants