I first worked my way steadily through Statistical Rethinking by Richard McElreath, entering all the examples in R and learning about Bayesian modeling. The examples seemed a little arcane and the golem metaphor a little off-putting, but by the end (nearly the end) I felt that I could build and explore simple Bayesian models. I was empowered. And what better way to use my newly developed skills than to work on a rubric-as-data model? I set about building the model as described in Section 11.1, Ordered Categorical Outcomes. What could go wrong?

I built a MAP (maximum a posteriori) model and a Stan (a probabilistic programming language) model using my example from the last posts – 3 *1* scores, 13 *2* scores, 15 *3* scores, and 8 *4* scores. When I sampled from the models, I got the occasional nonsense, which I attributed to my cut points (the points on the logit scale that separate the scores; technical details will be omitted) getting out of order. Now what? I quote Dr. McElreath’s text: “As always, in small sample contexts, you’ll have to think harder about priors. Consider for example that we know alpha 1 < alpha 2, before we even see the data” (page 335). So that was my problem.
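To make the cut-point problem concrete, here is a minimal Python sketch (not the R or Stan code used for the post). It assumes the usual ordered-logit setup, where each cut point is the log-odds of scoring at or below a given level. The function names are made up for illustration. It recovers the score proportions from the 3/13/15/8 example, then swaps two cut points to show where the nonsense comes from: a negative "probability."

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def cutpoints_from_counts(counts):
    """Log-cumulative-odds for every score except the last,
    whose cumulative proportion is always 1."""
    total = sum(counts)
    running = 0
    cuts = []
    for c in counts[:-1]:
        running += c
        cuts.append(logit(running / total))
    return cuts

def category_probs(cuts):
    """Turn cut points back into one probability per score by
    differencing the cumulative probabilities."""
    cum = [inv_logit(a) for a in cuts] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

counts = [3, 13, 15, 8]                # scores 1, 2, 3, 4 from the example
cuts = cutpoints_from_counts(counts)   # increasing, as they should be
probs = category_probs(cuts)           # recovers 3/39, 13/39, 15/39, 8/39

# Hypothetical bad sample: the first two cut points arrive out of order.
swapped = [cuts[1], cuts[0], cuts[2]]
bad_probs = category_probs(swapped)    # the second entry goes negative

print([round(a, 3) for a in cuts])
print([round(p, 3) for p in probs])
print([round(p, 3) for p in bad_probs])
```

The bad probabilities still sum to one, so a quick sanity check on the total would not catch the problem; only the ordering does.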

How to fix it? Before that, I needed a way to know that I had “fixed” it. Here is how I graphed the sampled data from my models.

I sorted the samples by goodness-of-fit to the original data. The distribution histogram is in the upper left corner. I moved down the sorted list so that I had 50 evenly spaced samples and plotted them on the large bar graph to give a sense of the variability. I plotted the distribution for each score below the central bar graph as proof I could calculate prediction intervals.
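The sort-then-thin step can be sketched in a few lines of Python. Two things here are assumptions, not the post's actual choices: the goodness-of-fit measure (a sum of squared differences from the observed proportions) and the stand-in "posterior" draws (Dirichlet draws built from gamma variates, 4,000 of them). Only the selection of 50 evenly spaced samples from the sorted list follows the description above.

```python
import random

observed = [3, 13, 15, 8]
total = sum(observed)
observed_props = [c / total for c in observed]

def misfit(props):
    """Assumed fit measure: sum of squared differences from the data."""
    return sum((p - o) ** 2 for p, o in zip(props, observed_props))

def fake_posterior_draw(rng):
    """Stand-in for one model sample: a Dirichlet draw near the data."""
    gammas = [rng.gammavariate(c + 1, 1.0) for c in observed]
    s = sum(gammas)
    return [g / s for g in gammas]

rng = random.Random(1)
samples = [fake_posterior_draw(rng) for _ in range(4000)]

# Best-fitting samples first, worst-fitting last.
ranked = sorted(samples, key=misfit)

# Walk down the sorted list in equal steps, keeping both endpoints.
n_show = 50
step = (len(ranked) - 1) / (n_show - 1)
picked = [ranked[round(i * step)] for i in range(n_show)]

print(len(picked), round(misfit(picked[0]), 5), round(misfit(picked[-1]), 5))
```

Keeping the endpoints means the plot shows both the best-fitting and the worst-fitting sample, which is where oddities like the ones described next tend to show up.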

I could now see if anything went wrong, like in this graphic for a small sample.

My objective became to remove the odd-looking distributions in the middle of the picture. I began by making reasonable changes to the model. Reasonable to me but, I eventually figured out, incomprehensible to the shell that Dr. McElreath built to simplify his exposition. This took a while. I then plunged into the R literature. This was difficult. I ended up tracking through threads, reading non-answers and reference manuals with few and poorly commented examples. In the end I decided to go with a purely Stan model. At this point I got discouraged. I was fighting picky syntax and the only help (Stack Overflow) always seemed to avoid direct answers. The responders seemed more interested in criticizing the asker’s model or suggesting a better model. Please just help us build the model no matter how wrong it is. Also have an easily located place where we can ask about models, not the code. I got discouraged and quit.

A month ago, under the influence of excess coffee, I found the optimism to start again on the Stan model. It worked. Using the ordered data type did the trick and away I went. Here is the result.
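Why does an ordered type help? As I understand the Stan reference manual, an ordered vector is handled by sampling unconstrained numbers and mapping them to an increasing sequence: the first value is kept, and each later value is the previous one plus the exponential of the next unconstrained number. The Python sketch below illustrates that idea only; the function names are invented and this is not Stan code.

```python
import math

def to_ordered(unconstrained):
    """Map any list of real numbers to a strictly increasing list."""
    ordered = [unconstrained[0]]
    for y in unconstrained[1:]:
        # exp(y) is always positive, so each step moves upward.
        ordered.append(ordered[-1] + math.exp(y))
    return ordered

def to_unconstrained(ordered):
    """Inverse map: the first value, then the log of each gap."""
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return [ordered[0]] + [math.log(g) for g in gaps]

# The sampler can propose anything, even values that decrease...
proposal = [1.5, -3.0, 0.2]
cuts = to_ordered(proposal)

# ...and the cut points still come out in increasing order.
print([round(c, 3) for c in cuts])
print([round(y, 3) for y in to_unconstrained(cuts)])
```

Because every proposal maps to a valid ordering, the sampler never produces cut points that cross, so the negative-probability samples cannot occur.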

Now I can use posterior sampling to get prediction intervals. For instance, for the model in the graph above, the 95% prediction intervals are

My next step is to explore stratified sampling (say, sampling results from each major) using partial pooling.


The analogy to course planning holds. We design a series of small incremental learning opportunities that, if followed, get our students up the mountain of knowledge. Each step is “easy,” yet requires effort. If students falter (stop pedaling), they stop or slide backwards. By working steadily, they will achieve the goals of the course. That is what the quizzes, homework, tests and projects are all about.

What I want to address here is making space for what I term “non-conscious deep work.” Proving theorems and other creative endeavors typically pass through the zen stages: Immersion, Incubation, and Illumination. Often illumination, the “aha” moment, seemingly comes out of the blue, and is commonly attributed to brain work at the subconscious level. Thus the problem is how to make space in one’s subconscious musings for deep work goals.

This issue occurred to me on my walking commute. Walking to work gives me time, 20 minutes, to get ready for the day – going over my schedule, reviewing the concepts I will be teaching, working on phrasing/explanations, setting goals for meetings, etc. The walk home (uphill thus also getting in my exercise for the day) allows me to process the day’s events. I arrive ready to pay attention to my home life and have a pleasant dinner. So what am I processing as I trudge up Holly Street? Most of my day has been spent interacting with human beings. I am primarily a teacher and also department chair and necessarily have been striving to connect to students and colleagues on a human level. These interactions leave plenty to chew over, first consciously then subconsciously. I know this is true by noting my waking thoughts at three in the morning.

I also like to do deep work, recognizing that one person’s deep work is another person’s trivial pursuit. I do this for pleasure (my career doesn’t depend on it) and by compulsion. As such my deep work has all the characteristics: immersion, incubation, and illumination, and I have been known to wake up in the morning with the solution or argument I was looking for.

So, my non-conscious deep work – do I have a choice between processing the events of the day or other academic mundanities, or working on a math problem? I actually try to prime my brain before I close my eyes. The intention is to drown out any stressful issues of the day with more neutral and fun subject matter. This works for getting to sleep but my thoughts upon waking have little correlation with my going-to-sleep thoughts.

The point is that I seemingly don’t have control over my non-conscious deep work. Other aspects of my life, family and work, require the same mental space and my deep work suffers. To put it another way, no matter how disciplined I am about organizing my day, I have little control over how my subconscious organizes my night.

Removing the “I” and the “You” of it makes the inevitable (hopefully small) failures of both teacher and student less personal. It places the student and teacher together in a system, “The System”, if you like, attempting to do our jobs, fulfilling our roles, changing our brains – both of us.

I found it on Andrew Gelman’s blog here. The red areas are based on an alpha (probability of rejecting a null hypothesis) of 0.05. A recent paper (preprint) endorsed/authored by a fat paragraph of statisticians recommends an alpha of .005 for “new discoveries.” I wanted to know how Dr. Gelman’s graph would change.

First here is my rendition of the original:

This is a normal curve with mean 2 and standard deviation (standard error) 8.1. It represents the true state of the world and is being compared to the null hypothesis with mean 0 and standard deviation 8.1. Assuming the null hypothesis with alpha of 0.05, the rejection areas (in red) begin at plus or minus 1.96 times 8.1. Effect size here is the mean of the experiment divided by the standard error, in this case 2/8.1, or about .25. The other information on the graph comes from a slight modification of the R code in this article. Power is the probability that an effect will be detected, that is, that we land in the red areas. This probability is .057, or about 6 percent. Though the “real” mean is positive (2), 24% of the red area is to the left and negative. Thus the type S error. The exaggeration is obtained by simulating the process with mean 2 and standard deviation 8.1 ten thousand times, averaging the absolute values of those estimates that land in the red areas, and dividing that average by the true mean of 2. Here the exaggeration ratio is nine.
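The calculation just described is short enough to sketch in Python using only the standard library. This is a stand-in for the modified R code, not a copy of it; the variable names and the random seed are arbitrary choices. Power and the type S rate come straight from the normal curve, and the exaggeration ratio comes from ten thousand simulated experiments.

```python
import random
from statistics import NormalDist

true_mean = 2.0    # the "real" effect
std_error = 8.1    # standard error of the estimate
alpha = 0.05

# The rejection regions start at +/- cutoff (about 1.96 standard errors).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
cutoff = z_crit * std_error

# Probability that an estimate lands in each red tail.
world = NormalDist(true_mean, std_error)
p_right = 1 - world.cdf(cutoff)
p_left = world.cdf(-cutoff)

power = p_right + p_left   # chance of a "significant" result
type_s = p_left / power    # share of significant results with the wrong sign

# Simulate 10,000 experiments and keep only the significant estimates.
rng = random.Random(2017)
draws = [rng.gauss(true_mean, std_error) for _ in range(10_000)]
significant = [abs(x) for x in draws if abs(x) > cutoff]
exaggeration = (sum(significant) / len(significant)) / true_mean

print(f"power        = {power:.3f}")
print(f"type S error = {type_s:.2f}")
print(f"exaggeration = {exaggeration:.1f}")
```

Power and the type S rate should match the figures quoted above. The exaggeration ratio moves a little from run to run because only a few hundred of the ten thousand draws land in the red areas.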

Next I created this graph using the same methods with alpha = .005.

The rejection areas are barely visible. Finding a small effect using this method will occur much less often, but if we do “find” it, there will still be a fair chance of getting the sign wrong and the effect will be much exaggerated.
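To put rough numbers on that comparison, here is a closed-form Python sketch (no simulation) that evaluates the same mean-2, standard-error-8.1 setup at both alphas. The helper name is invented for illustration. The last column is a floor on the exaggeration: the smallest estimate that can reach significance, divided by the true mean.

```python
from statistics import NormalDist

def power_and_sign_error(true_mean, std_error, alpha):
    """Return (cutoff, power, type S rate) for a two-sided test."""
    cutoff = NormalDist().inv_cdf(1 - alpha / 2) * std_error
    world = NormalDist(true_mean, std_error)
    p_right = 1 - world.cdf(cutoff)   # significant with the correct sign
    p_left = world.cdf(-cutoff)       # significant with the wrong sign
    power = p_right + p_left
    return cutoff, power, p_left / power

true_mean, std_error = 2.0, 8.1
results = {a: power_and_sign_error(true_mean, std_error, a)
           for a in (0.05, 0.005)}

for alpha, (cutoff, power, type_s) in results.items():
    min_exaggeration = cutoff / true_mean
    print(f"alpha={alpha}: power={power:.4f}, type S={type_s:.2f}, "
          f"exaggeration at least {min_exaggeration:.1f}x")
```

Tightening alpha pushes the cutoff further out, so power drops and every estimate that does reach significance is, by construction, even further from the true mean.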

All this is to say that it is hard to find a small effect in noisy data and if you do, the results will be deceptive. This is why scientific experimenters spend much of their time controlling for and eliminating extraneous variables. And yet sometimes discovering small effects can be important. Imagine discovering a drug that cures 0.1 percent of people with a common disease, say 0.1 percent of 10,000,000 – curing 10,000 people. The small effect exists but we can’t find it using reasonable sample sizes with statistical methods alone. This argues for putting more effort into understanding biological mechanisms and working to identify special subpopulations.


A summer chore was to dust the book shelves in my home. As you can see, the shelves (there are two more bays to the right) have an open design, thus collecting their share of dust and incidentally also a thin film of cooking grease that wafts out of the kitchen. As I worked, I was struck by how much of what I touched is now obsolete.

The shelves hold objects other than just books. Some are artifacts from the past – pictures of my parents, a toy passenger rail car that my dad built, the odd gift or piece of art – not obsolete but providing mostly nostalgia value. There is a shelf of CDs, DVDs and video tapes, all superseded by internet streaming services. There are board games. Do I really need three Scrabble sets? The games are kept for possible interested visitors who never appear. The shelves also contain six or seven Go sets. I only need one. The others gather dust.

The books are arranged by type. There is a shelf of Go books, some in Japanese, which I can’t read. I might dip into them in an idle moment but the totality is more than I could ever read and learn from for the rest of my life. There are reference books. We sent the Encyclopedia Britannica to the dump years ago but still there are dictionaries, atlases and the like – even the complete Shakespeare with print too small for old eyes. All these have been superseded by the internet. Speaking of atlases, part of a shelf has a pile of maps now made obsolete by their age and Google Maps. Nostalgic value only. There are how-to books – knitting, knotting, auto mechanics, hiking, carpentry, etc. Now if I want to know how-to, I search YouTube.

We used to collect what I call idea books – The Black Swan or Consilience for example. These are good for lending without expectation of return but have not been and never will be reread. Valuable ideas sitting dormant. Other books – novels, popular science – also lie fallow on the shelves.

Then there is the floor-to-ceiling collection of math books on the left. Many of them I have read but few have I mastered. My college notes are also there. These math books now exist as a reminder of knowledge I will never have – what I don’t have time to learn, even if I could. At this point if I need to know about a mathematical topic I go to the internet.

So what is the purpose of my book shelves? They provide wistful ambiance from my past life and a stab of regret for paths not taken. As my friend Barry the golfer remarked, his book shelves just provide a decorative background for his television set. Mine just adorn the north side of the living room.

- Did you go to every single class?
- If not, what did you do during those hours?
- If not, how did you learn the material you missed?
- Did you inform the professor prior to your absence?
- Did you make up, if possible, any points you missed by not being in class?
- When you are in class, where do you sit?
- How many questions do you ask each class session?
- Do you ever text during class?
- Do you ever cruise the internet during class?
- Do you ever scroll around your phone during class?
- Do you ever gossip with another student instead of engaging in the class?
- Can I see your notes? How many times have you read them over?
- Did you turn in every assignment on time? If not, why not?
- Can I see the best assignment you turned in?
- Can I see the worst assignment you turned in?
- Did you take advantage of every opportunity to get extra points?
- How many hours and how many days did you study for the midterm?
- Show me your graded midterm.
- Pick one question you missed and explain what happened.

As you can see, these questions are really about the student’s commitment and habits for success. To repeat,

**Five Habits of Highly Effective Students**

- **They show up on time or early.**
- **They show up ready to work and prepared for the day.**
- **They have done their homework.**
- **Their work shows they care.**
- **They actively participate in the endeavors of the day.**

Rubric data is commonly shown with a bar graph like this.

Note the gaps. I prefer a histogram-like bar graph like this.

I included the normal curve to make the following point. If the scale on the horizontal axis has ordered meaning, then the bars on the histogram, if the same width, represent parts of a whole – think of placing the bars end to end. This is what we are used to looking at with histograms – area = probability. The shape can be deceptive (see this post) but the idea gets across.

Yet overall I prefer a standard horizontal graph like so.

Here the redder the “badder” and the greener the “gooder” or for the color blind, “lefter” is “badder” and “righter” is “gooder.” One loses the ability to read frequencies directly but their relative size is easily seen. Comparisons work smoothly as can be seen in this graph from a forthcoming assessment committee report.

The problem with any of these choices is how to show prediction intervals (PI). This is a clunky graph from a previous post.

Now the rest of this saga.

The beauty of the Bayesian method as I gleaned from Statistical Rethinking by Richard McElreath was that one can get a distribution for each rubric category by repeated sampling using posterior probabilities. The ability to get means and PI’s and HPDI’s (prediction intervals with minimum width) and also densities comes naturally. I was proud of this graph for 15 minutes.
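For readers who have not met the two kinds of interval, here is a small Python sketch of how each can be read off a pile of posterior samples. It is an illustration under stated assumptions, not the code behind the graph: the helper names are invented, and the skewed beta draws merely stand in for the sampled share of one rubric category. The percentile interval cuts equal tails off each end. The HPDI slides a window holding 95% of the sorted samples along the list and keeps the narrowest one.

```python
import math
import random

def percentile_interval(samples, mass=0.95):
    """Equal-tailed interval: trim (1 - mass) / 2 from each end."""
    xs = sorted(samples)
    n = len(xs)
    tail = (1 - mass) / 2
    lo = xs[math.floor(tail * (n - 1))]
    hi = xs[math.ceil((1 - tail) * (n - 1))]
    return lo, hi

def hpdi(samples, mass=0.95):
    """Narrowest interval that still contains `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    window = max(1, math.ceil(mass * n))
    # Try every run of `window` consecutive sorted samples.
    best = min(range(n - window + 1),
               key=lambda i: xs[i + window - 1] - xs[i])
    return xs[best], xs[best + window - 1]

# Stand-in posterior for a small category share (right-skewed).
rng = random.Random(7)
share = [rng.betavariate(4, 37) for _ in range(4000)]

mean = sum(share) / len(share)
pi = percentile_interval(share)
hd = hpdi(share)
print(f"mean = {mean:.3f}")
print(f"95% PI   = ({pi[0]:.3f}, {pi[1]:.3f})")
print(f"95% HPDI = ({hd[0]:.3f}, {hd[1]:.3f})")
```

For a symmetric posterior the two intervals nearly coincide. They separate when the posterior is skewed, as it tends to be for a category with only a handful of scores.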

So proud I showed it to my wife. It is easy to read from afar – nice thick lines and color-coded. But it is deceptive (and mistitled). The overlaps of the density plots have no meaning. The four graphs are just mushed together. The overlaps will be large or small depending on the sharpness of the density plots and also on their position. For instance, two categories may have the same estimate for the mean.

I was also proud of this graph for a while.

This is similar in style to those in Statistical Rethinking except for larger plotted points and no shading extending between the bars for the PI’s. I particularly liked the contrast between posterior sampling mean and data using large open and filled disks. Yet the information seems to float in space and there is no sense that this is frequency data.

Maybe I could modify the horizontal stacked bar graph. I tried several ways of showing the “fuzz”, the uncertainty between the categories, using transparent coloring and shades of gray. Not much success. Here is an example with the grey sections representing uncertainty (95%).

The grey sections dominate the graph since the sample size was so small and are a bit deceptive since a smaller percentage for one category will necessarily cause a larger value in another category.

A cool feature of the Bayesian tools in Statistical Rethinking is the ability to sample from the posterior distribution. I decided to plot four thousand samples on one graph. I used forty narrow stacked graphs one above the other and iterated one hundred times with decreasing transparency. This gave a fair picture of the fuzz but no ability to visually quantify the sampling error. The last two graphs show how the uncertainty decreases with sample (original data) size.

Gradually changing the transparency one hundred times is overkill but I am out of ideas. So, at this point I will use the latter graphs to show fuzz and revert to a table to compare the various methods.
