Utterly Meaningless » Blog Archive » STATISTICS 101

    Filed at 7:27 am by dcobranchi

    States across the nation are playing statistical games with their test results to make it appear that they met goals that they probably haven’t.

    In Oregon, statistical latitude allowed one small high school to meet the state’s requirement that 40 percent of students pass the reading test, though raw test scores showed only 28 percent of students passed the test.

    There are valid reasons for recognizing that there is a statistical uncertainty in any measurement. There are two major problems, though, with how the schools are applying it.

    The first is that these uncertainty measurements are so large that they call into question the methodology of the analysis or the validity of the test. They may be statistically correct (the article doesn’t provide enough info to determine that), but why bother?

    The second problem is more insidious. The schools are using a one-tailed test to answer what should be a two-tailed problem. In other words, they apply the statistics selectively. If the raw scores indicate they missed the goal, they apply the confidence interval to make it appear that they might have “passed.” They “forget” to do that when the raw scores are good enough.
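A minimal Python sketch of the asymmetry described above. The function names and the 12-point margin are hypothetical: the article does not report the margin Oregon used, so 12 points is simply the smallest margin that would lift a raw 28 percent to the 40 percent target. Scores are percentages of students passing.

```python
def selective_verdict(raw_pct, target_pct, margin):
    """The rule as the post describes it (hypothetical sketch): a raw pass
    stands, and a raw fail is upgraded if the target lies within the margin.
    Nothing is ever downgraded."""
    return raw_pct >= target_pct or raw_pct + margin >= target_pct


def symmetric_verdict(raw_pct, target_pct, margin):
    """Apply the same margin in both directions (hypothetical sketch).
    Only a result whose whole interval clears, or misses, the target
    is treated as decisive."""
    if raw_pct - margin >= target_pct:
        return "pass"
    if raw_pct + margin < target_pct:
        return "fail"
    return "inconclusive"


# The Oregon figures from the article, with the assumed 12-point margin.
print(selective_verdict(28, 40, 12))  # True: a raw fail becomes a pass
print(symmetric_verdict(28, 40, 12))  # inconclusive
# A school above the target on raw scores keeps its pass under the
# selective rule, even though its interval also reaches below 40.
print(selective_verdict(45, 40, 12))  # True
print(symmetric_verdict(45, 40, 12))  # inconclusive
```

Because the margin is never negative, the selective rule reduces to `raw_pct + margin >= target_pct`: the effective bar is just the target lowered by the margin.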

    I’m sure Kimberly, when she recovers, could shed more light on this. I’m a relative lightweight when it comes to statistics; she’s the real thing.

    4 Responses to “STATISTICS 101”

    Comment by
    Steven Gallaher
    September 29th, 2003
    at 4:59 pm

    So many questions; it is hard to know where to begin. I guess the main question I have is this: where are they getting their standard deviation estimates? The story implies that they come from the students’ raw scores.

    If so, this means that the worse the worst students do (and the better the best students do) on the test, the worse the marginal students can do while the school overall is still “ok”. (The further the students are from the mean, the bigger the confidence interval. By marginal, I mean those students who failed but are being counted as having passed statistically.) Inasmuch as the stated goal is “No Child Left Behind”, this seems a bit perverse.
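To illustrate the point about spread, here is a minimal Python sketch. It assumes the interval is a normal-approximation 95% interval for the mean score, built from the sample standard deviation of the raw scores; the article does not say how the intervals were actually built. The two classes and their scores are invented for illustration.

```python
from statistics import NormalDist, mean, stdev


def margin_95(scores):
    """Half-width of a normal-approximation 95% two-sided interval for the
    mean score, using the sample standard deviation of the raw scores."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    return z * stdev(scores) / len(scores) ** 0.5


# Two hypothetical classes of 20 students with the same average score (50).
tight = [45, 48, 50, 52, 55] * 4
spread = [10, 30, 50, 70, 90] * 4

print(mean(tight), round(margin_95(tight), 1))
print(mean(spread), round(margin_95(spread), 1))
```

The widely spread class gets a margin several times larger, so a much bigger shortfall below the cut can still fall inside its interval.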

    By the way…40% of students passing is the cut-off?! What does that have to do with “No Child Left Behind”? I guess they meant “No more than 60% of the children left behind”. I bet the parents at schools where *only* half the students failed a basic reading test feel really good about things. (Yes, I know this is almost certainly an old complaint, but I am just getting up to speed.)

    In answer to your concerns (if my assumptions about what they are doing are correct): the only way the confidence intervals could be that large is if the sample (the number of students) was small. If they had even 100 students with 30% of them passing, the largest standard deviation possible would be only 4.6 points (on a 100-point test). So with a 95% confidence interval, you could pretend that students who scored 9 points below passing actually passed, but no more. As for the one- versus two-tailed test, it is not all that bad (though it is an error). Their confidence intervals are 19% wider than they should be if they wanted to make the claim “there is at least a 5% chance that we are actually a passing school, so you should pass us.” (That is the claim they are making, by the way. Think about that.) In our hypothetical, most extreme, 100-student school, the correct minimum “pass” would be 8 points, instead of 9, below the passing grade.
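The comment doesn't show its working, but one reading reproduces its numbers: treat 4.6 as the standard error of a 30 percent pass rate among 100 students, expressed in points on a 100-point scale (equivalently, the standard error of the mean score if every student scored either 0 or 100). A short Python check under that assumption:

```python
import math
from statistics import NormalDist

# Figures from the comment: 100 students, 30% passing, 100-point scale.
n = 100
p = 0.30

# Standard error of the pass rate, in points on a 100-point scale.
se = 100 * math.sqrt(p * (1 - p) / n)

# Standard normal critical values.
z_two_sided = NormalDist().inv_cdf(0.975)  # 95% two-sided, about 1.96
z_one_sided = NormalDist().inv_cdf(0.95)   # 95% one-sided, about 1.645

margin_two = z_two_sided * se
margin_one = z_one_sided * se

print(f"standard error:   {se:.2f} points")
print(f"two-sided margin: {margin_two:.2f} points")
print(f"one-sided margin: {margin_one:.2f} points")
print(f"ratio:            {margin_two / margin_one:.3f}")
```

The margins come out near 9 and 7.5 points, and their ratio is about 1.19, matching the comment's "19%"; the comment rounds the one-sided figure to 8.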

    Comment by
    Daryl Cobranchi
    September 29th, 2003
    at 7:50 pm

    Oh Sleepless One,

    Two things:

    NCLB theoretically leaves children behind until (IIRC) 2014, when 100% of kids must “pass.”

    The problem isn’t using a two-tailed vs. one-tailed t-distribution. It’s the fact that they completely ignore the existence of the distribution to the low side. So, a pass is a pass, and a fail might be a pass, too. Definitely not kosher.

    Comment by
    Steven Gallaher
    September 30th, 2003
    at 10:23 am

    I understand, but I think that it is kosher. (Or at least, that aspect of it is.)

    If you are going to use confidence intervals (and, by extension, hypothesis testing), you must have a ‘null hypothesis’ to test. The null hypothesis here is that a student would normally pass. As long as a student’s score is within the confidence interval below the passing grade, or anywhere above it, we cannot reject the null hypothesis. If we were being careful, we would not say that the student passed, merely that we cannot reject the hypothesis that they would normally pass.
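A minimal Python sketch of the test described above, for a single student. The cut score of 60, the standard error of 5 points, and the function name are all hypothetical; where a real per-student standard error would come from is exactly the question raised in the first comment.

```python
from statistics import NormalDist


def cannot_reject_pass(score, cut, se, alpha=0.05):
    """Null hypothesis: the student's expected score is at or above the cut.

    One-sided test: the null is rejected only if the observed score falls
    more than z * se below the cut, where z is the one-sided critical value
    for level alpha. Returns True when the null cannot be rejected.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return score >= cut - z * se


# Hypothetical cut score of 60 and standard error of 5 points.
for score in (70, 60, 55, 50):
    print(score, cannot_reject_pass(score, cut=60, se=5))
```

With these assumed numbers, any score above roughly 51.8 is "not rejected", including a 55 that is below the cut.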

    Your complaint appears to be that the null hypothesis is not appropriate. I am having difficulty seeing an alternative. “The student would normally fail” comes to mind, in which case a student would actually have to score a couple of standard deviations above the passing grade in order to pass. That this sounds insane only means that the entire process is insane. Another possible null hypothesis would be that a student scored exactly (or rather, deserved to score exactly) the passing grade. While this would be a nice, two-sided test, its real-world application is not clear.
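The two alternative null hypotheses can be sketched the same way. Again this is a hedged Python illustration; the function names, cut score, and standard error are hypothetical.

```python
from statistics import NormalDist


def passes_under_fail_null(score, cut, se, alpha=0.05):
    """Null: the student's expected score is below the cut (one-sided).
    The student passes only if the score exceeds cut + z * se."""
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645
    return score > cut + z * se


def rejects_exactly_at_cut(score, cut, se, alpha=0.05):
    """Null: the student's expected score equals the cut (two-sided).
    Returns True when the score is far enough from the cut, in either
    direction, to reject that null."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    return abs(score - cut) > z * se


# Hypothetical cut score of 60 and standard error of 5 points.
for score in (50, 55, 65, 70):
    print(score,
          passes_under_fail_null(score, cut=60, se=5),
          rejects_exactly_at_cut(score, cut=60, se=5))
```

Under the "would normally fail" null, a 65 on a cut of 60 still fails. The two-sided version only says whether a score is distinguishable from the cut; it gives the same answer for 50 and 70, which is one way to see why it is unclear how to use it as a pass/fail rule.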

    …time passes…

    The more I think about this, though, the more wrong the underlying (and, naturally, unstated) assumptions seem to me. My initial question was the right one; I just did not think it through. They seem to be assuming that every student has the same expected score (and variance of scores) on the test. This is rather like recording the temperature in every major city in the world on a given date, finding the standard deviation of those measurements, and then claiming that this number is the standard deviation of the measurements in Sydney, Australia, for that date. This would be fine if every city had the same expected temperature for that date (and the same variance), but that is clearly not true.
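A small simulation makes the temperature analogy concrete. It is a hedged sketch under an invented model: each student has their own expected score (mean 60, spread 15 across students) plus test-day noise with a standard deviation of 3. None of these numbers come from the article.

```python
import random
import statistics

random.seed(1)

# Hypothetical model: expected scores differ widely between students,
# while any one student's score varies only a little from sitting to sitting.
n_students = 100
expected = [random.gauss(60, 15) for _ in range(n_students)]  # between-student spread
noise_sd = 3                                                  # within-student noise

# One sitting of the test: each student's expected score plus noise.
observed = [mu + random.gauss(0, noise_sd) for mu in expected]

# The SD across all students mostly reflects how different the students
# are from each other, not how much one student's score would vary on a retake.
pooled_sd = statistics.stdev(observed)
print(f"pooled SD across students: {pooled_sd:.1f}")
print(f"assumed per-student noise: {noise_sd}")
```

The pooled figure comes out several times the assumed per-student noise, so using it as each student's measurement uncertainty would badly overstate how uncertain any single score is.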

    In the test score case, the standard deviation estimates are just as nonsensical and cannot be used for hypothesis testing. What we have is a single observation of each student’s expected score on the test. We can do no better than to take that observation as the student’s expected score and, therefore, do the obvious thing: pass those who passed and fail those who did not. (This would have the bonus of being consistent with both the letter and the spirit of the law.)

    Comment by
    September 30th, 2003
    at 4:55 pm

    Gadfly had a guest editorial this summer: