First, though, there's this Rick and Morty show. Rick's a crazy scientist, and Morty is his grandson who ends up getting dragged on impossible adventures. Great stuff. In this week's episode they were abducted by aliens and put into a simulation world for reasons. However, it's not a good simulation, and Rick points out obvious problems:
Like the guy putting a bun between two hot dogs, and the old lady walking her cat.
Morty acknowledges that Pop-Tarts probably wouldn't live in a toaster, as that doesn't make much sense.
Nor would they need to get in their car and go to work.
The Pop-Tarts show up later, as well.
OK, statistics again. The problem is that I saw this article today, which basically complains that "no one really means to use standard deviation, as people intrinsically want to use the mean absolute deviation," which is, of course, complete bullshit.
First, no one would ever compute a mean absolute deviation in their head. Here are some numbers: {-1 2 3 -5 1 400}. If you had to guess another number that would belong to this set, you'd guess something like "dunno, zero maybe?" You know that 400 is probably bullshit, so you cut it out. People don't compute real means when they filter data by eye. It's some combination of a mode and a median: choose a number that doesn't seem crazy.
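To put numbers on that example set, here is a minimal sketch using only Python's standard `statistics` module. Dropping the 400 by hand stands in for the eyeball filtering described above; it is an illustrative choice, not a real outlier test.

```python
import statistics

samples = [-1, 2, 3, -5, 1, 400]

# The mean gets dragged toward the outlier: 400 / 6, about 66.7.
mean = statistics.mean(samples)

# The median averages the two middle values of the sorted data (1 and 2),
# so the 400 barely matters.
median = statistics.median(samples)

# Cutting the "probably bullshit" point by hand, the way a person would.
trimmed = [x for x in samples if x != 400]

print(f"mean={mean:.2f} median={median} mean_without_400={statistics.mean(trimmed)}")
```

The median lands at 1.5 and the trimmed mean at 0, both close to the "dunno, zero maybe?" guess, while the raw mean sits far from every sample in the set.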
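Second, the mean absolute deviation only tells you roughly where the 50% point falls (for a Gaussian, about 57.5% of samples lie within one mean absolute deviation of the mean). Why that point? The standard deviation is more inclusive, as it tells you that most samples (68.27% for a Gaussian) fall within one sigma of the central value.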
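Assuming purely Gaussian data, each spread measure is a fixed multiple of sigma, so the fraction of samples it covers follows directly from the error function. This sketch uses `math.erf` and `statistics.NormalDist` from the standard library; the measures compared are the ones named in this post, and the Gaussian assumption is the only input.

```python
import math
import statistics


def fraction_within(k):
    """Fraction of a Gaussian lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


# Each spread measure written as a multiple of sigma (valid for Gaussian data only).
spreads = {
    "standard deviation": 1.0,
    "mean absolute deviation": math.sqrt(2 / math.pi),                   # ~0.798 sigma
    "median absolute deviation": statistics.NormalDist().inv_cdf(0.75),  # ~0.674 sigma
}

for name, k in spreads.items():
    print(f"{name:>26}: {k:.3f} sigma covers {fraction_within(k):.1%} of samples")
```

The three coverages come out at about 68.3%, 57.5% and 50%. It is the median absolute deviation, not the mean one, that marks the 50% point exactly.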
Third, all that obvious shit about moments analysis.
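One reading of the moment-analysis point (an interpretation, not something spelled out above): sigma is the square root of the second central moment, so it belongs to the same family as skewness and kurtosis, while the mean absolute deviation has no place in that hierarchy. A small sketch of those standardized moments, using population moments with no bias correction (an assumption of this sketch):

```python
import math


def central_moment(xs, order):
    """Population central moment: average of (x - mean) ** order."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** order for x in xs) / len(xs)


def moment_summary(xs):
    """Mean plus the standardized moments built on sigma (assumes xs is not constant)."""
    sigma = math.sqrt(central_moment(xs, 2))  # sigma^2 is the 2nd central moment
    return {
        "mean": sum(xs) / len(xs),
        "sigma": sigma,
        "skewness": central_moment(xs, 3) / sigma ** 3,
        "excess_kurtosis": central_moment(xs, 4) / sigma ** 4 - 3.0,
    }


print(moment_summary([-1, 2, 3, -5, 1, 400]))
```

On the example set, the single 400 shows up as a large positive skewness, which is exactly the kind of diagnostic that comes for free once spread is measured with the second moment.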
Anyway, time for plots. These are the same idea as the ones from the previous post, just remade with more samples and different stats. The horizontal lines are the true uncontaminated distribution sigma and the true fully contaminated sigma (sigma_uniform = sqrt((b - a)^2 / 12) = (b - a) / sqrt(12), because math). First thing to note: the actual sigma cleanly transitions between the two extremes, as it really should. Gaussian fits are best, but IQD and MAD are comparable up to the 50% contamination point. MeanAD doesn't seem particularly good. The fully contaminated end is biased, as I'm using a parametric model (one that assumes a Gaussian distribution).
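The contamination experiment can be sketched in standard-library Python. Everything specific here is an assumption, not the setup behind the plots: an N(0, 1) core, contamination drawn from Uniform(-10, 10), 20000 samples per point, and the usual Gaussian consistency factors (about 1/1.349 for IQD, 1.4826 for MAD, sqrt(pi/2) for MeanAD) that turn each spread measure into a sigma estimate. The Gaussian fit is left out, since doing it properly needs a fitting routine beyond the standard library.

```python
import math
import random
import statistics

NORMAL = statistics.NormalDist()
# Scale factors that make each measure equal sigma for pure Gaussian data.
IQD_TO_SIGMA = 1 / (NORMAL.inv_cdf(0.75) - NORMAL.inv_cdf(0.25))  # ~1 / 1.349
MAD_TO_SIGMA = 1 / NORMAL.inv_cdf(0.75)                            # ~1.4826
MEANAD_TO_SIGMA = math.sqrt(math.pi / 2)                           # ~1.2533


def contaminated_sample(n, frac, a=-10.0, b=10.0, rng=None):
    """Draw n points: N(0, 1) with probability 1 - frac, else Uniform(a, b)."""
    rng = rng if rng is not None else random.Random()
    return [rng.uniform(a, b) if rng.random() < frac else rng.gauss(0.0, 1.0)
            for _ in range(n)]


def sigma_estimates(xs):
    """Several sigma estimates, each scaled to match sigma for Gaussian data."""
    med = statistics.median(xs)
    mean = statistics.fmean(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "actual": statistics.pstdev(xs),
        "IQD": (q3 - q1) * IQD_TO_SIGMA,
        "MAD": statistics.median(abs(x - med) for x in xs) * MAD_TO_SIGMA,
        "MeanAD": statistics.fmean(abs(x - mean) for x in xs) * MEANAD_TO_SIGMA,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    a, b = -10.0, 10.0
    sigma_uniform = (b - a) / math.sqrt(12)  # std of Uniform(a, b)
    print(f"true sigma: clean=1.000 fully contaminated={sigma_uniform:.3f}")
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        est = sigma_estimates(contaminated_sample(20000, frac, a, b, rng))
        print(f"{frac:4.2f} " + " ".join(f"{k}={v:.3f}" for k, v in est.items()))
```

Because every scale factor assumes Gaussian data, the robust estimators overshoot (b - a) / sqrt(12) at full contamination. For Uniform(-10, 10) the IQD is 10, which scales to about 7.4 rather than the true 5.77, the same kind of parametric bias described for the fully contaminated end of the plots.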
Biased samples. This nicely shows that IQD fails before MAD, and that Gaussian fits are reasonable up to 60% contamination. MeanAD is again off doing its own thing. Median >>> mean for outlier rejection.
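Reading "biased samples" as contamination that lands on only one side of the core (an assumption here, with Uniform(0, 10) chosen arbitrarily), a quick sketch shows why the median wins for outlier rejection: the mean drifts by roughly 5 * frac, while the median moves much less.

```python
import random
import statistics


def one_sided_sample(n, frac, a=0.0, b=10.0, rng=None):
    """N(0, 1) core plus a fraction frac of Uniform(a, b) points on one side only."""
    rng = rng if rng is not None else random.Random()
    return [rng.uniform(a, b) if rng.random() < frac else rng.gauss(0.0, 1.0)
            for _ in range(n)]


def center_shift(frac, n=20000, seed=1):
    """Mean and median of a one-sided contaminated sample; the clean core is centered at 0."""
    xs = one_sided_sample(n, frac, rng=random.Random(seed))
    return statistics.fmean(xs), statistics.median(xs)


for frac in (0.0, 0.1, 0.3, 0.5):
    mean, median = center_shift(frac)
    print(f"contamination={frac:.1f}  mean={mean:+.3f}  median={median:+.3f}")
```

At 30% contamination the mean sits near 1.5 while the median stays near 0.5, so the median is pulled off center roughly a third as far.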
- I'm repurposing this thing Julie tweeted to remind future me to try out this pizza place.
- I disagree, as macroeconomics is clearly the emergent behavior of the enormously complex non-stationary Markov chain that is the economy.
- Everyone needs labor law enforcement.
- Places.
- I don't really think planetary science is that important, but I'm somewhat biased. In any case, government funding of science really should be about double what it is now, at which point my personal biases about the priority of the sciences wouldn't matter, because everyone would have funding.