Crazy cute baby red pandas at that.
Unfortunately, first, the rant about statistics. I saw this story show up in my random link browsing, I believe originally from Brad DeLong mentioning that someone else tweeted it because that other person is still unhappy that Nate Silver wasn't wrong. So I read it, and I'm trying to gauge what my reaction should be:
[image: I only spent like two hours writing this post, so a week is optimistic.]
[image: Maybe this is what they're thinking?]
[image: Maybe it's just poo-brain.]
[image: I could get angry.]
[image: Or I can just do it.]
So let's find the three, no, four most annoying things in the article, and complain about those.
1. "Lost until Chapter 8 is the fact that the approach Silver lobbies for is hardly an innovation; instead (as he ultimately acknowledges), it is built around a two-hundred-fifty-year-old theorem that is usually taught in the first weeks of college probability courses." Really? I don't think Nate Silver ever claimed that he had some deep insight into things. He basically sat down, noticed that there are lots of samples of things, and that maybe the average of those things would be a useful thing to minimize the noise in the various samples. Why no one thought to do this years ago suggests that most people don't like math, and that journalists in particular really don't like math.
2. "A Bayesian approach is particularly useful when predicting outcome probabilities in cases where one has strong prior knowledge of a situation. [...] But the Bayesian approach is much less helpful when there is no consensus about what the prior probabilities should be." Again, really? Let's cover this here in as clear a way as possible. The way I see it, most problems suggest a prior by the very definition of the problem. There's case A, which is apparently the only one Marcus and Davis care about: you have a pretty good idea what you expect to get. You expect to get (let's say) 10, and you assume that maybe there's some range around that where the result might actually be (let's say like +/- 2). So what prior do you choose? How about a Gaussian, with the mean at 10 and a standard deviation of 2. Cool. Now we have a prior based on what we kind of already knew.
Now let's move on to case B: you have no clue what the value should be. At all. Ok, but you actually know something because you don't know what the value is: you don't know whether the value should be 10 or whether it should be, like, 20. This means the probability of 10 and the probability of 20 have to be the same. How about -40? No clue, so it gets the same probability too. So maybe the prior is just a constant value across all possible values, to reflect the fact that you don't have any idea. If you have a range you think the value falls in, you can choose a constant of 1 / (max - min). It's not the end of the world if you really have no clue what that range should be, because a lot of the time you don't need it (model comparison is one place where you do, but that's generally not what Silver does, so I'm ignoring that case). As an afterthought: this is the prior you'd use when you're after a location-type parameter, and the mean in case A is exactly that kind of problem.
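Same idea in code: a minimal sketch of case B with a flat prior over a guessed range (the data and the range are invented; the point is that with a constant prior the posterior is just the normalized likelihood):

```python
# Case B: a flat 1 / (max - min) prior over a generous guessed range.
import numpy as np
from scipy.stats import norm

data = np.array([11.2, 9.5, 12.1, 10.8])   # same made-up data as case A
noise_sd = 2.0

grid = np.linspace(-50.0, 50.0, 2001)                    # guessed min-to-max range
prior = np.full_like(grid, 1.0 / (grid[-1] - grid[0]))   # constant: 1 / (max - min)
like = norm.pdf(data[:, None], loc=grid, scale=noise_sd).prod(axis=0)

post = prior * like
post /= post.sum()                                       # normalize over the grid
print(f"posterior mean = {(grid * post).sum():.2f}")     # ~10.9, the sample mean
```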
Finally, case C: the Jeffreys prior. This one is more complicated, partially because there's more math, and partially because my conceptual picture of it is probably more abstract than you want. To start off with, let's say we know the mean from case A, but don't know the value of sigma. Sigma can't be negative, so it has to range from zero to infinity. Now consider two values, let's say 10 and 20. The ratio of the probabilities of these two points should be P(10)/P(20). But do you know anything about the units? No, because you told me you don't know anything about the problem. So if you change units from 1s to 2s, that ratio must be the same as P'(20)/P'(40) in the new units. At this point, you probably want to just go look at the puppy. I'll wait, and it won't make me sad if you don't come back to finish this rant.
Ok, welcome back. So this is where my conceptual picture is bad, because I can see that if the units don't matter, then you probably have something of the form 1/x. My brain keeps trying to describe why, but I can't come up with words, so I made a gif:
So other than the centering changing with the size of the y-axis labels, you'll notice that the shape is identical under changes in the units, so it describes the problem above. Side note: this is also how I solve differential equations a lot of the time. In any case, I think this largely shows that you don't need to have a strong prior to do Bayesian inference.
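For the record, here is the gif's argument in symbols; this is just the standard scale-invariance sketch, nothing beyond what the units argument above already says. If rescaling sigma by any factor $c > 0$ shouldn't change the prior's functional form, then conserving probability mass under $\sigma' = c\sigma$ forces

$$p(\sigma)\,d\sigma = p(c\sigma)\,d(c\sigma) \quad\Rightarrow\quad p(\sigma) = c\,p(c\sigma) \;\;\text{for all } c > 0,$$

and setting $\sigma = 1$ gives $p(c) = p(1)/c$, i.e. $p(\sigma) \propto 1/\sigma$. That 1/x shape is exactly what the gif is showing.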
3. "[...] Milgram [...]" Christ, what assholes.
4. "[...] it’s because in any given year, drug companies and medical schools perform thousands of experiments. In any study, there is some small chance of a false positive; if you do a lot of experiments, you will eventually get a lot of false positive results [...]" One of the reasons I chose astronomy is because if I screw something up, I'm pretty sure that no one's going to die. Not having to lift heavy objects is a plus too, but there's a lot to be said about doing something that's largely irrelevant to most people's lives. However, this isn't really the case for drug companies and medical schools. That was the main point I took away from the
Ioannidis paper: lots of medical studies have a result that claims a treatment is effective, when it may not actually be. Sure, you learn that if you do a lot of experiments, but in the world of drug companies and medical schools "a lot of experiments" usually means "we're doing this to real people who are sick." If you claim something is effective when it really isn't, you could kind of end up with lots of dead people around. That's why it's a big deal to make sure you're analysing you data correctly.
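The arithmetic behind that quote is easy to make concrete. Here's a minimal sketch that simulates a pile of studies where the treatment truly does nothing (the numbers, 1000 studies and 30 patients per arm, are made up):

```python
# Run many "experiments" where the treatment has zero effect,
# and count how many clear p < 0.05 anyway.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_studies, n_per_arm = 1000, 30

false_positives = 0
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(0.0, 1.0, n_per_arm)   # same distribution: no real effect
    if ttest_ind(control, treated).pvalue < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_studies} null studies looked 'effective'")
# Expect roughly 50: alpha = 0.05 guarantees it, no fraud required.
```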
See? Astronomy. Everyone gets to go home at the end of the day.
Ok, I promised a puppy:
[image: Wow, I hope I haven't posted this already. I have like no puppy pictures saved.]