Wednesday, July 26, 2017

Wednesday: Everything needs more and better documentation.

Earlier today, Julie IMed me and said something to the effect of, "Drop whatever boring shit you're currently doing, and watch this YouTube video."  So I did, and it was reasonably interesting, and showed me that I should rewrite some of my stats programs to use the presenter's more efficient standard deviation calculation.
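
For the curious, here's roughly what a more efficient standard deviation calculation can look like. This is not necessarily the method from the video; it's a minimal sketch of one well-known single-pass approach (Welford's algorithm), and the function name `running_std` is made up for illustration. The point is that the data only gets read once, with no need to compute the mean first and then loop again.

```python
import math

def running_std(values):
    """Population standard deviation in one pass (Welford's algorithm).

    Hypothetical helper for illustration; not taken from the video or package.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean on purpose
    if count == 0:
        raise ValueError("need at least one value")
    return math.sqrt(m2 / count)
```

Because it only keeps three numbers around, this also works on data that arrives a piece at a time.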

When I got home, I attempted to install the package.  Finally, on the third computer I tried, I was able to get it to install and run.

Too bad it comes up with the wrong answer.

To be fair, this is a particularly difficult case, but it chose to do the split in the wrong dimension.  There are two closely spaced, nearly Gaussian populations.  According to the documentation, you can specify different distributions for each dimension, but I couldn't get that to work, receiving the super helpful error message "ValueError: sample only has 2 dimensions but should have 1 dimensions".  The video suggested that it is able to take data that's partially classified, and use that along with the unclassified data to do better fits, so maybe tomorrow I'll see if I can figure out how to do that.  If I just say "these three are class A, and these three are class B", hopefully it'll get the dimensions correct.
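
To make the partially classified idea concrete, here is a rough sketch of how a fit like that could work. This is not the package's API or its actual algorithm; it's a plain-Python illustration under several assumptions: exactly two classes, axis-aligned (diagonal covariance) Gaussians, and at least one hand-labeled point per class. The names `log_density` and `fit_two_gaussians` are invented for the example. The labeled points set the starting means and stay pinned to their class, while the unlabeled points get soft assignments that are refined on each pass (expectation-maximization).

```python
import math

def log_density(point, mean, var):
    """Log density of an axis-aligned (diagonal covariance) Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(point, mean, var)
    )

def fit_two_gaussians(points, labels, iterations=50):
    """Semi-supervised EM for two diagonal Gaussians (illustrative only).

    labels[i] is 0 or 1 for a hand-classified point and None otherwise.
    Assumes each class has at least one labeled point.
    Returns (means, variances, weights), each indexed by class.
    """
    dims = len(points[0])
    n = len(points)
    # Start each mean at the average of that class's labeled points.
    means = []
    for k in (0, 1):
        seeds = [p for p, lab in zip(points, labels) if lab == k]
        means.append([sum(p[d] for p in seeds) / len(seeds) for d in range(dims)])
    # Start both classes with the overall per-dimension variance (floored).
    overall = [sum(p[d] for p in points) / n for d in range(dims)]
    pooled = [
        max(sum((p[d] - overall[d]) ** 2 for p in points) / n, 1e-6)
        for d in range(dims)
    ]
    variances = [list(pooled), list(pooled)]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: labeled points keep their class; the rest get soft assignments.
        resp = []
        for p, lab in zip(points, labels):
            if lab is not None:
                resp.append([1.0 - lab, float(lab)])
                continue
            logs = [
                math.log(weights[k]) + log_density(p, means[k], variances[k])
                for k in (0, 1)
            ]
            top = max(logs)  # subtract the max to avoid underflow
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate weight, mean, and variance from the assignments.
        for k in (0, 1):
            mass = sum(r[k] for r in resp)
            weights[k] = mass / n
            means[k] = [
                sum(r[k] * p[d] for r, p in zip(resp, points)) / mass
                for d in range(dims)
            ]
            variances[k] = [
                max(
                    sum(r[k] * (p[d] - means[k][d]) ** 2 for r, p in zip(resp, points)) / mass,
                    1e-6,
                )
                for d in range(dims)
            ]
    return means, variances, weights
```

Calling it looks like `fit_two_gaussians(points, labels)`, where `points` is a list of (x, y) tuples and `labels` has 0 or 1 for the handful of classified points and None for everything else. The handful of labels mostly matters as a starting point: it nudges the fit toward splitting along the dimension that actually separates the labeled examples, though with populations this close together there's no guarantee it lands there.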

