Galvanize Data Science: Week 2

Today, the high in Seattle hit 92°F. It’s very hot in my apartment, and I feel like a normal distribution with an increasing standard deviation…

[Animation: a normal distribution with a changing standard deviation, made with matplotlib and ImageMagick.]
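
For anyone curious, an animation like this can be put together with matplotlib’s FuncAnimation and saved as a GIF through ImageMagick. Here is a rough sketch of the idea (not the exact code behind the GIF above):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = np.linspace(-10, 10, 500)
fig, ax = plt.subplots()

def normal_pdf(x, sigma):
    # density of a Normal(0, sigma^2) distribution
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def update(frame):
    ax.clear()                               # wipe the previous frame
    ax.set_xlim(-10, 10)
    ax.set_ylim(0, 0.45)
    sigma = 1 + 0.05 * frame                 # standard deviation grows each frame
    ax.fill_between(x, normal_pdf(x, sigma), alpha=0.5)
    ax.set_title(f"sigma = {sigma:.2f}")

anim = FuncAnimation(fig, update, frames=80, interval=50)
anim.save("shade_plot.gif", writer="imagemagick")  # requires ImageMagick installed
```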

Before entering the Data Science Intensive Program at Galvanize, I reached out to current and previous cohort members to see if they had any thoughts or advice for someone thinking of entering. Nearly every person came back with a glowing review and the same analogy:

“…it’s like trying to drink from a firehose.”

They weren’t joking. There is so much information being thrown at us at one time that it is physically impossible to absorb it all. The best you can do is take good notes and try to take in as much as you can.

We covered an incredible amount of material this week: roughly a year’s worth of lectures (no joke) on probability, statistics (frequentist and Bayesian), A/B testing, hypothesis testing, and bootstrapping. Because of Memorial Day, all of that was crammed into four days. Needless to say, I’m exhausted, but satisfied.

Each day’s lectures were complemented with programming exercises illustrating the topics of the day. For the A/B testing exercise, for example, we used data from Etsy to determine whether changing their homepage would drive additional customer conversions. For the Bayesian lecture, we built and ran simulations of coin flips and die rolls to illustrate the concepts behind Bayesian probability. This topic was somewhat mind-blowing to me, because Bayesian probability is a way of thinking about statistics that is completely different from the way most people (myself included) are taught.
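
To give a flavor of the idea (this is just an illustration with a made-up coin, not the actual exercise code): with a Beta prior on a coin’s probability of heads, every observed flip simply bumps one of the prior’s two counts, and the posterior tightens around the true bias as flips accumulate.

```python
import numpy as np

rng = np.random.default_rng(42)
true_p = 0.6                        # the coin's unknown bias
flips = rng.random(500) < true_p    # simulate 500 coin flips

# Uniform Beta(1, 1) prior; each flip bumps one of the two counts
alpha, beta = 1.0, 1.0
for heads in flips:
    alpha += heads
    beta += 1 - heads

print("posterior mean for P(heads):", alpha / (alpha + beta))   # ~0.6
```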

This was also the second week of pair programming, and I’m finding I like it more and more. Pair programming is when one person “drives” (does the typing) while the other person “worries” (keeps watch and plans ahead); you switch roles every 30 minutes or so. The brilliance of pair programming is that, beyond teaching you to work with other people, it keeps you going: by the end of the day we are often very tired, and having a partner helps. Working in pairs also makes you answerable to your partner, because your mutual success depends on both of you working hard to get the assignment done. We switch partners daily, so each day has a different dynamic. Sometimes I’m the stronger programmer, sometimes I’m not. I’ve found it’s a nice way of staying humble.

Thanks for reading.

Next week: Linear and Logistic Regression. Sounds fun, huh?

5 Comments

  1. It sounds like all of the advantages of a running partner combine with all of the advantages of a study partner! Both things I value a lot! Keep up the hard work, my friend!!! 💕💕😀

  2. I recently brushed up on this material myself because I realized that my understanding of continuous Bayesian probability was weak. Fascinating stuff. I’ve also been playing around with probabilistic programming languages which have Bayesian concepts built directly into the language.

    1. That’s awesome! Do you have an example of the types of things you are studying with that? Any VR stuff?

  3. I mainly studied this for the pure mathematical enjoyment of it. I knew how to solve probability and expectation problems where all the choices were discrete, but not when they were continuous, for example:

    Two points are chosen randomly and independently from the interval [0,1] according to a uniform distribution. What is the expected distance between the two points?

    And I had absolutely no idea how to apply Bayes’ rule in the context of these continuous probabilities, for example:

    Romeo and Juliet start dating, but Juliet will be late on any date by a random amount X, uniformly distributed over the interval [0, z]. The parameter z is unknown and is modeled as the value of a random variable Z, uniformly distributed between zero and one hour. Assuming that Juliet was late by an amount x on their first date, how should Romeo use this information to update the distribution of Z?

    A programming language called Anglican can elegantly solve these sorts of continuous Bayes’ problems experimentally, even when they get too complicated to derive exact answers algebraically.
    http://www.robots.ox.ac.uk/~fwood/anglican/index.html

    What interests me about continuous Bayes is that it is really useful for building systems that take fuzzy, error-prone sensor data and make the best possible deductions about what is actually going on. Imagine, for example, a robot that gets pings from several stationary towers, where the strength of each ping relates to the distance from that tower but with some noise that can be modeled as a normal distribution. Even though each individual measurement is imprecise, you can use Bayes’ rule with all of the measurements together to get a fairly accurate picture of where the robot is (see the grid-based sketch after the comments).

    And you’re right, this also applies to VR where you need to pinpoint the player’s head and hand positions from a bunch of noisy sensor readings.

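The first problem quoted above, the expected distance between two uniform points, has the exact answer 1/3, since the double integral of |x - y| over the unit square works out to 1/3. A quick Monte Carlo check, just as an illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x, y = rng.random(n), rng.random(n)   # two independent Uniform(0, 1) draws

# Exact answer: the integral of |x - y| over the unit square is 1/3
print(np.abs(x - y).mean())           # ~0.333
```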
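
For the Romeo and Juliet problem, Bayes’ rule says the posterior for Z is proportional to the prior (1 on [0, 1]) times the likelihood (1/z whenever z >= x); normalizing over [x, 1] gives the density 1 / (z * ln(1/x)). A small grid-based sketch of that update, assuming an observed lateness of 0.25 hours:

```python
import numpy as np

x_obs = 0.25                               # Juliet was 15 minutes (0.25 h) late
z = np.linspace(1e-3, 1.0, 2000)           # grid over the unknown parameter Z
dz = z[1] - z[0]

prior = np.ones_like(z)                                # Z ~ Uniform(0, 1)
likelihood = np.where(z >= x_obs, 1.0 / z, 0.0)        # X | Z=z ~ Uniform(0, z)

posterior = prior * likelihood
posterior /= posterior.sum() * dz          # normalize so it integrates to 1

# Closed form for comparison: 1 / (z * ln(1/x)) on [x, 1], zero below x
exact = np.where(z >= x_obs, 1.0 / (z * np.log(1.0 / x_obs)), 0.0)
print(np.abs(posterior - exact).max())     # small discretization error
```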
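
The robot-localization example can be sketched the same way: a flat prior over a grid of candidate positions, with each noisy distance measurement multiplying in a Gaussian likelihood. The tower layout and noise level below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

towers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])   # known tower positions
true_pos = np.array([6.0, 3.0])                             # where the robot really is
noise_sd = 0.5                                              # std dev of distance noise

# One noisy distance measurement from each tower
measured = np.linalg.norm(towers - true_pos, axis=1) + rng.normal(0, noise_sd, len(towers))

# Flat prior over a grid of candidate positions (log scale for stability)
xs = np.linspace(0, 10, 201)
X, Y = np.meshgrid(xs, xs)
log_post = np.zeros_like(X)

# Bayes' rule: each measurement contributes a Gaussian log-likelihood term
for (tx, ty), d in zip(towers, measured):
    dist = np.sqrt((X - tx) ** 2 + (Y - ty) ** 2)
    log_post += -0.5 * ((d - dist) / noise_sd) ** 2

best = np.unravel_index(np.argmax(log_post), log_post.shape)
print("estimated position:", X[best], Y[best])              # close to (6, 3)
```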
