Ever since I was a kid, I've been a "Spock." A scientist at heart, I grew up on Cosmos and read all of Richard Feynman's books and anything I could find about quantum mechanics or Einstein's theories of relativity. To this day, I offer to take physicists to coffee or breakfast just to pick their brains about the mysteries of reality. And yes, like any true nerd, I write electronic music.


I earned my B.A. in Mathematics at Pomona College, but it wasn’t until my final year there that the seeds of my lifelong love of data science were planted. My enjoyment of classes like "Mathematical Modeling" and "Probability and Its Applications" was the first clue that something like data science could be in my future.

Even after I started working as a software developer, I hadn’t realized the magic of combining programming with math until my former professor Art Benjamin challenged me with a seemingly unsolvable problem he was working on for his upcoming book “Proofs that Really Count: The Art of Combinatorial Proof.” I didn’t think I could possibly crack such a difficult problem, but he thought I could bring a unique approach to it. He turned out to be right: I wrote a custom computer program that used a kind of Monte Carlo approach to search possible solutions while also letting the user nudge the search in the right direction. Before long, I had surprised myself by solving four such problems, and I was thrilled to be mentioned in his book.

Next, I served as a strategic advisor for the winning entry in the international 2007 AAAI Computer Poker Competition. In an article in the San Bernardino County Sun, Michael Bowling of the University of Alberta’s Computing Science Department said, “they are going up against top-notch universities that are doing cutting-edge research in this area, so it was very impressive that they were not only competitive, but they won.”

After 11 years as a software developer, I finally followed my inner data wonk to the Analytics department at Oversee.net. When I interviewed for the job, I didn't know Python or R, or even what a GitHub repo was, but I did have a scientific mindset. I showed the interviewer how I had developed a profitable online poker system by downloading hand histories and experimenting with a simple strategy, and he said, "This is perfect." Before long, I was modifying the statistics used in their A/B tests and managing the testing pipeline. A few years later, I managed Domain Acquisitions and developed predictive models to find profitable domain names in the wild. At our peak, we owned and monetized over a million domain names.


I was part of grand successes and epic failures during those years, and it was there that I truly learned the value of the scientific mindset. After that, I sharpened my technical skills in UC Berkeley's Master of Information and Data Science program, and since graduating I've been on a mission to spread the message that data science education should be centered on scientific reasoning. The way around the pitfalls of data science is to walk the path of scientists.






The Scientific Mindset

Data doesn't speak for itself; it needs an interpreter.

Why is the average age of death for male rappers under 30? Why are the best-scoring schools the smallest ones? Are large earthquakes on the rise? Do dead salmon have brain activity when shown photographs? Are the Sophomore Slump and the Sports Illustrated Jinx real or imaginary? Do drugs for relaxation help students score higher on the SAT? Why does punishment seem to work better than reward? Why are movie sequels rarely as good as the originals? These are the types of questions that data scientists should be well positioned to answer, but despite their impressive array of technical skills, many are unable to think critically about what their data shows. They were never taught to think critically, to be skeptical and ask questions, or to appreciate the importance of conducting experiments. They were never taught the scientific mindset.
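The small-schools question makes a good test case. One well-known explanation is sample size: an average over a few students swings more widely than an average over many, so small schools crowd both ends of a ranking. The Python sketch below illustrates that idea under a hypothetical setup of our own choosing (500 schools with 10 to 500 students each, every student's score drawn from the same normal distribution with mean 500 and standard deviation 100), so no school is actually better than any other. The function names are illustrative, not from any real dataset or library.

```python
import random
import statistics


def simulate_schools(num_schools=500, min_size=10, max_size=500, seed=0):
    """Return (size, mean_score) pairs for hypothetical schools of identical quality."""
    rng = random.Random(seed)
    schools = []
    for _ in range(num_schools):
        size = rng.randint(min_size, max_size)
        # Every student's score comes from the same N(500, 100) distribution,
        # so differences between school averages are pure sampling noise.
        scores = [rng.gauss(500, 100) for _ in range(size)]
        schools.append((size, statistics.fmean(scores)))
    return schools


def mean_size_of(schools):
    """Average enrollment of a group of (size, mean_score) pairs."""
    return statistics.fmean(size for size, _ in schools)


if __name__ == "__main__":
    schools = simulate_schools()
    ranked = sorted(schools, key=lambda s: s[1], reverse=True)
    print(f"all schools:      mean size {mean_size_of(schools):.0f}")
    print(f"top 10 scores:    mean size {mean_size_of(ranked[:10]):.0f}")
    print(f"bottom 10 scores: mean size {mean_size_of(ranked[-10:]):.0f}")
```

In this setup both the top 10 and the bottom 10 schools should average well below the overall mean size. Looking only at the top of the list would suggest small schools are better, when the data is really just noisier for them.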


The golden rule of science is: do whatever it takes to avoid fooling yourself. Look for explanations that make sense, form hypotheses that are falsifiable, design and conduct experiments, and be willing to change your mind. If you can't think of any evidence that, if it were produced, would cause you to change your mind, then you don't care about evidence. Having an idea that can't be proven wrong is not a strength; it's a weakness.


Putting science into data science allows you to:

  • Effectively interpret data and figure out whether it's saying what you think it's saying.
  • Evaluate evidence and develop a Spidey Sense that tells you when you're not seeing the whole picture.
  • Identify which features might be useful for making predictions. If you include too many nonsense variables, they'll crowd out the real ones.
  • Run experiments whenever possible. The strongest evidence is evidence that could've gone against you.

 

Businesses call themselves "data-driven" and think they know what data is telling them ("up is up"). However, many don't analyze data in a scientifically rigorous way and are setting themselves up to be duped by it. They p-hack (test repeatedly until a result finally backs them up), they HARK (hypothesize after the results are known, then present the hypothesis as if it came first), and they hide their misses while showing their hits (publication bias). My goal is to help train the next generation of data scientists and managers to avoid these pitfalls by teaching data science in a new way that emphasizes scientific reasoning. Data science works, but only if you work like a scientist.
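To see why "test repeatedly until it finally backs them up" is so dangerous, here is a small Python simulation. It is a sketch under stated assumptions, not a description of any real testing pipeline: each "try" is assumed to be a fresh, independent A/B test on two groups drawn from the same normal distribution with a known standard deviation of 1, so there is no real effect and any result with p < 0.05 is a false positive. The function names and parameters (30 users per group, 1,000 simulated studies) are hypothetical.

```python
import math
import random


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = diff / se
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def hacked_study(max_tests, rng, n=30, alpha=0.05):
    """Run up to max_tests fresh A/B tests on pure noise, stopping at the first 'win'."""
    for _ in range(max_tests):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]  # same distribution: no real effect
        if z_test_p_value(a, b) < alpha:
            return True  # reported as a "significant" finding
    return False


def false_positive_rate(max_tests, trials=1000, seed=0):
    """Fraction of simulated studies that claim an effect that isn't there."""
    rng = random.Random(seed)
    hits = sum(hacked_study(max_tests, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k:>2} tries: simulated {false_positive_rate(k):.2f}, "
              f"theory {1 - 0.95 ** k:.2f}")
```

With independent tries, the chance of at least one false positive is 1 - 0.95^k, which is about 0.64 for k = 20, and the simulated rates should land close to that. Peeking repeatedly at the same accumulating data gives a different number, but it still inflates the false positive rate well above 5%.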