I am passionate about data science.  I love reading about it and blogging about it, and I have a history of using data-driven techniques to solve problems that seemed intractable.

I earned my B.A. in Mathematics at Pomona College, but at first I didn't realize how much I enjoyed problem solving that combines mathematics and computer programming.  It wasn’t until my final year that the pieces came together and the seeds of my lifelong love of data science were sown.  I found my calling in classes such as Investigational Statistics, Mathematical Modeling, Fundamental Concepts in Math (surprisingly similar to programming), and Probability and Its Applications.

Even after becoming a professional software developer, I hadn’t realized the magic of combining programming with math until my former professor Art Benjamin challenged me with a proof he was working on for his upcoming book “Proofs that Really Count: The Art of Combinatorial Proof.”  I didn’t think I could possibly solve such a difficult problem, but he was confident that I could bring a unique approach to it.  He turned out to be right: I wrote a custom computer program that used a kind of Monte Carlo approach while allowing the user to nudge the program in the right direction.  Before long, I had surprised myself by conquering four problems, and I even earned a mention in his book.

After this, I was part of a two-person team that developed the winning entry in the international 2007 AAAI Computer Poker Competition.  In an article in the San Bernardino County Sun, Michael Bowling, of the University of Alberta’s Computing Science Department, stated “they are going up against top-notch universities that are doing cutting-edge research in this area, so it was very impressive that they were not only competitive, but they won.”

Following 11 years as a software developer, I better aligned my career with my true interests by joining the Analytics team at Oversee.net.  Early on, I overheard a couple of co-workers discussing the statistics they used while running randomized A/B experiments and asked them to explain their method to me.  Something didn’t sound right, and after digging into it, I found that their use of statistics was leading them to false and premature conclusions.  Oversee switched to the testing approach I recommended, and I became the manager of the testing pipeline.

The problem I discovered with their approach (which to this day is still being used by other companies) was that they were treating metrics derived from web visitor statistics, such as the ratio of clicks-per-load, as if they came from coin flips.  Unlike the ratio of heads to tails, the ratio of clicks to loads can fluctuate wildly from day to day, thanks to bots that occasionally blast webpages with activity.  The statistical methods used to determine significance must reflect that volatility in the error bars, or they may lead to unjustifiable confidence in the results.  With the standard coin-flip approach, the error bars never increase in size, no matter how wildly the metrics are varying.  Instead, the uncertainty of the ratio is determined entirely by how close it is to 50% (the ratio with the maximum standard deviation) and how many data points there are.  My recommended fix was to bucket the ratio of clicks-per-load by day and use the variance of those data points to determine significance.
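
To make the difference concrete, here is a minimal sketch in Python that compares the two kinds of error bars.  It is only an illustration under stated assumptions, not the code that ran in the testing pipeline: the function names (binomial_se, daily_bucket_se) are hypothetical, the seven days of numbers are made up, and the bucketed version simply takes the standard error of the mean of the daily ratios, which is one plausible way to "use the variance of those data points."

```python
import math
from statistics import stdev

def binomial_se(clicks, loads):
    """Standard error of clicks/loads if every load were an independent coin flip."""
    p = clicks / loads
    return math.sqrt(p * (1 - p) / loads)

def daily_bucket_se(daily_clicks, daily_loads):
    """Standard error of the mean of the per-day clicks-per-load ratios."""
    ratios = [c / l for c, l in zip(daily_clicks, daily_loads)]
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return stdev(ratios) / math.sqrt(len(ratios))

# Hypothetical data: one week of traffic, with a bot inflating loads on day 4.
daily_loads = [10000, 10000, 10000, 40000, 10000, 10000, 10000]
daily_clicks = [500, 520, 480, 600, 510, 490, 505]

coin_flip = binomial_se(sum(daily_clicks), sum(daily_loads))
bucketed = daily_bucket_se(daily_clicks, daily_loads)

print(f"coin-flip standard error:    {coin_flip:.5f}")
print(f"daily-bucket standard error: {bucketed:.5f}")
```

With these made-up numbers, the daily-bucket error bar comes out several times wider than the coin-flip one, because only the bucketed estimate sees the day-to-day swing caused by the bot.  The coin-flip estimate depends only on the pooled ratio and the total number of loads, so it would be the same however the clicks were spread across the days.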

The field of data science is exploding right now and will undoubtedly lead to many incredible discoveries.  I'm just excited to be going along for the ride and will enjoy learning as much as I can.

MOOCs taken so far

Math and Statistics

  Stanford: StatLearning Statistical Learning
  Duke: Data Analysis and Statistical Inference

  The Caltech-JPL Summer School on Big Data Analytics

  DavidsonX: D003x.1 Applications of Linear Algebra (Part 1)
  Khan Academy: Linear Algebra (140 videos)

Computer Science

     MITx: 15.071x The Analytics Edge

     MITx: 6.00.1x Introduction to Computer Science and Programming Using Python

     HarvardX: PH525.1x Statistics and R for the Life Sciences
     HarvardX: PH525.2x Matrix Algebra and Linear Models
     HarvardX: PH525.3x Advanced Statistics for the Life Sciences
     DavidsonX: D003x.2 Applications of Linear Algebra (Part 2)

     Johns Hopkins: The Data Scientist’s Toolbox

     Johns Hopkins: R Programming 

UC Berkeley's MIDS Program (Master of Information and Data Science)

(Video-only and bridge courses at UC Berkeley)

  MIDS 1a - Fundamentals of Linear Algebra

  MIDS 1b - Fundamentals of Data Structures and Algorithms

  INFO W18 - Python Bridge