UC Berkeley's MIDS Program (Master of Information and Data Science)
(Video-only and bridge courses at UC Berkeley)
MIDS 1a - Fundamentals of Linear Algebra
MIDS 1b - Fundamentals of Data Structures and Algorithms
INFO W18 - Python Bridge
I am passionate about data science. I love reading and blogging about it, and I have a history of using data-driven techniques to solve problems that seemed intractable.
The field of data science is exploding right now and will undoubtedly lead to many incredible discoveries. I'm excited to be along for the ride, and I look forward to learning as much as I can.
I earned my B.A. in Mathematics at Pomona College, but at first I didn’t realize how much I enjoy problem solving that combines mathematics and computer programming. It wasn’t until my final year that the pieces came together and the seeds of my lifelong love of data science were sown. I found my calling in classes such as Investigational Statistics, Mathematical Modeling, Fundamental Concepts in Math (surprisingly similar to programming), and Probability and Its Applications.
Even after becoming a professional software developer, I hadn’t fully appreciated the magic of combining programming with math until my former professor Art Benjamin challenged me with a proof he was working on for his upcoming book, “Proofs that Really Count: The Art of Combinatorial Proof.” I didn’t think I could solve such a difficult problem, but he was confident that I could bring a unique approach to it. He turned out to be right: I wrote a custom computer program that used a kind of Monte Carlo approach while letting the user nudge it in the right direction. Before long, I had surprised myself by solving four problems, and I even earned a mention in his book.
After this, I was part of a two-person team that developed the winning entry in the international 2007 AAAI Computer Poker Competition. In an article in the San Bernardino County Sun, Michael Bowling of the University of Alberta’s Computing Science Department said, “they are going up against top-notch universities that are doing cutting-edge research in this area, so it was very impressive that they were not only competitive, but they won.”
After 11 years as a software developer, I aligned my career more closely with my true interests by joining the Analytics team at Oversee.net. Early on, I overheard a couple of co-workers discussing the statistics they used to analyze randomized A/B experiments, and I asked them to explain it to me. Something didn’t sound right, and after digging into it, I found that their use of statistics was leading them to false and premature conclusions. Oversee switched its testing to my recommended approach, and I became the manager of the testing pipeline.
The flaw I discovered in their approach (one that other companies still use to this day) was that they treated metrics derived from web visitor statistics, such as clicks per load, as if they came from coin flips. Unlike the ratio of heads to tails, the ratio of clicks to loads can fluctuate wildly from day to day, thanks to bots that occasionally blast webpages with activity. The statistical methods used to determine significance must reflect that volatility in the error bars, or they can produce unjustified confidence in the results. With the standard coin-flip approach, the error bars never widen, no matter how wildly the metric varies: the uncertainty depends only on how close the ratio is to 50% (the ratio with the maximum standard deviation) and on how many data points there are. My recommended fix was to bucket clicks per load by day and use the variance of those daily values to determine significance.
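To make the contrast concrete, here is a minimal sketch comparing the two error-bar calculations. The function names and the week of traffic are hypothetical, chosen only for illustration; the sketch assumes equal loads per day (so the mean of daily ratios equals the pooled ratio) and uses the standard error of the mean daily ratio as the "bucketed" error bar.

```python
import math
import statistics


def coin_flip_se(clicks: int, loads: int) -> float:
    """Standard error of the pooled click rate if every load were an independent coin flip."""
    p = clicks / loads
    # Depends only on p and the number of loads: largest at p = 0.5,
    # and completely blind to how much the rate swings from day to day.
    return math.sqrt(p * (1 - p) / loads)


def daily_bucket_se(daily_clicks: list[int], daily_loads: list[int]) -> float:
    """Standard error of the mean daily click rate, estimated from day-to-day variance."""
    ratios = [c / n for c, n in zip(daily_clicks, daily_loads)]
    # Sample standard deviation across days, divided by sqrt(number of days).
    return statistics.stdev(ratios) / math.sqrt(len(ratios))


# Hypothetical week: 1,000 loads per day, with one bot-inflated day of clicks.
loads = [1000] * 7
clicks = [100, 100, 100, 400, 100, 100, 100]

naive = coin_flip_se(sum(clicks), sum(loads))
bucketed = daily_bucket_se(clicks, loads)
print(f"coin-flip SE:    {naive:.4f}")
print(f"daily-bucket SE: {bucketed:.4f}")
```

With this made-up data the bucketed error bar comes out roughly ten times wider than the coin-flip one, because the single volatile day shows up in the daily variance but not in the binomial formula. With only a handful of days, a t-distribution rather than a normal approximation would be the more cautious choice when turning these error bars into a significance test.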
My most memorable instance of data-driven decision-making was developing a simple and profitable strategy for online poker. It also provided material for my guest lectures in Harvey Mudd College’s popular Mathematics of Games course, on how combining data and mathematics can produce successful strategies that defy conventional wisdom.
It all started when I saw a “poker corner” segment on TV stating that a player who is short-stacked (has few chips remaining) has only one move: all-in. This was presented as a bad situation, but to me it was a great opportunity to make the game tractable. Some poker sites allowed you to start with a short stack, so if my hypothesis was correct, I could actually make a profit. Being somewhat risk-averse, I only ever deposited $50 into my online poker account.
After using an initial all-in-or-fold strategy that let me gather hand history files on my opponents, I engineered an exploitative strategy by calculating the expected call equity (value when my bet is called), fold equity (value when everyone folds), and the cost of patience (the blinds). Conventional wisdom states that repetitive strategies can’t work, and that your specific opponents and position at the table are the most important things to consider. However, my data told me that all of this was incorrect and that a tidy profit could be made.
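The call-equity and fold-equity pieces can be combined into a single expected-value number for an all-in. The sketch below is a simplified, hypothetical model, not the actual strategy: it assumes at most one opponent calls, that the caller adds exactly our stack to the pot, that ties are ignored, and that `fold_prob` and `win_prob_called` have already been estimated from hand histories. The cost of patience (the blinds paid while waiting for a better spot) is left out of this single-decision view.

```python
def shove_ev(pot: float, stack: float, fold_prob: float, win_prob_called: float) -> float:
    """Expected chip gain of moving all-in, measured relative to folding (folding = 0).

    pot: chips already in the middle (e.g. the blinds), in big blinds.
    stack: our remaining chips, all of which go in.
    fold_prob: estimated probability that every opponent folds.
    win_prob_called: estimated chance our hand wins if exactly one opponent calls.
    """
    # Fold equity: everyone folds and we collect the pot uncontested.
    fold_equity = fold_prob * pot
    # Call equity: heads-up showdown; we either win the pot plus the caller's
    # matching chips, or lose our whole stack.
    call_ev = win_prob_called * (pot + stack) - (1 - win_prob_called) * stack
    return fold_equity + (1 - fold_prob) * call_ev


# Hypothetical spot: 1.5 big blinds in the pot, a 10-big-blind stack,
# and a hand that wins 35% of the time when called.
print(round(shove_ev(1.5, 10, fold_prob=0.6, win_prob_called=0.35), 4))
print(round(shove_ev(1.5, 10, fold_prob=0.7, win_prob_called=0.35), 4))
```

In this made-up spot the same hand flips from a slightly losing shove to a winning one when the estimated fold probability rises from 60% to 70%, which is why per-opponent data on folding tendencies matters so much for a short-stack strategy.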
While the strategy was simple, the analysis was not. In addition to building a predictive model to evaluate potential strategies, I also had to estimate my precise edge in the game so that I could use the Kelly Criterion to size my stakes, limiting my exposure to bad luck while maximizing the long-run growth of my bankroll.
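For reference, here is the textbook Kelly formula for a simple win/lose bet, plus the common small-edge approximation for outcomes that vary continuously, such as poker session results. This is a generic sketch, not the exact calculation described above; the function names and the example numbers are hypothetical.

```python
def kelly_fraction(win_prob: float, net_odds: float) -> float:
    """Kelly fraction of bankroll for a bet paying net_odds-to-1 with probability win_prob.

    f* = p - (1 - p) / b. A non-positive result means there is no edge, so stake nothing.
    """
    f = win_prob - (1 - win_prob) / net_odds
    return max(0.0, f)


def kelly_fraction_approx(mean_return: float, variance: float) -> float:
    """Small-edge approximation for non-binary outcomes: f* is roughly mean / variance.

    mean_return and variance describe the result per unit staked (e.g. per buy-in).
    """
    return max(0.0, mean_return / variance)


# Even-money bet won 55% of the time: stake about 10% of the bankroll.
print(kelly_fraction(0.55, 1.0))

# Hypothetical edge of 0.1 buy-ins per session with variance 2.0 (buy-ins squared).
print(kelly_fraction_approx(0.1, 2.0))
```

In the second example the result of 0.05 would mean each buy-in should be about 5% of the bankroll, or a bankroll of roughly 20 buy-ins. Because the fraction scales directly with the estimated edge, an overestimated edge leads to overbetting, which is why estimating the edge precisely matters before applying Kelly.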
In the end, my $50 became $30,000, and after sharing the strategy with friends, we collected some crazy stories to tell disbelieving family members.
MOOCs taken so far
Math and Statistics
Stanford: StatLearning Statistical Learning
Duke: Data Analysis and Statistical Inference
The Caltech-JPL Summer School on Big Data Analytics
DavidsonX: D003x.1 Applications of Linear Algebra (Part 1)
Khan Academy: Linear Algebra (140 videos)
MITx: 15.071x The Analytics Edge
MITx: 6.00.1x Introduction to Computer Science and Programming Using Python
HarvardX: PH525.1x Statistics and R for the Life Sciences
HarvardX: PH525.2x Matrix Algebra and Linear Models
HarvardX: PH525.3x Advanced Statistics for the Life Sciences
DavidsonX: D003x.2 Applications of Linear Algebra (Part 2)
Johns Hopkins: The Data Scientist’s Toolbox
Johns Hopkins: R Programming