Driven data science engineer skilled at applying machine learning methods, developing code, breaking down complex concepts into understandable pieces, and communicating data science insights.
Successfully completed all 5 levels of Google's increasingly difficult coding challenges in Python. The challenges required an assortment of math and programming skills, ranging from complex concepts in group theory to implementing a variety of search algorithms. Beyond passing a set of visible and hidden test cases, solutions had to run within a time limit and stay under a code-size limit, which often meant breaking down each problem and planning the solution before writing any code. My solutions and tests, along with the challenge descriptions, are available on GitHub.
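Many of the challenges reduced to a careful search over a state space. As a flavor of what that looked like, here is a minimal, hypothetical breadth-first search over a grid maze — an illustration of the technique, not one of the actual challenge solutions:

```python
from collections import deque

def shortest_path(grid):
    """Length of the shortest path from the top-left to the bottom-right
    of a 0/1 grid (1 = wall), or -1 if unreachable. Illustrative only."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(0, 0, 1)])          # (row, col, path length so far)
    seen = {(0, 0)}
    while queue:
        r, c, dist = queue.popleft()
        if (r, c) == (rows - 1, cols - 1):
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc, dist + 1))
    return -1
```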
The goal of this project was to train a computer agent to land a spacecraft in a target zone using reinforcement learning. My solution used a Q-learning algorithm with a neural network to generate value approximations. The number of layers and the depth of each layer needed careful tuning, along with other parameters such as action replay and ε-greedy decay. Action replay is a set of saved memories used for retraining throughout learning; it must balance newer memories, which let the agent learn from better runs, against older memories, which keep the agent from getting stuck in a local minimum. ε-greedy determines how often the agent explores (chooses actions at random) versus exploits (chooses actions based on what it has learned). As the agent improves, it finds itself in new states and needs to explore them, but too many random actions prevent it from reaching its full potential. Click here for a video of the trained lunar lander.
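To make those two mechanisms concrete, here is a minimal sketch of a replay buffer and ε-greedy action selection. The names and sizes are illustrative, and `q_network` stands in for the tuned value-approximation network (assumed to map a batch of states to a batch of Q-values):

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Saved memories used for retraining throughout learning."""

    def __init__(self, capacity=100_000):
        # A bounded deque drops the oldest memories as new ones arrive,
        # balancing recent (better) runs against older, diverse experience.
        self.memories = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.memories.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform sampling mixes newer and older memories for retraining.
        return random.sample(self.memories, batch_size)

def epsilon_greedy_action(q_network, state, epsilon, n_actions):
    """Explore with probability ε; otherwise exploit current Q-values."""
    if random.random() < epsilon:
        return random.randrange(n_actions)     # explore: random action
    q_values = q_network(state[np.newaxis])    # exploit: best known action
    return int(np.argmax(q_values[0]))

# A typical decay schedule shrinks ε each episode but keeps a floor, so the
# agent continues exploring the new states it reaches as it improves:
#     epsilon = max(epsilon_min, epsilon * decay_rate)
```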
The goal of this project was to help a robot navigate unknown territory and extract a set of predetermined gems. Using measurements from its sensors, the robot could determine the distance to each gem. These distances, together with the robot's movements, were fed to a SLAM algorithm to map the environment and locate the robot within it. Constrained by a maximum speed, a limited turning radius, and noisy sensor data, the robot had to plan a course of action and extract the necessary gems within a specified time limit and without making costly mistakes.
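One classic way to set up SLAM from motions and distance measurements is Graph SLAM, where each motion and each measurement becomes a linear constraint. Below is a deliberately simplified one-dimensional sketch of that idea — the real problem is two-dimensional and weights constraints by sensor noise — offered as an illustration, not the project's actual implementation:

```python
import numpy as np

def graph_slam_1d(motions, measurements, n_poses, n_landmarks):
    """Solve Omega @ mu = xi for the best estimate of robot poses and
    landmark (gem) positions on a 1-D line.

    motions:      list of signed distances, one per time step
    measurements: list of (t, j, dist) — at pose t, landmark j seen at dist
    """
    n = n_poses + n_landmarks
    omega = np.zeros((n, n))
    xi = np.zeros(n)
    omega[0, 0] = 1.0                      # anchor the initial pose at 0
    # Motion constraints: x_{t+1} - x_t = motion
    for t, motion in enumerate(motions):
        omega[t, t] += 1.0
        omega[t + 1, t + 1] += 1.0
        omega[t, t + 1] -= 1.0
        omega[t + 1, t] -= 1.0
        xi[t] -= motion
        xi[t + 1] += motion
    # Measurement constraints: L_j - x_t = measured distance
    for t, j, dist in measurements:
        l = n_poses + j
        omega[t, t] += 1.0
        omega[l, l] += 1.0
        omega[t, l] -= 1.0
        omega[l, t] -= 1.0
        xi[t] -= dist
        xi[l] += dist
    mu = np.linalg.solve(omega, xi)        # poses first, then landmarks
    return mu[:n_poses], mu[n_poses:]

# Example: move +5, having seen the gem at distance 10 from the start.
poses, gems = graph_slam_1d([5.0], [(0, 0, 10.0)], n_poses=2, n_landmarks=1)
print(poses, gems)  # poses ≈ [0, 5], gem ≈ [10]
```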
Recognizing the complexities of player performance, this project aims to predict player tendencies and expectations in key aspects of the game. After defining the project goal, I collected hundreds of individual player statistics from the NBA website, plus additional data from Basketball Reference needed to evaluate different play outcomes. I then created performance and rate targets for different play types and engineered features that better captured a player's current performance. For each target, I closely examined the most related features using graphs, functions I wrote, and feature selection tools from scikit-learn. Using linear and regularized regression, I optimized for both model performance and interpretability; the final models outperformed baseline estimates by up to 29%. See more on GitHub.
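As a rough sketch of the modeling step — the file name, column names, `k`, and `alpha` below are placeholders, not the project's actual values:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# "players.csv" stands in for the scraped NBA / Basketball Reference data,
# and "points_per_play" for one of the engineered targets.
players = pd.read_csv("players.csv")
X = players.drop(columns=["player", "points_per_play"])
y = players["points_per_play"]

model = make_pipeline(
    StandardScaler(),                  # put features on a common scale
    SelectKBest(f_regression, k=15),   # keep the most related features
    Lasso(alpha=0.01),                 # L1-regularized linear regression
)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.3f}")
```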
This was a team project with the goal of producing a cost-benefit analysis of when and where the city of Chicago could spray pesticides in order to reduce the effect of West Nile virus on humans. We began by researching effective surveillance techniques and the best predictors of West Nile virus. We then retrieved the last 12 years of mosquito trap data from the city of Chicago website, along with weather data for those years from NOAA. I oversampled positive cases so the model would be sensitive to the virus's presence, and to avoid overfitting on sparse data I used a random forest with a small depth. These methods increased our model's sensitivity from 0 to 0.77. Taking into account areas with dense susceptible populations, our model's predictions, and the cost of spraying, we created our recommendations for the city of Chicago. See more on GitHub.
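A rough sketch of the oversampling and shallow-forest approach, with placeholder file and column names standing in for the trap and weather data:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

data = pd.read_csv("traps_and_weather.csv")   # hypothetical merged dataset
train, test = train_test_split(
    data, stratify=data["wnv_present"], random_state=42
)

# Oversample positive traps so the model becomes sensitive to the virus.
pos = train[train["wnv_present"] == 1]
neg = train[train["wnv_present"] == 0]
pos_upsampled = resample(pos, replace=True, n_samples=len(neg), random_state=42)
balanced = pd.concat([neg, pos_upsampled])

X_cols = [c for c in data.columns if c != "wnv_present"]
# A small max_depth keeps the forest from overfitting the sparse positives.
forest = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=42)
forest.fit(balanced[X_cols], balanced["wnv_present"])

sensitivity = recall_score(test["wnv_present"], forest.predict(test[X_cols]))
print(f"sensitivity (recall on positive cases): {sensitivity:.2f}")
```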
The goal of this project was to better understand the board gaming and video gaming communities by identifying the differences in their word choice. Using the Reddit API, I scraped over 5,000 posts from the board game and video game subreddits, stemmed the text, and applied a TF-IDF vectorizer. From there, I optimized the hyperparameters of several classifiers and chose logistic regression for its strong predictive performance and interpretability: it classified posts by subreddit with 98% accuracy, and its coefficients revealed which words were the best predictors of each subreddit. See more on GitHub.
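In outline, the pipeline looked something like the sketch below; `posts` and `labels` are tiny stand-ins for the scraped data, and the particular stemmer and hyperparameters are assumptions:

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()

def stem_text(text):
    # Reduce each word to its stem before vectorizing.
    return " ".join(stemmer.stem(word) for word in text.split())

posts = ["Catan strategy discussion ...", "best RPG on the console ..."]
labels = [0, 1]   # 0 = board games subreddit, 1 = video games subreddit

model = make_pipeline(
    TfidfVectorizer(preprocessor=stem_text, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(posts, labels)

# The logistic regression coefficients point at the words that best
# predict each subreddit.
vec = model.named_steps["tfidfvectorizer"]
clf = model.named_steps["logisticregression"]
words = vec.get_feature_names_out()
top_videogame_words = words[clf.coef_[0].argsort()[-10:]]
```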
The goal of this project was to create a model to predict home prices in Ames, Iowa from a dataset of around 2,000 homes and 80 features. I began by cleaning and exploring the data to understand how the features related to each other and to home prices. I then used background knowledge to create features with stronger correlations to sale price and, with a few functions I wrote, narrowed the features down to those least correlated with each other and best correlated with sale price. I used polynomial features to create interaction terms. Finally, I scaled the data and optimized L1- and L2-regularized regressions to predict home prices.
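A condensed sketch of those final steps, using random stand-in data in place of the actual Ames features and sale prices:

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))   # stand-in for the narrowed-down features
y = rng.normal(size=200)        # stand-in for sale prices

for name, reg in [("L1 (lasso)", LassoCV(cv=5)),
                  ("L2 (ridge)", RidgeCV(alphas=np.logspace(-3, 3, 13)))]:
    model = make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),  # interaction terms
        StandardScaler(),                                  # scale the data
        reg,                                               # regularized fit
    )
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: cross-validated R^2 = {score:.3f}")
```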
Python
SQL
C++
I am a huge fan of all things basketball. I play as often as I can; when that is not possible, I love watching the Celtics (or other NBA teams) and analyzing teams using advanced analytics.
In high school, I fell in love with board games while playing Settlers of Catan. Years later, I discovered that there was a whole world of board games out there. While most of my favorite games are medium-heavy Euros, I have been slowly getting into economic train games.
When I have some free time to myself, especially when commuting, I often fill it with reading. Mostly, I enjoy reading science fiction and fantasy. I am currently alternating between The Maze of Games and A People's Future of the United States.