Skills

Languages:
- Python
- R
- Spark
- VBA

I am proficient in Python and use it extensively in my projects. I am also familiar with R. In computer science, I have a particularly good understanding of data structures and of searching and sorting algorithms.

Data Cleaning & Wrangling:
- Pandas
- NumPy
- Alteryx

When working purely in Python, I use Pandas and NumPy for data cleaning and exploratory data analysis (EDA). In my last position, I used Alteryx extensively for these tasks.

Data Visualization:
- Matplotlib
- Seaborn
- Plotly
- Ggplot
- HoloViews

In Python, I have experience with Seaborn, Matplotlib, and Plotly.

Machine Learning:
- Scikit-Learn
- Supervised ML
- Unsupervised ML

For machine learning models, I am most familiar with Scikit-Learn, but I am open to learning other libraries. I am familiar with Natural Language Processing (NLP), Decision Trees, Random Forests, Logistic and Linear Regression, Support Vector Machines (SVM), Naïve Bayes (NB), Stochastic Gradient Descent (SGD), K-Nearest Neighbors (KNN), and Neural Networks.

Databases:
- SQL
- MongoDB
- NoSQL
- BigQuery

For databases, I am most skilled in SQL and MongoDB. I have also worked with BigQuery in the past.

Others:
- Tableau
- Jupyter Notebook
- Databricks
- UiPath

I have used a range of other tools in my projects and work, and I learn new tools quickly.

Recent Projects

Beer Ratings and Recommendations

This notebook explores trends in beer preferences using a dataset of about 1.5 million beer reviews from BeerAdvocate. The project's goals are to characterize the dataset, group similar beers, and recommend beers within each group. Along the way, it asks how people select beers and which factors influence beer quality. The analysis covers individual beers and breweries, and the modeling includes clustering and recommendation systems (content-based, collaborative filtering, and ALS).

2016 Voter Analysis

This notebook analyzes data from the 2016 election. It walks through data preparation and cleaning, with a focus on handling missing (NA) values, and then applies modeling techniques including clustering and logistic regression.

Census Income Study

The 'Census Income Study' is a series of projects that analyze census data. The primary objectives are to clean and analyze the data, apply clustering techniques to uncover patterns, and use logistic regression for predictive modeling.