Shubham Goyanka Portfolio

Hi! I am Shubham

I am currently pursuing an MS in Business Analytics and Information Management (a STEM degree) at Purdue University. I am a Computer Science graduate with four years of experience in data analytics and engineering. I look forward to working with a world-class analytics team and making an impact by combining my technical expertise, business acumen, and data-driven decision-making skills to solve challenging problems.

Education

Purdue University, Krannert School of Management, USA

Master of Science in Business Analytics and Information Management (August 2021 - June 2022)

Relevant Coursework:

-> Business Analytics, Customer Analytics, Data Mining, Pricing Strategies, Time Series Forecasting, Web Data Analytics

Activities & Honors:

-> Dean's List; Runner-up, STAMINA IT/Analytics Case Competition

Malaviya National Institute of Technology, India

Bachelor of Technology, Computer Science & Engineering (July 2013 - May 2017)

Relevant Coursework:

-> Data Structures & Algorithms, Database Management, Computer Programming, Machine Learning, Natural Language Processing

Professional Experience

Plivo Communications, India

Product Analyst (August 2020 - July 2021)

Capgemini, India

Data Science & Analytics Consultant (August 2017 - May 2020)

Projects

Here are some of the analytics projects I have been working on.

Streamlining Machine Learning Lifecycle using MLflow

Designed & implemented an end-to-end machine learning pipeline using PySpark and MLflow, and evaluated MLflow's feasibility as an MLOps tool

Built a recommender system to analyze customer product preferences and improve the efficacy of promotions

Tech Stack: python, pyspark, pandas, sparkml, azure databricks, mlflow

Credit Default Prediction

Trained a predictive model that uses historical payment status and selected demographic information to estimate the likelihood of credit default

Achieved an ROC-AUC of 0.751 using an ensemble method in SAS Enterprise Miner on a limited, imbalanced dataset, and an ROC-AUC of 0.93 using an ensemble method combined with oversampling

Tech Stack: python, numpy, pandas, matplotlib, scikit-learn, sas-em
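The project summary does not say which oversampling technique was used. As a rough illustration only, the sketch below assumes plain random oversampling of the minority class (not necessarily what was done in this project) and computes ROC-AUC with the rank-sum (Mann-Whitney U) formulation: the probability that a randomly chosen defaulter is scored above a randomly chosen non-defaulter. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum (Mann-Whitney U) formulation.

    Equals the probability that a random positive (label 1) is scored above a
    random negative (label 0); tied scores share their average rank.
    """
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of a run of tied scores
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both classes in y_true")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def random_oversample(X: List[list], y: List[int], seed: int = 0) -> Tuple[List[list], List[int]]:
    """Duplicate randomly chosen minority-class rows until both classes (0/1) are equally frequent."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present to oversample")
    # Sample minority indices with replacement to close the class gap
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]
```

Oversampling is meant for the training split only; scoring a model on duplicated rows would inflate the reported AUC.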

Craigslist Job Scam Detection

See Project on GitHub

Trained predictive models to categorize job postings by function and to detect whether a posting is fake (a scam), using features such as job title and description

Achieved an F1 score of 0.77 on the multi-class job classification problem using a linear SVM with TF-IDF features, and an F1 score of 0.946 on the binary scam detection problem using an LSTM with word embeddings

Tech Stack: python, numpy, pandas, matplotlib, scikit-learn, nltk, keras
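The TF-IDF features mentioned above can be sketched in plain Python. This is a minimal illustration, not the project's code: it assumes a weighting intended to mirror scikit-learn's `TfidfVectorizer` defaults (lowercasing, tokens of two or more word characters, smoothed idf `ln((1 + n) / (1 + df)) + 1`, per-document L2 normalization). The helper names and sample postings are hypothetical, and the linear SVM and LSTM steps are not shown.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tokens of 2+ word characters, similar to scikit-learn's default token pattern
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} vector per document.

    weight = raw term count * idf, with smoothed idf = ln((1 + n) / (1 + df)) + 1,
    then each document vector is scaled to unit L2 norm.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors
```

Terms that appear in many postings (like `job`) end up with lower weight than rarer ones (like `cash`); sparse vectors of this kind are what a linear classifier such as scikit-learn's `LinearSVC` would be trained on.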

Airbnb Data-Driven Decision-Making

See Project on GitHub

Analyzed whether moving to the Airbnb platform would be a lucrative venture for boutique hotels

Collected & scraped structured and unstructured data from multiple sources, identified metrics, defined and tested hypotheses, performed regression analysis, and drew conclusions.

Tech Stack: python, numpy, pandas, matplotlib, statsmodels, scipy, nltk

RFM Segmentation & Churn Prediction

Built a segmentation model using RFM analysis & K-Means clustering to profile customers, and devised an engagement strategy for each segment

Used the segmentation results to build a classification model that predicts whether a customer will churn in the next six months

Tech Stack: python, numpy, pandas, matplotlib, scikit-learn
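The RFM and K-Means segmentation step above can be sketched as follows. The portfolio does not specify the data schema, feature scaling, or number of clusters, so this sketch assumes hypothetical `(customer_id, order_date, amount)` records, z-score scaling, and Lloyd's K-Means with a deterministic farthest-first initialization (scikit-learn's `KMeans` defaults to k-means++ instead). All names are illustrative.

```python
from datetime import date
from typing import Dict, List, Sequence, Tuple

# Hypothetical transaction record: (customer_id, order_date, amount)
Transaction = Tuple[str, date, float]


def rfm_table(transactions: Sequence[Transaction], as_of: date) -> Dict[str, Tuple[float, float, float]]:
    """Per customer: Recency (days since last order), Frequency (order count), Monetary (total spend)."""
    last: Dict[str, date] = {}
    freq: Dict[str, int] = {}
    spend: Dict[str, float] = {}
    for cust, day, amount in transactions:
        last[cust] = max(last.get(cust, day), day)
        freq[cust] = freq.get(cust, 0) + 1
        spend[cust] = spend.get(cust, 0.0) + amount
    return {c: (float((as_of - last[c]).days), float(freq[c]), spend[c]) for c in last}


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each column so no single RFM feature dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population std; fall back to 1.0 for a constant column
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, max_iter: int = 100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    # Farthest-first init: start at points[0], then repeatedly take the point
    # farthest from all centroids chosen so far
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Scaling matters here: without it, recency measured in days would outweigh order counts in every distance calculation. A typical call would be `kmeans(standardize([list(v) for v in rfm_table(tx, as_of).values()]), k=4)`, with the cluster labels then profiled per segment.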

EDA: UEFA Champions League

See Project on GitHub

Performed exploratory data analysis of the UEFA Champions League, a highly renowned football competition, using data from five seasons (2014-2019)

Analyzed the performance and strategies of different football leagues, clubs, and players

Tech Stack: python, numpy, pandas, matplotlib, plotly

X-Ray Image Analytics

See Project on GitHub

Trained binary and multi-class models to predict the probability that a chest X-ray shows signs of pneumonia and other ailments, using transfer learning from a pre-trained VGG model

Achieved an F1 score of ~91% for the binary model and 80% for the multi-class model with limited data (~6,000 and ~8,000 images, respectively)

Tech Stack: python, numpy, pandas, matplotlib, opencv, keras

Contact Me

GitHub