Hi! I am Shubham

I am currently pursuing an MS in Business Analytics and Information Management (STEM Degree) at Purdue University. I am a Computer Science graduate with four years of experience in Data Analytics and Engineering. I look forward to working with a world-class analytics team and making an impact by combining my technical expertise, business acumen, and data-driven decision-making skills to solve challenging problems.

Education


Purdue University, Krannert School of Management, USA

Master of Science in Business Analytics and Information Management (August 2021 - June 2022)
Relevant Coursework:
-> Business Analytics, Customer Analytics, Data Mining, Pricing Strategies, Time Series Forecasting, Web Data Analytics
Activities & Honors:
-> Dean's List, Runner-up - STAMINA IT/Analytics Case Competition

Malaviya National Institute of Technology, India

Bachelor of Technology, Computer Science & Engineering (July 2013 - May 2017)
Relevant Coursework:
-> Data Structures & Algorithms, Database Management, Computer Programming, Machine Learning, Natural Language Processing

Professional Experience


Plivo Communications, India

Product Analyst (August 2020 - July 2021)

Capgemini, India

Data Science & Analytics Consultant (August 2017 - May 2020)

Projects

Here are some of the analytics projects I have worked on.

Streamlining Machine Learning Lifecycle using MLflow

  • Designed & implemented an end-to-end machine learning pipeline using PySpark and MLflow, and evaluated MLflow's feasibility as an MLOps tool
  • Built a recommender system to analyze customer preferences for products and improve promotion efficacy
  • Tech Stack: python, pyspark, pandas, sparkml, azure databricks, mlflow

Credit Default Prediction

  • Trained a predictive model that uses historical payment status and selected demographic information to estimate the likelihood of credit default
  • Achieved an ROC-AUC of 0.751 using an ensemble method in SAS Enterprise Miner on a limited, imbalanced dataset, and an ROC-AUC of 0.93 using an ensemble method with oversampling
  • Tech Stack: python, numpy, pandas, matplotlib, scikit-learn, sas-em

Craigslist Job Scam Detection

    See Project on GitHub
  • Trained predictive models to categorize job postings by function and to detect whether a posting is fake (a scam), using features such as job title and description
  • Achieved an F1 score of 0.77 on the multi-class job classification problem using a linear SVM with TF-IDF features, and an F1 score of 0.946 on the binary scam detection problem using an LSTM with word embeddings
  • Tech Stack: python, numpy, pandas, matplotlib, scikit-learn, nltk, keras

Airbnb Data-Driven Decision-Making

    See Project on GitHub
  • Analyzed whether moving to the Airbnb platform would be a lucrative venture for boutique hotels
  • Collected & scraped structured and unstructured data from multiple sources, identified metrics, defined and tested hypotheses, performed regression analysis, and drew conclusions.
  • Tech Stack: python, numpy, pandas, matplotlib, statsmodels, scipy, nltk

RFM Segmentation & Churn Prediction

  • Built a segmentation model using RFM analysis & K-Means clustering to profile customers, and devised engagement strategies for the different segments
  • Used the segmentation results to build a classification model that predicts whether a customer will churn in the next six months
  • Tech Stack: python, numpy, pandas, matplotlib, scikit-learn

EDA UEFA Champions League

    See Project on GitHub
  • Performed exploratory data analysis of the UEFA Champions League, a highly renowned football competition, using data from five seasons (2014-2019)
  • Analyzed the performances and strategies of different football leagues, clubs, and players
  • Tech Stack: python, numpy, pandas, matplotlib, plotly

X-Ray Image Analytics

    See Project on GitHub
  • Trained binary and multi-class models to predict the probability that a chest X-ray shows signs of pneumonia and other ailments, using transfer learning from a pre-trained VGG model
  • Achieved an F1 score of ~0.91 for the binary model and 0.80 for the multi-class model with limited data (~6,000 and ~8,000 images, respectively)
  • Tech Stack: python, numpy, pandas, matplotlib, opencv, keras