"Each step you take towards your dream counts. Be Consistent. Be Confident. AND, Keep Going."

Neethu Mariya

I am Neethu Mariya, a results-driven data scientist and data engineer with a passion for leveraging data to drive meaningful insights and solutions. With a solid foundation in machine learning, statistical analysis, and data engineering, I have honed my skills through practical experience and academic pursuits, including a master's degree in Data Science from the University of Waterloo. Here, you'll find a showcase of my work, demonstrating my expertise in solving real-world challenges and delivering impactful results.

LinkedIn

GitHub


E-mail: nmariya@uwaterloo.ca

Resume

Work Experience

Data Scientist

J.D. Power
May 2021 - Present
  • Leveraged advanced statistical and machine learning techniques to construct robust predictive models, delivering valuable insights and driving data-informed decision-making processes
  • Developed and implemented customer segmentation strategies using clustering algorithms, effectively categorizing customers into meaningful segments based on shared characteristics and behaviors
  • Applied feature engineering methodologies to enhance the predictive power of models, extracting relevant insights from complex datasets and improving model performance
  • Conducted hypothesis testing and statistical analysis to validate hypotheses and measure the impact of key variables on various business metrics, providing actionable recommendations for optimizing performance
  • Collaborated with cross-functional teams to identify business needs, design data-driven solutions, and translate business requirements into analytical models and algorithms
  • Led data quality improvement projects, ensuring reliable and trustworthy data by implementing data validation processes, reducing errors, and enhancing data integrity
  • Employed A/B testing methodologies to evaluate the effectiveness of different strategies, enabling data-driven decision-making and continuous improvement
  • Implemented measures to enhance survey data quality and minimize the risk of fraudulent activities and manipulated feedback by conducting sentiment analysis on survey verbatim data, categorizing sentiments, calculating CSAT scores, and comparing the results to identify potential discrepancies and indicators of fraud
Company Website

Data Engineer (Co-op)

Manulife Financial
May 2020 - August 2020
  • During my Co-op, I conducted a retrospective study of the Ingestion KPIs and related incidents and built a fully dynamic and interactive dashboard as a one-stop shop for internal operational purposes
  • Automated the ETL process by creating a metadata tagging script
  • Built a data pipeline using a combination of Python and Bash
Company Website

My Specialty...

Python Programming

Picked up the skill over the years through actual coding and getting my hands dirty on various projects.

Artificial Intelligence

Specialized in Machine Learning and Neural Networks through projects and coursework.

Critical Thinking & Problem Solving

What? Why? What's next? That is how I get my head around a problem, and solving problems has always been my thing.

Quick & Constant Learner

I can grasp a new concept in a short span of time, and I keep myself motivated by learning new things on a continuous basis.

Data Visualization

Skilled and trained at creating excellent visualizations with a balance between perception and cognition.

My Complete Skill-Set

Python
93%

SQL
75%

NLP
80%

Statistics
90%

PyTorch, Keras
85%

C++
82%

Machine Learning/Deep Learning
80%

Statistical Analysis
90%

Projects

Security Camera Installation

Framed as the Vertex-Cover optimization problem, this project aims to minimize the number of security cameras installed on streets while still monitoring them effectively. It is implemented using graph theory and a reduction to a CNF-SAT solvable format. The project consists of multiple coding assignments written in Python and C++ that communicate with each other via Inter-Process Communication (IPC). It uses multi-threading and parallel processing to run more efficiently.
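
A minimal sketch of the idea in Python, not the project's actual code: intersections are vertices, streets are edges, and a cover of size k exists exactly when the CNF formula below is satisfiable. The encoding shown is one common textbook reduction and is an assumption here, as are the function names `vertex_cover_cnf`, `solve`, and `min_vertex_cover`. A tiny recursive solver stands in for a real SAT solver so that the example is self-contained.

```python
from itertools import combinations


def vertex_cover_cnf(n, edges, k):
    """Build clauses (lists of signed ints) that are satisfiable
    iff the graph on vertices 0..n-1 has a vertex cover of size k."""

    def var(i, j):
        # True when vertex i fills slot j of the cover (variable ids start at 1).
        return i * k + j + 1

    clauses = []
    # Every slot of the cover holds at least one vertex.
    for j in range(k):
        clauses.append([var(i, j) for i in range(n)])
    # No vertex fills two different slots.
    for i in range(n):
        for p, q in combinations(range(k), 2):
            clauses.append([-var(i, p), -var(i, q)])
    # No slot holds two different vertices.
    for j in range(k):
        for p, q in combinations(range(n), 2):
            clauses.append([-var(p, j), -var(q, j)])
    # Every edge has at least one endpoint in some slot.
    for u, v in edges:
        clauses.append([var(u, j) for j in range(k)] + [var(v, j) for j in range(k)])
    return clauses


def solve(clauses, assignment=None):
    """Toy backtracking solver (stand-in for a real SAT solver).
    Returns a dict {variable: bool} or None if unsatisfiable."""
    assignment = assignment or {}
    remaining = []
    for clause in clauses:
        open_lits, satisfied = [], False
        for lit in clause:
            value = assignment.get(abs(lit))
            if value is None:
                open_lits.append(lit)
            elif value == (lit > 0):
                satisfied = True
                break
        if satisfied:
            continue
        if not open_lits:
            return None  # this clause can no longer be satisfied
        remaining.append(open_lits)
    if not remaining:
        return assignment
    lit = min(remaining, key=len)[0]  # branch on the shortest clause first
    for value in (lit > 0, lit < 0):
        result = solve(remaining, {**assignment, abs(lit): value})
        if result is not None:
            return result
    return None


def min_vertex_cover(n, edges):
    """Try k = 0, 1, 2, ... and return the first cover found (sorted vertex ids)."""
    for k in range(n + 1):
        model = solve(vertex_cover_cnf(n, edges, k))
        if model is not None:
            return sorted(i for i in range(n) for j in range(k) if model.get(i * k + j + 1))


# Three intersections in a row: one camera at the middle one covers both streets.
print(min_vertex_cover(3, [(0, 1), (1, 2)]))
```

Searching k upward from zero means the first satisfiable formula gives a minimum cover; a production version would hand the clauses to a dedicated SAT solver instead of the toy one used here.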

see Project

Techniques of Crowd Counting using CNN: A Review

A literature review of state-of-the-art crowd counting techniques using CNNs. This 15-minute read not only gives the gist of the existing techniques, but also compares them with each other and gives a detailed analysis that highlights the improvements each author made over previous work. The document concludes with insights into the open problems that are yet to be solved.

see Project

Wine Quality Analysis using R

This project analyzes the quality of the Portuguese 'Vinho Verde' wine and builds a model that predicts the quality of its two variants (red and white) as well as possible from the selected variables.

see Project

LA City Web Traffic Forecasting

This project gives a detailed step-by-step analysis of time series data collected from the lacity.org website in an attempt to analyze its web traffic pattern. It includes both a descriptive analysis and a predictive model. You can find the detailed report of the project in my GitHub repository, where we have addressed the weekly cycle pattern and outliers.

see Project

E-commerce Predictive App

This is an ML project that classifies e-commerce products into 27 categories. The data includes categorical features, a noisy text description, and noisy images for each product. The model, written in Python, is trained to use both the text and the images to classify the products accurately. It uses a Recurrent Neural Network (RNN) with LSTM units for the text descriptions, a ResNet model for the noisy images, and finally ensemble learning techniques to combine the individual predictions. This model classifies the products with 94% accuracy. The data is available at https://www.kaggle.com/c/uw-cs480-fall20/data.
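
To illustrate the last step, here is a minimal sketch of one way two models' predictions can be combined. It assumes weighted soft voting (averaging per-class probabilities), which may differ from the ensemble technique actually used in the project. The function name `soft_vote` and the probability values are made up for illustration, and only three classes are shown instead of 27.

```python
def soft_vote(model_probs, weights=None):
    """Combine per-model class probabilities by weighted averaging.

    model_probs: one entry per model, each a list of per-sample
                 probability vectors (same samples, same class order).
    weights:     optional per-model weights; defaults to equal weights.
    Returns the predicted class index for each sample.
    """
    weights = weights or [1.0] * len(model_probs)
    total = sum(weights)
    predictions = []
    for sample in zip(*model_probs):  # one probability vector per model
        n_classes = len(sample[0])
        combined = [
            sum(w * probs[c] for w, probs in zip(weights, sample)) / total
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=combined.__getitem__))
    return predictions


# Hypothetical outputs for two products from a text model and an image model.
text_probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]]
image_probs = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]

print(soft_vote([text_probs, image_probs]))              # equal weights
print(soft_vote([text_probs, image_probs], [1.0, 3.0]))  # trust the image model more
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh an uncertain one, which helps when either the text or the image of a product is noisy.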

see Project