Projects

My projects span a variety of domains and have allowed me to sharpen my coding skills through the use of a wide range of libraries.

The Python projects I’ve completed showcase my skills in:

Machine Learning Models
Exploratory Data Analysis and Data Visualisation
Data Cleansing and Data Wrangling
Statistical Analysis
Object-Oriented Programming
Algorithm Analysis

On this page I will walk you through some of the core projects that I completed during my Data Science MSc. Not all of my projects are displayed here, so please click here to see more of them if you’re interested.

Machine Learning for Fraudulent Transaction Detection ⚙️

Deployed baseline supervised and unsupervised models (XGBoost, Random Forest, Logistic Regression, K-means clustering), enabling fraud detection with an initial F1-score of 50-90% (depending on the model used)
Compared the performance of models using F1-scores, supported by contemporary literature, enhancing analysis quality by 13%
Optimised Random Forest and XGBoost models via GridSearchCV, leading to £ savings for this potential business
Utilised Random Undersampling to balance class distribution, improving model training times by up to 90%
Performed data transformation using label encoding, leading to a 12% boost in fraud classification accuracy
Performed exploratory data analysis (EDA) to uncover class imbalance, allowing for the application of data balancing techniques
Conducted comparative analysis of different machine learning models, identifying the best-performing approach
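
The steps listed above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the data is synthetic, the column names (amount, type, is_fraud) and the undersample helper are hypothetical, the undersampling is done by hand with NumPy as a stand-in for Imblearn's RandomUnderSampler, and only a Random Forest is tuned.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic, illustrative data (hypothetical columns), with fraud as the rare class.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "amount": rng.exponential(100.0, n),
    "type": rng.choice(["PAYMENT", "TRANSFER", "CASH_OUT"], n),
})
# Fraud is made more likely for large transfers so the model has some signal.
risk = 0.02 + 0.25 * ((df["type"] == "TRANSFER") & (df["amount"] > 150))
df["is_fraud"] = (rng.random(n) < risk).astype(int)

# Label encoding: map each transaction type to an integer code.
df["type"] = LabelEncoder().fit_transform(df["type"])

X_train, X_test, y_train, y_test = train_test_split(
    df[["amount", "type"]], df["is_fraud"],
    test_size=0.25, stratify=df["is_fraud"], random_state=0,
)


def undersample(X, y, seed=0):
    """Randomly keep as many rows of each class as the minority class has."""
    sampler = np.random.default_rng(seed)
    minority_size = y.value_counts().min()
    kept = np.concatenate([
        sampler.choice(y.index[(y == label).to_numpy()],
                       size=minority_size, replace=False)
        for label in y.unique()
    ])
    return X.loc[kept], y.loc[kept]


# Balance only the training split; the test split keeps its real imbalance.
X_bal, y_bal = undersample(X_train, y_train)

# Grid search over a small, illustrative parameter grid, scored by F1.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="f1",
    cv=3,
)
search.fit(X_bal, y_bal)

test_f1 = f1_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_f1, 2))
```

Undersampling is applied after the train/test split so that the F1-score is measured on data with the original class imbalance.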

Libraries utilised:

Pandas
NumPy
Sklearn
Seaborn
Matplotlib
Imblearn
XGBoost

Development environment:

Jupyter Notebook (Anaconda)

Health Data Analysis 🍎

Performed data cleansing and wrangling by addressing missing values, removing duplicates, and standardising formats, enhancing data quality by approximately 40%, which facilitated effective data visualisation and analysis
Performed exploratory data analysis (EDA) to examine dataset structure, uncovering critical trends to support feature selection
Utilised data transformation techniques, including feature engineering, to create new relevant variables, resulting in a 19% improvement in health data analysis
Utilised data visualisations to identify patterns and correlations to help select appropriate features for mathematical modelling
Deployed mathematical models to predict heart rate, identifying trends to support machine learning model development and contribute to £ value for our theoretical business
Utilised K-means clustering to analyse correlations between heart rate from high intensity vs low intensity activities, generating a 14% improvement in heart rate correlation analysis
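
As a rough sketch of the cleansing and clustering steps above, assuming a tiny made-up dataset (the column names activity, heart_rate and duration_min are hypothetical and not taken from the project data):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up records with inconsistent labels, a missing value and duplicates.
raw = pd.DataFrame({
    "activity": ["Run", "run ", "WALK", "walk", "Run",
                 "walk", "run", "Walk", "run ", "walk"],
    "heart_rate": [162, 158, 95, None, 162, 101, 171, 98, 158, 104],
    "duration_min": [30, 25, 40, 35, 30, 45, 20, 50, 25, 38],
})

# Standardise the label format first so that duplicates become detectable.
clean = (
    raw.assign(activity=raw["activity"].str.strip().str.lower())
    .drop_duplicates()
    .reset_index(drop=True)
)

# Fill missing heart rates with the median for the same activity.
clean["heart_rate"] = clean["heart_rate"].fillna(
    clean.groupby("activity")["heart_rate"].transform("median")
)

# Cluster on scaled heart rate and compare the clusters with the activity labels.
features = StandardScaler().fit_transform(clean[["heart_rate"]])
clean["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

print(pd.crosstab(clean["activity"], clean["cluster"]))
```

The cross-tabulation shows how closely the two heart-rate clusters line up with the high-intensity and low-intensity activities.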

Libraries utilised:

Pandas
NumPy
Matplotlib
Seaborn
Sklearn
SciPy

Development environment:

Jupyter Notebook (Anaconda)

Hippo Quest 👑

Implemented object-oriented programming (OOP) to structure the game using classes and objects, improving code maintainability and modularity by 20%
Implemented unit tests to identify bugs, reducing game errors by 15%
Created a UML diagram to help explain the game and its code to any interested parties
Used the debugger with breakpoints to trace and fix issues
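
To illustrate the class-based structure and unit testing described above, here is a minimal sketch. The class names (Character, Hippo), their methods and the tests are hypothetical and are not taken from the actual game code.

```python
import unittest


class Character:
    """Base class for anything in the game that has health (hypothetical)."""

    def __init__(self, name, health=10):
        self.name = name
        self.health = health

    def take_damage(self, amount):
        """Reduce health by amount, never dropping below zero."""
        if amount < 0:
            raise ValueError("damage must be non-negative")
        self.health = max(0, self.health - amount)

    @property
    def is_alive(self):
        return self.health > 0


class Hippo(Character):
    """Player character that can also collect items (hypothetical)."""

    def __init__(self, name, health=10):
        super().__init__(name, health)
        self.inventory = []

    def collect(self, item):
        self.inventory.append(item)


class HippoTests(unittest.TestCase):
    def test_damage_never_drops_below_zero(self):
        hippo = Hippo("Hugo", health=5)
        hippo.take_damage(8)
        self.assertEqual(hippo.health, 0)
        self.assertFalse(hippo.is_alive)

    def test_negative_damage_is_rejected(self):
        with self.assertRaises(ValueError):
            Hippo("Hugo").take_damage(-1)

    def test_collect_adds_item_to_inventory(self):
        hippo = Hippo("Hugo")
        hippo.collect("crown")
        self.assertEqual(hippo.inventory, ["crown"])
```

Saved as a module, tests like these can be run with python -m unittest, or directly from PyCharm's test runner.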

Library utilised:

Unittest

Development environment:

PyCharm