Projects
My projects span a variety of domains and have allowed me to strengthen my coding skills through the use of a wide range of libraries.
The Python projects I’ve completed showcase my skills in:
- Machine Learning Models
- Exploratory Data Analysis and Data Visualisation
- Data Cleansing and Data Wrangling
- Statistical Analysis
- Object-Oriented Programming
- Algorithm Analysis
On this page I will walk you through some of the core projects that I completed during my Data Science MSc. Not all of my projects are displayed here, so please click here to see more of them if you’re interested.
Machine Learning for Fraudulent Transaction Detection ⚙️
- Deployed baseline supervised and unsupervised models (XGBoost, random forest, logistic regression, K-means clustering), enabling fraud detection with an initial F1 score of 50-90% (depending on the model used)
- Compared model performance using F1 scores, supported by contemporary literature, enhancing analysis quality by 13%
- Optimised random forest and XGBoost models via GridSearchCV, leading to £ savings for the potential business
- Utilised Random Undersampling to balance class distribution, improving model training times by up to 90%
- Performed data transformation using label encoding, leading to a 12% boost in fraud classification accuracy
- Performed exploratory data analysis (EDA) to uncover class imbalance, allowing for the application of data balancing techniques
- Conducted comparative analysis of different machine learning models, identifying the best-performing approach
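The workflow in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic (generated with `make_classification`), the `random_undersample` helper is hypothetical and written by hand with NumPy rather than taken from Imblearn, XGBoost is left out so the sketch only depends on scikit-learn, and the parameter grid is an arbitrary small example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, heavily imbalanced data standing in for the transaction dataset.
X, y = make_classification(
    n_samples=4000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def random_undersample(X, y, seed=0):
    """Keep all minority rows and an equally sized random sample of each other class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]


# Balance the training split only; the test split keeps its original class ratio.
X_bal, y_bal = random_undersample(X_train, y_train)

# Baseline models compared on F1 for the positive (fraud) class.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    scores[name] = f1_score(y_test, model.predict(X_test))

# Small illustrative grid search, also scored on F1.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [4, None]},
    scoring="f1",
    cv=3,
)
search.fit(X_bal, y_bal)
scores["random_forest_tuned"] = f1_score(y_test, search.predict(X_test))

for name, score in scores.items():
    print(f"{name}: F1 = {score:.2f}")
```

Undersampling only the training split is a deliberate choice in this sketch: the F1 scores are then measured on data that still has the original imbalance, which is closer to what a deployed fraud model would face.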
Libraries utilised:
- Pandas
- NumPy
- Sklearn
- Seaborn
- Matplotlib
- Imblearn
- XGBoost
Development environment:
- Jupyter Notebook (Anaconda)
Health Data Analysis 🍎
- Performed data cleansing and wrangling by addressing missing values, removing duplicates, and standardising formats, enhancing data quality by approximately 40%, which facilitated effective data visualisation and analysis
- Performed exploratory data analysis (EDA) to examine dataset structure, uncovering critical trends to support feature selection
- Utilised data transformation techniques, including feature engineering, to create new relevant variables, resulting in a 19% improvement in health data analysis
- Utilised data visualisations to identify patterns and correlations to help select appropriate features for mathematical modelling
- Deployed mathematical models to predict heart rate, identifying trends to support machine learning model development and contribute to £ value for our theoretical business
- Utilised K-means clustering to analyse correlations between heart rate in high-intensity vs low-intensity activities, generating a 14% improvement in heart rate correlation analysis
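As a rough illustration of the cleansing, feature engineering and clustering steps listed above, the sketch below works on a tiny hand-made table. Everything in it is an assumption for the sake of the example: the column names (`activity`, `duration_min`, `heart_rate`), the values, the engineered `est_total_beats` feature and the choice of two clusters are hypothetical and are not taken from the project's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical raw records with inconsistent labels, duplicate rows and a missing value.
raw = pd.DataFrame({
    "activity": ["Run", "run ", "WALK", "walk", "Run", "walk", "run", "Walk", "Walk"],
    "duration_min": [30, 30, 45, 60, 25, 40, 35, 50, 50],
    "heart_rate": [162, 162, 98, np.nan, 170, 101, 158, 95, 95],
})

clean = raw.copy()
# Standardise formats: trim whitespace and lower-case the activity labels.
clean["activity"] = clean["activity"].str.strip().str.lower()
# Remove rows that are exact duplicates once the labels are standardised.
clean = clean.drop_duplicates().reset_index(drop=True)
# Fill missing heart rates with the median for the same activity.
clean["heart_rate"] = clean["heart_rate"].fillna(
    clean.groupby("activity")["heart_rate"].transform("median")
)

# Feature engineering: a simple derived variable (illustrative only).
clean["est_total_beats"] = clean["heart_rate"] * clean["duration_min"]

# Cluster sessions by scaled heart rate into two groups (high vs low intensity).
features = StandardScaler().fit_transform(clean[["heart_rate"]])
clean["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

print(clean.groupby("cluster")["heart_rate"].agg(["mean", "count"]))
```

With data this cleanly separated, the two clusters line up with the running and walking sessions; on real health data the groups would overlap and the number of clusters would need to be justified, for example with an elbow plot.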
Libraries utilised:
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Sklearn
- SciPy
Development environment:
- Jupyter Notebook (Anaconda)
Hippo Quest 👑
- Implemented object-oriented programming (OOP) to structure the game using classes and objects, improving code maintainability and modularity by 20%
- Deployed unit testing to minimise game errors and identify any bugs, reducing game errors by 15%
- Created a UML diagram to help explain the game and its code to any interested parties
- Utilised debugging to monitor breakpoints and handle fixes
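To give a flavour of how classes and unit tests fit together in a project like this, here is a small sketch. The `Character` and `Hippo` classes, their attributes and the damage rule are all hypothetical and invented for illustration; they are not the actual Hippo Quest code.

```python
import unittest


class Character:
    """Hypothetical base class for anything in the game that has health."""

    def __init__(self, name, health=10):
        if health <= 0:
            raise ValueError("health must be positive")
        self.name = name
        self.health = health

    def take_damage(self, amount):
        """Reduce health by amount, never dropping below zero."""
        if amount < 0:
            raise ValueError("damage cannot be negative")
        self.health = max(0, self.health - amount)

    @property
    def is_alive(self):
        return self.health > 0


class Hippo(Character):
    """Subclass that overrides one behaviour and inherits the rest."""

    def __init__(self, name):
        super().__init__(name, health=20)

    def take_damage(self, amount):
        # Illustrative rule: a thick hide halves incoming damage (rounded down).
        super().take_damage(amount // 2)


class CharacterTests(unittest.TestCase):
    def test_damage_reduces_health(self):
        hero = Character("Hero", health=10)
        hero.take_damage(3)
        self.assertEqual(hero.health, 7)

    def test_health_never_goes_negative(self):
        hero = Character("Hero", health=5)
        hero.take_damage(50)
        self.assertEqual(hero.health, 0)
        self.assertFalse(hero.is_alive)

    def test_hippo_halves_damage(self):
        hippo = Hippo("Henrietta")
        hippo.take_damage(6)
        self.assertEqual(hippo.health, 17)

    def test_negative_damage_is_rejected(self):
        with self.assertRaises(ValueError):
            Character("Hero").take_damage(-1)
```

Saved as a module, the tests can be run with `python -m unittest`. Keeping the game rules inside small methods like `take_damage` is what makes them easy to test in isolation and to step through with breakpoints in a debugger.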
Library utilised:
- Unittest
Development environment:
- PyCharm