Machine Learning · Classification

Spotify Playlist Reconstruction

Reconstructing deleted Spotify playlists using machine learning classification.

Python · Pandas · Scikit-learn

Overview

I built this model as part of my Machine Learning I course at KEDGE Business School. After accidentally losing the playlist data from a shared family Spotify account, I developed a two-stage classification system that reconstructs the original playlists by predicting which user each song belonged to and the year it was added.

Problem Statement

Playlists from several users had been merged together on a shared family Spotify account. The challenge was to:

•Identify which songs belonged to which family member
•Predict the approximate year each song was added
•Reconstruct the original personalized playlists

Data

•Source: Spotify API (audio features, track metadata)
•Size: 3,500 labelled and 100 unlabelled songs, with 22 audio features each
•Features: Danceability, energy, valence, tempo, acousticness, etc.
•Labels: User (4 family members), Year added (2018-2024)

Approach

Two-Stage Classification

•Stage 1 - User Prediction
  - Feature engineering from audio attributes
  - Tested multiple classifiers (Random Forest, XGBoost, SVM)
  - Best model: Random Forest with 97.28% accuracy
•Stage 2 - Year Prediction
  - Time-based features combined with audio features
  - Gradient Boosting chosen for year prediction
  - Achieved 87.14% accuracy
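
The two stages above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (`make_classification` stands in for the 22 Spotify audio features), the year labels are random placeholders, and passing the predicted user into Stage 2 as an extra feature is an assumption made here for illustration. The project's own time-based features are not described in enough detail to reproduce.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in for the real data: 22 "audio features", 4 users.
X, user = make_classification(
    n_samples=600, n_features=22, n_informative=10,
    n_classes=4, random_state=0,
)
rng = np.random.default_rng(0)
# Hypothetical year labels in 2018-2024 (random, so Stage 2 accuracy is meaningless here).
year = rng.integers(2018, 2025, size=len(user))

# Treat the last 100 rows as the "unlabelled" songs to reconstruct.
X_lab, X_unl = X[:-100], X[-100:]
user_lab, year_lab = user[:-100], year[:-100]

# Stage 1: predict the user from audio features.
user_clf = RandomForestClassifier(n_estimators=200, random_state=0)
user_clf.fit(X_lab, user_lab)
pred_user = user_clf.predict(X_unl)


def with_user(features, users):
    """Append the user id as one extra column (an assumption of this sketch)."""
    return np.column_stack([features, users])


# Stage 2: predict the year from audio features plus the user.
year_clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
year_clf.fit(with_user(X_lab, user_lab), year_lab)
pred_year = year_clf.predict(with_user(X_unl, pred_user))

# Group song indices into one reconstructed playlist per (user, year).
playlists = {}
for idx, (u, y) in enumerate(zip(pred_user, pred_year)):
    playlists.setdefault((int(u), int(y)), []).append(idx)
```

Training Stage 2 on the true user labels and predicting with Stage 1's output keeps the two models independent, at the cost of Stage 1 errors propagating into the year prediction.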

Validation Strategy

•Stratified 5-fold cross-validation
•Classification report
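
A sketch of how this validation setup might look with scikit-learn, again on synthetic data rather than the project's dataset. The classifier settings and `random_state` values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)

# Synthetic stand-in: 22 features, 4 classes (the four users).
X, y = make_classification(
    n_samples=400, n_features=22, n_informative=10,
    n_classes=4, random_state=0,
)

# Stratified folds keep each user's share of songs roughly equal across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# One accuracy score per fold.
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print("accuracy per fold:", scores.round(3))
print("mean accuracy:", round(scores.mean(), 3))

# Out-of-fold predictions give a per-class precision/recall/F1 report.
y_pred = cross_val_predict(clf, X, y, cv=cv)
report = classification_report(y, y_pred)
print(report)
```

The per-class report is useful here because overall accuracy can hide a user whose songs are systematically misassigned.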

Results & Impact

•User Classification: 97.28% accuracy (Random Forest)
•Year Prediction: 87.14% accuracy (Gradient Boosting)
•Successfully reconstructed 4 personalized playlists

Key Learnings

•Audio features alone carry significant user preference signals
•Feature importance analysis revealed danceability and valence as top predictors
•Pipeline architecture enables easy extension to new users
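
A feature importance analysis of the kind mentioned above could look like the sketch below. It reads the impurity-based `feature_importances_` attribute of a fitted Random Forest; whether the project used this measure or another (such as permutation importance) is not stated, so treat the choice as an assumption. The data is synthetic and the feature names are only borrowed labels, so the ranking printed here says nothing about real listening habits.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Borrowed labels for five synthetic columns (illustrative only).
feature_names = ["danceability", "energy", "valence", "tempo", "acousticness"]

X, y = make_classification(
    n_samples=400, n_features=5, n_informative=3, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort so the top predictors come first.
importances = (
    pd.Series(clf.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```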

Links

•GitHub Repository
