This project investigates temporal purchasing behavior within a retail bakery environment using transactional sales data. The objective is to identify patterns in customer demand across time and evaluate how these patterns can inform operational decision-making.
The analysis shows how temporal dynamics influence retail performance and how data-driven approaches can support operational optimization, such as staffing and inventory decisions. It draws on the following techniques:
| Category | Techniques |
|---|---|
| Data Preprocessing | Cleaning, type conversion, missing value handling |
| Feature Engineering | Time-based feature extraction (hourly segmentation) |
| Exploratory Analysis | Distribution analysis, correlation matrices, trend visualization |
| Statistical Testing | Relationship analysis between price, quantity, and time |
| Clustering | K-Means (manual implementation + scikit-learn) |
| Classification | K-NN (manual implementation + scikit-learn) |
| Dimensionality Reduction | Principal Component Analysis (PCA) |
transaction-pattern-analysis/
├── transaction_pattern_analysis.ipynb # Main analysis notebook
├── Bakery sales.csv # Raw dataset (234,005 transactions)
├── Bakery_Sales1.json.zip # JSON export
└── README.md # Project documentation
The dataset contains 234,005 transactional records from a French retail bakery spanning from January 2021 to September 2022.
| Field | Description |
|---|---|
| date | Transaction date |
| time | Transaction time (HH:MM) |
| ticket_number | Unique transaction identifier |
| article | Product name |
| Quantity | Number of items purchased |
| unit_price | Price per unit (€) |
Source: Kaggle - French Bakery Daily Sales
Note: Data cleaning is performed directly within the notebook to ensure transparency and reproducibility.
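The cleaning step can be sketched roughly as follows. This is an illustration, not the notebook's actual code. It assumes the raw file stores `unit_price` as text such as `0,90 €` (decimal comma, trailing euro sign) and that non-positive quantities are refunds or corrections that should be dropped; both assumptions should be checked against the data. The helper name `clean_sales` and the sample rows are hypothetical.

```python
import pandas as pd

def clean_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a typed copy of the raw sales table (hypothetical helper)."""
    df = raw.copy()
    # Assumption: prices are strings like "0,90 €". Strip the symbol and
    # switch the decimal comma to a point before converting to float.
    df["unit_price"] = (
        df["unit_price"].astype(str)
        .str.replace("€", "", regex=False)
        .str.replace(",", ".", regex=False)
        .str.strip()
    )
    df["unit_price"] = pd.to_numeric(df["unit_price"], errors="coerce")
    df["Quantity"] = pd.to_numeric(df["Quantity"], errors="coerce")
    # Combine the date and time columns into a single timestamp.
    df["datetime"] = pd.to_datetime(
        df["date"].astype(str) + " " + df["time"].astype(str), errors="coerce"
    )
    # Drop rows that failed conversion, then rows with non-positive quantity
    # (assumed here to be refunds or corrections).
    df = df.dropna(subset=["unit_price", "Quantity", "datetime"])
    df = df[df["Quantity"] > 0]
    return df.reset_index(drop=True)

# Tiny made-up sample in the assumed raw format, for illustration only.
sample = pd.DataFrame({
    "date": ["2021-01-02", "2021-01-02", "2021-01-02"],
    "time": ["08:38", "08:38", "09:14"],
    "ticket_number": [150040, 150040, 150041],
    "article": ["BAGUETTE", "PAIN AU CHOCOLAT", "BAGUETTE"],
    "Quantity": [1, 3, -1],
    "unit_price": ["0,90 €", "1,20 €", "0,90 €"],
})
cleaned = clean_sales(sample)
print(cleaned[["datetime", "article", "Quantity", "unit_price"]])
```

Using `errors="coerce"` turns unparseable values into missing values instead of raising, so malformed rows can be counted and inspected before they are dropped.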
The analysis reveals clear peak demand periods throughout the day, enabling targeted staffing and inventory decisions.
Peak Hours: Morning rush (8-10 AM) and afternoon (12-2 PM)
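A minimal sketch of the hourly segmentation behind this kind of finding, assuming the `time`, `ticket_number`, and `Quantity` columns from the dataset table. The `hourly_demand` helper and the sample rows are hypothetical and are not taken from the notebook.

```python
import pandas as pd

def hourly_demand(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate items sold and distinct tickets per hour of day."""
    # Extract the hour (0-23) from the HH:MM time string.
    hours = pd.to_datetime(df["time"], format="%H:%M").dt.hour
    return (
        df.assign(hour=hours)
        .groupby("hour")
        .agg(
            items_sold=("Quantity", "sum"),
            tickets=("ticket_number", "nunique"),
        )
        .sort_index()
    )

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "time": ["08:15", "08:40", "09:05", "12:30", "12:45", "16:10"],
    "ticket_number": [1, 1, 2, 3, 4, 5],
    "Quantity": [2, 1, 4, 3, 2, 1],
})
demand = hourly_demand(sample)
peak_hour = demand["items_sold"].idxmax()  # hour with the most items sold
print(demand)
```

Counting distinct tickets alongside items sold separates "many customers" from "few customers buying a lot", which matters for staffing versus inventory decisions.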
K-Means clustering segments transactions into distinct behavioral groups based on quantity and pricing patterns.
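For readers unfamiliar with the method, here is a small NumPy sketch of Lloyd's algorithm, the standard K-Means procedure, applied to standardized quantity and price features. It is not the notebook's manual implementation; the `kmeans` function and the eight sample rows are made up for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Made-up (Quantity, unit_price) rows: small cheap purchases vs. larger, pricier ones.
X = np.array([
    [1, 1.0], [1, 1.2], [2, 1.1], [2, 0.9],
    [6, 4.0], [7, 4.5], [6, 4.2], [7, 4.4],
], dtype=float)
# Standardise so that neither feature dominates the distance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
labels, centroids = kmeans(Z, k=2)
print(labels)
```

With scikit-learn, `KMeans(n_clusters=2, n_init=10).fit_predict(Z)` performs the same task and adds multiple restarts, which makes the result less sensitive to the initial centroids.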
Principal Component Analysis reduces dimensionality while preserving variance, revealing underlying structure in transaction data.
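The idea can be illustrated with a short PCA sketch built on NumPy's singular value decomposition. The feature set (quantity, unit price, hour of day), the `pca` helper, and the sample rows are assumptions for illustration; the notebook may use different features or scikit-learn's `PCA` class.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]           # principal axes, one per row
    explained = (S ** 2) / (S ** 2).sum()    # share of variance per component
    scores = Xc @ components.T               # coordinates in the reduced space
    return scores, components, explained[:n_components]

# Made-up rows with columns (Quantity, unit_price, hour), for illustration only.
X = np.array([
    [1, 1.0, 8], [2, 1.1, 8], [1, 1.2, 9], [2, 0.9, 9],
    [6, 4.0, 12], [7, 4.5, 13], [6, 4.2, 12], [7, 4.4, 13],
], dtype=float)
# Standardise first so features on larger scales do not dominate.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scores, components, ratio = pca(Z, n_components=2)
print(ratio)  # fraction of total variance captured by each kept component
```

The explained-variance ratios indicate how much information survives the reduction; if the first two components capture most of the variance, a 2-D scatter plot of `scores` is a faithful view of the data.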
git clone https://github.com/anaya33/transaction-pattern-analysis.git
cd transaction-pattern-analysis
pip install pandas numpy matplotlib seaborn scikit-learn
jupyter notebook transaction_pattern_analysis.ipynb
Alternatively, upload transaction_pattern_analysis.ipynb and Bakery sales.csv to a hosted notebook environment, or connect via GitHub.

Contributions are welcome! Here's how you can help:
git checkout -b feature/your-feature-name
git commit -m "Add: description of your changes"
git push origin feature/your-feature-name
| Difficulty | Task |
|---|---|
| Easy | Add more visualizations (box plots, violin plots) |
| Easy | Improve code comments and documentation |
| Medium | Implement DBSCAN or hierarchical clustering |
| Medium | Add time series forecasting (ARIMA, Prophet) |
| Medium | Create an interactive dashboard (Plotly/Dash) |
| Advanced | Build a recommendation system for products |
| Advanced | Deploy as a web application |
This project is licensed under the MIT License. See the LICENSE file for details.