Unit 6 – Group Project
During Unit 6, we formed a team to work on a real-world dataset of Airbnb listings in New York City. The aim was to explore what determines price and how listings behave, using Exploratory Data Analysis (EDA), linear regression, and K-Means clustering.
The group jointly conducted:
- Preprocessing and cleaning of the dataset (removal of zero-price entries and outliers).
- EDA with violin plots, histograms, correlation matrices, and boxplots.
- Regression analysis to test whether variables such as reviews, location, and host activity affected price.
- K-Means clustering to identify distinct market segments (Luxury, Mid-range, Budget).
I contributed specifically to the K-Means clustering section, which later supplied a key insight in the group's final report. The visual outputs and cluster profiling I produced were integrated into our submission.
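The clustering step can be sketched roughly as follows. This is a minimal illustration and not the group's actual notebook: the data is synthetic, and the choice of features (nightly price and review count), the variable names, and the random seed are all assumptions made for the example. It shows the two ideas that mattered most in this part of the work: scaling features before running K-Means, and profiling the resulting clusters by mean price so they can be labelled Budget, Mid-range, and Luxury.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the cleaned listings (invented values, not the real data).
# Columns: [nightly price, number of reviews]
budget = np.column_stack([rng.normal(60, 8, 100), rng.normal(40, 5, 100)])
mid = np.column_stack([rng.normal(150, 12, 100), rng.normal(25, 4, 100)])
luxury = np.column_stack([rng.normal(400, 30, 100), rng.normal(10, 3, 100)])
X = np.vstack([budget, mid, luxury])

# Scale each feature to zero mean and unit variance so that price
# (hundreds) does not dominate review count (tens) in the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means with three clusters, matching the three market segments.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Profile clusters on the original (unscaled) prices and name them
# from cheapest to most expensive.
mean_price = np.array([X[labels == k, 0].mean() for k in range(3)])
order = np.argsort(mean_price)
segment_names = {
    int(cluster): name
    for cluster, name in zip(order, ["Budget", "Mid-range", "Luxury"])
}

for cluster in order:
    size = int((labels == cluster).sum())
    print(f"{segment_names[int(cluster)]}: {size} listings, "
          f"mean price {mean_price[cluster]:.0f}")
```

Naming the clusters after fitting, rather than assuming cluster 0 is "Budget", matters because K-Means assigns cluster indices arbitrarily.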
Unit 11 – Individual Project
For Unit 11, I worked independently on a different problem: object recognition using Convolutional Neural Networks (CNNs) on the CIFAR-10 dataset. The focus shifted from regression and clustering on business data to deep learning for image classification.
Key individual tasks included:
- Designing two CNN architectures (baseline and improved).
- Implementing regularization techniques such as dropout, batch normalization, data augmentation, and L2 weight regularization.
- Tracking learning curves, validating performance, and analyzing results through confusion matrices and F1-scores.
- Delivering a full pipeline including preprocessing, model training, tuning, and result interpretation.
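The evaluation step in the list above can be illustrated with a short sketch of how a confusion matrix and per-class F1-scores are computed from predictions. This is not the code from my notebook: the function names, the three-class toy labels, and the use of plain NumPy are assumptions chosen to keep the example self-contained. CIFAR-10 has ten classes, but the calculation is the same.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_f1(cm):
    """F1 per class, using F1 = 2*TP / (2*TP + FP + FN)."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: TP + FP
    actual = cm.sum(axis=1)     # row sums: TP + FN
    denom = predicted + actual  # equals 2*TP + FP + FN
    # Classes that never appear in labels or predictions get F1 = 0.
    return np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)


# Toy labels for three hypothetical classes (invented for illustration).
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
f1 = per_class_f1(cm)
macro_f1 = f1.mean()

print(cm)
print("Per-class F1:", np.round(f1, 3))
print("Macro F1:", round(float(macro_f1), 3))
```

Looking at F1 per class, not only overall accuracy, shows which classes the model confuses with one another, which is the kind of detail a confusion matrix makes visible.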
Comparison and Evaluation
| Aspect | Unit 6 (Group) | Unit 11 (Individual) |
|---|---|---|
| Topic | Airbnb pricing and market segmentation | Image classification with CNNs |
| Dataset | Tabular data (Airbnb NYC) | Image data (CIFAR-10) |
| Model Type | Linear Regression, K-Means Clustering | Convolutional Neural Networks (CNNs) |
| Collaboration | Team-based (shared EDA, distributed model work) | Independent work |
| My Role | Clustering analysis and visuals using K-Means | Full design, training, and evaluation of CNN pipeline |
| Key Learning | Clustering logic, feature scaling, collaborating in shared Jupyter notebooks | Deep learning concepts, architecture tuning, regularization |
| Output Format | Word report + shared Jupyter notebook | Presentation (PPTX) + annotated Jupyter notebook |
Reflection
This comparison highlights my growth in both collaborative and independent machine learning contexts. While Unit 6 developed my ability to work across roles, share tasks, and merge findings into one cohesive story, Unit 11 pushed me to take end-to-end ownership of a complex model pipeline.
I learned to adapt my thinking from business analysis to technical computer vision challenges. Although the two projects were very different, both reinforced the importance of data understanding, iterative testing, and clear communication, whether with teammates or within my own workflow.