Clustering: Start of Project 4

I’m using the Tech Layoffs Dataset (2020–2024) from Kaggle, which compiles tech layoffs reported in public sources and on websites. It includes company names, industries, locations, number of employees laid off, total company size, percentage of workforce cut, and the dates the layoffs happened. There’s also info about each company’s stage (like startup or public) and how much funding it raised. Basically, it’s a snapshot of how the tech world has been shrinking since 2020.

I want to use clustering to see if there are natural groups in the data, like whether certain types of companies, locations, or funding stages experienced similar layoffs. I’m curious whether startups had bigger percentage cuts, or whether there are clusters based on timing, like mass layoffs happening at the same time across similar industries.
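To make the idea concrete, here is a rough sketch of what grouping companies could look like. It assumes k-means as the clustering method (just one option, not a final choice) and uses two numeric features, percentage of workforce cut and funds raised. The rows are made-up toy values, not numbers from the Kaggle dataset, and the function names are only for illustration.

```python
import math
import random

# Made-up toy rows, NOT values from the Kaggle dataset:
# (percentage of workforce laid off, funds raised in $ millions)
rows = [
    (5.0, 1200.0), (8.0, 900.0), (6.0, 1500.0),  # well-funded, small cuts
    (40.0, 20.0), (55.0, 35.0), (60.0, 10.0),    # lightly funded, deep cuts
]

def standardize(data):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 if a column is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds))
            for row in data]

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: assign each point to its nearest center, then re-average."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the closest center for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # nothing moved, so the assignment is stable
        centers = new_centers
    return labels, centers

labels, centers = kmeans(standardize(rows), k=2)
print(labels)
```

Standardizing first matters here because funding is measured in much bigger numbers than percentages, so without it the funding column would dominate the distance calculation. Categorical columns like industry or stage would need to be encoded before they could go into something like this.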

The main benefit of doing this is understanding the patterns behind tech layoffs without just relying on headlines. It could show which parts of the tech world were hit hardest or recovered the fastest, which might actually be useful for job seekers or analysts. The downside is that clustering can oversimplify things: just because companies fall into the same group doesn’t mean they laid people off for the same reasons. Also, the dataset doesn’t include any info about the people affected, rehiring, or what happened afterward, so it’s kind of one-sided, though I think leaving out personal details is the ethical choice for privacy reasons. Still, it’s a good starting point for exploring how unstable the tech industry has been lately.