Project 1: Do Gun Laws Have Anything to Do with School Safety?
School shootings are a deeply personal and pressing issue for me. I chose this dataset because I want to move beyond fear and look for patterns that might help us understand the “why” behind these tragedies. The dataset, created by David Riedman (K–12 School Shooting Database), includes detailed information on incidents, shooters, victims, weapons, and more.
In this project, I explore whether gun laws have anything to do with school shootings. To do this, I walk through loading and exploring the data, visualizing patterns, normalizing incident counts by state size, and then comparing the normalized rates against state gun law strictness.
My main critique of the dataset is that I didn’t compile it myself, so I can only catch a recording error when it’s obvious. This matters because undetected errors can distort the inferences and assumptions I make about the data.
Loading the Data
import pandas as pd
incident_shooting_data_df = pd.read_excel("Public v4.1 K-12 School Shooting Database (8 28 2025).xlsx", sheet_name = 'Incident')
incident_shooting_data_df.head()
incident_shooting_data_df.describe()
Processing the Data
Before creating any visualizations, I had to process and prepare the data so that the comparisons would be fair and meaningful.
- Standardizing states: Some sources used state abbreviations (e.g., “CA”), while others used full names (e.g., “California”). I created a mapping dictionary to align them across datasets.
- Handling missing values: Weapon type data often included “No Data” or “Unknown.” Instead of dropping these rows, I kept them as categories to show where gaps exist.
- Merging external sources:
  - County landmass data → summed by state to calculate incidents per 1,000 square miles, normalizing for state size.
  - Giffords Law Center scorecard → scraped annual gun law grades (A–F) to compare states by law strictness.
- Feature engineering: Added “Incidents per 1,000 sq. mi.” as a normalized variable and color-coded states by Giffords grade for visual clarity.
Limitations: the dataset includes all shootings on school grounds (targeted, spontaneous, gang-related, or accidental). Normalizing by land area doesn’t account for population density or socioeconomic factors. Still, these steps allow for meaningful comparisons between states and their laws.
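The state standardization and missing-value handling described above can be sketched roughly as follows. This is a minimal illustration on invented rows, not the project's actual code: the `toy` DataFrame, the two-entry `name_to_abbrev` mapping, and the `standardize_state` helper are all hypothetical names introduced here.

```python
import pandas as pd

# Hypothetical toy rows; the real data comes from the Excel workbook.
toy = pd.DataFrame({
    "State": ["CA", "California", "tx", " Texas "],
    "Weapon_Type": ["Handgun", None, "No Data", "Rifle"],
})

# Illustrative two-state subset of a full-name -> abbreviation mapping.
name_to_abbrev = {"California": "CA", "Texas": "TX"}

def standardize_state(value):
    """Return a two-letter code for either a full state name or an abbreviation."""
    text = str(value).strip()
    if text.title() in name_to_abbrev:
        return name_to_abbrev[text.title()]
    # Anything that is not a known full name is treated as an abbreviation.
    return text.upper()

toy["State"] = toy["State"].map(standardize_state)

# Keep gaps visible as their own category instead of dropping the rows.
toy["Weapon_Type"] = toy["Weapon_Type"].fillna("Unknown")

weapon_counts = toy["Weapon_Type"].value_counts()
print(toy)
print(weapon_counts)
```

Keeping "Unknown" and "No Data" as explicit labels means the row count stays the same after cleaning, so any later chart shows how much of the data is missing rather than hiding it.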
Step 1: Seeing the Data
To understand the distribution of school shootings across the U.S., I first created a bar chart of total incidents by state. This highlights which states appear most often in the dataset and gives a sense of raw frequency.
However, frequency alone can be misleading, since larger states naturally report more incidents. That’s why I normalized by state size in the next step.
import seaborn as sns
import matplotlib.pyplot as plt
# Count incidents per state and turn the result into a two-column DataFrame
state_incident = incident_shooting_data_df['State'].value_counts().reset_index()
state_incident.columns = ["State", "Frequency"]
plt.figure(figsize=(15, 6))
sns.barplot(x="State", y="Frequency", data=state_incident)
# Rotate the state labels so they do not overlap
plt.xticks(rotation=90)
plt.title("Frequency of Incidents Per State")
plt.show()

Step 2: Using Square Miles
The first chart showed California and Texas far ahead in raw counts. This wasn’t surprising given their size and population. To check whether those states are actually more dangerous or just bigger, I normalized incidents by land area.
This revealed a different picture: smaller states like Delaware and Maryland rose to the top once size was factored in, showing that raw counts alone don’t tell the full story.
county_landmass_df = pd.read_csv("county_landmass.csv")
state_sq_mi_total = (county_landmass_df.groupby("state", as_index=False)["sq_mi"].sum())
state_sq_mi_total.columns = ["State", "Total_Square_Miles"]
# Abbreviation -> full state name lookup (AI-generated to save typing)
us_state_abbrev = {
'AL': 'Alabama',
'AK': 'Alaska',
'AZ': 'Arizona',
'AR': 'Arkansas',
'CA': 'California',
'CO': 'Colorado',
'CT': 'Connecticut',
'DE': 'Delaware',
'FL': 'Florida',
'GA': 'Georgia',
'HI': 'Hawaii',
'ID': 'Idaho',
'IL': 'Illinois',
'IN': 'Indiana',
'IA': 'Iowa',
'KS': 'Kansas',
'KY': 'Kentucky',
'LA': 'Louisiana',
'ME': 'Maine',
'MD': 'Maryland',
'MA': 'Massachusetts',
'MI': 'Michigan',
'MN': 'Minnesota',
'MS': 'Mississippi',
'MO': 'Missouri',
'MT': 'Montana',
'NE': 'Nebraska',
'NV': 'Nevada',
'NH': 'New Hampshire',
'NJ': 'New Jersey',
'NM': 'New Mexico',
'NY': 'New York',
'NC': 'North Carolina',
'ND': 'North Dakota',
'OH': 'Ohio',
'OK': 'Oklahoma',
'OR': 'Oregon',
'PA': 'Pennsylvania',
'RI': 'Rhode Island',
'SC': 'South Carolina',
'SD': 'South Dakota',
'TN': 'Tennessee',
'TX': 'Texas',
'UT': 'Utah',
'VT': 'Vermont',
'VA': 'Virginia',
'WA': 'Washington',
'WV': 'West Virginia',
'WI': 'Wisconsin',
'WY': 'Wyoming',
'DC': 'District of Columbia'
}
state_abbrev = {v: k for k, v in us_state_abbrev.items()}
state_sq_mi_total["State"] = state_sq_mi_total["State"].map(state_abbrev)
merged_summary = pd.merge(
state_incident,
state_sq_mi_total,
on="State",
how="inner"
)
merged_summary["Incidents_per_1000_sqmi"] = (
merged_summary["Frequency"] / merged_summary["Total_Square_Miles"] * 1000
)
plot_df = merged_summary.sort_values("Incidents_per_1000_sqmi", ascending=False)
plt.figure(figsize=(15, 6))
sns.barplot(
x="State",
y="Incidents_per_1000_sqmi",
data=plot_df,
)
plt.xticks(rotation=90)
plt.title("School Shooting Incidents per 1,000 Square Miles by State")
plt.xlabel("State")
plt.ylabel("Incidents per 1,000 sq. mi")
plt.show()
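One thing worth checking in a step like this is whether the inner merge silently drops any states, for example territories that appear in the incident data but not in the landmass file, or names that failed to map to an abbreviation. The sketch below shows one possible audit using pandas' `indicator=True` option on made-up numbers; the `incidents`, `areas`, `audit`, `unmatched`, and `matched` names and all values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for state_incident and state_sq_mi_total (values are made up).
incidents = pd.DataFrame({"State": ["CA", "DE", "PR"], "Frequency": [300, 30, 5]})
areas = pd.DataFrame({
    "State": ["CA", "DE", "WY"],
    "Total_Square_Miles": [150000.0, 2000.0, 97000.0],
})

# An outer merge with indicator=True labels each row as
# "left_only", "right_only", or "both".
audit = incidents.merge(areas, on="State", how="outer", indicator=True)

# Rows that an inner merge would have dropped.
unmatched = audit.loc[audit["_merge"] != "both", ["State", "_merge"]]

# Rows present in both tables, with the same rate formula used above.
matched = audit.loc[audit["_merge"] == "both"].copy()
matched["Incidents_per_1000_sqmi"] = (
    matched["Frequency"] / matched["Total_Square_Miles"] * 1000
)

print(unmatched)
print(matched[["State", "Incidents_per_1000_sqmi"]])
```

In this toy data the small state ends up with the higher rate (30 / 2,000 × 1,000 = 15 versus 300 / 150,000 × 1,000 = 2), which is the same effect that moves small states up the ranking once area is factored in.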

Step 2a: Web Scraping the Giffords Scorecard
Next, I wanted to see if stricter or looser gun laws correlated with these incident rates. To do this, I scraped the Giffords Law Center Scorecard, which grades each state from A (strict laws) to F (weak laws).
By merging these grades with my normalized dataset and coloring the chart by law grade, I could visually compare incident rates against gun law strength.
The results suggest that states with stronger gun laws (grades A or B) often had fewer school shootings per square mile, while states with weaker laws (grades D or F) tended to have more.
(Chart: Incidents Per 1,000 Square Miles Colored by Giffords Grade)
This doesn’t prove causation, but it points to a possible link between legislation and outcomes, echoing findings in peer-reviewed studies.
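url = "https://giffords.org/lawcenter/resources/scorecard/"
# read_html returns every <table> on the page as a DataFrame
tables = pd.read_html(url)
# Pick the first table that looks like the scorecard
scorecard_df = None
for t in tables:
    if "Grade" in t.columns or "Rank" in t.columns:
        scorecard_df = t
        break
if scorecard_df is None:
    raise ValueError("No scorecard table found on the Giffords page")
# Copy the slice so the column assignment below does not write to a view
giffords_df = scorecard_df[["State", "Grade"]].copy()
# Convert full state names to abbreviations to match merged_summary
giffords_df["State"] = giffords_df["State"].map(state_abbrev)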
comparison_df = pd.merge(
merged_summary,
giffords_df,
on="State",
how="inner"
)
import matplotlib.patches as mpatches
plot_df = comparison_df.sort_values("Incidents_per_1000_sqmi", ascending=False)
grade_colors = {
"A": "green", "A-": "limegreen",
"B+": "yellowgreen", "B": "greenyellow", "B-": "khaki",
"C+": "gold", "C": "orange", "C-": "darkorange",
"D+": "orangered", "D": "red", "D-": "firebrick",
"F": "darkred"
}
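# Look up a bar color for each state's grade
plot_df["Color"] = plot_df["Grade"].map(grade_colors)
plt.figure(figsize=(15,6))
sns.barplot(
    x="State",
    y="Incidents_per_1000_sqmi",
    data=plot_df,
    # One color per bar, in the same order as the sorted rows
    palette=plot_df["Color"].tolist()
)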
plt.xticks(rotation=90)
plt.title("School Shooting Incidents per 1,000 Square Miles by State\n(Colored by Giffords Grade)")
plt.xlabel("State")
plt.ylabel("Incidents per 1,000 sq. mi")
legend_patches = [mpatches.Patch(color=color, label=grade) for grade, color in grade_colors.items()]
plt.legend(handles=legend_patches, title="Giffords Grade", bbox_to_anchor=(1.02, 1), loc="upper left")
plt.show()
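The colored chart supports a visual comparison only. One way to put a number on it would be to convert the letter grades to an ordinal score and compute a rank correlation with the incident rate. The sketch below is an assumption about how that could be done, not part of the original analysis: the `toy` rows, grades, and rates are invented and say nothing about the real results, and `grade_order`, `Grade_Score`, and `Letter` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for comparison_df; states, grades, and rates are invented.
toy = pd.DataFrame({
    "State": ["S1", "S2", "S3", "S4", "S5", "S6"],
    "Grade": ["A", "A-", "B", "C+", "D", "F"],
    "Incidents_per_1000_sqmi": [1.0, 4.0, 2.0, 3.0, 6.0, 5.0],
})

# Order grades from strongest laws (score 0) to weakest (score 11).
grade_order = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]
grade_score = {grade: score for score, grade in enumerate(grade_order)}
toy["Grade_Score"] = toy["Grade"].map(grade_score)

# Spearman rank correlation, computed as the Pearson correlation of the ranks.
rho = toy["Grade_Score"].rank().corr(toy["Incidents_per_1000_sqmi"].rank())

# Median rate per letter family (A, B, C, D, F), ignoring the +/- modifiers.
toy["Letter"] = toy["Grade"].str[0]
median_by_letter = toy.groupby("Letter")["Incidents_per_1000_sqmi"].median()

print(f"Spearman rho: {rho:.3f}")
print(median_by_letter)
```

With this scoring, a positive rho would mean weaker grades go with higher incident rates and a negative rho the reverse. With only about fifty states, and with population density left uncontrolled, such a number would describe an association rather than demonstrate an effect.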

Step 3: Looking Deeper
To explore further, I looked at weapons used and violence type.
- Weapons: Handguns overwhelmingly dominated the dataset, far more than rifles or shotguns. Since handguns are also the easiest to access and carry, this fits with arguments that handgun regulations and safe storage laws may have the biggest impact.
- Violence type: Surprisingly, “Spontaneous” incidents outnumbered “Targeted” ones. This reminded me that the dataset includes all shootings on school grounds, not just pre-planned attacks. For prevention, targeted shootings may be easier to stop with red flag laws or safe storage, since attackers often plan ahead, whereas spontaneous ones are harder to predict.
# pandas, matplotlib, and seaborn are already imported above
# Load the weapon sheet from the same workbook as the incident data
incident_weapon_data_df = pd.read_excel("Public v4.1 K-12 School Shooting Database (8 28 2025).xlsx", sheet_name = 'Weapon')
plt.figure(figsize=(15, 6))
# Count occurrences of each weapon type
weapon_counts = incident_weapon_data_df["Weapon_Type"].value_counts().reset_index()
weapon_counts.columns = ["Weapon_Type", "Count"]
# Plot bar chart
sns.barplot(
x="Weapon_Type",
y="Count",
data=weapon_counts
)
plt.xticks(rotation=90)
plt.title("School Shooting Weapon Count and Type")
plt.xlabel("Weapon Type")
plt.ylabel("Count")
plt.show()
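To back a statement like "handguns overwhelmingly dominated" with numbers, the counts can be turned into shares. The following is a rough sketch on invented values: only the `Weapon_Type` column name comes from the dataset, and folding "No Data" into "Unknown" is an assumption made here for the illustration rather than what the project does.

```python
import pandas as pd

# Invented sample of weapon labels, including the kinds of gaps described earlier.
weapons = pd.Series(
    ["Handgun", "Handgun", "Handgun", "Rifle", "Unknown", None, "No Data", "Handgun"],
    name="Weapon_Type",
)

# Treat missing values and "No Data" as a single "Unknown" category (assumption).
cleaned = weapons.fillna("Unknown").replace({"No Data": "Unknown"})

# Share of all records, with the gaps kept visible.
shares = cleaned.value_counts(normalize=True)

# Share among records where the weapon type is actually known.
known_shares = cleaned[cleaned != "Unknown"].value_counts(normalize=True)

print(shares)
print(known_shares)
```

Reporting both views shows how much the headline share depends on the unknown records: in this toy sample handguns are half of all rows but four fifths of the rows with a known weapon type.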

# Reuse incident_shooting_data_df, which was loaded from the 'Incident' sheet earlier
plt.figure(figsize=(15, 6))
# Count occurrences of each gun violence type
gv_counts = incident_shooting_data_df["GV_Type"].value_counts().reset_index()
gv_counts.columns = ["GV_Type", "Count"]
# Plot bar chart
sns.barplot(
x="GV_Type",
y="Count",
data=gv_counts
)
plt.xticks(rotation=90)
plt.title("School Shooting Gun Violence Type")
plt.xlabel("Gun Violence Type")
plt.ylabel("Count")
plt.show()

Step 4: Conclusion
So, do gun laws have anything to do with school shootings? Based on this analysis, the answer appears to be yes. States with stronger gun laws (as measured by Giffords grades) often had fewer school shooting incidents per square mile, and research supports that laws like child-access prevention (safe storage), background checks, and red flag laws reduce youth access to firearms.
Of course, gun laws are only part of the picture. Factors like population density, income, and local culture also matter, and further analysis would need to control for those. But this project suggests that legislation plays a measurable role in shaping the risk of school shootings.
Moving forward, I would like to expand this work by examining where weapons were obtained and incorporating socioeconomic data at the county level. This would allow for a deeper understanding of not just how often school shootings happen, but why they happen more in some places than others.
References:
- https://k12ssdb.org/methodology-1
- https://giffords.org/lawcenter/resources/scorecard/
Find code repository here: https://github.com/aborland123/school-shooting-project
