Deaths per firearm violence event
At a glance:
A negative binomial model isn't adequate for modelling the number of people killed per firearm incident in the USA; the real data has more events of one death, and also more extreme values, than the model. But estimating the model was an interesting exercise in fitting a single negative binomial model to two truncated subsets of data.
Caution - this blog post discusses homicide and suicide statistics and could be upsetting. If you feel disturbed or unhappy then here is a list of ways to get some emergency counselling helparound the world.
A grim research question
In my last post I looked at fitting a Poisson distribution to a set of count data that had been truncated so that observations below a certain threshold were not available. My reason for doing that was as a small step forward to today’s task, which is to model the number of deaths per “firearm violence incident” in the United States of America.
My original research question was “what is the distribution of number of fatalities per incident in USA firearm-related violence?”. The motivation was to better understand how material mass shooting or multiple-shooting events are in a broader context, and what proportion of violence is in smaller events than those that get the headlines.
Of course, there’s a lot written on this topic already. For example, Five Thirty Eight addressed the question in Maggie Koerth-Baker’s piece Mass shootings are a bad way to understand gun violence. Some of the important points made there:
- Mass shootings are rare despite the publicity they get; and the people doing them are different from most firearm violence perpetrators
- Two-thirds of firearm deaths are suicides (which may or may not be part of an incident involving other injuries and fatalities)
- Most firearm homicide victims are men between the ages of 15 and 34, and two-thirds are black; again, the profile of most firearm homicide victims differs from that of mass shootings (where 50% of victims are women, and around have of mass shootings involve domestic or family violence)
Putting aside the natural interest, importance and indeed horror that firearm violence in the USA attracts around the world, my question was more about the number of deaths as interestingextreme values. Most firearm incidents have few if any deaths, but some have very many; how can this be statistically modelled?
Some final bits of context. I wrote about violent deaths as a percentage of population in an earlier post. The USA has more homicides per capita than other rich countries, although less than the front runners Colombia, Brazil and Russia. Here’s a key graphic from that post:
On the particular issue of suicides that make up so many firearm-related deaths, here is a new chart along similar lines to the assaults graphic from that earlier post:
Note that I’ve chosen to keep the vertical scales “fixed” this time, to give a better sense of comparison across countries. Also that the suicide numbers are higher for many countries than they are for deaths from assault, but less variable (both between countries, and over time), which is why fixed vertical scales is more of an option here.
Like the earlier post, this is based on the OECD’s synthesis of data from national governments. I can’t vouch for the data’s comparability across countries - there are some definite surprises here - other than to say that I’m sure every possible effort has been made by the various national statistical offices to get it right. Here’s the code to download and draw the chart of suicide rates:
library(tidyverse)
library(scales)
library(readr)
library(lubridate)
library(ggrepel)
library(viridis)
library(rstan)
library(rsdmx)
library(ISOcodes)
#----------contextual data on suicides---------------
url <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/HEALTH_STAT/CICDHARM.TXCMFETF+TXCMHOTH+TXCMILTX.AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LVA+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+NMEC+BRA+CHN+COL+CRI+IND+IDN+LTU+RUS+ZAF/all?startTime=2000&endTime=2020"
# Deaths per 100,000 population, standardised only
if(!exists("suic_sdmx")){
dataset <- readSDMX(url)
suic_sdmx <- as.data.frame(dataset)
}
data(ISO_3166_1)
suicide <- suic_sdmx %>%
mutate(variable = ifelse(UNIT == "TXCMILTX", "All", ""),
variable = ifelse(UNIT == "TXCMHOTH", "Males", variable),
variable = ifelse(UNIT == "TXCMFETF", "Females", variable)) %>%
rename(year = obsTime, value = obsValue) %>%
select(year, COU, value, variable) %>%
left_join(distinct(ISO_3166_1[ , c("Alpha_3", "Alpha_2", "Name")]), by = c("COU" = "Alpha_3")) %>%
mutate(year = as.numeric(year),
country = Name)