The most recent Morbidity and Mortality Weekly Report (MMWR) from the Centers for Disease Control and Prevention (CDC), dated May 2, 2014, included a report by Yoon et al. (2014) on potentially preventable deaths from the 5 leading causes of death among people under the age of 80. In this post, I use interactive bar charts and choropleths to help visualize state-wise statistics. For these charts, I use googleVis and RStudio's shiny server platform. This post was generated using slidify, and the code necessary to recreate it can be found on GitHub. The code for the accompanying shiny app can also be found on GitHub.
The report notes that in 2010 the top 5 causes of death accounted for approximately 63% of all deaths: diseases of the heart, cancer, chronic lower respiratory disease, cerebrovascular diseases (stroke), and unintentional injuries. For their analysis, the authors used mortality data from the National Vital Statistics System for 2008-2010. Please read their report for the caveats associated with the data and the assumptions underlying their procedures. The report also discusses implications, and its discussion section is well worth a read.
The following R code retrieves the state-level table from the CDC's report.
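library(XML)
# Read every HTML table on the MMWR page into a list of data frames
URL = "http://www.cdc.gov/mmwr/preview/mmwrhtml/mm6317a1.htm?s_cid=mm6317a1_w"
tables = readHTMLTable(URL)
# The state-level table is the first of the two tables on that page
statewise = tables[[1]]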
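readHTMLTable() returns a list with one data frame per `<table>` element, which is why the code above picks out the first element. The small sketch below illustrates this on made-up inline markup rather than the CDC page; the table contents and the `html` string are placeholders, not data from the report.

```r
library(XML)

# Made-up markup with two tables, standing in for a page like the MMWR report
html = "<html><body>
<table>
  <tr><th>State</th><th>Observed</th></tr>
  <tr><td>StateA</td><td>1,234</td></tr>
</table>
<table>
  <tr><th>Note</th></tr>
  <tr><td>Placeholder footnote</td></tr>
</table>
</body></html>"

# Parse the string as HTML, then read every table into a list of data frames
doc = htmlParse(html, asText = TRUE)
tables = readHTMLTable(doc)
print(length(tables))  # one list element per <table>
first = tables[[1]]    # the first table, as in the post
print(first)
```

Note that the number comes back as text with a thousands separator ("1,234"), which is why the post strips the commas before converting the columns to numbers.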
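Let's clean the dataset by giving the columns short, descriptive names and dropping the header and footnote rows that were parsed along with the state data. Let's also check the structure of the result.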
# Give the 16 columns short, descriptive names: the state, then the observed,
# expected and potentially preventable deaths for each of the 5 causes
colnames(statewise) = c("State", "HeartDiseasesObserved", "HeartDiseasesExpected",
"HeartDiseasesPreventable", "CancerDiseasesObserved", "CancerDiseasesExpected",
"CancerDiseasesPreventable", "ChroniclowerrespiratoryDiseasesObserved",
"ChroniclowerrespiratoryDiseasesExpected", "ChroniclowerrespiratoryDiseasesPreventable",
"CerebrovascularDiseasesObserved", "CerebrovascularDiseasesExpected", "CerebrovascularDiseasesPreventable",
"UnintentionalinjuriesObserved", "UnintentionalinjuriesExpected", "UnintentionalinjuriesPreventable")
# Drop the three header rows that readHTMLTable() parsed as data
statewise = statewise[-(1:3), ]
# Drop the two footnote rows now at the bottom, leaving 50 states plus DC
statewise = statewise[-(52:53), ]
str(statewise)
## 'data.frame': 51 obs. of 16 variables:
## $ State : Factor w/ 56 levels "Abbreviation: DC = District of Columbia.\r\n\t\t\t\t\t\t\t\t*\tExpected deaths are the lowest three-state average age-specific "| __truncated__,..: 2 3 4 5 6 7 8 11 9 12 ...
## $ HeartDiseasesObserved : Factor w/ 54 levels "1,007","1,080",..: 43 31 29 24 22 20 17 49 46 12 ...
## $ HeartDiseasesExpected : Factor w/ 54 levels "1,063","1,194",..: 22 33 29 8 14 20 15 42 32 12 ...
## $ HeartDiseasesPreventable : Factor w/ 54 levels "0","1,089","1,092",..: 29 10 49 6 39 9 32 26 40 35 ...
## $ CancerDiseasesObserved : Factor w/ 54 levels "1,054","1,304",..: 44 45 41 30 27 32 29 3 46 23 ...
## $ CancerDiseasesExpected : Factor w/ 54 levels "1,006","1,112",..: 33 39 43 22 27 31 24 1 37 20 ...
## $ CancerDiseasesPreventable : Factor w/ 51 levels "0","1,059","1,126",..: 21 14 44 7 30 16 41 35 18 40 ...
## $ ChroniclowerrespiratoryDiseasesObserved : Factor w/ 53 levels "1,016","1,035",..: 15 17 12 3 45 7 42 29 47 41 ...
## $ ChroniclowerrespiratoryDiseasesExpected : Factor w/ 53 levels "1,004","1,148",..: 42 43 1 32 28 38 34 12 44 24 ...
## $ ChroniclowerrespiratoryDiseasesPreventable: Factor w/ 51 levels "0","1,013","1,117",..: 2 30 39 42 4 35 1 47 1 12 ...
## $ CerebrovascularDiseasesObserved : Factor w/ 52 levels "1,003","1,119",..: 5 49 45 41 36 38 31 18 12 28 ...
## $ CerebrovascularDiseasesExpected : Factor w/ 52 levels "1,015","1,108",..: 34 37 46 26 21 32 28 7 36 16 ...
## $ CerebrovascularDiseasesPreventable : Factor w/ 49 levels "1,527","1,783",..: 36 12 39 16 1 44 27 31 21 43 ...
## $ UnintentionalinjuriesObserved : Factor w/ 52 levels "1,010","1,013",..: 21 34 25 6 47 10 49 29 18 44 ...
## $ UnintentionalinjuriesExpected : Factor w/ 52 levels "1,074","1,093",..: 49 17 4 38 42 50 43 19 14 29 ...
## $ UnintentionalinjuriesPreventable : Factor w/ 51 levels "0","1,027","1,054",..: 4 23 5 46 12 39 25 16 38 30 ...
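The str() output shows that State is still a factor carrying all 56 levels of the parsed table, including the footnote text from the rows that were dropped. Unused levels do not affect the numbers, but they still show up in functions such as levels() or table(). One possible extra cleaning step, not part of the original post, is base R's droplevels(); a small sketch on a toy factor:

```r
# Toy factor: three parsed rows, of which only the last two are kept
state = factor(c("Header row", "Alabama", "Alaska"))
kept = state[-1]
print(nlevels(kept))  # 3: the dropped "Header row" level is still there

# droplevels() removes levels that no longer occur in the data
kept = droplevels(kept)
print(levels(kept))   # "Alabama" "Alaska"

# Applied to the post's data frame, this would be:
# statewise$State = droplevels(statewise$State)
```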
Let's convert the numeric columns from factors to numbers, removing the thousands separators first, and view the data using googleVis's table. Entries in this table can be sorted by clicking on a column header.
# Factor -> character -> strip thousands separators -> numeric
# (calling as.numeric() directly on a factor would return its level codes)
statewise[, 2:16] = lapply(statewise[, 2:16], function(x) as.numeric(gsub(",", "", as.character(x))))
library(googleVis)
# Sortable, interactive table of the cleaned data
plot(gvisTable(statewise, options = list(height = 400, width = 800)))
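The detour through character matters because as.numeric() on a factor returns the internal level codes, not the numbers printed as labels. A short sketch on a toy factor shaped like the columns in the str() output above (the helper name `to_number` is hypothetical):

```r
# A factor whose labels are numbers with thousands separators
f = factor(c("1,007", "950", "1,080"))

# Wrong: as.numeric() on a factor returns the integer level codes
codes = as.numeric(f)
print(codes)   # 1 3 2

# Right: go through character, strip the commas, then convert
to_number = function(x) as.numeric(gsub(",", "", as.character(x)))
values = to_number(f)
print(values)  # 1007 950 1080
```

The codes follow the sorted level order ("1,007", "1,080", "950"), so the wrong conversion silently produces plausible-looking but meaningless numbers.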
For each cause of death, we do the following. Instead of working with the raw number of potentially preventable deaths, we express it as a percentage of the observed deaths. We then compute the average of these percentages across the 5 causes of death (a simple, unweighted mean).
# The 5 causes, spelled as in the column names above
causes = c("HeartDiseases", "CancerDiseases", "ChroniclowerrespiratoryDiseases",
  "CerebrovascularDiseases", "Unintentionalinjuries")
# Percentage of observed deaths that were potentially preventable, per cause
for (cause in causes) {
  statewise[[paste0("Percentage", cause, "Preventable")]] = round(
    statewise[[paste0(cause, "Preventable")]] * 100 / statewise[[paste0(cause, "Observed")]], 2)
}
# Simple (unweighted) mean of the 5 percentages
statewise$PercentageAveragePreventableDeaths = round(
  rowMeans(statewise[, paste0("Percentage", causes, "Preventable")]), 2)
save(statewise, file = "statewise.Rda")
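To make the two steps concrete, here is a worked example for a single hypothetical state with made-up counts (not values from the report). It also shows that the average is an unweighted mean of the 5 percentages, which can differ from the pooled percentage (total preventable over total observed) because each cause counts equally regardless of how many deaths it accounts for.

```r
# Made-up counts for one hypothetical state, in the order:
# heart disease, cancer, chronic lower respiratory disease, stroke, injuries
observed    = c(400, 500, 100, 80, 120)
preventable = c(100,  50,  40, 20,  30)

# Percentage of observed deaths that were potentially preventable, per cause
pct = round(preventable * 100 / observed, 2)
print(pct)         # 25 10 40 25 25

# Unweighted mean of the 5 percentages, as in the post
avg_pct = round(mean(pct), 2)
print(avg_pct)     # 25

# For comparison: pooled percentage across all 5 causes
pooled_pct = round(sum(preventable) * 100 / sum(observed), 2)
print(pooled_pct)  # 20
```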
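Let's now plot bar charts and choropleths using googleVis within a shiny app. This application is hosted by RStudio on their shinyapps.io server. Before building it, we reshape the dataset into long format with reshape2's melt(), giving one row per state and variable, and reorder the levels of the resulting variable factor so that the average percentage comes first, followed by the 5 per-cause percentages and then the raw counts.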
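library(reshape2)
# Long format: one row per (State, variable) pair, with the number in "value"
statewisemelt = melt(statewise, id = "State")
# Reorder levels: average percentage (21) first, then the 5 per-cause
# percentages (16:20), then the 15 observed/expected/preventable counts (1:15)
statewisemelt$variable = factor(statewisemelt$variable,
  levels = levels(statewisemelt$variable)[c(21, 16:20, 1:15)])
save(statewisemelt, file = "statewisemelt.Rda")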
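The level reordering above relies on positions (21, 16:20, 1:15), which presumably works because melt() lists the measured variables in column order. Below is a sketch of the same reordering done by name, which would keep working if columns were added or moved. It rebuilds the 21 level names from the column names used in this post and checks that both approaches agree; the variable names in the sketch are illustrative.

```r
# Rebuild the 21 level names in column order:
# 15 raw counts, then 5 per-cause percentages, then the average percentage
causes = c("HeartDiseases", "CancerDiseases", "ChroniclowerrespiratoryDiseases",
           "CerebrovascularDiseases", "Unintentionalinjuries")
raw = as.vector(t(outer(causes, c("Observed", "Expected", "Preventable"), paste0)))
pct = paste0("Percentage", causes, "Preventable")
avg = "PercentageAveragePreventableDeaths"
lv  = c(raw, pct, avg)

# Reordering by position, as in the post
by_position = lv[c(21, 16:20, 1:15)]

# Reordering by name: average first, then the other percentages, then the counts
is_pct  = startsWith(lv, "Percentage")
by_name = c(avg, setdiff(lv[is_pct], avg), lv[!is_pct])

print(identical(by_position, by_name))  # TRUE
# With the real data: factor(statewisemelt$variable, levels = by_name)
```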
As mentioned previously, this application is hosted on RStudio's shinyapps.io platform. The code for the shiny app below can be found on GitHub, alongside the code for recreating this post. Hover over the bars of the bar chart or over the map to see the corresponding values. Update: if this app doesn't show up, an alternate app hosted on RStudio's glimmer server can be found here. The code for that alternate glimmer app can be found here.
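The app's actual code is on GitHub. As a starting point for readers, here is a minimal sketch of how a googleVis choropleth and bar chart can be wired into a shiny app. The layout, the input name `measure`, and the helper name `make_app` are hypothetical choices for illustration, not the author's app; the sketch assumes a long-format data frame shaped like `statewisemelt` (columns State, variable, value).

```r
library(shiny)
library(googleVis)

# Hypothetical helper: builds an app from a long-format data frame with
# columns State, variable and value (shaped like statewisemelt)
make_app = function(long_data) {
  ui = fluidPage(
    selectInput("measure", "Measure", choices = levels(long_data$variable)),
    htmlOutput("map"),   # choropleth of US states
    htmlOutput("bars")   # bar chart of the same measure
  )
  server = function(input, output) {
    # Rows for the measure currently selected in the drop-down
    selected = reactive(long_data[long_data$variable == input$measure, ])
    output$map = renderGvis(gvisGeoChart(selected(), locationvar = "State",
      colorvar = "value",
      options = list(region = "US", displayMode = "regions", resolution = "provinces")))
    output$bars = renderGvis(gvisBarChart(selected(), xvar = "State", yvar = "value"))
  }
  shinyApp(ui, server)
}

# Usage, with the data saved earlier in the post:
# load("statewisemelt.Rda")
# runApp(make_app(statewisemelt))
```

Setting `resolution = "provinces"` with `region = "US"` is what makes the geo chart color individual states rather than the whole country.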