We didn’t realize it, but back in April, CMS made Medicare Part D prescription drug data available. The raw data is viewable on the CMS website here. This might be useful for estimating costs for oral medications, or at least getting a ballpark idea of cost to use as a placeholder in other analyses or economic models. Therefore, we took a closer look at the data.
The CMS file has prescribing information by physician; hence, it is over 2.5 gigabytes in size. Since we are not interested in this level of granularity for drug costing, we can reduce the size dramatically and focus on the unique drugs and their costs. Below is some R code to turn the large dataset into a small, usable dataset of drugs. We use the data.table package to read in the data, and combine across physicians quickly.
There are a few issues in using this data. First, the information is provided in terms of cost per day. So, drugs that are dosed more than once per day will require caution in estimating cost per dose (e.g., a twice-a-day drug that costs $2.00 per day will cost $1.00 per dose). Second, not all drugs are included – only those drugs that are not covered by Part B (generally physician administered drugs). However, Part B reimbursement information has been available for years in Medicare’s ASP data. Third, this is not a 100% of all Part D claims. The details of the methodology are given here. The costs in this file are the reimbursed amounts by Medicare.
The code below is also available on GitHub here. The download itself can take a few minutes since it is a 500 megabyte file (1.5 minutes on my computer with high-speed internet access). Make sure you save the resulting data.table (part_d) so you don’t have to repeat this every time.
library(data.table)
library(magrittr)
pde <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/
Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/
PartD_Prescriber_PUF_NPI_DRUG_13.zip"
tf <- tempfile() download.file(pde, tf)
x <- unzip(tf, exdir = tempdir())
z <- fread(x
[2], verbose = TRUE)
unlink(x) # removes the tempfiles
rm(x)
setkey(z, "GENERIC_NAME")
partd <-
z[, .(
claims = sum(TOTAL_CLAIM_COUNT, na.rm = TRUE),
days_supp = sum(TOTAL_DAY_SUPPLY, na.rm = TRUE),
tot_cost = sum(TOTAL_DRUG_COST, na.rm = TRUE)
),
by = .(drug = GENERIC_NAME)] %>%
.[, drug := tolower(drug)] %>%
.[, cost_per_day := round(tot_cost / days_supp, 2)]
The resulting file has 1,574 drugs listed with 5 columns. This version is MUCH easier to read. Here are the first five and last five rows. Claims are the total number of claims across all physicians, and days_supp is the total number of days supplied across all physicians. Costs are in 2013 US dollars. Tot_cost is the sum across all physicians and cost_per_day is just the average cost per day (days_supp / tot_cost).
drug claims days_supp tot_cost cost_per_day
1: 304 7701 3925.40 0.51
2: 0.9 % sodium chloride 121038 576858 2323360.41 4.03
3: aa 4.25%/calcium/lytes/d5w 24 358 20545.38 57.39
4: aa 4.25%/calcium/lytes/dex 10% 119 272 16548.79 60.84
5: aa 5 %/calcium/lytes/dext 15 % 21 128 6049.62 47.26
---
1570: zoledronic acid/mannitol&water 673 38796 628571.49 16.20
1571: zolmitriptan 5421 90200 2395151.44 26.55
1572: zolpidem tartrate 11163599 373660293 118909355.59 0.32
1573: zonisamide 245861 8407562 14889877.10 1.77
1574: zoster vaccine live/pf 1006721 5709658 186948798.58 32.74
Now you can use the data.table %like% to search, or you can use the function grepl which returns a logical vector with “TRUE” for every match and consequently returns all matched (TRUE) rows in the data. Of course, you can write a function to do this as well.
For example, type partd[drug %like% "vastatin"]
or partd[grepl("vastatin", drug)]
and you will get the results below. (Keep in mind that I am using data.table syntax for my query, not base R syntax.)
drug claims days_supp tot_cost cost_per_day
1: amlodipine/atorvastatin 69955 3027012 1.512854e+07 5.00
2: atorvastatin calcium 26205342 1314072113 9.238067e+08 0.70
3: ezetimibe/atorvastatin calcium 458 15420 8.609675e+04 5.58
4: ezetimibe/simvastatin 916128 42560699 2.211880e+08 5.20
5: fluvastatin sodium 46067 1692716 7.812786e+06 4.62
6: lovastatin 5994849 325492593 9.438652e+07 0.29
7: niacin/lovastatin 9155 336725 2.189334e+06 6.50
8: niacin/simvastatin 37662 1449722 7.573038e+06 5.22
9: pitavastatin calcium 114822 4648186 2.050395e+07 4.41
10: pravastatin sodium 13489543 697418906 2.611930e+08 0.37
11: rosuvastatin calcium 8602219 387907070 2.089238e+09 5.39
12: simvastatin 36185092 1882153779 4.290968e+08 0.23
13: sitagliptin/simvastatin 878 28227 2.416610e+05 8.56