Part D Drug Costs

Home/R, Reimbursement/Part D Drug Costs

Part D Drug Costs

We didn’t realize it, but back in April, CMS made Medicare Part D prescription drug data available. The raw data is viewable on the CMS website here. This might be useful for estimating costs for oral medications, or at least getting a ballpark idea of cost to use as a placeholder in other analyses or economic models. Therefore, we took a closer look at the data.

The CMS file has prescribing information by physician; hence, it is over 2.5 gigabytes in size. Since we are not interested in this level of granularity for drug costing, we can reduce the size dramatically and focus on the unique drugs and their costs. Below is some R code to turn the large dataset into a small, usable dataset of drugs. We use the data.table package to read in the data, and combine across physicians quickly.

There are a few issues in using this data. First, the information is provided in terms of cost per day. So, drugs that are dosed more than once per day will require caution in estimating cost per dose (e.g., a twice-a-day drug that costs $2.00 per day will cost $1.00 per dose). Second, not all drugs are included – only those drugs that are not covered by Part B (generally physician administered drugs). However, Part B reimbursement information has been available for years in Medicare’s ASP data. Third, this is not a 100% of all Part D claims. The details of the methodology are given here. The costs in this file are the reimbursed amounts by Medicare.

The code below is also available on GitHub here. The download itself can take a few minutes since it is a 500 megabyte file (1.5 minutes on my computer with high-speed internet access). Make sure you save the resulting data.table (part_d) so you don’t have to repeat this every time.

library(data.table) 
library(magrittr) 

pde <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/
Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/
PartD_Prescriber_PUF_NPI_DRUG_13.zip" 
tf <- tempfile() download.file(pde, tf) 
x <- unzip(tf, exdir = tempdir()) 
z <- fread(x
[2], verbose = TRUE) unlink(x) # removes the tempfiles rm(x) setkey(z, "GENERIC_NAME") partd <- z[, .( claims = sum(TOTAL_CLAIM_COUNT, na.rm = TRUE), days_supp = sum(TOTAL_DAY_SUPPLY, na.rm = TRUE), tot_cost = sum(TOTAL_DRUG_COST, na.rm = TRUE) ), by = .(drug = GENERIC_NAME)] %>% .[, drug := tolower(drug)] %>% .[, cost_per_day := round(tot_cost / days_supp, 2)]

The resulting file has 1,574 drugs listed with 5 columns. This version is MUCH easier to read. Here are the first five and last five rows. Claims are the total number of claims across all physicians, and days_supp is the total number of days supplied across all physicians. Costs are in 2013 US dollars. Tot_cost is the sum across all physicians and cost_per_day is just the average cost per day (days_supp / tot_cost).

                                drug   claims days_supp     tot_cost cost_per_day
   1:                                     304      7701      3925.40         0.51
   2:          0.9 % sodium chloride   121038    576858   2323360.41         4.03
   3:     aa 4.25%/calcium/lytes/d5w       24       358     20545.38        57.39
   4: aa 4.25%/calcium/lytes/dex 10%      119       272     16548.79        60.84
   5: aa 5 %/calcium/lytes/dext 15 %       21       128      6049.62        47.26
  ---                                                                            
1570: zoledronic acid/mannitol&water      673     38796    628571.49        16.20
1571:                   zolmitriptan     5421     90200   2395151.44        26.55
1572:              zolpidem tartrate 11163599 373660293 118909355.59         0.32
1573:                     zonisamide   245861   8407562  14889877.10         1.77
1574:         zoster vaccine live/pf  1006721   5709658 186948798.58        32.74

Now you can use the data.table %like% to search, or you can use the function grepl which returns a logical vector with “TRUE” for every match and consequently returns all matched (TRUE) rows in the data. Of course, you can write a function to do this as well.

For example, type partd[drug %like% "vastatin"] or partd[grepl("vastatin", drug)] and you will get the results below. (Keep in mind that I am using data.table syntax for my query, not base R syntax.)

                              drug   claims  days_supp     tot_cost cost_per_day
 1:        amlodipine/atorvastatin    69955    3027012 1.512854e+07         5.00
 2:           atorvastatin calcium 26205342 1314072113 9.238067e+08         0.70
 3: ezetimibe/atorvastatin calcium      458      15420 8.609675e+04         5.58
 4:          ezetimibe/simvastatin   916128   42560699 2.211880e+08         5.20
 5:             fluvastatin sodium    46067    1692716 7.812786e+06         4.62
 6:                     lovastatin  5994849  325492593 9.438652e+07         0.29
 7:              niacin/lovastatin     9155     336725 2.189334e+06         6.50
 8:             niacin/simvastatin    37662    1449722 7.573038e+06         5.22
 9:           pitavastatin calcium   114822    4648186 2.050395e+07         4.41
10:             pravastatin sodium 13489543  697418906 2.611930e+08         0.37
11:           rosuvastatin calcium  8602219  387907070 2.089238e+09         5.39
12:                    simvastatin 36185092 1882153779 4.290968e+08         0.23
13:        sitagliptin/simvastatin      878      28227 2.416610e+05         8.56
By | 2017-05-18T19:11:44+00:00 October 8th, 2015|R, Reimbursement|0 Comments

Share This Story, Choose Your Platform!

About the Author: