How I Approach Commercial Pharma Analytics Using Real-World Patient Data

ankitmorajkar
Apr 16, 2025

Updated: Sep 16, 2025

In the world of commercial pharmaceutical analytics, working with real-world data (RWD) is as challenging as it is fascinating. When we talk about “real-world patient-level data,” we’re referring to massive datasets that capture how therapies are actually used across millions of lives, conditions, and care settings. My work in commercial pharma analytics largely revolves around making sense of this data to answer specific business questions posed by pharma clients, most of whom are marketing, strategy, and analytics managers. These questions usually tie back to how a therapy is performing in-market, how patients are responding, how physicians are prescribing, and what commercial opportunities or challenges exist in that landscape.

Understanding the Business Context: Where It All Begins

Every project begins with alignment. This isn’t just about scoping the analysis; it’s about deeply understanding the client’s business context. We typically start by discussing the key questions they are trying to answer. Are they interested in their new-to-brand patient share compared to competitors? Are they trying to evaluate persistence and compliance among patients? Or are they looking to quantify switching behavior across a therapeutic class? Each of these questions demands a different analytical lens.

Once we clarify the objective, we define the scope: the time frame for analysis, the drugs to be studied, and the indications of interest. Drugs are typically identified using NDC (National Drug Code) or HCPCS (Healthcare Common Procedure Coding System) codes, while indications are captured through ICD-10 diagnosis codes. All of this must be agreed upfront, along with a timeline that reflects the complexity of the metrics, the team’s bandwidth, and how urgent the need is on the client’s end.
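One way to make that upfront agreement concrete is to write the scope down as a small, checkable object. The sketch below is a hypothetical Python structure, not a standard tool. Its regular expressions are simplified format checks (an 11-digit NDC in 5-4-2 form, a HCPCS Level II code as a letter plus four digits, and a loose ICD-10-CM pattern), and the NDC and HCPCS values are placeholders for illustration only. C61 is the ICD-10-CM code for malignant neoplasm of prostate.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Simplified format checks (assumptions, not official validators).
NDC11 = re.compile(r"^\d{5}-\d{4}-\d{2}$")                    # 11-digit NDC, 5-4-2
HCPCS_L2 = re.compile(r"^[A-Z]\d{4}$")                       # e.g. J-codes
ICD10 = re.compile(r"^[A-Z]\d[A-Z0-9](\.?[A-Z0-9]{1,4})?$")  # loose ICD-10-CM


@dataclass
class StudyScope:
    """Hypothetical container for the agreed analytical scope."""
    start: date
    end: date
    ndc_codes: set = field(default_factory=set)
    hcpcs_codes: set = field(default_factory=set)
    icd10_codes: set = field(default_factory=set)

    def problems(self) -> list:
        """Return human-readable scope issues; an empty list means none were found."""
        issues = []
        if self.start >= self.end:
            issues.append("study start must precede study end")
        if not (self.ndc_codes or self.hcpcs_codes):
            issues.append("no drug codes (NDC or HCPCS) defined")
        if not self.icd10_codes:
            issues.append("no ICD-10 diagnosis codes defined")
        issues += [f"bad NDC format: {c}" for c in sorted(self.ndc_codes) if not NDC11.match(c)]
        issues += [f"bad HCPCS format: {c}" for c in sorted(self.hcpcs_codes) if not HCPCS_L2.match(c)]
        issues += [f"bad ICD-10 format: {c}" for c in sorted(self.icd10_codes) if not ICD10.match(c)]
        return issues


scope = StudyScope(
    start=date(2022, 1, 1),
    end=date(2024, 12, 31),
    ndc_codes={"12345-6789-01"},  # placeholder NDC
    hcpcs_codes={"J1234"},        # placeholder HCPCS
    icd10_codes={"C61"},          # malignant neoplasm of prostate
)
print(scope.problems())  # []
```

Catching a malformed code or an inverted date range at this stage is far cheaper than discovering it after the data pull.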

Choosing the Right Data Source

With a solid understanding of the client’s goals and a clear analytical scope, the next step is choosing the right data source. In most commercial projects, we rely on open claims data because it offers large-scale, de-identified patient-level claims from many sources, making it ideal for understanding market-level dynamics. However, the choice of data source is never one-size-fits-all. For diseases that are more prevalent in older populations, such as prostate cancer, age-related macular degeneration, or Alzheimer’s disease, we often supplement with Medicare 100% data to ensure adequate representation. If the analysis needs clinical markers like biomarker status, lab results, or vital signs, EHR data is the way to go. And if we need to track each patient’s longitudinal journey with minimal gaps, as persistence and adherence studies often do, closed claims data provides a high-fidelity view of every interaction, though it covers a smaller population.
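The rules of thumb in that paragraph can be written down as a simple lookup. This is a deliberately simplified, hypothetical encoding: the flag names are invented for illustration, and the sketch always starts from open claims and adds sources on top, whereas in practice an EHR or closed-claims study might not use open claims at all.

```python
from dataclasses import dataclass


@dataclass
class StudyNeeds:
    """Hypothetical flags describing what an analysis requires."""
    older_population: bool = False            # e.g. prostate cancer, AMD, Alzheimer's
    clinical_markers: bool = False            # biomarkers, labs, vital signs
    complete_longitudinal_view: bool = False  # e.g. persistence or adherence focus


def candidate_sources(needs: StudyNeeds) -> list:
    """Map study needs to candidate data sources, mirroring the rules of thumb above."""
    sources = ["open claims"]  # default for market-level dynamics
    if needs.older_population:
        sources.append("Medicare 100%")
    if needs.clinical_markers:
        sources.append("EHR")
    if needs.complete_longitudinal_view:
        sources.append("closed claims")
    return sources


print(candidate_sources(StudyNeeds(older_population=True)))
# ['open claims', 'Medicare 100%']
```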

Laying the Groundwork

Before jumping into the data pull, I always carve out time for desk research. This background work gives structure to the upcoming analysis. I’ll typically read up on the indication’s epidemiology like prevalence, incidence, and patient demographics. I look into the standard of care, the typical treatment journey, and the types of therapies available: are they oral pills, subcutaneous injections, or intravenous infusions? What’s the typical dosing frequency? Which specialists usually treat the disease? Is it oncologists, rheumatologists, neurologists? Are there lines of therapy defined by NCCN or other clinical guidelines? This context helps shape patient selection logic and ensures the analysis is grounded in clinical reality.

Patient and Treatment Identification

Once the prep is complete, we dive into the data. We begin by filtering for eligible patients, i.e., those diagnosed with the disease of interest within a specific timeframe. This is done using ICD-10 codes, and we may add criteria around diagnosis frequency, such as requiring the diagnosis on multiple distinct dates, to increase specificity.
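A minimal sketch of that eligibility step, assuming diagnosis claims arrive as simple (patient, ICD-10 code, service date) tuples. The two-distinct-dates threshold and the prefix matching on codes (so that a category code also captures its subcodes) are illustrative choices, not fixed rules.

```python
from datetime import date


def _norm(code: str) -> str:
    """Claims often store ICD-10 codes without the dot; normalize before comparing."""
    return code.replace(".", "").upper()


def eligible_patients(dx_claims, dx_prefixes, start, end, min_distinct_dates=2):
    """Return patient IDs with at least min_distinct_dates distinct diagnosis dates
    for any code starting with one of dx_prefixes, within [start, end].

    dx_claims: iterable of (patient_id, icd10_code, service_date)
    """
    prefixes = tuple(_norm(p) for p in dx_prefixes)
    dates_by_patient = {}
    for pid, code, svc_date in dx_claims:
        if start <= svc_date <= end and _norm(code).startswith(prefixes):
            dates_by_patient.setdefault(pid, set()).add(svc_date)
    return {pid for pid, ds in dates_by_patient.items() if len(ds) >= min_distinct_dates}


claims = [
    ("P1", "C61", date(2023, 1, 10)),
    ("P1", "C61", date(2023, 3, 2)),
    ("P2", "C61", date(2023, 5, 5)),
    ("P2", "C61", date(2023, 5, 5)),  # same day twice counts once
    ("P3", "E11.9", date(2023, 6, 1)),
]
print(eligible_patients(claims, {"C61"}, date(2023, 1, 1), date(2023, 12, 31)))  # {'P1'}
```

Counting distinct dates rather than raw claims keeps a single visit billed on several lines from looking like repeated confirmation.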

For these diagnosed patients, we then pull in treatment claims. On the medical side, these are identified using HCPCS codes (often for infusions or injectables administered in clinics or hospitals); on the pharmacy side, they are identified using NDC codes (capturing prescriptions filled at retail or specialty pharmacies). Importantly, we often apply a “Dx precedes Tx” rule, requiring the diagnosis to precede the treatment, because many therapies are approved for multiple indications and we want to link treatment usage to the correct condition. A nuanced step in this process is assigning a primary physician to each patient. Some data sources provide this explicitly, but others don’t, so we develop logic to assign physicians based on the recency and frequency of visits.
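Both steps can be sketched in a few lines, assuming medical and pharmacy claims have already been mapped to a common drug label. Two assumptions are worth flagging: same-day diagnosis and treatment is treated as satisfying the Dx-precedes-Tx rule, and the physician logic ranks visit frequency first and uses recency only to break ties, which is one of several reasonable orderings. Drug names and physician IDs are placeholders.

```python
from collections import Counter
from datetime import date


def dx_precedes_tx(tx_claims, first_dx_date):
    """Keep treatment claims dated on or after the patient's first qualifying diagnosis.

    tx_claims: iterable of (patient_id, drug, service_date, source), where source is
               "medical" (HCPCS-coded) or "pharmacy" (NDC-coded)
    first_dx_date: dict of patient_id -> earliest qualifying diagnosis date
    """
    return [c for c in tx_claims if c[0] in first_dx_date and c[2] >= first_dx_date[c[0]]]


def assign_primary_physician(visits):
    """Pick one physician per patient: most visits first, most recent visit breaks ties.

    visits: iterable of (patient_id, physician_id, visit_date)
    """
    counts = Counter()
    last_seen = {}
    for pid, doc, vdate in visits:
        key = (pid, doc)
        counts[key] += 1
        last_seen[key] = max(vdate, last_seen.get(key, vdate))
    best = {}
    for (pid, doc), n in counts.items():
        rank = (n, last_seen[(pid, doc)])  # compare frequency, then recency
        if pid not in best or rank > best[pid][0]:
            best[pid] = (rank, doc)
    return {pid: doc for pid, (_, doc) in best.items()}


tx = [
    ("P1", "drug_x", date(2023, 2, 1), "pharmacy"),  # before diagnosis: dropped
    ("P1", "drug_x", date(2023, 3, 1), "medical"),
    ("P2", "drug_x", date(2023, 5, 1), "pharmacy"),  # no qualifying diagnosis: dropped
]
print(dx_precedes_tx(tx, {"P1": date(2023, 3, 1)}))
# [('P1', 'drug_x', datetime.date(2023, 3, 1), 'medical')]

visits = [
    ("P1", "DR_A", date(2023, 1, 1)),
    ("P1", "DR_A", date(2023, 2, 1)),
    ("P1", "DR_B", date(2023, 3, 1)),
    ("P1", "DR_B", date(2023, 4, 1)),
]
print(assign_primary_physician(visits))  # {'P1': 'DR_B'} (tied on count, seen more recently)
```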

Another critical filter we apply is around patient activity. We typically include only patients who have at least one claim in each half-year (semester) of every year covered by the analysis. This ensures we’re working with patients who are consistently captured in the dataset, which is essential for persistence and compliance studies. If a patient drops out of the data because of an insurance change or a data capture issue, we don’t want to misinterpret that as therapy discontinuation.
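A minimal sketch of that activity filter, assuming claims of any type are reduced to (patient, service date) pairs and that each semester is a calendar half-year (January to June, July to December).

```python
from datetime import date


def half_year(d: date) -> tuple:
    """Map a date to (year, half): half 1 is Jan-Jun, half 2 is Jul-Dec."""
    return (d.year, 1 if d.month <= 6 else 2)


def continuously_active(claims, first_year, last_year):
    """Return patients with at least one claim (of any type) in every half-year
    from first_year through last_year.

    claims: iterable of (patient_id, service_date)
    """
    needed = {(y, h) for y in range(first_year, last_year + 1) for h in (1, 2)}
    seen = {}
    for pid, svc_date in claims:
        hy = half_year(svc_date)
        if hy in needed:
            seen.setdefault(pid, set()).add(hy)
    return {pid for pid, halves in seen.items() if halves == needed}


claims = [
    ("P1", date(2023, 2, 14)), ("P1", date(2023, 8, 3)),
    ("P1", date(2024, 3, 9)), ("P1", date(2024, 11, 20)),
    ("P2", date(2023, 1, 5)), ("P2", date(2023, 9, 1)),
    ("P2", date(2024, 4, 2)),  # no claim in the second half of 2024
]
print(continuously_active(claims, 2023, 2024))  # {'P1'}
```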

At this point, we have a robust cohort of patients diagnosed with the target condition, who received relevant treatments, and who are active in the dataset over time. This dataset becomes the foundation for a wide range of insights.

From Data to Strategy: How Analytics Informs Commercial Decisions

We can analyze the market landscape by looking at how many new patients started on each therapy during the study period, or how the total patient share for each brand evolved over time.
We can break down drug usage by patient demographics like age and gender, or understand how usage varies by prescriber specialty.
We often explore switching behavior: whether patients move between drugs within the same class or switch to an entirely different treatment modality, and how quickly that switch occurs.
Another frequent area of analysis is the class of trade of the healthcare organizations associated with prescribers, that is, whether they are hospital-affiliated, group practices, or individual providers.
We also analyze the payer mix, slicing the data by commercial, Medicare, Medicaid, or other payer types using either the primary payer field or book of business indicators.
Finally, we get into persistence and compliance. Using methods like gap-based persistence or proportion of days covered (PDC), we quantify how long patients stay on therapy and how adherent they are.
These metrics often feed into internal performance benchmarking, payer discussions, and promotional strategy refinement.
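To make switching, persistence, and PDC concrete, here is a minimal sketch over pharmacy fills represented as (fill date, drug, days supply) tuples. It assumes a 60-day allowable gap for persistence (an illustrative value; the threshold is a study-design choice), computes PDC without shifting overlapping supply forward, and counts any new drug after the index drug as a switch, so it does not separate true switches from add-on therapy. Drug names are placeholders.

```python
from datetime import date, timedelta

# A fill is (fill_date, drug, days_supply); hypothetical shape for illustration.


def pdc(fills, window_start, window_end):
    """Proportion of days covered: days in the window with supply on hand / days in window.
    Overlapping supply is not shifted forward (a simplifying assumption)."""
    covered = set()
    for fill_date, _drug, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if window_start <= day <= window_end:
                covered.add(day)
    return len(covered) / ((window_end - window_start).days + 1)


def persistence_days(fills, max_gap=60):
    """Gap-based persistence: days from the first fill until supply runs out before
    the first gap longer than max_gap days (or after the last fill if no such gap)."""
    fills = sorted(fills)
    start = fills[0][0]
    supply_end = start + timedelta(days=fills[0][2])  # first day without drug on hand
    for fill_date, _drug, days_supply in fills[1:]:
        if (fill_date - supply_end).days > max_gap:
            break  # discontinued; later fills would count as a restart
        supply_end = max(supply_end, fill_date + timedelta(days=days_supply))
    return (supply_end - start).days


def first_switch(fills, index_drug):
    """Return (new_drug, days since first index_drug fill) for the first later fill of a
    different drug, or None. Any new drug counts here, including possible add-ons."""
    fills = sorted(fills)
    index_dates = [d for d, drug, _ in fills if drug == index_drug]
    if not index_dates:
        return None
    index_start = index_dates[0]
    for fill_date, drug, _ in fills:
        if fill_date > index_start and drug != index_drug:
            return drug, (fill_date - index_start).days
    return None


fills = [
    (date(2024, 1, 1), "drug_a", 30),
    (date(2024, 2, 5), "drug_a", 30),  # refilled 5 days after supply ran out
    (date(2024, 6, 1), "drug_b", 30),
]
drug_a = [f for f in fills if f[1] == "drug_a"]
print(round(pdc(drug_a, date(2024, 1, 1), date(2024, 3, 30)), 3))  # 0.667
print(persistence_days(drug_a))  # 65
print(first_switch(fills, "drug_a"))  # ('drug_b', 152)
```

Changing `max_gap` can change who counts as persistent, which is why the gap threshold is usually agreed with the client alongside the rest of the scope.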

Final Thoughts: Real-World Data as a Strategic Asset

Bringing all this together, the goal is to convert raw data into clear, actionable stories. Whether it’s informing sales strategy, refining access models, or guiding future launch planning, commercial pharma analytics provides the evidence base behind many business-critical decisions. Real-world data has its limitations, including noise, missingness, and the assumptions we have to make along the way, but when used thoughtfully, it offers unparalleled visibility into how medicines are used in the real world. And for those of us working at the intersection of analytics, business, and healthcare, it’s both a challenge and a privilege to uncover those insights.