An Analysis of Payer Machine-Readable Files (MRFs) and the Challenges of Healthcare Price Transparency
A technical deep-dive into the structural challenges of payer MRFs: file size, schema inconsistencies, provider matching, and why processing them at scale requires specialized infrastructure.
When the Transparency in Coverage rule first required health insurers to publish machine-readable files of their negotiated rates, the healthcare industry anticipated a new era of pricing transparency. What nobody fully anticipated was the sheer scale and complexity of the data problem these files would create.
The Scale Problem
The numbers are staggering. A single national payer’s Transparency in Coverage MRF can exceed 500 gigabytes in compressed JSON format. Decompressed, some payers’ complete rate files run to multiple terabytes. There are currently over 700 distinct payer entities required to publish MRFs, with new files published monthly.
To put this in perspective: if you wanted to download and process every payer’s MRF for a single month, you would be working with a volume of raw data that overwhelms conventional tooling. Off-the-shelf databases and ETL pipelines are generally not designed to ingest files of this size without substantial custom engineering.
Structural Inconsistencies Across Payers
Beyond scale, payer MRFs vary dramatically in structure and quality, despite sharing a nominally common schema. Our analysis of MRFs from the 50 largest U.S. payers reveals:
Schema Version Fragmentation
CMS has released multiple schema versions since the TiC rule took effect. However, compliance with schema updates has been inconsistent. As of Q1 2025, approximately 31% of payer MRFs we monitor are still published in deprecated schema versions, and 8% use proprietary schema extensions that break standard parsers.
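As a rough illustration of how a pipeline might triage files by schema version before parsing them in full, the sketch below classifies a file's top-level header fields. It is a minimal sketch under stated assumptions: the version strings in SUPPORTED_VERSIONS are placeholders, KNOWN_TOP_LEVEL_KEYS reflects our reading of the CMS in-network rates schema and should be verified against the published schema, and classify_header is a hypothetical helper rather than part of any standard tool.

```python
from collections import Counter

# Placeholder version labels (assumption): replace with the versions you support.
SUPPORTED_VERSIONS = {"1.0.0", "2.0.0"}

# Top-level keys we expect in an in-network rates file (assumption: verify
# against the schema version you target).
KNOWN_TOP_LEVEL_KEYS = {
    "reporting_entity_name", "reporting_entity_type", "plan_name",
    "plan_id_type", "plan_id", "plan_market_type", "last_updated_on",
    "version", "provider_references", "in_network",
}


def classify_header(header: dict) -> str:
    """Classify a file from its top-level fields only."""
    version = header.get("version")
    if version is None:
        return "missing_version"
    # Unknown top-level keys suggest a payer-specific extension.
    if set(header) - KNOWN_TOP_LEVEL_KEYS:
        return "nonstandard_fields"
    if version not in SUPPORTED_VERSIONS:
        return "unsupported_version"
    return "ok"


headers = [
    {"version": "2.0.0", "reporting_entity_name": "Example Payer"},
    {"version": "0.9", "reporting_entity_name": "Example Payer"},
    {"version": "2.0.0", "x_custom_block": {}},
    {"reporting_entity_name": "Example Payer"},
]
tally = Counter(classify_header(h) for h in headers)
```

Routing on the header alone keeps the check cheap: a file with a problem category can be sent to a normalization step before any rate records are read.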
Provider Identifier Inconsistencies
The TiC MRF schema requires providers to be identified by NPI (National Provider Identifier). However, our analysis finds that:
- 14% of NPI references in payer MRFs do not match active NPPES records
- 22% of group billing NPIs resolve to entities with no valid individual provider associations
- ~5% of records use TIN (Tax Identification Number) where NPI is required
This makes provider-level rate lookups unreliable without an external enrichment layer.
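Some of these problems can be caught before any NPPES lookup. An NPI is a 10-digit number whose last digit is a Luhn check digit computed over the number prefixed with the constant 80840, so a structural check is cheap. The sketch below is illustrative only: classify_identifier and its category labels are hypothetical, and the 9-digit test is a heuristic for spotting a likely TIN, not proof of one. Passing the checksum does not mean the NPI is active in NPPES.

```python
def npi_checksum_ok(npi: str) -> bool:
    """Return True if the string is 10 digits with a valid NPI check digit."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    total = 24  # Luhn contribution of the constant "80840" prefix
    for i, ch in enumerate(reversed(npi)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_identifier(raw: str) -> str:
    """Label a payer-supplied identifier (labels are illustrative)."""
    value = raw.strip().replace("-", "")
    if npi_checksum_ok(value):
        return "npi"
    # Heuristic (assumption): nine digits in an NPI field is probably a TIN.
    if len(value) == 9 and value.isascii() and value.isdigit():
        return "possible_tin"
    return "invalid"
```

Identifiers that pass this check still need to be matched against NPPES to confirm that they are active and to resolve group and individual associations.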
Rate Record Duplication
Payers often publish multiple overlapping rate records for the same provider-procedure combination across different plan names, product lines, and network tiers. In extreme cases, we’ve observed a single payer publishing over 10,000 distinct “plans” in their MRF — each with slightly different negotiated rate values for the same provider.
Without deduplication logic, a naive count of available rates dramatically overstates the actual number of unique provider-payer rate relationships.
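To make the overcounting concrete, here is a minimal sketch of one way to collapse plan-level duplicates. The RateRecord fields are a simplified, hypothetical flattening of MRF rate data rather than the schema itself, and keying on provider, billing code, and rate rounded to cents is an assumption about what counts as "the same rate".

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class RateRecord(NamedTuple):
    # Hypothetical flattened record; real MRFs nest these fields.
    plan_name: str
    npi: str
    billing_code: str
    negotiated_rate: float


RateKey = Tuple[str, str, float]


def collapse_rates(records: Iterable[RateRecord]) -> Dict[RateKey, Set[str]]:
    """Group identical (provider, code, rate) rows and keep the plans that share them."""
    grouped: Dict[RateKey, Set[str]] = defaultdict(set)
    for r in records:
        key = (r.npi, r.billing_code, round(r.negotiated_rate, 2))
        grouped[key].add(r.plan_name)
    return dict(grouped)


def count_relationships(records: Iterable[RateRecord]) -> int:
    """Count unique provider-procedure pairs, ignoring plan and rate."""
    return len({(r.npi, r.billing_code) for r in records})


records = [
    RateRecord("Plan A", "1234567893", "99213", 85.00),
    RateRecord("Plan B", "1234567893", "99213", 85.00),
    RateRecord("Plan C", "1234567893", "99213", 87.50),
]
collapsed = collapse_rates(records)
```

In this toy input, three raw rows collapse to two distinct rates and one provider-procedure relationship, which is the gap between a naive row count and the number that matters for analysis.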
The Provider Matching Challenge
Even when MRFs are structurally valid and fully parsed, using them for meaningful analysis requires linking payer rate records to real-world provider entities. This is harder than it sounds:
- Hospital systems operate under dozens of billing NPIs
- Multi-site provider groups may contract at the individual NPI, group NPI, or TIN level — and different payers use different conventions for the same provider
- The records attached to a physician NPI are not always stable over time (provider type changes, retirements, new practices)
SumHealth maintains a proprietary provider graph that resolves these identifiers against NPPES, CMS facility data, and state licensure records — enabling reliable provider-level rate lookups even when payer MRF identifiers are inconsistent.
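The general idea of identifier resolution can be illustrated with a union-find structure that merges identifiers known to refer to the same entity. This is a generic sketch of the technique, not a description of SumHealth's provider graph. The IdentifierGraph class, the (type, value) tuple convention, and the sample identifiers are all hypothetical.

```python
from typing import Dict, Tuple

Identifier = Tuple[str, str]  # e.g. ("NPI", "1234567893") or ("TIN", "123456789")


class IdentifierGraph:
    """Union-find over identifiers; linked identifiers share one canonical ID."""

    def __init__(self) -> None:
        self._parent: Dict[Identifier, Identifier] = {}

    def _find(self, x: Identifier) -> Identifier:
        self._parent.setdefault(x, x)
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[x] != root:
            nxt = self._parent[x]
            self._parent[x] = root
            x = nxt
        return root

    def link(self, a: Identifier, b: Identifier) -> None:
        """Record that two identifiers refer to the same entity."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            # Keep the smaller tuple as root so the canonical ID is deterministic.
            self._parent[max(ra, rb)] = min(ra, rb)

    def canonical(self, x: Identifier) -> Identifier:
        return self._find(x)


graph = IdentifierGraph()
graph.link(("NPI", "1111111111"), ("TIN", "123456789"))
graph.link(("NPI", "2222222222"), ("TIN", "123456789"))
```

After these two links, both NPIs resolve to the same canonical identifier, so rates published under either one can be attributed to a single entity. A production system would also need evidence weighting and time-aware links, which this sketch omits.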
What Good MRF Infrastructure Looks Like
Processing payer MRFs at production scale requires:
- Distributed ingestion — parallel download and decompression across cloud compute clusters
- Schema validation and normalization — detecting and correcting schema deviations before data enters the processing pipeline
- Streaming JSON parsing — traditional JSON parsers run out of memory on files exceeding a few gigabytes; streaming parsers process records incrementally
- Deduplication logic — identifying and collapsing redundant rate records across plan variations
- Provider entity resolution — mapping payer-supplied identifiers to canonical provider entities
- Incremental refresh — payers publish new MRFs monthly; the system must detect and process only changed records efficiently
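As one example from the list above, incremental refresh can be approximated by fingerprinting each rate record and comparing fingerprints between monthly snapshots. This is a minimal sketch, assuming records are already parsed into JSON-compatible dicts and that the previous month's fingerprints fit in memory. The fingerprint and diff_snapshots helpers are hypothetical names, and at real MRF scale the fingerprint set would likely live in an external store.

```python
import hashlib
import json
from typing import Iterable, List, Set, Tuple


def fingerprint(record: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(
    previous: Set[str], current_records: Iterable[dict]
) -> Tuple[List[dict], Set[str], Set[str]]:
    """Return (new or changed records, removed fingerprints, current fingerprints)."""
    seen: Set[str] = set()
    changed: List[dict] = []
    for record in current_records:
        fp = fingerprint(record)
        seen.add(fp)
        if fp not in previous:
            changed.append(record)
    removed = previous - seen
    return changed, removed, seen


last_month = {fingerprint({"npi": "1234567893", "code": "99213", "rate": 85.0})}
this_month = [
    {"code": "99213", "npi": "1234567893", "rate": 85.0},  # unchanged, keys reordered
    {"npi": "1234567893", "code": "99214", "rate": 120.0},  # new record
]
changed, removed, current = diff_snapshots(last_month, this_month)
```

Only the records in changed need to flow through the downstream pipeline, and current becomes the baseline for the next month's comparison.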
This is the infrastructure SumHealth has built over four years of working exclusively with healthcare pricing data. It is not a weekend project.
The State of MRF Data Quality in 2025
The good news: MRF data quality has improved meaningfully since 2022. CMS enforcement, combined with market pressure from employers and health tech companies using the data, has pushed payers to fix systematic errors in their files.
The bad news: the data is still far from clean, and the volume continues to grow as more provider-payer relationships are required to be disclosed. For organizations trying to use this data to make real decisions — whether in benefits design, payer contracting, or clinical cost benchmarking — the infrastructure gap between raw MRF files and actionable insights remains significant.
SumHealth bridges that gap. Our L1 structured data layer processes payer MRFs nationwide on a continuous basis, delivering clean, queryable, enriched pricing data through an API — without requiring your team to build or maintain any of the underlying infrastructure.
Ready to see the data for yourself?
SumHealth processes hospital and payer MRF rate data nationwide so you don't have to. Talk to our team about how our platform can power your pricing strategy.