The latest edition of the journal Obstetrics and Gynecology contains a commentary destined to make a splash: "Electronic Fetal Monitoring as a Public Health Screening Program: The Arithmetic of Failure" by Drs. David Grimes and Jeffrey Peipert. The article makes a bold claim:
Electronic fetal monitoring has failed as a public health screening program… Because of low-prevalence target conditions and mediocre validity, the positive predictive value of electronic fetal monitoring for fetal death in labor or cerebral palsy is near zero. Stated alternatively, almost every positive test result is wrong…
It is critical to note that the authors are not claiming that fetal monitoring is a failure, merely that electronic fetal monitoring fails to provide additional benefits over monitoring by intermittently listening to the fetal heart rate. The authors provide a breathless analysis of the causes of this purported failure, implying that basic statistical analysis made this failure easily predictable.
In my judgment, the authors commit two serious, and inexplicable, errors.
1. Although the authors provide a detailed statistical analysis of the limited ability of electronic fetal monitoring (EFM) to detect fetal death (stillbirth), such an analysis utterly misses the point. The purpose of electronic fetal monitoring is not to detect fetal death, but to prevent it. The primary purpose of fetal monitoring (whether by auscultation or electronic) is to diagnose fetal distress in progress, not to diagnose death, the end point of severe fetal distress. Curiously, the authors give short shrift to this. And since the authors virtually ignore the primary purpose of the test, their analysis, while sure to garner headlines, is not particularly compelling.
2. The authors complain that screening for rare events leads to tests with poor predictive value. Fortunately, adverse outcomes in labor are relatively rare; that's why neonatal deaths are expressed per 1,000 births. So it is no surprise that screening for poor fetal outcomes has a poor positive predictive value. But if our goal is to prevent rare events, poor predictive value is virtually inevitable.
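The prevalence effect the authors lean on is easy to see with a few lines of arithmetic. The sensitivity and specificity below (90%/90%, a far better test than EFM) are made-up round numbers for illustration, not figures from the paper:

```python
# Illustrative only: how falling prevalence drags down the positive
# predictive value (PPV) of a fixed test. The 90%/90% figures are
# invented round numbers, not taken from the Grimes and Peipert paper.

def ppv(sensitivity, specificity, prevalence):
    """Bayes' rule: fraction of positive tests that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prevalence in (0.10, 0.01, 0.001, 0.0005):
    print(f"prevalence {prevalence:7.2%} -> PPV {ppv(0.90, 0.90, prevalence):6.2%}")
```

Even this hypothetical 90%-accurate test is wrong more than 99 times out of 100 once the condition occurs in only 1 of 2,000 patients; the test hasn't changed, only the rarity of what it screens for.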
The authors explain the nature of screening tests and the measurements that determine the validity of a screening test, including positive predictive value, negative predictive value and the impact of prevalence. I performed a similar analysis in a post written 2 years ago (Sensitivity, specificity and fetal monitoring). I used round numbers to illustrate the concept, and it may be helpful to read my post before reading the actual paper.
The key finding of the Grimes and Peipert paper is this:
Here, electronic fetal monitoring is assumed to have a sensitivity of 57% and specificity of 69%,7 and the prevalence of fetal death is low: 50 per 100,000… [T]he predictive value of a positive electronic fetal monitoring screen [is] 29/31,013, which rounds off to zero percent. Because of poor test specificity, more than 30,000 false-positive tests … overwhelm fewer than 30 true-positive results … Given a worrisome tracing, the probability of fetal death is, rounded to percent, nil.
In other words, if EFM is used to predict which babies will definitely die, only about 1 in 1,000 babies with a positive test will actually die. That seems compelling until you consider that EFM is not used to identify babies who will definitely die; it is used to identify babies who are not getting enough oxygen and therefore may suffer permanent brain damage or die. As the authors briefly acknowledge in what is virtually an aside, EFM performs very differently in that situation.
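The quoted arithmetic is straightforward to reproduce. This sketch uses only the figures in the quoted passage (sensitivity 57%, specificity 69%, prevalence of fetal death 50 per 100,000); the function and variable names are my own:

```python
# Reproducing the paper's "arithmetic of failure" for fetal death in labor.
# All inputs come from the quoted passage; nothing here is new data.

def screening_counts(sensitivity, specificity, prevalence, cohort=100_000):
    """Return (true positives, false positives, PPV) for a screened cohort."""
    affected = cohort * prevalence
    unaffected = cohort - affected
    true_pos = affected * sensitivity           # sick babies flagged
    false_pos = unaffected * (1 - specificity)  # healthy babies flagged
    return true_pos, false_pos, true_pos / (true_pos + false_pos)

tp, fp, ppv = screening_counts(sensitivity=0.57, specificity=0.69,
                               prevalence=50 / 100_000)
print(f"true positives ~ {tp:.1f}, false positives ~ {fp:.1f}, PPV ~ {ppv:.3%}")
```

Run as written, this yields roughly 29 true positives drowned out by roughly 31,000 false positives, a PPV of about 0.09%: the "rounds off to zero percent" in the quotation.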
More common but less serious, fetal acidemia at birth [as a result of low oxygen in labor] may provide the most charitable assessment of electronic fetal monitoring. In a large randomized controlled trial with a frequency of fetal acidemia at birth (umbilical cord artery pH less than 7.15) of 10%, nonreassuring fetal heart rate patterns had a positive predictive value of 37%.13 Even for this common outcome, most positive tests were wrong.
Yes, the majority of babies identified as suffering from oxygen deprivation turn out to be fine, but 37 out of 100 (more than 1/3) are suffering from oxygen deprivation so severe that it may result in brain damage or death. That’s a number too large to ignore.
For perspective, it helps to consider a real world example, like mammography. The positive predictive value of mammograms is low. Most abnormal findings on mammography turn out to be benign. The positive predictive value for screening mammography in detecting breast cancer is in the range of 10%, considerably less than the PPV for electronic fetal monitoring in detecting oxygen deprivation (37%).
Moreover, routine mammographic screening of women under 50 saves only 1 life per 1,400 women screened. That's a PPV for preventing death of 0.07%, nearly zero using the methodology that Grimes and Peipert applied to EFM. Nonetheless, the recent recommendation to suspend routine screening of women under 50 met with a firestorm of protest.
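As a quick sanity check on the mammography comparison (the 1-in-1,400 figure comes from the paragraph above, not from an independent source):

```python
# 1 life saved per 1,400 women screened, expressed as a percentage,
# rounded the same way Grimes and Peipert round EFM's PPV to "zero".
lives_saved_fraction = 1 / 1400
print(f"{lives_saved_fraction:.2%}")  # prints "0.07%"
```

By the paper's rounding convention, that 0.07% would likewise "round off to zero percent," yet nobody concluded from it that mammography should be abandoned.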
The bottom line is that obstetricians are well aware of the serious limitations of electronic fetal monitoring. For every neonatal life saved, for every case of brain damage averted, hundreds if not thousands of monitoring strips falsely predict fetal oxygen deprivation. The issue is not whether fetal monitoring is a good screening test; everyone knows that it is a bad screening test. The problem is that there is no screening test that’s better.
The question we face is not whether EFM is highly effective, the question is whether EFM is worth it. That’s an ethical issue, not an arithmetic one.