Endometriosis fertility index: the new, validated endometriosis staging system
G. David Adamson, M.D. and David J. Pasta, M.S.
Fertility Physicians of Northern California, Palo Alto and San Jose, California
Objective: To develop a clinical tool that predicts pregnancy rates (PRs) in patients with surgically documented endometriosis who attempt non-IVF conception.
Design: Prospective data collection on 579 patients and comprehensive statistical analysis to derive a new staging system—the endometriosis fertility index (EFI)—from data rather than a priori assumptions, followed by testing the EFI prospectively on 222 additional patients for correlation of predicted and actual outcomes.
Setting: Private reproductive endocrinology practice.
Patient(s): A total of 801 consecutively diagnosed and treated infertile patients with endometriosis.
Intervention(s): Surgical diagnosis and treatment followed by non-IVF fertility management.
Main Outcome Measure(s): The EFI and life table PRs.
Result(s): A statistically significant variable used to create the EFI was the least function score (i.e., the sum of those scores determined intraoperatively after surgical intervention that describe the function of the tube, fimbria, and ovary on both sides). Sensitivity analysis showed that the EFI varies little, even with variation in the assignment of functional scores, and predicted PRs.
Conclusion(s): The EFI is a simple, robust, and validated clinical tool that predicts PRs after endometriosis surgical staging. Its use provides reassurance to those patients with good prognoses and avoids wasted time and treatment for those with poor prognoses. (Fertil Steril 2010;94:1609–15. ©2010 by American Society for Reproductive Medicine.)
Key Words: Endometriosis, fertility, index, infertility, surgery, predict, prognosis, statistics, staging, laparoscopy
Endometriosis remains an enigmatic disease. Our continued frustration in staging its clinical presentation and impact on associated pain and infertility reduces our ability to ameliorate its effect on millions of women. There are important reasons to stage endometriosis, or any other disease: to create a common language, to enable specificity of diagnosis, to standardize comparisons, and to facilitate research applications. This study presents a staging system that meets most of these requirements and has been validated as clinically useful for surgically confirmed patients with endometriosis attempting non-IVF conception.
Sampson, Acosta et al., and many other investigators (1–4) developed staging systems that have all been criticized for multiple reasons, including their inability to predict clinical outcomes, especially pregnancy rates (PRs) in infertile patients. In 1979, the American Fertility Society (AFS) (now the American Society for Reproductive Medicine, or ASRM) first proposed a classification system (5). This was extensively evaluated, modified in 1985, and is still used today (6–9). Despite these revisions the currently used revised AFS system has serious limitations, including not effectively predicting the outcome of treatment (10–17). Because an endometriosis classification system that effectively predicts outcomes has eluded scientists for decades, we chose a different approach: collect clinical data prospectively, assess infertility outcomes, and use comprehensive statistical analysis to derive a new staging system from the data rather than from a priori assumptions. The new staging system could then be validated prospectively and potentially be modified.
The purpose of this study was to develop a clinical tool, the endometriosis fertility index (EFI), that predicts PRs in patients with surgically documented endometriosis who attempt non-IVF conception.
MATERIALS AND METHODS
Since 1984 data have been prospectively collected at the time of surgery on a standardized form used in the clinical care of all our surgical patients. The Western Institutional Review Board (IRB) determined that this research project met the condition for exemption under 45 CFR 46.101(b)(4). The prospectively collected detailed clinical and surgical data on 579 consecutive infertile patients with endometriosis were used to create a database with 275 variables. The data were analyzed by sophisticated statistical methods, including life table survival and Cox proportional hazards regression analysis, to identify those factors most predictive of pregnancy. Patients were censored from the study when they were lost to follow-up, became pregnant, had subsequent surgery for endometriosis, took ovarian suppression medications, or underwent assisted reproductive technologies (ART). Preliminary analyses addressed the importance of groups of variables for predicting pregnancy and then evaluated alternative ways of combining the variables within groups. The main groups of variables were historical factors, results of hysteroscopy, and results of abdominal surgery. Subsequent analyses combined the most predictive variables and established a simple scoring system, the EFI. After development of the EFI, the same data were prospectively collected on 222 additional consecutive patients, the EFI calculated on each patient, and actual PRs compared with predicted rates.
The historical factors evaluated in preliminary analyses included age, duration of infertility, and pregnancy history, which repeatedly have been shown to be predictive of pregnancy (17). Many additional historical factors were evaluated, including factors relating to the male partner, previous endometriosis treatment, and results of diagnostic tests.
The variables documenting the results of hysteroscopy were investigated using variable clustering and then Cox proportional hazards regression. The hysteroscopy variables were assessed alone and as potential supplements to historical factors. The results of abdominal surgery were recorded in substantial detail, allowing for the comparison of three prospective operative coding systems: (1) revised American Fertility Society total, lesion, adhesion, and cul-de-sac scores; (2) percentage of filmy and dense adhesions on the ovaries and tubes bilaterally; and (3) intraoperative pretreatment and post-treatment functional score. The functional score was determined by the surgeon for each of the tube, fimbria, and ovary bilaterally, where 0 = absent or nonfunctional; 1, 2, and 3 = severe, moderate, and mild dysfunction, respectively; and 4 = normal with respect to the capacity of the organ/structure to effect its purpose in the reproductive process (Table 1). Thus, the functional score measures the ability of the tube to move over the ovary, to be the passage for the sperm from the uterus, to provide the early environment for the egg and embryo, and to enable transport of the embryo to the uterus; the ability of the fimbria to move over the ovary and to pick up an egg; and the ability of the ovary to house eggs, develop follicles, ovulate eggs, and allow them to be picked up by the fimbria. These three intraoperative scoring systems were considered supplements to the historical factors that predicted PRs: age, duration of infertility, and pregnancy history.
Many combinations of the functional scores were evaluated systematically by Cox proportional hazards regression. Scores for the tube, fimbria, and ovary were combined by summing or taking the minimum, separately by side and for both sides combined. The sides were combined by taking the sum or the maximum. One such composite score was a “least function score”: the sum of the lowest function score on each side from among the fallopian tube, fimbria, and ovary. A score of 4 could be obtained on one side only if the tube, fimbria, and ovary each were entirely normal; therefore, each received the maximum functional score of 4. A score of 0 on one side could be obtained if the tube was absent, obstructed proximally, completely fibrotic, or completely encased in dense adhesions; if the fimbria was involved in a hydrosalpinx, was completely fibrotic, or was separated from the ovary by dense adhesions that had not been removed; or if the ovary was surgically or otherwise absent, or completely encased in dense adhesions such that an egg could not enter the fallopian tube. In the presence of any of these conditions, the adnexa on that side of the patient would have essentially no chance of creating a pregnancy. Because pregnancy requires the functioning of all three structures (tube, fimbria, and ovary), the lowest score of those three structures determines the ability of that side to function effectively. The total least function score is obtained by adding the lowest score from the right side to the lowest score from the left side to give a combined total of potential for reproductive function in the pelvis. A completely normal pelvis would have a score of 4 + 4 = 8 and have excellent reproductive potential. A completely nonfunctional pelvis with no chance of reproductive potential would have a score of 0 + 0 = 0. If the ovary is absent on one side, all the ovulations will occur from the ovary on the other side. Therefore, in this situation, the least function score is obtained by determining the function score on the side with the ovary and then doubling it.
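The least function score rule described above can be illustrated with a minimal Python sketch. The `Side` class, the `ovary_absent` flag, and the function name are hypothetical names introduced only for illustration. Returning 0 when both ovaries are absent is an assumption, since the text covers only the case of one absent ovary.

```python
from dataclasses import dataclass


@dataclass
class Side:
    """Functional scores (0-4) for one adnexa; names are illustrative only."""
    tube: int
    fimbria: int
    ovary: int
    ovary_absent: bool = False  # hypothetical flag: ovary surgically or otherwise absent

    def lowest(self) -> int:
        scores = (self.tube, self.fimbria, self.ovary)
        if any(s not in range(5) for s in scores):
            raise ValueError("functional scores must be integers from 0 to 4")
        # The weakest structure limits the function of the whole side.
        return min(scores)


def least_function_score(left: Side, right: Side) -> int:
    """Sum of the lowest score on each side, doubling the side that has an
    ovary when the other ovary is absent."""
    if left.ovary_absent and right.ovary_absent:
        return 0  # assumption: no ovary on either side means no function
    if left.ovary_absent:
        return 2 * right.lowest()
    if right.ovary_absent:
        return 2 * left.lowest()
    return left.lowest() + right.lowest()


normal = least_function_score(Side(4, 4, 4), Side(4, 4, 4))
mixed = least_function_score(Side(3, 2, 4), Side(4, 1, 4))
one_ovary = least_function_score(Side(4, 4, 0, ovary_absent=True), Side(4, 3, 4))
```

In this sketch a normal pelvis scores 4 + 4, the mixed case scores 2 + 1, and the one-ovary case doubles the lowest score (3) on the side that still has an ovary.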
The presence of an endometrioma would reduce the “AFS endometriosis score” to 0 and also potentially reduce the least function score and the “AFS total score”.
The postoperative treatments were based on the clinical situation of the patient. Generally, patients attempted on their own for 3–9 months, then had clomiphene citrate (CC) 100 mg/d from day 3 through day 7 plus IUI for 2–6 cycles; a very few had gonadotropins plus IUI treatment for 1–4 cycles. Younger patients tended to have longer intervals at each treatment level. The final EFI life table curves are based on these generally accepted clinical treatment paradigms, and therefore are the closest to actual clinical practice.
To create the EFI the statistically significant variables were assigned a whole number of points. For continuous variables, alternative cutoffs were systematically evaluated to maximize the explanatory power of the index while maintaining simplicity and clinical relevance. Once a tentative EFI was developed, omitted variables (including alternative forms of variables included in the EFI) were tested to determine whether they had additional statistically significant explanatory power.
Once the data were analyzed to determine the validity of the EFI in predicting PRs prospectively in the 222 patients, the final PRs were subsequently derived using life table methods from the total of 801 patients.
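As a rough illustration of how censored follow-up enters a cumulative PR estimate, the sketch below uses a product-limit (Kaplan-Meier) estimator. This is a stand-in and not the BMDP1L life table procedure used in the study. The function name and the `(months, pregnant)` input layout are assumptions made for the example, and the follow-up values are invented solely to exercise the code.

```python
def cumulative_pregnancy_rate(follow_up, horizon):
    """Product-limit estimate of the proportion pregnant by `horizon` months.

    follow_up: list of (months, pregnant) pairs; pregnant=False means the
    patient was censored at that time (e.g., lost to follow-up or moved to ART).
    """
    # Distinct times at which at least one pregnancy occurred, up to the horizon.
    event_times = sorted({t for t, pregnant in follow_up if pregnant and t <= horizon})
    not_pregnant = 1.0
    for t in event_times:
        # Patients still under observation at time t (censored at t counts as at risk).
        at_risk = sum(1 for u, _ in follow_up if u >= t)
        events = sum(1 for u, p in follow_up if p and u == t)
        not_pregnant *= 1 - events / at_risk
    return 1 - not_pregnant


# Invented follow-up data, for illustration only.
example = [(3, True), (6, False), (6, True), (12, False)]
by_3_months = cumulative_pregnancy_rate(example, 3)
by_12_months = cumulative_pregnancy_rate(example, 12)
```

With this toy input the estimate is about 0.25 by 3 months and about 0.5 by 12 months. Censored patients contribute to the at-risk denominator only up to the time they leave observation.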
Statistical analyses were performed using SAS (SAS Institute, Inc., Cary, NC) and BMDP (BMDP Statistical Software, Inc., Los Angeles, CA). Variable clustering was performed using the VARCLUS procedure in SAS. Cox proportional hazards regression was performed using BMDP2L and life tables using BMDP1L. Two-tailed P values less than .05 were considered statistically significant.
RESULTS
Characteristics of the 579 patients used to develop the index can be found elsewhere (17). The evaluation of historical factors showed that age, years infertile, and various alternative measures of pregnancy history were all statistically significant predictors of pregnancy. Among the measures of pregnancy history, total pregnancies, at least one pregnancy, and pregnancy with current partner were all predictive. For years infertile, a dichotomization at 3 years of infertility encompassed most of the explanatory power.
Variable clustering and Cox proportional hazards regression of the hysteroscopic findings revealed that only 1 of 9 (11%) patients with a large uterus became pregnant, compared with 4 of 13 (31%) with a small uterus, and 169 of 348 (49%) with a normal-sized uterus. The poor prognosis associated with a large uterus may be artifactual but remained statistically significant, even after controlling for other factors.
Evaluation of the abdominal surgery variables included variable clustering. Regardless of whether the uterus was normal or abnormal at laparoscopy, there was essentially no correlation between laparoscopic and hysteroscopic findings. Adhesions and lesions were highly correlated with AFS total score. After controlling for AFS total score and years infertile, none of the laparoscopic cluster scores were associated with fertility.
Unlike the cluster scores from the surgical variables as a whole, the least function score determined intraoperatively after surgical intervention was a statistically significant predictor of fertility, even after controlling for AFS total score and years infertile. The predictive power of the least function score after controlling for the AFS total score and years infertile demonstrates that the least function score measures something different from the AFS total score, presumably the postoperative functionality of the reproductive organs. There was high correlation between dense adhesions, especially tubal adhesions, and the least function score. There was moderate correlation between filmy adhesions alone and the least function score.
Cox proportional hazards analysis of all the historical, hysteroscopy, and laparoscopy findings showed that the only variables that achieved statistical significance were duration of infertility, prior pregnancy with partner, least function score (all P < .01), and uterine abnormality (P = .04). Any prior pregnancy predicted about as well as prior pregnancy with partner, so any prior pregnancy was selected because it was expected to be more broadly applicable. These variables, together with alternative pregnancy history variables and various AFS scores, were considered for creation of a numerically simple EFI. The final score uses age (in three categories), years infertile (in two categories), prior pregnancy (whether or not with the present partner), the least function score (in three categories), the AFS endometriosis lesion score (in two categories), and the AFS total score (in two categories). Details are given in Figure 1.
The EFI score ranges from 0–10, with 0 representing the poorest prognosis and 10 the best prognosis. Half of the points come from the historical factors and half from the surgical factors. Uterine abnormality was not included in the score.
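The additive structure of the score can be sketched in Python as follows. The prose here gives only the number of categories per factor, the 3-year dichotomization, the lesion and total score cutoffs of 16 and 71, and the 0–10 range. The individual point values and the age and least function category boundaries below are assumptions based on the published EFI form (Figure 1, not reproduced in this text) and should be checked against that figure before any use. The function and parameter names are hypothetical.

```python
def efi_score(age_years, years_infertile, prior_pregnancy,
              least_function, afs_lesion_score, afs_total_score):
    """Illustrative EFI calculation; point values are assumptions (see lead-in)."""
    # Historical factors: up to 5 points.
    if age_years <= 35:
        age_pts = 2
    elif age_years <= 39:
        age_pts = 1
    else:
        age_pts = 0
    duration_pts = 2 if years_infertile <= 3 else 0
    pregnancy_pts = 1 if prior_pregnancy else 0

    # Surgical factors: up to 5 points.
    if least_function >= 7:
        function_pts = 3
    elif least_function >= 4:
        function_pts = 2
    else:
        function_pts = 0
    lesion_pts = 1 if afs_lesion_score < 16 else 0
    total_pts = 1 if afs_total_score < 71 else 0

    return (age_pts + duration_pts + pregnancy_pts
            + function_pts + lesion_pts + total_pts)


best = efi_score(30, 2, True, 8, 4, 10)
worst = efi_score(41, 5, False, 2, 20, 80)
middle = efi_score(37, 2, False, 5, 20, 40)
```

Under these assumed values the three example patients score 10, 0, and 6, respectively, and the least function score contributes at most 3 of the 10 points.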
The prospective testing of the EFI on 222 additional patients showed a good correlation of predicted and actual outcomes for all stages of endometriosis. The distributions of the EFI score for the original 579 patients and the validation sample of 222 are shown in Figure 2.
The life table estimates of the cumulative percentage pregnant at each year end, by EFI score, are given in Table 2 for all 801 patients and for the 222 validation patients. A simplified figure showing the estimated cumulative percent pregnant by 3-month interval by EFI score, suitable for presentation to patients, is shown at the bottom of Figure 1.
A sensitivity analysis was performed to assess the effect on the EFI of potential differences in the assignment of the least function scores by different surgeons. For each function score as coded, a replacement score was calculated according to a probability distribution. It was assumed that extreme scores (0 or 4) would be reproduced 90% of the time and that the other 10% of the time scores would be reproduced within 1 point of the original. Other scores (1–3) were assumed to be reproduced 80% of the time, to be 1 point lower 10% of the time, and to be 1 point higher 10% of the time. Based on this simulation, the least function score itself changed 50% of the time (8% higher and 42% lower). The EFI, which changes only when the least function score category changes, only changed about 15% of the time: 4% of the time higher and 11% of the time lower. The EFI changed by more than 1 point in only 5.4% of the cases. In practice, changes in the EFI are material only for the middle values, and, knowing that the index tends to change downward, and only slightly, with uncertainty in the least function scores allows this variability to be taken into account clinically.
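The replacement-score model described above can be sketched as a small Monte Carlo simulation. The function names, the six-score tuple layout (left tube, fimbria, ovary, then right), and the example patient are assumptions for illustration only. The sketch does not use the study data and will not reproduce the reported percentages.

```python
import random


def perturb(score, rng):
    """Draw a replacement for one coded functional score (0-4)."""
    r = rng.random()
    if score == 0:              # extreme scores: reproduced 90% of the time
        return 0 if r < 0.9 else 1
    if score == 4:
        return 4 if r < 0.9 else 3
    if r < 0.8:                 # middle scores: reproduced 80% of the time
        return score
    return score - 1 if r < 0.9 else score + 1  # 10% lower, 10% higher


def least_function(scores):
    """scores = (L tube, L fimbria, L ovary, R tube, R fimbria, R ovary)."""
    return min(scores[:3]) + min(scores[3:])


def fraction_changed(patients, n_rep=2000, seed=0):
    """Share of simulated re-codings in which the least function score changes."""
    rng = random.Random(seed)
    changed = total = 0
    for _ in range(n_rep):
        for scores in patients:
            replaced = [perturb(s, rng) for s in scores]
            changed += least_function(replaced) != least_function(scores)
            total += 1
    return changed / total


# One hypothetical patient with an entirely normal pelvis.
share = fraction_changed([(4, 4, 4, 4, 4, 4)])
```

Because the least function score takes the minimum on each side, a single downward re-coding of any structure lowers it, which is consistent with the reported tendency of the score to move downward more often than upward.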
DISCUSSION
More than two decades of clinical data have been used to develop a clinical tool that predicts an infertile endometriosis patient’s probability of pregnancy with standard, non-IVF treatment after surgical staging. The EFI is useful only for infertility patients who have had surgical staging of their disease. It is not intended to predict any aspect of endometriosis-associated pain. It is required that the male and female gametes are sufficiently functional to enable attempts at non-IVF conception. One factor found to predict pregnancy that is not included in the EFI is uterine abnormality. Severe uterine abnormality that is clinically significant was omitted because it is so uncommon in infertile patients with endometriosis. However, when this condition is found, it does need to be taken into account in predicting PRs. Deficiencies in the reproductive function of the gametes or uterus will obviously affect the prognosis and must be considered separately as fertility factors, just as they would with any patient with any other type of disease.
The postoperative least function score is central to the EFI. It has predictive power even after controlling for the AFS total score and years infertile, although there is some association with AFS scores. This finding is consistent with the perspectives that adhesions reduce the ability of the fallopian tubes to function and that dense adhesions, especially ovarian, cul-de-sac obliteration, and endometriomas, also contribute to infertility (18). This relationship between adhesions and the least function score persists, although the least function score is determined postsurgically, because it is more difficult to achieve a good surgical result with disease that is initially more severe.
The AFS endometriosis lesion scores of 16 and 71 were important cutoff points in our calculations. To obtain a lesion score of 16, a patient must have an endometrioma or complete cul-de-sac obliteration, both severe forms of disease. An AFS score of 71 or greater represents extensive endometriosis and may be a stage important to recognize beyond the severe category (19).
A criticism can be made that the least function score is subjective for any given surgeon and for different surgeons. Although true, the least function score in fact is a robust measure of pelvic reproductive potential because the categories are fairly clear, any subjective differences in assessment tend to be averaged through the calculations on one side and then the other, and the least function score represents only 30% of the EFI. Sensitivity analysis showed that even with substantial variation in the assignment of functional scores the EFI varies very little. To provide some clinical guidance, examples of adnexa scored as 1, 2, or 3 are presented (Fig. 3). Improvements in imaging, technology, ovarian reserve testing, and sperm assessment could potentially affect PRs predicted by the EFI, but not likely as much as capabilities of individual surgeons.
In conclusion, the EFI is a simple, robust, and validated clinical tool that predicts PRs for patients after surgical staging of endometriosis. The EFI is very useful in developing treatment plans in infertile patients with endometriosis. It is hoped that further prospective validation by other clinical investigators will encourage widespread application of the EFI to the benefit of our patients. Further efforts are required to develop similar staging systems that will help predict outcomes for patients with endometriosis and pelvic pain for both surgical and nonsurgical treatment.
REFERENCES
1. Sampson JA. Perforating hemorrhagic (chocolate) cysts of ovary. Their importance and especially their relation to pelvic adenomas of endometriotic type (“adenomyoma” of the uterus, rectovaginal septum etc.). Arch Surg 1921;3:245–61.
2. Acosta AA, Buttram VC Jr, Besch PK, Malinak LR, Franklin RR, Vanderheyden JD. A proposed classification of pelvic endometriosis. Obstet Gynecol 1973;42:19–25.
3. Kistner RW, Siegler AM, Behrman SJ. Suggested classification for endometriosis: relationship to infertility. Fertil Steril 1977;28:1008–10.
4. Buttram VC Jr. An expanded classification of endometriosis. Fertil Steril 1978;30:240–2.
5. American Fertility Society. Classification of endometriosis. Fertil Steril 1979;32:633–4.
6. Guzick DS, Bross DS, Rock JA. Assessing the efficacy of the American Fertility Society’s classification of endometriosis: application of a dose–response methodology. Fertil Steril 1982;38:171–6.
7. Adamson GD, Frison L, Lamb EJ. Endometriosis: studies of a method for the design of a surgical staging system. Fertil Steril 1982;38:659–66.
8. Buttram VC Jr. Evolution of the revised American Fertility Society classification of endometriosis. Fertil Steril 1985;43:347–50.
9. American Fertility Society. Revised American Fertility Society classification of endometriosis: 1985. Fertil Steril 1985;43:351–2.
10. Stripling MC, Martin DC, Chatman DL, Zwaag RV, Poston WM. Subtle appearance of pelvic endometriosis. Fertil Steril 1988;49:427–31.
11. Candiani GB, Vercellini P, Fedele L. Laparoscopic ovarian puncture for correct staging of endometriosis. Fertil Steril 1990;53:994–7 [comment: 54:1186–8].
12. Vercellini P, Vendola N, Bocciolone L, Rognoni MT, Carinelli SG, Candiani GB. Reliability of the visual diagnosis of ovarian endometriosis. Fertil Steril 1991;56:1198–200 [comment: 1992;58:221–2, discussion: 223–4; comment: 1992;58:222, discussion: 223–4].
13. Canis M, Bouquet De Jolinieres J, Wattiez A, Pouly JL, Mage G, Manhes H, et al. Classification of endometriosis. Baillieres Clin Obstet Gynaecol 1993;7:759–74.
14. Hornstein MD, Gleason RE, Orav J, Haas ST, Friedman AJ, Rein MS, et al. The reproducibility of the revised American Fertility Society classification of endometriosis. Fertil Steril 1993;59:1015–21.
15. Rock JA. The revised American Fertility Society classification of endometriosis: reproducibility scoring. ZOLADEX Endometriosis Study Group. Fertil Steril 1995;63:1108–10.
16. Wiegerinck MA, Van Dop PA, Brosens IA. The staging of peritoneal endometriosis by the type of active lesion in addition to the revised American Fertility Society classification. Fertil Steril 1993;60:461–4.
17. Adamson GD, Hurd SJ, Pasta DJ, Rodriguez BD. Laparoscopic endometriosis treatment: is it better? Fertil Steril 1993;59:35–44.
18. Adamson GD, Subak LL, Pasta DJ, Hurd SJ, von Franque O, Rodriguez BD. Comparison of CO2 laser laparoscopy with laparotomy for treatment of endometriomata. Fertil Steril 1992;57:965–73.
19. Canis M, Pouly JL, Wattiez A, Manhes H, Mage G, Bruhat MA. Incidence of bilateral adnexal disease in severe endometriosis (revised American Fertility Society [AFS], stage IV): should a stage V be included in the AFS classification? Fertil Steril 1992;57:691–2.