PUBMED FOR HANDHELDS
Search MEDLINE/PubMed
Title: Test-retest reproducibility of a deep learning-based automatic detection algorithm for the chest radiograph.
Author: Kim H, Park CM, Goo JM.
Journal: Eur Radiol; 2020 Apr; 30(4):2346-2355. PubMed ID: 31900698.
Abstract:
OBJECTIVES: To perform test-retest reproducibility analyses for a deep learning-based automatic detection algorithm (DLAD) using two stationary chest radiographs (CRs) obtained at short-term intervals, to analyze factors influencing test-retest variation, and to investigate the robustness of DLAD to simulated post-processing and positional changes.
METHODS: This retrospective study included patients with pulmonary nodules resected in 2017. Preoperative CRs without interval changes were used. Test-retest reproducibility was analyzed in terms of median differences of abnormality scores, intraclass correlation coefficients (ICC), and 95% limits of agreement (LoA). Factors associated with test-retest variation were investigated using univariable and multivariable analyses. Shifts in classification between the two CRs were analyzed using pre-determined cutoffs. Radiograph post-processing (blurring and sharpening) and positional changes (translations along the x- and y-axes, rotation, and shearing) were simulated, and agreement of abnormality scores between the original and simulated CRs was investigated.
RESULTS: The study analyzed 169 patients (median age, 65 years; 91 men). The median difference of abnormality scores was 1-2%, and ICC ranged from 0.83 to 0.90. The 95% LoA was approximately ±30%. Test-retest variation was negatively associated with solid portion size (β, -0.50; p = 0.008) and good nodule conspicuity (β, -0.94; p < 0.001). A small fraction (15/169) showed discordant classifications when the high-specificity cutoff (46%) was applied to the model outputs (p = 0.04). DLAD was robust to the simulated positional changes (ICC, 0.984, 0.996) but relatively less robust to post-processing (ICC, 0.872, 0.968).
CONCLUSIONS: DLAD was robust to test-retest variation. However, inconspicuous nodules may cause fluctuations in the model output and subsequent misclassifications.
KEY POINTS:
• The deep learning-based automatic detection algorithm was generally robust to test-retest variation of chest radiographs.
• Test-retest variation was negatively associated with solid portion size and good nodule conspicuity.
• The high-specificity cutoff (46%) resulted in discordant classifications in 8.9% of cases (15/169; p = 0.04) between the test-retest radiographs.
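The reported agreement metrics (median score difference, ICC, and Bland-Altman 95% LoA) can be computed directly from paired abnormality scores. The sketch below is a minimal illustration, not the authors' code: the score arrays are hypothetical per-patient DLAD outputs, and ICC(2,1) (two-way random effects, absolute agreement, single measurement) is assumed, since the abstract does not state which ICC form was used.

```python
# Minimal sketch (assumed, not the study code): test-retest agreement metrics
# for paired abnormality scores. `scores_test` / `scores_retest` are
# hypothetical per-patient DLAD outputs in percent.
import numpy as np

def icc_2_1(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape                       # subjects, raters (here k = 2 readouts)
    subj_means = data.mean(axis=1)          # per-subject means
    rater_means = data.mean(axis=0)         # per-readout means
    grand = data.mean()
    # ANOVA mean squares (Shrout & Fleiss)
    msr = k * np.sum((subj_means - grand) ** 2) / (n - 1)        # rows (subjects)
    msc = n * np.sum((rater_means - grand) ** 2) / (k - 1)       # columns (readouts)
    sse = np.sum((data - subj_means[:, None] - rater_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                               # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def agreement(x, y):
    """Median difference and Bland-Altman 95% limits of agreement."""
    diff = np.asarray(y, float) - np.asarray(x, float)
    bias, sd = diff.mean(), diff.std(ddof=1)
    return np.median(diff), (bias - 1.96 * sd, bias + 1.96 * sd)

rng = np.random.default_rng(0)
scores_test = rng.uniform(0, 100, size=169)                # simulated scores, not study data
scores_retest = scores_test + rng.normal(0, 10, size=169)

print("ICC(2,1):", icc_2_1(scores_test, scores_retest))
print("median diff, 95% LoA:", agreement(scores_test, scores_retest))
```

ICC(2,1) with absolute agreement is a common choice for test-retest designs because it penalizes systematic offsets between the two acquisitions, not only poor correlation.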
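The robustness experiment re-scores each radiograph after blurring, sharpening, translation, rotation, and shearing. The sketch below is an assumed illustration using scipy.ndimage (the abstract does not specify the implementation); the parameter values and the placeholder image are hypothetical.

```python
# Minimal sketch (assumed, not the study pipeline): the kinds of post-processing
# and positional perturbations described in the abstract, applied to a chest
# radiograph represented as a 2-D float array.
import numpy as np
from scipy import ndimage

def blur(img, sigma=2.0):
    return ndimage.gaussian_filter(img, sigma=sigma)

def sharpen(img, sigma=2.0, amount=1.0):
    # Unsharp masking: add back the high-frequency residual.
    return img + amount * (img - ndimage.gaussian_filter(img, sigma=sigma))

def translate(img, dx=5, dy=5):
    return ndimage.shift(img, shift=(dy, dx), order=1, mode="nearest")

def rotate(img, degrees=2.0):
    return ndimage.rotate(img, angle=degrees, reshape=False, order=1, mode="nearest")

def shear(img, factor=0.05):
    # Shear via a 2x2 affine (output-to-input) mapping matrix.
    matrix = np.array([[1.0, 0.0], [factor, 1.0]])
    return ndimage.affine_transform(img, matrix, order=1, mode="nearest")

img = np.random.default_rng(0).random((256, 256))          # placeholder radiograph
perturbed = [blur(img), sharpen(img), translate(img), rotate(img), shear(img)]
```

Agreement between the model's scores on the original and perturbed images would then be summarized with the same ICC/LoA machinery as above.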