Pubmed for Handhelds
Title: Assessing agreement between multiple raters with missing rating information, applied to breast cancer tumour grading.
Author: Fanshawe TR, Lynch AG, Ellis IO, Green AR, Hanka R.
Journal: PLoS One; 2008 Aug 13; 3(8):e2925.
PubMed ID: 18698346.
Abstract:
BACKGROUND: We consider the problem of assessing inter-rater agreement when there are missing data and a large number of raters. Previous studies have shown only 'moderate' agreement between pathologists in grading breast cancer tumour specimens. We analyse a large but incomplete data-set consisting of 24,177 grades, on a discrete 1-3 scale, provided by 732 pathologists for 52 samples.
METHODOLOGY/PRINCIPAL FINDINGS: We review existing methods for analysing inter-rater agreement for multiple raters and demonstrate two further methods. Firstly, we examine a simple non-chance-corrected agreement score based on the observed proportion of agreements with the consensus for each sample, which makes no allowance for missing data. Secondly, treating grades as lying on a continuous scale representing tumour severity, we use a Bayesian latent trait method to model cumulative probabilities of assigning grade values as functions of the severity and clarity of the tumour and of rater-specific parameters representing boundaries between grades 1-2 and 2-3. We simulate from the fitted model to estimate, for each rater, the probability of agreement with the majority. Both methods suggest that there are differences between raters in terms of rating behaviour, most often caused by consistent over- or under-estimation of the grade boundaries, and also considerable variability in the distribution of grades assigned to many individual samples. The Bayesian model addresses the tendency of the agreement score to be biased upwards for raters who, by chance, see a relatively 'easy' set of samples.
CONCLUSIONS/SIGNIFICANCE: Latent trait models can be adapted to provide novel information about the nature of inter-rater agreement when the number of raters is large and there are missing data. In this large study there is substantial variability between pathologists and uncertainty in the identity of the 'true' grade of many of the breast cancer tumours, a fact often ignored in clinical studies.
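
The first method in the abstract, a non-chance-corrected agreement score, can be illustrated with a small sketch. This is not the authors' code. It assumes that the consensus for a sample is the modal grade among the raters who graded it, that ties are broken toward the lower grade, that a rater's own grade counts toward the consensus, and that missing grades are simply skipped. The function names and the toy data are hypothetical.

```python
from collections import Counter


def consensus_grades(ratings):
    """Modal grade per sample, ignoring missing (None) entries.

    Assumption: ties are broken toward the lower grade.
    """
    n_samples = len(next(iter(ratings.values())))
    consensus = []
    for j in range(n_samples):
        observed = [grades[j] for grades in ratings.values() if grades[j] is not None]
        if not observed:
            consensus.append(None)  # nobody graded this sample
            continue
        counts = Counter(observed)
        top = max(counts.values())
        consensus.append(min(g for g, c in counts.items() if c == top))
    return consensus


def agreement_scores(ratings):
    """Per-rater proportion of graded samples that match the consensus."""
    consensus = consensus_grades(ratings)
    scores = {}
    for rater, grades in ratings.items():
        pairs = [(g, c) for g, c in zip(grades, consensus)
                 if g is not None and c is not None]
        scores[rater] = sum(g == c for g, c in pairs) / len(pairs) if pairs else None
    return scores


# Toy data (hypothetical): four raters, four samples, grades 1-3, None = not graded.
ratings = {
    "A": [1, 2, 3, None],
    "B": [1, 2, 2, 3],
    "C": [2, 2, 3, 3],
    "D": [1, None, 3, 1],
}
scores = agreement_scores(ratings)
print(scores)
```

Because each rater is scored only on the samples they happened to grade, a rater who sees mostly unambiguous samples can score higher than an equally careful rater who sees harder ones. This is the upward bias that the abstract says the Bayesian latent trait model addresses.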