Quantitatively assessing the level of confidence in a test score can be challenging, especially when the available information is based on multiple criteria. A concrete example beyond the usual grading of tests is the recommendation letter: a recommender assigns a score to a candidate, but the reliability of the recommender must be assessed as well. Here, we present a statistical procedure, based on Bayesian inference and Jaynes’ maximum entropy principle, that uses the available information to estimate the most probable score and the expected score, with the result expressed as a credible interval. Our results may provide insight into how to properly state and analyze problems involving the uncertain evaluation of performance in learning, in contexts beyond the case study of recommendation letters presented here.
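To make the three summaries named above concrete (most probable score, expected score, credible interval), the following is a minimal sketch and not the procedure of the paper. It assumes, purely for illustration, that the posterior for a score normalized to [0, 1] has a Beta(a, b) shape with a, b > 1. The function name `summarize_posterior` and the parameter values are hypothetical.

```python
def summarize_posterior(a, b, n=10001, mass=0.95):
    """Summarize a Beta(a, b)-shaped posterior on [0, 1] using a regular grid.

    Illustrative assumption: a > 1 and b > 1, so the density vanishes at
    both endpoints and has a single interior maximum.
    Returns (mode, mean, (lower, upper)), where (lower, upper) is the
    central credible interval containing `mass` of the probability.
    """
    xs = [i / (n - 1) for i in range(n)]
    # Unnormalized density s^(a-1) * (1-s)^(b-1) evaluated on the grid.
    weights = [x ** (a - 1) * (1 - x) ** (b - 1) for x in xs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Most probable score: grid point with the largest posterior weight.
    mode = xs[max(range(n), key=probs.__getitem__)]
    # Expected score: posterior-weighted average of the grid points.
    mean = sum(x * p for x, p in zip(xs, probs))

    # Central credible interval read off the cumulative distribution.
    tail = (1 - mass) / 2
    cumulative = 0.0
    lower = upper = None
    for x, p in zip(xs, probs):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = x
        if cumulative >= 1 - tail:
            upper = x
            break
    return mode, mean, (lower, upper)


if __name__ == "__main__":
    # Hypothetical shape parameters chosen only for demonstration.
    mode, mean, (lower, upper) = summarize_posterior(8, 4)
    print(f"most probable score: {mode:.3f}")
    print(f"expected score:      {mean:.3f}")
    print(f"95% credible range:  [{lower:.3f}, {upper:.3f}]")
```

For a skewed posterior the most probable and expected scores differ, which is why both are worth reporting alongside the interval.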
