Abstract
Two possible objectives for a laboratory participating in a measurement comparison are (i) to test the reliability of its submission (x, u) and (ii) to estimate the systematic, i.e., enduring, error or bias in the technique that generated the estimate x. The first objective calls for a statistical hypothesis test and the second for a statistical estimation procedure. With regard to the first objective, the usual way in which the results are graphed appears to confuse the two statistical tasks, but the conduct of the test itself remains appropriate. With regard to the second objective, however, careful consideration shows that, if the measurand has true value θ and reference estimate θ̂, the best estimate of the bias is not the conventional difference d = x − θ̂ that forms part of the degree of equivalence. In fact, the best estimate is smaller in magnitude than d, meaning that, on average, the laboratory is measuring more accurately than might previously have been thought. Similarly, the standard uncertainty of the best estimate of the bias is generally smaller than the figure often denoted u(d).