Why do we define the standard error to ignore bias (unlike MSE which includes bias)?
Why is the standard error of an estimator $\hat \theta$ defined as $$se = \sqrt{Var(\hat \theta)}$$ and not $$se = \sqrt{MSE(\hat \theta)} = \sqrt{Bias^2(\hat \theta) + Var(\hat \theta)}?$$
That is, the standard error should be the square root of the mean squared error. Of course, if the estimator is unbiased, there is no difference. But in every case I can think of where we use the standard error, if the estimator is biased, that bias needs to be part of the error.
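To make the decomposition above concrete, here is a quick simulation check (a minimal Python sketch with made-up numbers, assuming numpy is available):

```python
import numpy as np

# Hypothetical illustration: check MSE = Bias^2 + Var by simulation
# for a deliberately biased estimator of a normal mean.
rng = np.random.default_rng(0)
theta = 2.0                              # true parameter
n, reps = 30, 100_000

samples = rng.normal(theta, 1.0, size=(reps, n))
theta_hat = 0.9 * samples.mean(axis=1)   # shrunken (biased) estimator

bias = theta_hat.mean() - theta
var = theta_hat.var()
mse = np.mean((theta_hat - theta) ** 2)

print(f"bias^2 + var = {bias**2 + var:.5f}")
print(f"MSE          = {mse:.5f}")       # agrees up to Monte Carlo error
```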
Consider, for example, performing the Wald test. We can always come up with an estimator of $\sigma^2$ of arbitrarily low variance if we are willing to increase the bias. For instance, given $\hat \sigma^2$, defining $$\hat \sigma_1^2 = (1-t)\hat \sigma^2 + tk$$ for arbitrary constants $t,k$ gives such an estimator. If we use this to perform the Wald test, we can get whatever $\alpha$ we desire simply by lowering the se, without actually improving the test.
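Here is a rough simulation sketch of that effect (made-up numbers; numpy and scipy assumed): the reported se can be driven down at will, but the resulting Wald-type test no longer holds its nominal level.

```python
import numpy as np
from scipy import stats

# Shrinking the variance estimator toward a small constant k makes the
# reported se tiny, but the "Wald test" built on it loses size control.
rng = np.random.default_rng(1)
n, reps, alpha = 30, 50_000, 0.05
t, k = 0.9, 0.01                           # heavy shrinkage toward a small constant

x = rng.normal(0.0, 1.0, size=(reps, n))   # H0: mu = 0 is true in every replication
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)
s2_shrunk = (1 - t) * s2 + t * k           # low variance, badly biased

crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
reject_honest = np.abs(xbar / np.sqrt(s2 / n)) > crit
reject_shrunk = np.abs(xbar / np.sqrt(s2_shrunk / n)) > crit

print("type I error, honest se  :", reject_honest.mean())   # close to 0.05
print("type I error, shrunken se:", reject_shrunk.mean())    # far above 0.05
```

The shrunken estimator has a much smaller standard error than the honest one, yet the test built on it is worse, which is exactly my point.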
This problem would be solved if the definition of the se included the bias, which would also be more consistent with the words 'standard error'. Why don't we do that?
Terminology aside, there is an impactful question here: in cases where our estimator is indeed biased, should we use the standard error or the definition above in hypothesis testing? There are cases where this will make a difference in the test result.
It can't be wrong (it's a definition), and it can't really be changed (it is too standard), so the question is whether it is helpful in some way or a regrettable historical misstep (like the terms 'error' and 'regression').
I think it is actually a helpful definition, because it matters for interval estimation, where you do need to consider bias and variability separately. That's where we typically report and use the standard error (as contrasted to the MSE or RMSE as summaries of point prediction).
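As a rough illustration (a hypothetical simulation, not a canonical example), here is what happens to interval coverage when the bias is folded into the half-width via the RMSE rather than handled separately:

```python
import numpy as np

# 95% intervals for a deliberately biased estimator: folding the bias into
# the half-width via the RMSE is not a substitute for treating bias and
# variability separately.
rng = np.random.default_rng(2)
theta, n, reps = 1.0, 20, 100_000

samples = rng.normal(theta, 1.0, size=(reps, n))
est = 0.8 * samples.mean(axis=1)        # biased estimator of theta

se = 0.8 / np.sqrt(n)                   # true sd of the estimator
bias = 0.8 * theta - theta              # = -0.2
rmse = np.sqrt(se**2 + bias**2)

def coverage(centre, half_width):
    return np.mean(np.abs(centre - theta) <= half_width)

print("centre +/- 1.96*se        :", coverage(est, 1.96 * se))         # ~0.80
print("centre +/- 1.96*rmse      :", coverage(est, 1.96 * rmse))       # ~0.97
print("bias-corrected +/- 1.96*se:", coverage(est - bias, 1.96 * se))  # ~0.95
```

Only the version that subtracts the (here known) bias and then uses the se attains the nominal coverage; widening by the RMSE just gives a lopsided, over-wide interval.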
A second reason is that when we have good reason to expect the bias to be small relative to the uncertainty (a very common situation for estimators of smooth finite-dimensional parameters), we can estimate the standard error without needing to estimate the bias. Estimating the bias (at least in data analysis, rather than in simulation) is much harder.
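For instance (a minimal sketch with made-up data), a bootstrap standard error falls straight out of a single dataset, with no bias estimate anywhere in sight:

```python
import numpy as np

# Bootstrap standard error for the sample median from one observed dataset;
# no attempt is made to estimate the bias.
rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=100)     # one observed dataset

boot_medians = np.array([
    np.median(rng.choice(x, size=x.size, replace=True))
    for _ in range(2000)
])
print("point estimate:", np.median(x))
print("bootstrap se  :", boot_medians.std(ddof=1))
```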
The reason that the quantity represented by "standard error" has a name is that we want named concepts for both things rather than just one of them. One is called the "standard error" of the estimator and one is called the "root-mean-squared-error" of the estimator. It is okay to have named concepts for both of these things and to keep in mind what both concepts do and don't imply.
As to why that particular name is used, the answer is largely historical. The term "standard error" seems to have been introduced in Yule (1897) and then used in his later introductory statistics text Yule (1911). The latter was a popular textbook for statistics in the early twentieth century and so the name stuck. As to the "root-mean-squared-error", this goes back even further. The term "mean error" appears in Gauss (1821) to describe what we would now call the root-mean-squared-error.$^\dagger$
Contrary to what you seem to assume in your question, the mere fact that we have names for both these concepts, and that they sound a bit similar to one another, does not lead to any serious confusion in the profession. It is well-known that lower "standard error" does not necessarily imply a better estimator. (Indeed, it is well-known that a constant is an estimator with zero standard error and is a really crappy estimator!) As with any long-standing discipline with nomenclature derived from historical processes, there are some areas in mathematics and statistics where it might be nice to rename and redefine some things, to have nomenclature that fits more smoothly. This is not really one of them --- statisticians and other experienced statistical users do not make the mistake you are suggesting, so they do not really see this as a "problem" that needs to be "solved".
$^\dagger$ Link is to the 1995 English translation of Gauss (1821). Interestingly, Gauss actually called this quantity "the mean error to be feared" with "mean error" being the shortened version. What a wonderful name! Perhaps we should make a push to reintroduce that nomenclature!