statistics will avoid approaches especially prone to serious abuse. In this regard, we join others in singling out the degradation of P values into “significant” and “nonsignificant” as an especially pernicious statistical practice [126].
Acknowledgments
SJS receives funding from the IDEAL project supported by the European Union’s Seventh Framework Programme for research, technological development and demonstration under Grant Agreement No. 602552. We thank Stuart Hurlbert, Deborah Mayo, Keith O’Rourke, and Andreas Stang for helpful comments, and Ron Wasserstein for his invaluable encouragement on this project.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. Lang JM, Rothman KJ, Cann CI. That confounded P-value. Epidemiology. 1998;9:7–8.
2. Trafimow D, Marks M. Editorial. Basic Appl Soc Psychol. 2015;37:1–2.
3. Ashworth A. Veto on the use of null hypothesis testing and p intervals: right or wrong? Taylor & Francis Editor Resources online. 2015. http://editorresources.taylorandfrancisgroup.com/veto-on-the-use-of-null-hypothesis-testing-and-p-intervals-right-or-wrong/. Accessed 27 Feb 2016.
4. Flanagan O. Journal’s ban on null hypothesis significance testing: reactions from the statistical arena. Stats Life online. 2015. https://www.statslife.org.uk/opinion/2114-journal-s-ban-on-null-hypothesis-significance-testing-reactions-from-the-statistical-arena. Accessed 27 Feb 2016.
5. Altman DG, Machin D, Bryant TN, Gardner MJ, editors. Statistics with confidence. 2nd ed. London: BMJ Books; 2000.
6. Atkins L, Jarrett D. The significance of “significance tests”. In: Irvine J, Miles I, Evans J, editors. Demystifying social statistics. London: Pluto Press; 1979.
7. Cox DR. The role of significance tests (with discussion). Scand J Stat. 1977;4:49–70.
8. Cox DR. Statistical significance tests. Br J Clin Pharmacol. 1982;14:325–31.
9. Cox DR, Hinkley DV. Theoretical statistics. New York: Chapman and Hall; 1974.
10. Freedman DA, Pisani R, Purves R. Statistics. 4th ed. New York: Norton; 2007.
11. Gigerenzer G, Swijtink Z, Porter T, Daston L, Beatty J, Kruger L. The empire of chance: how probability changed science and everyday life. New York: Cambridge University Press; 1990.
12. Harlow LL, Mulaik SA, Steiger JH. What if there were no significance tests? New York: Psychology Press; 1997.
13. Hogben L. Statistical theory. London: Allen and Unwin; 1957.
14. Kaye DH, Freedman DA. Reference guide on statistics. In: Reference manual on scientific evidence. 3rd ed. Washington, DC: Federal Judicial Center; 2011. p. 211–302.
15. Morrison DE, Henkel RE, editors. The significance test controversy. Chicago: Aldine; 1970.
16. Oakes M. Statistical inference: a commentary for the social and behavioural sciences. Chichester: Wiley; 1986.
17. Pratt JW. Bayesian interpretation of standard inference statements. J Roy Stat Soc B. 1965;27:169–203.
18. Rothman KJ, Greenland S, Lash TL. Modern epidemiology. 3rd ed. Philadelphia: Lippincott-Wolters-Kluwer; 2008.
19. Ware JH, Mosteller F, Ingelfinger JA. p-Values. In: Bailar JC, Hoaglin DC, editors. Medical uses of statistics. 3rd ed. Hoboken, NJ: Wiley; 2009. Ch. 8, p. 175–94.
20. Ziliak ST, McCloskey DN. The cult of statistical significance: how the standard error costs us jobs, justice and lives. Ann Arbor: University of Michigan Press; 2008.
21. Altman DG, Bland JM. Absence of evidence is not evidence of absence. Br Med J. 1995;311:485.
22. Anscombe FJ. The summarizing of clinical experiments by significance levels. Stat Med. 1990;9:703–8.
23. Bakan D. The test of significance in psychological research. Psychol Bull. 1966;66:423–37.
24. Bandt CL, Boen JR. A prevalent misconception about sample size, statistical significance, and clinical importance. J Periodontol. 1972;43:181–3.
25. Berkson J. Tests of significance considered as evidence. J Am Stat Assoc. 1942;37:325–35.
26. Bland JM, Altman DG. Best (but oft forgotten) practices: testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach. Am J Clin Nutr. 2015;102:991–4.
27. Chia KS. “Significant-itis”—an obsession with the P-value. Scand J Work Environ Health. 1997;23:152–4.
28. Cohen J. The earth is round (p < 0.05). Am Psychol. 1994;49:997–1003.
29. Evans SJW, Mills P, Dawson J. The end of the P-value? Br Heart J. 1988;60:177–80.
30. Fidler F, Loftus GR. Why figures with error bars should replace p values: some conceptual arguments and empirical demonstrations. J Psychol. 2009;217:27–37.
31. Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J. 1986;292:746–50.
32. Gelman A. P-values and statistical practice. Epidemiology. 2013;24:69–72.
33. Gelman A, Loken E. The statistical crisis in science: data-dependent analysis—a “garden of forking paths”—explains why many statistically significant comparisons don’t hold up. Am Sci. 2014;102:460–5. Erratum at http://andrewgelman.com/2014/10/14/didnt-say-part-2/. Accessed 27 Feb 2016.
34. Gelman A, Stern HS. The difference between “significant” and “not significant” is not itself statistically significant. Am Stat. 2006;60:328–31.
35. Gigerenzer G. Mindless statistics. J Socioecon. 2004;33:567–606.
36. Gigerenzer G, Marewski JN. Surrogate science: the idol of a universal method for scientific inference. J Manag. 2015;41:421–40.
37. Goodman SN. A comment on replication, p-values and evidence. Stat Med. 1992;11:875–9.
38. Goodman SN. P-values, hypothesis tests and likelihood: implications for epidemiology of a neglected historical debate. Am J Epidemiol. 1993;137:485–96.
39. Goodman SN. Towards evidence-based medical statistics, I: the P-value fallacy. Ann Intern Med. 1999;130:995–1004.
40. Goodman SN. A dirty dozen: twelve P-value misconceptions. Semin Hematol. 2008;45:135–40.
41. Greenland S. Null misinterpretation in statistical testing and its impact on health risk assessment. Prev Med. 2011;53:225–8.
42. Greenland S. Nonsignificance plus high power does not imply support for the null over the alternative. Ann Epidemiol. 2012;22:364–8.
348 S. Greenland et al.