In 2018, Kent Anderson published a list of 102 things journal publishers do. It appeared on The Scholarly Kitchen, a blog of the Society for Scholarly Publishing.
The list was thorough, detailed, and intended to demonstrate the enormous value publishers provide to the scholarly communication ecosystem. It succeeded.
Every one of those 102 functions is performed by paid professionals: paid editors, paid designers, paid lawyers, paid IT staff, paid marketers, paid managers, paid executives.
Every one -- except the most intellectually demanding task in the entire publishing pipeline: the work of the reviewer.
This is the companion list. It catalogs what peer reviewers actually do when they agree to evaluate a manuscript. Not the abstraction. Not the polite fiction that peer review is a ‘quick look’ that takes an hour. The actual work. Item by item, hour by hour, judgment call by judgment call.
Every item on this list is performed by a skilled professional who has spent years -- often decades -- building the expertise required to do it competently. Every item requires training that the reviewer acquired at their own expense, maintained on their own time, and deployed at the publisher’s request. The copy editor is paid. The typesetter is paid. The platform engineer is paid. The publisher’s CEO is paid, sometimes very handsomely indeed. The reviewer is not.
Here are 102 things journal reviewers do. For free.
Before Reading the Manuscript
1. Receive and assess the review invitation. The email arrives on a Tuesday afternoon, or a Sunday evening, or during a conference dinner. It contains an abstract, an author list, and a deadline. You read the abstract and make a rapid professional judgment: Is this within my expertise? Do I have a conflict? Can I deliver a quality review in the time allotted? This initial triage requires the kind of subject-matter knowledge that took decades to build, deployed in under five minutes. If you decline, you are often asked to suggest alternative reviewers, which requires you to mentally survey your professional network and identify colleagues with the right expertise and availability. No one compensates you for this decision, even when you decline. The publisher benefits either way: they get a review or they get a referral.
2. Check for conflicts of interest. This is not a box you check. It is a genuine ethical self-examination. Have you collaborated with any of the authors in the past five years? Have you reviewed a previous version of this manuscript for another journal? Do you hold a financial interest in a company whose product is being studied? Have you published a competing study whose conclusions might bias your evaluation? Are any of the authors former trainees, mentors, or close colleagues? In some fields, the community is small enough that pure independence is nearly impossible, and the reviewer must make a judgment call about whether proximity constitutes a disqualifying conflict. You do this because the integrity of the system depends on it. The publisher’s checklist asks if you have a conflict. Your conscience determines whether you actually do.
3. Clear your schedule. A serious peer review takes between 3 and 8 hours of concentrated, uninterrupted professional time. For a complex systematic review or meta-analysis, it can take 10 or more. You do not have this time sitting idle in your calendar. It must be carved from somewhere: a research day sacrificed, an evening with family forfeited, a weekend morning at the desk instead of with the children. If you are a clinician, your patients do not reschedule because a publisher set a 14-day turnaround. If you are an academic, your grant deadline does not move. The opportunity cost is real and measurable, but it never appears on any publisher’s balance sheet. You absorb it because you were asked, and because the system taught you that this is what professionals do.
4. Download and organize the materials. The manuscript arrives through a submission system -- ScholarOne, Editorial Manager, eJournalPress, or one of a dozen others -- each with its own interface, its own quirks, and its own login credentials that you last used four months ago. The materials may include a main manuscript, supplementary tables, supplementary figures, an appendix, a data availability statement, a CONSORT or STROBE checklist, author response letters from a prior round, and previous reviewer comments. Some systems deliver everything as a single PDF. Others require you to download each file individually, sometimes through multiple clicks, sometimes through a system that times out. You organize these into a working folder, rename them so they are intelligible, and set up your note-taking system. This administrative overhead is trivial per review but cumulative across a career. It is work that the publisher’s paid staff used to do when manuscripts arrived in manila envelopes.
5. Familiarize yourself with the journal’s scope, standards, and review criteria. Not all journals are the same. What constitutes an acceptable sample size for a specialty journal may be inadequate for a general journal. The level of statistical rigor expected varies by field and by publication. Some journals prioritize novelty; others prioritize confirmatory work. Some expect structured reviews with scored rubrics; others expect narrative assessments. You learn this for each journal you review for, often without formal guidance, by reading the journal’s instructions to reviewers, by studying previously published papers, and by inferring editorial preferences from prior correspondence. The publisher provides a style guide for authors. Reviewers receive almost nothing. You calibrate your expectations through experience and professional judgment, adapting your standards to each journal’s implicit culture -- a skill that cannot be automated and is never acknowledged.
First Read: The Overview
6. Read the manuscript from beginning to end. The first complete read-through is for orientation. You are not yet critiquing; you are absorbing. Does this study ask a question worth asking? Is the overall design sensible? Does the arc from introduction through methods through results to discussion hold together? This first read takes 45 minutes to two hours for a standard original research article, longer for a meta-analysis or a paper with extensive supplementary material. You bring to this read everything you have learned in your career -- every study you have read, every patient you have treated, every lecture you have given. That accumulated expertise is what allows you to sense, even on a first pass, whether something is fundamentally sound or fundamentally flawed. A second-year trainee can read the same paper and see words. You read it and see the architecture. This distinction is the product of decades, and it is the reason you were invited to review.
7. Assess the originality and significance of the research question. Is this question new? Has it been answered before, and if so, does this study add meaningfully to what is already known? Is the incremental contribution sufficient to justify publication in this journal? These are questions that require you to hold in your mind a working model of the current state of knowledge in your field -- hundreds or thousands of papers, synthesized into a landscape you can survey in seconds. Nobody provides you with a literature map. You are the literature map. When you write in your review that ‘this question has been addressed by Smith et al. (2019) and Jones et al. (2021) using similar methodology,’ you are drawing on a mental database that took years to populate and hours per week to maintain. The publisher could not build this database for any price. They get it from you for free.
8. Evaluate the title for accuracy and clarity. The title is the single most widely read element of any scientific paper. It appears in search results, citation lists, social media shares, and news reports. A misleading title can distort how a finding is understood for years. Does the title accurately reflect the study design? Does it overstate the conclusions? Does it use causal language for an observational study? Is it specific enough to distinguish this study from similar work? You assess all of this in the time it takes to read one sentence, because you know the conventions of your field and you know when a title is trying too hard. A small correction here -- changing ‘X causes Y’ to ‘X is associated with Y’ -- can prevent a cascade of public misunderstanding. The publisher does not make this correction. You do.
9. Evaluate the abstract for completeness and accuracy. More people will read the abstract than will ever read the full paper. In medicine, clinical decisions are sometimes made based on abstracts alone -- a reality that is troubling but true. You check whether the abstract accurately represents the methods, whether the results numbers match the full text, whether the conclusions are supported by the reported data, and whether anything important has been omitted or overstated. You compare the abstract to the body of the paper for consistency, catching discrepancies that arise from revision: an author updates a table but forgets to update the corresponding abstract number. This kind of error, if published, enters databases and meta-analyses and propagates. You are the last line of defense before publication, and you catch it because you read carefully. Automated systems check formatting. You check truth.
10. Check the structured abstract against journal requirements. Structured abstracts typically require Background, Methods, Results, and Conclusion sections, though the specific headings vary by journal. Some journals require the trial registration number in the abstract. Some require a word count. Some require specific subheadings within each section. You verify that the format meets the journal’s requirements, that each section contains the appropriate information, and that the structure imposes on the authors’ summary the discipline it was designed to impose. This is a mechanical check, yes, but it is also a quality signal: an author who cannot follow abstract formatting instructions may not have followed methods reporting guidelines either. You notice this. A system checks compliance. You check what compliance implies.
Introduction and Background
11. Assess the introduction for appropriate context. The introduction should frame the problem clearly, review the relevant literature fairly, and establish why this particular study was needed at this particular time. You evaluate whether the literature review is balanced or whether it cherry-picks studies that support the authors’ hypothesis while ignoring contradictory evidence. You check whether the introduction is proportionate -- too long and it becomes a review article; too short and it fails to establish context. You assess whether the authors have cited the seminal work in the field or whether important contributions have been overlooked, intentionally or not. Self-citation is a particularly common problem: some authors cite their own prior work disproportionately, creating an appearance of a larger evidence base than actually exists. You catch this because you know whose work matters in the field and whose work the authors should have acknowledged.
12. Verify that the research question or hypothesis is clearly stated. A study without a clear question is a study that cannot be properly evaluated. You look for an explicit statement of the primary research question or hypothesis, ideally at the end of the introduction. If it is absent, vague, or buried in the methods, you flag it. If multiple questions are posed, you assess whether the study design is adequate to address all of them or whether the paper is trying to answer too many questions at once. A clearly stated question disciplines the entire paper. When the question is ambiguous, the methods may not match the aim, the results may wander, and the conclusions may overreach. You identify this structural problem because you have read thousands of papers and you know what a well-framed question looks like. This is pattern recognition born of experience, not a checklist item.
13. Check for adequate citation of prior work. Adequate citation serves multiple purposes: it credits the researchers who built the foundation, it demonstrates that the authors understand the landscape, and it allows readers to trace the intellectual lineage of the work. You assess whether key studies are cited, whether the citations are accurate, and whether the authors have selectively cited work that supports their position while omitting work that challenges it. In competitive fields, citation omission can be strategic -- authors may ignore a rival group’s work to diminish its visibility. You know the field well enough to notice these omissions. You also check for citation inflation: padding a reference list with tangentially related work to create an appearance of thoroughness. Adequate citation is neither too few nor too many; it is the right references, honestly presented. This judgment requires deep field knowledge that no algorithm possesses.
14. Identify potential bias in the framing. Some introductions are written as if the conclusion is already known. The literature is presented to build a case rather than to set a context. The question is framed in a way that presupposes the answer. This kind of confirmatory framing is subtle and often unintentional, but it signals a deeper problem: if the authors began with a conclusion and designed the study to confirm it, the methods and analysis may reflect that bias. You detect this because you know what neutral framing looks like, and you recognize when language is doing persuasive work that should be left to the data. An introduction that reads like an argument rather than a question is an introduction you send back for revision. The publisher cannot make this call. It requires someone who understands the difference between asking a question and building a case.
Methods: The Heavy Lifting
15. Assess whether the methods section is reproducible. Reproducibility is the foundational promise of science. If another competent researcher cannot read the methods section and replicate the study, the methods are insufficient. You evaluate whether the description is detailed enough to reproduce the procedures: are the instruments identified? Are the protocols specified? Are the decision rules explicit? In clinical research, you ask whether the intervention is described with enough precision that a clinician in another country could implement it. In laboratory research, you check reagent sources, concentrations, and conditions. The trend toward shorter papers has compressed methods sections to the point where critical details are missing or relegated to supplements that nobody reads. You push back because you know that a finding that cannot be reproduced is a finding that does not exist.
16. Evaluate the study design. Is this the right design to answer the stated question? A case-control study where a cohort was needed. An observational analysis where a trial was feasible. A cross-sectional snapshot where longitudinal follow-up was required. You assess not just whether the design is internally valid but whether it is the strongest design the authors could reasonably have employed. If a randomized trial was possible but the authors chose an observational approach, you want to know why. If a prospective design was feasible but they used retrospective data, you assess what was lost. Design choice determines what conclusions are permissible, and you enforce those boundaries. A cohort study cannot prove causation. An underpowered trial cannot prove equivalence. These distinctions matter, and authors routinely overinterpret their designs. You correct them.
17. Assess the sample size and power calculation. Was a formal power analysis conducted before the study began? If so, what assumptions were used -- effect size, alpha, power, dropout rate -- and were those assumptions reasonable? If no power calculation is reported, is the sample size justified by other means, such as the available population or a pilot study? Underpowered studies are a waste of scientific resources: they expose participants to research procedures without having a reasonable chance of detecting the effect of interest. Overpowered studies can detect trivially small effects that have no clinical significance. You assess whether the sample is right-sized for the question, and if the study is underpowered, you determine whether this is a fatal flaw or a limitation that should be acknowledged. This assessment requires statistical literacy that many reviewers had to acquire on their own, because most clinical training programs do not teach power analysis. You learned it anyway, because the manuscripts required it.
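To make that arithmetic concrete, here is a minimal sketch of the standard two-proportion sample size calculation a reviewer might re-derive when checking a reported power analysis. The event rates, alpha, power, and dropout figure below are hypothetical, chosen purely for illustration.

```python
# Illustrative only: re-deriving a two-proportion sample size calculation with
# hypothetical inputs, the kind of arithmetic a reviewer checks for internal
# consistency against what the authors report.
from scipy.stats import norm

p1, p2 = 0.10, 0.06        # assumed control and intervention event rates
alpha, power = 0.05, 0.80  # conventional two-sided alpha and 80% power
dropout = 0.15             # assumed attrition

z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for two-sided alpha = 0.05
z_beta = norm.ppf(power)            # 0.84 for 80% power

# n per group before attrition (normal approximation, equal allocation)
n = ((z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2) ** 2
n_enrolled = n / (1 - dropout)

print(f"n per group: {n:.0f}; inflated for {dropout:.0%} dropout: {n_enrolled:.0f}")
```

If the manuscript reports a much smaller target under the same stated assumptions, that is a question for the authors.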
18. Evaluate inclusion and exclusion criteria. The criteria for who gets into a study and who is excluded determine what population the results apply to. You assess whether the criteria are appropriate for the research question, whether they are too narrow (limiting generalizability) or too broad (introducing heterogeneity), and whether they introduce selection bias. You look for criteria that seem designed to enrich the sample for a favorable outcome -- for example, excluding patients who are sicker, older, or more complex. You also check whether the criteria are clearly stated, consistently applied, and fully reported. In clinical trials, you compare the stated criteria against the baseline characteristics table to see whether the enrolled population matches what was promised. Discrepancies suggest either sloppy reporting or post-hoc modification of criteria. Either way, you catch it.
19. Assess the definition and measurement of exposures. How an exposure is defined determines what the study actually measures. In a study of labor induction, for example, the definition of ‘elective induction’ versus ‘medically indicated induction’ changes the results dramatically. You assess whether the exposure definitions are clear, consistent, and clinically meaningful. You check whether the measurement tools are validated, whether there is potential for misclassification, and whether the exposure was ascertained prospectively or retrospectively. Misclassification bias is one of the most common and most consequential problems in observational research. It is also one of the hardest to detect without deep domain knowledge. A reviewer who does not practice in the field cannot assess whether ‘gestational diabetes’ was defined by Carpenter-Coustan criteria, IADPSG criteria, or something else entirely. You can, because you treat these patients.
20. Assess the definition and measurement of outcomes. Primary and secondary outcomes must be clearly defined, clinically meaningful, and measured using validated instruments. You check whether the primary outcome was pre-specified or chosen after the data were collected. Outcome switching -- changing the primary endpoint after seeing the results -- is one of the most damaging forms of reporting bias, and it is your job to catch it. You compare the registered protocol (if available) against the published manuscript to identify discrepancies. You also assess whether the outcomes are meaningful to patients or merely statistically convenient. A surrogate endpoint like a laboratory value may be easier to measure but may not translate to outcomes patients care about. In obstetrics, a composite outcome that lumps together minor and major complications can obscure more than it reveals. You make these distinctions because you understand what matters clinically, not just statistically.
21. Evaluate the control or comparison group. The choice of comparator determines the meaning of the results. You assess whether the control group is appropriate: is it truly comparable to the intervention group, or are there baseline differences that confound the comparison? In drug trials, is the comparator an active treatment, a placebo, or ‘usual care’ -- and is the choice justified? In observational studies, you evaluate whether the comparison group was selected in a way that minimizes confounding or whether it introduces immortal time bias, selection bias, or other structural problems. You check the baseline characteristics table for balance between groups, flagging any differences that could explain the observed effect. A study that compares induced labor at 39 weeks to spontaneous labor at any gestational age is comparing two fundamentally different populations. You know this. An automated check does not.
22. Check for appropriate blinding. Blinding prevents knowledge of group assignment from influencing the measurement of outcomes. You assess whether the level of blinding is appropriate for the study design: was the participant blinded? The clinician? The outcome assessor? The data analyst? If blinding was not possible (as in many surgical or behavioral interventions), you evaluate whether the lack of blinding was adequately addressed through objective outcome measures or independent adjudication. You also check whether blinding was maintained throughout the study or whether it was compromised by adverse events, treatment effects, or protocol violations. Inadequate blinding inflates treatment effects in predictable ways, and you adjust your interpretation accordingly. The publisher’s checklist asks whether blinding was described. You assess whether it was adequate.
23. Assess randomization procedures. Randomization is the single most powerful tool for eliminating confounding in clinical trials, but only if done correctly. You evaluate the method of randomization (computer-generated, block, stratified), the adequacy of allocation concealment (sealed envelopes are easily subverted; central allocation is far more robust), and whether the baseline characteristics suggest that randomization achieved balance. You look for evidence of pseudo-randomization -- alternating assignment, assignment by birth date, assignment by medical record number -- which preserves the appearance of randomization while undermining its purpose. You also assess whether randomization was broken at any point during the study. A trial with compromised randomization is an observational study with extra steps. You recognize this even when the authors do not.
24. Evaluate the statistical methods. This is where many reviews succeed or fail. You assess whether the statistical tests are appropriate for the data type (continuous, categorical, time-to-event) and distribution (normal, skewed, zero-inflated). You check whether assumptions are met: independence of observations, proportional hazards, linearity of log-odds. You evaluate whether the analysis was pre-specified or post-hoc, whether corrections for multiple comparisons were applied, and whether sensitivity analyses were conducted to test the robustness of the findings. For complex analyses -- mixed models, propensity score matching, instrumental variables, Bayesian approaches -- you assess both the appropriateness of the method and the correctness of its implementation. This single item on this list can consume one to two hours of concentrated work for a methodologically complex paper. Statistical review is the most technically demanding component of peer review, and it is the component most likely to be done poorly if the reviewer lacks the necessary training. You have that training because you invested in acquiring it, on your own time, at your own expense.
25. Check for appropriate handling of missing data. Missing data is the silent destroyer of research validity. You assess how much data is missing, whether it is missing at random or systematically, and whether the analytical approach adequately addresses the missing data problem. Complete case analysis (discarding incomplete records) can introduce bias if the reasons for missingness are related to the outcome. Multiple imputation addresses this but introduces assumptions that must be justified. Sensitivity analyses comparing different approaches to missing data are essential but often omitted. You check whether the authors reported the proportion of missing data for each variable, whether they described their approach to handling it, and whether they conducted sensitivity analyses to assess its impact. In large database studies, missing data can affect thousands of records. In clinical trials, differential dropout between treatment groups can invalidate the intention-to-treat analysis. You catch these problems because you know where to look.
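As a small illustration of why the missingness mechanism matters, here is a toy simulation, with entirely invented numbers, of how a complete-case analysis drifts away from the truth when higher values are more likely to be missing.

```python
# Illustrative simulation (not from any manuscript): when missingness depends on
# the value itself, discarding incomplete records biases the estimate.
import numpy as np

rng = np.random.default_rng(42)
true_values = rng.normal(loc=100, scale=15, size=10_000)   # full cohort

# Higher values are more likely to go missing (missing not at random)
p_missing = 0.6 / (1 + np.exp(-(true_values - 100) / 10))
observed = true_values[rng.random(10_000) > p_missing]

print(f"true mean:          {true_values.mean():.1f}")
print(f"complete-case mean: {observed.mean():.1f}")    # systematically too low
print(f"proportion missing: {1 - len(observed) / len(true_values):.0%}")
```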
26. Assess adjustment for confounders. In observational research, confounding is the central threat to validity. You evaluate which confounders were controlled for, whether the choice of covariates was justified by prior knowledge or by data-driven selection (which introduces its own problems), and whether important confounders were omitted. You bring clinical knowledge to this assessment: you know which factors are likely to confound the relationship under study because you understand the clinical biology, not just the statistical framework. You also assess whether the authors over-adjusted -- controlling for variables on the causal pathway, which can attenuate a real effect -- or under-adjusted, leaving residual confounding that inflates the apparent association. The difference between a confounder and a mediator is clinical judgment, not statistical output. This is why you, and not a software package, are reviewing this paper.
27. Evaluate the statistical software and reproducibility. You check whether the statistical software is identified (SAS, Stata, R, SPSS), whether the version is stated, and whether sufficient detail is provided for another researcher to reproduce the analysis. In the era of complex computational analyses, reproducibility requires more than naming the software: it requires specifying the packages, the parameters, the seed for random number generation, and ideally, providing the code. You assess whether the level of detail meets the standard for the field and the journal. If the analysis involves machine learning or other computationally intensive methods, you evaluate whether the training, validation, and test sets were properly separated, whether overfitting was addressed, and whether performance metrics are appropriate. This is a rapidly evolving area, and staying competent to evaluate these methods requires continuous self-education that is itself uncompensated.
28. Assess ethical approvals and informed consent. You verify that the study received institutional review board or ethics committee approval and that the approval is documented with a protocol number, not just a vague statement. You check whether informed consent was obtained from participants, whether the consent process was appropriate for the study’s risk level, and whether vulnerable populations (children, pregnant women, prisoners, cognitively impaired individuals) received the additional protections required by regulation and by ethics. In studies using existing databases or medical records, you assess whether the waiver of consent was appropriate and properly documented. In international research, you consider whether the ethical review met standards recognized by the journal, as the rigor of ethics review varies considerably across countries. The publisher’s submission form has a checkbox for ethics approval. You verify what the checkbox cannot: whether the ethical framework was adequate for the research conducted.
29. Evaluate the timeline and follow-up period. Is the follow-up period long enough to detect the outcomes of interest? A study of cancer recurrence with six months of follow-up may miss the majority of events. A study of surgical complications that only reports 30-day outcomes may miss important late morbidity. You assess whether the duration of follow-up is appropriate for the clinical question and whether differential loss to follow-up between groups could bias the results. You check the flow diagram for the numbers: how many were enrolled, how many completed follow-up, how many were lost and at what time points, and whether the reasons for loss differed between groups. High attrition is a warning sign. Differential attrition is a red flag. You quantify it because the authors may not have.
30. Check for protocol registration and adherence. For clinical trials, you verify that the protocol was registered in a public registry (ClinicalTrials.gov, ISRCTN, or equivalent) before enrollment began, as required by the ICMJE. You compare the registered primary outcome against the published primary outcome, looking for discrepancies that suggest outcome switching. You check whether the sample size matches the registered target, whether the inclusion criteria changed, and whether new secondary outcomes appeared that were not in the original protocol. This detective work is tedious but essential: outcome switching is documented to occur in approximately one-third of trials, and it systematically inflates the literature with positive findings. You are the system’s primary mechanism for detecting this. Automated comparison between registry and manuscript is possible in theory but rarely done in practice. You do it manually, because the integrity of the evidence base depends on it.
31. Verify internal consistency of reported numbers. Do the numbers in the text match the numbers in the tables? Do the numbers in the tables match the figures? Do the subgroup totals sum to the overall total? Do the percentages add up to 100, or close enough? This is arithmetic, yes, but it is arithmetic performed on a document that may contain hundreds of individual numbers, any one of which could be wrong. A transposed digit in a table -- reporting 31% instead of 13% -- can change the meaning of a finding entirely. A discrepancy between the abstract and the results section suggests that one version was updated and the other was not, which raises questions about the care with which the manuscript was prepared. You perform this tedious verification because you know that once published, these numbers enter databases, meta-analyses, clinical guidelines, and treatment decisions. A wrong number in a published paper can propagate for decades. You are the last person standing between a typo and the permanent scientific record.
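The checks themselves are simple arithmetic; what takes time is repeating them across hundreds of numbers. A minimal sketch, with invented counts, of the kind of cross-checks involved:

```python
# Hypothetical table values, used only to illustrate the cross-checks described
# above: do subgroup counts sum to the stated total, do percentages add up, and
# does a printed percentage match its own numerator and denominator?
reported_total = 482
subgroups = {"nulliparous": 213, "multiparous": 269}   # invented counts
percentages = [44.2, 56.2]                              # as printed in the table

if sum(subgroups.values()) != reported_total:
    print("subgroup counts do not sum to the reported total")

if abs(sum(percentages) - 100) > 0.5:                   # allow rounding slack
    print(f"percentages sum to {sum(percentages):.1f}%; check the table")

recomputed = 100 * subgroups["nulliparous"] / reported_total
print(f"nulliparous: printed {percentages[0]}%, recomputed {recomputed:.1f}%")
```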
Results
32. Assess the tables for clarity, completeness, and accuracy. Tables are where the data live, and they demand careful scrutiny. You check that column headers are clear and unambiguous, that units are specified, that confidence intervals are reported alongside point estimates, and that the number of decimal places is appropriate and consistent. You examine the baseline characteristics table to assess whether the study groups are balanced, flagging any differences that could confound the results even after adjustment. You check that the denominators are correct, that missing data are accounted for, and that the table tells a coherent story that supports the text. You also look for formatting issues that create confusion: merged cells, inconsistent abbreviations, footnotes that do not match their markers. A poorly constructed table can mislead even a careful reader. You reconstruct the logic of each table to verify that it presents the data honestly and clearly. This is not copy editing. This is data auditing.
33. Evaluate the figures for accuracy and appropriateness. Figures are the most immediately persuasive element of a manuscript, and they are also the element most susceptible to manipulation, whether intentional or not. You check that axes are labeled, that scales start at zero (or that truncated axes are justified and clearly marked), that error bars are defined (standard deviation, standard error, or confidence interval -- they mean very different things), and that the chosen visualization is appropriate for the data type. Bar graphs hide distributional information; dot plots reveal it. Kaplan-Meier curves should report numbers at risk; forest plots should include the scale and direction. You assess whether the figures could stand alone -- a reader should be able to understand the main finding from the figure and its legend without reading the text. If the figure tells a different story than the text, you have found a problem.
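The difference among error bars is worth making concrete. Here is a short sketch, on simulated measurements, of how differently the standard deviation, the standard error of the mean, and a 95% confidence interval scale for the very same data:

```python
# Illustrative only: the same simulated data produce very different error bars
# depending on whether SD, SEM, or a 95% CI is plotted.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.2, scale=1.1, size=40)    # made-up measurements, n = 40

sd = x.std(ddof=1)                  # spread of individual values
sem = sd / np.sqrt(len(x))          # precision of the estimated mean
ci_half_width = 1.96 * sem          # normal-approximation 95% CI

print(f"mean = {x.mean():.2f}")
print(f"SD   = {sd:.2f}")
print(f"SEM  = {sem:.2f}  (about {np.sqrt(len(x)):.1f} times smaller than SD)")
print(f"95% CI half-width = {ci_half_width:.2f}")
```

A figure whose bars show SEM will look far more precise than one showing SD; a legend that does not say which is which tells the reader nothing.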
34. Check for appropriate reporting of effect sizes. The choice of effect measure determines how the reader understands the magnitude of a finding. A relative risk reduction of 50% sounds impressive. The corresponding absolute risk reduction -- from 2% to 1% -- sounds less so. You assess whether the authors report both relative and absolute measures, whether the choice of effect size (risk ratio, odds ratio, hazard ratio, mean difference, standardized mean difference) is appropriate for the study design, and whether confidence intervals are provided. You flag studies that report only relative measures, which inflate the apparent importance of findings. In clinical research, you calculate the number needed to treat or the number needed to harm when the authors have not, because these are the numbers that translate statistical findings into clinical decisions. You do this arithmetic because you know that a p-value alone tells a clinician nothing about whether to change practice.
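The arithmetic behind that example is worth spelling out, using the same illustrative figures from the text (a 2% event rate reduced to 1%):

```python
# The worked example from the text: a 50% relative risk reduction that
# corresponds to an absolute reduction of one percentage point.
control_risk = 0.02     # event rate in the comparison group
treated_risk = 0.01     # event rate with the intervention

rrr = (control_risk - treated_risk) / control_risk   # relative risk reduction
arr = control_risk - treated_risk                    # absolute risk reduction
nnt = 1 / arr                                        # number needed to treat

print(f"RRR = {rrr:.0%}, ARR = {arr:.1%}, NNT = {nnt:.0f}")
# RRR = 50%, ARR = 1.0%, NNT = 100: treat one hundred patients to prevent one event
```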
35. Assess whether results address the stated research question. Sometimes the results section drifts away from the stated aim. The study was designed to assess whether A affects B, but the results focus primarily on the association between A and C, because C turned out to be more interesting. You notice this because you read the introduction carefully, memorized the stated question, and now hold the results accountable to it. If the most prominently reported finding is a secondary or exploratory outcome, you flag this as a potential example of outcome migration -- the gradual shift of emphasis from pre-specified to post-hoc findings that happens to produce a more publishable result. This is one of the subtlest forms of reporting bias, and it requires a reviewer who reads the whole paper as a single logical structure, not as a series of independent sections.
36. Evaluate the use of p-values and confidence intervals. The misuse of p-values is one of the most persistent problems in the biomedical literature. You check whether p-values are interpreted correctly -- a non-significant p-value does not mean there is no effect, and a significant p-value does not mean the effect is large or clinically important. You look for the fallacy of ‘trending toward significance’ (p = 0.06 is not significant, regardless of the narrative) and for the misinterpretation of non-significance as evidence of equivalence. You assess whether confidence intervals are reported and whether the authors interpret them correctly: a wide interval that includes clinically important values is not reassuring, even if it also includes the null. You flag multiple comparisons without correction, subgroup analyses presented as confirmatory, and the selective emphasis on whichever comparison happened to produce p < 0.05. The ASA statement on p-values was published in 2016, and the problems it described are still everywhere. You correct them, one manuscript at a time.
37. Identify selective reporting. Are all pre-specified outcomes reported, or have some quietly disappeared? Are negative results presented with the same detail as positive ones, or are they buried in a supplementary table that nobody will read? Are subgroup analyses -- which should be hypothesis-generating, not confirmatory -- presented in the main results as if they were planned all along? Selective reporting is among the most damaging practices in science because it systematically biases the published literature toward positive findings. Meta-analyses built on selectively reported studies produce inflated effect estimates that mislead clinical practice. You are the primary line of defense against this. You compare the methods section against the results section, and if outcomes promised in the methods do not appear in the results, you ask why. This is not nit-picking. This is protecting the evidence base.
38. Evaluate the handling of adverse events and harms. In clinical studies, the reporting of harms is consistently less thorough than the reporting of benefits. This asymmetry is well documented: adverse events are underreported, described in less detail, and sometimes relegated to supplementary materials. You check whether adverse events are reported completely, whether they are presented by treatment group, whether serious adverse events are individually described, and whether the timing and duration of adverse events are documented. You assess whether the authors minimized or obscured safety signals through selective reporting, vague language, or composite endpoints that dilute individual harms. In a world where published trial reports inform prescribing decisions, incomplete reporting of adverse events can directly harm patients. You hold the authors to account because nobody else at this stage of the process will.
39. Assess whether the data support the magnitude of the claims. A statistically significant p-value does not mean a clinically significant finding, and a large relative risk does not necessarily translate to a meaningful absolute risk. You apply clinical judgment to statistical output, assessing whether the observed effect is large enough to matter in practice, whether the confidence interval is narrow enough to be informative, and whether the finding would change clinical behavior if adopted. You also assess the opposite: whether a null finding is truly null or whether the study was too small to detect a real effect. This is where the clinical reviewer earns their (hypothetical) salary. A statistician can tell you whether the result is significant. Only a clinician can tell you whether it matters. The publisher gets both from you, for the same price: zero.
40. Check for data fabrication indicators. You look for patterns that suggest the data may not be real. Are the standard deviations suspiciously uniform across groups? Do the means fall into patterns that are statistically improbable? Are the results too clean -- every comparison significant at exactly the same level? Do the distributions follow Benford’s law? Are there impossible values (negative ages, gestational ages of 50 weeks, birth weights of 10 kilograms)? These checks do not prove fabrication, but they raise flags that warrant further investigation. You perform this forensic work because you have seen enough real data to know what real data looks like. The publisher does not have this expertise. You provide it, uncompensated, as a byproduct of your clinical and research experience.
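A rough sketch of two of those screening checks, run on invented numbers, follows. Benford-style digit checks are only meaningful for large datasets spanning several orders of magnitude, so this is an illustration of the idea rather than a forensic tool.

```python
# Illustrative screening on made-up values: leading-digit frequencies compared
# with Benford's law, and simple range checks for impossible values.
import math
from collections import Counter

values = [412, 129, 187, 2034, 95, 310, 178, 151, 1240, 167, 133, 198]

# Benford's law: P(leading digit = d) = log10(1 + 1/d)
leading = [int(str(abs(v))[0]) for v in values if v != 0]
observed = Counter(leading)
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"digit {d}: observed {observed.get(d, 0) / len(leading):.2f}, "
          f"Benford {expected:.2f}")

# Plausibility bounds (the limits here are arbitrary, for illustration)
gestational_ages = [39.2, 40.1, 50.3, 38.6]            # weeks; 50.3 is impossible
print("implausible values:", [g for g in gestational_ages if not 20 <= g <= 44])
```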
41. Evaluate whether the discussion stays within the bounds of the data. The discussion section is where authors are most tempted to overreach. An observational study begins to use causal language. A single-center trial generalizes to all settings. A finding in one subgroup is discussed as if it applies to the entire population. You rein these claims in, not because you enjoy being restrictive, but because you know the downstream consequences: a clinician reads the discussion, accepts the framing, and changes practice based on an interpretation the data do not support. Your job in this section is to enforce the logical link between what was measured and what is claimed. If the methods section says ‘association,’ the discussion should not say ‘effect.’ If the population was limited to one ethnic group or one health system, the discussion should not imply universal applicability. This discipline is the essence of scientific integrity, and it falls disproportionately on the reviewer.
Discussion and Conclusions
42. Assess the comparison with existing literature. A good discussion places the current findings in context, comparing results with prior studies and explaining discrepancies. You assess whether this comparison is fair and balanced. Do the authors acknowledge studies that contradict their findings, or do they cite only supportive evidence? When their results differ from prior work, do they offer plausible explanations, or do they dismiss the contradictory evidence? Are the comparisons methodologically sound -- is the author comparing apples to apples, or apples to a fruit salad? You know the prior literature well enough to assess whether the discussion is honest, and you flag gaps, misrepresentations, and overly favorable self-comparisons. A balanced discussion builds trust; a biased one erodes it. You enforce balance because the field depends on it.
43. Check that limitations are honestly stated. Every study has limitations, and an honest limitations section addresses the real weaknesses rather than offering cosmetic acknowledgments of minor issues. You assess whether the limitations section engages with the methodological problems you identified in your review: the unmeasured confounders, the missing data, the short follow-up, the non-representative sample. If the authors list ‘the cross-sectional design limits causal inference’ while ignoring the 40% loss to follow-up, you redirect their attention. A limitations section that does not address the paper’s actual weaknesses is not a limitations section -- it is a deflection. You have read enough of both to know the difference.
44. Evaluate the conclusions for overstatement. This is the single most common problem in manuscripts submitted to medical journals: conclusions that exceed what the data support. ‘Our findings suggest’ becomes ‘we have shown.’ ‘This association may indicate’ becomes ‘this proves.’ An observational finding becomes a recommendation. A preliminary result becomes a definitive answer. You catch these overstatements because you have read the entire paper and you know exactly what the data do and do not support. You rewrite the world’s conclusions, one review at a time, to match what the evidence actually shows. This is unglamorous, repetitive, essential work. It is the reason peer-reviewed science is more trustworthy than non-peer-reviewed science, and it is done by volunteers.
45. Assess whether the language is appropriate for a global readership. Medical research is published in English but read in every country on earth. You flag idioms, culturally specific assumptions, health system-specific terminology, and language that could confuse non-native English speakers. You note when the authors assume that all readers are familiar with a particular country’s insurance structure, drug naming conventions, or clinical guidelines. You ensure that the paper communicates clearly to a physician in Nairobi, a researcher in Tokyo, and a student in Buenos Aires. This is not copy editing. It is ensuring that the global reach of the journal is matched by the global accessibility of the content. You are an unpaid globalization consultant.
46. Assess the clinical or practical implications. If this is applied research, you ask the question that matters most: would a practicing clinician change their behavior based on this evidence? Should they? You assess whether the findings are robust enough, the effect large enough, and the population representative enough to warrant a change in practice, or whether this is preliminary evidence that needs confirmation. In medicine, premature adoption of insufficiently validated interventions causes harm. The history of obstetrics is full of examples: routine episiotomy, continuous electronic fetal monitoring for low-risk pregnancies, bed rest for preterm labor prevention. Each was adopted on insufficient evidence and later shown to be ineffective or harmful. You stand between the manuscript and the patient, applying clinical judgment that took a career to develop.
47. Evaluate the suggested future directions. Are the recommended future studies meaningful and feasible, or are they boilerplate? ‘Future research should explore this association in larger populations’ is a sentence that appears in approximately ten thousand papers per year and advances knowledge by exactly zero. You assess whether the suggested directions follow logically from the study’s findings and limitations, whether they are specific enough to guide actual research planning, and whether they acknowledge the practical and ethical constraints on the proposed work. Good future directions tell the next researcher what to do differently. Bad ones fill space. You know the difference.
48. Check key references for accuracy. Citation accuracy is worse than most people assume. Studies have found that 25% or more of references contain errors, and a meaningful proportion do not support the claim for which they are cited. You spot-check the most important citations: does the referenced paper actually say what the authors claim it says? Was the finding reported correctly, or was it distorted in translation? Is the citation being used to support a claim that the original authors would not endorse? You do not check every reference -- there are too many -- but you check the ones that matter, the ones on which the paper’s argument depends. Citation distortion is a form of misinformation that propagates through the scientific literature, and you are one of the few people who catch it. You catch it because you have read the original papers. That is what expertise is.
References and Supplementary Material
49. Assess supplementary materials. Supplementary tables, figures, appendices, datasets, protocols, sensitivity analyses: these are the attic of the manuscript, and problems accumulate there because authors assume nobody looks. You look. You check supplementary tables for consistency with the main text, review supplementary figures for the same rigor you apply to main figures, and read supplementary methods for details that should have been in the main methods section but were exiled for space. In some manuscripts, the supplementary material is larger than the main paper and contains the analyses that actually matter. If you skip the supplements, you are reviewing only part of the paper. The publisher does not pay you extra for the supplements. But you read them because the science requires it.
50. Evaluate data availability statements. Many journals now require authors to state whether and where their data are available. You check whether the stated repository actually contains the promised data, whether the data are truly accessible or hidden behind bureaucratic barriers, and whether the level of detail is sufficient for replication. In some cases, the data availability statement says ‘available upon reasonable request,’ which in practice often means ‘not available.’ You flag this because open data is a stated value of the scientific community, and you hold manuscripts to the standard the community claims to embrace. This is especially important for publicly funded research, where the data arguably belong to the public. You verify what the checklist merely asks.
51. Assess adherence to reporting guidelines. CONSORT for randomized trials, STROBE for observational studies, PRISMA for systematic reviews, STARD for diagnostic accuracy studies, CARE for case reports: each reporting guideline specifies the minimum information that should be included in a paper of that type. You check whether the appropriate guideline was followed, whether the checklist is complete, and whether adherence is genuine or superficial. Some authors complete the checklist by filling in section numbers without actually including the required information in those sections. This is checkbox compliance, and you see through it because you know what the guidelines require substantively, not just formally. Reporting guidelines were created because the quality of published research was poor. You enforce them because, without enforcement, they are merely suggestions.
Crosscutting Quality Checks
52. Evaluate the writing quality. You assess the clarity, organization, grammar, and logical flow of the manuscript. Some papers require minor language editing. Others require fundamental restructuring: the key finding is buried on page 12, the introduction does not connect to the conclusion, the methods and results are presented in different orders, or the writing is so opaque that the science cannot be evaluated. You diagnose the severity of the writing problems and prescribe the appropriate remedy: minor revisions for polishing, major revisions for reorganization, or rejection when the problems are so severe that revision would essentially mean writing a new paper. In many cases, particularly with manuscripts from non-native English speakers, you essentially rewrite portions of the text in your comments, providing corrected sentences or paragraphs as examples. This is editorial labor at the highest level, performed for free.
53. Check for plagiarism and redundant publication. Plagiarism detection software catches verbatim copying, but the subtler forms -- paraphrased plagiarism, idea theft, self-plagiarism, and redundant publication -- require human judgment. You notice when a passage feels familiar, when the phrasing is too polished for the rest of the manuscript, or when you have seen this dataset before in a different paper with a slightly different research question. Salami-slicing -- publishing the same dataset multiple times with minor variations -- is a judgment call that no algorithm can make, because determining whether two papers are truly independent requires understanding the data, the question, and the field. You make this call because you have the context that the software lacks.
54. Assess for figure manipulation or data fabrication. Do the Western blot bands look spliced? Are the microscopy images duplicated across panels? Are there unusual patterns in the histograms that suggest data simulation? Do the standard deviations defy the expected relationship with the means? You apply pattern recognition built over a career of looking at real data, comparing what you see against what you expect. This is forensic work, and in fields plagued by image manipulation (cell biology, for example), it has become an essential part of the reviewer’s role. Journals are beginning to use AI-assisted image screening, but these tools supplement rather than replace the human eye. You are not paid forensic rates for this forensic work. You are not paid at all.
55. Evaluate the author list and contributions. Does the number of authors match the scope of work? A straightforward survey with 14 authors raises questions. A multi-center international trial with three authors raises different questions. You check whether author contributions are stated and whether they are plausible: can all listed authors truly have contributed substantially to conception, design, analysis, or writing? Are there likely ghost authors -- professional medical writers or statisticians whose contributions are not acknowledged? Is there evidence of honorary authorship -- senior names added for prestige without meaningful contribution? These are delicate assessments that require knowledge of how research is actually conducted in your field. You make them because the integrity of authorship is the integrity of accountability.
56. Assess whether the manuscript was likely generated or substantially written by AI. This is a new obligation that did not exist two years ago, and it has been added to your job description without discussion, training, or compensation. You evaluate whether the prose has the hallmarks of large language model output: uniformly correct but oddly generic phrasing, lack of a distinctive authorial voice, suspiciously smooth logical transitions, and an absence of the rough edges that characterize authentic human writing. You look for factual errors of the kind AI systems produce: plausible-sounding citations that do not exist, confident claims unsupported by the referenced source, and numerical details that are internally inconsistent in ways a human author would catch. Some journals now require disclosure of AI use. You assess whether that disclosure is honest. The publisher added this to your unpaid responsibilities because the technology moved faster than their policies.
57. Check for undisclosed conflicts of interest. The disclosure form lists what the authors chose to declare. You bring knowledge the form does not: you know, from your presence in the field, that an author consults for a company whose product is being studied, that a co-author’s spouse holds a patent related to the intervention, that the corresponding author’s laboratory is funded by the manufacturer. You flag undisclosed conflicts not because you are adversarial but because you know that financial interests, when unreported, distort the literature in measurable and documented ways. The publisher cannot make this assessment. Only a reviewer embedded in the field’s social and financial network can, and you do it as a matter of professional ethics, without compensation, without recognition, and often without thanks.
58. Evaluate the overall coherence of the manuscript as a single argument. Beyond checking individual sections, you assess whether the paper holds together as a unified intellectual contribution. Does the introduction’s question lead logically to the methods’ design? Do the results answer the question that was asked? Does the discussion interpret the results that were reported, or has it drifted into territory the data do not support? A paper can be technically competent in every section and still fail as a whole because the pieces do not fit together. You detect this because you read the paper as a reader, not just as a checker of boxes. This holistic assessment is the most valuable thing you provide, and it cannot be decomposed into a checklist or automated by an algorithm.
59. Write a structured summary of the paper’s strengths. Good reviewing starts with what the authors did well. This is not diplomacy; it serves a specific purpose. It tells the editor which elements of the paper are solid and should be preserved through revision. It signals to the authors that the reviewer engaged seriously with their work. It prevents the demoralization that comes from receiving a review that is purely negative, which can discourage legitimate researchers from continuing their work. Writing a genuine, specific summary of strengths requires that you actually found strengths, which in turn requires that you read carefully enough to recognize them. ‘The question is interesting’ is not a strength summary. ‘The use of propensity score matching with multiple imputation for missing covariates represents a methodological improvement over prior work in this area’ is.
Writing the Review
60. Write a detailed critique of each weakness. Specific, constructive, actionable. Not ‘the methods are weak’ but ‘Table 2 reports unadjusted odds ratios for the primary outcome; adjustment for maternal age, parity, and insurance status would strengthen the analysis and should be included as a sensitivity analysis.’ This level of specificity requires deep engagement with the manuscript and genuine thought about how to improve it. You write not merely to identify problems but to solve them, providing the authors with a roadmap for revision. Each critique is a miniature consultation: diagnosis of the problem, explanation of why it matters, and prescription for how to fix it. This intellectual labor is the core of peer review, and it is what distinguishes a helpful review from a gate-keeping exercise. You invest this effort because you believe that making a paper better is the point, not just deciding whether it is good enough.
61. Prioritize your concerns. Not all problems are equal. A fundamental design flaw that cannot be corrected by additional analysis is a different problem from a poorly constructed figure that can be remade in an afternoon. You categorize your concerns: major issues that threaten the validity of the conclusions, moderate issues that weaken the paper but are addressable, and minor issues of presentation, formatting, or style. The editor needs this hierarchy to make an informed decision about the manuscript’s fate. A paper with one fatal flaw and twenty minor issues is a reject. A paper with zero fatal flaws and twenty minor issues is a major revision. Your prioritization determines the outcome, and getting it right requires judgment that cannot be reduced to a scoring rubric.
62. Provide specific, actionable recommendations. For each weakness, you provide a path forward. Run this additional analysis. Reframe this conclusion. Add this limitation. Remove this unsupported claim. Restructure this section. Cite this paper. Each recommendation is a gift of expertise: you are telling the authors, based on your knowledge of the field and your assessment of their data, exactly what they should do to make their paper publishable. Some recommendations take more time to formulate than the original criticism. ‘The survival analysis does not account for competing risks; a Fine-Gray model would be more appropriate here’ is a sentence that requires you to know what a Fine-Gray model is, when it should be used, and whether the data support its application. That sentence represents years of training, delivered in one line, for free.
63. Make a recommendation to the editor. Accept as is, accept with minor revisions, major revisions required, or reject. This recommendation is the most consequential judgment you make in the review. It affects the authors’ careers, their funding, their promotion cases, and their academic futures. It also affects what the scientific literature contains and what clinicians will read and act upon. You take this recommendation seriously. You agonize over borderline cases. You weigh the strengths against the weaknesses, the novelty against the limitations, the potential impact against the methodological concerns. And then you make a call, knowing that you may be wrong, and that the editor may overrule you. The weight of this decision is entirely yours to bear, and the publisher bears none of the cost of the expertise required to make it.
64. Write confidential comments to the editor. Separately from the review that the authors will see, you write a candid assessment for the editor’s eyes only. This may include concerns about the authors’ integrity that would be inappropriate to share directly, your honest assessment of whether the paper is suitable for this particular journal, context about the field that helps the editor interpret your review, and the reasoning behind your recommendation. These comments require a different register: more direct, more strategic, more honest about ambiguity. They are the conversation between professionals that enables the editorial decision. You write them carefully because you know they carry weight. They are also the most invisible part of your labor -- even the authors never see them.
65. Proofread and refine your own review. Before submitting, you review your review. Is it fair? Have you been too harsh, too lenient, too vague, too specific? Have you missed anything? Are your comments clear enough that the authors can act on them without ambiguity? Would you be comfortable if the authors learned your identity -- if this were an open review? Would a colleague reading your review consider it constructive and professional? You revise, rewrite, and sometimes sleep on it, returning the next day with fresh eyes to ensure that the review reflects your best professional judgment, not your late-night irritation at a poorly written methods section. Quality control on your own unpaid labor: yet another thing nobody compensates you for.
66. Format the review to the journal’s specifications. Some journals want numbered comments organized by section. Others want a narrative assessment followed by a list of specific concerns. Some have online forms with scoring rubrics (originality 1-5, methodology 1-5, significance 1-5). Some want separate sections for major and minor comments. Some require you to indicate whether the paper meets ethical standards, adheres to reporting guidelines, and is appropriate for the journal’s scope -- each as a separate field. You adapt to each journal’s format because nobody standardized this, and the reviewer, as usual, absorbs the cost of the system’s inconsistency.
Revision Review
67. Re-read the original manuscript and your original review. When the revised manuscript arrives, weeks or months after your original review, you retrieve your notes, re-read your critique, and prepare to evaluate whether the authors addressed your concerns. This requires re-engaging with a paper you have largely moved on from mentally. You must reconstruct your detailed understanding of the methods, your specific objections, and your overall assessment -- all from a document you last thought about weeks ago. Some reviewers keep detailed files. Others rely on the journal’s system to provide the prior correspondence. Either way, re-entry into a previous review is cognitive overhead that compounds across your review workload.
68. Assess the authors’ response to each point. Authors provide a point-by-point response to your comments. You read each response and assess: Did they address the concern substantively, or did they provide a superficial acknowledgment? Did they perform the additional analysis you requested? Did they modify the text as suggested? Did they push back on any of your points, and if so, is their rebuttal convincing? Some authors genuinely engage with reviewers’ critiques and improve their papers substantially. Others pay lip service while changing nothing. You distinguish between the two by comparing the response letter against the revised manuscript, line by line, ensuring that promised changes were actually made. This is meticulous, time-consuming work. It is also entirely unpaid. The publisher bills for the published version. Your role in creating it is invisible.
69. Re-evaluate the revised manuscript. A revised manuscript is not always a better manuscript. Sometimes the revisions introduce new problems: new analyses that were done incorrectly, new text that contradicts existing text, new conclusions that overreach the new data. Sometimes the authors’ attempt to address one concern creates a different weakness. You read the revised manuscript with the same rigor as the original, assessing not just whether your concerns were addressed but whether the revision is coherent and sound as a whole. The goal is not merely to check boxes but to determine whether the paper, as revised, now meets the standard for publication. This second review typically takes 1 to 3 hours -- less than the first, but still a significant investment of time.
70. Check that new analyses are correctly performed. If you requested additional statistical analyses -- subgroup comparisons, sensitivity analyses, adjusted models, alternative endpoints -- you now verify that they were executed correctly. This means examining new tables and figures, checking whether the analytical approach matches what you suggested, and assessing whether the results change the paper’s conclusions. Sometimes new analyses reveal that the original conclusions were wrong, which requires a complete rethinking of the discussion. You evaluate all of this with the same expertise you brought to the original review. The additional intellectual labor scales with the complexity of your original requests: the more thorough your first review, the more work the second review requires.
71. Write a second review. You provide the editor with your assessment of the revision: were your concerns addressed? Are new issues present? Is the paper now acceptable, or does it need further work? This second review is usually shorter than the first but still requires careful articulation. In some cases, a third round of revision is needed, and you review again. Each round is uncompensated. Each round requires the same professional standards. The publisher charges the same amount for a paper that went through one round of review as for a paper that went through four. Your additional labor is not reflected in the price.
Maintaining Expertise
72. Stay current in your field. You cannot review what you do not understand, and understanding your field requires continuous investment. You read newly published studies, attend conferences, participate in journal clubs, follow preprint servers, and maintain active engagement with the clinical or research questions in your specialty. This ongoing education is what makes you competent to review. It is not compensated by the publisher, who benefits from your expertise without contributing to its maintenance. The hours you spend staying current are the hours that make you valuable as a reviewer. They are invisible in the economics of publishing but essential to its function. If every reviewer stopped reading for a year, the review system would collapse within months. The publisher depends on your intellectual curiosity being self-funded.
73. Maintain statistical literacy. Research methods evolve, and the reviewer must evolve with them. Propensity score methods, Bayesian frameworks, machine learning applications, causal inference techniques, network meta-analysis, individual patient data meta-analysis: these approaches were rare or nonexistent when many current reviewers completed their training. You learned them on your own, through textbooks, online courses, workshops, and sheer necessity. The papers you are asked to review use methods you were not taught, and you are expected to evaluate them competently. This ongoing methodological self-education is a prerequisite for useful peer review, and it is entirely self-funded. The publisher benefits from your investment without sharing the cost.
74. Maintain awareness of research integrity issues. Paper mills, image manipulation, fake peer review rings, citation cartels, predatory publishing: the landscape of scientific misconduct is evolving rapidly, and new schemes emerge faster than detection methods can adapt. You stay informed through retraction watch databases, integrity conferences, social media discussions, and publications on research ethics. This awareness allows you to recognize warning signs in manuscripts you review: suspiciously fast data collection timelines, implausibly large sample sizes, boilerplate methods sections that appear across multiple papers, and author lists that mix unrelated specialties in suspicious combinations. Your vigilance is unpaid but essential. When a reviewer catches a paper mill submission, the publisher avoids a retraction. When a reviewer misses it, the published literature is contaminated.
75. Teach peer review to trainees. Residents, fellows, junior faculty: you train the next generation of reviewers by involving them in your reviews, discussing methodology and critique strategies, and modeling the standards of constructive evaluation. Some programs formally include peer review training; most do not, leaving senior reviewers to fill the gap informally. You create the future reviewer workforce that the publishing system depends on. This mentorship is uncompensated by publishers and unrecognized by most academic promotion committees. It persists because experienced reviewers understand that the system’s continuity depends on transferring not just knowledge but judgment -- and judgment can only be taught through supervised practice.
76. Update your reviewer profile across platforms. Publons, ORCID, Web of Science reviewer profiles, journal-specific databases: you maintain your availability, expertise areas, and review history across multiple platforms so that editors can find and invite you efficiently. Each platform has its own interface, its own update requirements, and its own nagging reminder emails. You keep these current because the publisher’s ability to recruit appropriate reviewers depends on the accuracy of the information you provide. This is administrative labor that serves the publisher’s recruitment pipeline. It is, naturally, uncompensated.
System Participation
77. Respond to editorial queries and follow-up. The editor has a question about your review. The author’s response raises an issue you did not anticipate. The editor-in-chief wants clarification on a specific point before making a final decision. You receive these queries days or weeks after you submitted your review, after you have mentally moved on to your own research, your patients, your teaching. You re-engage, reconstruct your reasoning, and provide the requested clarification or additional assessment. This follow-up can involve a single email or a sustained exchange over several weeks. It is never scheduled, always reactive, and consistently uncompensated. The editor needs your expertise on their timeline, and you provide it because you committed to the review and you see it through.
78. Meet deadlines. The publisher sets the timeline: typically 14 to 21 days for a first review, 7 to 14 for a revision. These deadlines arrive without regard to your existing commitments. The manuscript lands during grant season, during exam week, during your vacation, during a family crisis. You are expected to deliver a quality review on schedule because the authors are waiting, the editor is tracking turnaround times, and the journal’s publication timeline depends on reviewer compliance. When you cannot meet a deadline, you negotiate an extension, which requires an email exchange that is itself unpaid labor. The irony is not lost on you: the publisher penalizes slow turnaround in its metrics and editorial reports, but has no mechanism to reward timely delivery beyond an automated thank-you email.
79. Decline reviews you cannot do well. Knowing what you do not know is a form of expertise, and declining a review appropriately is a service to the system. You decline when the topic is outside your competence, when you cannot meet the deadline, when you have a conflict, or when your current review load is too high for quality work. Each declination requires a response to the editor, often with an explanation and sometimes with suggested alternative reviewers. This generates no credit, no compensation, and no formal acknowledgment. But it prevents a bad review -- an incompetent review, a rushed review, a conflicted review -- from entering the system. The quality of peer review depends as much on wise declinations as on excellent reviews. Nobody tracks or rewards this.
80. Navigate multiple submission systems. ScholarOne, Editorial Manager, eJournalPress, OJS, Elsevier’s editorial system, Springer’s system, BMJ’s system, and whatever new platform a society journal adopted last year: each has a different interface, different login credentials, different workflows, different quirks. Some remember your preferences; others do not. Some allow you to download all files at once; others require individual downloads. Some time out if you pause too long while composing your review. You learn them all, not because you want to, but because the system’s fragmentation is the reviewer’s problem to solve. Publishers invest heavily in submission system infrastructure. The user experience for reviewers is consistently an afterthought. You adapt because the alternative is to stop reviewing, and you are not ready to do that yet.
81. Manage your review workload. You track which reviews you have agreed to, what is due when, and what you should decline. No centralized system does this for you across journals. You use spreadsheets, calendar entries, email flags, or memory -- often a combination. You balance the professional obligation to the field against your other responsibilities, trying to review enough to contribute but not so much that quality suffers. The rule of thumb is that you should review at least as many manuscripts per year as you submit, but many reviewers do far more because the demand exceeds the supply. Workload management is a meta-task that sits on top of all other review labor, and it is as invisible as it is essential.
82. Maintain confidentiality. The manuscript you are reviewing is unpublished, proprietary scientific work. You do not discuss it with colleagues, share it with trainees (unless the journal explicitly permits co-reviewing), mention it at conferences, or use the unpublished data in your own research. You maintain this confidentiality even when the findings are exciting, even when they directly relate to your own work, and even when sharing them would advance your interests. This ethical discipline is self-enforced; no one monitors your compliance. The system trusts you because you are trustworthy, and that trust is the foundation of confidential peer review. It costs you nothing in money and everything in self-restraint.
83. Handle ethically complex situations. You discover what appears to be data fabrication. You realize mid-review that you have a previously unrecognized conflict of interest. The manuscript is from your department chair, your grant collaborator, or your former mentor. The findings contradict a study you just published. The paper describes a clinical practice you believe is harmful to patients. Each of these situations requires ethical judgment exercised under ambiguity, without a manual, and without compensation. You navigate them by applying the professional values you internalized over a career: honesty, fairness, intellectual rigor, and concern for the integrity of the scientific record. The publisher provides guidelines for some of these scenarios. For most, you are on your own.
The Hidden Labor
84. Read background literature to prepare for the review. For a topic adjacent to but not squarely within your primary expertise, you may need to read 5 to 15 additional papers before you are competent to review the manuscript. This preparation time is invisible to the publisher. A meta-analysis of interventions for gestational diabetes requires you to understand both meta-analytic methodology and the clinical management of gestational diabetes. If your expertise is primarily in one of those areas, you invest hours filling the gap in the other. This background reading is not optional; without it, your review would be superficial. Yet it appears nowhere in the publisher’s accounting of what peer review costs, because it occurs before you type a single word of your review.
85. Cross-reference the study against trial registries. You search ClinicalTrials.gov, ISRCTN, the WHO International Clinical Trials Registry, or other relevant registries to find the original protocol for the trial under review. You compare registered endpoints against published endpoints, registered sample sizes against achieved enrollment, and registered analysis plans against reported methods. This detective work is how outcome switching is detected: when a trial registers a primary outcome of hospital readmission but publishes with a primary outcome of patient satisfaction, someone has to notice. That someone is you. This cross-referencing can take 30 minutes to an hour, depending on the complexity of the trial and the quality of the registry record. It is among the most important fraud-detection activities in science, and it is performed by unpaid volunteers.
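For readers who want to see what this cross-referencing looks like in practice, here is a minimal sketch of how the registered record might be pulled programmatically. It is illustrative only: it assumes the ClinicalTrials.gov v2 REST API and its JSON field names (protocolSection, outcomesModule, primaryOutcomes) as I recall them, which should be verified against the current API documentation, and the trial identifier is a placeholder. The judgment about whether a discrepancy amounts to outcome switching remains, as always, the reviewer’s.

```python
# Hypothetical sketch only: field names follow the ClinicalTrials.gov v2 API JSON
# schema as I recall it (protocolSection -> outcomesModule -> primaryOutcomes) and
# should be checked against the current API documentation before relying on them.
import requests

def fetch_registration(nct_id: str) -> dict:
    """Return the registered primary outcomes and planned enrollment for one trial."""
    url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    protocol = resp.json().get("protocolSection", {})
    outcomes = protocol.get("outcomesModule", {})
    design = protocol.get("designModule", {})
    return {
        "registered_primary_outcomes": [
            o.get("measure") for o in outcomes.get("primaryOutcomes", [])
        ],
        "planned_enrollment": design.get("enrollmentInfo", {}).get("count"),
    }

# Usage (placeholder trial ID): compare the output, by eye, against the endpoints
# and sample size reported in the manuscript under review.
# registration = fetch_registration("NCT01234567")
# print(registration)
```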
86. Verify that the study was not previously published elsewhere. Duplicate publication wastes journal space, inflates the literature, and can bias meta-analyses by counting the same data twice. You check whether the manuscript, or a substantially similar version, has been published elsewhere, is available as a preprint, or overlaps with the authors’ other publications. This requires familiarity with the authors’ body of work and with the broader literature in the field. When you identify an overlap, you must assess whether it is legitimate (a brief report followed by a full paper) or problematic (the same dataset published with minor variations in different journals). This judgment requires the kind of deep field knowledge that takes years to acquire and cannot be replicated by a database search.
87. Assess the paper in the context of the journal’s recent publications. Has this journal recently published a similar paper? Would this manuscript add incrementally to the journal’s coverage of this topic, or would it be redundant? Is this submission part of an emerging cluster of papers on the same topic, suggesting that the field is moving rapidly and the paper is timely? You consider the journal’s editorial portfolio because you read the journal regularly, and you can place this submission within the trajectory of what the journal has recently published. This context helps the editor make a decision that considers not just the paper’s individual merit but its fit within the journal’s evolving content. You provide this strategic assessment because you are a member of the journal’s community, not just an anonymous labor source.
88. Consider the paper’s potential impact on clinical practice. In clinical fields, what a journal publishes can change how doctors treat patients. If this paper is published and its conclusions are adopted, what would happen? Would patient outcomes improve? Would they worsen? Would a treatment that is effective in the study population be applied inappropriately to a different population? Would a premature recommendation lead to widespread adoption of an unproven intervention? You think through the downstream consequences of publication because you understand the clinical reality that the paper addresses. This is not abstract. It is the connection between the published literature and the patient in the clinic. You stand between the two, and the publisher does not.
89. Evaluate whether the study population is representative. External validity -- the generalizability of a study’s findings to populations beyond the one studied -- is one of the most important and most commonly overlooked aspects of clinical research. You assess whether the study population is representative of the broader population to which the findings will be applied. A study conducted exclusively in academic medical centers may not generalize to community hospitals. A study in Northern European populations may not apply to South Asian or African populations. A study that excludes patients with comorbidities may not reflect the patients clinicians actually treat. You flag these limitations because you know the patients you treat, and you know when a study population does not look like them.
90. Assess the funding source for potential bias. Industry-funded studies produce favorable results more often than independently funded studies -- this is documented across multiple systematic reviews and is one of the most robust findings in the meta-research literature. You do not dismiss industry-funded studies reflexively, but you read them with calibrated skepticism: Are the comparators fair? Are the outcomes patient-centered or surrogate? Is the analysis plan optimized to produce favorable results? Are the limitations stated honestly? This contextual awareness is clinical epidemiology applied to the manuscript itself, and it requires a level of sophistication that goes beyond what any checklist can capture.
91. Evaluate whether the informed consent process was adequate for the study’s risks. Beyond checking that consent was obtained, you assess whether the consent process was proportionate to the study’s risks. Were participants genuinely informed of alternatives? Were the potential harms explained in understandable language? In studies involving vulnerable populations -- pregnant women, critically ill patients, neonates -- you assess whether the consent process reflected the particular ethical obligations these populations entail. The gap between a signed consent form and genuine informed consent is where patient autonomy lives or dies. You assess it because you know, from clinical experience, what patients actually understand and what they merely sign.
92. Consider the patient or public perspective. Will patients understand this finding if it reaches the media? Could the results be taken out of context in ways that cause alarm or false reassurance? Would the conclusions, if reported by a journalist with no scientific training, be accurately conveyed or inevitably distorted? You think beyond the academic bubble because you have seen what happens when a nuanced finding is reduced to a headline. A study showing that a medication slightly increases a rare risk can generate panic if the absolute risk is not communicated. A study showing marginal benefit can generate false hope if the limitations are not stated. You consider these downstream effects because the publisher’s marketing department will not.
93. Assess whether the conclusions could be weaponized. In politically charged areas of medicine -- reproductive health, vaccination, gender-affirming care, substance use -- published findings can be extracted from context and used to support policy agendas the authors never intended. You consider whether the paper’s language could be selectively quoted to support a position the data do not actually support. This is not censorship; it is responsible science communication. You may suggest that the authors add caveats, reframe conclusions, or explicitly state what the data do not show, to reduce the risk of misuse. This forward-looking assessment requires not just scientific judgment but awareness of the political and social environment in which the science will be received. No publisher asks for this. You do it because you live in the real world.
94. Provide mentorship through your review. For early-career researchers, a thoughtful peer review may be the most detailed, expert feedback they have ever received on their work. You write your review not only to evaluate but to teach: explaining why a certain statistical approach is preferable, why a particular framing is misleading, how the discussion could be strengthened. This pedagogical dimension of peer review is invisible to publishers and unquantifiable in metrics, but it is one of the most valuable functions the system performs. Many researchers can trace their methodological improvement to specific reviewer comments on early papers. You provide this mentorship because you remember the reviews that made your own work better, and you pay it forward. The publisher charges for the product. You invest in the people.
95. Catch errors that automated systems miss. Unit errors, decimal point misplacements, mislabeled graphs, transposed digits in tables, incorrect confidence interval bounds, means that fall outside the range of possible values, percentages that sum to 110%: these errors elude formatting checks and statistical software because they are content errors, not format errors. You catch them because you know what the numbers should look like, because you have clinical intuition about plausible values, and because you read carefully. A birth weight of 35,000 grams is obviously wrong. A confidence interval of 0.5 to 15.3 for a risk ratio that should be close to 1.0 demands explanation. A p-value reported as 0.000 is a rounding artifact, not a result. You catch what the system cannot because you bring judgment to arithmetic.
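A few of the purely arithmetic checks named above can be scripted as a coarse first pass. The sketch below is illustrative only -- the function names and tolerances are hypothetical -- and it does not touch the clinical-plausibility judgments (a 35,000-gram birth weight) that this item is really about.

```python
# Illustrative first-pass arithmetic checks; names and tolerances are hypothetical.
def percentages_sum_ok(percentages, tolerance=1.0):
    """Flag category percentages that should total roughly 100% but do not."""
    return abs(sum(percentages) - 100.0) <= tolerance

def ci_contains_estimate(point, lower, upper):
    """A confidence interval that excludes its own point estimate is a red flag."""
    return lower <= point <= upper

def p_value_plausible(p):
    """A p-value of exactly 0, or outside (0, 1], is a reporting artifact."""
    return 0.0 < p <= 1.0

print(percentages_sum_ok([42.0, 38.0, 30.0]))  # False -- these sum to 110%
print(ci_contains_estimate(3.2, 0.5, 15.3))    # True, though so wide an interval still demands explanation
print(p_value_plausible(0.000))                # False -- a rounding artifact, not a result
```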
96. Assess whether the work is ethical. Beyond IRB approval and informed consent, you assess whether the research question itself is ethically defensible. Were vulnerable populations adequately protected? Were the burdens and benefits of participation distributed fairly? Was the study designed to produce knowledge that could not have been obtained through less invasive means? In some cases, you encounter studies that are technically approved but ethically troubling -- studies that exploit power differentials, that impose unreasonable burdens on disadvantaged populations, or that ask questions whose answers could be used to harm the people being studied. You raise these concerns not as a regulatory body but as a moral agent. This is perhaps the most consequential judgment a reviewer can make, and it is entrusted to unpaid volunteers.
97. Recognize when a manuscript requires expertise you do not have, and say so. Sometimes you realize mid-review that the paper has moved into territory beyond your competence: an unfamiliar statistical method, a subspecialty clinical question, a population you have no experience with. Rather than bluffing through an incompetent review, you contact the editor and recommend that an additional reviewer with the appropriate expertise be recruited. This honest self-assessment is a service to the system that goes entirely unrecognized. It is the opposite of Dunning-Kruger, and it is what keeps peer review functional despite the impossibility of any single reviewer mastering every domain a paper might touch.
98. Accept that you will not be paid. The publisher charges the author, or the reader, or the library, or the funder for the product you helped create. Article processing charges range from $2,000 to $11,000 per paper. Subscription revenues generate billions annually for the major commercial publishers. Your labor is the only essential input in the entire production chain that is donated. The copy editor is paid. The typesetter is paid. The production manager is paid. The platform engineer is paid. The sales team is paid. The marketing department is paid. The CEO is paid -- the chief executive of RELX, Elsevier’s parent company, earns over 1.5 million euros per year. The reviewer, whose intellectual contribution is the single thing that distinguishes a peer-reviewed journal from a preprint server, is the only person in the building who works for free.
The System Cost
99. Accept that your review may be ignored. Editors sometimes overrule reviewers. Papers you recommended rejecting get published. Papers you championed get killed by the other reviewer. Your carefully reasoned assessment may be weighed against a contradictory review from someone with less expertise but more enthusiasm, and the editor may split the difference in a way that satisfies nobody. You contribute your best professional judgment knowing that the outcome is not yours to control. This is not a complaint -- editorial discretion is necessary and appropriate. But it means that the hours you invested may not determine the result. You review anyway, because the process matters even when the outcome disappoints, and because the next paper might be the one where your review prevents a harmful finding from reaching patients.
100. Absorb the opportunity cost. Every hour spent reviewing is an hour not spent on your own research, your patients, your teaching, your grant applications, your family, or your rest. The true cost of peer review is not the hypothetical fee of $300 to $600 per review. It is the paper you did not write, the patient you did not see, the student you did not mentor, the proposal you did not submit, and the sleep you did not get. Opportunity cost is invisible in financial statements, but it is real in careers and in lives. Multiplied across millions of reviews per year, the aggregate opportunity cost to the global academic workforce is staggering -- and it accrues entirely to the benefit of publishers who record it nowhere in their accounts.
101. Subsidize a profitable industry. For the large commercial publishers, your unpaid labor contributes directly to operating margins that consistently exceed 30% -- margins that would make most industries envious. Elsevier reported adjusted operating profit of over one billion euros in 2023. Springer Nature, Wiley, Taylor and Francis: the same pattern. These are not struggling enterprises that cannot afford to compensate their most important contributors. They are highly profitable corporations whose business model depends on a single structural anomaly: the one participant whose labor is free happens to be the one whose expertise is irreplaceable. You are not volunteering for a charity. You are donating skilled professional time to a commercial enterprise that has decided, as a matter of business strategy, not to pay for it.
102. Do it anyway, because the alternative is worse. A world without peer review is a world where anything gets published and nothing can be trusted. The system is imperfect, slow, sometimes biased, and structurally dependent on exploitation. But it is better than the alternative. You review because the work matters, because the scientific record matters, because patients downstream of that record matter. You review because you were trained by people who reviewed, and because you are training people who will review after you. You review because you believe that someone should stand between a manuscript and the public and apply the standard of ‘is this true and does it matter.’ You just wish that someone would notice that the person standing there is the only one in the room working for free.
The Tally
Anderson tallied which stakeholders benefit from each of his 102 publisher functions. Publishers themselves appeared as primary beneficiaries in 57 of 102. Editors and reviewers appeared as primary beneficiaries in 38.
Here is the tally for the 102 things reviewers do:
Compensation received for all 102 tasks combined: $0
Professional training required to perform them competently: 10 to 30 years
Time per review (first round): 3 to 8 hours
Time per revision review: 1 to 3 additional hours
Reviews performed globally per year: approximately 15 million
Estimated annual value of donated reviewer labor: $5 to $25 billion (a back-of-envelope check follows after this tally)
Number of other participants in the publishing pipeline who work for free: 0
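That last valuation is easy to sanity-check against the tally’s own figures. The sketch below uses the review count and hours listed above; the hourly rates are my assumption, chosen only to bracket the value of senior professional time.

```python
# Back-of-envelope check on the "$5 to $25 billion" figure, using the tally's
# review count and hours; the hourly rates are assumptions for illustration only.
reviews_per_year = 15_000_000      # approximate global reviews per year (tally above)
hours_per_review = (3, 8)          # first-round review time in hours (tally above)
hourly_rate_usd = (100, 200)       # assumed value of senior professional time

low = reviews_per_year * hours_per_review[0] * hourly_rate_usd[0]
high = reviews_per_year * hours_per_review[1] * hourly_rate_usd[1]
print(f"${low / 1e9:.1f}B to ${high / 1e9:.1f}B per year")  # roughly $4.5B to $24B
```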
My Take
I have reviewed hundreds of manuscripts over my career. I have done every one of these 102 things, most of them hundreds of times. I do not regret the work. I believe in peer review. I believe it is the best mechanism we have for distinguishing reliable science from unreliable science, and I intend to keep doing it.
But I can count. And when a publisher with a 30% operating margin tells me there is no money to compensate reviewers, and then sends me a list of 102 things they do -- every one of them performed by paid employees -- to justify their revenue, I notice that the most skilled person in the room is the only one working for free.
The question is not whether publishers add value.
They do.
Anderson’s list proves it. The question is why the most intellectually demanding labor in the entire system -- the labor that is the sole reason a peer-reviewed journal is worth more than a preprint server -- is the only labor that is uncompensated.
If this list does nothing else, I hope it makes one thing visible: the next time a publisher explains why they cannot afford to pay reviewers, count the people in the building. Count the ones who are paid. Then count the ones who are not. The arithmetic is not complicated.


