I4R.

Assessing Research Using Error Carry Forward

March 10, 2026 · Derek Mikola

Error carry forward is a marking scheme for complex, multistep mathematics questions, where the dependence of later steps on earlier ones makes a binary (correct/incorrect) scheme too coarse a measure of work quality. The basic premise is that not all errors are equal: some errors are more costly than others. Instead of marking all-or-nothing, assessors may award partial marks. I propose evaluating research containing coding errors in the same light.
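The marking logic can be sketched in a few lines. This is a minimal illustration of the idea, not any official rubric: the step rule, starting value, and answers below are all invented.

```python
# Minimal sketch of error-carry-forward (ECF) marking for a hypothetical
# multistep question where each step applies the same rule (here: double
# the previous result and add 3).

def ecf_mark(student_steps, start, step_fn):
    """Award a mark for each step that is correct *given the student's
    previous answer*, so one early slip is penalised only once."""
    marks = 0
    prev = start
    for answer in student_steps:
        if answer == step_fn(prev):
            marks += 1
        prev = answer  # carry the student's value forward, right or wrong
    return marks

step = lambda x: 2 * x + 3

# Fully correct chain from 1 is 5, 13, 29.
print(ecf_mark([5, 13, 29], 1, step))  # 3
# One slip at step 1 (4 instead of 5), then both later steps follow
# correctly from the erroneous value.
print(ecf_mark([4, 11, 25], 1, step))  # 2
```

Under a binary scheme that compares each answer to the correct chain, the second student scores 0/3; ECF awards 2/3 because only the first step was actually wrong.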

Research projects that contain coding errors need not be void of use, but they should be reviewed in light of the error. Put simply, a corrected error that does not change the “conclusion” of the research (in a null hypothesis, statistical testing framework) is nonetheless an error. Whether the “conclusion” survives once the coding error is corrected seems immaterial to discussing the quality of the research.

GOOD RESEARCH = GOOD IDEAS × GOOD INGREDIENTS × GOOD EXECUTION

I am supposing that the combination of ideas, ingredients (data, literature, methods, etc.) and execution largely determines the quality of the research. Splitting along these three dimensions helps focus the conversation about coding errors. Coding errors live entirely in the execution dimension: correcting them can only improve the research, never worsen it.

We must keep in mind that the quality of the idea and of the ingredients also matters. A perfect codebase cannot remedy inappropriate ingredients, an unanswerable research question or unconventional ideas. Because the relationship is multiplicative, if any factor is deficient, the whole research suffers.
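As a toy illustration of the multiplicative framing (the 0-to-1 scores below are invented for the example): a small execution slip only dents the product, while a zero on any factor zeroes the whole.

```python
# Toy multiplicative model: quality = ideas * ingredients * execution,
# each scored on an invented 0-to-1 scale.

def research_quality(ideas, ingredients, execution):
    return ideas * ingredients * execution

# Strong idea and data, slightly sloppy execution: quality is dented.
print(round(research_quality(0.9, 0.9, 0.7), 3))  # 0.567
# Flawless code cannot rescue worthless ingredients.
print(round(research_quality(0.9, 0.0, 1.0), 3))  # 0.0
```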

ALL ERRORS GREAT AND SMALL

The parts of the execution thought ex ante to matter must be done correctly; everything else can be given grace. Herein lies the error carry forward marking scheme: small execution errors don’t ruin the whole; only large ones do.

Here is a motivating example from cooking: preparing a meal requires understanding what you are trying to cook (an idea, or recipe), having the ingredients and materials, and executing. A meal can fail in many places. Yet, in my view, infrequent, simple mistakes (slightly too little butter, incorrect portion sizes, a touch too much salt) matter less than inappropriate ingredients. Large errors (cooking at too low a heat for twice the length, swapping fish for beef) are far more likely to ruin a meal completely than small execution slips.

NUANCE

Unfortunately, I have no general mapping between “things which matter” and those which do not; it is highly dependent on the research presented.

One heuristic which might prove fruitful: grace should be given if the error could be mistaken for a robustness check. Examples may include: accidental but innocuous inclusion or exclusion of some observations, slight changes in the control variables, using analogous estimators.

Additional examples where grace may be considered: unimportant variables mislabeled (but correctly constructed); differences in statistics attributable to rounding; anything outside the control of the research-producer.

Examples where I believe grace may not be considered: substantial misrepresentations of your ingredients (such as of the collected data, deviations from the sample you claim to use, incorrect scaling or construction of interpreted variables); not following your own model in estimation or in presentation.
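To make the two lists concrete, here is a sketch that encodes them as a triage table. The category strings are my paraphrase of the examples above, not a taxonomy the post proposes, and anything not on either list stays a judgment call.

```python
# Error-triage sketch: map paraphrased error descriptions to a verdict.
# The entries restate the post's examples; they are labels, not a rubric.

GRACE = {
    "innocuous inclusion/exclusion of observations",
    "slight change in control variables",
    "analogous estimator used",
    "mislabeled but correctly constructed variable",
    "rounding difference",
    "outside the research-producer's control",
}

NO_GRACE = {
    "misrepresented collected data",
    "deviation from the stated sample",
    "incorrect scaling of an interpreted variable",
    "estimation deviates from the stated model",
}

def triage(error):
    if error in GRACE:
        return "grace: could pass for a robustness check"
    if error in NO_GRACE:
        return "no grace: large execution error"
    return "judgment call: depends on the research presented"

print(triage("rounding difference"))
print(triage("deviation from the stated sample"))
```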