I4R.

Assessing Research Using Error Carry Forward

March 10, 2026 · Derek Mikola

Error carry forward is a marking scheme for complex, multistep mathematics questions, where the dependence of later steps on earlier ones makes a binary (correct/incorrect) scheme too coarse a measure of work quality. The basic premise is that not all errors are equal: some errors are more costly than others. Instead of marking all-or-nothing, assessors may award partial marks. I propose evaluating research containing coding errors in the same light.
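The marking logic can be sketched in a few lines. This is a minimal illustration of the idea, not any official rubric: the step rule, starting value, and answers below are all invented.

```python
# Minimal sketch of error-carry-forward (ECF) marking for a hypothetical
# multistep question where each step applies the same rule (here: double
# the previous result and add 3).

def ecf_mark(student_steps, start, step_fn):
    """Award a mark for each step that is correct *given the student's
    previous answer*, so one early slip is penalised only once."""
    marks = 0
    prev = start
    for answer in student_steps:
        if answer == step_fn(prev):
            marks += 1
        prev = answer  # carry the student's value forward, right or wrong
    return marks

step = lambda x: 2 * x + 3

# Fully correct chain from 1 is 5, 13, 29.
print(ecf_mark([5, 13, 29], 1, step))  # 3
# One slip at step 1 (4 instead of 5), then both later steps follow
# correctly from the erroneous value.
print(ecf_mark([4, 11, 25], 1, step))  # 2
```

Under a binary scheme that compares each answer to the correct chain, the second student scores 0/3; ECF awards 2/3 because only the first step was actually wrong.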

Research projects that contain coding errors need not be void of use, but they should be reviewed in light of the error. Put simply, a corrected error that does not change the “conclusion” of the research (in a null hypothesis, statistical testing framework) is nonetheless an error. Whether the “conclusion” survives once the coding error is corrected seems immaterial to discussing the quality of the research.

GOOD RESEARCH = GOOD IDEAS × GOOD INGREDIENTS × GOOD EXECUTION

I am supposing that the combination of ideas, ingredients (data, literature, methods, etc.) and execution largely determines the quality of the research. Splitting along these three dimensions helps focus the conversation about coding errors. Coding errors live entirely in the execution dimension: correcting them can only improve the research, never worsen it.

We must keep in mind that the quality of the idea and of the ingredients also matters. A perfect codebase cannot remedy inappropriate ingredients, an unanswerable research question or unconventional ideas. Because the relationship is multiplicative, if any factor is deficient, the whole research suffers.
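As a toy illustration of the multiplicative framing (the 0-to-1 scores below are invented for the example): a small execution slip only dents the product, while a zero on any factor zeroes the whole.

```python
# Toy multiplicative model: quality = ideas * ingredients * execution,
# each scored on an invented 0-to-1 scale.

def research_quality(ideas, ingredients, execution):
    return ideas * ingredients * execution

# Strong idea and data, slightly sloppy execution: quality is dented.
print(round(research_quality(0.9, 0.9, 0.7), 3))  # 0.567
# Flawless code cannot rescue worthless ingredients.
print(round(research_quality(0.9, 0.0, 1.0), 3))  # 0.0
```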

ALL ERRORS GREAT AND SMALL

The parts of the execution thought ex ante to matter must be done correctly; everything else can be given grace. Herein lies the error carry forward marking scheme: small execution errors don’t ruin the whole; only large ones do.

Here is a motivating example from cooking: preparing a meal requires understanding what you are trying to cook (an idea, or recipe), having the ingredients and materials, and executing. A meal can fail in many places. Yet, in my view, infrequent, simple mistakes (slightly too little butter, incorrect portion sizes, a touch too much salt) matter less than inappropriate ingredients. Large errors (cooking at too low a heat for twice the length, swapping fish for beef) are far more likely to ruin a meal completely than small execution slips.

NUANCE

Unfortunately, I have no general mapping between “things which matter” and those which do not; it is highly dependent on the research presented.

One heuristic which might prove fruitful: grace should be given if the error could be mistaken for a robustness check. Examples may include: accidental but innocuous inclusion or exclusion of some observations, slight changes in the control variables, using analogous estimators.

Additional examples where grace may be considered: unimportant variables mislabeled (but correctly constructed); differences in statistics attributable to rounding; anything outside the control of the research-producer.

Examples where I believe grace may not be considered: substantial misrepresentations of your ingredients (such as of the collected data, deviations from the sample you claim to use, incorrect scaling or construction of interpreted variables); not following your own model in estimation or in presentation.
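To make the two lists concrete, here is a sketch that encodes them as a triage table. The category strings are my paraphrase of the examples above, not a taxonomy the post proposes, and anything not on either list stays a judgment call.

```python
# Error-triage sketch: map paraphrased error descriptions to a verdict.
# The entries restate the post's examples; they are labels, not a rubric.

GRACE = {
    "innocuous inclusion/exclusion of observations",
    "slight change in control variables",
    "analogous estimator used",
    "mislabeled but correctly constructed variable",
    "rounding difference",
    "outside the research-producer's control",
}

NO_GRACE = {
    "misrepresented collected data",
    "deviation from the stated sample",
    "incorrect scaling of an interpreted variable",
    "estimation deviates from the stated model",
}

def triage(error):
    if error in GRACE:
        return "grace: could pass for a robustness check"
    if error in NO_GRACE:
        return "no grace: large execution error"
    return "judgment call: depends on the research presented"

print(triage("rounding difference"))
print(triage("deviation from the stated sample"))
```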