Abstract
In a busy Continuous Integration (CI) environment, where several builds are triggered concurrently, legitimate build failures (e.g., those not caused by flaky tests) are not always related to the current push. These unrelated build failures burden developers, who can spend hours determining whether the errors are truly associated with their changes. In this paper, we extract 77,354 CI build failures from seven open source projects to understand and identify unrelated build failures. Our goal is to indicate to developers whether a build failure is likely to be related to the current push. Our results suggest that developers spend a median of 4 hours determining whether a build failure is (un)related to their pushes. To understand why developers deem build failures unrelated, we perform a document analysis on a sample of 371 unrelated build failures (drawn from 10,316 potentially unrelated failures at a 95% confidence level and a 5% margin of error). The themes generated from this analysis reveal that unrelated test failures account for 20% of the cases in which developers deem a build failure unrelated. To predict whether a build failure is unrelated to the current push, we extract 33 features from issue reports, issue comments, and the commits pertaining to the triggering push. We build semi-supervised positive-unlabeled (PU) learning models over seven Apache projects and achieve precision ranging from 0.70 ± 0.01 to 0.88 ± 0.02, recall ranging from 0.30 ± 0.03 to 1.00 ± 0.00, and F1-scores ranging from 0.44 ± 0.03 to 0.91 ± 0.00, while the area under the ROC curve (AUC) ranges from 0.63 ± 0.02 to 0.97 ± 0.03.
Our analysis of feature importance reveals that (i) the time elapsed between a submitted patch and the build-triggering push (CI latency), (ii) whether a build failure shares similar error messages with recent failures, and (iii) the number of comments preceding the build failure are all effective indicators of potentially unrelated build failures. The semi-supervised approach proposed in this work can help developers identify build failures that are unrelated to their current push, and its predictions can guide follow-up actions such as re-running builds, inspecting infrastructure logs, or prioritizing code-level debugging.
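As a worked check on the sample size reported above, the following minimal sketch computes how many items are needed from a population of 10,316 at a 95% confidence level and a 5% margin of error. It assumes Cochran's formula with a finite population correction, a z-score of 1.96, and maximum variability (p = 0.5). The abstract does not state which formula was used, so the function `sample_size` and its parameters are illustrative assumptions rather than the authors' procedure.

```python
import math


def sample_size(population: int, z: float = 1.96,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Sample size for estimating a proportion in a finite population.

    Assumed method: Cochran's formula plus finite population correction.
    """
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Shrink the estimate to account for the finite population.
    n = n0 / (1 + (n0 - 1) / population)
    # Round up so the requested margin of error is still met.
    return math.ceil(n)


if __name__ == "__main__":
    # 10,316 potentially unrelated build failures, as stated in the abstract.
    print(sample_size(10_316))
```

Under these assumptions the result is 371, which matches the sample size stated in the abstract.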