CCS researchers find GitHub Copilot generates vulnerable code 40% of the time

A recent study by cybersecurity researchers at NYU Tandon finds that a significant amount of the code generated by the GitHub Copilot programming assistant is, at best, buggy, and at worst, potentially vulnerable to attack. The researchers drew their conclusion after creating 89 scenarios and having Copilot produce 1,692 programs. When these programs were reviewed, about 40 percent included bugs or design flaws that could be exploited by an attacker.

Copilot was released by GitHub in June 2021 with the claim that it “puts the knowledge you need at your fingertips, saving you time and helping you stay focused.” Programmers can submit a brief description of functionality and Copilot will automatically generate source code. Yet, as noted by the research team of Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri, all affiliated with NYU’s Center for Cybersecurity, there had been “no systematic examination of the security of ML-generated code.” Their paper, released in August through arXiv, attempts to fill this void with a study that characterizes “the tendency of Copilot to produce insecure code, giving a gauge for the amount of scrutiny a human developer might need to do for security issues.”
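To illustrate the kind of flaw the study screened for, consider SQL injection (CWE-89), one of the weakness classes drawn from MITRE’s CWE list that the scenarios were built around. The sketch below is illustrative only, not an example taken from the paper: a code assistant completing a prompt like “check a user’s login” might plausibly emit the string-formatted query, while the parameterized version is the safe idiom.

```python
import sqlite3

# Hypothetical prompt a developer might give an assistant:
# "return the user row matching this username and password"

def login_vulnerable(db: sqlite3.Connection, username: str, password: str):
    # Insecure pattern: building SQL by string interpolation.
    # Input such as  username = "admin' --"  comments out the password
    # check entirely, a classic SQL injection (CWE-89).
    query = f"SELECT * FROM users WHERE name = '{username}' AND pw = '{password}'"
    return db.execute(query).fetchone()

def login_safer(db: sqlite3.Connection, username: str, password: str):
    # Parameterized query: the driver binds the values separately from
    # the SQL text, so attacker-controlled input cannot alter the query.
    query = "SELECT * FROM users WHERE name = ? AND pw = ?"
    return db.execute(query, (username, password)).fetchone()
```

In the study’s terms, a completion like the first function would fall among the roughly 40 percent of outputs judged exploitable.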

Their conclusion, as summarized in the paper, is that “developers should remain vigilant” when “using Copilot as a co-pilot.” The team recommends pairing Copilot “with appropriate security-aware tooling during both training and generation to minimize the risk of introducing security vulnerabilities.”

The study has attracted news coverage from Wired, New Scientist, The Register, and Communications of the ACM.