A recent study led by researchers at Purdue University in the United States has raised serious concerns about the accuracy of ChatGPT's responses to programming questions. The study found that ChatGPT's answers were wrong in more than half of the cases examined, and that its polished, articulate language often led participants to trust incorrect answers.
The research team analyzed ChatGPT's answers to 517 programming questions sourced from Stack Overflow, assessing them for correctness, consistency, comprehensiveness, and conciseness. The results were disappointing: 52% of the answers were incorrect, and a significant 77% were unnecessarily verbose. More worrying still, the AI's fluent, well-structured language frequently misled participants, who discerned the inaccuracies only when the mistakes were glaringly obvious.
Despite the inaccuracies, participants preferred ChatGPT's responses in nearly 40% of cases, and an alarming 77% of those preferred responses turned out to be wrong. The researchers, Samia Kabir, David Udo-Imeh, Bonan Kou, and Assistant Professor Tianyi Zhang, attributed many of the errors to ChatGPT's failure to grasp the contextual subtleties of the questions.
These findings make a compelling case that generative AI, in its current form, may not be a reliable aid for code generation and can even be counterproductive. Recognizing this risk, tech giants such as Google, Apple, Amazon, and Samsung have warned employees about or restricted the use of generative AI for code suggestions.
Reports indicate that OpenAI is actively developing its upcoming GPT iteration, GPT-5, which is anticipated to rectify these issues.