ChatGPT failed the programming test – but fooled most of the examiners

AI experts openly acknowledge that neural networks are often wrong, which makes them risky to rely on for tasks such as writing program code. Still, the question had never been put to a definitive test. To get some clarity, researchers asked the ChatGPT chatbot a large number of programming-related questions. The neural network answered fewer than half of them correctly.

The test was conducted by researchers at Purdue University and covered more than 500 questions on specialized programming problems. The results were then handed to volunteer experts for review, who rated not only the correctness and consistency of the answers but also their conciseness and completeness. In addition, the team analyzed the linguistic style and overall sentiment of each answer. As it turned out, OpenAI's chatbot gave a sound answer to only 48% of the questions.
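This summary does not name the tooling the Purdue team used for that linguistic analysis, but a minimal sketch of what scoring the sentiment of an answer can look like, using NLTK's off-the-shelf VADER analyzer (the library choice and the sample answer text are illustrative assumptions, not the study's method), might read:

```python
# Minimal sentiment-scoring sketch. NOTE: the analyzer choice (NLTK/VADER)
# and the sample answer below are illustrative assumptions only.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

analyzer = SentimentIntensityAnalyzer()

answer_text = (
    "Sure! Just call sorted() on your list -- it's simple, "
    "always works, and is very easy to read."
)

# polarity_scores() returns neg/neu/pos proportions plus a compound
# score in [-1, 1]; upbeat, confident phrasing like the text above
# scores strongly positive regardless of whether the advice is right.
print(analyzer.polarity_scores(answer_text))
```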

The reviewers found about 40% of the answers "plausible," even though 77% of those answers were incorrect. The human experts confidently flagged a mistake only when it was obvious; when verifying an answer required an external development environment or a dive into the documentation, they tended to underestimate the likelihood that the AI was wrong. The pleasant, confident, and positive tone of the writing was partly to blame: the completeness of the answers and their simple, easy-to-follow style made even incorrect answers look correct in the reviewers' eyes.
