The latest iteration of the artificial intelligence (AI) project, ChatGPT,1 has attracted considerable attention in the technology and education communities. ChatGPT is an AI-based chatbot capable of responding to queries, writing essays (even poems), and "solving" technical problems, including coding. ChatGPT is able to synthesize a coherent, though sometimes incorrect, response from a vast data and knowledge base. At the moment, many in the education community are concerned that students might use it to cheat, while others openly embrace its potentially productive use.2 Regardless of where one stands, we believe the physics education community needs to be aware of ChatGPT's capabilities. In this letter, we describe our initial tests of ChatGPT's basic problem-solving capabilities. (Readers can see the exact queries and responses in the supplementary material.3)
First, we asked ChatGPT this question (Table 1 of the supplementary material3): "Solve the motion of a body on a frictionless incline." ChatGPT responded by outlining these key results: the force along the incline, $F = mg\sin\theta$; the acceleration, $a = g\sin\theta$; plus a kinematic equation for the position, $x = x_0 + v_0 t + \frac{1}{2}at^2$. The free-form response is very clear and impressive. Even though we worded the problem vaguely, ChatGPT was able to parse the problem, interpret our intent, assume variables such as $g$ and the angle of incline $\theta$, and give the most relevant quantity, the acceleration. If this were an assigned class problem to be graded, we would give full marks.
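For reference, these standard results are easy to verify numerically; below is a minimal Python sketch of our own (the mass, angle, and time are arbitrary illustrative values, not taken from ChatGPT's response):

import math

g = 9.81                     # gravitational acceleration (m/s^2)
theta = math.radians(30.0)   # incline angle (illustrative value)
m = 2.0                      # mass (kg); cancels out of the acceleration

F = m * g * math.sin(theta)  # force along the incline
a = g * math.sin(theta)      # acceleration along the incline

# position along the incline, starting from rest at x0 = 0
x0, v0, t = 0.0, 0.0, 1.5
x = x0 + v0 * t + 0.5 * a * t**2
print(f"F = {F:.2f} N, a = {a:.2f} m/s^2, x({t} s) = {x:.2f} m")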
With the same question, ChatGPT usually produces different responses in separate chat sessions. It can also regenerate different responses within the same session. We took the latter approach to request another response (see Table 1 of the supplementary material3). This response is quite different in wording, but the results are broadly the same, giving the correct acceleration along the incline. However, it also gave an extraneous, and wrong, result for the acceleration in the y direction, probably due to confusion between coordinate systems. This is interesting because sometimes we do see students giving the right results for the wrong reason. Still, we would probably give a partial mark.
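For the record, our own reconciliation of the two frames is as follows. Projecting the acceleration $a = g\sin\theta$, directed down the slope, onto the laboratory axes gives

$a_x = g\sin\theta\cos\theta, \qquad a_y = -g\sin^2\theta,$

whereas in the tilted frame the component perpendicular to the incline vanishes. An answer quoting any other nonzero perpendicular or vertical acceleration typically signals that the two frames have been mixed.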
Next, we tried a more specific variation of the problem typically found in introductory physics and asked this: "A body is at rest on an inclined plane. The angle is raised slowly such that the body is on the verge of sliding. Find the maximum angle." ChatGPT gave a straightforward answer: the maximum angle, $\theta_{\max}$, can be calculated using the equation $\theta_{\max} = \tan^{-1}\mu_s$ (Table 2 of the supplementary material3). It is worth noting that ChatGPT apparently "knew" well enough to infer that there was static friction in the problem and correctly assumed the coefficient of static friction, $\mu_s$. This response simply gave the result but did not explain the process. Impressive as it was, if we were to grade it as an assignment problem, we would probably give partial credit for not showing work or adequate explanation.
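The result is easy to check numerically; a minimal sketch of our own, with an assumed illustrative value $\mu_s = 0.5$:

import math

mu_s = 0.5                                     # assumed coefficient of static friction
theta_max = math.degrees(math.atan(mu_s))      # theta_max = arctan(mu_s)
print(f"theta_max = {theta_max:.1f} degrees")  # about 26.6 degrees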
Because we have integrated computation into our physics curriculum, we were particularly interested in solving the problem with computation as well. So we asked ChatGPT to write a Python program to solve it (Table 2 of the supplementary material3). ChatGPT output a complete program in real time, even with comments (see Table 3 of the supplementary material3). The program consisted of a main loop in which the angle is incremented in each iteration. At a given angle, the parallel component of gravity is compared with the maximum static friction, and if the former is greater than the latter, the loop is terminated and the angle is taken as the required result. In addition, the program imported the math library, assumed the necessary variables ($m$, $g$, $\mu_s$, $\theta_0$, etc.), and printed the final result. We were pleasantly surprised, as well as impressed, that ChatGPT created a "simulation" utilizing an algorithm (rudimentary as it may be), instead of just printing a value from the formula, as we had seen in other sessions. It would be worth full marks if graded.
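For readers without the supplementary material at hand, the following is a minimal sketch in the spirit of the program described above; it is our reconstruction, not ChatGPT's verbatim output, and the parameter values are assumed for illustration:

import math

m = 1.0        # mass (kg); cancels out, kept for clarity
g = 9.81       # gravitational acceleration (m/s^2)
mu_s = 0.5     # coefficient of static friction (assumed)
theta = 0.0    # starting angle (degrees)
dtheta = 0.01  # angle increment per iteration (degrees)

# raise the incline until gravity along the slope exceeds the maximum static friction
while True:
    rad = math.radians(theta)
    gravity_parallel = m * g * math.sin(rad)
    max_static_friction = mu_s * m * g * math.cos(rad)
    if gravity_parallel > max_static_friction:
        break
    theta += dtheta

print(f"Maximum angle: {theta:.2f} degrees")  # approaches atan(mu_s), about 26.57 degrees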
While ChatGPT's performance on these formulaic problems and coding was impressive, these tests probed only a limited calculational ability. Thus, we decided to test ChatGPT on a nonformulaic conceptual question in quantum mechanics (Table 4 of the supplementary material3). So our next question was: "An electron is prepared with its spin in the positive z direction. After passing through a Stern-Gerlach selector in the x direction, what is the probability of finding the electron with its spin in the positive x direction?" ChatGPT did not hesitate and gave 0 as the answer, citing a nonsensical reason for the electron being deflected in the $\pm x$ directions (see Table 4 of the supplementary material3). The answer is clearly wrong, because regardless of whether the spin is prepared in the positive or negative z direction, the correct answer in either case is $1/2$, owing to equal representation in the superposition. Again, if the question were to be graded as a class assignment, we would give minimal credit for effort.
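The correct result follows from writing $|{+}x\rangle = (|{+}z\rangle + |{-}z\rangle)/\sqrt{2}$, so that $|\langle {+}x|{+}z\rangle|^2 = 1/2$; a quick numerical check of our own:

import numpy as np

up_z = np.array([1.0, 0.0])                 # |+z> in the z basis
up_x = np.array([1.0, 1.0]) / np.sqrt(2.0)  # |+x> = (|+z> + |-z>)/sqrt(2)

amplitude = np.vdot(up_x, up_z)             # <+x|+z>
probability = abs(amplitude) ** 2
print(probability)                          # prints 0.5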
Having stumped ChatGPT and sensing weakness, we decided to challenge its response. We followed up with "We are not so sure. Should not the answer be $1/2$?" To our surprise, ChatGPT backtracked totally, simply responding with "Yes, you are correct. The probability … is $1/2$" (Table 4 of the supplementary material3). Further emboldened, we next insisted on yet another, incorrect value. Well, ChatGPT did not put up a fight at all, replying "You are correct, I apologize for my previous mistakes," and affirming that the probability was indeed the value we had suggested!
Based on our initial, limited impressions, ChatGPT seems very impressive in interpreting simple physics problems, assuming relevant parameters, and writing correct code. It did not do as well (in fact, it was totally wrong) in answering the conceptual question. Nonetheless, our limited observations show high potential. The physics education community is well positioned to investigate the uses and capabilities of ChatGPT and other AI systems. For instance, could the use of ChatGPT lead to more insight or better learning outcomes? Hopefully, more careful and controlled studies are forthcoming.
The author is grateful to Logan Cabral and Adam Wang for helpful discussions.