Come August, there'll be plenty of takes on what generative AI will bring in the coming school year. I decided to get mine out here in July, even though plenty will happen between now and the first day of classes.
The revolution will not be
Last month, Dave Karpf posted an essay questioning the premise that LLMs are a general purpose technology like the printing press or the internet. If he is right, the promised Generative AI revolution will not happen.1 Karpf points out that a lot of what we see is just incremental improvement to existing technology:

Think about ChatGPT's actual use-cases. It's a better Siri. A better Clippy. A better Powerpoint and Adobe. A better Eliza. A better Khan Academy and WebMD. None of these are new. They all exist.
Chatting with an LLM and trying to get it to do something useful for you is only one often frustrating form of generative AI that people encounter. More common is discovering a new AI feature embedded in Grammarly, Zoom, Canvas, or some other tool you used before ChatGPT came along. I just noticed a summary tool in JIRA, "powered by AI." When you use it, a note at the bottom says, "Content quality may vary." JIRA is a workflow-tracking tool used by many technologists. Reading through a ticket with dozens of comments to find out what's going on can be tedious, so it's nice to have a shortcut. If I can trust it.
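Features like this are usually thin wrappers around a model API. Here is a minimal sketch of what a summarizer along those lines might look like, assuming the OpenAI Python SDK; the prompt, model choice, and function name are my illustration, not Atlassian's implementation:

```python
# A minimal sketch, assuming the OpenAI Python SDK; the prompt, model,
# and function name are illustrative, not Atlassian's implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_ticket(comments: list[str]) -> str:
    """Condense a long issue-tracker thread into a short summary."""
    thread = "\n\n".join(comments)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Summarize this issue-tracker thread in three bullet points."},
            {"role": "user", "content": thread},
        ],
    )
    # The model returns fluent text whether or not it has read the thread
    # correctly; hence the "content quality may vary" disclaimer.
    return response.choices[0].message.content
```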
The glaring contradiction between "powered by AI" and the warning that its content may not be accurate sums up the situation with LLMs. They have uses, but they are not revolutionary, and often they are not even all that useful. LLMs provide shortcuts and time-savers that are frequently not worth the effort, especially if you care about the quality of the output.
Along with "AI will take your job!" and "AI will let your students cheat!", a central element of AI hype has been the promise that LLM-powered tools will do boring administrative work, freeing teachers and other knowledge workers to spend more time doing important stuff like talking with students or sleeping. This fall, we will start to discover just how much "mundane utility," as Zvi Mowshowitz likes to call it, teachers and administrators will get out of co-pilots, summarizers, and natural language interfaces to complex information tools and databases. Maybe not as much as everyone was imagining six months ago? Remember, the demo always looks more exciting than the experience of actually using the product.

The excitement about LLMs seems to be fading, and not just among the people using them. Goldman Sachs, which last year predicted that AI could raise global GDP by 7%, is now asking skeptical questions about its economic prospects. As the newly skeptical analysts offer their pronouncements, it is worth keeping in mind that the sheer amount of capital involved and the boom-and-bust cycles of past technologies distort the discourse. As Karpf says:
These LLMs are a significant advance on existing technology. It would be a mistake, I think, to pretend otherwise. But the only reason I can see for treating it like a distinct new general purpose technology is to shield these tools from the track record of dashed expectations and abject failures that recently preceded them.
It looks like this next academic year will be when the cycle turns, when the Silicon Valley disruptors discover, to their astonishment, that personalized learning in the form of the latest tech is not the answer to the challenges of schooling in modern society. Sure, they could have listened to Audrey Watters as she blogged about the history of this dream or read her 2021 book Teaching Machines: The History of Personalized Learning. Or, as Karpf reminded me, they could have read Morgan Ames's excellent account of the One Laptop per Child project, which attempted to export the US-based dream of computer-based personalized learning to Paraguay.
Instead, AI enthusiasts read reports like this one published by venture capitalist Mary Meeker or watch another video demo of Khanmigo. Maybe if they just keep saying, "It's a game-changer," the money will continue to flow.
There are plenty of writers pointing out the ways that personalized learning through decontextualized tutoring tools doesn't work, and that while transformer-based language models may have entertainment value, that doesn't mean we have figured out how to use them in classrooms. For example, read John Warner's takedown of Meeker's report, or any of his other essays. The problem for Silicon Valley is that history lessons and nuanced analysis of the social contexts of technology are not a recipe for instant utopia or instant riches. That takes vision.

I've been blogging quite a bit about the failure of AllHere and the fate of Ed, the LA Unified School District's chatbot, which seems like a sign of what might be coming. But I don't want to get carried away by the backlash. Even amid the wreckage of Ed, I still think the idea of using an LLM as a language translator to help non-English-speaking students and family members navigate LAUSD's educational bureaucracy was worth trying. But instead of focusing on a specific problem and carefully designing a solution by working with their clients, officials got excited by AllHere's vision of a magic chatbot providing personalized learning paths "tailored to each student's unique needs, supporting them to reach their full potential." It didn't matter that the technology only existed in their collective imaginations. Until it did.
There are potential educational uses for actually existing LLMs. Maybe the ability to code using simple voice instructions will enable teachers to bypass the middlemen of ed-tech companies to produce useful tools for their teaching. Perhaps instead of personalized learning, a vision for using LLMs to help structure active classroom activities will emerge. As Karpf points out, the "stuff like flipped classrooms and experiential activities" that Ethan Mollick and other enthusiasts are talking about is nothing new. For the past ten years, several colleges and universities have been engaged in a revival of the venerable Deweyan idea that teaching students through structured social activities is better than talking at them.
Instead of continuing to try to use LLMs to tutor individuals, maybe we will use them as engines to generate interesting educational problems. The fact that Mollick has deep experience in simulations and games is one reason he is so good at seeing their educational potential. I just wish he'd stop saying we should treat LLMs like people. I can imagine using an LLM to generate cultural artifacts for a problem-solving team of students to analyze during class. The problem would be "powered by AI," but the "content quality may vary" warning would be a feature, not a bug. Students would work together to figure out what was true and what was confabulated. Then they would present their analysis to the class.
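Here is a rough sketch of what generating such an artifact might look like; the scenario, prompt, and model are hypothetical, not a tested classroom tool:

```python
# Hypothetical sketch, not a tested classroom tool: ask a model for a
# primary-source pastiche that deliberately mixes fact and confabulation.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Write a one-page 1890s newspaper account of the opening of a city "
    "public library. Mix verifiable historical details with a few "
    "plausible-sounding errors, and do not mark which is which."
)

artifact = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

# Students work in teams to sort the true from the confabulated,
# then present their analysis to the class.
print(artifact.choices[0].message.content)
```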
To get there, we'll need to get over the current obsession with chatbots, especially chatbot teaching assistants.
I just finished an essay titled "Let's Stop Treating LLMs Like People." Subscribe to AI Log for free to have that and other essays about the social contexts of educational technology sent directly to your email inbox.
Chatbots, here, there, and everywhere redux
When ChatGPT was a surprise hit in early 2023, the idea of hooking up an LLM to some course material was obviously a better concept for starting a business than collecting underpants. The equation AI-powered chatbot TA + ? = Profit launched a thousand ed-tech pivots and start-ups. I wrote about ubiquitous chatbots when I attended Educause in October. This coming fall term, I expect the chatbot revolution to continue to not be.
We talk a lot around here about why anthropomorphizing transformer-based language models is a bad idea, but the problem facing developers is more fundamental. It goes by the misnomer "hallucination" and is well enough understood that anyone building an educational product using generative AI knows to worry about reliability. No teacher wants an LLM inventing answers to questions about where the midterm is or what day a paper is due. And LLMs cannot handle grading reliably. Ninety-seven percent accuracy may be impressive for a technology based on probabilistic word generation, but for teachers grading hundreds of papers or tests, even a tiny error rate represents a level of hassle they cannot afford.
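Some back-of-the-envelope arithmetic makes the point; the 97% figure is from above, and the course load below is a made-up but plausible example:

```python
# Back-of-the-envelope arithmetic: the 97% accuracy figure is from the
# text; the course load is an assumed but plausible example.
papers = 150 * 2            # 150 students, two graded assignments a term
accuracy = 0.97
misgraded = papers * (1 - accuracy)
print(f"Expected misgraded papers: {misgraded:.0f}")  # about 9 per term
```

Nine misgraded papers a term means nine regrade requests, or worse, nine errors nobody catches.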
Despite those facts, and also maybe because of them, experiments in this space are deeply interesting. Last year, students in CS50 at Harvard were introduced to a "weirdly informative robot" rubber duck they accessed through Ed Discussion, a third-party, online class discussion platform. Students in a few classes at Penn GSE, where I teach, had the chance to try out JeepyTA. Both of these examples are run by people with decades of teaching experience and a deep understanding of how LLMs work. I have seen news coverage of other interesting chatbot experiments being run by individual teachers or programs.
Those experiments aren't where the venture capital is flowing. The AI ed-tech start-ups launched last year seem to be run by people with a great deal of confidence that artificial intelligence is a game-changer, even if they aren't exactly sure what game it is and who plays it. I'm not aware of any breakout successes among that crowd, maybe because raising capital is such a different game than teaching and learning.
The fundamental question about applying any new technology is: What problem are you trying to solve? The problem a TA chatbot aims to solve is "I am a student who has a question, and there is no human available to answer it." The class syllabus, the textbook, or a college website may have the answer to your question, but not all students share their teacher's assumptions about how course information should be organized or enjoy the scintillating prose of an introductory textbook. As for university websites, Randall Munroe explained their shortcomings years ago.
Introducing a weird natural language interface to a database of materials from the class you are taking is definitely a solution to that problem, but I'm not sure it is a good one. A purpose-built LLM looks like it could be a moderately useful and very expensive way to support students taking an introductory class. Generating on-demand assessments, instructional drills, and exercises (think flashcards) might be helpful, especially when students are pulling an all-nighter before a midterm. That is, as long as they don't mind the occasional confabulation.
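A minimal sketch of the flashcard case, again assuming a chat-completions-style API; the prompt and the parsing approach are illustrative only:

```python
# Illustrative only: generating flashcards from course notes.
import json

from openai import OpenAI

client = OpenAI()

def make_flashcards(notes: str, n: int = 10) -> list[dict]:
    """Ask the model for n question/answer pairs drawn from course notes."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"From the notes below, write {n} flashcards as a JSON array "
                'of {"question": ..., "answer": ...} objects. Return only '
                f"the JSON.\n\nNotes:\n{notes}"
            ),
        }],
    )
    # Parsing can fail when the model ignores instructions; this is the
    # flashcard version of "content quality may vary."
    return json.loads(response.choices[0].message.content)
```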
I am more skeptical that an LLM can provide the sort of Socratic exchange that leads to understanding or that it can consistently answer nuanced questions about an assignment. One thing human TAs have that computational next-word predictors lack is the actual experience of being a student.
The key question is how expensive LLM chatbots are relative to the limited value they provide. All the ed-tech companies that pivoted to AI or went all-in on LLMs will want to see a return on their investment. This fall, we will continue to explore how useful chatbots are and begin to see what price educational institutions will pay to use the LLMs that power them.
Part Two of this essay starts with an overview of the ed-tech market in higher ed and the price discovery now underway for LLMs.
AI Log © 2024 by Rob Nelson is licensed under CC BY-SA 4.0.
1 After I came up with this title, the Economist published "What happened to the artificial-intelligence revolution?" I like my title better, but readers may not know the Gil Scott-Heron reference.