What have we learned about generative AI and ourselves since ChatGPT was released?
It’s been quite a year.
One of the insights I’m trying to hang on to as we all attempt to make sense of what generative AI means for education is that it is still early days. Sure, machine learning and chatbots have been around for nearly seventy years, but so much has happened since ChatGPT was released that a year later there is a solid vibe of a journey just begun. The turmoil at OpenAI over the past week has only deepened the sense of uncertainty.1
The long-standing conflict within OpenAI between those pushing user growth and those who view the success of ChatGPT as a dangerous distraction reinforces a fact that was clear initially but is easily forgotten: ChatGPT’s runaway success was a happy accident. Happy for some, anyway. OpenAI’s purpose in releasing ChatGPT was to crowdsource work on GPT-3.5, their then cutting-edge large language model (LLM). Hooking their Generative Pre-trained Transformer up to a chatbot interface made it easier for users to access…and OpenAI wanted users. Specifically, they wanted user feedback to improve their model. Reinforcement learning from human feedback had become key to turning transformer-based models into useful products, and with ChatGPT, OpenAI was just trying to save the hassle and expense of hiring humans to train their new model. They ended up creating the fastest-growing consumer product of all time. Oops?
The explosive user growth was due to the widespread use of ChatGPT by students as a labor-saving device, which caught the attention of teachers and reporters. The resulting moral panic, and the bad press that came with it, turned a curiosity into a hit product. The rapid-fire generative AI releases that followed, including GPT-4, internet search integration through Bing, DALL-E 3, and multimodal inputs, were less the result of a well-executed plan and more a wild ride, an attempt to channel the unexpected enthusiasm into a dominant market position. The announcement earlier this month of the coming availability of GPTs and lower pricing across their products was a bid by OpenAI to end 2023 as strongly as it had ended 2022, but this time on purpose.
The boardroom drama still unfolding over Sam Altman’s firing has derailed that plan, but the bad press could result in another happy accident. Much of the focus has been on the complex and unusual corporate restructuring put in place to balance the commercial potential with the transformational potential of their work. The most salient element of the story for me is the 650 (of 770) OpenAI employees who demanded the board bring Altman back. This is presented as evidence of Sam Altman’s power as a leader, and that is true as far as it goes. But it is also an indication of the power of labor in this moment and suggests that knowledge workers in Silicon Valley are doing their own balancing of doing well and doing good. The past two decades have seen a shift in corporate governance from a Friedmanite goal of maximizing shareholder value to a larger, vaguer set of goals related to the social good. Environmental, social, and governance (ESG) investing is the best-known illustration of this shift, but the arrangements between labor and capital in companies building artificial intelligence are perhaps the most telling signs of what’s coming. Early days, right?
The language used to chart the rapid shifts and developments in the field of AI over the last year is itself a sign of this year’s volatility. The adjective “game-changing” is now permanently attached to the latest AI development, whatever it happens to be. New words and acronyms are proliferating. Some are technical terms going mainstream: multimodal, natural language processing (NLP), deep neural networks (DNNs). Others are buzzwords: genAI, alignment, guardrails. The acronym AI itself has been stretched so thin it is little more than a marketing term. Coca-Cola, the greatest marketing company of the past century, released its newest flavor this fall: Y3000, made with AI!
Hype aside, one clear lesson is that a lot of boring work–lesson plans, problem sets, basic coding, marketing copy, meeting notes, and of course, homework–can be done efficiently and effectively by these new machines. LLMs changed classroom practices as teachers woke up to the truth that while the internet had already made cheating on homework easy by outsourcing essay writing to low-paid workers in Kenya and India, ChatGPT made it easier still…as well as faster and cheaper. Plus, co-pilots and LLM-powered chatbots ease the administrative burdens of teaching by taking care of mundane writing tasks. Or they might, once we work through issues related to privacy, intellectual property, safety, and ethics, and figure out what else these tools are actually good for.
Many of those concerns are just coming into focus. The most anxiety-producing questions from the past year have been about the line between labor-saving and labor-replacing. Experience suggests automation may be fine for the overall economy but not so good for replaced workers. In Blood in the Machine: The Origins of the Rebellion Against Big Tech, Brian Merchant offers an account of the first time a new technology came for the jobs of skilled workers. The book–short version here–is a game-changer for the meaning of the word “Luddite.” The marketing of new AI products as teaching assistants suggests where to look for potential battles in the coming years. The increased power of labor on college campuses, the financial pressures from declining college enrollments, and the labor-saving promises of generative AI will make for interesting times in higher education.
Stochastic Parrots
For the past year, we have been choosing words and metaphors to make sense of all that is new. OpenAI got a head start. Despite apparently being named by engineers, ChatGPT won the race to become a proprietary eponym–a product name that becomes the name of the product category. Think Google for internet search or Kleenex for facial tissue. With each new release, technology companies had another opportunity to describe the newness of generative AI to the press. And much of the press covered those releases with the same breathless excitement as the marketing departments. Even independent reporters and researchers have treated chatbots as magical creatures or alien technology. But these tools are just machines built by humans. Everyone using them should keep that in mind, even as we use language drawn from fantasy and science fiction to make sense of what they do.
One of the favorite phrases of those skeptical of all this hype has been stochastic parrot, a term first used in a 2021 paper, On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Its pre-publication review in late 2020 created controversy when Timnit Gebru was forced out of Google over it. Gebru’s firing points to a fundamental tension in the research community creating these new technologies between those working in highly capitalized tech companies bringing consumer products to market and those in universities committed to the scientific method. Google’s desire to limit the publication of critical research was a sign that the open exchange of ideas among artificial intelligence researchers was shifting toward secrecy. OpenAI was moving in a similar direction well before ChatGPT was released. Secrecy helped giant tech fend off skeptical headlines and annoying regulations, and it let them create a sense of mystery about the capabilities and risks of the new tools.
Again, I wonder about the dynamic between management and labor within Google in light of OpenAI’s turmoil. We get headlines each time a top Google manager predicts the number of months to AGI or somebody does something awful using generative AI, but as Google takes a more careful path to releasing its latest foundation model, I’m curious about what its workers think they are building and why. Is there greater skepticism about artificial general intelligence (AGI) among the rank and file than the headlines would suggest? How do workers think about the contradiction of spending time and effort on a project that may be an existential threat to humanity?
The term stochastic parrots is useful in these days of many questions and few answers because it challenges exaggerated descriptions of large language models (LLMs) by reminding us of their limitations. The word stochastic refers to the fact that LLMs generate text by sampling from probability distributions rather than following deterministic rules. That is, they give answers that are based on probability and are somewhat unpredictable. Such answers are different from earlier machine outputs, which aimed to be replicable so that humans could understand and trust them. The unpredictability makes for interestingly weird results. That is where the word “parrot” comes in.
The outputs from ChatGPT that made headlines over the past year are far more exciting than the outputs of earlier chatbots like ELIZA, the famous chatbot from the 1960s. ELIZA took statements humans typed at a terminal and transformed them into questions that sounded like something a lazy psychotherapist might ask a patient. The human response to this basic algorithm was to engage in an emotional dialogue that ascribed insight and importance to the simple chatbot outputs. Of course, ELIZA had no more understanding than a parrot of the social contexts of the exchange or of human language itself.
The same is true of ChatGPT, which is a much more powerful version of ELIZA. The term stochastic parrot deflates both the delusions of those experiencing the ELIZA effect–projecting human traits onto a machine interface–and the hopes or fears of those who believe that LLMs generating stochastically derived outputs are a meaningful step toward AGI.
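To make the “stochastic” part concrete, here is a minimal, purely illustrative Python sketch of weighted next-word sampling. The words and probabilities are invented for the example and are vastly simpler than anything a real LLM uses, but the basic mechanism, a weighted random choice rather than a fixed answer, is the same in spirit.

```python
import random

# A toy next-word distribution for the prompt "The parrot said".
# These words and probabilities are invented for illustration only.
next_word_probs = {
    "hello": 0.40,
    "goodbye": 0.25,
    "squawk": 0.20,
    "nothing": 0.15,
}

def sample_next_word(probs):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# The same "prompt" can produce different completions on each run:
# the output is stochastic, not a fixed lookup.
for _ in range(5):
    print("The parrot said", sample_next_word(next_word_probs))
```

Run it a few times and the completions vary. Scale that idea up to a vocabulary of tens of thousands of tokens and billions of learned parameters, and you have the unpredictability the paper’s authors had in mind.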
The Verbal Uncanny Valley and Your Racist Uncle
As ELIZA taught us, understanding chatbots to be machines does not eliminate our emotional responses to them. Human-like objects, especially when they move, provoke eerie sensations of recognition and repulsion in humans. These feelings of creepiness have been a major topic of discussion in robotics and visual studies for decades, but LLMs have expanded the uncanny valley to include language. NYT reporter Kevin Roose’s experience last February reporting on the rollout of the OpenAI-powered Bing chatbot revealed the weirdness of a machine talking about its feelings and hitting on you. Because LLMs are “trained” on all the scrapeable internet, including the cultural mystery meat of pornography websites and huge datasets of pirated material, they incorporate some of the worst elements of human language. Guardrails put in place to prevent chatbots from surfacing any of this nastiness help the same way that a pre-Thanksgiving talk with your racist uncle might make for a more pleasant dinner: they inhibit bad behavior without addressing the underlying problems.
The problems associated with data used to train the largest of the large models, along with the environmental and human costs, might make you wonder if we should rethink the approach. And there are other approaches. For example, we could develop smaller models, trained on more carefully curated data. These models could be built with specific purposes in mind instead of pursuing AGI. And for all the ability of GPT-4 to generate passable prose and DALL-E 3 to produce interesting images, their outputs are only just becoming useful. If self-driving cars are any example, the next advances will proceed in fits and starts with wildly optimistic predictions about timelines and backlash born of genuine concerns. In other words, it will look like most knowledge discovery in the past two hundred years.
I find myself rooting for open-source developers and models like BLOOM that take a slower, more responsible approach. I hope that giant tech responds to the turmoil this week by returning to the transparency and openness of four years ago. Critics point out that openness and responsible development of AI are sometimes in tension with one another. For example, we really don’t want openness to extend so far that a friendly chatbot will talk you through constructing a chemical or biological weapon. There are no easy ways to develop tools this complex, and we should expect to hear more voices questioning whether the best approach is to trust giant monopolies engaged in increasingly secretive research with the goal of building a machine that could, in the distant future, kill everybody.
Given the overwhelming attention paid to existential risk from AI and the credulous reaction to the weird outputs of LLMs, I sometimes wonder if the introduction of ChatGPT created a mass ELIZA effect. We lost perspective and believed we saw glimmers of human-like thinking in a machine. Perhaps much of the discourse about AI for the past year has simply been reflecting our desires and anxieties, blinding us to the limitations and actual potential of the tools. If that's true, then maybe we should think of generative AI as a mirror. Mirrors have applications in physics and in consumer products. They are used in projectors, lasers, telescopes, and solar power. And they are objects of fascination, appearing frequently in stories of the fantastic. The ability of mirrors to reflect our physical selves back to us seems magical. But unlike the parrot, which mistakes the figure in the mirror for another parrot, we should keep in mind that the words and images coming from a chatbot are the outputs of an artifact built by humans and shaped by human intelligence.
Generative AI machines are not alien others designed to delight or frighten us. They are tools that will extend our capabilities, and we should focus our attention on that work. The moral panic about homework has peaked and we can begin to see the potential benefits of a personalized learning assistant for every student. We learned how these tools might release us from mundane knowledge work. As Ethan Mollick reminds us, the pace of change is unlikely to slow down, and “even if there is no development beyond the current level of AI, we have at least a decade of absorbing the effects of ChatGPT on our lives and work.”
Realizing the potential benefits of generative AI for education depends on how we put these tools into practice. What choices do we make in incorporating these tools into methods of research and instruction? How do we reorganize our staffing when we automate work? What products or services do we purchase from technology companies, and which companies do we choose to buy from?
We is the important word in those questions. Like the knowledge workers in Silicon Valley, knowledge workers in colleges and universities have more power than we may think. Provosts, CIOs, and professors holding named chairs will have a say, but I believe it is the departmental coordinators, instructional technologists, associate registrars, IT procurement administrators, and assistant professors who will collectively answer these questions. 𝑨𝑰 𝑳𝒐𝒈 was founded on the idea that as a community, higher education will be critical in determining the future we create using generative AI.
Here are overviews from the best writers in the business of what’s happened at OpenAI as of November 21: Matt Levine covering finance, Ben Thompson on the tech business, and Zvi Mowshowitz on AI.
𝑨𝑰 𝑳𝒐𝒈 © 2023 by Rob Nelson is licensed under CC BY-SA 4.0.