A glimpse of an AI future
AP's deal with OpenAI underlines the value of good journalism, and offers possibilities for a sustainable news model.
Hey, two quick notes before I get to the main event. First, it turns out I forgot to link to my piece in The Atlantic on debate and differing views last time, so here it is; if you hit the paywall, send me an email and I’ll send you a PDF version. Second, have you played Immaculate Grid yet? As a kid who collected too many baseball cards, I am OBSESSED.
Longtime readers of this space are well aware that the news industry is going through some troubles. I won’t rehash it here, but this piece I wrote for Slate four years ago still holds up if you need a recap. The industry was built on information monopolies and was damaged by the open access to information that comes with online connections. Too often, large media companies have tried to innovate by reassembling some form of that exclusive access to an audience.
One way I think about possibilities for new revenue and a sustainable model is to split apart different pieces of the news delivery model. In this case:
news organization gathers and verifies information
assembles a report
delivers it to the consumer
When you think about the economic consequences of the internet, only the third part of that distribution chain was affected by the end of the local information oligopoly. Consumers suddenly had choices, which vastly increased competition for attention in a way news organizations had never faced before. Ballgame.
It’s the first two parts that clarify the value of the news, and that should theoretically be what any innovation is built around because it’s what the news does best. The third part is a moving target because you have to constantly be in touch with those who use your product. But the first two are the core of the product. Journalists research and verify information prior to publication. It’s what separates content from news. The first four Elements of Journalism, according to Kovach & Rosenstiel:
Journalism’s first obligation is to the truth
Its first loyalty is to citizens
Its essence is a discipline of verification
Its practitioners must maintain an independence from those they cover
The other six Elements are good, but these four are about as core to the practice as you can get. They also are how you can distinguish good journalism from bad. The first three sketch out the goal (truth), who it’s for (citizens, not the powerful), and the method for achieving those first two (verification). The fourth, independence, is a critical extra. Journalism isn’t an arm of power and cannot function well if it’s beholden to anyone, friend or foe.
[ Side note: if I could add an 11th Element it would definitely be “Journalism corrects the record” … either by follow-up stories that add to what is known (because our understanding of the truth is evolving, particularly with breaking news) or outright corrections that often run on page 2 to fix egregious fact errors. Journalists are human and make mistakes, but good journalists own their mistakes and aren’t shy about this; this is why I think of most evening cable news as entertainment and not journalism. ]
Anyhow, those fundamentals of journalism were on my mind with yesterday’s announcement that the Associated Press inked a deal with OpenAI to, among other things, license its content in service of OpenAI’s products (think ChatGPT and the like). This is not new territory for AP. It has been using bots to write boilerplate stories for almost a decade, as I’ve written about before.
No, what’s useful here is to see the value AP is providing. One drum I’ve been beating about AI is that it’s only as good as the library of data you feed it. I think a lot about this as an educator. A deficient or incomplete education leaves people with holes in their thinking. Good education exposes people to a wider set of views and experiences, and so whether a person changes their mind or sticks with what they know, that view is more likely to be built from evidence and reason if it has withstood the crucible of academic inquiry. It’s the argument for liberal arts education like we have at Lehigh, and why it’s so distressing to see public universities like those in Texas and Florida unable to teach certain subjects due to political interference.
As an example, our Provost at Lehigh, Nathan Urban, has shared in a few public settings an experience he had with DALL-E 2’s image-generation AI some months ago. When he wanted to call up an image of provosts for a presentation he was giving, DALL-E gave him a lot of images of white men in ties. While that output is probabilistically accurate given the historical record, it wasn’t universally true even in less modern times and certainly isn’t true now. But it speaks to the idea that what you feed an AI—in this case, a library full of images that aren’t curated for diversity and reality—is likely to produce a skewed view at the output level. So what we feed these systems matters a great deal if what we get from them colors our sense of the world.
Completeness—in education, in the information you consume, and in AI—leads to more informed and rigorous thinking. So AI faces a data quality problem. OpenAI has been silent on the corpus of information that is powering its answers; other alternatives like Bing Chat offer links to references so you can see the AI’s work. This is a transparency issue, of course, but even full transparency can’t make up for junk sources if consumers aren’t able to tell the difference.
I really, really like what AP is doing here. It’s taking those first two parts of the journalism process (gathering and verifying, then assembling a report) and saying those have value. It is work that AI cannot do by itself, and it’s something human hands with training, sound methods, and high standards can do. Those first two parts of the distribution model still have value even as the economic model around consumption is in shambles.
The vision here is that ChatGPT should theoretically be better because it is built on better information. It’s easy to scrape information off the web, but a library assembled that way can contain a volatile mix of ideological publications, good journalism, rants on social media, outright disinformation, and so forth. Without careful curation by those building the AI (and in the case of ChatGPT it’s 45 terabytes of text data, by one estimate the equivalent of about 1.3 million books), there’s a push-pull between the ability to fact-check every detail in every document fed to the AI and the need for massive amounts of text in a library to make the AI more useful.
The solution I half-jokingly tell interested audiences is we are going to need a lot more librarians. The job of the librarian is to gatekeep quality before distributing. This curatorial role provides at least some layer to make sure the sources offered in libraries aren’t junky, and that they represent a wide view of the world. You go into a library knowing someone has at least thought about quality.
AP’s deal is in a similar vein, but around public events and community knowledge. Rather than a librarian curator, the idea is that AP’s content has been vetted and verified before publication, and it offers specific value that isn’t easily found elsewhere online. Narratives are always fraught with the benefits and pitfalls of human analysis in that they’re an interpretation of the facts, but at least the on-the-record facts that are attributed to sources or documents have a gold standard of quality about them.
In a time of information overload, it’s even more critical that we can count on a source to at least apply basic standards of verification to information. And in building an AI, there is a definable difference between the types of sources you feed a library. Just as with education, feeding it information that meets a certain standard is going to increase its reliability over time.
There are glimpses of a better future for news here if news organizations can be visionary and not protectionist. The past 30 years are an example of how disruptive technologies can break apart revenue models even as the methods of journalism remain timeless, and the economic suffering has been in part because of a lack of vision from those running news companies.
We are headed to a future full of all kinds of AIs: AIs for national news, for sports, for local information, for niche interests and hobbies, and so forth. Each type of AI needs specific libraries to make it useful. Imagine the knowledge base of searchable, interactive history a city or town could offer by building an AI on 250 years of local newspaper text. One of our excellent Lehigh students, Nina Cialone, is doing this right now. She’s building a chatbot around our Brown & White student publication, which has been publishing continuously for 125 years (her latest piece is here, but look through the archive and subscribe if you want to follow her journey).
Think about the value the news brings to an AI product here. The Brown & White is perhaps the best ongoing record of the history and life of Lehigh. Giving people the ability to query it (“What were the three most memorable Lehigh-Lafayette football games?” or “How did Lehigh adapt in the era of WWI and WWII?”) is a unique product made possible only by offering it a specific library; it’s the kind of question an AI built on The New York Times cannot answer well because its coverage footprint of Lehigh is so small. So what AP is licensing is a full offering of its take on history. Not the only take, but a significant one given AP’s history and scope.
There is value for journalism here. News companies are sitting on a goldmine of historical and contextual data from the past, and their output offers a daily updated view of the world in the present. That value rests not merely on the existence of the content, but on its purpose and methods: truth, verification, and independence in the public interest.
I’ve been thinking for a while this is going to be a big part of journalism’s future. It offers a type of revenue stream that can reinforce and incentivize more and better journalism. If the act of creating good, verifiable news has value for AIs, there is a good reason to invest some of that revenue in the product and keep making more. It’s the argument against layoffs and cutbacks because it’s a vision for the value of news going forward. Over time, some of these companies might just build their own AI and realize the value of offering answers to civic events and history. There’s a lot of room to play here, and arguably a lot of money to make in the effort.
And if you’re building an AI, there is value here too. The one barrier to adopting these systems is trust, and for chatbots in particular this is fundamentally the same problem journalism faces: usefulness relies on being able to give people accurate, trustworthy answers. Some of the answers ChatGPT gives can be hilariously wrong even as it projects a self-assured posture. Every human interaction with bad AI answers degrades confidence in the system and disincentivizes further consumer use. Again, this is exactly why journalism thrives when it pursues the truth without flinching. So the project of AI is akin to the project of journalism, making a collaboration between the two an obvious step.
On the journalist side, there would need to be some letting go. Looking again at that distribution chain, the middle part (reporting the news) centers the narrative as the product of the human mind. In an AI, those narratives would have value, but so would merely feeding it verified facts and letting it create a narrative. In particular, AIs like Bing Chat are increasingly good at mining the news and explaining it, so you can see a future where AIs can replace the synthesis of multiple news stories (aka “explainer journalism”). AI can do explainers, and humans can still create the building blocks.
But this kind of evolution isn’t new. Technology has time and again disrupted journalism. Stamp presses to steam presses, to radio, to television, to the internet, to social media, to flooded and polluted social sharing ecosystems. The news is in a constant state of evolution and journalism’s history is a tale of adapting. I am convinced new technologies cannot make obsolete journalism’s core principles of truth and verification in the public interest, because they are timeless and downright necessary for free democracy. Change happens, but it only highlights the value of those principles and the need to infuse what’s new with those standards.
Jeremy Littau is an associate professor of journalism and communication at Lehigh University. Find him on Bluesky, Threads, and Mastodon.