With the advent of generative artificial intelligence, more AI-created material is appearing online without being labeled as such. Some people have already begun to wonder what would happen if these systems started ingesting that generated content, basing what they create on earlier output that users have published online.

New research published in the journal Nature suggests that the eventual result is harmful. Researchers from the University of Oxford, University of Cambridge, Imperial College London, University of Toronto, Vector Institute, and University of Edinburgh studied large language models, which generate new text based on sophisticated statistical analysis of existing text.

They found that "indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear." In other words, when people indiscriminately use generative AI systems to create text and then post it on the Internet, that content begins to contaminate the existing pools of data. When software vendors develop the latest versions of their products, they pull in this AI-generated content and use it as new training data for their systems.
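To make the idea of disappearing "tails" more concrete, here is a toy sketch, not the researchers' actual experiment. It assumes a made-up vocabulary whose word frequencies follow a Zipf-like curve (a few common words, many rare ones) and treats "training" as nothing more than counting word frequencies in text sampled from the previous generation's model. The function name `collapse_simulation` and all of its parameters are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def collapse_simulation(vocab_size=1000, sample_size=2000, generations=10, seed=0):
    """Toy illustration (hypothetical, not the Nature study's method):
    each generation is 'trained' only on text generated by the previous one.
    Returns how many distinct words survive in each generation's model."""
    rng = random.Random(seed)
    words = list(range(vocab_size))
    # Assumed starting distribution: Zipf-like, so most words are rare (the "tail").
    weights = [1.0 / (rank + 1) for rank in words]
    support_sizes = [vocab_size]
    for _ in range(generations):
        # The current model "publishes" text by sampling from its distribution.
        published = rng.choices(words, weights=weights, k=sample_size)
        counts = Counter(published)
        # The next model learns only from that published text. A word that was
        # never sampled gets weight 0 and can never reappear: the loss is irreversible.
        weights = [counts.get(w, 0) for w in words]
        support_sizes.append(len(counts))
    return support_sizes


if __name__ == "__main__":
    print(collapse_simulation())
```

Because a word that goes unsampled in one generation gets zero weight in every later one, the number of distinct words can only stay the same or shrink, and the rare words are the ones most likely to be lost first. That mirrors, in a very simplified way, the "irreversible defects" the researchers describe.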

© 2024 ALM Global, LLC, All Rights Reserved.