Welcome to The Pubcast with Jon Loomer!

The Problem with AI Chatbots

This is what worries me about AI-powered chatbots...

When you create a chatbot, you provide the content to train it on. You can provide files, text, or website URLs. You can have it crawl the entire sitemap of a domain or submit links individually.
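To make that concrete, the sitemap-crawl step in these builders boils down to something like the following. This is a hypothetical sketch in Python, not any particular vendor's code; the point is that nothing in the flow checks who owns the domain.

    # Hypothetical sketch: how a chatbot builder might collect training
    # URLs from a sitemap. Nothing here verifies ownership of the domain.
    import urllib.request
    import xml.etree.ElementTree as ET

    def collect_training_urls(sitemap_url):
        """Fetch a sitemap.xml and return every page URL listed in it."""
        with urllib.request.urlopen(sitemap_url) as response:
            tree = ET.fromstring(response.read())
        # Page URLs live in <loc> tags under the standard sitemap namespace.
        ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
        return [loc.text for loc in tree.findall(".//sm:loc", ns)]

    # Anyone can point this at any domain they don't own.
    urls = collect_training_urls("https://example.com/sitemap.xml")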

The potential issue is that you can submit any content and any links. It doesn't matter whether you own that content.

You could then build and market a topical resource that is trained entirely on content you do not own. That seems... wrong.

This, of course, is part of a broader philosophical debate about ChatGPT and large language models generally. But whereas the source of a specific ChatGPT answer is murkier because it's derived from a compilation of many sources, this feels more clear-cut.

When you create the bot, you provide the content to train it on. The source is clear to both the bot's creator and the app's developer because it was submitted specifically to create that bot.

There are some potential solutions that would fall on the app developer:

1. Require domain verification before submitting links or a sitemap. This seems like a reasonable requirement (see the verification sketch after this list).

2. Create a tool that scans submitted documents and text for matches on the internet. A match could trigger a notification to the content owner or a restriction that prevents that content from being used for training (a toy sketch of the scanning step also follows this list).
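For solution 1, the flow most platforms already use for domain verification (Google Search Console is the familiar example) is simple: issue a random token, have the site owner publish it at a known location, then check for it. A minimal sketch, assuming a hypothetical well-known file path:

    # Minimal sketch of file-based domain verification, similar in spirit
    # to Google Search Console. The verification path is hypothetical.
    import secrets
    import urllib.request

    def issue_token():
        """Generate a random token the site owner must publish."""
        return secrets.token_urlsafe(32)

    def verify_domain(domain, token):
        """Return True if the token is published at a well-known URL."""
        url = f"https://{domain}/.well-known/chatbot-verification.txt"
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.read().decode().strip() == token
        except OSError:
            return False  # unreachable domain or missing file: not verified

    # The builder would accept links or a sitemap from a domain only
    # after verify_domain() returns True.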

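Solution 2 is essentially plagiarism detection. One standard technique is shingling: break the submitted text into overlapping word sequences and measure how many of them appear in already-indexed web content. A toy sketch of the comparison step (the index itself and any flagging threshold are assumptions):

    # Toy sketch of shingle-based overlap detection between a submitted
    # document and known web content. A real system would query a large
    # index of crawled pages; here both texts are passed in directly.

    def shingles(text, size=5):
        """Return the set of overlapping `size`-word sequences in the text."""
        words = text.lower().split()
        return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

    def overlap_ratio(submitted, known):
        """Fraction of the submitted document's shingles found in known content."""
        sub = shingles(submitted)
        return len(sub & shingles(known)) / len(sub) if sub else 0.0

    # Assumption: above some threshold, the platform would flag the
    # upload, notify the content owner, or block training on it.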
Possibly the easiest solution: Require all bots to link to the material they were trained on.

Ideally, every answer would include a source. But even without that, a link to the full training material would provide some transparency. There's little argument for hiding this information.
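For what that could look like in practice, a bot's answer payload could simply carry the URLs of the documents it drew from, plus a link to the full training corpus. The field names below are hypothetical:

    # Hypothetical shape of a bot answer that exposes its sources.
    answer = {
        "text": "Here's how to structure that campaign...",
        "sources": [
            # training documents this specific answer drew from
            "https://example.com/blog/campaign-structure",
        ],
        # link to the complete list of trained material, per the
        # transparency recommendation above
        "training_corpus": "https://bot.example.com/trained-material",
    }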

Right now, there's little preventing someone from building a bot (or another AI-powered tool) that is trained on another website's content and then marketing it as their own.

Of course, lawsuits will be coming, and the type of transparency that I recommend here could help.

What do you think?