Generative AI tools are based on models that use huge amounts of content scraped from the web.

OpenAI and Anthropic have said publicly that they respect robots.txt and blocks on their web crawlers.

Yet both companies are ignoring or circumventing such blocks, BI has learned.

The world's top two AI startups are ignoring requests by media publishers to stop scraping their web content for use as free model-training data, Business Insider has learned.

OpenAI and Anthropic have been found to be either ignoring or circumventing an established web convention, called robots.txt, that tells automated crawlers which parts of a website they should not scrape.

TollBit, a startup aiming to broker paid licensing deals between publishers and AI companies, found that several AI companies were doing this and informed certain large publishers in a letter on Friday, which Reuters reported earlier. The letter did not name any of the AI companies accused of skirting the rule.