OpenAI's scraping bots are winning: more data for AI

https://img.inleo.io/DQmf37uM8Q5DtYJSJrytgEnWfzrvpuPfFmjpA8sGWfc3JdB/robot-7720806_1280.webp

source

I use AI for many of my projects besides writing, so much so that I've run into its data limitations. When ChatGPT isn't good enough, much of the blame lies with its data: it can only work with what it has been trained on.

How does it get its data? Through website scraping bots. When publishers share news or any other kind of content, that same content is what companies like OpenAI scrape to train their AI. It follows that ChatGPT is only as good as the data publishers share online.

For a while now, publishers have been doing their best to stop these AI scraping bots, because scraping goes against their policies and it simply seems unfair to use their data to train AI bots for free and without permission.

Nowadays, though, AI companies and news publishers seem to be entering a new phase of working together.

The era in which many websites blocked AI crawlers such as OpenAI's GPTBot, stopping them from using their data, is slowly coming to an end. These sites relied on their robots.txt file, which tells web crawlers which parts of a site they may or may not access.
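To make the robots.txt mechanism a bit more concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt contents and the example.com URL are made up for illustration; a real publisher's rules will differ. Keep in mind that robots.txt is only a request: it works only when a crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that blocks OpenAI's GPTBot
# from the whole site but lets every other crawler in.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    article = "https://example.com/news/some-article"  # hypothetical URL
    for bot in ("GPTBot", "Googlebot"):
        print(bot, "allowed:", is_allowed(rules, bot, article))
```

Running it prints `GPTBot allowed: False` followed by `Googlebot allowed: True`. When a publisher signs a deal and unblocks the crawler, the change can be as small as deleting the `GPTBot` group from the file.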

But now things appear to be changing. OpenAI has made deals with some publishers that give the company access to their data. I'm pretty sure money is involved, and not every publisher will be paid by OpenAI.

It's not like they're going to pay you hundreds of dollars for sharing your cooking tips online, but renowned publishers with authentic, world-class content may well be in on the OpenAI deals.

Once these deals are made, publishers often stop blocking the crawlers, so the number of websites blocking GPTBot has dropped. That's good news for OpenAI, which will need all the data it can get if it intends to release GPT-5.

Some sites, like Time magazine, still block OpenAI's access to their data, but the company has found ways around this by using direct data feeds from partners.

The question on my mind is: is blocking AI crawlers a new way to push AI companies into making deals with publishers? Perhaps it's a new source of income for them.

Eventually, OpenAI and the other AI companies that rely on scraping website data will have all the data access they need, and that will make AI much better for us because it will be more capable. With the way things are moving, we may see fewer barriers for AI companies soon.

Posted Using InLeo Alpha
