Reddit CEO Says Microsoft and Anthrophic Need to Pay to Use Site Data



Reddit CEO Steve Huffman says Microsoft, Anthrophic and Perplexity will have to pay if they want to continue "scraping" site data to train their artificial intelligence (AI) models. This is according to Huffman's interview with The Verge. Reddit tries to prevent the data from being retrieved by placing a barrier on the site's Robot.txt text.


But according to Huffman this blocking process is becoming increasingly difficult because not all AI companies respect the Robot.txt used by Reddit. This is the reason why Reddit prevents searches for certain topics except through Google Search. This year Reddit and Google signed an agreement to use site data for AI training purposes.


New data is needed by AI companies to train more powerful and capable language models (LLM). At this point new quality data is increasingly difficult to find due to the issue of unauthorized use of intellectual property. As a result synthetic data is used for training purposes but it also raises the issue of data corruption which can cause the AI ​​model to be destroyed if it is used without filtering.

Previous Post Next Post

Contact Form