Wikimedia Enterprise is now offering some of Wikipedia's datasets to companies that want to use them to train artificial intelligence (AI) models. It is working with Kaggle, a Google subsidiary, to offer selected datasets in English and French.
The data has been optimized for model training by leaving out the links and text-formatting markup found in articles on Wikipedia. The move to offer the dataset comes after the site's traffic was hit hard by bots scraping articles to train models without permission. Last month, Wikipedia said that traffic accessing multimedia content had increased by 50% last year due to bot activity.
Kaggle will pay Wikimedia Enterprise for the use of the data. At the same time, all data used must carry attribution under the Creative Commons Attribution-ShareAlike 4.0 license and the GNU Free Documentation License (GFDL).