Yesterday, Apple and several other technology companies, including Anthropic and NVIDIA, were reported to have used data from 170,000 YouTube videos without permission to train their artificial intelligence (AI) models. The dataset, provided by EleutherAI, was built from video transcript files. Today, 9to5Mac reported that Apple has issued a statement saying Apple Intelligence was not trained on this unauthorized data.
The YouTube data was used to train OpenELM, an open-source AI model that anyone can access. The model is for research purposes only and is not used to power Apple Intelligence. Details of the research behind the model are also available on Apple's Machine Learning site.
In the article "Apple's On-Device and Server Foundation Models," Apple states that Apple Intelligence is not trained on personal data or user interaction data. Instead, it is trained only on licensed data and open data. Apple also reportedly stated that it will not develop a new model based on OpenELM.
What do you think? Is it still acceptable to train an AI model on unauthorized YouTube subtitle data as long as the model is open source?