At least if we can control the data sets, we can keep misinformation out of the training data.
This would also mean making sure the data is peer reviewed and having a way to remove any sources that have been withdrawn or retracted because of errors or concerns about how the research was conducted.
When writing papers and the like, people are still going to have to fully cite their sources.