• AbouBenAdhem@lemmy.world
    4 months ago

    All ai projects should be forced to show the entirety of their training data.

    Agreed—but note that in this case the information was only discovered because the organizations involved (Common Crawl and LAION) do show their data. We should assume that proprietary data sets have similar issues—but this case should be seen as an opportunity to improve one of the rare open data sets, not to penalize its openness and further entrench proprietary sources.