• Artisian@lemmy.world
      5 days ago

      No. I’ll name three.

      Pleias, a family of LLMs trained on Common Corpus, compliant with EU copyright and fair use law. They filtered a public domain dataset for racism and other biases, and released the results.

      CommonCanvas is a suite of text-to-image models trained on data they know is well sourced.

      Apertus (Public AI) is a ChatGPT-style bot made in collaboration with the Swiss government, with a commitment to using only training data that complies with Swiss fair use. They've chosen a model design that lets them remove training data that is improperly labeled or that becomes inaccessible (i.e., when a site changes its robots.txt).

      Not to mention the hundreds of models that ML academics have trained using things like open diffusion and public datasets (see also these hobbyists).

      They don’t have advertising budgets (generally). But you see a steady stream of open models on arXiv.