I’ve been thinking about adding this to my “Fuck it, I’ll do it myself” / SHTF pile. I have a spare 10–15 GB for a good selection of basic articles (across the sciences, history, pop-culture trivia, etc.).

https://get.kiwix.org/en/solutions/hotspots/content-bundles/

https://get.kiwix.org/en/solutions/hotspots/imager-service/

There’s something inherently cool about having Wikipedia in a box (yes, you’d likely need to refresh it once a year), but I’ve never heard of anyone actually self-hosting a Kiwix instance.
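For anyone wondering what self-hosting actually involves: kiwix-serve (from kiwix-tools) is an HTTP server that takes a `--port` option and one or more ZIM files on the command line. This is a minimal sketch that only assembles that command from a directory of downloaded ZIMs. The `kiwix_serve_command` helper and the `/srv/zim` path are hypothetical, and it assumes `kiwix-serve` is installed and on your PATH.

```python
from pathlib import Path


def kiwix_serve_command(zim_dir: Path, port: int = 8080) -> list[str]:
    """Build a kiwix-serve command line serving every .zim file in zim_dir.

    Hypothetical helper: it does not start anything, it only returns argv.
    """
    # Collect the ZIM archives in a stable (sorted) order; other files are ignored.
    zims = sorted(str(p) for p in zim_dir.glob("*.zim"))
    if not zims:
        raise FileNotFoundError(f"no .zim files found in {zim_dir}")
    # kiwix-serve takes the port as an option and the ZIM files as positional args.
    return ["kiwix-serve", f"--port={port}", *zims]
```

To actually start it, something like `subprocess.run(kiwix_serve_command(Path("/srv/zim")), check=True)` should work, after which the library is browsable at `http://localhost:8080`. Keeping the command builder separate from the launch makes it easy to print or log the exact command before running it.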

  • surfrock66@lemmy.world

    Yes, and I actually use it to train a local LLM so I’m not hammering the internet. I have a ton of storage and like to keep my kids in the sandbox, so we have Wikipedia, Project Gutenberg, Khan Academy, and a bunch of others, all hosted behind an Apache reverse proxy that uses Mellon (mod_auth_mellon), so there’s LDAP auth.

    • Domi@lemmy.secnd.me

      Do you actually train the LLM or use RAG? I have been looking for a local LLM + Wikipedia RAG solution for a while now.

      For now I just have kiwix-serve + SearXNG doing a simple search, but the Kiwix search is… questionable.
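      One way to work around weak upstream search in a local RAG setup is to re-rank candidate passages yourself before handing them to the LLM. This is a minimal sketch of that retrieval step, assuming you already have candidate article text (for example, pulled from kiwix-serve or SearXNG results). It uses plain Okapi BM25 over lowercase alphanumeric tokens, swapped in here as a simple lexical ranker; it is not a description of how Kiwix or SearXNG rank results. The function names, the prompt wording, and the k1/b defaults are illustrative assumptions.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def bm25_rank(query: str, passages: list[str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[float, str]]:
    """Score each passage against the query with Okapi BM25, best first."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: how many passages contain each term.
    df = Counter(term for d in docs for term in set(d))
    q_terms = tokenize(query)
    scored = []
    for text, doc in zip(passages, docs):
        tf = Counter(doc)
        # Length normalisation: longer-than-average passages are penalised.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in q_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def build_prompt(question: str, passages: list[str], top_k: int = 3) -> str:
    """Put the top-k matching passages into a prompt for a local LLM."""
    ranked = [t for s, t in bm25_rank(question, passages) if s > 0][:top_k]
    context = "\n\n".join(f"[{i}] {t}" for i, t in enumerate(ranked, 1))
    return (f"Answer using only the sources below.\n\n{context}\n\n"
            f"Question: {question}\nAnswer:")
```

      BM25 needs no model or GPU, so it is a reasonable first pass. An embedding-based retriever could replace `bm25_rank` later without changing `build_prompt`.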

    • SuspciousCarrot78@lemmy.world (OP)

      That was actually my immediate thought. I already use Wikipedia as a trusted source for my LLM, but I would prefer to self-host it and not hammer their servers.

      130 GB to fit the entirety of Wikipedia is basically nothing, and I’m mildly embarrassed not to have done it already.

      • surfrock66@lemmy.world

        I also try to participate in some of the farms, running zimit and mwoffliner to help make more archives. Feels like I’m helping.