• LedgeDrop@lemmy.zip
      18 hours ago

      Thanks for sharing links to this project.

      I’ve always been kind of curious why there wasn’t an OSS way to “download” chunks of aggregated search content.

      I know it would be a technical challenge, but forcing crawler after crawler to fetch the exact same content again and again is also rather inefficient.

    • Em Adespoton@lemmy.ca
      18 hours ago

      One other hurdle to note at this particular moment: because of the absolute flood of AI bots that hammer popular sites and ignore robots.txt, many web servers have moved to a bot allow list. That means if you’ve got a new, properly behaved, self-identifying index bot, you’ll likely find it blocked by default on a LOT of the Internet.
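
      To illustrate the allow-list idea: server-side filtering like this usually boils down to matching the request’s User-Agent against a short list of known-good bots and rejecting everything else. A minimal sketch (the bot names and helper are hypothetical, not any particular server’s config):

      ```python
      # Hypothetical User-Agent allow list, mimicking what a web server
      # or WAF rule might do. Anything not matching a known bot token
      # is denied by default, so a brand-new crawler gets blocked even
      # if it honors robots.txt.
      ALLOWED_BOT_TOKENS = {"Googlebot", "Bingbot"}  # assumed allow list

      def is_allowed(user_agent: str) -> bool:
          """Return True only if the User-Agent contains an allow-listed token."""
          return any(token in user_agent for token in ALLOWED_BOT_TOKENS)

      print(is_allowed("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
      print(is_allowed("ShinyNewIndexBot/0.1 (+https://example.org)"))  # False: blocked by default
      ```

      The default-deny posture is exactly what makes bootstrapping a new, well-behaved crawler so hard: there’s no way to earn a spot on the list without already being widely recognized.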