There is https://openwebsearch.eu/, which might kickstart things a bit and which many people do not know about, but your point still stands.
It’s an EU-funded effort to build an open index that others can use and build upon.
Dashboard and Crawling Status: https://openwebindex.eu/
Experimental Search Engine: https://ourrs.eu/
Thanks for sharing links to this project.
I’ve always been kind of curious why there wasn’t an OSS possibility to “download” chunks of aggregated search content.
I know that technically it would be a challenge, but forcing crawler after crawler to fetch the exact same content again and again is also rather inefficient.
One other hurdle worth noting right now: because of the absolute flood of AI bots that hammer popular sites and ignore robots.txt, many web servers have moved to a bot allow list. That means if you’ve got a new, properly behaved and reporting index bot, you’ll likely find it gets blocked by default on a LOT of the Internet.
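To make the allow-list problem concrete, here is a minimal sketch of what a well-behaved crawler would see. It only models the robots.txt side; blocking done at the web server or firewall level would not show up this way. The bot names and the robots.txt content are made up for illustration, and the check uses Python’s standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list style robots.txt: one known bot is allowed,
# every other user agent is denied by default.
ROBOTS_TXT = """\
User-agent: KnownSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "NewIndexBot" stands in for a new, polite crawler that is not on the list.
    for agent in ("KnownSearchBot", "NewIndexBot"):
        allowed = may_fetch(ROBOTS_TXT, agent, "https://example.org/page")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A crawler that respects this file has to skip the site entirely, which is exactly the position a new index bot ends up in until site owners add it to their list.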