Because Google, Bing, Yandex and Baidu already exist.
Any new competition starting up now would need really deep pockets to compete for the decade or so it would take to make any sort of profit.
There is https://openwebsearch.eu/, which might kickstart things a bit and which many people do not know about, but your point still stands.
It's an EU-funded effort to build an open index that others can use and build upon.
Dashboard and Crawling Status: https://openwebindex.eu/
Experimental Search Engine: https://ourrs.eu/
Thanks for sharing links to this project.
I've always been kind of curious why there wasn't an OSS option to "download" chunks of aggregated search content.
I know it would be technically challenging, but forcing crawler after crawler to fetch the exact same content again and again is also rather inefficient.
One other hurdle worth noting right now: because of the absolute flood of AI bots that hammer popular sites and ignore robots.txt, many web servers out there have moved to a bot allow list. That means if you've got a new, properly behaved index bot that identifies itself honestly, you'll likely find it gets blocked by default on a LOT of the Internet.
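To make the allow-list problem concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `may_fetch` helper and the bot name `NewIndexBot` are all made up for illustration, and plenty of sites enforce their allow list at the firewall or CDN level rather than in robots.txt, so treat this as one possible shape of the problem rather than how any particular site does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the "allow list" style: a few named crawlers
# are admitted, and every other user agent is turned away.
ALLOW_LIST_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the text directly so the example needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page"
    # "NewIndexBot" is an invented name standing in for a new, polite crawler.
    for bot in ("Googlebot", "bingbot", "NewIndexBot"):
        print(bot, may_fetch(ALLOW_LIST_ROBOTS_TXT, bot, url))
```

A crawler that honours this file has to skip the site entirely unless its name is on the list, which is exactly the default-deny situation a newcomer runs into.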