Hmm, I took an original list and added to it. Got a website I can check? If so, I’ll happily remove it. I don’t mind slow web crawlers at all.
I’m the administrator of kbin.life, a general purpose/tech orientated kbin instance.
My mbin instance is behind Cloudflare, so I filter the AS numbers there. Those requests don’t even reach my server.
On the sites that aren’t behind Cloudflare, yep, it’s done at the nginx level. I did consider doing it at the firewall level, maybe with a dedicated chain for it. But since I was already blocking at the nginx level, I just did it there for now. It keeps them off the content, but yes, it does tell them there’s a website there to leech if, for example, they change their tactics.
You need to block the whole ASN too. The ones using Chrome/Firefox UAs switch IP every 5 minutes, picking a random one from their huuuuuge pools.
Yeah, I should probably look for good plugins that do this based on community submissions. Because yes, it’s a pain to keep up with whatever trick they’re pulling next.
And unlike web crawlers, which generally check a URL here and there, AI bots absolutely rip through your sites like something rabid.
If you’re running nginx, I’m using the following:
if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") { return 403; }
That blocks the bots that actually use recognisable user agents. I add any new ones I find as I go. It catches a lot!
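nginx’s `~*` operator does a case-insensitive regular-expression match that succeeds if the pattern matches anywhere in the header value. A minimal Python sketch for checking which user agents a rule like this would catch, assuming Python’s `re` treats these plain alternations the same way nginx’s PCRE does. The pattern is only a short excerpt of the list above, and the sample UA strings are illustrative, not taken from real logs.

```python
import re

# A short excerpt of the alternation from the nginx rule above.
UA_PATTERN = "Semrush|AhrefsBot|ClaudeBot|Bytespider|Amazonbot|python|Mail.RU_Bot"

# `~*` is case-insensitive and unanchored, which corresponds to
# re.search with re.IGNORECASE (not re.match / re.fullmatch).
_BLOCKED = re.compile(UA_PATTERN, re.IGNORECASE)


def would_block(user_agent: str) -> bool:
    """Return True if the rule would answer this user agent with 403."""
    return _BLOCKED.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; SemrushBot/7~bl)",   # illustrative bot UA
        "python-requests/2.31.0",                      # any UA containing "python"
        "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    ]
    for ua in samples:
        print(would_block(ua), ua)
```

Because the match is case-insensitive and unanchored, some entries in the full list are redundant (`Semrush` already covers `SemrushBot`, and `BLEXbot`/`BLEXBot` are the same pattern), which is harmless. Note also that the unescaped `.` in entries like `Mail.RU_Bot` matches any character.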
I also have a huuuuuge IP-based block list (generated by adding all the ranges returned from looking up the following AS numbers):
AS45102 (Alibaba Cloud), AS136907 (Huawei SG), AS132203 (Tencent), AS32934 (Facebook)
These guys run, or have run, bots that impersonate real browser user agents.
There are various online tools that return the prefix/IP lists for an autonomous system number (ASN).
I put both into a single file and include it in my website config files.
EDIT: Just to add, keeping on top of this is a full-time job!
EDIT 2: Removed Mojeek bot, as it seems to be a normal web crawler.
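The step of turning per-ASN prefix lists into an include file could look like this minimal Python sketch. It covers only the IP part, and assumes you have already fetched the prefixes for each AS number from one of those online tools as plain CIDR strings (the prefixes below are documentation ranges, not real ASN data). It emits nginx `deny` directives and uses `ipaddress.collapse_addresses` to merge adjacent and overlapping ranges so the file stays shorter.

```python
import ipaddress
from typing import Iterable


def build_deny_rules(prefixes: Iterable[str]) -> list[str]:
    """Collapse CIDR prefixes and return nginx `deny` lines (IPv4 first, then IPv6)."""
    v4, v6 = [], []
    for p in prefixes:
        net = ipaddress.ip_network(p.strip(), strict=False)
        (v4 if net.version == 4 else v6).append(net)
    # collapse_addresses only accepts networks of a single IP version,
    # so each family is merged separately.
    merged = list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))
    return [f"deny {net};" for net in merged]


def write_include(path: str, prefixes: Iterable[str]) -> None:
    """Write the deny rules to a file meant to be pulled in with nginx's `include`."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(build_deny_rules(prefixes)) + "\n")


if __name__ == "__main__":
    example = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24", "2001:db8::/33"]
    print("\n".join(build_deny_rules(example)))
```

The generated file (the name is up to you, e.g. a hypothetical `asn-block.conf`) would then be referenced with `include` inside a `server` block. Requests from matching addresses get a 403, and addresses that match no rule are still allowed.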
I use vim, aliased to vi, on Arch btw.
The sun always shines on pc.
Well, for a gamer, no real comment. But there is one metric where Intel still trashes AMD on the APU front: hardware video acceleration/encoding. The quality is objectively better on Intel Quick Sync.
When I was getting a home box that also needed to do transcoding, an Intel CPU was a requirement. My desktop development/gaming system? Ryzen + NVIDIA.
I’m on NVIDIA with the blob driver, KDE Plasma on Wayland, on Arch. Yeah, resuming from standby is like 50/50 on whether the screen comes back. I just turned off standby and kept only screen sleep.
But I’m on a desktop, so it’s less of a problem for me than it would be for a laptop user.
I did a routine upgrade on my mbin server, which was running an old version with changes I’d made myself.
Well, it turns out I upgraded something (probably Redis) that broke Symfony, which broke everything.
So I had a fun afternoon upgrading to the latest mbin version. I mean, I needed to anyway, but my hand was forced.
Yep sometimes an innocent looking update will change your weekend plans.
Anyway, any reason not to use SSH?
I’ve yet to have an actual game dislike Wayland. But you’re right, there’s always the option to swap.
I thought that too, and things got better when I set 1x scaling on both (it was 1x and 1.5x), but it hasn’t stopped the problem entirely.
The privacy stuff? I’ve seen it happen in Windows 11 for sure. I always check after an update now, out of habit. But I haven’t seen it in a while.
Resetting dual-boot stuff? Before EFI/UEFI, it would happen on most Windows updates. It would just overwrite the boot record in a totally arrogant fuck you to whatever was already there. But since EFI/UEFI, it generally plays nice with other operating systems.
I can’t use my plugins for Elite Dangerous or extra software, like EDMC.
Why not? The GitHub page even says it will work with Wine. I haven’t played ED for a long time, but I’m sure I had at least EDDiscovery working with it on Linux a few years ago. With other games, like WoW, I have external tools that interface with the game working fine, some within the same Wine environment, some even outside it. You just need to make sure the drive is mapped where the app expects it (you can always go via the Z: drive too).
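By default, Wine maps drive `Z:` to the Linux root directory `/`, so a Windows app can reach any Linux path by prefixing `Z:` and switching to backslashes. A minimal Python sketch of that translation, assuming the default `Z:` mapping is still in place in your prefix; the example path is hypothetical. Wine’s own `winepath -w` tool does this conversion properly, including other drive mappings.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_wine_z_path(linux_path: str) -> str:
    """Translate an absolute Linux path to its Windows form via Wine's default Z: drive."""
    p = PurePosixPath(linux_path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path, got {linux_path!r}")
    # Drop the leading "/" part and re-root the remaining parts under "Z:\".
    return str(PureWindowsPath("Z:\\", *p.parts[1:]))


if __name__ == "__main__":
    # Hypothetical location of a tool's data directory on the Linux side.
    print(to_wine_z_path("/home/me/Saved Games"))
```

This is what you would paste into a Windows tool’s settings dialog when it asks where the game’s files live.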
In my experience, I have Steam working, and pretty much every game I want to play has worked. I don’t play games with kernel anti-cheat even on Windows, so I’m not missing anything there. Battle.net runs fine, even with ray-traced shadows in WoW. Pretty much everything else I need works. The only things I miss are the games that are part of the Xbox/Windows Store, but that’s hardly Linux’s fault. Maybe Visual Studio too. But I do have the OSS “Code” to cover most of what I did in VS, so…
I have dual boot, but I haven’t booted into Windows in weeks. Almost everything just works fine.
I’ve been lucky, then; the only problems I’m having (Wayland + NVIDIA) are:
Oh, and I disabled standby entirely. It was 50/50 whether it would return from it. I think most of the problems are because I have mismatched resolutions (1080p and 1440p).
I remember those times too. The difference today is that there are so many more libraries, and projects use those libraries a lot more often.
So using configure and make means the user is also responsible for ensuring all those libraries are up to date. And again, if we’re talking about not using binary installs, each of those also needs its own regular configure/make process. It’s not that unusual for large packages to depend on 100+ libraries, at which point building and maintaining the builds for all of them yourself really becomes untenable. That said, I think Gentoo exists to automate a lot of this while still building from source.
I understand why binary packages that list other binary packages as prerequisites are used. I also understand where the limits of this are and why AppImage/Flatpak/Snap exist. I just don’t particularly like the latter as a concept, but I accept there are times you might need them.
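Part of why building everything from source by hand gets untenable is that each library has to be configured and built after all of its own dependencies. A minimal Python sketch of working out that order with the standard library’s `graphlib.TopologicalSorter` (Python 3.9+); the dependency graph is made up for illustration. A source-based package manager such as Gentoo’s has to compute this kind of ordering for you, along with much more.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each package maps to the packages it needs built first.
DEPENDS_ON = {
    "myapp": {"libfoo", "libbar"},
    "libfoo": {"zlib"},
    "libbar": {"zlib", "openssl"},
    "openssl": set(),
    "zlib": set(),
}


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every package comes after all of its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, pkg in enumerate(build_order(DEPENDS_ON), start=1):
        print(f"{step}. {pkg}: ./configure && make && make install")
```

With 100+ libraries, that list is also what you would have to re-walk every time one of them gets a security update.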
Yes, but it seems the French language pack is a dependency for pretty much everything else! Who knew?
This one threw me off. I’d muted Discord by mistake. Weirdly, voice still worked. I spent ages checking and double-checking settings to see why I wasn’t getting notification sounds or the PTT sound, dismissing any possibility of a mute because voice was working.
When I found it was this…
These days, with UEFI, it’s much less likely to break things. Worst case, though, you just boot from a live USB, chroot in and rerun GRUB/your bootloader installer. Often, even if Windows puts its own bootloader first, you can choose your bootloader from the BIOS boot menu and just rerun the bootloader installer.
It used to be a lot worse.
As I said elsewhere, I hope this is just a way to track changes over time per user.
But they need to take an anonymous hash of some non-changing data, or create an install ID that is used for this and nothing else (i.e. it identifies a unique user, but not the person or hardware behind that user).
Too much identifying info is just pushed around like we shouldn’t care; it’s become a real problem.
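The install-ID option could look something like this minimal Python sketch. The ID comes from `uuid.uuid4()`, which is random, so it is not derived from the person, the machine or the hardware; it only lets you tell one install apart from another over time. The state-file location is a hypothetical choice for illustration.

```python
import uuid
from pathlib import Path


def get_install_id(state_file: Path) -> str:
    """Return a random per-install ID, creating and persisting it on first use.

    uuid4() is built from random bits, so the ID carries no information about
    the user or their hardware; deleting the file simply yields a new ID.
    """
    if state_file.exists():
        return state_file.read_text(encoding="utf-8").strip()
    install_id = str(uuid.uuid4())
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(install_id + "\n", encoding="utf-8")
    return install_id


if __name__ == "__main__":
    # Hypothetical location for the ID file.
    print(get_install_id(Path.home() / ".example-app" / "install_id"))
```

Compared with hashing hardware data, a random ID can’t be recomputed from the machine by anyone else, so it can’t be used to link that install to other services.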
I didn’t have the link to hand, but a search turned this one up, and it looks to be the same list: https://reggiodigital.com/blog/nginx-rule-blocking-bad-bots/ You can see the ones I’ve added at the end of that list.