A real-world production migration from DigitalOcean to a Hetzner dedicated server, handling 248 GB of MySQL data across 30 databases, 34 Nginx sites, GitLab EE, Neo4j, and live mobile app traffic — with zero downtime.
Ok so if I’m reading this correctly: They migrated from an OS and a MySQL version that had received no updates for at least two years to MySQL 8.0, which will stop getting updates in 4 days. Also, every service runs without any containerization and there is a single database for everything… and it all runs on a single host, and I didn’t read one word about a backup strategy or disk encryption. Also not a single word about infrastructure as code like Ansible so that you can reliably recreate the system… and the whole thing is hosted in Germany for a Turkish software company - sounds like very good latency.
My personal conclusion: This system WILL fail and the guy who designed it is stuck somewhere 10-20 years in the past.
They migrated from an OS and a MySQL version that had received no updates for at least two years to MySQL 8.0, which will stop getting updates in 4 days.
I agree it was an odd choice, and so was the OS: going to Alma 9 when Alma 10 had already been out for some time. You’d think that if they wanted long-term updates they would have gone to 10 to get the most out of them. If they’d gone from 8 to 9, sure - some people like staying where Red Hat has gotten bored and won’t mess with things anymore - but 7 to 9 suggests they weren’t doing timely upgrades before.
Also, every service runs without any containerization and there is a single database for everything
Well, he said explicitly they have 30 databases, though I suppose you meant a single MySQL instance. I won’t judge one way or the other on containerization, as I’ve seen enough amateur-hour containerization not to immediately judge either way on that.
it all runs on a single host
Yeah, that seems pretty dire given his stated usage scenario, and it seems very explicit that their entire internet-facing world is that single host…
backup strategy or disk encryption
It was a post narrowly discussing the migration, so I don’t expect a full inventory of everything they do; backup strategy, disk encryption, and all sorts of other things may have been omitted as having nothing to do with the core topic. I guess the biggest red flag on this front is that he explicitly mentions the old setup having “backups enabled” and the new setup having “RAID1”, which does make me wonder if they think RAID1 is a credible answer to “backup”.
Also not a single word about infrastructure as code
Again, not necessarily in scope for this document, so I’m not sure I’ll judge on this one. I routinely take material expressed in terms of an Ansible play and “generic it out” for general consumption when discussing with people outside my organization.
the whole thing is hosted in Germany for a Turkish software company
I’ll confess to not liking it being a single site; however, to the extent they pick a single site, Germany might make sense because:
Several live mobile apps serving hundreds of thousands of users
Their userbase may be better connected to Germany than to Turkey, and user latency matters more.
My biggest concerns would be mitigated if they said the German-hosted server is their off-prem solution and everything is also hosted on-prem, giving them multiple sites, but I think that’s a bit much to imagine: the described migration process wouldn’t make sense in that scenario.
Sure, though there is a worrying lack of backup and resilience in the described scenario. It has the smell of someone who hasn’t been bitten yet and isn’t paying attention to best practices in the industry.
I’ll give them a break on some of these things as not necessarily being a ‘must’, but being hard-bound to a single server strikes me as a disaster waiting to happen.
Sounds like my homelab has better redundancy than these guys, and my monthly bill isn’t much different from their new one. I only pay for power and networking, since I own my own hardware. I’m colocating in my city, so my latency to home is about 1 ms, and I’ve got a full mirrored server in my house. Certain files are further backed up elsewhere for a proper 3-2-1 backup (plus each server running raidz2 with disk encryption). Even if my home internet goes out, I still have full access to my files at home, and all my public services stay running in the data center. If either server fails, it’s all set up with containers, so it’s easy to spin up each service somewhere else.
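For anyone unfamiliar, 3-2-1 just means: at least three copies of the data, on at least two different media or systems, with at least one copy off-site. As a toy illustration (the Copy records below are hypothetical stand-ins, not my real inventory):

```python
# Toy illustration of the 3-2-1 backup rule: >=3 copies, on >=2 distinct
# media/systems, with >=1 copy off-site.
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    medium: str    # e.g. "zfs-raidz2", "object-storage"
    site: str      # e.g. "colo", "home", "cloud"
    offsite: bool  # off-site relative to the primary copy

def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check the three conditions of the 3-2-1 rule."""
    return (len(copies) >= 3
            and len({c.medium for c in copies}) >= 2
            and any(c.offsite for c in copies))

# Hypothetical inventory roughly mirroring the setup described above.
copies = [
    Copy("zfs-raidz2", "colo", offsite=False),      # primary, in the DC
    Copy("zfs-raidz2", "home", offsite=True),       # mirrored home server
    Copy("object-storage", "cloud", offsite=True),  # select files elsewhere
]
print(satisfies_3_2_1(copies))  # True
```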
One thing that’s tricky to get right with disk encryption (especially with encrypted /boot) is having a redundant boot partition. I was able to hack this together by having software RAID duplicate my boot partition to a second drive. Now if I remove either OS boot drive, it falls back to the remaining one. To keep EFI boot working, you need to use the mdadm 1.0 metadata format so the RAID metadata is stored at the end of the partition, not at the front where EFI reads.
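Concretely, the trick looks something like this - a sketch rather than a turnkey script, and /dev/sda1 and /dev/sdb1 are placeholder names for the two EFI system partitions:

```python
# Sketch: mirror the EFI system partition (ESP) across two drives with
# mdadm so the machine can boot from either drive alone.
# Device names below are placeholders - adjust for your hardware.
import subprocess

def run(cmd: list[str]) -> None:
    """Echo a command, then run it, failing loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Metadata format 1.0 puts the RAID superblock at the END of the
# partition, so UEFI firmware (which knows nothing about md RAID)
# still sees a plain FAT filesystem at the start of each member.
run(["mdadm", "--create", "/dev/md/esp",
     "--level=1", "--raid-devices=2",
     "--metadata=1.0",
     "/dev/sda1", "/dev/sdb1"])

# Format the mirrored device as FAT32, which is what UEFI expects.
run(["mkfs.vfat", "-F", "32", "/dev/md/esp"])
```

The firmware reads each member as an ordinary FAT32 ESP, while Linux mounts /dev/md/esp and keeps the two copies in sync.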
every service runs without any containerization and there is a single database for everything… and it all runs on a single host, and I didn’t read one word about a backup strategy or disk encryption.
Man, a paragraph that can give someone some serious PTSD flashbacks…
The number of times I’ve had to clean up a customer’s environment after they let little Billy play corporate IT and things went boom…
Well, using containerization for everything is very 2015-ish.
My personal conclusion: he knows what he’s doing.
Just install Google Ultron and you’ll be fine