My full-time job literally involves dealing with systemd’s crap. There is a raspberry pi that controls all of our signage. Every time it is powered on, systemd gets stuck because it’s trying to mount two separate partitions to the same mount point, whereupon I have to take a keyboard and a ladder, climb up the ceiling, plug in the keyboard, and press Enter to get it to boot. I’ve tried fixing it, but all I did was break it more.
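For what it’s worth, this failure mode is usually two fstab lines claiming the same mount point. A hypothetical sketch (made-up devices, not the actual file from that Pi):

    # Two entries both claim /data. systemd’s fstab generator names
    # mount units after the mount point (here, data.mount), so these
    # two lines fight over a single unit, which is one way a boot
    # ends up stuck waiting at a prompt.
    /dev/mmcblk0p2  /data  ext4  defaults  0  2
    /dev/sda1       /data  ext4  defaults  0  2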
Uh… Sounds like it’s not really systemd’s fault, your setup is just terrible.
If you’re unable to fix it, maybe get somebody else? Like, this doesn’t sound like it’s an unfixable issue…
I don’t know his specific issue, but the general behavior is systemd going completely nuts when something is a bit ‘off’, in some fashion that is supremely confusing. Sure, there’s a ‘mistake’, but good luck figuring out what that mistake is. It’s just that systemd code tends to be awfully picky in obscure ways.
Then, when someone comes along with a change to tolerate a “mistake”, or at least provide a more informative error when one has been made, it is frequently met with “no, there’s no sane world where a user should be in that position, so we aren’t going to help them out of that” or “that application does not comply with standard X”, where X is some standard the application developer would have no reason to know exists, and is just something the systemd guys latched onto.
See the magical privilege escalation where a username beginning with a number got auto-privileges, and Poettering fought fixing it because “usernames should never begin with a number anyway”.
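For context, the report was roughly of this shape (a minimal sketch with a made-up service, not the exact unit from the bug report):

    # sketch.service: illustration only. systemd considered a
    # username starting with a digit invalid, so instead of failing
    # the unit it ignored the User= line and fell back to the
    # default user for services: root.
    [Service]
    User=0day
    ExecStart=/usr/bin/some-daemon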
I love that mentality to development
If it had a buffer overflow exploit that caused it to execute arbitrary code, would his response be that people shouldn’t be sending that much data into that port anyway, so we’re not going to fix it?
(I feel like this shouldn’t require a /s but I’m throwing it in anyway)
Curious, how does changing one of them to a different mount point make things worse?
I’m gonna laugh if it’s something as simple as a botched fstab config.
In the past, it’s usually been the case that the more ignorant I am about a computer system, the stronger my opinions are.
When I first started trying out Linux, I was pissed at it and would regularly rant to anyone who would listen, all because my laptop wouldn’t properly sleep: it would turn off, then a few minutes later come back on. It turned out the WiFi card had a power setting that was causing it to wake the computer up from sleep.
After a year of avoiding the laptop, a friend who was visiting from out of town (and uses Arch, btw) took one look at it and diagnosed and fixed it in minutes. I felt like a jackass for blaming the Linux world for Intel’s non-free WiFi driver being shit. (In my defense, I had never needed to toggle this setting when the laptop was originally running Windows.)
The worst part is that I’m a sysadmin; diagnosing and fixing computer problems should be my specialty. Instead I failed to put in the minimum amount of effort and just wrote the entire thing off as a lost cause. Easier than questioning my own infallibility, I suppose.
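For anyone with the same symptom, the fix is usually a one-liner; exact device names and paths vary by machine, so treat this as a sketch:

    # List devices currently allowed to wake the machine:
    cat /proc/acpi/wakeup
    # Forbid the WiFi card from waking the system ("wlan0" is
    # illustrative; the interface name and path vary):
    echo disabled | sudo tee /sys/class/net/wlan0/device/power/wakeup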
A typo in fstab shouldn’t wreck the system. Why is that not resilient? I added an fstab entry for an extra mount point on an empty partition but forgot to actually create it in LVM.
During boot: device not found, boot halted, on a computer with no monitor/keyboard.
It will cause a critical error during boot if the device isn’t given the nofail mount option, which is not included in the defaults option, and then fails to mount. For more details, look in the fstab(5) man page, and for even more detail, the mount(8) man page.
Found that out for myself when not having my external harddrive enclosure turned on with a formatted drive in it caused the PC to boot into recovery mode (it was not the primary drive). I had just copy-pasted the options from my root partition, thinking I could take the shortcut instead of reading the documentation.
There are probably other ways that a borked fstab can cause a failure to boot, but that’s just the one I know of from experience.
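Concretely, the difference is one option in the fourth field. A hypothetical fstab (made-up UUIDs and mount points):

    # Root with plain defaults: if it can’t mount, halting is correct.
    UUID=1111-aaaa  /              ext4  defaults         0  1
    # External drive with nofail: boot continues even if the device
    # is absent at boot time (see fstab(5) and mount(8) for details).
    UUID=2222-bbbb  /mnt/external  ext4  defaults,nofail  0  2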
Cool! The default should be smarter than borking by default.
It’s a ‘failsafe’: if part of the system depends on that drive mounting and the mount fails, don’t continue. Not the expected default, but it probably made sense at some point. Like a ‘don’t allow starting the truck if the brakes are broken’ type of failsafe.
Yeah, like the default could be smart? How is it supposed to know whether that mount is critical at that point? The alternative is for it to silently fail and wait for something else to break, instead of failing gracefully? I feel like I’m growing more and more petty and matching the language of systemd haters, but just think about it for a few minutes???
The system failed for no good reason; failing is exactly what it should never ever do. If it had just continued, everything would have been fine.
Looking at the range of systems that are supported, it makes the most sense to have the safest failure mode as the default. If fault tolerance is available, that can be handled in the entry, but it makes no sense to just assume it. Having that capability built into the default adds more complexity and reduces support for systems that are not tolerant of a missing mount.
Edit: just saw your other comment, so this may not apply to you now… It’s not that the default is smart, but that the default has been set to fail the boot if parts are missing. Imagine a rocket launch system check: is the temperature system online? No? Fail and abort. While as users, for convenience, we want the system to boot even though a drive went offline, that may not be the best default for industrial applications, or where another system relies on the first one being up and coherent. So we have to use the nofail option to continue the boot on a missing drive.
Does indeed sound likely to be an fstab issue, unless system services are being used in a really weird way.
Can you get something besides a Pi?