Breaking (and Recovering) SELinux


This is a post about caution, and about how a simple mistake can blow up a month after you make it. But as with anything, breaking something leads to fixing it and learning a lot along the way. This post will cover SELinux file contexts, SELinux tools such as semanage, restorecon, and matchpathcon, the nature of immutable filesystems, and some fun quirks of btrfs.

The Beginning

So to start, I run openSUSE MicroOS as my server environment at home. This atomic server works great because all of the apps that I run are microservices. I have yet to migrate to Kubernetes (I really need to do that), so I am using podman and its systemd integration to run my stack. It actually works surprisingly well, since containers, networks, volumes, and so on are all defined in systemd unit files. This makes setup really easy because it’s all standardized, and you can use systemctl and journalctl to manage your containers and logging.

Anyway, one day I get a notification on my phone that my CalDAV service is down. I also noticed I had stopped getting emails from my self-hosted email server (more info about that here). I was busy doing yard work at the time, so I ended up taking a look at it later that evening. I couldn’t SSH into the host, but the network lights were on, so I had to bust out my emergency keyboard and monitor to access the box locally. Strangely, I couldn’t get my keyboard to work, so I had to restart the box from the power button, and the keyboard was detected on boot. The dmesg and systemd logs start streaming and end with systemd dropping me into emergency mode. Unfortunately MicroOS disables the root account, so you can’t actually use emergency mode (fun!). So I now knew something was definitely messed up on the box. Time for a deeper troubleshooting session.

Troubleshooting

This section is a more theatrical telling of how I ran into the issues and worked my way to a solution. If you’d like to get straight to the educational part or the fix, you can skip to the Resolution section.

Basic Troubleshooting

So I restarted the box, interrupted the grub boot, and added init=/bin/sh to the linux arguments to troubleshoot further. Looking at the logs, I see an “unable to mount /opt/storage” error. This is an external drive that I use to store my backups on a regular basis. So ok, that’s weird. I notice the drive isn’t mounted, but that’s normal since I used the shell as the init entry point instead of systemd. So I think, let’s just allow the mount to fail. Backups are important, but it was getting kind of late and I wanted to just get things back up and running. Again, I had no email at this point, so who knows what emails may have bounced while my SMTP host was down. I open /etc/fstab and add nofail to the defaults option. Save and reboot. Once again, same deal. The system boots and drops me into an emergency shell, which still doesn’t work.

So I set init=/bin/sh again and start looking at the systemd logs with journalctl -b -1, working backward. I notice the /opt/storage mount failure, but it failed as a dependency job of a SELinux relabel job. Ok, that’s weird. Root should have permission to mount this drive. As I scroll up I notice that a lot of other filesystems can’t mount either. /boot/efi and virtual filesystems like /sys are all failing. This is a much bigger issue than I anticipated. I see the relabel job and think, okay, maybe a SELinux issue? So I reboot and set selinux=0 on the grub screen. Sure enough, the machine boots as normal and all my services start as expected. Okay, definitely a SELinux issue. Time to deep dive this now.
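For reference, the two kernel parameters used in this post do different things: selinux=0 turns SELinux off entirely, while enforcing=0 keeps it loaded but in permissive mode. Below is a small sketch that reads a kernel command line and reports which of the two is in play. The function name selinux_boot_mode is made up for this example, and "default" here just means "whatever /etc/selinux/config says".

```shell
#!/usr/bin/env bash
# Sketch: classify how SELinux will start, given a kernel command line string.
# selinux_boot_mode is a hypothetical helper name, not a standard tool.
selinux_boot_mode() {
    local cmdline="$1" mode="default" arg
    local -a args
    # Split the command line on whitespace without glob expansion
    read -r -a args <<< "$cmdline"
    for arg in "${args[@]}"; do
        case "$arg" in
            selinux=0)   mode="disabled" ;;
            # enforcing=0 only matters if SELinux is not disabled outright
            enforcing=0) [ "$mode" = "disabled" ] || mode="permissive" ;;
        esac
    done
    printf '%s\n' "$mode"
}

# Usage: inspect the parameters of the running kernel when /proc is available.
if [ -r /proc/cmdline ]; then
    selinux_boot_mode "$(cat /proc/cmdline)"
fi
```

This only looks at boot parameters; on a live system, sestatus or getenforce is the authoritative answer.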

SELinux Troubleshooting

Ok, so SELinux is confirmed to be the issue. Now, why is SELinux causing issues? Naturally, I reach for the catchphrase of openSUSE enthusiasts everywhere: “Just roll back if there’s an issue!” Snapper allows you to do this for broken updates. So, away I go! Boot up, choose a snapshot from 5 days ago just to be safe and… same issue. Dropped into an emergency shell, with the same permission error preventing the /opt/storage drive from being mounted. Okay.. that’s strange. I run rpm -q selinux-policy-targeted kernel-default. Sure enough, there was an update for this package the day before my box broke. I have my server set to restart every night after running updates. So, a broken policy was pushed out? I pull up Google, the openSUSE forums, and the openSUSE subreddit. But nobody else is running into policy issues. Ok, well, that would’ve been too easy anyway. I like a good challenge!

So I try again, but with a snapshot that has an older kernel version, just to be double sure that I am using a previous selinux policy package and a previous kernel. Same thing: emergency shell and permission denied mounting /opt/storage. So this SELinux permission issue is permeating all my previous snapshots too. This is really not good. My next idea is to check some SELinux labels to make sure things are as expected. I run ls -alhZ /opt/storage and get: system_u:object_r:container_file_t:s0. Ok, that’s right. My backups use a container, so I need that context to write to my storage location. I run ls -alhZ /opt to see if there’s something wrong there and I see system_u:object_r:container_file_t:s0. Ok, that’s not right. I run matchpathcon -V /opt and get /opt verified. Oh boy, something is overriding these file contexts with system_u:object_r:container_file_t:s0. I run a quick sestatus -v and, well, SELinux is disabled. So I reboot again and pass enforcing=0 at the grub boot; this way SELinux is enabled, but put in permissive mode. After this I try sestatus -v again.

SELinux status:                enabled
SELinuxfs mount:               /sys/fs/selinux
SELinux root directory:        /etc/selinux
Loaded policy name:            targeted
Current mode:                  permissive
Mode from config file:         enforcing
Policy MLS status:             enabled
Policy deny_unknown status:    allowed
Memory protection checking:    actual (secure)
Max kernel policy version:     35

Process Contexts:
Current Context:               unconfined_u:unconfined_r:unconfined_t:s0
Init context:                  system_u:object_r:kernel_t:s0
/sbin/agetty                   system_u:object_r:kernel_t:s0-s0:c0.c1023
/usr/sbin/sshd                 system_u:object_r:kernel_t:s0-s0:c0.c1023

File Contexts:
Controlling terminal:          unconfined_u:object_t:user_devpts_t:s0
/etc/passwd                    system_u:object_r:container_file_t:s0
/etc/shadow                    system_u:object_r:container_file_t:s0
/bin/bash                      system_u:object_r:container_file_t:s0
/bin/login                     system_u:object_r:container_file_t:s0
/bin/sh                        system_u:object_r:container_file_t:s0 -> system_u:object_r:shell_exec_t:s0
/sbin/agetty                   system_u:object_r:container_file_t:s0
/sbin/init                     system_u:object_r:container_file_t:s0 -> system_u:object_r:container_file_t:s0
/usr/sbin/sshd                 system_u:object_r:container_file_t:s0

Oh no… That’s definitely not good
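If you'd rather not eyeball that output, here is a rough sketch that pulls out every path in the "File Contexts:" section whose own label has a given type. The helper name find_mislabeled is hypothetical, and it assumes the column layout shown above (path, then label, with an optional "->" symlink target that is ignored).

```shell
#!/usr/bin/env bash
# Sketch: list paths from the "File Contexts:" section of `sestatus -v`
# output whose on-disk label carries a given SELinux type.
# find_mislabeled is a hypothetical helper; it only reads text on stdin.
find_mislabeled() {
    local bad_type="$1"
    awk -v t="$bad_type" '
        /^File Contexts:/ { in_files = 1; next }
        in_files && NF >= 2 && $1 ~ /^\// {
            # $2 is the label of the path itself (before any "->" target);
            # the type is the third colon-separated field of the label
            n = split($2, parts, ":")
            if (n >= 3 && parts[3] == t) print $1
        }
    '
}

# Usage (needs SELinux tooling on the host, so it is left commented out):
#   sestatus -v | find_mislabeled container_file_t
```

On a healthy system you'd expect this to print nothing for container_file_t.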

Root Cause Analysis (RCA)

Ok, so from the smoking gun above it’s obvious the issue is that all of /etc was labeled with the “container file” context. It turns out every directory on my box was marked with this context: /etc, /usr, /lib, /sbin, /sys. You name it, it had this applied. Even /.snapshots, which is definitely not good. So at this point it’s clear something changed and triggered a SELinux relabel event. This was likely because the selinux-policy-targeted package was updated. So we know what happened: for some reason the entirety of my box got hit with the “container file” context during an autorelabel event from a package update. But why did everything get relabeled to this? matchpathcon -V /etc /sys /opt /etc/passwd /etc/shadow all comes back with verified. So, bad policy? No, because there would be a much larger fuss on the internet if openSUSE pushed something like this out. Plus, when reviewing the targeted policy files, they all look good:

# grep -i "/etc/passwd" /etc/selinux/targeted/contexts/files/file_contexts
/etc/passwd[-\+]?      --    system_u:object_r:passwd_file_t:s0

So, the policy is right, but when I run matchpathcon /etc/passwd it comes back as correctly labeled with system_u:object_r:container_file_t:s0. So.. local override?

I pop open /etc/selinux/targeted/contexts/files/file_contexts.local and this is what I’m greeted with:

# This file is auto-generated by libsemanage
# Do not edit directly

/var/lib/containers/storage/volumes/systemd-calibre-wa-ingest/_data(/.*)?  system_u:object_r:container_file_t:s0
(/.*)?                                                                     system_u:object_r:container_file_t:s0

See the problem? Here, let me highlight it for you:

(/.*)?                                                                     system_u:object_r:container_file_t:s0

This translates to everything under “/” which is… everything. So now we see how this happened. A new policy package was installed, an autorelabel was triggered, and the new local override relabeled everything with the “container file” context, just as it was told to do. Well, how did this line get here? The file_contexts.local file is essentially where your semanage fcontext rules are stored, so let’s check the shell history to see how this happened:

# history | grep fcontext
885 2026-05-28 23:42:32 semanage fcontext -a -t container_file_t "/var/lib/containers/storage/volumes/systemd-calibre-wa-ingest/_data(/.*)?"
889 2026-05-28 23:44:09 semanage fcontext -a -t container_file_t "$MOUNTPOINT(/.*?)"
893 2026-05-28 23:44:37 semanage fcontext -a -t container_file_t "$MOUNTPOINT(/.*?)"
1002 2026-06-23 22:05:25 history | grep fcontext

It was at this point that I 🤦. Unbound variables strike again! In bash, unless set -u is in effect, an unbound (or undefined) variable expands to an empty string. So "$MOUNTPOINT(/.*?)" is interpolated as “(/.*?)”. *sigh*. Well, at least I have my answer. So the full RCA for this issue is:

  1. On May 28th I ran an fcontext command that essentially overrode the file context rules to label everything as container_file_t. But this did not immediately apply.
  2. Skip to almost a month later, on June 21st. A policy update triggers a SELinux relabel event and a reboot, as is normal during patching with a SELinux policy/kernel update.
  3. The machine reboots as normal and on boot relabels everything with the container_file_t context, immediately breaking everything and dropping us into the emergency environment, which still doesn’t work (as expected).
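Step 1 is the kind of mistake a tiny guard can catch. Here is a minimal sketch of one, assuming you build the fcontext pattern through a helper instead of interpolating the variable directly. The name fcontext_pattern is made up for illustration, and it does not escape regex metacharacters in the path, so treat it as a starting point only.

```shell
#!/usr/bin/env bash
# Sketch: build an fcontext path pattern, refusing empty, relative, or root
# paths. fcontext_pattern is a hypothetical helper name, not part of any
# SELinux tool.
fcontext_pattern() {
    local raw="${1:-}" dir
    dir="$raw"
    # Drop trailing slashes so "/opt/storage/" and "/opt/storage" behave alike
    while [ "${dir%/}" != "$dir" ]; do dir="${dir%/}"; done
    case "$dir" in
        /?*) ;;   # absolute path with something after the leading "/"
        *)
            echo "refusing to build a pattern from '${raw}'" >&2
            return 1
            ;;
    esac
    # Append the usual "this directory and everything below it" suffix
    printf '%s(/.*)?\n' "$dir"
}

# Usage (commented out so nothing touches the policy by accident):
#   set -u    # bash then aborts on any unset variable
#   pattern="$(fcontext_pattern "$MOUNTPOINT")" || exit 1
#   semanage fcontext -a -t container_file_t "$pattern"
```

Note that set -u only catches unset variables, not ones that are set but empty, which is why the helper checks the value itself. Writing "${MOUNTPOINT:?}" inline is another option that fails on both.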

Resolution

Here’s the resolution to the RCA above. It’s fairly easy, but it does have some interesting gotchas. If you don’t want to continue reading about my struggles, please skip to the Real Solution section, but know you’re a party pooper.

Fixing fcontext

So the first step is to remove that bad local rule. The code block below should clean this up nicely.

# delete the bad rule
semanage fcontext -d "(/.*)?"

# confirm it's gone from the file
cat /etc/selinux/targeted/contexts/files/file_contexts.local

# rebuild the policies with the new rules
semodule -B

# auto relabel on boot
touch /etc/selinux/.autorelabel

reboot
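Before rebooting, it may be worth double-checking that no other over-broad rule is hiding in the local overrides. The sketch below flags any entry whose path pattern does not start with a literal "/name" prefix, such as a bare "(/.*)?". The helper name lint_fcontext_file and the "starts with a plain directory name" heuristic are my own assumptions, not anything SELinux ships.

```shell
#!/usr/bin/env bash
# Sketch: flag entries in a file_contexts-style file whose path pattern does
# not begin with a literal absolute directory (for example a bare "(/.*)?").
# lint_fcontext_file is a hypothetical helper name.
lint_fcontext_file() {
    local file="$1"
    awk '
        NF == 0 || $1 ~ /^#/ { next }       # skip blank lines and comments
        $1 !~ /^\/[A-Za-z0-9._-]/ {         # pattern must start with "/name"
            printf "%s:%d: suspicious pattern: %s\n", FILENAME, FNR, $1
            bad = 1
        }
        END { exit (bad ? 1 : 0) }          # non-zero exit when anything is flagged
    ' "$file"
}

# Usage (path taken from this post; left commented out):
#   lint_fcontext_file /etc/selinux/targeted/contexts/files/file_contexts.local
```

semanage fcontext -l -C lists the same local customizations if you'd rather read them by eye.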

So, I reboot and… get dropped into an emergency shell. *deep breaths now*. Ok, autorelabel is not working. Let’s do it manually. Reboot, set enforcing=0 in grub to get back in, and open a snapshot for relabeling with transactional-update shell, since root is read-only by design. Ok, I run restorecon -Rv /etc /bin /sbin /usr /lib /lib64 /boot. The logs fly by in front of me and it takes around 3 minutes to complete. I type exit and the snapshot is saved. I reboot and… get dropped into an emergency shell. *loud frustrated exhale*. Ok, WHY?? Reboot, grub enforcing=0, drop to the shell, run ls -alhZ on /etc/passwd, and it still shows the “container file” context. I just relabeled it. matchpathcon -V /etc/passwd shows that it should be relabeled to its proper context: /etc/passwd system_u:object_r:container_file_t:s0 -> system_u:object_r:passwd_file_t:s0. I run restorecon -v /etc/passwd and get the following output: /etc/passwd not reset as customized by admin to system_u:object_r:container_file_t:s0. Uhh, okay? So apparently removing the local override still doesn’t fix the problem, because SELinux thinks I overrode the label on purpose…

So a good old restorecon -Fv /etc/passwd and the context is changed! (Per the restorecon man page, -F forces a reset even for customizable files, which is what the “customized by admin” message is about.) We’ll get back to this in one second. But I see I need the -F to finally fix this issue, so I run transactional-update shell and then restorecon -FRv / and see all the files get updated. I type exit, the snapshot is finalized, and I reboot. The box restarts and loads! I get dropped into a normal shell. I pack up my monitor and my keyboard and go back to my desktop. I confirm my microservices are working properly, and they are! Case closed. I open up the terminal on my desktop to SSH into my box and I get “ssh: connect to host X.X.X.X port 22: Connection refused”. *sigh* Time to bring the “crash” cart back down 2 flights of stairs.

Real Solution

Ok, here’s the full solution to the problem. I get back into the box and run find to see if there are any stragglers that weren’t relabeled earlier. I used find / -context "*container_file_t*" -not -path "/var/lib/containers/*" -not -path "/opt/storage/*" -not -path "/proc/*" -not -path "/sys/*" -not -path "/.snapshots/*" -not -path "/home/<user>/.local/share/containers/*". It turns out some ssh binaries and config files still have the container_file_t context. So, after all of that, it turns out that restorecon doesn’t really behave inside a snapper snapshot shell (which is really a chroot) and can be unpredictable there.
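That find one-liner gets unwieldy fast, so here is a rough sketch of assembling it from a list of trees to skip. The helper build_find_args and the FIND_ARGS array are names I made up; the exclusion list mirrors the paths above, minus the per-user container directory since that path depends on your username.

```shell
#!/usr/bin/env bash
# Sketch: assemble the find(1) arguments for "everything labeled TYPE except
# these trees". build_find_args and FIND_ARGS are hypothetical names.
build_find_args() {
    local type="$1"; shift
    # Start at / and match any context string containing the type
    FIND_ARGS=(/ -context "*${type}*")
    local dir
    for dir in "$@"; do
        # Exclude everything below each listed directory
        FIND_ARGS+=(-not -path "${dir%/}/*")
    done
}

build_find_args container_file_t \
    /var/lib/containers /opt/storage /proc /sys /.snapshots

# Print the command instead of running it; -context only works when find
# was built with SELinux support.
printf 'find'; printf ' %q' "${FIND_ARGS[@]}"; printf '\n'
```

To actually run the search, call find "${FIND_ARGS[@]}" on the affected host.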

But here’s the learning part. MicroOS uses btrfs as its way of remaining atomic in nature. It creates a new btrfs snapshot, makes any actual root filesystem changes in that snapshot, commits it, and when you reboot it boots into that subvolume. It’s marked read-only at the subvolume level, not at the Virtual Filesystem Switch layer. What does this mean? Well, when you mount a btrfs subvolume as ro, structural changes are blocked. This includes creating, deleting, or updating files. However, unlike on XFS or EXT4, xattr and metadata writes are permitted. Since SELinux contexts are stored as xattrs on the filesystem, this means that, only in btrfs, you can edit this metadata even on a read-only root.

So, like before, running restorecon -FRv / on the live root (not through a transactional-update shell) will fix these contexts, because btrfs permits this metadata write. I don’t really know how to feel about this feature. If you attempted this on XFS or EXT4 you’d get an EROFS error, because read-only is truly read-only. You’d have to remount the root as rw in order to make any SELinux context changes on the root filesystem.

Final words

So this is a great example of laying a landmine for yourself that goes off almost a month later with little to no explanation as to why. The lesson being: when you run code, always ensure your variables are populated with values. There’s a great YouTube video called “How A Steam Bug Deleted Someone’s Entire PC” by Kevin Fang that shows a situation very similar to this one. This whole experience was an interesting learning opportunity for me, as I really haven’t dealt with SELinux that much, especially file contexts and relabeling issues like this. I also got to learn some interesting quirks about how different filesystems handle extended attributes and how, at least in btrfs, read-only isn’t truly “read-only”.