TL;DR
Images built with the devicemapper storage backend have some strange permissions issues when they run on Google Container Engine’s AUFS backend.
I’m currently working on a minimal reproducible test case, but it’ll take a while, and I’m going for lunch now.
Update
I had further permissions issues with folders that were 770 when they were added to the container, despite chowning + chmodding inside the container and other voodoo hackery.
I solved that by setting the uid + gid of the user running the process inside the container on the files outside the container, before adding them.
Might have also solved this other issue, but I guess we’ll never know!
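For what it’s worth, the ownership fix from this update can be checked before building. This is only a minimal sketch: the function name `list_wrong_owner` and the `./src` path are made up for illustration, and 33:33 is assumed to be the uid and gid of www-data (the usual value on Debian-based images, but check with `id www-data` inside your own container).

```shell
#!/bin/sh
# Hypothetical helper: list everything under directory $1 that is NOT owned
# by numeric uid $2 and gid $3. Empty output means the tree is ready to add.
list_wrong_owner() {
    find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}

# Assumed values: 33:33 for www-data, ./src as the directory being added.
# list_wrong_owner ./src 33 33
# sudo chown -R 33:33 ./src    # fix ownership on the host before `docker build`
```

If the function prints nothing, every file already carries the ids the process inside the container will run with.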
The Story
Background
I really like the idea of deploying immutable containers into production, and am currently trying to sell my workplace on the idea (they’re somewhat already sold, but not all the way just yet). We specialize in an e-commerce system called “Magento”, which runs on PHP, which we serve through NGINX.
I’m using Kubernetes via Google’s excellent Container Engine as a playground to test things, and I’m not running Kubernetes locally, just the containers.
Also, I’m using Arch Linux as a dev environment. ❤ Arch.
The weird bit
I initially ran into issues with PHP not reading files that were clearly in the container. Given this is an evening project that I didn’t want to spend a heinous amount of time on, I ran PHP as root and left it at that. (I know, it’s a bad idea, but I just wanted a proof of concept, which I got.)
The pod went together and everything got up and running, but there was no CSS (or indeed, any static resources at all). Being foolish, I presumed I’d just forgotten to add it to the NGINX container (initially I had, but that’s not the fun problem), but the problem was a little more interesting:
$ # Local
$ kubectl logs magento-[HASH] -c nginx
2016/01/30 02:26:20 [crit] 7#7: *2 stat() "[REDACTED]/pub/static/frontend/Magento/luma/en_US/mage/calendar.css" failed (13: Permission denied), client: [REDACTED], server: _, request: "GET /static/frontend/Magento/luma/en_US/mage/calendar.css HTTP/1.1", host: "[REDACTED]
Super weird. Maybe the permissions are set up wrong? Let’s go check:
$ # Local
$ kubectl exec -it magento-[HASH] -c nginx /bin/bash
$ # Within a container
$ apt-get update && apt-get install -y sudo
$ sudo -u www-data /bin/bash
$ # The truly strange part
$ cd [REDACTED]/pub/static/frontend/Magento/
$ ls -l
ls: cannot access luma: Permission denied
ls: cannot access blank: Permission denied
total 0
d????????? ? ? ? ? ? blank
d????????? ? ? ? ? ? luma
What the hell? I have never seen that. Still, maybe www-data doesn't own the path, or doesn't have sufficient permissions. So,
$ # Within a container
$ ls -l ../
total 4
drwxrwx--- 6 www-data www-data 4096 Jan 22 12:00 Magento
Nup. Looks fine. Okay, so, maybe it’s broken? Let’s go take a look as root.
$ # Within a container
$ exit
$ cd [REDACTED]/pub/static/frontend/Magento
$ ls -l
total 8
drwxrwx--- 4 www-data www-data 4096 Jan 22 12:00 blank
drwxrwx--- 4 www-data www-data 4096 Jan 22 11:58 luma
No, it’s fine. Getting super weird, but I could have just derped something.
$ # Within a container
$ sudo -u www-data /bin/bash
$ ls -l
total 8
drwxrwx--- 4 www-data www-data 4096 Jan 22 12:00 blank
drwxrwx--- 4 www-data www-data 4096 Jan 22 11:58 luma
What the hell?!
Enter the directory as www-data. Can’t access the files, everything is broken.
Enter the directory as root. Files are fine, freely accessible.
Enter the directory as www-data again. Now I can access the files?!
At this point, I concluded I was nuts, and repeated that process several more times, always with the same result. Once accessed by root, the files are accessible by www-data, and NGINX doesn’t have any more problems serving the files. Time to check my work: maybe there’s something wrong with that build of the container?
$ # Local
$ docker run gcr.io/[NAMESPACE]/nginx:[TAG]
$ # Another window, Local
$ docker exec -it [HASH] /bin/bash
$ apt-get update && apt-get install -y sudo
$ sudo -u www-data /bin/bash
$ ls -l [REDACTED]/pub/static/frontend/Magento/
total 8
drwxrwx--- 4 www-data www-data 4096 Jan 22 12:00 blank
drwxrwx--- 4 www-data www-data 4096 Jan 22 11:58 luma
After some ls’ing and cat’ing around for a while, I concluded local is fine, and the problem is probably environmental.
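Most of the poking around above boils down to looking at the permissions of every directory on the way to the file, once as www-data and once as root. Here is a rough sketch of a helper for that. The name `walk_path` is made up, it only handles absolute paths, and on systems with util-linux installed `namei -l` does a similar job.

```shell
#!/bin/sh
# Hypothetical helper: run `ls -ld` on every component of an absolute path,
# starting from the top. Stops with a non-zero status at the first component
# that cannot be stat'ed, which is where a "Permission denied" would show up.
walk_path() {
    rest=${1#/}      # path with the leading slash removed
    current=""       # the prefix inspected so far
    while [ -n "$rest" ]; do
        part=${rest%%/*}              # next component
        current="$current/$part"
        ls -ld "$current" || return 1
        case $rest in
            */*) rest=${rest#*/} ;;   # drop the component just handled
            *)   rest="" ;;           # that was the last one
        esac
    done
}

# Example (the path is a placeholder):
# walk_path /var/www/pub/static/frontend/Magento/luma
```

Running it under `sudo -u www-data` and again as root gives two listings to compare side by side.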
The annoyingly simple solution
So, some Googling later, I found that Docker has a pluggable storage backend. Maybe something weird is going on there? So, let’s make sure they’re the same:
$ # Local
$ docker info
... # Boring stuff
Storage Driver: devicemapper # <-- Not what I expected, but who cares.
$ ssh [PROD]
$ sudo docker info
... # Boring stuff
Storage Driver: aufs # <-- Uh Oh
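The comparison above is easy to script so it doesn’t have to be eyeballed. This is a small sketch, not a polished tool: `storage_driver` is a made-up helper name that just pulls the “Storage Driver” line out of `docker info` output, and `[PROD]` is the same placeholder host as in the transcript.

```shell
#!/bin/sh
# Hypothetical helper: read `docker info` text on stdin and print only the
# storage driver name (leading whitespace on the line is tolerated).
storage_driver() {
    sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p'
}

# Usage sketch (needs Docker locally and ssh access to the placeholder host):
# local_driver=$(docker info 2>/dev/null | storage_driver)
# prod_driver=$(ssh [PROD] sudo docker info 2>/dev/null | storage_driver)
# [ "$local_driver" = "$prod_driver" ] || echo "drivers differ: $local_driver vs $prod_driver"
#
# The Docker CLI can also print the value directly:
# docker info --format '{{.Driver}}'
```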
My initial plan was to install AUFS on my local copy of Arch, but it’s not available precompiled and compiling it was taking too long. So instead, I changed prod Docker to use devicemapper.
Problem solved. So anticlimactic! Still, it’s nice to know why it wasn’t working.
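This story doesn’t record exactly how the driver was switched, so the following is only one way it could be done, assuming a Docker daemon that reads `/etc/docker/daemon.json` and is managed by systemd. The `daemon_json` helper is made up for illustration. Two caveats: images and containers stored under the old driver are not visible after switching, and newer Docker releases have deprecated and removed both aufs and devicemapper, so treat this as a historical sketch.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal daemon.json that pins the storage
# driver to the name given as $1. It only writes to stdout.
daemon_json() {
    printf '{\n  "storage-driver": "%s"\n}\n' "$1"
}

# Assumed usage on the host whose driver should change:
# daemon_json devicemapper | sudo tee /etc/docker/daemon.json
# sudo systemctl restart docker    # assumes a systemd-managed daemon
# sudo docker info | grep 'Storage Driver'
```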
Lessons Learned
Docker provides heaps of guarantees that your environments will be the same, but they’re not going to be entirely the same. This was a super fun bug that I hadn’t expected at all.
Notes
I skipped some boring parts which involved learning more about docker.
I rewrote the commands (and some of the output), rather than copying them out of the terminal. I figure there’s no reason you need to know that much about my environment.
There was more than a little googling of the exact permissions errors. Wherever I jump from “not knowing something” to “knowing something”, there was a search that gave me a hint. If I’ve failed to credit something, I apologise; please point it out and I’ll amend my story.