In general, the easiest safe approach is to do everything in your Dockerfile as the root user until the very end, at which point you can declare an alternate USER that gets used when you run the container.
FROM ???
# Debian adduser(8); this does not have a specific known uid
RUN adduser --system --no-create-home nonroot
# ... do the various install and setup steps as root ...
# Specify metadata for when you run the container
USER nonroot
EXPOSE 12345
CMD ["my_application"]
For your more specific questions:
Is installing packages with apt-get as root ok?
It's required; apt-get won't run as non-root. If you have a base image that switches to a non-root user you need to switch back with USER root before you can run apt-get commands.
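A minimal sketch of that switch, where the base image name, package name, and `nonroot` user are placeholders rather than anything from your setup:

```dockerfile
# Hypothetical base image that has already switched to a non-root user
FROM some-base-image

# Switch back to root so apt-get can run
USER root
RUN apt-get update \
 && apt-get install -y --no-install-recommends some-package \
 && rm -rf /var/lib/apt/lists/*

# Drop privileges again for runtime
USER nonroot
```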
Best location to install these packages?
The normal system location. If you're using apt-get to install things, it will put them in /usr and that's fine; pip install will want to install things into the system Python site-packages directory; and so on. If you're installing things by hand, /usr/local is a good place for them, particularly since /usr/local/bin is usually in $PATH. The "user home directory" isn't a well-defined concept in Docker and I wouldn't try to use it.
When installing python packages with pip as root, I get the following warning...
You can in fact ignore it, with the justification you state. There are two common paths to using pip in Docker: the one you show where you pip install things directly into the "normal" Python, and a second path using a multi-stage build to create a fully-populated virtual environment that can then be COPYed into a runtime image without build tools. In both cases you'll still probably want to be root.
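A sketch of that second path, with the image tags and the application entry point as assumptions:

```dockerfile
# Build stage: install dependencies into a virtual environment
FROM python:3.11-slim AS builder
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Runtime stage: copy only the populated venv, leaving gcc etc. behind
FROM python:3.11-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
USER nobody
CMD ["python", "main.py"]
```

Note that the venv must sit at the same absolute path in both stages, since a venv hard-codes its own location in its scripts.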
Anything else I am missing or should be aware of?
In your Dockerfile:
## get UID/GID of host user for remapping to access bindmounts on host
ARG UID
ARG GID
This is not a best practice, since it means you'll have to rebuild the image whenever someone with a different host uid wants to use it. Create the non-root user with an arbitrary uid, independent from any specific host user.
RUN usermod -aG sudo flaskuser
If your "non-root" user has unrestricted sudo access, they are effectively root. sudo has some significant issues in Docker and is never necessary, since every path to run a command also has a way to specify the user to run it as.
RUN chown flaskuser:users /tmp/requirements.txt
Your code and other source files should have the default root:root ownership. By default they will be world-readable but not writeable, and that's fine. You want to prevent your application from overwriting its own source code, intentionally or otherwise.
RUN chmod -R 777 /usr/local/lib/python3.11/site-packages/*
chmod 0777 is never a best practice. It gives a place for unprivileged code to write out their malware payloads and execute them. For a typical Docker setup you don't need chmod at all.
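If the application genuinely needs somewhere to write, a single narrowly scoped directory owned by the runtime user is enough; a sketch, assuming a `nonroot` user like the one created above:

```dockerfile
# One writable scratch directory instead of chmod 0777;
# everything else stays root-owned and read-only to the application
RUN mkdir /data && chown nonroot /data
```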
The bind-mounted workspace is only for development; for a production image I would copy the necessary files/artifacts into the image/container.
If you use a bind mount to overwrite all of the application code with content from the host, then you're not actually running the code from the image, and some or all of the Dockerfile's work will just be lost. This means that, when you go to production without the bind mount, you're running an untested setup.
Since your development environment will almost always be different from your production environment in some way, I'd recommend using a non-Docker Python virtual environment for day-to-day development, have good (pytest) unit tests that can run outside the container, and do integration testing on the built container before deploying.
Permission issues can also come up if your application is trying to write out files to a host directory. The best approach here is to restructure your application to avoid it, storing the data somewhere else, like a relational database. In this answer I discuss permission setup for a bind-mounted data directory, though that sounds a little different from what you're asking about here.
Answer from David Maze on Stack Overflow: "Docker non-root User Best Practices for Python Images?"
Thanks again for your extensive explanations David.
I had to digest all of that and after some more reading on the topic I finally grasped everything you said (so I hope).
The reason I first added a user with a UID/GID matching the host user was that, when I started, I ran my containers on my NAS, which only allows SSH access as root. Running the container as root while the project folder was owned by another user led to permission issues when the container user tried to access the bind-mounted files. Back then I did not quite understand all of that, so I carried along the false assumption that the container user must always match the host user's ID.
So I have changed my Dockerfile to use an arbitrary user like you suggested, removed all the unnecessary chown/chmod and I can run this successfully on my local macbook and on a VPS I am currently testing out.
## ################################################################
## WEB Builder Stage
## ################################################################
FROM python:3.10-slim-buster AS builder
## ----------------------------------------------------------------
## Install Packages
## ----------------------------------------------------------------
RUN apt-get update \
&& apt-get install -y libmariadb3 libmariadb-dev \
&& apt-get install -y gcc \
## cleanup
&& apt-get clean \
&& apt-get autoclean \
&& apt-get autoremove --purge -y \
&& rm -rf /var/lib/apt/lists/*
## ----------------------------------------------------------------
## Add venv
## ----------------------------------------------------------------
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
## ----------------------------------------------------------------
## Install python packages
## ----------------------------------------------------------------
COPY ./requirements.txt /tmp/requirements.txt
RUN python3 -m pip install --upgrade pip \
&& python3 -m pip install wheel \
&& python3 -m pip install --disable-pip-version-check --no-cache-dir -r /tmp/requirements.txt
## ################################################################
## Final Stage
## ################################################################
FROM python:3.10-slim-buster
## ----------------------------------------------------------------
## add user so we can run things as non-root
## ----------------------------------------------------------------
RUN adduser --disabled-password --gecos "" flaskuser
## ----------------------------------------------------------------
## Copy from builder and set ENV for venv
## ----------------------------------------------------------------
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
## ----------------------------------------------------------------
## Set Python ENV
## ----------------------------------------------------------------
ENV PYTHONUNBUFFERED=1 \
    PYTHONPATH="${PYTHONPATH}:/workspace/web/app:/opt/venv/bin:/opt/venv/lib/python3.10/site-packages"
## ----------------------------------------------------------------
## Copy app files into container
## ----------------------------------------------------------------
WORKDIR /workspace/web
COPY . .
## ----------------------------------------------------------------
## Switch to non-privileged user and run app
## the entrypoint script runs either uwsgi or the flask dev server
## depending on FLASK_ENV
## ----------------------------------------------------------------
USER flaskuser
CMD ["/workspace/web/docker-entrypoint.sh"]
If I want to run the container on my NAS (from the NAS host CLI with root) using bind mounts, I can still do so by using a docker-compose.override.yml that will contain
myservice:
  user: "${UID}:${GID}"
where ${UID} and ${GID} match the host user who owns the bind-mounted folder.
But I am also going to change this. I am developing and testing only locally now, and might use my NAS as a sort of first integration environment where I will just test the fully built containers/images pulled from a registry (so no need for bind mounts anymore).
I also started to use multi-stage builds, which, besides making the final images way smaller, should hopefully decrease the attack surface by not including unnecessary build dependencies.
Having a reliable Dockerfile as your base can save you hours of headaches and bigger problems down the road.
This article shares a Dockerfile base that has been battle-tested through many different projects.
https://luis-sena.medium.com/creating-the-perfect-python-dockerfile-51bdec41f1c8
This can also serve as a succinct tutorial of the different features/commands used to improve the final image.
Nothing is perfect I know! Please feel free to provide any feedback and we can iterate on the shared Dockerfile if needed.
Those are nice and surprising insights, especially on the performance side.
I wonder, however, what the point of using virtual envs inside a Docker container is?
Virtual envs are necessary on your local machine to separate different projects with different requirements, but in the Docker context your code should be fairly isolated already.
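One practical reason a venv still helps in Docker is the multi-stage pattern: a venv is a self-contained directory you can COPY from a build stage into a slim runtime stage. As a small side illustration, a process can check at runtime whether it is running inside a venv:

```python
import sys

# Inside a virtual environment, sys.prefix points at the venv,
# while sys.base_prefix points at the underlying interpreter;
# outside a venv the two are equal.
in_venv = sys.prefix != sys.base_prefix
print("running in venv:", in_venv)
```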
This:
WORKDIR /home/myuser
USER myuser
RUN mkdir code
ADD . code
WORKDIR code
should probably better be:
USER myuser
RUN mkdir /home/myuser/code
WORKDIR /home/myuser/code
COPY . .
Cleaner in my opinion. Most importantly, COPY is preferred over ADD.
I’m looking for the best approach to manage a Python program running inside a container. Specifically, I’m interested in:
• Properly starting the program.
• Handling logging (ideally using syslog or similar).
• Enabling automatic restarts (similar to how I previously used systemctl).
What are the recommended tools and practices for achieving this in a containerized environment?