You can try the python:{version}-alpine version. It's much smaller:
>> docker image ls |grep python
python 3.6-alpine 89.4 MB
python 3.6 689 MB
python 3.5 689 MB
python 3.5.2 687 MB
python 3.4 833 MB
python 2.7 676 MB
At time of writing it looks like the official image supports -alpine on all python versions.
https://hub.docker.com/_/python/
Answer from toast38coza on Stack Overflow: Why is the python docker image so big (~750 MB)?
Alpine Linux is a very lean distro available for Docker. Without Python, it's around 5 MB. With Python I'm getting images between 60 and 120 MB. The following Dockerfile yields a 110 MB image.
FROM alpine:3.4
RUN apk --update add \
        build-base python-dev \
        ca-certificates python \
        ttf-droid \
        py-pip \
        py-jinja2 \
        py-twisted \
        py-dateutil \
        py-tz \
        py-requests \
        py-pillow \
        py-rrd && \
    pip install --upgrade arrow \
        pymongo \
        websocket-client \
        XlsxWriter && \
    apk del build-base python-dev && \
    rm -rf /var/cache/apk/* && \
    adduser -D -u 1001 noroot
USER noroot
CMD ["/bin/sh"]
Also, it's very well maintained.
A word of warning, though. Alpine uses musl libc instead of glibc, and some Python modules rely on glibc, but this usually isn't a problem.
A bigger issue is that, because of this, manylinux wheels are not available for Alpine, so the modules need to be compiled during installation (pip install). In some cases this can mean the difference between a 20-second build on Debian and 9 minutes or more on Alpine. The grpcio module is notorious for this; it takes forever to compile.
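To make the wheel point concrete: a wheel's filename ends in a platform tag, and pip only picks a wheel whose tag matches the running system. Below is a minimal sketch, assuming the standard wheel filename layout (name-version-python-abi-platform.whl). The helper names and the example_pkg filenames are hypothetical, made up for illustration.

```python
def platform_tags(wheel_filename: str) -> list:
    """Return the platform tags encoded at the end of a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + wheel_filename)
    stem = wheel_filename[: -len(".whl")]
    # The last dash-separated field is the platform tag; several tags
    # may be joined with dots.
    return stem.split("-")[-1].split(".")


def is_manylinux_only(wheel_filename: str) -> bool:
    """True if every platform tag is a manylinux tag (built against glibc)."""
    return all(tag.startswith("manylinux") for tag in platform_tags(wheel_filename))


# Hypothetical filenames, for illustration only.
print(is_manylinux_only("example_pkg-1.0-cp37-cp37m-manylinux1_x86_64.whl"))  # True
print(is_manylinux_only("example_pkg-1.0-py3-none-any.whl"))  # False
```

When a package ships only manylinux wheels, pip on Alpine falls back to the source distribution and compiles it, which is where the extra build time comes from.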
There is a (somewhat unreliable) workaround where you tell Python that it is manylinux compatible.
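The workaround relies on a hook from the manylinux specification (PEP 513): if a module named _manylinux is importable and defines manylinux1_compatible, installers that implement the hook use that value instead of probing for glibc. Below is a minimal sketch of creating that module. The function name write_manylinux_override is hypothetical, and the demo writes to a temporary directory rather than a real site-packages directory. It is unreliable because a wheel built against glibc can still fail to load on musl.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_manylinux_override(site_dir: str) -> Path:
    """Create _manylinux.py in site_dir, declaring manylinux1 compatibility."""
    target = Path(site_dir) / "_manylinux.py"
    target.write_text("manylinux1_compatible = True\n")
    return target


# Demo in a throwaway directory; in an image, site_dir would be the
# interpreter's site-packages directory (an assumption about the setup).
demo_dir = tempfile.mkdtemp()
override_path = write_manylinux_override(demo_dir)

# Make the new module importable and read the flag back.
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()
override = importlib.import_module("_manylinux")
print(override.manylinux1_compatible)  # True
```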
The problem
I work on a small data science team in healthcare. We've been using Docker to deploy our data science models and ETL pipelines for some time now. Our team's senior engineer (dev1) has pushed us to use alpine due to its perceived size and security benefits.
We've recently received feedback from an experienced python/docker developer (dev2) that we should use python-slim instead of alpine. Dev2 says that alpine python images tend to be bigger and slower. Dev2 points out that wheels aren't built for alpine, which means everything has to be built from source. Dev2 suggests that if you are concerned about security, python-slim should be just as good as alpine. Dev2 points to the infamous "Using Alpine can make Python Docker builds 50× slower" as definitive proof that alpine should not be used for python docker images.
Dev1 insists that this blog post was contrived. Dev1 says that the additional slowness comes from the benchmarks not using optimized musl wheels, which causes the performance loss. Dev1 has spent a lot of time trying to build custom alpine images with all of the libraries we need, but has not produced tangible results after many months of working through the technical challenges.
The details
- Because we're deploying mostly sklearn models, and sklearn models are typically distributed as pickle files, we need to pin the versions of our Python libraries (NumPy, scipy, sklearn, pandas).
- Data is typically ingested from an MSSQL server, so we also need to package closed source client libraries in our containers. We'd like to publish our containers publicly so that we can open source our work, but we haven't yet figured out a way to do this while including client libraries for the various closed source database drivers.
- Docker images need to come in matching pairs: a development image that has jupyter in it so that data scientists can create models, and a matching production image that has only the minimum set of libraries for deploying these models into production.
- We do have limited computing resources, so the size of the image does matter, but probably not as much as security.
The question
Who's right, dev1 or dev2? What's the best option for us given the requirements outlined? Thanks for your help!
The problem comes when you need things like ciso8601, or other libraries that require a build step. Build tools are not included in either the slim or the alpine variants, to keep their footprint small.
So to install deps, you'll have to:
- Install build tools
- Deploy dependencies from Pipfile.lock system-wide
- Uninstall build tools and clean caches
And do all three actions inside a single RUN layer, like the following:
FROM python:3.7-slim
WORKDIR /app
# both files are explicitly required!
COPY Pipfile Pipfile.lock ./
RUN pip install pipenv && \
    apt-get update && \
    apt-get install -y --no-install-recommends gcc python3-dev libssl-dev && \
    pipenv install --deploy --system && \
    apt-get remove -y gcc python3-dev libssl-dev && \
    apt-get autoremove -y && \
    pip uninstall pipenv -y
COPY app ./
CMD ["python", "app.py"]
- The build tools would cost you around 300MiB and some extra build time
- Uninstalling pipenv would save you another 20MiB (which is 10% of the resulting size).
- Separating the RUN commands would not delete data from earlier layers, and would result in a ~500MiB image. That's how Docker layers work.
So that would result in a perfectly working ~200MiB image, which:
- Is 5 times smaller than the original python:3.7 (which is >1.0GiB)
- Has no alpine incompatibilities (these are typically tied to the glibc replacement)
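The layer arithmetic behind these numbers can be sketched with a toy model. It assumes an image's size is the sum of its layers, and that deleting a file in a later layer does not shrink the earlier layer that holds it. The Layer class and the split into 200MiB (base image plus dependencies) and 300MiB (build tools) are illustrative assumptions based on the rough figures above, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    added_mib: int  # data written by this instruction
    removed_in_same_layer_mib: int = 0  # data it also deleted before committing

    @property
    def size_mib(self) -> int:
        # Only deletions made inside the same layer reduce what gets stored.
        return self.added_mib - self.removed_in_same_layer_mib


def image_size_mib(base_mib: int, layers: list) -> int:
    """Total image size: the base plus the stored size of every layer."""
    return base_mib + sum(layer.size_mib for layer in layers)


BASE_AND_DEPS_MIB = 200  # assumed: slim base image plus installed dependencies
BUILD_TOOLS_MIB = 300    # assumed: gcc, headers and friends

# One RUN: build tools are installed and removed within the same layer.
single_run = [Layer(BUILD_TOOLS_MIB, BUILD_TOOLS_MIB)]
# Two RUNs: the removal happens in a later layer and reclaims nothing.
separate_runs = [Layer(BUILD_TOOLS_MIB), Layer(0)]

print(image_size_mib(BASE_AND_DEPS_MIB, single_run))     # 200
print(image_size_mib(BASE_AND_DEPS_MIB, separate_runs))  # 500
```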
At the moment, we're fine with the slim (debian buster) build variants, preferring slim over alpine for better compatibility. If you really want further size optimization, I'd recommend taking a look at some excellent builds by these guys:
- Alpine Python
- 12.7MiB MariaDB
How about,
FROM python:3.7-alpine
WORKDIR /myapp
COPY Pipfile* ./
RUN pip install --no-cache-dir pipenv && \
    pipenv install --system --deploy --clear
COPY src .
CMD ["python3", "app.py"]
- It utilises the smaller Alpine version.
- You won't have any unnecessary cache files left over, thanks to the --no-cache-dir option for pip and the --clear option for pipenv.
- You also deploy outside of a venv.
You can also add && pip uninstall pipenv -y after pipenv install --system --deploy --clear in the same RUN command to eliminate space taken by pipenv if that extra image size bothers you.