Docker is an open source platform that enables you to build, deploy, run, update, and manage containerized applications easily. It lets you create and manage containers, each of which contains everything the application needs to run, such as code, libraries, a runtime, and a file system.
Containers vs. Virtual Machines
Virtualization and Containerization are two popular ways to host an application in a machine or computer.
Virtualization:
Virtualization allows us to partition a single physical computer into several VMs. Each VM can then work independently, running different operating systems and applications while sharing the resources of the single underlying computer. This is made possible by an intermediary software layer known as a hypervisor, which divides the physical machine into multiple VMs and allocates and manages the resources of each virtual environment.
Containerization:
Containerization is another form of virtualization. It lets you run applications and their dependencies in isolated containers, but unlike VMs, containers share the operating system kernel of the host machine. This provides a portable and consistent runtime for applications, and it is more lightweight than a VM since only one operating system needs to be managed, which lets you quickly scale up or down based on demand. Containers can run on any machine with a container runtime, such as Docker, which provides an isolated environment for running applications consistently across different environments.
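One quick way to see the shared kernel in practice: the kernel version reported inside a container matches the host's. A minimal sketch, assuming Docker is installed and using the public `alpine` image:

```bash
# Kernel version on the host
uname -r

# Kernel version inside a container: on Linux this prints the same
# version, because the container shares the host's kernel
docker run --rm alpine uname -r
```

(On macOS or Windows, Docker Desktop runs containers inside a lightweight Linux VM, so the container reports that VM's kernel instead.)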
| Virtualization | Containerization |
|---|---|
| Each VM runs its own operating system | Containers share the host’s operating system kernel |
| Each VM has its own set of resources | Containers are lightweight and share the host machine’s resources |
| Performance overhead, since multiple OSs need to be managed | Lower performance overhead since containers share the host OS |
| VMs are less portable due to varying guest OSs | Containers are highly portable across different systems |
| Slower deployment times due to OS boot process | Faster deployment times since containers start quickly |
| Requires more resources as each VM has its own OS | More efficient resource utilization with containerization |
Why containerize an application?
- Different operating systems: Developers and users often run different operating systems, which can lead to compatibility issues when running applications.
- Varying project setups: The steps required to run the same application can differ depending on the user’s operating system, leading to a lot of friction when setting up the application on their machines.
- Dependency management: As applications grow in complexity, keeping track of all the dependencies and ensuring everything is installed correctly across different environments becomes a difficult task.
Benefits of containers:
- Single configuration file: Containers allow you to describe your application’s configuration, dependencies, and runtime environment in a single file (e.g., a Dockerfile), making it easier to manage and reproduce environments.
- Isolated environments: Each container runs in a separate, isolated environment, ensuring its dependencies and configuration do not conflict with other applications or the host system.
- Portability and local setup simplification: Containers make it easy for us to set up and run projects locally regardless of what operating system or environment we are currently using. This ensures a consistent development experience.
- Auxiliary services and databases: Containers simplify the installation and management of auxiliary services and databases required for your projects, like MongoDB, PostgreSQL, etc.
- Orchestration and scaling: Containers are lightweight, so we can launch a lot of them to scale our services (see the sketch below). This is where container orchestration tools like Kubernetes come into the picture.
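As a small illustration of how cheap horizontal scaling is with containers, here is a sketch that starts three instances of the same image, each mapped to a different host port. The image name `my_app` and the ports are placeholders:

```bash
# Start three containers from the same image, each publishing
# container port 3000 on a different host port
for port in 3001 3002 3003; do
  docker run -d --name "my_app_$port" -p "$port:3000" my_app
done

# A load balancer (or an orchestrator like Kubernetes)
# would then distribute traffic across these instances
```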
There are container management tools other than Docker, like Podman, Buildah, BuildKit, etc.
Docker Images vs. Docker Containers
Docker Image
A Docker image is a lightweight, standalone, executable package that contains everything you need to run an application. This includes code, runtime, libraries, environment variables, and configuration files.
For easier understanding, think of a Docker image as similar to a GitHub repository. A GitHub repository contains all the files and dependency definitions your application needs to run; similarly, a Docker image contains everything required to run a specific piece of software or application.
Docker images are built from a set of instructions called a Dockerfile. The Dockerfile specifies the steps to create the image, such as installing dependencies, copying files, and setting environment variables.
Docker Container
A Docker container is a running instance of a Docker image. It encapsulates the application or service and its dependencies, running in an isolated environment.
A good mental model for a Docker container is running `node index.js` on your machine from some source code you got from GitHub. Just as running `node index.js` creates a live instance of your application, a Docker container is a live instance of a Docker image, running the application or service within an isolated environment.
Docker containers are created from Docker images and can be started, stopped, and restarted as needed. Multiple containers can be created from the same image, each running as an isolated instance of the application or service.
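For example, two independent containers can be started from the same image, each with its own filesystem and state. A sketch, where `image_name` is a placeholder for an image that runs a long-lived process (such as a server):

```bash
# Two isolated instances of the same image
docker run -d --name instance_one image_name
docker run -d --name instance_two image_name

# Both show up as separate running containers
docker ps
```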
TL;DR
- Docker Image: A lightweight, standalone package that contains everything needed to run a piece of software, similar to a codebase on GitHub.
- Docker Container: A running instance of a Docker image, encapsulating the application or service and its dependencies in an isolated environment, similar to running `node index.js` from a codebase.
Docker Architecture
Docker uses a client-server architecture, which consists of:
- Docker daemon: The Docker daemon (`dockerd`) is a long-running server that listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes. It is the background process that manages containers on the host machine. A daemon can also communicate with other daemons to manage Docker services.
- Docker client: The Docker client (`docker`) is the primary way that many Docker users interact with Docker. When you use commands such as `docker run`, the client sends them to `dockerd`, which carries them out (see the example after this list). The `docker` command uses the Docker API, and the client can communicate with more than one daemon.
- Docker registry: A Docker registry stores Docker images. Docker Hub is the largest public registry, and Docker looks for images on Docker Hub by default. You can also run your own private registry.
- Docker objects: When you use Docker, you are creating and using images, containers, networks, volumes, plugins, and other objects.
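You can see both halves of this architecture for yourself with `docker version`, which prints separate Client and Server sections:

```bash
# Prints a "Client" section (the docker CLI) and a "Server" section
# (dockerd), the two sides that talk over the Docker API
docker version

# docker info reports daemon-side details, such as how many
# images and containers the daemon is currently managing
docker info
```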
Links:
- https://docs.docker.com/guides/docker-overview/
- https://www.geeksforgeeks.org/virtualization-vs-containerization/
Dockerfile and Cheatsheet
To create an image, we write a Dockerfile. A Dockerfile is a text file that contains the instructions for building a Docker image: it defines the base image, sets up the environment, copies files into the image, and installs any necessary dependencies.
It has two main parts:
- The `FROM` instruction, which specifies the base image to use for the new image.
- The `RUN` instructions, which specify the commands to run when building the image.
Here is an example of a Dockerfile:
```dockerfile
# base image
FROM node:16-alpine

# working directory
WORKDIR /app

# copy files to working dir
COPY . .

# install dependencies
RUN npm install

# build the app
RUN npm run build

# expose port 3000
EXPOSE 3000

# command to run the app
CMD ["node", "dist/index.js"]
```
- `WORKDIR` - Sets the working directory for any `RUN`, `CMD`, `ENTRYPOINT`, and `COPY` instructions that follow it.
- `RUN` - Executes any commands in a new layer on top of the current image and commits the result.
- `CMD` - Provides the default command for executing a container. There can only be one `CMD` instruction in a Dockerfile.
- `EXPOSE` - Informs Docker that the container listens on the specified network ports at runtime.
- `ENV` - Sets an environment variable.
- `COPY` - Copies files from the Docker host into the Docker image.
To build a docker image from a Dockerfile, use the docker build command.
```bash
docker build -t image_name .
```
To run the container, use the docker run command.
```bash
docker run -p 3000:3000 image_name
```
We can also pass environment variables to the container using the -e flag.
```bash
docker run -p 3000:3000 -e DATABASE_URL="postgre_url_here" image_name
```
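To confirm a variable was actually passed through, you can print the container's environment. A quick sketch using the public `alpine` image (the variable value is a placeholder):

```bash
# Run a throwaway container and list its environment;
# DATABASE_URL should appear in the output
docker run --rm -e DATABASE_URL="postgre_url_here" alpine env
```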
Docker Cheatsheet
1. Start a Docker container.
- `-d`: Run the container in the background (detached mode).
- `-p`: Publish a container’s port to the host. Here it maps port 27020 of the host to port 27017 of the container.

We are using the latest `mongo` image; there are many other images available on Docker Hub.

```bash
docker run -d -p 27020:27017 mongo
```
2. List all Docker images on your machine.

```bash
docker images
```

3. List all running containers on your machine (add `-a` to also show stopped containers).

```bash
docker ps
```

4. Build a Docker image from a Dockerfile in the current directory.

```bash
docker build -t image_name .
```

5. Push your Docker image to a Docker registry.

```bash
docker push image_name
```

6. Kill a running container.

```bash
docker kill container_id
```

7. Remove a Docker image.

```bash
docker rmi image_id
```

8. Remove a Docker container.

```bash
docker rm container_id
```

9. Execute a command in a running container (use `/bin/sh` if the image, like Alpine, doesn’t include Bash).

```bash
docker exec -it container_name_or_id /bin/bash
```
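Putting the cheatsheet together, a typical local workflow might look like the sketch below; the image name `my_app` and the container name are placeholders:

```bash
# Build an image from the Dockerfile in the current directory
docker build -t my_app .

# Run it in the background, publishing port 3000
docker run -d --name my_app_container -p 3000:3000 my_app

# Poke around inside the running container
docker exec -it my_app_container /bin/sh

# Tear everything down
docker kill my_app_container
docker rm my_app_container
docker rmi my_app
```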
Understanding Layers in Docker
In Docker, layers are a fundamental part of the image architecture that allows Docker to be efficient, fast, and portable. A Docker image is essentially built up from a series of layers, each representing a set of differences from the previous layer.
How layers are made:
- Base Layer: The starting point of an image, typically an operating system (OS) like Ubuntu, Alpine, or any other base image specified in a Dockerfile.
- Instruction Layers: Each command in a Dockerfile creates a new layer in the image. These include instructions like `RUN` and `COPY`, which modify the filesystem by installing packages, copying files from the host into the image, or making other changes. Each of these modifications creates a new layer on top of the base layer.
- Reusable & Shareable: Layers are cached and reusable across different images, which makes building and sharing images more efficient. If multiple images are built from the same base image or share common instructions, they can reuse the same layers, reducing storage space and speeding up image downloads and builds.
- Immutable: Once a layer is created, it cannot be changed. If a change is made, Docker creates a new layer that captures the difference. This immutability is key to Docker’s reliability and performance, as unchanged layers can be shared across images and containers.
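You can inspect the layer stack of any local image with `docker history`, which lists each layer along with the instruction that created it and its size:

```bash
# Show an image's layers, newest first
docker history node:18-alpine
```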
Let’s see an example with a simple Node.js app. This is what our Dockerfile looks like:
```dockerfile
# --- layer 1 ---
FROM node:18-alpine

# --- layer 2 ---
WORKDIR /app

# --- layer 3 ---
COPY . .

# --- layer 4 ---
RUN npm install

# --- layer 5 ---
RUN npm run build

# --- layer 6 ---
RUN npx prisma generate

EXPOSE 3000
CMD ["node", "dist/index.js"]
```
To build this image, we run the following command:
```bash
docker build -t simple_nodejs .
```
The terminal output will show the layers created for this image.
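If you want the full step-by-step layer output rather than the collapsed view, BuildKit (the default builder in current Docker versions) supports plain progress output:

```bash
# Print each build step, including whether it was CACHED
docker build --progress=plain -t simple_nodejs .
```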
As we can see:
- The base layer is the first layer created, which is the `node:18-alpine` image.
- Each `RUN`, `WORKDIR`, and `COPY` command then creates a new layer.
- `EXPOSE` and `CMD` do not create new layers, because they don’t modify the filesystem or the image contents.
- Since layers get reused across Docker builds, you see `CACHED` next to step 1/6.
Why use layers?
If you change your Dockerfile, layers can get reused depending on where the change was made.
Say we made changes to our source code or modified the `package.json` file (added a dependency):
```dockerfile
# --- layer 1 - Cached ---
FROM node:18-alpine

# --- layer 2 - Cached ---
WORKDIR /app

# --- layer 3 - Not Cached ---
COPY . .

# --- layer 4 - Not Cached ---
RUN npm install

# --- layer 5 - Not Cached ---
RUN npm run build

# --- layer 6 - Not Cached ---
RUN npx prisma generate

EXPOSE 3000
CMD ["node", "dist/index.js"]
```
Note: layers 1/6 and 2/6 are cached because they did not change. (Even though the terminal doesn’t say so for 1/6, it is still cached.)
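If you ever want to bypass this cache entirely, for example to force a fresh dependency installation, you can rebuild from scratch:

```bash
# Ignore all cached layers and re-execute every step
docker build --no-cache -t simple_nodejs .
```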
Optimising the Dockerfile
How often do you think your dependencies change?
How often does the `npm install` layer need to change?
Wouldn’t it be nice if we could cache the `npm install` step, considering dependencies don’t change very often?
We can take advantage of the fact that layers are cached and optimise our Dockerfile.
Let’s change our Dockerfile to the following:
```dockerfile
FROM node:18-alpine

WORKDIR /app

# copy only what npm install and prisma generate need
COPY package*.json ./
COPY prisma ./prisma

# install dependencies and generate the prisma client
RUN npm install
RUN npx prisma generate

# copy the rest of the source code and build
COPY . .
RUN npm run build

EXPOSE 3000
CMD ["node", "dist/index.js"]
```
Old file:

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
RUN npx prisma generate
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

In the optimised file:
- We first copy over only the files that `npm install` and `npx prisma generate` need.
- Then we run these scripts.
- Then we copy over the rest of the source code and build.
Then we can have two cases:
- Case 1 - You change your source code (but nothing in `package.json` or `prisma/`)
```dockerfile
# --- layer 1 - Cached ---
FROM node:18-alpine

# --- layer 2 - Cached ---
WORKDIR /app

# --- layer 3 - Cached ---
COPY package*.json ./

# --- layer 4 - Cached ---
COPY prisma ./prisma

# --- layer 5 - Cached ---
RUN npm install

# --- layer 6 - Cached ---
RUN npx prisma generate

# --- layer 7 - Not Cached ---
COPY . .

# --- layer 8 - Not Cached ---
RUN npm run build

EXPOSE 3000
CMD ["node", "dist/index.js"]
```
- Case 2 - You change the `package.json` file (added a dependency)
```dockerfile
# --- layer 1 - Cached ---
FROM node:18-alpine

# --- layer 2 - Cached ---
WORKDIR /app

# --- layer 3 - Not Cached ---
COPY package*.json ./

# --- layer 4 - Not Cached ---
COPY prisma ./prisma

# --- layer 5 - Not Cached ---
RUN npm install

# --- layer 6 - Not Cached ---
RUN npx prisma generate

# --- layer 7 - Not Cached ---
COPY . .

# --- layer 8 - Not Cached ---
RUN npm run build

EXPOSE 3000
CMD ["node", "dist/index.js"]
```