Docker containers simplify software development and deployment through isolation. Containers are especially useful when the target software has a complicated installation procedure, as is the case for SciDB. SciDB needs to be compiled from source and requires a multitude of libraries and development tools to be installed. Moreover, SciDB is quite specific about the operating systems it supports. In this post, we look into how to create a Docker image for SciDB. The image can be used to launch Docker containers, which run isolated from the host operating system and on a multitude of host operating systems, including Windows. We assume the reader has some familiarity with Docker and focus on SciDB particularities.

Note: The Docker image described in this post is for SciDB 15.12 and a single-node installation.

Dockerfiles

The easiest way to build a Docker image is to start from an existing Docker image, instantiate a container from it, make changes to the container, and commit the updated container as a new image. Although easy, this method is not very common, because the manual changes are not recorded anywhere, which makes the resulting image hard to reproduce or port.

The preferred way of building Docker images is to create a Dockerfile. Dockerfiles are sequences of instructions that can be executed by the Docker builder to create a new image starting from a pre-existing image. In this post, we create a Dockerfile to build our SciDB Docker image.

Getting Started

To build a Docker image for SciDB, we closely follow the official SciDB Community Edition Installation Guide. Most of the steps in the installation guide are reflected in our SciDB Dockerfile. The first section of the guide is the Requirements section. Of the recommended operating systems, we chose the Ubuntu Linux distribution, version 14.04, and we use the official Ubuntu image available on Docker Hub. To start off, our Dockerfile looks like this:

## Requirements
## ---
FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y wget apt-transport-https software-properties-common

Lines starting with # are comments and are ignored. The FROM statement (see Docker documentation) sets the base image for the Dockerfile, in this case ubuntu:14.04. Next, we fetch the Ubuntu package index and install a few packages using the RUN statement (see Docker documentation). These packages are not mentioned in the Installation Guide but are needed for a successful installation.

To build a Docker image, we place the lines above into a file called Dockerfile and run the docker build command (see Docker documentation):

$ cat > Dockerfile
FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y wget apt-transport-https software-properties-common
^D
$ docker build --tag scidb .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:14.04
 ---> 38c759202e30
Step 2 : RUN apt-get update
 ---> Running in f4fb4aa2958e
...
 ---> ad6c2dea1b62
Removing intermediate container f4fb4aa2958e
Step 3 : RUN apt-get install -y wget apt-transport-https software-properties-common
 ---> Running in 3822490df93f
...
 ---> e521dbf5055a
Removing intermediate container 3822490df93f
Successfully built e521dbf5055a

Notice how the build process has three steps, one for each statement in the Dockerfile. After each step, Docker generates and stores an intermediate image. In some steps (e.g., steps 2 and 3 above), Docker creates and uses an intermediate container, which is removed afterwards. As we update the Dockerfile, we can re-run the build process to see the effects of our changes. Thanks to Docker's caching mechanism, re-running the build does not re-execute a step if neither that step nor any step before it has changed.
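Why an edit invalidates every later step can be seen with a toy model. This is not Docker's actual implementation: roughly, Docker reuses a cached layer when the parent image and the instruction text are unchanged (for ADD and COPY it also checksums the copied files). The sketch below, whose cache_keys function is purely illustrative, derives each step's key from the previous step's key plus the instruction, so changing step 3 leaves steps 1 and 2 untouched.

```shell
#!/bin/sh
# Toy model of layer caching (illustrative only, not Docker's code):
# each step's key hashes the parent step's key together with the
# instruction text, so a change propagates to every later step.
cache_keys() {
    parent=""
    while IFS= read -r instruction; do
        parent=$(printf '%s\n%s' "$parent" "$instruction" | sha256sum | cut -c1-12)
        printf '%s  %s\n' "$parent" "$instruction"
    done
}

old=$(printf '%s\n' \
    'FROM ubuntu:14.04' \
    'RUN apt-get update' \
    'RUN apt-get install -y wget' | cache_keys)

# Same Dockerfile with only the third instruction edited.
new=$(printf '%s\n' \
    'FROM ubuntu:14.04' \
    'RUN apt-get update' \
    'RUN apt-get install -y wget expect' | cache_keys)

echo "$old"
echo "---"
echo "$new"
```

The first two keys match between the two runs, so those steps would be served from the cache; only the edited step (and anything after it) would run again.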

Next, we define a few build-time variables using the ARG statement (see Docker documentation) and create a scidb user, as instructed in the Installation Notes:

## Installation Notes
## ---
ARG host_ip=127.0.0.1
ARG net_mask=$host_ip/8
ARG scidb_usr=scidb
ARG dev_dir=/usr/src

RUN groupadd $scidb_usr
RUN useradd $scidb_usr -s /bin/bash -m -g $scidb_usr

Pre-Installation Tasks

Now we address the Pre-Installation Tasks required for building and installing SciDB. First, we download and extract the SciDB source code:

## Download SciDB Community Edition
## ---
WORKDIR $dev_dir
ARG scidb_url="https://docs.google.com/uc?id=0B7yt0n33Us0raWtCYmNlZWRxWG8&export=download"
RUN wget --no-verbose --output-document scidb-15.12.1.4cadab5.tar.gz \
        --load-cookies cookies.txt \
        "$scidb_url&`wget --no-verbose --output-document - \
            --save-cookies cookies.txt "$scidb_url" | \
            grep --only-matching 'confirm=[^&]*'`"
RUN tar -xzf scidb-15.12.1.4cadab5.tar.gz
RUN mv scidb-15.12.1.4cadab5 scidbtrunk
WORKDIR $dev_dir/scidbtrunk

## Installing Expect, and SSH Packages
## ---
RUN apt-get install -y expect openssh-server openssh-client

The official SciDB source code is hosted on Google Drive. To download a file from Google Drive, we have to make two requests: the first obtains some cookies and a confirmation code, which are then used in the second. The WORKDIR statements (see Docker documentation) set the current directory, initially /usr/src and later /usr/src/scidbtrunk. We also install a few more packages, as instructed in the Installing Expect, and SSH Packages section of the installation guide.
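In the wget step, the backquoted inner wget fetches the Google Drive page, and grep keeps only the confirm=... parameter, which is appended to the original URL for the second request. The sketch below replays just that text processing so it runs without network access. The HTML snippet and the token AbC1 are made up, and what Google Drive returns today may look different.

```shell
#!/bin/sh
# Replays the confirm-token extraction from the wget step offline.
# The page snippet and token value are hypothetical.
scidb_url="https://docs.google.com/uc?id=0B7yt0n33Us0raWtCYmNlZWRxWG8&export=download"
page='<a href="/uc?export=download&amp;confirm=AbC1&amp;id=0B7yt0n33Us0raWtCYmNlZWRxWG8">Download anyway</a>'

# First request (simulated): keep only the confirm=... parameter.
# -o is the short form of --only-matching used in the Dockerfile.
confirm=$(printf '%s\n' "$page" | grep -o 'confirm=[^&]*')

# Second request URL: the original URL plus the confirmation parameter.
download_url="$scidb_url&$confirm"
echo "$download_url"
```

The character class [^&]* stops the match at the next &, so only the token itself is carried over to the second request.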

Password-less SSH

We set up and test password-less SSH as instructed in the Providing Passwordless SSH section:

## Providing Passwordless SSH
## ---
RUN ssh-keygen -f /root/.ssh/id_rsa -N ''
RUN chmod 755 /root
RUN chmod 755 /root/.ssh

RUN mkdir /home/$scidb_usr/.ssh
RUN ssh-keygen -f /home/$scidb_usr/.ssh/id_rsa -N ''
RUN chmod 755 /home/$scidb_usr
RUN chmod 755 /home/$scidb_usr/.ssh

## Avoid setting password and providing it to "deploy.sh access"
RUN cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
RUN cat /root/.ssh/id_rsa.pub >> /home/$scidb_usr/.ssh/authorized_keys

## Set correct ownership
RUN chown -R $scidb_usr:$scidb_usr /home/$scidb_usr

RUN service ssh start && \
    ./deployment/deploy.sh access root NA "" $host_ip && \
    ./deployment/deploy.sh access $scidb_usr NA "" $host_ip && \
    ssh $host_ip date

This set of steps is a bit more convoluted, so let's go over it one step at a time. We first generate SSH keys for both the root and the scidb accounts. Next, we authorize the public key of the root account on both the root and the scidb accounts. This allows us to run the deploy.sh script that follows without providing the passwords for the root and scidb accounts. In fact, part of what the deploy.sh script does is authorize these keys.
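Because both our cat >> lines and deploy.sh add keys to authorized_keys, the same key may end up listed more than once. Duplicates are harmless, but if you prefer the step to be idempotent, a key can be appended only when it is missing. The authorize_key helper below is hypothetical (it is not part of SciDB or deploy.sh), and a temporary directory with a fake key string stands in for the real .ssh directories.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to authorized_keys only if
# that exact line is not already present.
authorize_key() {
    key_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    key=$(cat "$key_file")
    # -x: whole-line match, -F: treat the key as a fixed string.
    grep -qxF "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
}

tmp=$(mktemp -d)
printf '%s\n' 'ssh-rsa AAAAexamplekey root@build' > "$tmp/id_rsa.pub"

authorize_key "$tmp/id_rsa.pub" "$tmp/.ssh/authorized_keys"
authorize_key "$tmp/id_rsa.pub" "$tmp/.ssh/authorized_keys"   # no duplicate added

count=$(wc -l < "$tmp/.ssh/authorized_keys" | tr -d ' ')
echo "$count"
rm -rf "$tmp"
```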

Next, the installation guide instructs us to start the SSH server. Since Docker executes each RUN statement in a new container, starting a server in one step has no effect on subsequent steps: any running servers are killed when the container is saved as an image. Instead, we start each required server in the same container (that is, the same RUN statement) where it is needed. We do this using a RUN statement with multiple commands chained with &&: we first start the SSH server and then run the deploy.sh scripts.

Build Tools and PostgreSQL

Next, we install the SciDB build tools using the instructions in the Installing Build Tools section of the guide:

## Installing Build Tools
## ---
RUN service ssh start && \
    ./deployment/deploy.sh prepare_toolchain $host_ip

The deploy.sh script performs the installation through a remote shell (SSH), so for the script to work, we again need to start the SSH server in the same container where the script runs.

The final step in the pre-installation section is to install and configure the PostgreSQL database software. SciDB uses PostgreSQL to store its catalog. We use the deploy.sh script as instructed in the Installing Postgres section:

## Installing Postgres
## ---
RUN service ssh start && \
    ./deployment/deploy.sh prepare_postgresql postgres postgres $net_mask $host_ip

## Providing the postgres user Access to SciDB Code
RUN usermod -G $scidb_usr -a postgres
RUN chmod g+rx $dev_dir
RUN /usr/bin/sudo -u postgres ls $dev_dir

We also make sure that the postgres user belongs to the same group as the scidb user and has access to the SciDB installation location.

Building and Installing SciDB

We are ready to build and install SciDB as advised in the Installing SciDB Community Edition section of the SciDB installation guide. First, we configure the environment as in the Configuring Environment Variables section:

## Configuring Environment Variables
## ---
ENV SCIDB_VER=15.12
ENV SCIDB_INSTALL_PATH=$dev_dir/scidbtrunk/stage/install
ENV SCIDB_BUILD_TYPE=Debug
ENV PATH=$SCIDB_INSTALL_PATH/bin:$PATH

RUN echo "\
export SCIDB_VER=$SCIDB_VER\n\
export SCIDB_INSTALL_PATH=$SCIDB_INSTALL_PATH\n\
export SCIDB_BUILD_TYPE=$SCIDB_BUILD_TYPE\n\
export PATH=$PATH\n" | tee /root/.bashrc > /home/$scidb_usr/.bashrc

### Activating and Verifying the New .bashrc File
RUN echo $SCIDB_VER
RUN echo $SCIDB_INSTALL_PATH
RUN echo $PATH

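Note that we set the environment variables both for the Docker build process, using the ENV statement (see Docker documentation), and for interactive shells, using export statements in .bashrc. The RUN echo steps above print the values set by ENV, since RUN commands are executed by /bin/sh, which does not read .bashrc.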
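The echo in the .bashrc step relies on /bin/sh interpreting the \n escapes. On Ubuntu, /bin/sh is dash, whose echo does so, while bash's built-in echo does not by default. A printf-based variant behaves the same in any POSIX shell. The sketch below writes identical export lines to two files with tee; a temporary directory stands in for /root and /home/scidb, and PATH is left out for brevity.

```shell
#!/bin/sh
# Sketch: write the same export lines to two .bashrc files using printf,
# whose escape handling does not depend on the shell's echo.
SCIDB_VER=15.12
SCIDB_INSTALL_PATH=/usr/src/scidbtrunk/stage/install
SCIDB_BUILD_TYPE=Debug

tmp=$(mktemp -d)
mkdir -p "$tmp/root" "$tmp/scidb"

# printf reuses its format string for each remaining pair of arguments.
printf 'export %s=%s\n' \
    SCIDB_VER "$SCIDB_VER" \
    SCIDB_INSTALL_PATH "$SCIDB_INSTALL_PATH" \
    SCIDB_BUILD_TYPE "$SCIDB_BUILD_TYPE" \
    | tee "$tmp/root/.bashrc" > "$tmp/scidb/.bashrc"

cat "$tmp/scidb/.bashrc"
```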

In order to build SciDB, we use the run.py script, as described in the Building SciDB CE section. Building requires a setup step and a make step as follows:

## Building SciDB CE
## ---
RUN ./run.py setup --force
RUN ./run.py make -j4

The make step can take somewhere between 30 minutes and an hour.

To install SciDB, we again use the run.py script, but we need to start both the SSH and the PostgreSQL servers before running the script:

## Installing SciDB CE
## ---
RUN service ssh start && \
    service postgresql start && \
    echo "\n\ny" | ./run.py install --force

The install step is intended to be run interactively and prompts the user to answer a few questions. Since Docker does not support interactive build steps, we supply the answers on standard input using echo: two empty lines followed by y.
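Because RUN executes its command with /bin/sh (dash on Ubuntu), echo "\n\ny" emits three lines: two empty ones and y. printf '\n\ny\n' produces the same lines in any POSIX shell. To show how a program that reads one line per prompt consumes such input, the sketch below pipes the answers into fake_install, a hypothetical stand-in; we do not reproduce run.py's actual questions.

```shell
#!/bin/sh
# Hypothetical stand-in for an interactive installer that reads one
# line of input per question.
fake_install() {
    read -r answer1
    read -r answer2
    read -r answer3
    printf 'answers: [%s] [%s] [%s]\n' "$answer1" "$answer2" "$answer3"
}

# Two empty answers followed by "y", as in the RUN step.
result=$(printf '\n\ny\n' | fake_install)
echo "$result"   # answers: [] [] [y]
```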

Starting and Stopping SciDB

The image we have built so far has everything needed to use SciDB. To make it more user-friendly, we add a script to be executed when a container is instantiated. The script follows the Starting and Stopping SciDB instructions from the installation guide:

RUN echo "#!/bin/bash\n\
service ssh start\n\
service postgresql start\n\
scidb.py startall mydb\n\
trap \"scidb.py stopall mydb; service postgresql stop\" EXIT HUP INT QUIT TERM\n\
bash" > /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh

## Starting SciDB
## ---
ENTRYPOINT ["/docker-entrypoint.sh"]

The script is saved in the /docker-entrypoint.sh file and is created using echo. Normally, we would keep the script in a separate file and add it to the image; we chose the echo method to keep everything in a single file. In the script, before starting SciDB, we start the SSH and PostgreSQL servers. The script uses a trap to catch its exit and various termination signals, and stops SciDB and PostgreSQL before exiting. As the last step, the script starts a Bash shell for the user's convenience. Finally, we set the script as the container entry point using the ENTRYPOINT statement (see Docker documentation).
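As noted above, the entrypoint would normally live in its own file. The sketch below writes such a file with a heredoc instead of echo (quoting the 'EOF' delimiter disables expansion, so no \" or \n escaping is needed) and runs it to show the trap firing on exit. The echo commands and the final true are stand-ins for the service/scidb.py calls and the interactive bash, so the sketch runs outside the SciDB image.

```shell
#!/bin/sh
# Sketch: the entrypoint as a standalone file, with echo stand-ins
# instead of service/scidb.py and "true" instead of the final bash.
tmp=$(mktemp -d)
cat > "$tmp/entrypoint.sh" <<'EOF'
#!/bin/bash
echo "start ssh, postgresql, scidb"
trap 'echo "stop scidb, postgresql"' EXIT HUP INT QUIT TERM
true   # stands in for the interactive bash session
EOF
chmod +x "$tmp/entrypoint.sh"

output=$("$tmp/entrypoint.sh")
echo "$output"
rm -rf "$tmp"
```

In a Dockerfile, such a file would typically be added with COPY docker-entrypoint.sh /docker-entrypoint.sh before the ENTRYPOINT statement.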

Using the SciDB Image

Once we have all the steps described above in a Dockerfile, we use docker build to build the final SciDB image:

$ cat > Dockerfile
## Requirements
## ---
FROM ubuntu:14.04
RUN apt-get update
...
## Starting SciDB
## ---
ENTRYPOINT ["/docker-entrypoint.sh"]
^D
$ docker build --tag scidb .
Sending build context to Docker daemon 10.75 kB
Step 1 : FROM ubuntu:14.04
 ---> 38c759202e30
...
Step 45 : ENTRYPOINT /docker-entrypoint.sh
...
 ---> 00ad89598441
Successfully built 00ad89598441

To use the image, we start a Docker container from it:

$ docker run --tty --interactive scidb
 * Starting OpenBSD Secure Shell server sshd                             [ OK ]
 * Starting PostgreSQL 9.3 database server                               [ OK ]
scidb.py: INFO: Found 0 scidb processes
scidb.py: INFO: start((server 0 (127.0.0.1) local instance 0))
scidb.py: INFO: Starting SciDB server.
scidb.py: INFO: start((server 0 (127.0.0.1) local instance 1))
scidb.py: INFO: Starting SciDB server.
scidb.py: INFO: start((server 0 (127.0.0.1) local instance 2))
scidb.py: INFO: Starting SciDB server.
scidb.py: INFO: start((server 0 (127.0.0.1) local instance 3))
scidb.py: INFO: Starting SciDB server.
root@71db8492009c:/usr/src/scidbtrunk# iquery --afl --query "list('libraries')"
{inst,n} name,major,minor,patch,build,build_type
{0,0} 'SciDB',15,12,1,80403125,'Debug'
{1,0} 'SciDB',15,12,1,80403125,'Debug'
{2,0} 'SciDB',15,12,1,80403125,'Debug'
{3,0} 'SciDB',15,12,1,80403125,'Debug'
root@71db8492009c:/usr/src/scidbtrunk# exit
scidb.py: INFO: stop(server 0 (127.0.0.1))
scidb.py: INFO: checking (server 0 (127.0.0.1)) 119 120 121 122...
scidb.py: INFO: Found 4 scidb processes
scidb.py: INFO: Found 0 scidb processes
 * Stopping PostgreSQL 9.3 database server

Notice how the SSH, PostgreSQL, and SciDB servers are started when the container starts, and how SciDB and PostgreSQL are stopped when we exit the shell and the container stops.

Please note that the image built from the Dockerfile described in this post is space-inefficient (its size is 6 GB) and that the Dockerfile does not follow the Dockerfile best practices; it is written this way for instructional purposes only. More efficient SciDB Docker images are available in the docker-library repository.

The full Dockerfile is available here.