19 October 2018
General considerations for open and reproducible research:
Create a dedicated (private) GitHub repository for each project
Use a standardised folder and documentation structure
Prepare a dedicated Docker container for each project or task
Place raw data in public repositories such as figshare, under embargo if necessary
Everything is collaborative even if it's just between you and yourself in the future!
.
+-- Code/
|   +-- Bash/
|   +-- Docker/
|   +-- R/
+-- Data/
+-- Notes/
+-- Results/
|   +-- Figures/
|   +-- Tables/
|   +-- Objects/
Some really nice description.txt
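The folder layout above can be created in one step. The following is a minimal sketch, not a prescribed tool: the variable `project_dir` and its default value `my_project` are assumptions made for illustration.

```shell
# Hypothetical project root; override it by exporting PROJECT_DIR.
project_dir="${PROJECT_DIR:-my_project}"

# Create every folder of the layout; -p makes the call safe to re-run.
mkdir -p \
  "$project_dir/Code/Bash" \
  "$project_dir/Code/Docker" \
  "$project_dir/Code/R" \
  "$project_dir/Data" \
  "$project_dir/Notes" \
  "$project_dir/Results/Figures" \
  "$project_dir/Results/Tables" \
  "$project_dir/Results/Objects"

# Show the resulting directory tree.
find "$project_dir" -type d | sort
```

Keeping this snippet under Code/Bash/ in a template repository is one way to give every new project the same structure.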
There is often some confusion about the difference between virtual machines (e.g. VirtualBox) and containerisation (e.g. Docker)
Both methods have the same aim:
Isolate an application and its dependencies into a self-contained unit that can run anywhere
Containers share the host system's kernel with other containers, whereas VMs rely on a hypervisor.
A hypervisor is usually a piece of software that VMs run on top of
It runs on a physical computer, referred to as the host machine
The host machine provides the VMs with resources (like RAM and CPU)
These resources are divided between VMs and can be distributed as needed.
The hypervisor provides each VM with a platform to manage and execute its guest OS.
The guest machine contains the application and whatever it needs to run that application.
It also carries an entire virtualized hardware stack of its own (virtualized network adapters, storage, and CPU)
From the inside, the guest machine behaves as its own unit with its own dedicated resources.
From the outside, we know that VMs share the resources provided by the host machine.
A container provides operating-system-level virtualization
Each container gets its own isolated user space, so that multiple containers can run on a single host machine
In principle, a container looks like a VM
BUT: Containers package up just the user space, and not the kernel or virtual hardware like a VM does.
Only libraries and binaries are needed, which is why containers are so lightweight
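The shared kernel can be observed directly. The sketch below compares the kernel release seen by the host with the one seen inside a throwaway container; it assumes a running Docker daemon and uses the `ubuntu:16.04` image that appears later in these slides. The variable names are illustrative only.

```shell
# Kernel release reported by the host.
host_kernel="$(uname -r)"
echo "host kernel:      $host_kernel"

# Kernel release reported from inside a throwaway container.
# This part needs the docker client and a running daemon.
if command -v docker >/dev/null 2>&1; then
  container_kernel="$(docker run --rm ubuntu:16.04 uname -r)"
  echo "container kernel: $container_kernel"
else
  echo "docker not found; skipping the container check"
fi
```

On a native Linux host both lines should show the same release, because the container has no kernel of its own. With Docker Desktop on macOS or Windows the containers run inside a Linux VM, so the second line shows that VM's kernel instead.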
Docker is an open-source project based on Linux containers
Containerisation is not a new concept; Google, for example, has its own container architecture
Other popular containerisation solutions are e.g. LXC, FreeBSD jails, AIX Workload Partitions and Solaris Containers.
However, Docker is easy, fast, community-driven and very modular
Docker images are built from reusable layers
The Docker Engine contains the Docker client and daemon
Example of a Dockerfile (to create an image)
FROM ubuntu:16.04
MAINTAINER Daniel Fischer <firstname.lastname@example.org>
RUN apt-get update && apt-get install -y \
    curl \
    unzip \
    wget \
    && rm -rf /var/lib/apt/lists/*
RUN wget -qO- https://github.com/alexdobin/STAR/archive/2.5.2b.tar.gz | \
    tar -xz && mv /STAR-2.5.2b/ /bin/STAR-2.5.2b/
ENV PATH $PATH:/bin/STAR-2.5.2b/bin/Linux_x86_64_static/
Build the Docker image, push it to Docker Hub (account required) and list the local images
docker build -t fischuu/star:2.5.2b .
docker push fischuu/star:2.5.2b
docker images
REPOSITORY     TAG      IMAGE ID       CREATED         SIZE
ubuntu         16.04    14f60031763d   15 months ago   120 MB
fischuu/star   2.5.2b   31d682b42362   10 months ago   321 MB
Run the basic hello-world Docker example
fischuu@Orome ~ $ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
d1725b59e92d: Pull complete
Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.
docker ps -a
CONTAINER ID   IMAGE                 COMMAND       CREATED          STATUS                      PORTS   NAMES
3813c13a6451   hello-world           "/hello"      14 minutes ago   Exited (0) 14 minutes ago           distracted_hugle
eafbdfe3ff79   fischuu/star:2.5.2b   "/bin/bash"   4 seconds ago    Up 3 seconds                        STAR
This is a more complex call that starts a Docker container in the background (detached), with two host folders mounted
docker run -dit -v /path/on/host/:/mp1/ \
    -v /other/path/:/mp2/ \
    --name STAR fischuu/star:2.5.2b
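With `-v`, Docker expects an absolute host path, and a host directory that does not exist yet is silently created by the daemon. A small wrapper can catch both cases before the container starts. This is only a sketch: the function name `start_star` and the `DOCKER` override variable are hypothetical helpers, not part of Docker.

```shell
# Hypothetical helper: start the STAR container with one host
# directory mounted at /mp1/.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command
# instead of running it.
start_star() {
  host_dir="$1"

  # Bind mounts given with -v need an absolute host path.
  case "$host_dir" in
    /*) ;;
    *) echo "error: '$host_dir' is not an absolute path" >&2
       return 1 ;;
  esac

  # Refuse missing directories instead of letting Docker create them.
  if [ ! -d "$host_dir" ]; then
    echo "error: '$host_dir' is not an existing directory" >&2
    return 1
  fi

  "${DOCKER:-docker}" run -dit -v "$host_dir":/mp1/ \
    --name STAR fischuu/star:2.5.2b
}
```

Calling `DOCKER=echo start_star /path/on/host` prints the command that would be run, which is a cheap way to check the mount before starting the container.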
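Send commands to this container (here: build a STAR genome index; the paths under /mp1/ are placeholders)
docker exec STAR /bin/sh -c "STAR --runMode genomeGenerate --genomeDir /mp1/ --genomeFastaFiles /mp1/myFasta.fa"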
Stop and remove the container
docker stop STAR ; docker rm STAR
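For one-off commands, the start, exec, stop and remove steps can be collapsed into a single call with `--rm`, which removes the container as soon as the command exits. The sketch below assumes the image built earlier; the function name `run_in_star` and the `DOCKER` override variable are hypothetical.

```shell
# Hypothetical helper: run one command in a throwaway container.
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
run_in_star() {
  # --rm removes the container on exit; the current directory is
  # mounted at /mp1/ and used as the working directory.
  "${DOCKER:-docker}" run --rm -v "$PWD":/mp1/ -w /mp1/ \
    fischuu/star:2.5.2b "$@"
}

# Example call (commented out: it needs the image and a Docker daemon):
# run_in_star STAR --version
```

A long-running named container, as shown above, still makes sense when many commands are sent to the same environment; the throwaway form suits single pipeline steps.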
Docker can be applied to a wide range of applications
Link to an overview of public data repositories:
Slides are based on the following materials: