
Launching Jobs

The CloudRift command interface is inspired by Docker. Most of the time, running a Docker command through the rift command-line tool will just work: you can start containers, list containers on a machine, inspect their logs, kill them, and so on.
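
For example, the everyday Docker workflow translates almost one-to-one. Each of the commands below is covered in detail later in this guide (add -e <executor_name> to target a specific executor):

rift docker run -d alpine sleep 30   # start a container (docker run)
rift docker ps                       # list containers (docker ps)
rift docker kill <container_id>      # stop a container (docker kill)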

However, the CloudRift tool manages the entire cluster, so you can run multiple jobs at once or inspect multiple executors. There are also additional commands designed specifically for cluster management.

Listing Executors in the Cluster

Before launching any jobs, it is a good idea to inspect the cluster, i.e. see a summary of all the executors available. You can do this with CloudRift by invoking the following command:

rift cluster info
What is an Executor?

You add your machine to the cluster by running rift-desktop or the rift service on it. Large servers, however, may have more CPUs, GPUs, and other resources than a single job needs, so we allow them to be split into multiple executors. An executor is an isolated job runner with limited access to the node's hardware.

Running a Container

To run a task in a container, use the rift docker run command (similar to docker run). The difference from Docker is that you can additionally specify the executor on which to run the job via the -e <executor_name> argument. By default, rift will run the job on an arbitrary executor.

Specifically, the rift docker run command does the following:

  1. Starts a container on the specified executor.
  2. Starts a process in the container.
  3. Attaches to the stdin, stdout, and stderr of the process. The contents of stdout and stderr are continuously streamed to the server so that you can see the process output interactively; stdin is continuously polled and can be used to send input to the process.

For example, try the following command:

rift docker run python:slim -- python -c "print('Hello CloudRift')"

This command creates a container from the public python:slim image, which contains the Python interpreter, and starts a Python task that prints Hello CloudRift to the console. By default, the stdin, stdout, and stderr streams are attached, so you can see the output of the command immediately.
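
Because the stdin stream is attached as well, you can pipe input into the task. Here is a quick sketch, assuming piped stdin is forwarded to the container the same way it is with plain docker run:

echo "CloudRift" | rift docker run python:slim -- python -c "print('Hello', input())"

If stdin forwarding works as described above, this prints Hello CloudRift.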

Running Container in Background

If you don't want to block until the task is finished, supply the -d argument. In this case, instead of printing the container output to the console, rift prints the container ID. This makes the -d flag convenient when you want to capture the container ID for future commands.

CONTAINER_ID=`rift docker run -d alpine sleep 10`
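
The captured ID can then be passed to the container-management commands covered later in this guide (add -e <executor_name> to target a specific executor), for example:

rift docker ps                    # the background container should appear in the list
rift docker kill $CONTAINER_ID    # stop it before the sleep finishes, if needed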

Running on a Specific Executor

To run a job on a specific executor, use the -e <executor_name> option. The technique described below can also be used to run several jobs on the same executor.

First, let's get the name of an executor. For this we can leverage the CLOUDRIFT_EXECUTOR_NAME environment variable.

EXECUTOR_NAME=`rift docker run alpine printenv CLOUDRIFT_EXECUTOR_NAME`

Now, let's ensure that we're running on the same executor.

rift docker -e $EXECUTOR_NAME run alpine -- printenv CLOUDRIFT_EXECUTOR_NAME

Launching Several Jobs at Once

To launch jobs on all executors at once, use the -e all option. Running a command on all executors is useful for inspecting the cluster.

For example, let's print the system information of all the executors in the cluster:

rift docker -e all run alpine -- uname -a

However, the command above is not very useful on its own because it is hard to tell which executor is which. To fix that, we can print CLOUDRIFT_EXECUTOR_NAME before the system information. Let's use it to print the system information of each executor in the cluster alongside its name:

rift docker -e all run alpine -- sh -c 'printf "$CLOUDRIFT_EXECUTOR_NAME:\n  "; uname -a'

Now we can see the system information for each executor in the cluster.

Launching a Container with GPU Support

The biggest value of CloudRift is that it lets you leverage the powerful GPUs in your machines.

To check that a GPU is available on the executor, run the following command, supplying the -e <executor_name> argument if you need to target a specific executor:

rift docker run ubuntu -- nvidia-smi -L
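
Once the GPU is visible, GPU-enabled images can be run the same way. As a sketch, assuming the executor exposes its GPU to containers by default, the public pytorch/pytorch image (any CUDA-enabled image would do) can report whether CUDA is usable:

rift docker run pytorch/pytorch -- python -c "import torch; print(torch.cuda.is_available())"

This should print True if the container can access the GPU.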

Supplying Files

Volumes are the preferred mechanism for persisting data generated and used by Docker containers. However, since CloudRift executes the Docker command on a remote machine, the data needs to be copied over first. CloudRift does that automatically behind the scenes, so you can use volumes as you normally would with Docker.

echo "Hello Volumes" > /tmp/message.txt
rift docker run --volume "/tmp/message.txt:/app/message.txt" alpine cat /app/message.txt

Info: CloudRift simply copies the files over; it does not actually mount the local volume onto the remote machine over the network.

Exposing Ports

Another commonly used feature is port mapping; you might need it, for example, when developing a web service. To expose a port, supply the -p <port> or -p <host_port>:<container_port> argument. Here is an example of serving a minimal web page.

# memorize the name of the executor
EXECUTOR_NAME=`rift docker run alpine printenv CLOUDRIFT_EXECUTOR_NAME`
echo $EXECUTOR_NAME

# run trivial web server
echo "Trivial Web Server" > /tmp/index.html
CONTAINER_ID=`rift docker run -d -p 8080:8080 \
--volume "/tmp/index.html:/www/index.html" \
busybox -- httpd -f -h /www -p 8080`
echo $CONTAINER_ID

# get the public IP of the executor where the server is running
EXECUTOR_IP=`rift docker -e $EXECUTOR_NAME run curlimages/curl -- curl -sS icanhazip.com`
echo $EXECUTOR_IP

# check that the server is working
curl http://$EXECUTOR_IP:8080/

# kill the server
rift docker kill $CONTAINER_ID

Info: This tutorial example accesses the executor via its public IP. If the executor is not reachable from the internet, the example won't work.

Listing Containers on the Executor

To list all running containers on an executor, use the following command:

rift docker ps

This will print all containers on all executors because, by default, CloudRift adds -e all to the rift docker ps command.

Use the -a flag to list all containers, including those that have already stopped. Note that CloudRift periodically cleans up containers on the executor.
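
For example, to also see containers that have already exited (optionally scoped to one executor with -e):

rift docker ps -a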

Retrieving Container Logs

To retrieve container logs, use rift docker -e <executor_name> logs <container_id>. Here is an example of how to start a job and retrieve its logs afterward.

EXECUTOR_NAME=`rift docker run alpine printenv CLOUDRIFT_EXECUTOR_NAME`
CONTAINER_ID=`rift docker -e $EXECUTOR_NAME run -d alpine echo "Hello World"`
rift docker -e $EXECUTOR_NAME logs $CONTAINER_ID

You can retrieve logs even after the task has finished. However, task containers and all associated information are removed after a few minutes.

Stopping the Container

To stop (kill) a container on an executor, use rift docker -e <executor_name> kill <container_id>. Here is an example of starting a long-running command and terminating it.

EXECUTOR_NAME=`rift docker run alpine printenv CLOUDRIFT_EXECUTOR_NAME`
CONTAINER_ID=`rift docker -e $EXECUTOR_NAME run -d alpine sleep 30`
rift docker -e $EXECUTOR_NAME kill $CONTAINER_ID

End-to-end Example

Here is a more complete example demonstrating the aforementioned commands in action. We are going to start a long-running job, inspect its output, and finally terminate it.

# check the cluster
rift cluster info

# get the name of some executor in the cluster
EXECUTOR_NAME=`rift docker run alpine printenv CLOUDRIFT_EXECUTOR_NAME`

# run a task that prints a countdown to the console for two minutes, in the background
CONTAINER_ID=`rift docker -e $EXECUTOR_NAME run -d python:slim -- \
python -uc "import time; [(print(f'{120-i} seconds left'), time.sleep(1)) for i in range(120)]"`

# inspect running tasks
rift docker -e $EXECUTOR_NAME ps

# check logs of the task we've started
rift docker -e $EXECUTOR_NAME logs $CONTAINER_ID

# stop the task
rift docker -e $EXECUTOR_NAME kill $CONTAINER_ID

# ensure that no tasks are running
rift docker -e $EXECUTOR_NAME ps