Disclaimer: for this to work, you must be affiliated with ETH Zurich and have a nethz account.
So, you wanna play around a bit with machine learning. Or run a crazy particle physics simulation. But you only have a lame Macbook. Your gaming rig would do the job, but its fans spin so loudly that your neighbors complained. Solution? Just use ETH's Euler general-purpose supercomputer! They explicitly write that Euler is “not a supercomputer”, but “supercomputer” just sounds cooler than “general-purpose computer”.
So-called shareholders (lab groups and ETH departments) invested money to own a reserved percentage of Euler's computing power. However, there is also a slice reserved for us students. The best thing about it? It works without any bureaucracy. Just log in and start.
We will run an example from the scikit-learn website for demonstration. There will be some small changes, because we have no GUI. But first, let’s log in. Important: you can only log in to Euler from within the ETH network or when connected via VPN.
ssh <your nethz-name>@euler.ethz.ch
The first time, you will be greeted with a disclaimer that you have to accept by typing Yes. Then we can start by loading the Python module.
module load python/2.7
# get sample script from scikit-learn
Now you’ll have to modify the script to make it run without X11. For this, we have to tell matplotlib to write to disk instead of trying to display the images directly. Modify the top and the bottom of the downloaded file to look like this:
# at the top of the file,
# after the long introductory comment
from time import time
# those two lines must be inserted here
import matplotlib.pyplot as plt
import numpy as np
# replaced plt.show() by the following line
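For reference, here is a minimal, self-contained sketch of the same idea: plotting to a file without any display. It assumes that the non-interactive "Agg" backend is what you want to select before importing pyplot, and that `plt.show()` is swapped for a `savefig` call. The function name `save_demo_plot` and the sine data are made up for illustration; only the output name plot.png comes from this post.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, must be set before importing pyplot
import matplotlib.pyplot as plt
import numpy as np


def save_demo_plot(path):
    """Draw a simple sine curve and write it to `path` instead of opening a window."""
    x = np.linspace(0.0, 2.0 * np.pi, 200)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.set_title("headless test plot")
    fig.savefig(path)  # takes the place of plt.show()
    plt.close(fig)     # free the figure, useful in long-running jobs
    return path


# hypothetical output location; the script in this post writes plot.png to the working directory
out_path = save_demo_plot(os.path.join(tempfile.gettempdir(), "plot.png"))
print("wrote", out_path)
```

If this runs on your own machine without popping up a window, the same pattern should work on a compute node that has no X11.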
When you submit a job to the batch processing system, it will inherit the current environment. As we already loaded the python module, we are now ready to go:
# submit job
bsub "python plot_image_denoising.py"
> Generic job.
> Job <9613719> is submitted to queue <normal.4h>.
# you can now check the status of your job via bjobs
> JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
> 9613719 muejo RUN normal.4h euler01 e1096 *oising.py Aug 25 00:34
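If you want to script around the batch system, the job ID can be pulled out of the message bsub prints. The following is only a sketch: the regular expression is written against the single sample line shown above, and `parse_bsub_output` is a hypothetical helper, not part of any Euler tooling.

```python
import re

# Pattern derived from the sample line "Job <9613719> is submitted to queue <normal.4h>."
BSUB_PATTERN = re.compile(r"Job <(\d+)> is submitted to queue <([^>]+)>")


def parse_bsub_output(text):
    """Return (job_id, queue) found in the text printed by bsub."""
    match = BSUB_PATTERN.search(text)
    if match is None:
        raise ValueError("no job ID found in bsub output")
    return int(match.group(1)), match.group(2)


sample = "Generic job.\nJob <9613719> is submitted to queue <normal.4h>."
job_id, queue = parse_bsub_output(sample)
print(job_id, queue)
```

The job ID is handy later, for example to look up a specific job or to find the output file that belongs to it.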
After some time, a file called lsf.o<jobid> will appear in the directory, next to the plot.png our script generated. That’s it, we’re done!
- Your job will be killed after 4 hours. You can use the option -W hh:mm with bsub to let it run longer, but it will wait longer in the queue.
- The same holds for CPU cores (-n 2 uses 2 cores instead of 1) and memory (-R "rusage[mem=2048]" uses 2 GB per core). You can use up to 48 cores at the same time (check using MAX). If you submit several jobs requiring more than 48 cores in total, they will be run sequentially. If you submit a single job requiring more than 48 cores, it will probably be stuck in the queue forever.
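To see how these options combine, here is a small sketch that assembles a bsub command line from the three options mentioned above. The helper names `build_bsub_command` and `total_memory_mb` are hypothetical, the 8-hour wall time is just an example value, and the code targets Python 3 (it uses `shlex.quote`). Nothing is submitted; the command is only printed.

```python
import shlex


def build_bsub_command(script, wall_time=None, cores=1, mem_mb_per_core=None):
    """Assemble the argument list for bsub using the -W, -n and -R options."""
    args = ["bsub"]
    if wall_time is not None:
        args += ["-W", wall_time]  # run time limit as hh:mm
    if cores != 1:
        args += ["-n", str(cores)]  # number of cores
    if mem_mb_per_core is not None:
        args += ["-R", "rusage[mem=%d]" % mem_mb_per_core]  # memory per core in MB
    args.append("python " + script)  # the job itself is passed as one quoted string
    return args


def total_memory_mb(cores, mem_mb_per_core):
    """The memory request is per core, so the whole job reserves cores * mem."""
    return cores * mem_mb_per_core


cmd = build_bsub_command("plot_image_denoising.py", wall_time="08:00",
                         cores=2, mem_mb_per_core=2048)
print(" ".join(shlex.quote(a) for a in cmd))
print("total memory (MB):", total_memory_mb(2, 2048))
```

Note that because memory is requested per core, asking for 2 cores with mem=2048 reserves 4 GB in total.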
Further information (only accessible from within ETH network):
If you need some special packages
You can install libraries locally in your home directory via pip.
mkdir -p $HOME/python/lib64/python2.7/site-packages
module load python/2.7
# now, install e.g. theano
python -m pip install --install-option="--prefix=$HOME/python" theano
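Packages installed under a custom prefix are only importable if Python searches that directory. The sketch below assumes the directory is not already on the search path (for instance via PYTHONPATH) and adds it at runtime. The helper `local_site_packages` is hypothetical; the directory layout is copied from the mkdir command above.

```python
import os
import site
import sys


def local_site_packages(prefix, version="2.7"):
    """Build the site-packages path that matches the mkdir command above."""
    return os.path.join(prefix, "lib64", "python" + version, "site-packages")


local_path = local_site_packages(os.path.join(os.path.expanduser("~"), "python"))

# Only register the directory if it exists, so the sketch is harmless elsewhere.
if os.path.isdir(local_path):
    site.addsitedir(local_path)  # appends to sys.path and processes .pth files

print(local_path)
print("on search path:", local_path in sys.path)
```

Putting these lines at the top of a job script is one option; setting the search path in your shell environment before submitting is another, since jobs inherit the environment.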
Some packages require Euler module dependencies. For example, if you wanna use h5py, you have to load the
hdf5 module before loading the python module:
module load hdf5
module load python/2.7.2
>>> import h5py
Only the Leonhard cluster, which is not publicly accessible, has GPU nodes. So if you need GPUs, you will have to ask Cluster Support.
I am a student who does not know anything about scientific computing, let alone ETH's HPC infrastructure. This blog post just tries to provide some guidance for students. I do not provide any support. Also, I am not responsible if the admins get angry at you because you ran your stuff on a login node.
Find support here
For D-ITET students
There is an additional resource for D-ITET students. You can run jobs on the idle tardis machines (in the ETZ computer rooms). If you ever wondered, that is the reason for the “do not turn off” labels. The batch system is very similar to the one on Euler. Find more information in the D-ITET Computing Wiki.
This post was updated on Jan 26 2017 to reflect the decommissioning of the old Brutus cluster and its Wiki. The new Wiki can now be found here (only from within the ETH network!).