So, you wanna play around a bit with machine learning. Or run a crazy particle physics simulation. But you only have a lame MacBook. Your gaming rig would do the job, but its fans spin so loudly that your neighbors complained. Solution? Just use ETH's Euler general-purpose supercomputer! Even though they explicitly write that Euler is “not a supercomputer”, supercomputer just sounds cooler than “general purpose” computer.
So-called shareholders (lab groups and ETH departments) invested money to own a reserved percentage of Euler's computing power. However, there is also a slice reserved for us students. The best thing about it? It works without any bureaucracy. Just log in and start.
We will run an example from the scikit-learn website for demonstration. There will be some small changes, because we have no GUI. But first, let’s log in. Important: you can only log in to Euler from within the ETH network or when connected via VPN.
ssh <your nethz-name>@euler.ethz.ch
You will be greeted with a disclaimer that you have to accept by typing Yes the first time. Then we can start by loading the Python module.
module load python/2.7
# get sample script from scikit-learn
wget http://scikit-learn.org/stable/_downloads/plot_image_denoising.py
Now you’ll have to modify the script to make it run without X11. For this, we have to tell matplotlib to write to disk instead of trying to display the images directly. Modify the downloaded file at the top and the bottom to look like this:
# at the top of the file,
# after the long introductory comment
print(__doc__)
from time import time
# those two lines must be inserted here
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np
...
# bottom
# replaced plt.show() by the following line
plt.savefig('plot.png')
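If you want to check the headless setup before touching the scikit-learn example, here is a minimal sketch of the same idea: pick the Agg backend before pyplot is imported, then save the figure instead of showing it. The function name `save_sine_plot` and the file name `sine.png` are made up for this illustration and have nothing to do with the downloaded script.

```python
import os

import matplotlib
matplotlib.use('Agg')  # must happen before pyplot is imported
import matplotlib.pyplot as plt
import numpy as np


def save_sine_plot(path):
    """Draw a sine curve and write it to `path` without opening a window."""
    x = np.linspace(0.0, 2.0 * np.pi, 200)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.set_xlabel('x')
    ax.set_ylabel('sin(x)')
    fig.savefig(path)   # takes the place of plt.show()
    plt.close(fig)      # free the figure's memory
    return path


if __name__ == '__main__':
    out = save_sine_plot('sine.png')
    print('wrote {} ({} bytes)'.format(out, os.path.getsize(out)))
```

If this writes a non-empty PNG on the machine you are logged in to, the same two-line change should work for the denoising script as well.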
When you submit a job for the batch processing system, it will inherit the current environment. As we already loaded the python module, we are now ready to go:
# submit job
bsub "python plot_image_denoising.py"
> Generic job.
> Job <9613719> is submitted to queue <normal.4h>.
# you can now check the status of your job via bjobs
bjobs
> JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
> 9613719 muejo RUN normal.4h euler01 e1096 *oising.py Aug 25 00:34
After some time, a file called lsf.o<jobid> will appear in the directory, next to the plot.png our script generated. That’s it, we’re done!
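If you script your submissions, the job ID in the bsub message tells you which lsf.o<jobid> file to look for. A minimal sketch, assuming the message always has the form shown in the example session above; the helper names `parse_bsub_output` and `lsf_output_file` are made up for this illustration.

```python
import re

# Pattern for the confirmation line printed by bsub in the example above.
JOB_LINE = re.compile(r'Job <(\d+)> is submitted to queue <([^>]+)>')


def parse_bsub_output(text):
    """Return (job_id, queue) extracted from bsub's confirmation message."""
    match = JOB_LINE.search(text)
    if match is None:
        raise ValueError('no job line found in bsub output')
    return match.group(1), match.group(2)


def lsf_output_file(job_id):
    """Name of the report file the batch system writes for this job."""
    return 'lsf.o{}'.format(job_id)


# Sample text copied from the session shown in this post.
sample = 'Generic job.\nJob <9613719> is submitted to queue <normal.4h>.\n'
job_id, queue = parse_bsub_output(sample)
print('{} {} {}'.format(job_id, queue, lsf_output_file(job_id)))
# prints: 9613719 normal.4h lsf.o9613719
```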
- Your job will be killed after 4 hours. You can pass an option to bsub to let it run longer, but it will then wait longer in the queue.
- The same goes for CPU cores (-n 2 uses 2 cores instead of 1) and memory (-R "rusage[mem=2048]" uses 2 GB per core). You can use up to 48 cores at the same time (check using MAX). If you submit several jobs requiring more than 48 cores in total, they will run sequentially. If you submit a single job requiring more than 48 cores, it will probably be stuck in the queue forever.
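To keep these options straight, you could assemble the bsub call with a small helper. This is only a sketch: `build_bsub_command` is a made-up name, the 48-core check just mirrors the limit mentioned above, and the `-W` run-limit flag is standard LSF but is my assumption for the "run longer" option, which this post does not name.

```python
MAX_CORES = 48  # per-user limit mentioned in this post


def build_bsub_command(command, cores=1, mem_mb_per_core=None, wall_time=None):
    """Build the argument list for a bsub call (hypothetical helper).

    cores            -> -n <cores>
    mem_mb_per_core  -> -R "rusage[mem=<MB>]"
    wall_time        -> -W <HH:MM>  (assumed flag, not named in the post)
    """
    if cores < 1:
        raise ValueError('need at least one core')
    if cores > MAX_CORES:
        raise ValueError('a job with more than {} cores would probably '
                         'never leave the queue'.format(MAX_CORES))
    args = ['bsub']
    if cores != 1:
        args += ['-n', str(cores)]
    if mem_mb_per_core is not None:
        args += ['-R', 'rusage[mem={}]'.format(mem_mb_per_core)]
    if wall_time is not None:
        args += ['-W', wall_time]
    args.append(command)
    return args


print(build_bsub_command('python plot_image_denoising.py',
                         cores=2, mem_mb_per_core=2048))
```

The returned list can be handed to `subprocess.call` as is, which avoids shell quoting trouble with the brackets in the `-R` string.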
Further information (only accessible from within ETH network):
If you need some special packages
You can install libraries locally in your home directory via pip.
mkdir -p $HOME/python/lib64/python2.7/site-packages
export PYTHONPATH=$HOME/python/lib64/python2.7/site-packages:$PYTHONPATH
module load python/2.7
# now, install e.g. theano
python -m pip install --install-option="--prefix=$HOME/python" theano
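If an import fails after installing, the usual suspect is a PYTHONPATH that was not exported in the current shell. Here is a minimal sketch for checking that from inside Python; the helper names are made up for this illustration, and the directory layout is the one used in the commands above.

```python
import os
import sys


def local_site_packages(home, version='2.7'):
    """Path of the private site-packages directory used in this post."""
    return os.path.join(home, 'python', 'lib64',
                        'python' + version, 'site-packages')


def is_on_path(directory, path=None):
    """True if `directory` is one of the entries Python searches for imports."""
    entries = sys.path if path is None else path
    target = os.path.normpath(directory)
    return any(os.path.normpath(entry) == target for entry in entries if entry)


if __name__ == '__main__':
    site = local_site_packages(os.path.expanduser('~'))
    state = 'on sys.path' if is_on_path(site) else 'NOT on sys.path'
    print('{}: {}'.format(site, state))
```

If it reports that the directory is not on sys.path, export PYTHONPATH again in that shell before submitting the job, since the job inherits the environment.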
Some packages require Euler module dependencies. For example, if you wanna use h5py, you have to load the hdf5 module before loading the python module:
module load hdf5
module load python/2.7.2
python
>>> import h5py # works!
I am a student who does not know anything about scientific computing, let alone ETH's HPC infrastructure. This blog post just tries to provide some guidance for students. I do not provide any support. Also, I am not responsible if the admins get angry at you because you ran your stuff on a login node.
For D-ITET students
There is an additional resource for D-ITET students. You can run jobs on the idling tardis machines (in the ETZ computer rooms). If you ever wondered, that is the reason for the “do not turn off” labels. The batch system is very similar to the one on Euler. You can find more information in the D-ITET Computing Wiki.