Jun 2020 / FGCI Summer Kickstart
News
Before the workshop:
See the prerequisites below.
Request a HPC account (see university-specific instructions in prerequisites).
Verify you can connect to your cluster (but if you can’t, the last thing on Monday will a help session to get it working).
Check back here for other updates that don’t get their own email. Minor announcements and past communication are also available at this issue. If you are from Aalto and have technical issues, please post on the issue tracker.
This workshop may be streamed. This is not decided yet, but if it is, it will be at https://twitch.tv/coderefinery.
Part of the Scientific Computing in Practice lecture series at Aalto University.
Audience: All FGCI consortium members looking for the HPC crash course.
About the course:
Summer Kickstart is a three day courses for researchers to get started with the available computational resources at FGCI (Finnish Grid and Cloud Infrastructure, basically HPC, high-performance computing, at universities) and CSC (the Finnish national computing center). On the day one we start with the basic HPC intro, go through the available resources at CSC and then switch to the FGCI sites practicalities. The days two and three we cover one by one steps on how to get started on the local computational clusters: learning by doing with lots of examples and hands-on. In addition, on the last day we will have HTCondor introduction for all interested.
By the end of the course you get the hints, ready solutions and copy/paste examples on how to find, run and monitor your applications, and manage your data. In addition to how to optimize your workflow in terms of filesystem traffic, memory usage etc.
The first FGCI-wide kickstart for all FGCI consortium members, meaning we will try to adapt our material to serve all universities. We’ll have support representatives from several universities. Most of material will be common for all the participants and in addition we organize breaking rooms for different sites (= sort of parallel sessions) when needed. Material is based on previous years Aalto courses.
University specific information:
Aalto: this course is obligatory for all new Triton users and recommended to all interested in scientific computing in general. Basic reference information is at the Triton page
Tampere: this course is recommended for all new Narvi users and also all interested in HPC. Most things should work with simply replacing triton -> narvi. Some differences in configuration are listed in Narvi differences
Practical information
Time, date: Mon 8.6, Tue 9.6, Wed 10.6, 12:00-16:00 EEST
Place: Online: Zoom link will be sent to registered participants.
Lecturering by: Aalto Science IT and CSC people
Registration: registration link. Please register to get the Zoom link and updates in general.
Cost: Free of charge for FGCI consortium members including Aalto employees and students.
Additional course info at: Ivan Degtyarenko, ivan.degtyarenko -at- aalto.fi
Schedule
Schedule: The daily schedule will be adjusted based on the audience; below is the tentative plan. There will be frequent breaks. You will be given time to try and ask, it’s more like an informal help session to get you started with the computing resources.
Day #1 (Mon 8.jun):
11:50-12:00: Joining time and pre-discussion, please join 10 minutes early.
Module #1.1 (15m): Welcome, course details
Module #1.2 (1h): HPC crash course: what is behind the front-end // lecture // HPC fundamentals: terminology, architectures, interconnects, infrastructure behind, as well as MPI vs shared memory // Ivan Degtyarenko // Slides (.pdf)
Module #1.3 (1h): CSC resources overview // lecture with demos // An overview of CSC computing environment and services including Puhti supercomputer, Allas data management solution, Cloud services, notebooks, containers, etc // Jussi Enkovaara and Henrik Nortamo // Slides (.pdf)
Module #1.4 (1h) Gallery of computing workflows // There are more options that just Triton by ssh, like we will learn later. We’ll give an overview of all the ways you can work. // Enrico Glerean
Aalto: Remote workflows at Aalto
Helsinki: Remote access to university resources
Tampere: Tampere workflows
Module #1.5 (.5h): Connecting to the cluster // tutorial // Get connected in preparation for day 2 // Enrico Glerean
Aalto: Connecting to Triton tutorial – if you can ssh to Triton and run
hostname
, you are ready for tomorrow.Helsinki: general information
Tampere: Connecting to Narvi
Day #2 (Tue 9.jun):
Module #2.1 (4h): Getting started on the cluster // tutorial // SLURM basics, software, and storage. Workflow, running and monitoring serial jobs on Triton. Interactively and in batch mode. module and toolchains, special resources like GPU // Richard Darst
-
Every site will have its own ways of connecting. The basic lessons of
ssh
is the same for everyone, but it will have a different hostname and possibly different initial steps (jump hosts).Aalto: (same)
Helsinki: general information
Tampere: Connecting to Narvi. Note, that you will need SSH keys.
-
Each site will be quite different here, so don’t worry about making the exercises work outside of Aalto, but think and prepare for what comes next (where we’ll explain the differences).
-
In other sites, you should
module load fgci-common
to be able to make the Aalto modules available. Other specifics, such asmatlab
, won’t directly work.
-
Aalto: (same)
Helsinki: general information
Tampere: Narvi storage
This topic is very site-specific. The general principles will apply everywhere, but the exact paths/servers will vary.
-
The basic Slurm concepts are the same across all clusters (at least all those that use Slurm, but that is everyone in Finland). However, partition names may be different. You can list partitions at your site using
sinfo -O partition
and list nodes at your site withsinfo -N
. How these work will vary depending on your site - definitely read up on this.
-
Day #3 (Wed 10.jun):
Module #3.1 (2h): Advanced SLURM and cluster usage // tutorial // Running in parallel with MPI and OpenMP, array jobs, running on GPU with
--gres
, local drives, constraints // Simo Tuomisto-
Aalto: (same as above)
Helsinki: general information
Tampere: Narvi GPU computing differences
At other sites, you may need to use
-p gpu
in addition to--gres=gpu
.
Module #3.2 (1.5h): HTCondor (at Aalto) // lecture with demos // Did you know that department workstations can be used for distributed computing? HTCondor lets you // Matthew West
Python Bindings
Quickstart: Submitting and Managing Jobs
Making a workflow: Diamond DAG
Prerequisites
Participants will be provided with either access to their university’s cluster or Triton for running examples.
You should have an account on your university’s HPC cluster:
Aalto: if you do not yet have access to Triton, request an account in advance.
Helsinki: Account notes at the bottom of this page
Tampere: your cluster will require ssh keys to connect.
Others: Aalto will provide you with a guest Triton account, check back for more information.
Participants are expected to have a SSH client installed (for options, see the Triton connecting tutorial for examples).
You should install Zoom. Hints on installation.
If you aren’t familiar with the Linux shell, read the crash course or watch the video.
Try to get connected to your cluster in advance. We have some time scheduled for this, but you need to also try in advance, or else we can’t keep up.
Aalto: connecting to Triton
Helsinki: general information
Tampere: Connecting to Narvi
Other preparation
How to attend this course:
Take this seriously. There is a lot of material and hands-on exercises. Don’t overbook your time, don’t skip hands-on parts, and come prepared.
You will be given a Zoom link to join. Join each session 10 minutes early.
Join with a name of “(University) First Last”, e.g. “(Aalto) Richard Darst”. This will help us to put people into university-specific breakout rooms.
There will be a <HackMD.io> document sent to all participants. This is for communication an asking questions.
Always write new questions or comments at the bottom of the document.
Moderators will follow the developments, and answer questions and comments. You may get several answers from different perspectives, even. Our focus is the bottom, but we will scan the whole document and keep it organized.
The final document (excluding personal data and questions about individual circumstances) will be published as the notes at the end.
Streaming
This workshop may be streamed, so that anyone can follow along. We are still deciding if we will do this, but if we do it will be at the CodeRefinery Twitch stream, https://www.twitch.tv/coderefinery.