WSC Cluster - User's Guide


Software

Using Modules to Load Software

Software is managed through Lua modules (Lmod). These modules alter user environment variables (e.g. PATH, LD_LIBRARY_PATH, etc.) to point to particular software versions and to manage software dependencies. It is recommended to manage software choices through these modules rather than by editing paths by hand, since modifying the path in combination with the use of modules can lead to dependency conflicts.

Documentation:

- https://lmod.readthedocs.io/en/latest/


Compilers

The system has the following compilers installed: IBM XL, PGI, CLANG, and GNU.

To see all compilers and versions installed on the system, use the `module avail` command. Compilers should be loaded through the use of modules.
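A typical session looks like the following (the gcc module name is illustrative; use whatever `module avail` actually reports on WSC):

```shell
module avail          # list all software and versions installed on the system
module avail gcc      # restrict the listing to modules matching "gcc"
module load gcc       # load the default version of that module (name assumed)
module list           # show which modules are currently loaded
```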


Changing underlying MPI compiler

The MPI compiler wrappers (mpicc, mpicxx, mpif90) are actually scripts that invoke an ordinary compiler and add the link lines for the MPI libraries. To see what a wrapper actually invokes, use the --showme argument, e.g.

bash$ mpicc --showme
xlc_r -I/opt/ibm/spectrum_mpi/include -pthread -L/opt/ibm/spectrum_mpi/lib -lmpiprofilesupport -lmpi_ibm

Spectrum MPI is based on Open MPI, so the underlying compiler can be changed by exporting the corresponding Open MPI environment variable.

Env Variable   Affects
OMPI_CC        mpicc  (C compiler)
OMPI_CXX       mpicxx (C++ compiler)
OMPI_FC        mpif90 (Fortran compiler)

For example, to use the gcc compiler instead of the default xlc_r compiler,

export OMPI_CC=gcc
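To switch the whole toolchain to GNU rather than just the C compiler, the other wrappers can be redirected the same way (a sketch; the GNU compilers must already be in your PATH, e.g. via module load):

```shell
# point all three MPI wrappers at the GNU toolchain
export OMPI_CC=gcc
export OMPI_CXX=g++
export OMPI_FC=gfortran
```

Running `mpicc --showme` afterwards should now report gcc instead of xlc_r.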

Changing underlying nvcc compiler

The host compiler used by nvcc can be changed with the -ccbin flag, for example:

nvcc -ccbin xlc_r

Running Jobs

IMPORTANT: Password-less ssh keys to the compute nodes are required

To run jobs through the CSM queue from a login node:
1. Make sure to have passwordless ssh enabled

cd ~/.ssh
ssh-keygen -t rsa
cat id_rsa.pub >> authorized_keys

2. File permissions must be set correctly:

chmod 700 .ssh; chmod 640 .ssh/authorized_keys


3. Verify that PATH and other environment variables are not pointing at /shared/lsf/.... directories
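Steps 1 and 2 above can be combined into a one-shot, non-interactive sketch (it generates an RSA key with an empty passphrase if none exists yet):

```shell
#!/bin/bash
# One-shot password-less ssh setup: generate a key (if none exists),
# authorize it, and set the permissions sshd requires.
set -e
mkdir -p "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" >/dev/null
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 640 "$HOME/.ssh/authorized_keys"
echo "password-less ssh configured"
```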

If passwordless ssh is not enabled on the compute nodes, an error from jsrun will be reported:

Error: It is only possible to use js commands within a job allocation unless CSM is running
01-30-2019 12:47:46:153 12441 main: Error initializing RM connection. Exiting.

Quick Start hello world examples

The public github repo at [WSC hello world examples](https://github.com/dappelha/summit-scripts/tree/WSC) provides examples of how to create a LSF submission script and how to use jsrun to achieve several popular MPI task layouts.


LSF - the job submission tool on WSC

LSF is the batch submission and job scheduling tool on WSC. The quick start example scripts above combine LSF and jsrun commands into a "hello world" example. Extensive documentation about LSF is available from IBM.

LSF commonly used commands:

LSF is entirely command-line driven, with no supplied GUI. IBM Platform HPC normally supplies a GUI for LSF, but Platform HPC is not deployed on WSC.

  • lsid : displays the current LSF version number, the cluster name, and the master host name
  • lshosts : displays hosts and their static resource information
  • bhosts : displays hosts and their static and dynamic resources
  • lsload : displays load information for hosts
  • bjobs : displays and filters information about LSF jobs. Specify one or more job IDs (and, optionally, an array index list) to display information about specific jobs (and job arrays)
  • bqueues : displays information about queues
  • bsub : submits a job to LSF by running the specified command and its arguments
  • brsvs : displays advance reservations (see lsf admin page for info on creating reservations)
  • bmgroup : displays information about host groups and compute units
  • bkill: sends signals to kill, suspend, or resume unfinished jobs

Additional documentation on commands can be found here: LSF commands

LSF setup

Before you can run LSF commands, you need to source the LSF environment. There are two ways of doing this:

  • Place the following code in the .bash_profile file in the home directory.
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
if [ -f /etc/bash.bashrc ]; then   # variation for Ubuntu
    . /etc/bash.bashrc
fi

This will pick up the global profile for LSF.

  • Source the LSF profile directly, either in the user profile or by hand:
source /opt/ibm/spectrumcomputing/lsf/conf/profile.lsf

Checking on a submitted job

  • bjobs will show a list of current jobs and their status.
  • To check pending reason of a specific job, bjobs -p <jobid>
  • Is the queue busy/open? bqueues
  • What jobs are scheduled on the machine? bjobs -u all

Checking on a running job

  • bpeek will show the tail of standard out and standard error (jobid is optional): bpeek <jobid> | less
  • Something seems wrong, can I check into a job further? bjobs -WP <jobid> | less
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME   %COMPLETE
220933  ngawand RUN   batch      login1      batch1      d48_c      Nov 28 17:40  32.62%
                                             g36n12
                                             g36n12
                                             ...
  • The user can ssh to a compute node (only while the job is running): ssh g36n12

jsrun - the tool for launching mpi applications on WSC

ORNL has a very useful tool for visualizing task placement and affinity when using jsrun:

https://jsrunvisualizer.olcf.ornl.gov/

The visualization tool has some known limitations, such as not yet handling hardware threading or SMT modes other than SMT4 well, but it usually fails gracefully with an error message.

Useful jsrun flags

  • Send a kill signal to all processes on an MPI failure (helps kill your job instead of hanging it).
jsrun -X 1 <further commands> 
  • Prepend the rank id to the output
jsrun --stdio_mode prepended
  • If Spectrum MPI arguments are needed (e.g. the -async argument when using one-sided communication)
jsrun --smpiargs="-async"

Troubleshooting and Debugging

  • You can ssh to a compute node (only while job is running) ssh g36n12
  • Use gstack to get call stack of each thread (use top to get pid) gstack <pid>
  • See if the GPUs have something currently running: nvidia-smi. Example idle GPUs:
 +-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.58                 Driver Version: 396.58                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   34C    P0    36W / 300W |      0MiB / 16128MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+                
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Optimizing and Profiling

MPI Profiling

Spectrum MPI includes a lightweight MPI tracing library called libmpitrace that was developed by IBM Research to profile critical benchmarks. The library has virtually no performance cost and provides MPI statistics such as a histogram of MPI times per rank, a breakdown of time spent in each call, message size statistics, and rank-to-node correlation. We highly recommend that all users use this library, which can easily be preloaded in your job script (no recompiling needed).

To use the library, simply include the following in your submission script:

export OMPI_LD_PRELOAD_POSTPEND=$MPI_ROOT/lib/libmpitrace.so
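In a job script, the preload line goes before the jsrun launch; a sketch (the jsrun flags and executable name are placeholders for your own layout):

```shell
# preload IBM's MPI trace library for the whole run (no recompile needed)
export OMPI_LD_PRELOAD_POSTPEND=$MPI_ROOT/lib/libmpitrace.so
jsrun -n 12 -a 1 -g 1 -c 7 ./a.out
```

The per-rank trace summaries (typically named mpi_profile.*) appear in the working directory after the run.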

GPU Profiling Basics

NVIDIA's visual profiler and the command line driven nvprof are the recommended profiling tools. To get a basic command line profile summary of the time spent in each GPU kernel:

nvprof -s myexe.exe

nvprof also offers GPU metrics that can be examined in text format or imported into the visual profiler along with a timeline. If you need help understanding metrics please start a github issue or contact your IBM collaborator to draw on the community knowledge.

The NVIDIA Visual Profiler offers a GUI to visualize and navigate timelines. To use the NVIDIA Visual Profiler, the best technique is to

  1. Download NVIDIA Visual Profiler to your local computer (part of the CUDA toolkit). Make sure to download the version corresponding to the version of CUDA currently being used on WSC.

  2. Use nvprof -f -o myprofile.nvp ./executable.exe in your job script on WSC to create a profile file named myprofile.nvp.

  3. Move myprofile.nvp to your local computer and use the Visual Profiler locally to view the timeline.

NOTE: You can try to skip steps 1 and 3 and instead use X forwarding and the Visual Profiler that is installed on the login node, but you will experience high latency when trying to navigate the profile.

GPU Profiling at Scale

Generating a profile file for each MPI rank of a large simulation will slow the application to a crawl, and the run will probably never finish.

Instead, you can generate a profile for just one of the ranks of a large simulation to get an idea of what is happening on a typical rank.

Have jsrun launch a helper script that runs the profiler plus the application when the rank matches PROFILE_RANK, and otherwise launches your application without profiling:

export PROFILE_RANK=1
export PROFILE_PATH="/gpfs/path_to_your_directory"
jsrun <flags> profile_helper.sh a.out

Contents of profile_helper.sh:

#!/bin/bash
if [ "$PMIX_RANK" == "$PROFILE_RANK" ]; then
    nvprof -f -o "$PROFILE_PATH" "$@"
else
    "$@"
fi
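The rank-selection logic can be tried off-cluster with a stand-in for nvprof (echo here; PMIX_RANK is normally set by jsrun on WSC, so this sketch fakes it per call):

```shell
#!/bin/bash
# Sketch of the rank-selection pattern, with echo standing in for nvprof.
profile_helper() {
    if [ "$PMIX_RANK" == "$PROFILE_RANK" ]; then
        echo "profiling rank $PMIX_RANK:" "$@"   # nvprof would wrap "$@" here
    else
        "$@"
    fi
}
export PROFILE_RANK=1
PMIX_RANK=0 profile_helper echo "hello from rank 0"   # runs unprofiled
PMIX_RANK=1 profile_helper echo "hello from rank 1"   # this rank gets "profiled"
```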

Running pytorch/torchvision

1. Load the appropriate module. This will set up the environment:

module avail                          # lists the available modules
module load python/2.7.15-anaconda2   # loads conda2, the only environment found with pytorch

In order to load the latest pytorch and torchvision optimized for power, look here
https://researcher.watson.ibm.com/researcher/view_group.php?id=10068

A conda environment can be built with pytorch 1.1.0 and torchvision 0.2.2 (currently the latest torchvision release available for linux-ppc64le).

1.1 First add the PowerAI channel to the Conda configuration by running the following command (look here for more info https://www.ibm.com/support/knowledgecenter/SS5SF7_1.6.0/navigation/pai_install.htm):

conda config --prepend channels \
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

1.2 Then create the conda environment which installs the PowerAI toolkit (including pytorch, torchvision and a bunch of other packages)

The WSC cluster already offers a module that includes PowerAI, namely python/2.7.15-anaconda2. In order to load this module and setup the Anaconda environment, create a PowerAI_setup.sh file containing the following commands:

module load python/2.7.15-anaconda2
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/shared/anaconda2/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/shared/anaconda2/etc/profile.d/conda.sh" ]; then
        . "/shared/anaconda2/etc/profile.d/conda.sh"
    else
        export PATH="/shared/anaconda2/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

Note that you will have to source this PowerAI_setup.sh file at the beginning of every new session

1.3 You can customize the python version (the default is 2.7) by creating your own conda environment

conda create -n powerai_env python=3.6 powerai

1.4 Remember to add a conda activate powerai_env at the top of the .sh job scripts intended to run on each node (for example, cifar10_train.sh)

P.S.: you can check the example jobsub_example_powerai_env.sh, which runs test_conda_env.sh on a single node

2. Setup Password-less ssh keys

cd ~/.ssh
ssh-keygen -t rsa
cat id_rsa.pub >> authorized_keys
chmod 700 .ssh; chmod 640 .ssh/authorized_keys

3. Prepare a job following the template shown below, and submit the job using bsub

Following the template, you create a job submission script in which you specify the number of nodes, GPUs per node, etc., and the command you want to execute on those nodes. The script generates a batch.job file, and that file gets submitted to the queue using bsub.

./jobsub_example_simple.sh    # runs the nvidia-smi command (contained in test_gpu.sh) on the nodes you allocated

./jobsub_example_cifar10.sh   # trains on cifar10 on the nodes you allocated (by running cifar10_train.sh)
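A minimal batch.job of the kind such a template generates might look like the following (the job name, node count, wall-clock limit, and jsrun layout are assumptions to adapt to your allocation):

```shell
#!/bin/bash
#BSUB -J cifar10        # job name
#BSUB -nnodes 2         # number of nodes
#BSUB -W 00:30          # wall-clock limit (hh:mm)
#BSUB -o %J.out         # stdout log (%J expands to the job id)
#BSUB -e %J.err         # stderr log

# 2 nodes x 6 resource sets: 1 task, 1 GPU, 7 cores per set (example layout)
jsrun -n 12 -a 1 -g 1 -c 7 ./cifar10_train.sh
```

Submit it with `bsub < batch.job`.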

4. Monitor the job

bjobs
bjobs -WP <jobnumber> | less
bpeek <jobnumber> | less
bqueues

You can also check the `<jobnumber>.err` and `<jobnumber>.out` log files