WSC Cluster - User's Guide


Connecting to WSC

There are two login nodes in WSC: c699login01 and c699login02.

Connecting to a login node is done as follows:

  • IBM internal user: ssh c699login02.pbm.ihost
  • External user: ssh c699login02.pok.stglabs.ibm.com

From the login node, jobs are launched into a CSM queue using LSF/JSM commands -- see the quick start "hello world" example below.


Running Jobs

IMPORTANT: Passwordless ssh to the compute nodes is required

To run jobs through the CSM queue from a login node:

  1. Ensure passwordless ssh is enabled (a verification sketch follows this list):
cd ~/.ssh
ssh-keygen -t rsa
cat id_rsa.pub >> authorized_keys
  2. Set the file permissions correctly:
chmod 700 ~/.ssh; chmod 640 ~/.ssh/authorized_keys
  3. Verify that PATH and other environment variables are not pointing at /shared/lsf/.... directories
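A quick way to verify the key setup, assuming the login nodes share your home directory, is to ssh back into a login node non-interactively; it should print the hostname without asking for a password:

# BatchMode=yes makes ssh fail immediately instead of prompting for a password.
ssh -o BatchMode=yes c699login01 hostname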

If passwordless ssh is not enabled on the compute nodes, jsrun reports an error like the following:

Error: It is only possible to use js commands within a job allocation unless CSM is running
01-30-2019 12:47:46:153 12441 main: Error initializing RM connection.

Quick Start hello world examples

The public github repo WSC hello world examples shows how to create an LSF submission script and how to use jsrun to achieve several popular MPI task layouts.
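For orientation, a minimal submission script might look like the sketch below; the queue name, resource counts, walltime, and binary name are placeholders, so check the repo above and bqueues before reusing them.

#!/bin/bash
#BSUB -J hello            # job name
#BSUB -o hello.%J.out     # output file (%J expands to the job id)
#BSUB -q batch            # queue name: placeholder, check bqueues
#BSUB -nnodes 1           # number of nodes
#BSUB -W 10               # walltime in minutes

# Launch 4 ranks, each with 1 GPU and 7 cores (resource-set flags are illustrative).
jsrun -n 4 -a 1 -c 7 -g 1 ./hello_world_mpi

Submit the script with bsub (e.g. bsub hello.lsf or bsub < hello.lsf, depending on how the site configures LSF).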


LSF - the job submission tool on WSC

LSF is the batch submission and job scheduling tool on WSC. The above quick start example scripts combine LSF and jsrun commands into a "hello world" example. Extensive documentation about LSF is available here:

Commonly used LSF commands:

LSF is entirely command line driven, with no supplied GUI. LSF Platform HPC normally supplies a GUI for LSF, but Platform HPC is not deployed in WSC.

  • lsid : displays the current LSF version number, the cluster name, and the master host name
  • lshosts : displays hosts and their static resource information
  • bhosts : displays hosts and their static and dynamic resources
  • lsload : displays load information for hosts
  • bjobs : displays and filters information about LSF jobs. Specify one or more job IDs (and, optionally, an array index list) to display information about specific jobs (and job arrays)
  • bqueues : displays information about queues
  • bsub : submits a job to LSF by running the specified command and its arguments
  • brsvs : displays advance reservations (see lsf admin page for info on creating reservations)
  • bmgroup : displays information about host groups and compute units
  • bkill: sends signals to kill, suspend, or resume unfinished jobs

Additional documentation on commands can be found here: LSF commands

LSF setup

Before running LSF commands, you must source the LSF environment. There are two ways of doing this:

  • Place the following code in the .bash_profile file in the home directory.
# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi
if [ -f /etc/bash.bashrc ]; then   #variation for ubuntu
    . /etc/bash.bashrc
fi

This will pick up the global profile for LSF.

  • Source the LSF profile directly, either in your user profile or by hand:
source /opt/ibm/spectrumcomputing/lsf/conf/profile.lsf
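Either way, a quick check that the environment was picked up is to run one of the commands listed above:

# Prints the LSF version, cluster name, and master host if the profile was sourced correctly.
lsid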

Checking on a submitted job

  • bjobs will show a list of current jobs and their status.
  • To check the pending reason for a specific job: bjobs -p <jobid>
  • Is the queue busy/open? bqueues
  • What jobs are scheduled on the machine? bjobs -u all

Checking on a running job

  • bpeek will show the tail of standard out and standard error (jobid is optional): bpeek <jobid> | less
  • Something seems wrong; can I check into a job further? bjobs -WP <jobid> | less
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME   %COMPLETE
220933  ngawand RUN   batch      login1      batch1      d48_c      Nov 28 17:40  32.62% L 
                                             g36n12
                                             g36n12
                                             ...
  • Users can ssh to a compute node (only while the job is running): ssh g36n12

jsrun - the tool for launching mpi applications on WSC

ORNL has a very useful tool for visualizing task placement and affinity when using jsrun:

https://jsrunvisualizer.olcf.ornl.gov/

The visualization tool has some known limitations, such as not yet working well with hardware threading and SMT modes other than SMT4, but it usually fails gracefully with an error message.

Useful jsrun flags

  • Send a kill signal to processes on an MPI failure (helps kill your job instead of hanging it).
jsrun -X 1 <further commands>
  • Prepend the rank id to the output.
jsrun --stdio_mode prepended
  • If Spectrum MPI arguments are needed (e.g. the -async argument when using one-sided communication):
jsrun --smpiargs="-async"
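These flags can be combined on a single jsrun invocation. A sketch, where the resource-set flags (-n/-a/-c/-g) and the binary name are illustrative rather than WSC defaults:

# 4 resource sets, 1 rank + 1 GPU + 7 cores each; kill the job on MPI failure
# and prefix each line of output with the rank id.
jsrun -n 4 -a 1 -c 7 -g 1 -X 1 --stdio_mode prepended ./a.out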

Troubleshooting and Debugging

  • You can ssh to a compute node (only while the job is running): ssh g36n12
  • Use gstack to get the call stack of each thread (use top to get the pid): gstack <pid>
  • See if the GPUs have something currently running: nvidia-smi. Example output with idle GPUs:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.58                 Driver Version: 396.58                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   34C    P0    36W / 300W |      0MiB / 16128MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+                
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
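Putting these steps together, a typical interactive debugging session might look like the following sketch (node name, binary name, and pid are placeholders):

ssh g36n12          # hop onto a compute node owned by your running job
top -u $USER        # find the pid of the suspect process
gstack 12345        # dump the call stack of every thread (12345 = pid from top)
nvidia-smi          # confirm whether the GPUs are actually busy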

Optimizing and Profiling

MPI Profiling

Spectrum MPI includes a lightweight MPI tracing library called libmpitrace that was developed by IBM Research to profile critical benchmarks. The library has virtually no performance cost and provides MPI statistics such as a histogram of MPI times per rank, a breakdown of time spent in each call, message size statistics, and rank-to-node correlation. We highly recommend that all users use this library; it can be preloaded in your job script with no recompiling needed.

To use the library, simply include the following in your submission script:

export OMPI_LD_PRELOAD_POSTPEND=$MPI_ROOT/lib/libmpitrace.so
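In context, the preload line sits next to the jsrun launch in your submission script. A sketch, assuming MPI_ROOT is set by the Spectrum MPI module and that the trace reports are written to the working directory at the end of the run:

# Preload the IBM tracing library for every rank (no recompile needed).
export OMPI_LD_PRELOAD_POSTPEND=$MPI_ROOT/lib/libmpitrace.so
jsrun -n 4 ./a.out
# After the run, look for the per-rank text reports (typically named mpi_profile.*).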

GPU Profiling Basics

NVIDIA's visual profiler and the command line driven nvprof are the recommended profiling tools. To get a basic command line profile summary of the time spent in each GPU kernel:

nvprof -s myexe.exe

nvprof also offers GPU metrics that can be examined in text format or imported into the visual profiler along with a timeline. If you need help understanding metrics please start a github issue or contact your IBM collaborator to draw on the community knowledge.

The NVIDIA Visual Profiler offers a GUI to visualize and navigate timelines. To use the NVIDIA Visual Profiler, the best technique is to

  1. Download NVIDIA Visual Profiler to your local computer (part of the CUDA toolkit). Make sure to download the version corresponding to the version of CUDA currently being used on WSC.

  2. Use nvprof -f -o myprofile.nvp ./executable.exe in your job script on WSC to create a profile file named myprofile.nvp (-f overwrites an existing file, -o names the output).

  3. Move myprofile.nvp to your local computer and use the Visual Profiler locally to view the timeline.

NOTE: You can try to skip step 1 and 3 and instead use X forwarding and the Visual Profiler that is installed on the login node, but you will experience high latency when trying to navigate the profile.
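A sketch of steps 2 and 3, assuming a single-rank run; the file name, host name, and jsrun flags are placeholders:

# In the job script on WSC: profile one rank and write the timeline to a file.
jsrun -n 1 -g 1 nvprof -f -o myprofile.nvp ./executable.exe

# On your local machine: copy the profile back and open it in the Visual Profiler (nvvp).
scp <wsc-login-host>:/path/to/myprofile.nvp .
nvvp myprofile.nvp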

GPU Profiling at Scale

Generating a profile file for each MPI rank of a large simulation will slow the application to a crawl, and the run will probably never finish.

Instead, you can generate a profile for just one of the ranks of a large simulation to get an idea of what is happening on a typical rank.

Have jsrun launch a helper script that runs profiler+application when the rank matches PROFILE_RANK and otherwise launches your application without profiling:

export PROFILE_RANK=1
export PROFILE_PATH="/gpfs/path_to_your_directory"
jsrun <flags> profile_helper.sh a.out

Contents of profile_helper.sh:

#!/bin/bash
# profile_helper.sh: profile only the rank matching PROFILE_RANK; other ranks run unprofiled.
if [ "$PMIX_RANK" == "$PROFILE_RANK" ]; then
    # Write the profile into the directory given by PROFILE_PATH (file name is illustrative).
    nvprof -f -o "$PROFILE_PATH/myprofile.nvp" "$@"
else
    "$@"
fi

Software

Using Modules to Load Software

Software is managed through Lua-based (Lmod) modules. These modules alter environment variables such as PATH and LD_LIBRARY_PATH to point to different versions of software and to manage software dependencies. It is recommended that you manage your software choices through these modules, since modifying your paths by hand in combination with modules can lead to dependency conflicts.

  • module list (or module l) : lists the modules currently loaded by the user
  • module avail : displays modules available for loading
  • module show <modulefile> : shows the commands in the module file (i.e. which paths are set)
  • module whatis <modulefile> : prints a short description of the modulefile
  • module unload <modulefile> : unloads the modulefile
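A typical sequence, where the module names are illustrative and the exact names/versions should be taken from module avail:

module avail          # see what is installed
module load xl        # load a compiler module (name is illustrative)
module list           # confirm what is now loaded
module unload xl      # drop it again when finished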

Documentation: http://lmod.readthedocs.org

Compilers

The system has the following compilers installed: IBM XL, PGI, CLANG, and GNU.

To see all compilers and versions installed on the system, use the module avail command. Compilers should be loaded through the use of modules.

Changing underlying MPI compiler

The MPI compilers are actually wrapper scripts that invoke a normal compiler plus the link lines for the MPI libraries. To see which compiler is actually used when you call an mpi* wrapper, use the --showme argument, e.g.

-bash-4.2$ mpicc --showme
xlc_r -I/opt/ibm/spectrum_mpi/include -pthread -L/opt/ibm/spectrum_mpi/lib -lmpiprofilesupport -lmpi_ibm

Spectrum MPI is based on Open MPI, and the underlying compiler can be changed by exporting the corresponding Open MPI environment variable:

  • OMPI_CC : mpicc (C compiler)
  • OMPI_CXX : mpicxx (C++ compiler)
  • OMPI_FC : mpif90 (Fortran compiler)

For example, to use the gcc compiler instead of the default xlc_r compiler:

export OMPI_CC=gcc
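The same pattern applies to the C++ and Fortran wrappers, and --showme confirms the switch took effect. A sketch, assuming the GNU compilers are available in your environment:

# Point all three Spectrum MPI wrappers at the GNU toolchain.
export OMPI_CC=gcc
export OMPI_CXX=g++
export OMPI_FC=gfortran

# Verify: the wrapper should now report gcc instead of xlc_r.
mpicc --showme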

Changing underlying nvcc compiler

You can change which compiler nvcc uses with the -ccbin flag, for example:

nvcc -ccbin xlc_r
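For instance, a full compile line using xlc_r as the host compiler might look like this sketch (source and output names are placeholders):

# Compile a CUDA source file, using xlc_r for the host-side code.
nvcc -ccbin xlc_r -O3 -o saxpy saxpy.cu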