WSC Cluster       



Welcome to the WSC cluster

Overview

The WSC cluster consists of IBM AC-922 nodes (two sockets, each with a 22-core Power9 CPU, plus 6 Volta GPUs). These are the same nodes used in the Summit supercomputer at ORNL.

 

The cluster consists of 57 compute nodes, each with two 22-core CPU sockets and 6 GPUs, plus seven additional nodes dedicated to management functions. Occasionally, a few compute nodes are assigned to specific experiments, so the number of compute nodes available for regular use at a given time might be less than 57.

Management nodes include login and launch nodes; these nodes have two 20-core CPU sockets and 4 GPUs.

Note: This cluster is in the so-called IBM Yellow zone, thus it cannot access any of the internal IBM resources.

The cluster is intended to be used for the following purposes:

  • Advanced research for IBM Research projects.
  • Collaboration with external entities on projects of common interest.
  • Support IBM business units.

Confidential and Proprietary data

  • There should not be any Confidential data stored in WSC; that is, data that requires authorization for its use.
  • There should not be any Personal Sensitive Information (PSI) stored in WSC.
  • There should not be any Export Controlled data stored in WSC.

If in doubt regarding the classification of some data, do not hesitate to ask for clarification.


Requesting an Account

IBM employees

Submit the request form at the link below. Include your GSA ID and the output of the `id` command, so that the admins have your GSA numeric user and group IDs.

http://bgweb.watson.ibm.com/account_request.html   (This link is available only internally at IBM)
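As a small illustration, the numeric IDs requested on the form can be collected with the standard `id` utility. This sketch assumes it is run on a machine where you are logged in with your GSA account; the variable names `uid_num` and `gid_num` are only for illustration.

```shell
# Full identity line: numeric uid, gid and group memberships.
id

# The individual numeric values, in case they are needed separately.
uid_num=$(id -u)   # numeric user ID
gid_num=$(id -g)   # numeric primary group ID
echo "uid=${uid_num} gid=${gid_num}"
```

Pasting the full output of the plain `id` call into the request is usually the simplest option, since it also lists supplementary groups.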

External Parties

COVID-19: If you would like to use this resource for COVID-19 research, please apply here: https://www.xsede.org/covid19-hpc-consortium

Non-IBM (external) parties should request access through an IBM collaborator. In addition to access to WSC itself, external parties need to get access to a VPN for connecting to the IBM Yellow zone.


Connecting to the login node

Internal Users: ssh c699login02.pbm.ihost.com

External Users: ssh c699login02.pok.stglabs.ibm.com


Software

Software is managed through the use of Lua modules. These modules alter your environment variables (e.g. PATH, LD_LIBRARY_PATH, etc.) to point to different versions of software and manage software dependencies. It is recommended that you manage your software choices through the use of these modules, since modifying your path yourself in combination with the use of modules can lead to dependency conflicts.

Documentation: http://lmod.readthedocs.org

Compilers

The system has the following compilers installed: IBM XL, PGI, CLANG, and GNU.

To see all compilers and versions installed on the system, use the `module avail` command. Compilers should be loaded through the use of modules.
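A typical module session might look like the sketch below. The `module` subcommands shown (`avail`, `load`, `list`) are standard Lmod commands, but the module name `gcc` is an assumption; use whatever names `module avail` actually reports on WSC. The guard only keeps the snippet harmless on machines without Lmod.

```shell
# Hypothetical module name; replace with one listed by `module avail`.
compiler_module="gcc"

if type module >/dev/null 2>&1; then
  module avail                    # list all modules installed on the system
  module load "$compiler_module"  # load the chosen compiler into the environment
  module list                     # show what is currently loaded
else
  echo "module command not found; run this on a WSC login node"
fi
```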

Spectrum MPI is based on OpenMPI, and the compiler underlying each MPI wrapper can be changed by exporting the corresponding OpenMPI environment variable:

| Env Variable | Affects |
|--------------|---------|
| `OMPI_CC` | `mpicc` (C compiler) |
| `OMPI_CXX` | `mpicxx` (C++ compiler) |
| `OMPI_FC` | `mpif90` (Fortran compiler) |
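For example, the following sketch points the MPI wrappers at the GNU compilers. It assumes `gcc`, `g++` and `gfortran` are already on your PATH (for instance after loading a compiler module); the choice of GNU here is only illustrative. The `--showme` option is an OpenMPI wrapper flag that prints the underlying compile command without running it.

```shell
# Assumption: GNU compilers are available on PATH.
export OMPI_CC=gcc        # compiler used by mpicc
export OMPI_CXX=g++       # compiler used by mpicxx
export OMPI_FC=gfortran   # compiler used by mpif90

# If the MPI wrappers are available, show which command mpicc would run.
if command -v mpicc >/dev/null 2>&1; then
  mpicc --showme
fi
```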


Running Jobs

IMPORTANT: Passwordless ssh keys to the compute nodes are required.

To run jobs through the CSM queue from a login node:
1. Make sure you have passwordless ssh enabled:

cd ~/.ssh
ssh-keygen -t rsa
cat id_rsa.pub >> authorized_keys

2. Make sure your file permissions are set correctly:

chmod 700 ~/.ssh; chmod 640 ~/.ssh/authorized_keys

3. Verify that your PATH and other environment variables are not pointing at /shared/lsf/.... directories.
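The checks in the steps above can be scripted. The following is a minimal sketch, not an official tool: the helper names `path_is_clean` and `perm_of` are made up for illustration, only PATH is inspected (other environment variables would need the same treatment), and `stat -c` assumes GNU coreutils as found on Linux.

```shell
# Return 0 if no entry of the colon-separated list in $1 starts with /shared/lsf.
path_is_clean() {
  case ":$1" in
    *:/shared/lsf*) return 1 ;;
    *) return 0 ;;
  esac
}

# Print the octal permission bits of a file (GNU stat).
perm_of() {
  stat -c '%a' "$1"
}

if path_is_clean "$PATH"; then
  echo "PATH: ok"
else
  echo "PATH: contains a /shared/lsf entry"
fi

# Report the current modes of the ssh directory and key file, if they exist.
for f in "$HOME/.ssh" "$HOME/.ssh/authorized_keys"; do
  if [ -e "$f" ]; then
    echo "$f: mode $(perm_of "$f")"
  fi
done
```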

 

If you do not have passwordless ssh enabled on the compute nodes, you will get an error from jsrun:

Error: It is only possible to use js commands within a job allocation unless CSM is running
01-30-2019 12:47:46:153 12441 main: Error initializing RM connection. Exiting.

Quick Start hello world examples

The public GitHub repo at [WSC hello world examples](https://github.com/dappelha/summit-scripts/tree/WSC) provides examples of how to create an LSF submission script and how to use jsrun to achieve several popular MPI task layouts.
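To give a rough idea of what such a submission script looks like, here is a minimal sketch that writes a batch file and shows how it would be submitted. It is not taken from the repo above. The `#BSUB` directives and `jsrun` flags follow the conventions documented for Summit; the job name, time limit, layout (6 resource sets with 1 task, 7 cores and 1 GPU each) and the executable `./hello_mpi` are all assumptions, and WSC may require additional options.

```shell
# Write a minimal LSF batch script; every value below is illustrative.
cat > hello.lsf <<'EOF'
#!/bin/bash
# Job name
#BSUB -J hello
# Number of compute nodes
#BSUB -nnodes 1
# Wall-clock limit in minutes
#BSUB -W 10
# Output and error files; %J expands to the job ID
#BSUB -o hello.%J.out
#BSUB -e hello.%J.err

# 6 resource sets, each with 1 MPI task, 7 cores and 1 GPU.
# ./hello_mpi is a hypothetical executable name.
jsrun -n 6 -a 1 -c 7 -g 1 ./hello_mpi
EOF

# Submit from a login node (commented out so this sketch has no side effects):
# bsub < hello.lsf
```

One resource set per GPU is a common starting layout on these nodes; the examples in the repo show other task layouts.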


Reporting problems

IBM employees

Problems with the cluster should be reported through the internal GitHub issues system (https://github.ibm.com/DCS-research/WSC-coral/issues; this link is available only internally at IBM).

External Parties

Communicate the problem to IBM collaborators so they can file an issue through the internal GitHub issues system.


Documentation at ORNL

ORNL has extensive documentation on Summit; some of that documentation is also applicable to WSC: