diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 0000000..e69de29 diff --git a/404.html b/404.html new file mode 100644 index 0000000..0b59602 --- /dev/null +++ b/404.html @@ -0,0 +1,2196 @@ + + + +
+ + + + + + + + + + + + + + +This documentation guides users through the process of accessing CSCS systems and services.
+Before accessing CSCS, you need to have an account at CSCS, and be part of a project that has been allocated resources. +More information on how to get an account is available in accounts and projects.
+Multi Factor Authentication
+Before signing in to CSCS' web portals or using SSH, all users have to set up multi-factor authentication (MFA).
+ +Web Services
+Before signing in to CSCS' web portals or using SSH, all users have to set up multi-factor authentication (MFA).
+ +SSH Access
+Logging into Clusters on Alps
+ +VSCode
+How to connect VSCode IDE on your laptop with Alps
+ +To access CSCS services and systems users are required to authenticate using multi-factor authentication (MFA). +MFA is implemented as a two-factor authentication, where one factor is the login and password pair ("the thing you know") and the other factor is the device which generates one-time passwords (OTPs, "the thing you have"). +In this way security is significantly improved compared to single-factor (password only) authentication.
+The MFA workflow uses a time-based one-time password (OTP) to verify identity. +An OTP is a six-digit number which changes every 30 seconds. +OTPs are generated using a tool installed on a device other than the one used to access CSCS services and infrastructure. +We recommend using a smartphone with an application such as Google Authenticator to obtain the OTPs.
+ +When you first log in to any of the CSCS web applications such as UMP, Jupyter, etc., you will be asked to register your device.
+First, you will be asked to provide a code that you received by email. +After this validation step, you will need to scan a QR code with your mobile phone using an application such as Google Authenticator. +Lastly, you will need to enter the OTP from the authenticator application to complete the registration of your device. +From then on, two-factor authentication will be required to access CSCS services and systems. +A more detailed explanation of the registration process is provided in the next section.
+Warning
+It is not possible to log in to CSCS systems using SSH without registering a device and creating certified SSH keys. +See below for details on generating certified SSH keys.
+CSCS supports authenticators that follow an open standard called TOTP. +The recommended way to access such an authenticator is to install an application on your mobile phone. +Google Authenticator and FreeOTP have been tested successfully; however, if you are using a different mobile application for OTPs, feel free to continue using it, provided it supports the TOTP standard.
+You can download Google Authenticator for your phone:
+Before starting, ensure that the following pre-requisites are satisfied
+Note
+If you try to access any of our web applications without setting up MFA, you will be redirected to enroll in MFA.
+Warning
+If you try to SSH to CSCS systems without setting up MFA, you will receive a permission denied error, for example: +
+Steps:
+account.cscs.ch
, Jupyter, etc., in a new browser session, which will redirect you to the CSCS login page.
+Todo
+do we need the images from KB?
+If you lose access to your mobile device or authenticator OTP, you can reset the OTP by following the self-service process below.
+Warning
+When replacing your smartphone remember to sync the authenticator app before resetting the old smartphone. +Otherwise, you will have to follow this process.
+Before accessing CSCS clusters using SSH, first ensure that you have created a user account that is part of a project that has access to the cluster, and have multi-factor authentication configured.
+ +It is not possible to authenticate with a username/password and user-created SSH keys. +Instead, it is necessary to use a certified SSH key created using the CSCS SSHService.
+Note
+Keys are valid for 24 hours, after which a new key must be generated.
+Warning
+The number of certified SSH keys is limited to five per day. +Once you have reached this number you will not be able to generate new keys until at least one of these keys expires or is revoked.
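Because certified keys expire after 24 hours and only a few can be issued per day, it can help to check when the current certificate stops being valid before requesting a new one. The sketch below is one way to do that, assuming the certificate is stored as `~/.ssh/cscs-key-cert.pub` and that `ssh-keygen -L` prints a `Valid: from <start> to <end>` line. The helper name `cert_expiry` is hypothetical and not part of any CSCS tool.

```shell
# cert_expiry (hypothetical helper): read `ssh-keygen -L` output on stdin and
# print the end of the certificate validity window, i.e. the fifth field of
# the "Valid: from <start> to <end>" line.
cert_expiry() {
    awk '/Valid:/ { print $5 }'
}
```

For example, `ssh-keygen -L -f ~/.ssh/cscs-key-cert.pub | cert_expiry` prints the expiry timestamp of the installed certificate.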
+There are two methods for generating SSH keys using the SSHService: the SSHService web app or a command-line script.
+On Linux and macOS, the SSH keys can be generated and automatically installed using a command-line script.
+This script is provided in pure Bash and in Python.
+Python 3 is required together with packages listed in requirements.txt
provided with the scripts.
Note
+We recommend using a virtual environment for Python.
+If this is the first time you are using the script, download the SSH service scripts from the CSCS GitHub:
+ +The next step is to use either the bash or python scripts:
+The first time you use the script, you can set up a python virtual environment with the dependencies installed:
+ +Thereafter, activate the venv before using the script:
+ +For both approaches, follow the on-screen instructions that require you to enter your username, password and the six-digit OTP from the authenticator app on your phone.
+The script generates the key pair (cscs-key
and cscs-key-cert.pub
) in your ~/.ssh
path:
Access the SSHService web application at sshservice.cscs.ch.
+Once generated, the keys need to be copied from where your browser downloaded them to your ~/.ssh
path, for example:
+
mv /download/location/cscs-key-cert.pub ~/.ssh/cscs-key-cert.pub
+mv /download/location/cscs-key ~/.ssh/cscs-key
+chmod 0600 ~/.ssh/cscs-key
+
Once the key has been generated using either the CLI or web interface above, it is strongly recommended that you add a password to the generated key using the ssh-keygen tool.
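As a minimal sketch of that step, `ssh-keygen -p` changes the passphrase of an existing private key in place. The wrapper name `add_key_passphrase` is hypothetical, and the default path assumes the key was installed as `~/.ssh/cscs-key`.

```shell
# add_key_passphrase (hypothetical helper): set or change the passphrase of a
# private key in place. With no argument it targets ~/.ssh/cscs-key.
# ssh-keygen prompts interactively for the old and new passphrase.
add_key_passphrase() {
    key="${1:-$HOME/.ssh/cscs-key}"
    ssh-keygen -p -f "$key"
}
```

Running `add_key_passphrase` after each key generation means the key on disk is never stored unprotected for long.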
+ +To ensure secure access, CSCS requires users to connect through the designated jump host Ela (ela.cscs.ch
) before accessing any cluster.
Before trying to log into your target cluster, you can first check that the SSH key generated above can be used to access Ela: +
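One way to run this check is sketched below. It assumes the key is stored as `~/.ssh/cscs-key` and that running a single remote command (`hostname`) on Ela is permitted. The helper name `check_ela` is hypothetical.

```shell
# check_ela (hypothetical helper): attempt a one-off login to the jump host
# with the certified key and print the remote host name on success.
# Usage: check_ela <cscs-username>
check_ela() {
    ssh -i "$HOME/.ssh/cscs-key" "$1@ela.cscs.ch" hostname
}
```

If the command prints a host name, the key and certificate are accepted by Ela. A permission denied error usually means the key is missing, expired, or not the one passed with `-i`.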
+To log into a target system at CSCS, you need to perform some additional setup to handle forwarding of SSH keys generated using the SSHService. +There are two alternatives detailed below.
+ +This approach configures Ela as a jump host and creates aliases for the systems that you want to access in ~/.ssh/config
on your laptop or PC.
+The benefit of this approach is that once the ~/.ssh/config
file has been configured, no additional steps are required between creating a new key using MFA, and logging in.
Below is an example ~/.ssh/config
file that facilitates directly logging into the Daint, Santis and Clariden clusters using ela.cscs.ch
as a jump host:
Host ela
+ HostName ela.cscs.ch
+ User cscsusername
+ IdentityFile ~/.ssh/cscs-key
+
+Host daint
+ HostName daint.alps.cscs.ch
+ User cscsusername
+ ProxyJump ela
+ IdentityFile ~/.ssh/cscs-key
+ IdentitiesOnly yes
+
+Host santis
+ HostName santis.alps.cscs.ch
+ ProxyJump ela
+ User cscsusername
+ IdentityFile ~/.ssh/cscs-key
+ IdentitiesOnly yes
+
+Host clariden
+ HostName clariden.alps.cscs.ch
+ ProxyJump ela
+ User cscsusername
+ IdentityFile ~/.ssh/cscs-key
+ IdentitiesOnly yes
+
Replace
cscsusername
with your CSCS username in the file above.
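To add further clusters without copying entries by hand, the pattern in the file above can be generated. The sketch below assumes every cluster follows the same `<cluster>.alps.cscs.ch` naming and uses the `ela` jump host entry from the example. The function name `ssh_cluster_stanza` is hypothetical.

```shell
# ssh_cluster_stanza (hypothetical helper): print an ~/.ssh/config entry for
# an Alps cluster reached through the `ela` jump host.
# Usage: ssh_cluster_stanza <cluster> <cscs-username>
ssh_cluster_stanza() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s.alps.cscs.ch\n' "$1"
    printf '    User %s\n' "$2"
    printf '    ProxyJump ela\n'
    printf '    IdentityFile ~/.ssh/cscs-key\n'
    printf '    IdentitiesOnly yes\n'
}
```

The output can be reviewed on the terminal first and then appended to `~/.ssh/config` with a shell redirection once it looks right.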
After saving this file, you can log in directly to daint.alps.cscs.ch
from your local system using the alias daint
:
Alternatively, the SSH authentication agent can be configured to manage the keys.
+Each time a new key is generated using the SSHService, add the key to the SSH agent: +
+If you see this error message, the ssh agent is not running. +You can start it with the following command: +
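The two steps above (start an agent if needed, then add the key) can be sketched as follows. The helper names are hypothetical, the key path assumes `~/.ssh/cscs-key`, and the one-day lifetime passed to `ssh-add -t` is an assumption chosen to match the 24 hour validity of the certified key.

```shell
# agent_running (hypothetical helper): succeed only if SSH_AUTH_SOCK is set
# and points at an existing socket.
agent_running() {
    [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]
}

# load_cscs_key (hypothetical helper): start an agent for this shell if none
# is reachable, then add the key with a one-day lifetime in the agent.
load_cscs_key() {
    agent_running || eval "$(ssh-agent -s)"
    ssh-add -t 1d "${1:-$HOME/.ssh/cscs-key}"
}
```

Call `load_cscs_key` each time a new key has been generated.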
+Once the key has been configured, log into Ela using the -A
flag, and then jump to the target system:
+
# log in to ela.cscs.ch
+ssh -A cscsusername@ela.cscs.ch
+
+# then jump to a cluster
+ssh daint.alps.cscs.ch
+
You may have too many keys in your ssh agent. +Remove the unused keys from the agent or flush them all with the following command: +
+This might indicate that the key has expired.
+Visual Studio Code provides flexible support for remote development. +VSCode's remote tunnel feature starts a server on a remote system, and connects the editor to this server. +There are two ways to set up the connection:
+The main challenge with using VSCode is that the most convenient method for starting a remote session is to start a remote tunnel from the VS Code GUI. +This approach starts a session in the standard login environment on that node; however, this does not work if you want to develop in a container, in a uenv, or on a compute node.
+The most flexible method for connecting VSCode is to log in to the Alps system, set up your environment (start a container or uenv, start a session on a compute node), and start the remote server in that pre-configured environment.
+Note
+This approach requires that you have a GitHub account, and that the GitHub account is configured with your VS Code editor.
+The first step is to download the VS Code CLI tool code
, which CSCS provides for easy download.
+There are two executables, one for systems with x86 CPUs and one for systems with ARM CPUs.
Alternatively, download the CLI tool from the VS Code site -- take care to select the x86 or Arm64 version that matches the target system.
+After downloading, copy the code
executable to a location in your PATH, so that it is available for future sessions.
The home directory can be shared by multiple clusters that might have different micro-architectures, so it is important to separate executables for x86 and aarch64 (ARM) targets.
+In ~/.bashrc
, add the following line (you will need to log in again for this to take effect):
+
uname -m
command will print aarch64
or x86_64
, according to the microarchitecture of the node it is run on.
+Then create the path, and copy the code
executable to the architecture-specific path:
+
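The architecture-specific layout described above could be set up as in the sketch below. The `~/.local/<arch>/bin` location and the helper name `arch_bin_dir` are assumptions for illustration. Any directory that embeds the output of `uname -m` works the same way.

```shell
# arch_bin_dir (hypothetical helper): directory for executables that match
# the CPU architecture of the current node, built from `uname -m`.
arch_bin_dir() {
    printf '%s\n' "$HOME/.local/$(uname -m)/bin"
}

# Line for ~/.bashrc: put the architecture-specific directory first in PATH.
export PATH="$(arch_bin_dir):$PATH"
```

After logging in again, `mkdir -p "$(arch_bin_dir)"` followed by `cp code "$(arch_bin_dir)"` installs the downloaded executable for the current architecture only.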
To set up a remote server on the target system,
+run the code
executable that you downloaded with the tunnel
argument.
+You will be asked to choose whether to log in to Microsoft or GitHub (we have tested with GitHub):
> code tunnel --name=$CLUSTER_NAME-tunnel
+...
+? How would you like to log in to Visual Studio Code? ›
+ Microsoft Account
+❯ GitHub Account
+
Tip
+Give the tunnel a unique name using the --name
flag, which will later be listed in the VSCode UI.
You will be requested to go to github.com/login/device and enter an 8-digit code. +Once you have finished registering the service with GitHub, in VSCode on your PC/laptop open the "remote explorer" pane on the left hand side of the main window, and the connection will be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
+first time setting up a remote service
+If this is the first time you have followed this procedure, you may have to sign in to GitHub in VSCode. +Click on the Remote Explorer button on the left hand side, and then find the following option:
+ +If you have not signed in to GitHub with the VS Code editor, you will be redirected to the browser to sign in.
+After signing in and authorizing VSCode, the open tunnel should be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
+To use a uenv with VSCode, the uenv must be started before calling code tunnel
.
+Log into the target system and start the uenv, then start the remote server, for example:
+
# log into daint (this could be any other Alps cluster)
+ssh daint
+# start a uenv session on the login node
+uenv start --view=default prgenv-gnu/24.11:v1
+# then start the tunnel
+code tunnel --name=$CLUSTER_NAME-tunnel
+
Alternatively, you can execute code tunnel
directly in the environment:
+
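A possible form of this one-step variant is sketched below. It assumes that `uenv run` accepts the same `--view` option and uenv label as `uenv start`, and that the command to execute follows a `--` separator. The wrapper name `tunnel_in_uenv` is hypothetical.

```shell
# tunnel_in_uenv (hypothetical helper): run `code tunnel` inside a uenv
# without first opening an interactive `uenv start` session.
# Usage: tunnel_in_uenv [uenv] [view]
tunnel_in_uenv() {
    image="${1:-prgenv-gnu/24.11:v1}"
    view="${2:-default}"
    uenv run --view="$view" "$image" -- code tunnel --name="${CLUSTER_NAME}-tunnel"
}
```

Calling `tunnel_in_uenv` with no arguments uses the same uenv and view as the `uenv start` example above.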
Once the tunnel is configured, you can access it from VSCode.
+Warning
+If you plan to do any intensive work, such as repeated compilation of large projects or running Python code in Jupyter, please see the guide to running on a compute node below. +Running intensive workloads on login nodes, which are shared between all users, is against the CSCS Fair Usage of Shared Resources policy.
+Todo
+write a guide
+If you plan to do computation using your VSCode, then you should first allocate resources on a compute node and set up your environment there.
+directly create the tunnel using srun
+You can directly execute the code tunnel
command using srun:
+
ssh daint
+srun --uenv=prgenv-gnu/24.11:v1 --view=default -t120 -n1 --pty code tunnel --name=$CLUSTER_NAME-tunnel
+
--uenv and --view set up the uenv
+-t120 requests a 2 hour (120 minute) reservation
+-n1 requests a single rank - only one rank/process is required for VSCode
+--pty allows forwarding of terminal I/O, required to sign in to GitHub
+Once the job allocation is granted, you will be prompted to log into GitHub, the same as starting a session on the login node.
+If you don't want to use a uenv, the command is even simpler: +
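A sketch of the simpler command, obtained by dropping the `--uenv` and `--view` flags from the `srun` line above, is wrapped here in a hypothetical helper named `tunnel_on_compute_node` so the time limit can be adjusted.

```shell
# tunnel_on_compute_node (hypothetical helper): start `code tunnel` on a
# compute node without a uenv.
# Usage: tunnel_on_compute_node [minutes]   (defaults to 120)
tunnel_on_compute_node() {
    minutes="${1:-120}"
    srun -t"$minutes" -n1 --pty code tunnel --name="${CLUSTER_NAME}-tunnel"
}
```

For example, `tunnel_on_compute_node 60` requests a one hour allocation instead of the default two hours.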
+log into a node before starting
+It is also possible to log into a compute node before executing the code tunnel
command, if that suits your workflow:
+
# log into daint
+ssh daint
+
+# start an interactive shell session
+srun -t120 -n1 --pty bash
+
+# set up the environment before starting the tunnel
+uenv start prgenv-gnu/24.11:v1 --view=default
+code tunnel --name=$CLUSTER_NAME-tunnel
+
-t120 requests a 2 hour (120 minute) reservation
+-n1 requests a single rank - only one rank/process is required for VSCode
+--pty allows forwarding of terminal I/O, for bash to work interactively
+Warning
+This approach is not recommended, because while it may be easier to connect via the VS Code UI, it is much more difficult to configure the connection so that you can use uenv, containers or compute nodes.
+Todo
+Write the guide
+Most services at CSCS are connected to the CSCS Single Sign-On gate. +This spares users from signing in separately to each service connected to this gate, and increases security. +Furthermore, the Single Sign-On gate allows users to recover their forgotten passwords and authenticate using a third-party account. The login page looks like
+ +After having completed the setup of MFA, you will be asked to enter your login/password and the OTP to access all web-based services.
+Enter username and password.
+Then you will be prompted to enter the 6-digit code obtained from your device.
+Users at CSCS have one account that can be used to access all services and systems at CSCS. +To get an account you must be invited by a member of CSCS project administration or by the principal investigator (PI) of a current project at CSCS.
+Getting a project at CSCS for PIs
+In order to get an account at CSCS, or to request access for the members of your team, you first need to get a project at CSCS. +CSCS issues calls for proposals that are announced via the CSCS website and e-mails. +More information about upcoming calls is available on the CSCS web site.
+New PIs who have successfully applied for a preparatory project will receive an invitation from CSCS to get an account at CSCS. +PIs can then invite members of their groups to join their project.
+Info
+It is possible for users to be part of multiple projects by being invited separately by the PI of each project.
+Note
+Accounts are bound to projects, and accounts will be closed with the project unless the account is also part of another open project.
+The tool used to manage projects and accounts depends on the platform on which the project was granted:
+Note
+The portal.cscs.ch site will be used to manage all projects in the future.
+New users who do not already have an account at CSCS, including PIs, need to provide the following information before CSCS can open their account:
+New accounts are usually opened within 48 hours.
+To use a different account, log out of the Single Sign-On gate by going to the Account and Resources Tool and selecting "Log out of CSCS" from the upper-right profile icon in the tool used to manage your project, account.cscs.ch or portal.cscs.ch.
+All users at CSCS need to go through the standard registration process and get a CSCS account. In addition, they can also link their CSCS account to an external account, e.g. the one from their home institution. +In this case, they can sign into the CSCS services using their home institution credentials instead of the CSCS username/password. +The external credentials are used only during the Single Sign-On procedure described above; from then on, until the user logs out, the user identifier presented to all CSCS services is the CSCS username, not the external one. +The number of external institutions that are allowed to link their accounts is limited, and they are displayed on the login page.
+Linking an external account can be done in the Profile section (upper-right corner) of your account page in the tool used to manage your project, account.cscs.ch or portal.cscs.ch.
+Please note that as soon as you receive and accept an invitation to get an account at CSCS, you agree to the CSCS/ETHZ regulations.
+ + + + + + + + + + + + + +The Swiss National Supercomputing Centre (CSCS) offers a web-based tool for users to manage their accounts and projects at account.cscs.ch.
+With this tool, users can:
+For group leaders (or PIs), the tool allows:
+A short guideline on how to perform these tasks is provided below.
+The tool is designed to be intuitive and comprises the following main areas:
+To invite users to a selected project, group leaders or their deputies need to:
+To remove users from a selected project, group leaders or their deputies need to:
+CSCS Account Managers, PIs and deputy PIs can invite users to the respective projects by following the steps below on CSCS's new project management portal.
+Info
+The new user project management portal is currently only used by the Machine Learning Platform. +All other platforms use the old user management portal.
+Navigate to the project management portal, portal.cscs.ch.
+After logging in to the portal, choose the corresponding organization in which the project was created.
+Todo
+screenshot
+In this example, the project is hosted by the CSCS organization and is named csstaff_n
. From the organization dashboard, navigate to Projects and click on the csstaff_n
project.
Todo
+screenshot
+From the project dashboard, navigate to Team -> Invitations
+Todo
+screenshot
+Info
+Using both the web interface and bulk invitation, the following roles can be assigned in the tool:
+To invite a user, click on the "Invite Users" button on the right hand side of the tab.
+Todo
+screenshot
+Todo
+screenshot
+It is also possible to bulk invite users by preparing a CSV file and uploading it in this step.
+ +Note
+An email will be sent to the invited user:
+Alps is an HPE Cray EX3000 system, a liquid-cooled, blade-based, high-density system.
+Todo
+this is a skeleton - all of the details need to be filled in
+The basic building block of the system is a liquid-cooled cabinet. +A single cabinet can accommodate up to 64 compute blade slots within 8 compute chassis. The cabinet is not configured with any cooling fans. +All cooling needs for the cabinet are provided by direct liquid cooling and the CDU. +This approach to cooling provides greater efficiency for the rack-level cooling, decreases power costs associated with cooling (no blowers) and utilizes a single water source per CDU. +One cabinet supports the following:
+Todo
+information about the network.
+Alps was installed in phases, starting with the installation of 1024 AMD Rome dual socket CPU nodes in 2020, through to the main installation of 2,688 Grace-Hopper nodes in 2024.
+There are currently four node types in Alps, with another becoming available in 2025:
+type | blades | nodes | CPU sockets | GPU devices
+---|---|---|---|---
+NVIDIA GH200 | 1,344 | 2,688 | 10,752 | 10,752
+AMD Rome | 256 | 1,024 | 2,048 | --
+NVIDIA A100 | 72 | 144 | 144 | 576
+AMD MI250x | 12 | 24 | 24 | 96
+AMD MI300A | 64 | 128 | 512 | 512
Perry Peak
+ +EX425
+ +Grizzly Peak
+ +Bard Peak
+ +Parry Peak
+coming soon
+H1 2025
+Alps is a general-purpose compute and data Research Infrastructure (RI) open to the broad community of researchers in Switzerland and the rest of the world. +Alps provides a high-impact, challenging and innovative RI that allows Switzerland to advance science and impact society.
+Alps enables the creation of versatile clusters (vClusters) that can be tailored to the specific needs of users while maintaining confidentiality. +For example, a vCluster will be dedicated to MeteoSwiss’ numerical weather forecasts, another one to the User Lab and another one to Machine Learning and Artificial Intelligence.
+A key feature of Alps is multi-tenancy, where a tenant is an organization, typically a research institution, that deploys, operates, or manages its platform on the Alps infrastructure. +Tenants have privileged access to resource nodes, enabling them to deploy their own services and resource configurations. +Additionally, network segregation ensures secure and isolated communication, with the option to connect to the tenant's private network.
+Platforms
+ +Clusters
+The resources on Alps are partitioned and configured into versatile software defined clusters (vClusters).
+ +Hardware
+Learn about the node types and networking infrastructure in Alps.
+ +Storage
+Learn about the file systems attached to Alps.
+ +A platform represents a set of scientific services along with compute and data resources hosted on the Alps research infrastructure, provided to a specific scientific community. +Each platform addresses particular research needs and domains, such as climate and weather modeling, machine learning, or high-performance computing applications. +A platform can consist of one or multiple clusters, and its services can be managed either by CSCS or by the scientific community itself, including access control, usage policies, and support.
+ + + + + + + + + + + + + + +Alps has different storage attached, each with characteristics suited to different workloads and use cases. +HPC storage is managed in a separate cluster of nodes that host servers that manage the storage and the physical storage drives. +These separate clusters are on the same Slingshot 11 network as Alps.
+ | Capstor | IOPStor | Vast
+---|---|---|---
+Model | HPE ClusterStor E1000D | HPE ClusterStor E1000F | Vast
+Type | Lustre | Lustre | NFS
+Capacity | 129 PB raw GridRAID | 7.2 PB raw RAID 10 | 1 PB
+Number of drives | 8,480 × 16 TB HDD | 240 × 30 TB NVMe SSD | N/A
+Read speed | 1.19 TB/s | 782 GB/s | 38 GB/s
+Write speed | 1.09 TB/s | 393 GB/s | 11 GB/s
+IOPS | 1.5M | 8.6M read, 24M write | 200k read, 768k write
+File creates/s | 374k | 214k | 97k
Capstor is the largest file system, for storing large amounts of input and output data. +It is used to provide SCRATCH and STORE for different clusters - the precise details are platform-specific.
+Todo
+small text explaining what iopstor is designed to be used for.
+The Vast storage is a smaller-capacity system that is designed for use as home folders.
+Todo
+small text explaining what iopstor is designed to be used for.
+The mounts, and how they are used for SCRATCH, STORE, PROJECT, HOME would be in the storage docs
+ + + + + + + + + + + + + +A vCluster (versatile software-defined cluster) is a logical partition of the supercomputing resources where platform services are deployed. It serves as a dedicated environment supporting a specific platform. The composition of resources and services for each vCluster is defined in a configuration file used by an automated pipeline for deployment. Once deployed by CSCS, the vCluster becomes immutable.
+Clusters on Alps are provided as part of different platforms.
+Climate and Weather Platform
+Santis is a Grace-Hopper cluster for climate and weather simulation
+ +