Open OnDemand Overview
Overview¶
We now provide various interactive Apps through the browser-based Open OnDemand service available at https://ood.brc.berkeley.edu.
Apps/services include:
- OOD Desktop
- Jupyter notebooks
- RStudio
- MATLAB via a Desktop environment
- VS Code
- File browsing
- Slurm job listing
- Terminal/shell access (under the "Clusters" tab)
Open OnDemand (OOD) is a relatively new service. With the Rocky Linux 8 operating system upgrade (July 2024), we upgraded Open OnDemand to version 3.1.7, the latest release at that time. Please let us know if you run into problems or have suggestions or feature requests.
Logging In¶
- Visit https://ood.brc.berkeley.edu in your web browser.
- With Open OnDemand updated to version 3.1.7, we have also adopted CILogon for user authentication. This eliminates the repeated login prompts you may have experienced previously when logging in with your BRC username and PIN+one-time password (OTP). On the login page, select the appropriate institution (for most users, the University of California, Berkeley). The command-line tool email_lookup.sh (located in /global/home/groups/allhands/bin/) can help clarify which institution you should select.
Service Unit Charges¶
Open OnDemand apps may launch Slurm jobs on your behalf when requested. Open OnDemand refers to these jobs as "interactive sessions." Since these are just Slurm jobs, service units are charged for interactive sessions the same way normal jobs are charged.
Interactive sessions running on nodes n0000.savio4 and n0001.savio4 (in the ood-inter partition) do not cost service units. These are shared nodes provided for low-intensity jobs.
These should be treated like login nodes (that is, no intensive computation is allowed).
Job time is counted for interactive sessions as the total time the job runs.
The job starts running as soon as a node is allocated for the job.
The interactive session may still be running even if you do not have it open in your web browser.
You can view all currently running interactive sessions under My Interactive Sessions.
When you are done, you may stop an interactive session by clicking "Delete" on the session.
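Because the charge covers the whole time the job exists, a session you leave open but idle keeps accumulating service units. The shell sketch below gives a rough estimate. It assumes, as a simplification not stated on this page, that the charge is cores times wall-clock hours times a partition-specific scaling factor. su_estimate is a hypothetical helper and the factor you pass in is a placeholder, so check the service unit documentation for your partition's actual rate (and whether the partition allocates whole nodes, in which case you are likely charged for every core on the node).

```shell
#!/usr/bin/env bash
# Rough service-unit estimate for an interactive session (hypothetical helper).
# ASSUMPTION: SUs = cores x wall-clock hours x partition scaling factor.
# The scaling factor defaults to 1 and is only a placeholder.
su_estimate() {
  local cores=$1 hours=$2 factor=${3:-1}
  # awk handles fractional hours and factors, which bash integer math cannot.
  awk -v c="$cores" -v h="$hours" -v f="$factor" 'BEGIN { printf "%.2f\n", c * h * f }'
}

# Usage: su_estimate CORES HOURS [FACTOR]
```

For example, su_estimate 4 8 1 prints 32.00: a 4-core session left open for 8 hours is charged for all 8 hours even if you only worked in it for one. Sessions on the free ood-inter shared nodes need no estimate.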
There are several ways to monitor usage:
- Since Open OnDemand submits jobs through Slurm, you can monitor usage as you would monitor your regular Slurm jobs.
- View currently running (and recent) sessions launched by Open OnDemand under My Interactive Sessions.
- View all currently running jobs under Jobs > Active Jobs.
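Because OOD sessions are ordinary Slurm jobs, the standard squeue and scancel commands also work on them from any Savio terminal. The sketch below wraps those commands. The function names are hypothetical helpers, not tools installed on Savio, and the squeue output columns are just one reasonable choice.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for monitoring OOD sessions with standard Slurm tools.

# List your pending and running jobs, including OOD interactive sessions.
# Columns: job ID, partition, job name, state, elapsed time, nodes (or pending reason).
my_jobs() {
  squeue -u "$USER" -o "%.10i %.12P %.30j %.8T %.10M %R"
}

# List only your sessions on the free shared OOD nodes (ood-inter partition).
my_shared_sessions() {
  squeue -u "$USER" -p ood-inter -o "%.10i %.30j %.8T %.10M %R"
}

# Cancel a session's Slurm job by job ID.
stop_session() {
  scancel "$1"
}
```

For example, my_shared_sessions lists your sessions in the ood-inter partition, and stop_session 12345 cancels job 12345 (use the job ID shown by my_jobs). From the browser, clicking "Delete" under My Interactive Sessions remains the simplest option.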
Using Open OnDemand¶
Here are the services provided via Open OnDemand.
Files App¶
Access the Files App from the top menu bar under Files > Home Directory. Using the Files App, you can use your web browser to:
- View files in the Savio filesystem.
- Create and delete files and directories.
- Upload and download files from the Savio filesystem to your computer.
- We recommend using Globus for large file transfers.
View Active Jobs¶
View and cancel active Slurm jobs from Jobs > Active Jobs. This includes jobs started via sbatch and srun as well as jobs started (implicitly) via Open OnDemand (as discussed above).
Shell Access¶
Open OnDemand allows Savio shell access from the top menu bar under Clusters > BRC Shell Access.
Interactive Apps¶
Open OnDemand provides additional interactive apps.
You can launch interactive apps from the Interactive Apps menu on the top menu bar.
The available interactive apps include:
- Desktop App (for working with GUI-based programs)
- Jupyter Server (for working with Jupyter notebooks)
- RStudio Server (for working in RStudio sessions)
- Code Server (VS Code) (for code editing using Visual Studio Code)
Desktop App¶
The OOD Desktop App allows you to run programs that require graphical user interfaces (GUIs) on Savio, replacing our previous Visualization node service.
Intended Usage
When possible, you should carry out your computation via the traditional command line plus Slurm. OOD Desktop is intended for programs that require GUIs. Furthermore, if you need to use Jupyter notebooks, RStudio, VS Code, or the MATLAB GUI, use the specialized interactive apps we provide instead of the OOD Desktop App.
Before getting started, make sure you have access to the Savio cluster, as you will need a BRC account to log in.
Starting the Desktop App¶
- Connect to https://ood.brc.berkeley.edu
- After you log in, OnDemand displays a welcome screen. Click the "Interactive Apps" pulldown and choose "Desktop", either on a shared node (for exploration and debugging) or computing via Slurm (for computationally intensive work).
- Fill out the form and press "Launch". (Note that, at present, the only partition on which the Desktop app can run when computing via Slurm is savio2_htc, because we expect most GUI programs to use one or a few cores.) After a moment, the Desktop session will be initialized and let you specify the image compression and quality options. If you are unhappy with the default values, you can relaunch the session from this page with different choices. Then press "Launch Desktop", and the Desktop will open in a new tab.
Interacting with Files¶
Your Desktop session runs directly on Savio and can access your files either through the command line as usual or through the file manager.
To open a command line terminal, right click anywhere on the Desktop and select "Open Terminal Here".
Jupyter Server¶
See the Jupyter documentation page for instructions on using Jupyter notebooks via Open OnDemand.
When using the "Jupyter Server - compute via Slurm in Slurm partitions" app, service units are charged based on job run time.
The job may still be running if you close the window or log out.
When you are done, shut down your Jupyter session by clicking "Delete" on the session under My Interactive Sessions.
You can confirm that the interactive session has stopped by checking My Interactive Sessions.
Don't submit Slurm jobs through the terminal within the Jupyter server
The Jupyter server is running within a Slurm job (including the Jupyter server on shared nodes). Therefore, don't submit a Slurm job (via sbatch or srun) from the terminal available within the Jupyter server. If you do, you'll be submitting a job from within another job, which may cause problems with job execution. If you want to submit jobs from a terminal in your browser, see the Shell Access information.
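If you are unsure whether a terminal is itself inside a Slurm job, you can check the SLURM_JOB_ID environment variable, which Slurm sets in the environment of the jobs it runs. The sketch below assumes the terminal inherits the job's environment, which may not hold everywhere (for example, Slurm variables are not passed into RStudio, as noted in the RStudio section). in_slurm_job and safe_sbatch are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Succeeds (returns 0) if this shell appears to be running inside a Slurm job.
# Slurm sets SLURM_JOB_ID in the environment of jobs it starts.
in_slurm_job() {
  [ -n "${SLURM_JOB_ID:-}" ]
}

# Hypothetical wrapper: refuse to submit from inside a job
# (e.g., from the Jupyter server's terminal); otherwise call sbatch normally.
safe_sbatch() {
  if in_slurm_job; then
    echo "Inside Slurm job $SLURM_JOB_ID; submit from a Shell Access terminal instead." >&2
    return 1
  fi
  sbatch "$@"
}
```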
RStudio Server¶
The RStudio server allows you to use RStudio on Savio, either run as part of a Slurm batch job ("compute via Slurm using Slurm partitions") or (for non-intensive computations) on our standalone Open OnDemand server ("compute on shared OOD node"). Use of the standalone Open OnDemand server doesn't use any FCA service units or tie up a condo node, but you are limited to 8 GB memory and should only use a few cores.
- Select the relevant RStudio Server under Interactive Apps.
- Provide the job specification you want for the RStudio server (for the non-Slurm option, you only need to provide a time limit).
- Once RStudio is ready, click Connect to RStudio Server to access RStudio.
For the Slurm-based option, service units are charged based on job run time.
The job may still be running if you close the window or log out.
When you are done, shut down RStudio by clicking "Delete" on the session under My Interactive Sessions.
You can confirm that the interactive session has stopped by checking My Interactive Sessions.
Installing R packages
For certain R packages that involve more than simple R code, installing packages from within RStudio will fail because certain environment variables do not get passed into the RStudio instance running within OOD. Instead, please start a command-line based R session in a terminal and install R packages there. Once installed, these R packages will be usable from RStudio.
Accessing environment variables
Various environment variables, in particular Slurm-related variables such as SLURM_CPUS_ON_NODE, are not available from within RStudio, either via Sys.getenv() or via system().
Code Server (VS Code)¶
Code Server lets you use Visual Studio Code (VS Code) from your web browser to work with code, including integrated debugging. You can run Code Server either as part of a Slurm batch job ("compute via Slurm using Slurm partitions") or, for non-intensive computations, on our standalone Open OnDemand server ("exploration/debugging on shared nodes"). Using the standalone Open OnDemand server doesn't consume any FCA service units or tie up a condo node, but you are limited to 8 GB of memory and should use only a few cores.
- Select the relevant Code Server under Interactive Apps.
- Provide the job specification you want for the Code Server (for the non-Slurm option, you only need to provide a time limit).
- Once Code Server is ready, click Connect to VS Code to access VS Code.
For the Slurm-based option, service units are charged based on job run time.
The job may still be running if you close the window or log out.
When you are done, shut down Code Server by clicking "Delete" on the session under My Interactive Sessions.
You can confirm that the interactive session has stopped by checking My Interactive Sessions.
VS Code remote SSH
For security reasons, users cannot use VS Code's remote SSH feature to connect to Savio. Instead, Savio users should access VS Code via OOD following the instructions above.
Extensions on Code Server¶
Code Server is an open-source implementation of VS Code that uses the Open VSX extension registry rather than Microsoft's Visual Studio Marketplace. As a result, some extensions available from Microsoft's marketplace may not be available. We have installed some of the most commonly requested extensions, such as GitHub Copilot.
Troubleshooting Open OnDemand¶
Common problems¶
Problem: the OOD login pop-up box keeps reappearing¶
Note: this problem should no longer occur with changes made to the OOD login system during the switch to the Rocky 8 operating system in July 2024. If you see this problem, please let us know.
If you have trouble logging in to OOD (in particular, if the login pop-up box keeps reappearing after you enter your username and password), make sure you have completely exited any other OOD sessions. This could include closing browser tab(s)/window(s), clearing your browser cache, and clearing relevant cookies. You might also try running OOD in an incognito window (or, if using Google Chrome, in a new user profile) or in a different browser (such as Google Chrome, Safari, or Firefox). For instructions on clearing the cache and cookies, see your browser's help documentation.
Problem: when I log in to OOD, I immediately get the error: "can't find user for YOUR-USERNAME. Run 'nginx_stage --help' to see a full list of available command line options"¶
This error occurs if your account has not been correctly set up to use OOD. Please contact us.
Problem: my OOD apps on shared nodes (VS Code, Jupyter shared node app, RStudio shared node app) never start¶
In some situations when using apps that run outside of a Slurm job (VS Code and Jupyter Server sessions on the shared Jupyter node), session creation begins, but you are never shown the "Connect" button on the "Interactive Sessions" page, and the session terminates within a minute or two. We've seen three causes for this.
One possibility is that the base Conda environment is activated automatically whenever you log in. This is generally the case if you have run conda init, which modifies your .bashrc file. If so, you will usually see "(base)" at the beginning of your shell prompt, e.g.:
(base) [your_username@ln002 ~]$
The solution is to tell Conda not to activate the base environment when you log in to Savio. Once in your shell, run:
conda config --set auto_activate_base False
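To check whether conda init has modified your .bashrc in the first place, you can search for the marker comment it writes. This sketch assumes the "# >>> conda initialize >>>" marker that recent conda versions insert; has_conda_init is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeeds if a shell startup file contains a block added by `conda init`.
# ASSUMPTION: conda init marks its block with "# >>> conda initialize >>>".
has_conda_init() {
  local file=${1:-$HOME/.bashrc}
  [ -f "$file" ] && grep -qF '>>> conda initialize >>>' "$file"
}

# Usage:
#   has_conda_init && echo "conda init block found in ~/.bashrc"
```

After running the conda config command above, conda config --show auto_activate_base should report False.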
A second possibility is that you have installed a version of some Python package in your ~/.local directory (via pip install --user) that masks the system version of the same package and interferes with starting your session. Try moving your ~/.local directory aside (e.g., by running mv ~/.local ~/.local.save in a terminal) and starting the session again.
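A slightly safer way to move ~/.local aside is sketched below. If ~/.local.save already exists as a directory, a plain mv would move ~/.local inside it rather than renaming it, so this hypothetical helper falls back to a timestamped name. It prints the backup path so you can restore it later.

```shell
#!/usr/bin/env bash
# Move ~/.local aside without clobbering an earlier backup (hypothetical helper).
move_local_aside() {
  local src="$HOME/.local" dest="$HOME/.local.save"
  if [ ! -d "$src" ]; then
    echo "No $src directory; nothing to move." >&2
    return 0
  fi
  # An existing backup would make mv nest src inside it, so pick a new name.
  if [ -e "$dest" ]; then
    dest="$dest.$(date +%Y%m%d%H%M%S)"
  fi
  mv "$src" "$dest" && echo "$dest"
}
```

To restore after testing, move the printed backup path back to ~/.local (e.g., mv ~/.local.save ~/.local).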
A third possibility is that something in your .bashrc file is preventing the OOD session from starting. Try backing up your .bashrc and using a plain version:
cp -rf ~/.bashrc ~/.bashrc.save
Then either remove items from your .bashrc or replace it with a very simple .bashrc that looks like this:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
Problem: my OOD apps on shared nodes (VS Code, Jupyter shared node app, RStudio shared node app) report "permission denied (gssapi-keyex,gssapi-with-mic,keyboard-interactive)"¶
This can occur if permissions are incorrectly set on your home directory or your .ssh directory, preventing OOD from using SSH to connect to the shared node.
Make sure that your home directory is not world-writable. In the following example, the final "w" in the permission string indicates that the user's home directory is world-writable; chmod o-w removes that permission.
[bugs_bunny@ln002 users]$ ls -ld bugs_bunny
drwxrwxrwx 1 bugs_bunny ucb 1048576 Mar 29 16:06 bugs_bunny
[bugs_bunny@ln002 users]$ chmod o-w ~bugs_bunny
[bugs_bunny@ln002 users]$ ls -ld bugs_bunny
drwxrwxr-x 1 bugs_bunny ucb 1048576 Mar 29 16:06 bugs_bunny
You might also need to modify the permissions on your .ssh directory to look like this:
[bugs_bunny@ln002 ~]$ ls -ld .ssh
drwx------ 1 bugs_bunny ucb 348 Jan 6 15:58 .ssh
[bugs_bunny@ln002 ~]$ ls -l .ssh
total 104
-rw-r--r-- 1 bugs_bunny ucb 1013 Jan 6 15:58 authorized_keys
-rw------- 1 bugs_bunny ucb 672 Aug 2 2019 cluster
-rw-r--r-- 1 bugs_bunny ucb 610 Aug 2 2019 cluster.pub
-rw------- 1 bugs_bunny ucb 98 Jan 6 15:58 config
-rw-r--r-- 1 bugs_bunny ucb 79896 Mar 16 13:22 known_hosts
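The checks and fixes above can be scripted. This is a sketch using only find and chmod; is_world_writable and fix_ssh_perms are hypothetical helpers, and the file names in ~/.ssh (config, cluster, cluster.pub, authorized_keys, known_hosts) are taken from the example listing, so adjust them to match your own directory. Like the chmod example above, it only removes world write access from your home directory.

```shell
#!/usr/bin/env bash
# Succeeds if the given path is writable by "other" (world-writable).
is_world_writable() {
  [ -n "$(find "$1" -maxdepth 0 -perm -002 2>/dev/null)" ]
}

# Tighten home and ~/.ssh permissions to match the example listing.
fix_ssh_perms() {
  local home=${1:-$HOME} f
  chmod o-w "$home" || return 1        # home: no world write (as in the example)
  chmod 700 "$home/.ssh" || return 1   # drwx------
  # Private key and config: owner read/write only (-rw-------).
  for f in config cluster; do
    if [ -f "$home/.ssh/$f" ]; then chmod 600 "$home/.ssh/$f"; fi
  done
  # Public files: writable only by the owner (-rw-r--r--).
  for f in authorized_keys known_hosts cluster.pub; do
    if [ -f "$home/.ssh/$f" ]; then chmod 644 "$home/.ssh/$f"; fi
  done
  return 0
}

# Usage: is_world_writable ~ && echo "home is world-writable"; fix_ssh_perms
```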
Problem: my OOD apps report an SSH or connection error¶
If OOD reports an error related to SSH or a connection problem, the issue may be that your account is not properly configured to be able to connect to the Savio compute nodes from the Savio login nodes. To troubleshoot:
- First, check that you can log in to Savio by using SSH to connect to a login node.
- Second, check that you can ssh between nodes on Savio. For example, try to ssh to the DTN from a terminal session on one of the Savio login nodes:
ssh dtn
- Third, check that you can connect to one of the shared nodes that OOD's non-Slurm-based sessions use. Try to ssh to the shared node from a terminal session on one of the Savio login nodes:
ssh n0001.savio4
If any of these tests do not work, please contact us.
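The second and third checks above can be run non-interactively from a login node. In this sketch, ssh's BatchMode=yes option makes ssh fail rather than prompt for a password, and ConnectTimeout keeps an unreachable host from hanging the check; check_host and check_ood_ssh are hypothetical helpers. A FAIL can also mean the host key is not yet in your known_hosts file, so rerun the plain ssh command shown above to see the actual error.

```shell
#!/usr/bin/env bash
# Try a non-interactive ssh to one host and report OK or FAIL.
check_host() {
  if ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true 2>/dev/null; then
    echo "OK   $1"
  else
    echo "FAIL $1"
    return 1
  fi
}

# Run the DTN and shared-node checks; returns non-zero if any check fails.
check_ood_ssh() {
  local status=0 host
  for host in dtn n0001.savio4; do
    check_host "$host" || status=1
  done
  return "$status"
}

# Usage (from a terminal on a Savio login node): check_ood_ssh
```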
Problem: my OOD Jupyter kernel or my RStudio's R session keeps dying¶
There can be various reasons a Jupyter kernel in an OOD Jupyter session or an RStudio R session may repeatedly die.
- Various users have reported problems when using Jupyter notebooks with the Safari web browser. (If you try to open a Terminal under the OOD Clusters tab, you will probably see a "websocket connection" error message.) Try using Chrome or another browser and see if the problem persists.
- If your code uses more memory than is available to your session, the Jupyter kernel or R session can die without telling you why. This is particularly likely when using a Jupyter Server or RStudio Server on the shared Jupyter/OOD node (i.e., outside of a Slurm job), because those sessions are limited to 8 GB of memory. Try running your Jupyter notebook or RStudio in a Slurm-based session and see if the problem persists.
Problem: Slurm-based OOD sessions never start¶
In some situations when starting a session that is Slurm-based (e.g., Jupyter server sessions that use the Savio partitions, RStudio sessions, or MATLAB sessions), the session starts to be created, but the user is never provided with the "Connect" button on the "Interactive Sessions" page and the session terminates in a minute or two. This can occur because of problems with your Slurm configuration or with your account's access to Savio software modules.
- First, check that you can run regular Savio jobs (outside of OOD) using either srun or sbatch.
- Second, check that you can load Savio software modules from within a terminal on a Savio login node. For example, check that you can run module load python without error and that your MODULEPATH environment variable looks like this:
echo $MODULEPATH
## /global/software/sl-7.x86_64/modfiles/langs:/global/software/sl-7.x86_64/modfiles/tools:/global/software/sl-7.x86_64/modfiles/apps:/global/home/groups/consultsw/sl-7.x86_64/modfiles
module purge
module load python
module list
## 1) python/3.7
If you have problems in either case, please contact us.
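MODULEPATH is a long colon-separated string, so comparing it by eye is error-prone. This sketch prints one entry per line and checks for a specific directory; show_modulepath and modulepath_has are hypothetical helpers, and the directory in the usage comment is simply the first entry from the expected value above.

```shell
#!/usr/bin/env bash
# Print MODULEPATH one directory per line.
show_modulepath() {
  printf '%s\n' "${MODULEPATH:-}" | tr ':' '\n'
}

# Succeeds if the given directory is exactly one of the MODULEPATH entries.
modulepath_has() {
  show_modulepath | grep -qxF -- "$1"
}

# Usage:
#   modulepath_has /global/software/sl-7.x86_64/modfiles/langs || echo "langs entry missing"
```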
Another possibility is that you have installed a version of some Python package (likely one that is Jupyter-related) in your ~/.local directory (via pip install --user) that masks the system version of the same package and interferes with starting your session. Try moving your ~/.local directory aside (e.g., by running mv ~/.local ~/.local.save in a terminal) and starting the session again.
Problem: OOD sessions on the shared node never start and report being in a "bad state"¶
This may be caused by instability of the shared node. Please try the following steps to delete your session and start over:
- Delete the job using the Delete button to the right.
- If that doesn't work, log out and back into the OOD web portal.
- If that doesn't work, delete the folder under ~/ondemand/data/sys/dashboard/batch_connect/db corresponding to the app having the “bad state” problem.
- If none of those steps work, please contact us.
Problem: OOD apps fail to start with an error message about rsync/file IO/disk quota¶
If you see this message when trying to start an OOD app,
rsync: close failed on "/global/home/users/smith/ondemand/data/sys/dashboard/batch_connect/sys/..." Disk quota exceeded (122)
rsync error: error in file IO (code 11) at ...
it indicates that you've exceeded your disk quota in your home directory. Please see this FAQ for information on how to reduce your usage.
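Before working through the FAQ, it can help to see how much of your home directory usage comes from OOD's own session output, which accumulates under the directory named in the rsync error. ood_usage is a hypothetical helper that lists the largest subdirectories (sizes in KB) using du and sort.

```shell
#!/usr/bin/env bash
# List the largest subdirectories of the OOD session-output directory (KB, largest first).
# Optional arguments: directory to inspect, number of entries to show (default 10).
ood_usage() {
  local dir=${1:-$HOME/ondemand/data/sys/dashboard/batch_connect/sys}
  [ -d "$dir" ] || return 0
  du -sk "$dir"/*/ 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage: ood_usage
```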
Problem: Accessing ood.brc.berkeley.edu gives a 'Bad Request' message¶
If you see this message when accessing ood.brc.berkeley.edu,
Bad Request
Your browser sent a request that this server could not understand.
Size of a request header field exceeds server limit.
try clearing the cookies in your browser. As workarounds, you might also try your browser's incognito mode or a different browser.
General information for troubleshooting¶
Logs and scripts for each interactive session with Open OnDemand are stored in:
~/ondemand/data/sys/dashboard/batch_connect/sys
There are directories for each interactive app type within this directory. For example, to see the scripts and logs for an RStudio session, you might look at the files under:
~/ondemand/data/sys/dashboard/batch_connect/sys/brc_rstudio-compute/output/b5733507-a750-4bb9-8d4b-710618ce0de1
where b5733507-a750-4bb9-8d4b-710618ce0de1 corresponds to a specific session of an OOD app (the RStudio app in this case).
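Session directories are named by an opaque ID, so the easiest way to find the one for your latest session is by modification time. latest_ood_session is a hypothetical helper that sorts the matching directories with ls -td; the OOD_SYS_DIR variable is an illustrative override (not something OOD sets) that defaults to the path above.

```shell
#!/usr/bin/env bash
# Print the most recently modified OOD session directory, optionally for one app
# (e.g., brc_rstudio-compute). OOD_SYS_DIR is a hypothetical override for testing.
latest_ood_session() {
  local base="${OOD_SYS_DIR:-$HOME/ondemand/data/sys/dashboard/batch_connect/sys}"
  local app="${1:-*}"
  # $app is left unquoted on purpose so the default "*" matches every app.
  # ls -td sorts all matching directories newest first.
  ls -td "$base"/$app/output/*/ 2>/dev/null | head -n 1
}
```

For example, ls -l "$(latest_ood_session brc_rstudio-compute)" lists the scripts and logs of your most recent RStudio session.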
The BRC Open OnDemand interactive apps configuration is on GitHub. Additional information about Open OnDemand configuration is available in the Open OnDemand documentation.