Tips on Providing an Informative Help Request
Here are some general tips that can help the Research IT consultants assist you more effectively when you submit an issue ticket by email to brc-hpc-help@berkeley.edu:
- Provide a descriptive subject line that succinctly summarizes the problem. A subject such as "Savio problems" is NOT very useful to us.
- Please make sure to always provide your Savio username (if you have one), along with the FCA name and/or Condo account name(s) that your Savio user account has access to.
- If your question is related to a job you've submitted on Savio (e.g., your job crashed), please provide the job ID of the problematic job. (When a job is submitted successfully, Slurm reports its job ID. We will need this ID.)
- Please also copy and paste the contents of the relevant Slurm batch job script into your email, as this helps the Research IT consultants with troubleshooting. If you send the job script as an attachment instead, rename its file extension to .txt, because the ticketing system rejects .sh and other file attachments. Likewise, include the contents of any Slurm output or error log files your job generated (pasted into the email or attached as .txt files). The output of job monitoring tools for the problematic job, such as squeue, sacct, sinfo, sacctmgr, scontrol, and sq, can also help with troubleshooting (again, pasted into the email or attached as .txt files). For example, the output of the following commands for your job can be useful:
    squeue -u <your_user_name>
    scontrol show job <JOB_ID>
    sacct -j <JOB_ID> --format=JobID,JobName,MaxRSS,Elapsed
- In general, we will likely need to see error messages, your job submission script, and possibly your compile line and application configuration. Include this information in your issue ticket, either by pasting the text into the email itself and/or by attaching a *.txt file or screenshot image file that contains it. If you instead point consultants to a path in your directory where this information exists, not all consultants will have the file permissions needed to access it, so it is always best to include all of this information in the issue ticket itself. Please see the bullet points below for additional details.
- Send us the specific error message you are seeing (pasted into the email or as a *.txt file attachment) and the command(s) you used that produced it. The consultants often receive emails that say "My job crashed" or "I'm getting error messages when I try to install this software package", which don't give us enough information. For example, if you are receiving error messages when installing Python packages and/or setting up a Conda environment, please share the complete set of commands you're running to install the packages and/or create the environment, along with the full error messages and output you see, so we can try to reproduce everything if needed.
- State which Savio/Vector/CGRL file system(s) you were using, whether to submit the job or to store the data files, and give the paths to the relevant files, especially your job script. For example, when submitting tickets about hung commands, please include the full path of the directory you're running the command in. We operate 30+ file systems on Savio that are served by no fewer than 4 distinct clusters, so providing the path significantly shortens the time it takes us to locate the issue and focus on fixing it. Please also tell us how you were accessing Savio when you received the error message, for example through the Open OnDemand service or on a login node.
- It may also help to list which module files you had loaded (e.g., the output of module list), along with the exact command(s) you ran, for example the command that produced an error message.
- Give us some background about your problem. For example, have you run this code on Savio before? When did it last run successfully? Is this a new version of your code? Has it run successfully on other platforms? Which compiler (e.g., GCC or Intel) and compiler version did you use? Have you changed compilers and/or compiler versions?
- If you are having connection issues, please include the exact command you are running, the host you are trying to connect to (e.g., a Savio login node or the Savio DTN), the username you are using (DO NOT INCLUDE PASSWORDS), the approximate time of the failed attempts (as accurately as you can), and, if possible, the IP address of the machine you are connecting from.
- Tell us what you've tried already. If you've already tried to solve your problem in a few different ways, let us know so we don't waste time trying the same things. For example, have you already checked our extensive Savio documentation for a possible solution to your problem? Have you searched online with Google or another search engine? Have you checked online user forums such as Stack Overflow, Super User, or Ask.CI? Have you tried ChatGPT? Have you asked a research colleague in your lab or department who also has a Savio account for help?
- Report any problem that affects your productivity. Our users are our best system testers, and we rely on your reports to improve the systems.
- Finally, once an issue ticket has been resolved to your satisfaction, please say so in your reply to the ticket, so that the consultant(s) handling it can mark the ticket "resolved" and close it expeditiously.
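As an illustration of the job-related tips above, the following Python script is a minimal sketch (not an official Research IT tool) that gathers the output of squeue -u, scontrol show job, and sacct -j for one job into a single .txt file, and copies a job script to a .txt file so it can be attached to a ticket. The script name collect_diagnostics.py, the output filename ticket_diagnostics.txt, and the helper function names are hypothetical. It assumes Python 3.7 or later and that it runs on a Savio node where the Slurm commands are available; elsewhere, it records that a command could not be run instead of failing.

```python
"""Collect Slurm diagnostics for a help ticket into plain .txt files.

Hypothetical helper: the file and function names are illustrative only.
"""
import shutil
import subprocess
import sys
from pathlib import Path


def run_command(cmd):
    """Run one command and return its stdout and stderr as text.

    Missing or unrunnable commands (e.g., Slurm tools on a machine that is
    not a Savio node) are reported in the text rather than raising.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except OSError as exc:
        return f"[could not run {cmd[0]}: {exc}]\n"
    except subprocess.TimeoutExpired:
        return f"[command timed out: {' '.join(cmd)}]\n"
    return result.stdout + result.stderr


def collect_diagnostics(job_id, username, out_path="ticket_diagnostics.txt"):
    """Write each command line and its output to out_path; return the path."""
    commands = [
        ["squeue", "-u", username],
        ["scontrol", "show", "job", job_id],
        ["sacct", "-j", job_id, "--format=JobID,JobName,MaxRSS,Elapsed"],
    ]
    with open(out_path, "w") as out:
        for cmd in commands:
            out.write(f"$ {' '.join(cmd)}\n")  # record the exact command used
            out.write(run_command(cmd))
            out.write("\n")
    return Path(out_path)


def copy_script_as_txt(script_path):
    """Copy a job script (e.g., job.sh) to job.txt, keeping the original.

    The ticketing system rejects .sh attachments, so the copy ends in .txt.
    """
    src = Path(script_path)
    if src.suffix == ".txt":
        return src  # already attachable; avoid copying a file onto itself
    dst = src.with_suffix(".txt")
    shutil.copyfile(src, dst)
    return dst


def main(argv):
    """Usage: python collect_diagnostics.py JOB_ID USERNAME [JOB_SCRIPT]"""
    if len(argv) < 3:
        print("usage: python collect_diagnostics.py JOB_ID USERNAME [JOB_SCRIPT]")
        return
    job_id, username = argv[1], argv[2]
    print("Wrote", collect_diagnostics(job_id, username))
    if len(argv) > 3:
        print("Wrote", copy_script_as_txt(argv[3]))


if __name__ == "__main__":
    main(sys.argv)
```

For example, running python collect_diagnostics.py <JOB_ID> <your_user_name> myjob.sh would write ticket_diagnostics.txt and myjob.txt, which can then be attached to the ticket. Copying the script, rather than renaming it, leaves the original .sh file in place for resubmission.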