qgpureset (1) Linux Manual Page
NAME
qgpureset - reset GPU ECC error counts
SYNOPSIS
qgpureset -H host -g gpuid -p -v
DESCRIPTION
The qgpureset command requests that a MOM reset the ECC error counts on one of its NVIDIA GPUs. The GPU's error count is reset by sending a GPU Control batch request to the batch server.
Resetting a GPU's ECC error counts requires PBS Operator or Manager privilege. It also requires that Torque be configured with --enable-nvidia-gpu.
OPTIONS
- -H host
- Specifies the host within the cluster on which the GPU is located. The argument is the name of a host that is a member of the cluster of hosts managed by the server.
- -g gpuid
- Specifies the ID of the GPU.
- -p
- Resets the GPU's permanent ECC error count.
- -v
- Resets the GPU's volatile ECC error count.
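The options above combine into a single command line. As a minimal sketch, the Python helper below assembles that argument list without running anything. The function name `build_qgpureset_args`, the host `node01`, and the GPU ID `0` are illustrative assumptions and not part of Torque; the check that at least one of -p or -v is selected is likewise an assumption of this sketch, not documented behavior.

```python
def build_qgpureset_args(host, gpuid, permanent=False, volatile=False):
    """Assemble a qgpureset argument list from the documented options."""
    if not (permanent or volatile):
        # Assumption of this sketch: a reset should name at least one counter.
        raise ValueError("select -p, -v, or both")
    args = ["qgpureset", "-H", str(host), "-g", str(gpuid)]
    if permanent:
        args.append("-p")  # reset the permanent ECC error count
    if volatile:
        args.append("-v")  # reset the volatile ECC error count
    return args


# Hypothetical host and GPU ID, for illustration only.
print(" ".join(build_qgpureset_args("node01", 0, volatile=True)))
# prints: qgpureset -H node01 -g 0 -v
```

Building the list separately from executing it keeps the command easy to inspect or log before it is sent to the batch server.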
OPERANDS
None
STANDARD ERROR
The qgpureset command writes a diagnostic message to standard error for each error occurrence.
EXIT STATUS
Upon successful processing of all the operands presented to the qgpureset command, the exit status will be a value of zero.
If the qgpureset command fails to process any operand, the command exits with a value greater than zero.
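A calling script can rely on this exit status convention to detect failure. The sketch below is one possible way to do so from Python, assuming qgpureset is on the PATH. The wrapper name `reset_volatile_ecc` and its `run` parameter (which allows a stand-in runner for testing) are hypothetical and not part of Torque.

```python
import subprocess
import sys


def reset_volatile_ecc(host, gpuid, run=subprocess.run):
    """Run `qgpureset -v` for one GPU; return True on exit status zero."""
    result = run(
        ["qgpureset", "-H", str(host), "-g", str(gpuid), "-v"],
        stderr=subprocess.PIPE,  # capture diagnostics written to standard error
        text=True,
    )
    if result.returncode != 0:
        # A value greater than zero signals failure; relay the diagnostics.
        sys.stderr.write(result.stderr or "")
        return False
    return True
```

Calling `reset_volatile_ecc("node01", 0)` with a hypothetical host name would issue the request. If the qgpureset binary is not installed, `subprocess.run` raises `FileNotFoundError` instead of returning an exit status.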
SEE ALSO
pbs_mom(8B) and pbs_server(8B)
