qgpureset (1) - Linux Manuals
qgpureset: reset GPU error counts
DESCRIPTION
The qgpureset command requests that a MOM reset the ECC error counts on one of its NVIDIA GPUs. The counts are reset by sending a GPU Control batch request to the batch server.
Resetting the GPU error counts requires PBS Operator or Manager privilege. Torque must also be configured with --enable-nvidia-gpu.
OPTIONS
- -H host
- Specifies the host within the cluster on which the GPU is located. The argument is the name of a host that is a member of the cluster of hosts managed by the server.
- -g gpuid
- Specifies the ID of the GPU.
- -p
- Specifies that the GPU's permanent ECC error count is to be reset.
- -v
- Specifies that the GPU's volatile ECC error count is to be reset.
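The options above can be combined into a single invocation. The following is a minimal sketch, not taken from the Torque distribution: the helper name build_qgpureset_cmd, the selection keywords, the host name node01, and the GPU ID 0 are all hypothetical. It assumes that the permanent and volatile resets are selected with -p and -v, and that both may be given together. The helper only prints the command line so that it can be reviewed before being run with Operator or Manager privilege.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): assemble a qgpureset command line.
# Assumes -p selects the permanent count and -v the volatile count.
build_qgpureset_cmd() {
    host=$1     # host that owns the GPU (value passed to -H)
    gpuid=$2    # GPU ID on that host (value passed to -g)
    which=$3    # "permanent", "volatile", or "both" (keywords invented for this sketch)

    case $which in
        permanent) flags="-p" ;;
        volatile)  flags="-v" ;;
        both)      flags="-p -v" ;;
        *) echo "unknown counter selection: $which" >&2; return 1 ;;
    esac

    # Print the command instead of executing it.
    echo "qgpureset -H $host -g $gpuid $flags"
}

# Placeholder host name and GPU ID; substitute real values for your cluster.
build_qgpureset_cmd node01 0 both
```

Printing instead of executing keeps the sketch safe to try on a machine without Torque installed; once the output looks right, the printed line can be run directly on a submit host.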