pbs_gpureset (3) - Linux Manuals


NAME

pbs_gpureset - reset GPU error counts

SYNOPSIS

#include <pbs_error.h>
#include <pbs_ifl.h>

int pbs_gpureset(int connect, char *mom_node, int gpu_id, int ecc_perm, int ecc_vol)

DESCRIPTION

Issue a batch request for the pbs_mom to reset the ECC error counts on one of its NVIDIA GPUs. The counts are reset by sending a GPU Control batch request to the batch server.

The argument mom_node specifies the host within the cluster on which the GPU is located. It must be the name of a host that is a member of the cluster of hosts managed by the server.

The argument gpu_id specifies the ID of the GPU on the MOM node.

The argument ecc_perm specifies whether to reset the GPU's permanent ECC error count. A value of 1 resets the count; a value of 0 leaves it unchanged.

The argument ecc_vol specifies whether to reset the GPU's volatile ECC error count. A value of 1 resets the count; a value of 0 leaves it unchanged.

This call requires PBS Operator or Manager privilege. It also requires that Torque be configured with --enable-nvidia-gpu.

DIAGNOSTICS

When the batch request generated by the pbs_gpureset() function has been completed successfully by a batch server, the routine returns 0 (zero). Otherwise, a nonzero error number is returned. The error number is also set in pbs_errno.
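
EXAMPLE

The following is a minimal sketch of how a caller might validate the two ECC flags and check the return value. It is not part of the Torque distribution. The names reset_gpu_ecc, gpureset_fn and fake_gpureset are hypothetical, as is the use of -1 as a local "bad argument" result. The reset call is passed in as a function pointer with the same parameter list as the SYNOPSIS so that the sketch compiles without the Torque headers.

```c
#include <stdio.h>

/* Same parameter list as pbs_gpureset() in the SYNOPSIS. Passing the call
 * as a function pointer is an assumption of this sketch; it lets the helper
 * be built and exercised without linking against Torque. */
typedef int (*gpureset_fn)(int connect, char *mom_node, int gpu_id,
                           int ecc_perm, int ecc_vol);

/* Hypothetical stand-in for pbs_gpureset(): reports success (0) for a
 * non-negative connection handle and an arbitrary nonzero error otherwise. */
int fake_gpureset(int connect, char *mom_node, int gpu_id,
                  int ecc_perm, int ecc_vol)
{
    (void)mom_node; (void)gpu_id; (void)ecc_perm; (void)ecc_vol;
    return connect < 0 ? 1 : 0;
}

/* The ECC arguments are flags: 1 resets the count, 0 leaves it alone. */
static int is_flag(int value)
{
    return value == 0 || value == 1;
}

/* Hypothetical helper. Returns -1 (this sketch's own sentinel) if an
 * argument is rejected locally; otherwise returns whatever the reset call
 * returns: 0 on success, a nonzero error number on failure. */
int reset_gpu_ecc(gpureset_fn reset, int connect, char *mom_node,
                  int gpu_id, int ecc_perm, int ecc_vol)
{
    if (reset == NULL || mom_node == NULL || mom_node[0] == '\0')
        return -1;
    if (!is_flag(ecc_perm) || !is_flag(ecc_vol))
        return -1;

    int rc = reset(connect, mom_node, gpu_id, ecc_perm, ecc_vol);
    if (rc != 0)
        fprintf(stderr, "GPU reset on %s (gpu %d) failed: error %d\n",
                mom_node, gpu_id, rc);
    return rc;
}
```

In a program linked against Torque, pbs_gpureset would presumably be passed as the first argument, with connect obtained from pbs_connect(3B). On failure the same error number should also be available in pbs_errno.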

SEE ALSO

qgpureset(1B)