nvgpu-snmp is an agent module for the Net-SNMP daemon that gathers information about NVIDIA GPUs, such as BIOS and driver versions, GPU core and ambient temperatures, and GPU and memory clock frequencies.
It supports NVIDIA graphics cards (GeForce/Quadro series) as well as NVIDIA Tesla computing processors, provided the system is running an X server with the NVIDIA X display driver. It makes temperature monitoring easy in both small and large GPGPU computing clusters.
License
nvgpu-snmp is licensed under the GNU General Public License version 2 or later.
Getting the source code
nvgpu-snmp is maintained in a git repository. To get the source code, run:
git clone git://git.colberg.org/gpgpu/nvgpu-snmp
If you are behind a firewall that blocks the git protocol port (9418), use:
git clone http://git.colberg.org/gpgpu/nvgpu-snmp
Prerequisites
nvgpu-snmp requires a running X server with the NVIDIA X display driver. It is sufficient for the KDM or GDM login manager to be running. To test whether the NV-CONTROL X extension is working, run as root:
export XAUTHORITY=/var/lib/gdm/:0.Xauth
export DISPLAY=:0
nvidia-settings -q gpus
2 GPUs on hal:0
[0] hal:0[gpu:0] (GeForce GTX 280)
[1] hal:0[gpu:1] (GeForce GTX 280)
This lists the available GPUs in the system. The XAUTHORITY environment variable holds the path to the authority file of the current X session (here, that of the GDM login screen).
Note that only one GPU needs to be configured as a screen in the xorg.conf file, and this may even be a headless GPU such as an NVIDIA Tesla device. As long as you manage to start an X server on any of the available NVIDIA GPUs, you can query the NV-CONTROL X extension attributes of all of them.
Installation
The Net-SNMP, Xext and X11 headers and libraries are required for compilation.
To compile the source code, run:
make
This will produce the shared object module nvgpu-snmp.so, which may be installed along with the MIB module NV-CTRL-MIB.txt using:
make install
If your Net-SNMP installation does not use the default paths for shared object modules or MIB modules, copy both files manually instead.
SNMP daemon configuration
Next, configure the SNMP daemon to load the module by including the following snippet in the snmpd.conf file:
dlmod nvCtrlTable nvgpu-snmp
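In the dlmod directive, the first argument is the module NAME and the second its PATH: snmpd loads the shared object and calls its initialisation routine init_NAME, and a PATH that is not absolute is looked up in the dlmod directory with .so appended. The following C sketch only mimics that name resolution for the line above. The helper names dlmod_init_symbol and dlmod_resolve_path are hypothetical, and the single directory /usr/lib/snmp/dlmod is an assumption; the real directory depends on how Net-SNMP was built.

```c
#include <stdio.h>
#include <string.h>

/* Assumed dlmod directory; the real one depends on the Net-SNMP build. */
#define DLMOD_DIR "/usr/lib/snmp/dlmod"

/* Hypothetical helper: name of the routine dlmod calls, "init_" + NAME.
 * Returns 0 on success, -1 if buf is too small. */
static int dlmod_init_symbol(const char *name, char *buf, size_t len)
{
    int n = snprintf(buf, len, "init_%s", name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: absolute paths are used as-is, relative ones are
 * looked up in DLMOD_DIR with ".so" appended.
 * Returns 0 on success, -1 if buf is too small. */
static int dlmod_resolve_path(const char *path, char *buf, size_t len)
{
    int n = (path[0] == '/')
        ? snprintf(buf, len, "%s", path)
        : snprintf(buf, len, "%s/%s.so", DLMOD_DIR, path);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under these assumptions, the line dlmod nvCtrlTable nvgpu-snmp makes snmpd look for the symbol init_nvCtrlTable in /usr/lib/snmp/dlmod/nvgpu-snmp.so, which is why the module name in snmpd.conf must match the one compiled into the shared object.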
While you are at it, allow local access to the SNMP daemon for testing purposes:
com2sec server 127.0.0.1 public
Before starting snmpd, the DISPLAY environment variable has to be set:
export DISPLAY=:0
You may, for example, add this command to /etc/default/snmpd or a similar file.
Granting X server access to the SNMP daemon
The tricky part is to allow the SNMP daemon to access the X server in order to query the NV-CONTROL X extension.
One possibility is to give the user of the snmpd process read access to the X authority file of the current X session and manually inform snmpd about its path by setting the XAUTHORITY environment variable.
A slightly more elegant solution is to run xhost +si:localuser:snmp in the current X session, which grants the snmp user access to the X server. With the GDM login manager, for example, this can be done by running as root:
export XAUTHORITY=/var/lib/gdm/:0.Xauth
export DISPLAY=:0
xhost +si:localuser:snmp
If your distribution has a /etc/X11/Xsession.d/ directory to execute arbitrary scripts upon X session initialization as the session user, placing a script such as the following into this directory might also work:
#
# allow snmp daemon to access NV-CONTROL X extension
#
xhost +si:localuser:snmp
Testing
Use snmpwalk to query the nvgpu-snmp agent module:
snmpwalk localhost nvCtrl
NV-CTRL-MIB::nvCtrlGPU.0 = INTEGER: 0
NV-CTRL-MIB::nvCtrlGPU.1 = INTEGER: 1
NV-CTRL-MIB::nvCtrlProductName.0 = STRING: GeForce GTX 280
NV-CTRL-MIB::nvCtrlProductName.1 = STRING: GeForce GTX 280
NV-CTRL-MIB::nvCtrlVBiosVersion.0 = STRING: 62.00.0e.00.01
NV-CTRL-MIB::nvCtrlVBiosVersion.1 = STRING: 62.00.0e.00.01
NV-CTRL-MIB::nvCtrlNvidiaDriverVersion.0 = STRING: 185.18.14
NV-CTRL-MIB::nvCtrlNvidiaDriverVersion.1 = STRING: 185.18.14
NV-CTRL-MIB::nvCtrlVersion.0 = STRING: 1.18
NV-CTRL-MIB::nvCtrlVersion.1 = STRING: 1.18
NV-CTRL-MIB::nvCtrlBusType.0 = INTEGER: 2
NV-CTRL-MIB::nvCtrlBusType.1 = INTEGER: 2
NV-CTRL-MIB::nvCtrlBusRate.0 = INTEGER: 16
NV-CTRL-MIB::nvCtrlBusRate.1 = INTEGER: 16
NV-CTRL-MIB::nvCtrlVideoRam.0 = INTEGER: 1048576
NV-CTRL-MIB::nvCtrlVideoRam.1 = INTEGER: 1048576
NV-CTRL-MIB::nvCtrlIrq.0 = INTEGER: 16
NV-CTRL-MIB::nvCtrlIrq.1 = INTEGER: 19
NV-CTRL-MIB::nvCtrlGPUCoreTemp.0 = INTEGER: 49
NV-CTRL-MIB::nvCtrlGPUCoreTemp.1 = INTEGER: 46
NV-CTRL-MIB::nvCtrlGPUCoreThreshold.0 = INTEGER: 190
NV-CTRL-MIB::nvCtrlGPUCoreThreshold.1 = INTEGER: 190
NV-CTRL-MIB::nvCtrlGPUDefaultCoreThreshold.0 = INTEGER: 190
NV-CTRL-MIB::nvCtrlGPUDefaultCoreThreshold.1 = INTEGER: 190
NV-CTRL-MIB::nvCtrlGPUMaxCoreThreshold.0 = INTEGER: 190
NV-CTRL-MIB::nvCtrlGPUMaxCoreThreshold.1 = INTEGER: 190
NV-CTRL-MIB::nvCtrlGPUAmbientTemp.0 = INTEGER: 41
NV-CTRL-MIB::nvCtrlGPUAmbientTemp.1 = INTEGER: 40
NV-CTRL-MIB::nvCtrlGPUOverclockingState.0 = INTEGER: 0
NV-CTRL-MIB::nvCtrlGPUOverclockingState.1 = INTEGER: 0
NV-CTRL-MIB::nvCtrlGPU2DGPUClockFreq.0 = INTEGER: 300
NV-CTRL-MIB::nvCtrlGPU2DGPUClockFreq.1 = INTEGER: 300
NV-CTRL-MIB::nvCtrlGPU2DMemClockFreq.0 = INTEGER: 100
NV-CTRL-MIB::nvCtrlGPU2DMemClockFreq.1 = INTEGER: 100
NV-CTRL-MIB::nvCtrlGPU3DGPUClockFreq.0 = INTEGER: 602
NV-CTRL-MIB::nvCtrlGPU3DGPUClockFreq.1 = INTEGER: 602
NV-CTRL-MIB::nvCtrlGPU3DMemClockFreq.0 = INTEGER: 1107
NV-CTRL-MIB::nvCtrlGPU3DMemClockFreq.1 = INTEGER: 1107
NV-CTRL-MIB::nvCtrlGPUDefault2DGPUClockFreq.0 = INTEGER: 300
NV-CTRL-MIB::nvCtrlGPUDefault2DGPUClockFreq.1 = INTEGER: 300
NV-CTRL-MIB::nvCtrlGPUDefault2DMemClockFreq.0 = INTEGER: 100
NV-CTRL-MIB::nvCtrlGPUDefault2DMemClockFreq.1 = INTEGER: 100
NV-CTRL-MIB::nvCtrlGPUDefault3DGPUClockFreq.0 = INTEGER: 602
NV-CTRL-MIB::nvCtrlGPUDefault3DGPUClockFreq.1 = INTEGER: 602
NV-CTRL-MIB::nvCtrlGPUDefault3DMemClockFreq.0 = INTEGER: 1107
NV-CTRL-MIB::nvCtrlGPUDefault3DMemClockFreq.1 = INTEGER: 1107
NV-CTRL-MIB::nvCtrlGPUCurrentGPUClockFreq.0 = INTEGER: 300
NV-CTRL-MIB::nvCtrlGPUCurrentGPUClockFreq.1 = INTEGER: 300
NV-CTRL-MIB::nvCtrlGPUCurrentMemClockFreq.0 = INTEGER: 100
NV-CTRL-MIB::nvCtrlGPUCurrentMemClockFreq.1 = INTEGER: 100
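For scripted monitoring, each INTEGER line of the snmpwalk output above can be split into object name, GPU index and value. The C sketch below assumes snmpwalk's default output format as shown above (options such as -On change it). The struct nvctrl_entry and the functions parse_nvctrl_integer and bus_type_name are hypothetical, and bus_type_name assumes that nvCtrlBusType reports the raw NV-CONTROL NV_CTRL_BUS_TYPE values.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one INTEGER line, e.g.
 * "NV-CTRL-MIB::nvCtrlGPUCoreTemp.0 = INTEGER: 49" */
struct nvctrl_entry {
    char object[64]; /* MIB object name, e.g. "nvCtrlGPUCoreTemp" */
    int gpu;         /* table index, i.e. the GPU number */
    long value;      /* integer value, e.g. 49 */
};

/* Returns 0 on success, -1 if the line is not an NV-CTRL-MIB INTEGER line
 * (STRING lines such as nvCtrlProductName are rejected). */
static int parse_nvctrl_integer(const char *line, struct nvctrl_entry *e)
{
    int n = sscanf(line, "NV-CTRL-MIB::%63[^.].%d = INTEGER: %ld",
                   e->object, &e->gpu, &e->value);
    return n == 3 ? 0 : -1;
}

/* Assumption: nvCtrlBusType carries the NV-CONTROL NV_CTRL_BUS_TYPE values. */
static const char *bus_type_name(long v)
{
    switch (v) {
    case 0: return "AGP";
    case 1: return "PCI";
    case 2: return "PCI Express";
    case 3: return "integrated";
    default: return "unknown";
    }
}
```

Applied to the output above, the nvCtrlGPUCoreTemp.1 line yields GPU 1 with a value of 46, and the nvCtrlBusType value 2 maps to "PCI Express" under the stated assumption.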
Note that the module caches query results for 5 seconds by default. To change this timeout, define NV_CTRL_TABLE_CACHE_TIMEOUT as the desired time in seconds at compile time.
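A compile-time default like this is commonly written as a guarded macro, so that passing -DNV_CTRL_TABLE_CACHE_TIMEOUT=10 in the compiler flags overrides it (how to pass extra flags depends on the Makefile). The sketch below illustrates that pattern together with an expiry check. The struct table_cache and the function cache_expired are hypothetical and not taken from the nvgpu-snmp source, which may instead rely on Net-SNMP's own cache helper (netsnmp_cache_create, whose timeout is also given in seconds).

```c
#include <stdbool.h>
#include <time.h>

/* Default cache lifetime in seconds, overridable at compile time. */
#ifndef NV_CTRL_TABLE_CACHE_TIMEOUT
#define NV_CTRL_TABLE_CACHE_TIMEOUT 5
#endif

/* Hypothetical cache record, for illustration only. */
struct table_cache {
    time_t loaded; /* time of the last refresh, or 0 if never loaded */
};

/* True if the table was never loaded or is at least
 * NV_CTRL_TABLE_CACHE_TIMEOUT seconds old and must be re-queried. */
static bool cache_expired(const struct table_cache *c, time_t now)
{
    return c->loaded == 0 ||
           difftime(now, c->loaded) >= NV_CTRL_TABLE_CACHE_TIMEOUT;
}
```

With the default of 5 seconds, a table loaded at time 100 is served from the cache at time 104 and re-queried from time 105 on.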
Troubleshooting
To debug the nvgpu-snmp agent module, start snmpd with:
DISPLAY=:0 sudo snmpd -u snmp -DnvCtrlTable -f
and query the nvCtrl table using snmpwalk.
Author
For suggestions or bug reports, mail me: Peter Colberg <peter@colberg.org>

