This is a follow on post to my previous article on getting the NetApp filer disk/volume/aggregate statistics charting using TeamQuest ITSAR (IT Service Analyzer and Reporter). So if are you interested in getting some other statistics on usage and utilization of your Network Appliance filers like the one below, read on.
This script and user table agent definition detail how to get the actual filer utilization such as CPU busy, network kilobytes in and out, and some other useful things for potential alerts. Potential alerts? Yes, some of the statistics that can be gathered using the SNMP agent are things like failed disks, failed power supplies, failed fans, the number of spare disks, and more. Simple peruse the Network Appliance SNMP MIB to see everything that is available to us. The table definition and my script can easily be extended before implementation to include the additional information you may be interested in.
Personally, I really trust the NetApp auto-support ability. Our NetApp filers are extremely capable of alerting us when a disk or anything fails. The filer heads are clustered and extremely redundant so I trust them (just not the devil inside, to quote a movie), so I might as well gather a few stats that I may track and alert on at a future time.
I won’t spend a lot of time covering the setup of SNMP on the filer or the TeamQuest host because that’s already done in the previous blog on the subject. Instead I will jump straight into the files and table setup for these new statistics.
The first step is to download the two additional files needed for the filer system statistics.
- The Network Appliance TeamQuest table definition for System statistics
- The Network Appliance Systats perl script
By now you have all the recommendations on hand and ready to go from my last blog… so save the files above to the same directory. Edit the script to configure the paths, username, password, and community string just like last time. Also make sure that the data directory has write privilege for the user that will be running the TeamQuest UTA which is usually daemon:root. Run it a few times to make sure it is working correctly, but take the time to make sure that the logfiles are writable by the user daemon after you are finished testing.
The script writes two files necessary for calculating the true network statistics. The SNMP statistic delivered is a number in bytes since the system last booted. I don’t think it needs to be stated, but this is not a very flexible statistic to work with for charting. It’s huge! And it gets humongous since the filers never, ever need to restart except for upgrades. I developed the script to use a log file to store the statistic from the last run and do a little math to give us a useful number for ongoing utilization. On execution the script operates like this in regards to the network statistics:
- Gets current network statistic
- Get last network statistic from log file
- Calculates difference
- converts to kilobytes
- saves current statistic (as read from filer) to the logfile
That’s it! It’s pretty easy to setup and run. The most difficult part of the setup was reading through all the many possible options for defining the statistics in the table definitions. I think I saved you a bit of work there – and in fact, some of the praise there goes to TeamQuest themselves. I was having issues with the way some of the statistics were being averaged and I opened a ticket with them. They were very patient with me and we got it resolved. Tickle me happy!
So import the table definition into your test or production database (“$manager/bin/tqtblprb -i -d testdatabase -f NetApp_sysStats.tbl”). And when that is done build your User Table Agent same as before but referencing the second script and the new table (USER:NetAppSysStats).
I may go ahead and setup some alerts on some of the statistics, there is more to be done!