Archive for the ‘perl’ Category

Oracle VM Repository Alarms

Posted: February 11, 2015 in Oracle VM, perl, tech

The Oracle VM platform for virtualization has some amazing gaps in functionality (see my previous post about the disk size change problem, for starters). But do you want to know another big hole in functionality? Alarms! I would love to get alerts and alarms on certain conditions sent to my email or phone.

My embarrassing case in point: the other day a few thin-sized disks grew and filled up a repository. The virtual disks then started spewing SCSI errors. This could have been prevented if 1) the disks were thick instead of thin, but even better would be option 2) some alarms about operating conditions such as repository utilization and performance utilization of the hosts.

So this is a little script I wrote; you might want to sing it note for note, and be happy. I will eventually get some performance statistics put into my Teamquest environment so I can have performance graphs and alarms from that monitoring. But it’s been a while since I defined a User Table database in it…

In the meantime I have a simple repository utilization alarm script. You can download the script below. It’s not gorgeous code but it gets the job done. You will need to get a few things working in your environment first, namely the Oracle VM 3.3 CLI running with SSH keys instead of password logins. Later versions of OVM (starting with one of the 3.2 releases, I think) added an administration port that can be accessed using SSH. I found the write-up here.

I have my cronjob running as root on the Oracle VM Manager host, the same one that is running the CLI port. After generating the SSH keys for root I skipped the ssh-add command the wiki above mentions, and simply cat'ed the public key file, appending it to the oracle account's authorized_keys file (eg, cat ~/.ssh/*.pub >>~oracle/.ssh/authorized_keys).

Next, as root, ssh to the admin port as the admin user. Answer yes to save the keys; if you have a conflict you should stop and clean it out of your .ssh/known_hosts file. Once you have authenticated, it should authorize that newly added key. While you are in the CLI session, test the functionality and compatibility of your Oracle VM Manager environment by issuing the command “list repository”. Your test should look something like this:

# ssh admin@localhost -p 10000
The authenticity of host '[localhost]:10000 ([127.0.0.1]:10000)' can't be established.
DSA key fingerprint is ya.da.ya.da.ya.da.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[localhost]:10000,[127.0.0.1]:10000' (DSA) to the list of known hosts.
admin@localhost's password: 
OVM> list repository
Command: list repository
Status: Success
Time: 2015-02-11 15:16:43,417 CST
Data: 
 id:0004fb00000300005eexxxxxxx name:Repo1
 id:0004fb0000030000709xxxxxxx name:Repo2
 id:0004fb0000030000455xxxxxxx name:Repo3
OVM>

Go ahead and ‘exit’ after this has completed successfully. Does the output look similar?

Try the SSH connection again; if successful you should get neither a password prompt nor a prompt to save the host key. In this second session, copy a UUID from the earlier test and try the command below, putting your UUID after the equal sign:

# ssh admin@localhost -p 10000
OVM> show repository id=
Command: show repository id=0004fb0000030000455xxxxxxx
Status: Success
Time: 2015-02-11 15:22:47,666 CST
Data: 
 File System = 0004fb0000030000455xxxxxxx [fs on NETAPP]
 Manager UUID = 0004fb00000100006edaaaaaaaaaaaaaa
 File System Free (GiB) = 503.32
 File System Total (GiB) = 2037.11
 File System Used (GiB) = 1533.79
 Used % = 75.0
 Refreshed = Yes
 Presented = Yes
 Presented Server 1 = 4c:4c:45:44:00:52:38:10:aa:aa:aa:aa:aa:aa [host1]
 Presented Server 2 = 4c:4c:45:44:00:52:38:10:bb:bb:bb:bb:bb:bb [host2]
 VirtualDisk 1 = blahblah.img [blahblah.img]
 VirtualDisk 2 = blahblhablha.img [blahblahblha.img]
 VirtualCdrom 1 = 0004fb00001500000b984b7f06a25c25.iso [OEL 6.iso]
 Vm 1 = 0004fb0000060000ecccccccccc [virtmach1]
 Vm 5 = 0004fb0000060000ddddddddddd [testme]
 Id = 0004fb0000030000455xxxxxxx [Repo3]
 Name = Repo3
 Locked = false
OVM>

Your output should hopefully look similar to the info above, so go ahead and exit out of the CLI. At minimum, the output needs to contain the following four named fields for my OVMrepomon.pl script to work:

  • File System Free (GiB)
  • File System Total (GiB)
  • File System Used (GiB)
  • Used %

If you have those four fields, let us proceed, because your next step is to prepare your Perl environment with the modules necessary to retrieve, process, and format the data, and send the alarm.

Your Perl installation needs three free modules. You may be able to find some of these in your operating system distribution’s package repository, but covering that for every platform is more than I want to write up in this post. So run your yum commands, or dpkg/apt-get, or whatever… but ultimately you will probably end up in CPAN to install at least a few of these modules.

  • MIME::Lite  — this sends the SMTP alarm message
  • HTML::HashTable — this is an easy way to make an HTML table for the email
  • Net::OpenSSH — this is an easy way to make the SSH connection to the OVMM CLI port

Once you have installed the modules above, download my code and create an executable script from it.

There are seven lines that need to be configured for your environment, and they are located near the top of the script. They are in one section called “Global Basics” and they control who gets the alarms, the OVM host, and your thresholds for the alarms (a rough sketch of how the script uses them follows the listing).

# Global Basics to tweak for your installation:
my $OVMHOST='admin@hostname:10000'; # should be in the notation user@host:port for the OVM CLI
my $MAILTO='your best friends email address'; # who receives the alerts?
my $MAILFROM='no-reply'; # who do the alerts come from?
my $MAILHOST="smtpforwarder"; # who can forward the email alerts?
my $DEBUG=0;
my $PCused="87"; # percent used threshold
my $FSfree="200"; # gigs free of file system threshold
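To give a feel for how those settings and the three Perl modules fit together, here is a rough, untested sketch of the approach. The repository UUID, recipient address, and regex parsing below are illustrative placeholders only, and the real OVMrepomon.pl also formats its alert as an HTML table with HTML::HashTable, which I skip here:

#!/usr/bin/perl
# Rough sketch only; not the real OVMrepomon.pl.
use strict;
use warnings;
use Net::OpenSSH;
use MIME::Lite;

my $OVMHOST  = 'admin@hostname:10000';   # user@host:port for the OVM CLI
my $MAILTO   = 'your best friends email address';
my $MAILFROM = 'no-reply';
my $MAILHOST = 'smtpforwarder';
my $PCused   = '87';                     # percent used threshold
my $FSfree   = '200';                    # gigs free of file system threshold

# Split user@host:port and open the CLI session using the SSH keys set up earlier.
my ($userhost, $port) = split /:/, $OVMHOST;
my $ssh = Net::OpenSSH->new($userhost, port => $port);
die 'SSH failed: ' . $ssh->error if $ssh->error;

# Ask the CLI about one repository (the UUID here is a placeholder).
my $out = $ssh->capture('show repository id=0004fb0000030000455xxxxxxx');

# Pull out the two fields we alarm on.
my ($free) = $out =~ /File System Free \(GiB\) = ([\d.]+)/;
my ($used) = $out =~ /Used % = ([\d.]+)/;

# Send the alarm if either threshold is crossed.
if (defined $used && defined $free && ($used >= $PCused || $free <= $FSfree)) {
    MIME::Lite->new(
        From    => $MAILFROM,
        To      => $MAILTO,
        Subject => "OVM repository alarm: ${used}% used, ${free} GiB free",
        Data    => $out,
    )->send('smtp', $MAILHOST);
}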

Run the command a few times, preferably with DEBUG set to 1 or higher and the thresholds set to a level that ensures a message will be sent. Make sure you are getting the email messages and then turn debug off. Now you are ready to put it in your cron scheduler. Make sure cron is able to execute it and deliver a message to your inbox with debug off, and finally adjust the thresholds to what you truly want to be alerted at.
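A cron entry for the script might look something like the line below (the path is just an example; pick whatever schedule suits you):

0,30 * * * * /opt/scripts/OVMrepomon.pl >/dev/null 2>&1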


My previous two posts were about getting utilization statistics out of my Network Appliance filers into a TeamQuest database for my IT Service Analyzer and Reporter charts. They are working great and I am using them in a production environment. The thing that bothered me about them is that they seemed so slow. The volume stats report would take just over a second for four filer heads, and the system stats script seemed to take FOREVER. I timed it. It was only five seconds for four filers, but the feeling was still FOREVER.

[Image: ptime of the old volstats script]

[Image: ptime of the old systats script]

I knew what the problem was, and I knew I would have to buckle down and learn SNMP even better, and especially learn the Perl SNMP modules, in order to tune it back to my acceptable standards of runtime. That first script was a quick and dirty hack really, and like most hacks it is just functional. All the SNMP requests ran as system commands that could easily be run and debugged from a command line. It’s a great way to learn and get something functional at the same time. But it’s like a baby eating from a bottle: it needs to grow up, eat solid food, go to school, and get a job to support itself. Or, in Perl terms, it needs to use pure Perl code to do the work instead of system commands.

So, enter version 2 of both scripts. My new volume stats script literally runs twice as fast as the old script. My new system stats, also quite literally, runs TEN times as fast. Woo hoo! How is that for tuning code and making things better?

[Image: ptime of the new volstats script]

[Image: ptime of the new systats script]

These new versions run no system commands but do all the work using the Net-SNMP Perl modules (not to be confused with the Net::SNMP Perl module). The process of learning the SNMP module took several days of trial and error around my other work. The biggest issue with Perl is the confusing number of modules available to do the same job. Often, a few Google searches will reveal which module has the most support, and I would choose that one. But in the case of the Perl SNMP modules there is no clear winner. Both have an equal number of blog posts and confused postings looking for help with the modules.

So I picked one. It was the wrong one initially, of course. I picked Net::SNMP to start with because it can be built using the CPAN shell (eg, ‘perl -MCPAN -e shell’). The other primary SNMP module being used is the one provided by the Net-SNMP command line packages. This can be more of a challenge to build, but more often than not it can just be installed as a package for your system, which is the easy route I chose. I used the OpenCSW package.

The reason I say that Net::SNMP was the wrong path is the challenge, for an SNMP illiterate, of understanding SNMP and specialized MIBs. It appeared that you needed to know the confusingly long numeric OID of each statistic to use that module. I was (and still am) trying to learn SNMP and could not figure out the proper way to find the statistics I wanted using it. So I switched to the other module, which allowed me to use names for statistics that I was used to, like “df64AvailKBytes” to find the full and correct amount of kilobytes available to a filesystem.

So I set off to learn the module. I started small with test scripts to just gather one or a few statistics. This allowed me to make some quick progress and learn how to address the desired statistics as a scalar, array, or hash, and to grow and process multiple statistics in relation to each other.

I ended up using the VarList method within the module. It allows the script to retrieve a bunch of statistics with a single connection. This is much more efficient than the old script, which would make up to a dozen SNMP command requests to each filer head to get the desired statistics. This new method gets them all at once and then lets me step through them one row at a time.
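To illustrate the idea (this is a minimal sketch, not the code from my scripts), a VarList walk of a few df columns against one filer might look roughly like this, assuming the Net-SNMP Perl bindings and the NetApp MIB are installed; the hostname, credentials, paths, and exact statistic names are placeholders taken from the MIB and the earlier command line tests:

#!/usr/bin/perl
# Minimal sketch of the VarList approach; not the production script.
use strict;
use warnings;
use SNMP;

# Load the NetApp MIB so we can use names instead of numeric OIDs.
SNMP::addMibFiles('/opt/teamquest/scripts/data/netapp.mib');
SNMP::loadModules('NETWORK-APPLIANCE-MIB');
SNMP::initMib();

# One SNMPv3 session per filer head.
my $sess = SNMP::Session->new(
    DestHost  => 'your-filername',
    Version   => 3,
    SecName   => 'yourSNMPuser',
    SecLevel  => 'authNoPriv',
    AuthProto => 'MD5',
    AuthPass  => 'yourSNMPuserPassword',
);
die "no SNMP session" unless defined $sess;

# Ask for several columns of the df table in one request.
my $vars = SNMP::VarList->new(
    ['dfFileSys'], ['df64TotalKBytes'], ['df64UsedKBytes'], ['df64AvailKBytes'],
);

# Walk the table one row at a time until we fall off the end of dfFileSys.
while (1) {
    $sess->getnext($vars);
    last if $sess->{ErrorStr} or $vars->[0]->tag !~ /dfFileSys/;
    my ($name, $total, $used, $avail) = map { $_->val } @$vars;
    print "$name total=${total}KB used=${used}KB avail=${avail}KB\n";
}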

View/download my scripts here:

  1. new version 2 netapp volume stats script
  2. new version 2 netapp sys stats script

There is one thing that bothered me that I never figured out while working on the volume statistics script (the second one I tackled). When using the command line utilities, the entire disk table can be requested using the name ‘dfTable’. This would not work with the Perl SNMP module, even though ‘volTable’ and ‘ifTable’ would. I do not understand the difference; instead I punted and again used the VarList method with named individual statistics, with great success. If you know why, please leave a comment. I wonder if I could shave a few tenths of a second off using dfTable… 😉

This is a follow-on post to my previous article on getting NetApp filer disk/volume/aggregate statistics charted using TeamQuest ITSAR (IT Service Analyzer and Reporter). So if you are interested in getting some other statistics on usage and utilization of your Network Appliance filers, like the chart below, read on.

[Image: NetApp system stats chart, utilization statistics in ITSAR]

This script and user table agent definition detail how to get the actual filer utilization such as CPU busy, network kilobytes in and out, and some other useful things for potential alerts. Potential alerts? Yes, some of the statistics that can be gathered using the SNMP agent are things like failed disks, failed power supplies, failed fans, the number of spare disks, and more. Simply peruse the Network Appliance SNMP MIB to see everything that is available to us. The table definition and my script can easily be extended before implementation to include any additional information you may be interested in.

Personally, I really trust the NetApp auto-support capability. Our NetApp filers are extremely capable of alerting us when a disk or anything else fails, and the filer heads are clustered and extremely redundant, so I trust them (just not the devil inside, to quote a movie). Still, I might as well gather a few stats that I may track and alert on at a future time.

I won’t spend a lot of time covering the setup of SNMP on the filer or the TeamQuest host because that’s already done in the previous blog on the subject. Instead I will jump straight into the files and table setup for these new statistics.

The first step is to download the two additional files needed for the filer system statistics.

  1. The Network Appliance TeamQuest table definition for System statistics
  2. The Network Appliance Systats perl script

By now you have all the recommendations from my last blog on hand and ready to go… so save the files above to the same directory. Edit the script to configure the paths, username, password, and community string just like last time. Also make sure that the data directory is writable by the user that will be running the TeamQuest UTA, which is usually daemon:root. Run the script a few times to make sure it is working correctly, and take the time to make sure that the logfiles are writable by the user daemon after you are finished testing.

The script writes two files necessary for calculating the true network statistics. The SNMP statistic delivered is a number in bytes since the system last booted. I don’t think it needs to be stated, but this is not a very flexible statistic to work with for charting. It’s huge! And it gets humongous, since the filers never, ever need to restart except for upgrades. The script uses a log file to store the statistic from the last run and does a little math to give us a useful number for ongoing utilization. With regard to the network statistics, each execution of the script does the following (a rough sketch of the bookkeeping follows the list):

  1. Gets the current network statistic
  2. Gets the last network statistic from the log file
  3. Calculates the difference
  4. Converts to kilobytes
  5. Saves the current statistic (as read from the filer) to the logfile
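In rough Perl terms, the bookkeeping for steps 2 through 5 looks something like the sketch below; the file name and variable names are examples rather than the script’s own:

# Sketch of the counter bookkeeping; pass in the raw byte counter just fetched via SNMP
# (eg, misc64NetRcvdBytes) and a per-filer log file path.
sub net_kb_since_last_run {
    my ($current_bytes, $logfile) = @_;

    # 2. Counter recorded by the previous run (0 if this is the first run).
    my $previous_bytes = 0;
    if (open my $in, '<', $logfile) {
        my $line = <$in>;
        close $in;
        $previous_bytes = $line if defined $line;
        chomp $previous_bytes;
    }

    # 3 and 4. Difference since the last run, converted to kilobytes.
    my $kb_this_interval = ($current_bytes - $previous_bytes) / 1024;

    # 5. Save the raw counter (as read from the filer) for the next run.
    open my $out, '>', $logfile or die "cannot write $logfile: $!";
    print $out "$current_bytes\n";
    close $out;

    return $kb_this_interval;
}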

That’s it! It’s pretty easy to setup and run. The most difficult part of the setup was reading through all the many possible options for defining the statistics in the table definitions. I think I saved you a bit of work there – and in fact, some of the praise there goes to TeamQuest themselves. I was having issues with the way some of the statistics were being averaged and I opened a ticket with them. They were very patient with me and we got it resolved. Tickle me happy!

So import the table definition into your test or production database ("$manager/bin/tqtblprb -i -d testdatabase -f NetApp_sysStats.tbl"). And when that is done, build your User Table Agent the same as before, but referencing the second script and the new table (USER:NetAppSysStats).

I may go ahead and set up some alerts on some of these statistics; there is more to be done!

[Image: NetApp system stats table data]

For as long as I have been using TeamQuest products I have wanted them to provide a solution for my Network Appliance brand filer devices. It was something I could have written a long time ago, but frankly it was a low priority. I had a custom script that would run “df -Ah” on the filer, cut out the columns I wanted, and write them to a CSV file that certain people could read in and make an Excel chart with. It was adequate, so other, higher priority items were worked and this languished for… a really long time. I now, finally, have something like the gauge chart below:

[Image: ITSAR chart of my NetApp appliances’ storage utilization]

It finally happened because this last summer we migrated our Solaris web environment into zones. My previous cron job ran on the old systems, and while I could have moved the stupid job, I was holding out to force myself to get this written properly. So after a couple of months of running it manually whenever the user needed the data, I bit down and wrote the code necessary to get the capacity data into TeamQuest. Basically I leveraged my inherent laziness to finally make myself get it done the proper way.

So, this blog documents my efforts to write a real solution to make a beautiful chart in my TeamQuest IT Service Analyzer and Reporter, where all the wild things go to get charted for management. I wanted more than just the storage utilization metrics we currently provide, but that was the most important first step to accomplish and is what this blog covers. A follow-on blog should cover the CPU, network, failed disk, failed power supply, and other interesting metrics that can be gathered, monitored, charted, and alerted on.

How to duplicate my results in your environment

The first item is to get SNMP version 3 working on your filers under a limited user account. SNMP version 3 is necessary in today’s multi-terabyte world because the fields defined by the MIB for SNMP versions 1 and 2 cannot account for the insane amount of “bytes” reported. Yes, it has to report in bytes. So be sure to download the Word doc available at the NetApp community site and follow step one. Yes, just the first step is all that is really needed, but don’t forget to set a password for the new user account that is allowed (and only allowed) to use SNMP.

Create a location for your scripts and data files. I like to put my scripts in /opt/teamquest/scripts, with a data directory underneath that. The Teamquest User Table Agent will run as the user ‘daemon’ and group root, so be sure to set appropriate permissions on the directory path and script for read and execution, and the data directory for write permission.

Make sure your system has SNMP binaries — the Solaris ones are adequate and will probably be in /usr/sfw/bin if you installed the packages. The OpenCSW binaries are great, too. You will notice I am actually using the OpenCSW binaries at times, but I have no good reason to, except that I typically like to have some base OpenCSW packages installed so that I have gtail, allowing me to tail multiple files at the same time.

Download the following files:

  1. NetApp MIB for SNMP from NetApp
  2. My Script for NetApp Volume Statistics
  3. TeamQuest table definition for NetApp Volume

Drop the latest NetApp SNMP MIB into the data directory and copy my scripts to the script directory. Use “less” to look through the NetApp MIB at some of the values available in there. There are a lot. I focused on the following values, which I will use between this blog (volume statistics) and a future blog on system statistics: dfTable, productVersion, productModel, productFirmwareVersion, cpuBusyTimePerCent, envFailedFanCount, envFailedPowerSupplyCount, nvramBatteryStatus, diskTotalCount, diskFailedCount, diskSpareCount, misc64NetRcvdBytes, and misc64NetSentBytes. If you see a “64” version of a statistic, use that one to make sure that you are getting the real figure out of the system.

Test your user and SNMP client with some command line operations before you start editing the script for your environment. A test command would look like this:

/opt/csw/bin/snmpget -v3 -n "" -u yourSNMPuser -l authNoPriv -A yourSNMPuserPassword -a Md5 -O qv -c yourcommunity -m /opt/teamquest/scripts/data/netapp.mib your-filername NETWORK-APPLIANCE-MIB::misc64NetRcvdBytes.0

We will work with that statistic next time; today we are looking at the dfTable statistic for all the stats you want on your storage. So be sure to also test this different SNMP command, and marvel at the amount of data that comes across your terminal:

/usr/sfw/bin/snmptable -v3 -n "" -u yourSNMPuser -l authNoPriv -A yourSNMPuserPassword -a Md5 -c yourcommunity -m /opt/teamquest/scripts/data/netapp.mib yourfilername NETWORK-APPLIANCE-MIB::dfTable

If all is successful with your command line tests then you are ready to edit the script and configure it for your environment. You may need to change the path to the SNMP command and the MIB file, and you will definitely be changing the username, password, and community string. There are several other options to tweak too: do you want to import all volumes or just aggregates? Do you want to ignore snapshots? Test the script several times and make sure it is returning the data the way you want it.

You will notice that you have to pass the filer names (comma separated, no spaces) in on the command line. This makes it easy to add and remove filers from your environment without adding or removing User Table Agents from your TeamQuest manager; simply edit the command line options passed to the script. Don’t forget to test with and without the -t=interval option for the TeamQuest format, where the interval matches the frequency the agent will run at. And don’t worry about the extra options for snapshots or aggregates-only; these can be tweaked at any time to limit the data being absorbed by TeamQuest, and when you report or alert you can easily filter out what you don’t want.

When you are ready, import the third file, the table definition, into your TeamQuest database. You may want to use a test database for a while and then eventually add it to a production database. The command to import the table is "$manager/bin/tqtblprb -i -f NetApp_VolumeStats.tbl", but I heartily recommend you have the command line manual handy and consult it regularly for adding databases and tables, and for deleting said items when things go wrong. IT happens.

[Image: Adding the User Table Agent configuration]

When the table is entered into the database you are ready to add your very own User Table Agent. Connect to the TQ manager of the desired system using your browser. Set the database if you are not using the production database, and then click Collection Agents. On the far right you will see the link “Add Agent”; click that and then “User Table Instance”. Enter the information that makes sense to you, such as a name for this agent, the path to the executable, and the interval you want collection to happen at. The class and subclass must match exactly what is in the table file that was imported; it will be “USER” and “NetAppVolumes” unless you changed it. The Program arguments field is where you pass in the comma separated list of filer names (no spaces!), a single space, and -t=<interval>. Make sure that interval matches the collection interval you enter below it. After you save and apply the new settings you simply have to wait until the clock hits the next collection time (every five-minute increment of the hour if you are using the 300 second interval like I am).

Be sure to launch TQView and look at the table directly for accurate statistics, play with the filter options, etc. Tip: you can create a test database in ITSAR that only harvests from this test database so that you can test end to end.

[Image: Using TQView to examine the actual data gathered]

You will notice that I dropped the actual FlexibleVolume volume type data from my gathering. It may be useful at some point in the future, and it can be re-added with a simple edit to the script, but for this first stage all I care about is the overall health of the filer, so my ITSAR chart for management is a simple global view of each filer cluster-pair. For this I use the statistic “FilerCapacity”, which the script calculates by summing all of the VolumeType “aggregate” entries on each filer node. You can see that I have a total of four nodes in my environment (names withheld to protect the innocent).
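As a rough illustration of that summation (the field names here are made-up examples, not necessarily what the script or table actually uses):

# Illustrative only: sum every "aggregate" volume on each node into one FilerCapacity figure.
sub filer_capacity {
    my (@volume_rows) = @_;   # one hashref per volume: { Filer => ..., VolumeType => ..., TotalKBytes => ... }
    my %capacity_kb;
    for my $row (@volume_rows) {
        next unless $row->{VolumeType} eq 'aggregate';   # skip flexible volumes and snapshots
        $capacity_kb{ $row->{Filer} } += $row->{TotalKBytes};
    }
    return %capacity_kb;
}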

And that is it for the first stage! On to writing alerts and getting the system stats working.

There’s a fun little geeky comic online that you may have heard of, XKCD.

A while back the author had a comic that resonated with me about password security. I’m not buff enough in my math skills to follow the entropy equation completely, but I could follow the principle. The idea is that four (or so) random words are more secure than an extremely complex password with numbers and special characters embedded and letters replaced; the comic figures roughly 44 bits of entropy for four common words versus about 28 bits for the typical substitution-style password. The challenge with that somewhat standard practice of l33tspeak is that it has to be written down. But… since we are people that love stories, four random words in real English will be more memorable because we can make up a story to remember them.

Here’s the famous comic.

Recently, while I was working on one of my other Perl scripts, I was on an online forum and saw a post about how to make a random sentence. That tickled my fancy and I came up with a quick and dirty little CGI script to generate a random sentence suitable for passwords. Unfortunately a lot of places still require special characters and numbers, but this little script meets those requirements: the spaces satisfy the special character requirement (most of the time) and a one- or two-digit number is included.

The results? They are often amusing and poetic. Sometimes they are risque. It just depends on what is in your system’s local dictionary. Download Perl Password Poetry Producer

#!/tools/perl/current/bin/perl
#
# Password Poetry Producer: pick random words from the system dictionary
# and glue them together with a number to make a memorable passphrase.

print "Content-type: text/html\n\nPassword Poetry Generator\n";

# Slurp the system word list.
open (INWORDS, "< /usr/dict/words") or die "cannot open word list: $!";
@w = <INWORDS>;
close INWORDS;
chomp @w;

my $poem;
my $randiddly = int(rand(99));   # a number between 0 and 98

# Odd number: word, number, word. Even number: word, word, number.
if ($randiddly % 2 == 1) { $poem = join " ", (map { $w[rand @w] } 1), $randiddly, (map { $w[rand @w] } 1); }
else { $poem = join " ", (map { $w[rand @w] } 1..2), $randiddly; }

# Add one more word, capitalized if the phrase has no capital letter yet.
if ($poem !~ /[A-Z]/) { $poem = join " ", $poem, (map { ucfirst($w[rand @w]) } 1); }
else { $poem = join " ", $poem, (map { $w[rand @w] } 1); }

print "$poem\n";
# some HTML output code has been stripped for wordpress

Some sample passwords:




Expirations happen.

But when those SSL certificates expire before being replaced, well, that’s bad. That’s egg on your face. This little Perl script is to put the egg back in the burrito.

All you have to do is make a directory tree where you save your public certificates (you don’t need the private keys). Name them with a .cert extension if you use my code exactly (or tweak the extension in the code to match), and set up this little Perl script as a weekly cronjob to send you an email warning before they go bust!

You may need to add a few modules to your Perl installation. The modules I am using are Date::Calc, Crypt::OpenSSL::X509, Term::ANSIColor, and MIME::Lite. The Crypt::OpenSSL::X509 module was a major pain in the butt to compile on Solaris. I should do a blog about that.

Oh, and the MIME::Lite module seems to require root or trusted-user privilege to run, at least on my Solaris boxes. It works great on Mac OS X, but I’m probably a trusted user on that system; I will be testing Linux before long. So, tweak the locations in my examples below to meet your needs.

Set up the directory:
mkdir /home/billSpreston/mycerts

Copy the certs from your various servers, naming them with a .cert extension:

ls mycerts
server1.cert server2.cert server3.cert

Touch a file for the Perl script and make it executable:

touch ~/certwatch.pl
chmod +x ~/certwatch.pl

Now edit the file with your favorite editor (vim, or Smultron rocks!) and add the code from the certwatch.pl PDF (code with HTML tags is very hard to add to a wordpress.com blog).
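If you just want the gist before opening the PDF, the heart of the script looks roughly like the sketch below. It is a trimmed-down, untested outline (the paths, addresses, and warning window are placeholders), and the real script also uses Term::ANSIColor for console output and builds a nicer HTML report:

#!/usr/bin/perl
# Trimmed-down sketch of the idea behind certwatch.pl; see the PDF for the real thing.
use strict;
use warnings;
use Crypt::OpenSSL::X509;
use Date::Calc qw(Decode_Month Delta_Days Today);
use MIME::Lite;

my $certdir  = '/home/billSpreston/mycerts';
my $warndays = 30;                       # warn when a cert expires within 30 days
my @warnings;

for my $file (glob "$certdir/*.cert") {
    my $x509 = Crypt::OpenSSL::X509->new_from_file($file);

    # notAfter looks like "Jun  1 12:00:00 2025 GMT"
    my ($mon, $day, $year) = (split ' ', $x509->notAfter())[0, 1, 3];
    my $days_left = Delta_Days(Today(), $year, Decode_Month($mon), $day);

    # Already-expired certs come out negative, so they get flagged too.
    push @warnings, sprintf("%s expires in %d days (%s)",
                            $file, $days_left, $x509->notAfter())
        if $days_left < $warndays;
}

if (@warnings) {
    MIME::Lite->new(
        From    => 'no-reply',
        To      => 'you@example.com',
        Subject => 'SSL certificates nearing expiration',
        Data    => join("\n", @warnings) . "\n",
    )->send('smtp', 'smtpforwarder');
}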

Be sure to run it a few times to make sure it works the way you want. Debug or verbose mode is useful in this phase, as is playing with the expiration window. You could also create a certificate with openssl that expires next week to test with, or find an old expired cert. And when you are satisfied, create a cronjob to run it weekly on your schedule and get pretty HTML reports in your mailbox. Don’t forget to turn off debug or verbose mode unless you just like noise.