Do you ever wish you could foresee your next IT disaster? Maybe plan ahead better, or get out in front of it? Well, that’s what we are going to do with our DR forecasting system.
As we all know, forecasting an entire disaster is not really possible, but there are some telltale signs that one is coming. If the weather channel announces that a hurricane or tornado is heading your way, that is a glaring one; if a wildfire is reported to be spreading toward you, you will want to take action, and quickly. Most of the time in situations like these you don’t have time to get ready, and if you don’t already have a strategy in place you had better start working out how to mitigate your losses.
Sometimes, though, you can predict impending disasters that come from inside rather than from a force of nature. When disasters like hardware failures and viruses happen internally and the insurance company is not going to foot the bill to help you recover, the issue becomes yours. First of all, hopefully you have a good backup design in place. You may still be asking yourself: how can I tell if hardware is about to fail? Well, hardware exposes things called MIBs and OIDs. An OID identifies a single value on the device and a MIB describes what those values mean; together they give you information about your hardware that can be read over a protocol called SNMP. So, for instance, if your router is running hot and its processor is peaking, your monitoring system can talk to that device over SNMP and let you know “hey, something’s wrong”. Or if your network traffic slows to a halt at a certain time, the router can show you whether traffic load is the cause or whether there is actually something wrong with the device itself.
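To make that a bit more concrete, here is a minimal sketch of reading one value over SNMP from a shell. It assumes the net-snmp command line tools (snmpget) are installed, that the device answers SNMP v2c with the community string public, and that 192.0.2.1 stands in for your router's address. The OID 1.3.6.1.2.1.1.3.0 is sysUpTime from the standard MIB-2 tree. The helper name snmp_value is my own invention, not part of any tool.

```shell
#!/bin/sh
# snmp_value (hypothetical helper): read snmpget-style lines on stdin and
# strip the "OID = TYPE: " prefix so only the value is left.
snmp_value() {
    sed 's/^[^=]*= *[A-Za-z0-9-]*: *//'
}

# Example usage (assumes net-snmp tools, community "public", placeholder host):
#   snmpget -v2c -c public 192.0.2.1 1.3.6.1.2.1.1.3.0 | snmp_value
```

Cacti does this kind of polling for you on a schedule and graphs the results, but running a query by hand is a quick way to confirm that a device is answering at all.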
So what I’m going to do is build a weather station that can predict failures by talking to MIBs and OIDs. It will then send reports or alerts depending on the severity of your problem. Many software vendors have a six-figure version of this software, but I’d rather build it for free and customize it as I choose. So let’s start with my favorite free OS, which is CentOS of course, and install Cacti.
Here is the site I used to install Cacti; it has a great step-by-step setup. We will continue our setup from where it leaves off.
I now have my local CentOS server being monitored successfully! So what else can I do?
There has been a lot of confusion on the internet about getting Cacti to work with VMware 5.1 and above, due to VMware no longer supporting MIBs. So this is where I would like to start, since it seems to be the hardest part of the whole setup.
First we need to download the vSphere Perl SDK for the version of ESX we are working with; for me that is ESX 5.5.
Once it is downloaded, the next step is to extract it and cd into the directory where we put it.
Once that’s done, run the install Perl script: ./vmware-install.pl
Accept all the defaults by answering Y to the questions.
OK, now that we have that finished, we have to go over to the ESX console.
Enable SSH on your ESX host from the console.
The VMware vicfg-snmp commands have not been supported since 5.0; the new command format is esxcli (the ESX command-line interface).
From the new Cacti server, SSH into the host and run the following esxcli commands to enable SNMP:
esxcli system snmp set --enable true
esxcli system snmp set --targets xxx.xxx.xxx.xxx@162/public
Then test it:
esxcli system snmp test
When finished, type exit to leave the SSH session.
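The SNMP steps on the host can be collected into one small script, sketched here. The function names snmp_target and configure_esx_snmp are hypothetical, and 192.0.2.10 is a placeholder for your Cacti server's address. The --communities line is my addition, on the assumption that the community string should be defined on the host before a target refers to it. The esxcli binary only exists on the ESX host, so the second function is defined here but meant to be called there.

```shell
#!/bin/sh
# Build the value esxcli expects for --targets: address@port/community.
# Port defaults to 162 and community to "public" when not given.
snmp_target() {
    printf '%s@%s/%s\n' "$1" "${2:-162}" "${3:-public}"
}

# Run this on the ESX host itself, over SSH.
# $1 is the address of the Cacti server that should receive traps.
configure_esx_snmp() {
    cacti_ip=$1
    esxcli system snmp set --communities public   # assumed community string
    esxcli system snmp set --targets "$(snmp_target "$cacti_ip")"
    esxcli system snmp set --enable true
    esxcli system snmp test                       # sends a test trap to the target
}

# Example, on the ESX host:
#   configure_esx_snmp 192.0.2.10
```

If you use a community string other than public, change it in both places so the target and the host agree.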
Now it’s time to gather some templates, and we will start with our ESX one here.
Once it is extracted, go to Import Templates and browse to the .XML file you just extracted.
Next, select the file and open it, which will bring you back to the original screen, then choose Import.
You should see a screen that shows all green, successful imports.
Next we will put the other files from the extracted folders where they belong:
# cp scripts/* /usr/share/cacti/scripts (/usr/share/cacti is the Cacti home on CentOS)
# cp resource_server/* /usr/share/cacti/resource/script_server
Then run service httpd restart.
Back in the console, choose to add a device.
Adjust the values for this server.
Alright!
Go to Graphs now and check out your handiwork.
So we have now set up monitoring for our virtual environment and the local Linux server. We can do the same for every device on the network, from AS400 to UNIX servers to switches and routers. The templates are all on the website, or here is a direct link. At this point we can start to see everything going on in the environment and make educated guesses as to why some things are running slow and what may be next to fail! Hopefully you have disaster recovery automation in place, but if not, check out one of my earlier articles here.
Next on our list is to add some automation and send alerts, because it’s just impractical to have someone stare at charts all day. We are going to enable these features through plugins. There are plenty of them, but I want to keep it high level for now.
Go to the plugins directory on Cacti’s website and look around for things relevant to your needs and environment. I am going to choose five for myself: auto-discovery (the name says it all), mobile (for text alerts), router configs (backs up router configs daily, which we need for our crash kit), nectar (sends graphs and pictures by email) and Thold, which lets us set thresholds. Thold is the most important piece for us, because it lets us decide what is acceptable and what needs an alert.
We are going to copy all of the newly downloaded plugins to the plugins directory under our Cacti home.
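As a rough sketch of that copy step, assuming the plugins were downloaded as .tgz or .tar.gz archives and that the Cacti home is /usr/share/cacti as earlier in this article: the function name install_plugins and the download folder in the usage line are made up for illustration.

```shell
#!/bin/sh
# install_plugins (hypothetical helper): unpack every plugin archive found
# in directory $1 into the Cacti plugins directory $2.
install_plugins() {
    src=$1
    dest=$2
    for archive in "$src"/*.tgz "$src"/*.tar.gz; do
        [ -f "$archive" ] || continue   # skip patterns that matched nothing
        tar -xzf "$archive" -C "$dest"
    done
}

# Example (run as root; both paths are assumptions about your system):
#   install_plugins /root/downloads /usr/share/cacti/plugins
```

After unpacking, it is worth checking that each plugin sits in its own directory named after the plugin, since that is where Plugin Management looks for it.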
Once they are there, we can go to Plugin Management on the left-hand side. The page may be blank, so just hit Go and it will populate with the downloads. The blue button next to a name installs the plugin, and then a green icon will appear so you can enable it. This is pretty cool because you can put all of the plugins in at once if you choose, then enable them as you see fit.
Thold has its own tab, which should appear once it is enabled.
It seems we need another plugin to run that plugin… No problem: just repeat the steps above to download, install and enable it.
Go to the Host Status tab and you should see both of the servers we set up earlier.
Now we need to create a template for Thold to monitor and alert us about. Choose whatever suits your needs from the downloads.
Click on Create to finish.
Now, in the settings of that service, we can predefine our warning settings.
And our alert settings. Don’t forget to put in the emails of the people you will be alerting 😉
Now go back to Threshold Templates and click Search to find the new templates.
From here we can enable them and apply our settings.
When we open the host settings for the actual machine again, we can see that Thold has become a part of it.
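The idea behind separate warning and alert settings can be sketched in a few lines of shell. This is not how Thold itself is implemented, just an illustration of two-level thresholds; the function name classify_reading and the example levels of 80 and 90 are my own assumptions.

```shell
#!/bin/sh
# classify_reading (illustrative only): compare an integer reading against
# a warning level and a higher alert level.
# Arguments: value warning_level alert_level
# Prints "ok", "warning" or "alert".
classify_reading() {
    value=$1
    warn=$2
    alert=$3
    if [ "$value" -ge "$alert" ]; then
        echo alert
    elif [ "$value" -ge "$warn" ]; then
        echo warning
    else
        echo ok
    fi
}

# Example: CPU at 85 percent with warning at 80 and alert at 90
#   classify_reading 85 80 90
```

The point of the lower warning level is to give you time to act before the alert level is reached, which is exactly what we want from a forecasting setup.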
With the plugins you have plenty of options for alerting, such as text messages for the things that really need to reach someone. These alerts can be used in conjunction with your other software as well. With the level of detail and granularity Cacti provides, all of it free, how can you really go wrong?
With the knowledge and insight you have now gained into your environment, you can better understand WHY your devices are misbehaving. If you’re getting memory spikes, the processor is overheating, disk space is running out and so on, you can now get in front of these issues and understand what is likely to happen soon. This should all be part of a good disaster recovery plan, which I hope you have already created.
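As one small example of turning graphs into a forecast, here is a sketch that estimates how many days are left before a disk fills, using two usage readings taken some days apart. It assumes growth is roughly linear, which real disks do not always respect, and that the number of days between readings is greater than zero. The function name days_until_full is hypothetical and the numbers in the usage line are made up.

```shell
#!/bin/sh
# days_until_full (hypothetical helper): straight-line estimate of days
# remaining before a disk is full.
# Arguments: earlier_used later_used days_between capacity (same units).
# Prints a whole number of days, or "never" if usage is flat or shrinking.
days_until_full() {
    awk -v a="$1" -v b="$2" -v d="$3" -v cap="$4" 'BEGIN {
        rate = (b - a) / d              # growth per day
        if (rate <= 0) { print "never"; exit }
        printf "%d\n", (cap - b) / rate # remaining space divided by growth
    }'
}

# Example: 600 GB used ten days ago, 700 GB used now, 1000 GB disk
#   days_until_full 600 700 10 1000
```

You can read the two usage figures straight off a Cacti disk graph, so this kind of back-of-the-envelope estimate costs nothing once the monitoring is in place.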
Next time: what to do after a failover or test.