Troubleshooting is time-consuming. Sometimes the problem is obscure and hard to research, but most of the time the issue becomes apparent early in the process. Taken one at a time, these investigations aren’t a huge time commitment, but as more break-fix problems crop up they can consume more time than one realizes. I liken it to working a jigsaw puzzle. When you sit down to put the puzzle together, you flip all the pieces over and sort them to make the problem easier. What if there was a way to automatically sort the pieces for you so you could just get to work? This “what-if” led me to develop a health check script many years ago.
Troubleshooting with vRealize Operations
vRealize has many troubleshooting dashboards built in and can visualize metrics for you in thousands of different forms. The stress view, the recommendations view, and the overview tab work together to give a fairly complete picture of how the VM is running and display any performance problems. Clicking through three or more dashboards can take time, though. vRealize Operations Manager may not even capture the problem, in which case investigating in vROps isn’t an effective use of time.
Whenever I received the dreaded “Can you look at this VM for me?” stop-in or email, I knew I was in for 30 minutes of hunting for the problem. Was it a problem inside Windows? Had anything happened in vCenter Server recently? Is the VM even configured properly? Does vRealize Operations Manager suggest I add or subtract CPU or memory?
After countless instances of spending 30+ minutes just finding out where to start, I decided to try automating the initial investigation. I did the same four or five steps every time, so why not script it? Before I got started I made a diagram of how I actually approached diagnosing a problem.
Enter PowerShell and PowerCLI
PowerShell is a great tool if you work with Windows machines. For troubleshooting, PowerShell can pull data from the virtual hardware components and the Windows Event Logs and present it as easy-to-consume output with full filtering support. The vRealize Operations Manager PowerCLI plugin interacts with the raw metrics in vROps, allowing a script to perform math and transformations on accurate metrics. vCenter PowerCLI can pull event logs, configuration information, and other important metrics to assist in troubleshooting.
Using a combination of PowerShell, PowerCLI, and vROps PowerCLI, the troubleshooting flow can be accelerated.
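To give a feel for how those three sources can be combined, here is a minimal sketch. It is not the published script. It assumes VMware PowerCLI is installed, that Connect-VIServer and Connect-OMServer sessions are already open, that the VM's name resolves as a Windows computer name, and that cpu|usage_average is a valid vROps stat key in your environment. The function name Get-VmTriageData and its parameters are hypothetical.

```powershell
# Hypothetical helper: gathers first-pass triage data for one Windows VM.
# Assumes active Connect-VIServer and Connect-OMServer sessions.
function Get-VmTriageData {
    param(
        [Parameter(Mandatory)] [string] $VMName,
        [int] $Hours = 24
    )

    $since = (Get-Date).AddHours(-$Hours)

    # 1. Inside Windows: recent Error-level (Level 2) entries in the System log.
    #    Get-WinEvent raises an error when nothing matches, so that is silenced.
    $guestErrors = Get-WinEvent -ComputerName $VMName -MaxEvents 20 `
        -ErrorAction SilentlyContinue `
        -FilterHashtable @{ LogName = 'System'; Level = 2; StartTime = $since }

    # 2. vCenter: the VM object (for configuration) and its recent events.
    $vm       = Get-VM -Name $VMName
    $vcEvents = Get-VIEvent -Entity $vm -Start $since -MaxSamples 50

    # 3. vROps: raw CPU usage samples, averaged over the window.
    #    The stat key is an assumption; check which keys your vROps exposes.
    $resource = Get-OMResource -Entity $vm
    $cpuStats = Get-OMStat -Resource $resource -Key 'cpu|usage_average' -From $since
    $cpuAvg   = $null
    if ($cpuStats) {
        $cpuAvg = ($cpuStats | Measure-Object -Property Value -Average).Average
    }

    # One object that the rest of a script can filter or report on.
    [pscustomobject]@{
        VM             = $vm.Name
        NumCpu         = $vm.NumCpu
        MemoryGB       = $vm.MemoryGB
        GuestErrors    = $guestErrors
        VCenterEvents  = $vcEvents
        CpuUsageAvgPct = $cpuAvg
    }
}
```

Calling Get-VmTriageData -VMName 'vm01' (a placeholder name) returns a single object, so the usual Select-Object and Where-Object filtering applies to the result.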
Presentation is Everything
Version 1 of the script presented the data inside the PowerShell window. This is sufficient for quick troubleshooting, but it isn’t the easiest format to present to a coworker or manager who wants a record of the issue.
Don Jones’s EnhancedHTML2 module for PowerShell lets you format a full report with CSS styling.
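The general idea of an HTML report can be sketched with PowerShell's built-in ConvertTo-Html: render each data set as a fragment, then wrap the fragments in a page with a stylesheet. This is a stand-in that uses only built-in cmdlets, not EnhancedHTML2 itself, and the sample rows, CSS, and output file name are made up for illustration.

```powershell
# Sample rows standing in for real triage output (hypothetical values).
$config = [pscustomobject]@{ VM = 'vm01'; NumCpu = 2; MemoryGB = 8 }
$events = @(
    [pscustomobject]@{ Time = '09:15'; Source = 'vCenter';    Message = 'Virtual machine reconfigured' }
    [pscustomobject]@{ Time = '09:40'; Source = 'System log'; Message = 'Service terminated unexpectedly' }
)

# Render each data set as an HTML fragment (a table without page markup).
$configHtml = $config | ConvertTo-Html -Fragment -As List -PreContent '<h2>Configuration</h2>' | Out-String
$eventsHtml = $events | ConvertTo-Html -Fragment -PreContent '<h2>Recent events</h2>' | Out-String

# Minimal inline stylesheet for the tables.
$css = @'
<style>
  body   { font-family: sans-serif; }
  table  { border-collapse: collapse; }
  th, td { border: 1px solid #999; padding: 4px 8px; }
  th     { background: #ddd; }
</style>
'@

# Assemble the fragments into one page.
$report = @"
<html>
<head>
<title>Health check: $($config.VM)</title>
$css
</head>
<body>
<h1>Health check: $($config.VM)</h1>
$configHtml
$eventsHtml
</body>
</html>
"@

# Write the report somewhere it can be attached to an email or ticket.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'healthcheck.html'
$report | Set-Content -Path $path -Encoding UTF8
```

Opening the resulting file in a browser shows one table per section, which is easier to hand to a coworker than console output.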
The scripts are available at github.com/tbgree00/vrops
You need to run PowerShell as a user that has permission to read logs on the VMs. Get-WmiObject can’t pass credentials. If running as that user isn’t possible, you can use Get-LogObject, but that is significantly slower.
The v2 EnhancedHTML2 script prompts for special administrative credentials to pull information about free space available in the VM. This is included in case you don’t have permissions with your account.
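One way a free-space lookup with separate credentials could look is sketched below. This is my illustration rather than the code in the repository: it uses a CIM session, which assumes WinRM is enabled on the guest, and the function name Get-VmDiskFreeSpace is hypothetical.

```powershell
# Hypothetical helper: lists fixed disks and their free space on a Windows
# guest, connecting with explicitly supplied credentials.
function Get-VmDiskFreeSpace {
    param(
        [Parameter(Mandatory)] [string] $ComputerName,
        [Parameter(Mandatory)] [pscredential] $Credential
    )

    # The CIM session carries the alternate credentials (WinRM by default).
    $session = New-CimSession -ComputerName $ComputerName -Credential $Credential
    try {
        # DriveType 3 = local fixed disk.
        Get-CimInstance -CimSession $session -ClassName Win32_LogicalDisk -Filter 'DriveType = 3' |
            Select-Object DeviceID,
                @{ Name = 'SizeGB';  Expression = { [math]::Round($_.Size / 1GB, 1) } },
                @{ Name = 'FreeGB';  Expression = { [math]::Round($_.FreeSpace / 1GB, 1) } },
                @{ Name = 'FreePct'; Expression = { [math]::Round(100 * $_.FreeSpace / $_.Size, 1) } }
    }
    finally {
        # Always close the session, even if the query fails.
        Remove-CimSession -CimSession $session
    }
}
```

A call such as Get-VmDiskFreeSpace -ComputerName 'vm01' -Credential (Get-Credential) would prompt once and return one row per fixed disk.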
During the initial credential prompts, the script asks for a vRealize Operations authentication source. This is the name you gave the auth source when it was configured. For a local account, the auth source is local.
I would like to have a Linux-compatible version of this script as well. If anyone is interested in collaborating on that, please let me know.
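For reference, the same choice shows up when connecting to vROps by hand with PowerCLI. The snippet below is a sketch with a placeholder server name, and it assumes the VMware PowerCLI vROps module is installed.

```powershell
# Placeholder server name; replace with your own vROps address.
$vropsServer = 'vrops.example.com'

# 'local' for a local vROps account, otherwise the name given to the
# authentication source when it was configured in vROps.
$authSource = 'local'

# Prompt for the account, then open the vROps session.
$cred = Get-Credential -Message 'vRealize Operations account'
Connect-OMServer -Server $vropsServer -Credential $cred -AuthSource $authSource
```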