Server Health Checks
As a Technical Consultant, I regularly visit customers to check the health of their servers, troubleshoot problems, and discuss new implementation scenarios. Listed below are some of the best practices I follow when troubleshooting or checking server health. Let's divide the server health checks into two parts: hardware-related checks and software-related checks.
Check CPU Hardware:
1) Open Device Manager and, under Processors, make sure that no CPU shows a red cross (X) or a yellow exclamation mark (!). If you find one, contact the hardware vendor's support.
2) Check CPU usage in Task Manager and ensure that no process is consuming excessive CPU. I usually use Process Explorer from Sysinternals to troubleshoot high CPU spikes.
I had an issue at one customer where the applications and the operating system were sluggish, and the cause turned out to be a mismatched clock on one of the processors. Many other problems can make the OS sluggish as well.
Memory:
1) Open Task Manager, select the Performance tab, look at the Physical Memory box, and multiply the total memory by 0.2. If the available memory is less than this number, the server is currently using more than 80 percent of its memory.
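The 80 percent rule amounts to comparing the available memory with 20 percent (0.2 times) of the total. Here is a minimal Python sketch of that arithmetic. The function names and the sample figures are illustrative assumptions, and the values are expected to be read off Task Manager by hand, in the same unit for both.

```python
def low_memory_threshold(total_mb, min_free_fraction=0.2):
    """Return the amount of free memory below which usage exceeds 80 percent."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return total_mb * min_free_fraction


def is_over_80_percent_used(total_mb, available_mb):
    """True when available memory is below 20 percent of the total."""
    return available_mb < low_memory_threshold(total_mb)


if __name__ == "__main__":
    # Hypothetical server: 16384 MB total, 2048 MB available.
    print(low_memory_threshold(16384))
    print(is_over_80_percent_used(16384, 2048))
```

With 16384 MB of total memory the threshold is roughly 3277 MB, so a server showing 2048 MB available is above 80 percent usage, while one showing 4096 MB available is not.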
Hard Drives:
I have often seen customers ignore disk space and end up with serious problems. Check the disk space and validate that each disk has more than 10 percent free space.
I had a customer whose Exchange services kept hanging, and the cause was low disk space.
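The 10 percent free space check is easy to script. The following is a rough sketch that uses only Python's standard library (shutil.disk_usage). The function names are my own, and the list of paths is an assumption: on a Windows server you would pass the drive roots you care about, for example "C:\\" and "D:\\".

```python
import shutil


def free_percent(path):
    """Return the percentage of free space on the volume that holds path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total * 100.0


def disks_below_threshold(paths, min_free_percent=10.0):
    """Return {path: free percent} for volumes that do not have more than
    min_free_percent free space."""
    low = {}
    for path in paths:
        pct = free_percent(path)
        if pct <= min_free_percent:
            low[path] = round(pct, 1)
    return low


if __name__ == "__main__":
    # "." is used here only so the sketch runs anywhere; replace it with
    # the drive roots of the server being checked.
    print(disks_below_threshold(["."]))
```

An empty result means every listed volume has more than 10 percent free. Run it on a schedule if you want a warning before a service starts hanging.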
Network Controllers
Verify that the connection between the NIC and the switch is good. On the back of the server, verify that the NIC port shows a green blinking link light. Check that the drivers are up to date.
Note: Microsoft Product Support Services does not support NIC teaming on domain controllers.
http://support.microsoft.com/kb/272294
Event Viewer
Event logs are among the most important sources of information when troubleshooting. Events fall into three categories in Event Viewer.
Informational: Noted with a white icon and letter ‘i’. Successful operations are logged as informational.
Warning: Noted with a yellow icon and exclamation point. These are worth reviewing because they often predict future failures, such as disk space running low, DHCP IP address lease renewal failures, etc.
Error: Noted with a red circle icon and ‘x’. These are indications that something has failed outright and are a good starting point for troubleshooting.
Note: EventCombMT is a multithreaded tool that you can use to search the event logs of several different computers for specific events, all from one central location.
http://www.microsoft.com/downloads/details.aspx?FamilyId=7AF2E69C-91F3-4E63-8629-B999ADDE0B9E&displaylang=en
Services:
Each server has a specific set of services depending on the application installed (for example, Exchange runs its own set of services on top of the operating system). These services are crucial to the application: the failure of a single service can make the whole application unresponsive.
Note: Have you ever noticed the error "At Least One Service Or Driver Failed" when you restart the server? This error pops up when one or more services set to Automatic do not start correctly while the server is booting. In this case, open Services, sort the list by Startup Type, and check which Automatic services are not started. Sorting by startup type is always a good first step in troubleshooting this error.
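Sorting the Services list by startup type is really a filter: keep the services whose startup type is Automatic and whose state is anything other than running. Here is a small Python sketch of that filter. It assumes you have already exported the service list into records with Name, StartMode and State fields (the property names used by the WMI Win32_Service class). The function name and the sample records below are made up for illustration.

```python
def failed_automatic_services(services):
    """Return the names of services set to start automatically that are
    not currently running, sorted alphabetically."""
    return sorted(
        svc["Name"]
        for svc in services
        if svc["StartMode"] == "Auto" and svc["State"] != "Running"
    )


if __name__ == "__main__":
    # Illustrative sample data, not output captured from a real server.
    sample = [
        {"Name": "MSExchangeIS", "StartMode": "Auto", "State": "Stopped"},
        {"Name": "Spooler", "StartMode": "Auto", "State": "Running"},
        {"Name": "W32Time", "StartMode": "Manual", "State": "Stopped"},
    ]
    print(failed_automatic_services(sample))
```

Treat each hit as a lead rather than a verdict, because some automatic services stop by themselves once their work is done.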
Name Server Resolution (DNS).
One of the most important things to check is your DNS server, which is responsible for name resolution across the entire network. If DNS is not configured correctly, you will have problems with Active Directory and Exchange. The command DCDIAG /TEST:DNS is used to validate DNS health.
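DCDIAG does the real validation on a domain controller. As a quick complement from any machine, you can confirm that the names you depend on actually resolve. The sketch below is only that: a client-side lookup through the operating system's resolver using Python's socket.getaddrinfo. The function names are my own, and the host names in the example are hypothetical placeholders.

```python
import socket


def resolve(name):
    """Return the sorted list of addresses that name resolves to,
    or an empty list if resolution fails."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry ends with a sockaddr tuple whose first item is the address.
    return sorted({info[4][0] for info in infos})


def unresolved(names):
    """Return the names from the list that could not be resolved."""
    return [name for name in names if not resolve(name)]


if __name__ == "__main__":
    # Hypothetical names; replace them with your own DCs and mail servers.
    print(unresolved(["dc01.example.local", "mail01.example.local"]))
```

Any name that comes back in the unresolved list is a good reason to run DCDIAG /TEST:DNS and look at the DNS server itself.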
Microsoft Product Support Reports
This tool collects information about all the services running on the server, and it is heavily used by the Microsoft Product Support Services group.
Read here about the utility http://support.microsoft.com/kb/818742
Download and try from here http://www.microsoft.com/downloads/details.aspx?familyid=cebf3c7c-7ca5-408f-88b7-f9c79b7306c0&displaylang=en
http://blogs.technet.com/askperf/archive/2009/05/01/two-minute-drill-the-new-mps-reports.aspx
Note:
Each type of application server needs its own set of health checks, for example web servers, terminal servers, Exchange servers and database servers. The checks above are just the baseline for any server. You have to diagnose each one according to its role.