Updated 5 Feb 2013 to include Redirect-Message cmdlet
Exchange 2013 has a feature called Managed Availability. This feature detects issues on a server and attempts to fix the component at fault. Fixes range from simple actions, such as restarting the affected service, up to what is called a bugcheck, which forces the server to "blue screen" and therefore reboot. A bugcheck only happens when the simpler fixes have not worked: for example, if a service cannot be restarted then its workload is moved to another node in the DAG, or the Exchange 2013 aware load balancer takes the CAS server out of service. If Managed Availability still cannot fix the server, it is bugchecked.
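As an aside, you can see what Managed Availability currently thinks of a server using the health cmdlets. A minimal sketch, assuming it is run in the Exchange Management Shell against the server in question:

# Summarise the health sets Managed Availability is tracking on this server
Get-HealthReport -Identity $env:COMPUTERNAME | Format-Table HealthSet,AlertValue,MonitorCount -AutoSize

# Drill into the individual monitors that are not currently healthy
Get-ServerHealth -Identity $env:COMPUTERNAME | Where-Object { $_.AlertValue -ne "Healthy" } | Format-Table Name,HealthSetName,AlertValue -AutoSize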
There are one or two obvious issues with this though. The first is when you are upgrading or patching the server, and the second is in a lab environment. In both of these scenarios you could have servers that Managed Availability considers unresponsive when it is only because a patch or an Exchange cumulative update (CU, previously known as Rollup Updates) is being installed.
This blog will discuss how to tell Managed Availability not to take actions such as reboots during updates or in low-spec lab environments.
Patching Exchange 2013 Servers
When the Windows patch process starts, or a Cumulative Update for Exchange is installed, services are stopped and possibly disabled. Disk I/O might be higher and your underlying disk subsystem might not cope well (though this is more likely to be an issue in a lab environment). The last thing you want is services being restarted, those restarts failing, and Managed Availability deciding that the server is dead and needs a reboot, so that in the middle of an update it blue screens.
To place a server into Maintenance Mode before you upgrade it, run the following Exchange Management Shell cmdlets:
Maintenance Mode on Mailbox or Multi-Role Servers
1. Set-ServerComponentState $env:COMPUTERNAME -Component HubTransport -State Draining -Requester Maintenance
2. Redirect-Message -Server $env:COMPUTERNAME -Target <TargetMailboxServerFQDN>
3. Suspend-ClusterNode $env:COMPUTERNAME
4. Set-MailboxServer $env:COMPUTERNAME -DatabaseCopyActivationDisabledAndMoveNow $True
5. Set-MailboxServer $env:COMPUTERNAME -DatabaseCopyAutoActivationPolicy Blocked
6. Set-ServerComponentState $env:COMPUTERNAME -Component ServerWideOffline -State Inactive -Requester Maintenance
7. Get-ServerComponentState $env:COMPUTERNAME | Format-Table Component,State -AutoSize
8. Get-MailboxServer $env:COMPUTERNAME | Format-Table DatabaseCopy* -AutoSize
9. Get-ClusterNode $env:COMPUTERNAME | Format-List
Maintenance Mode on CAS Servers
Set-ServerComponentState $env:COMPUTERNAME -Component ServerWideOffline -State Inactive -Requester Maintenance
For mailbox or multi-role servers, step 1 should be done first and on its own. Step 1 places the transport queues into "draining" mode, which means the server processes its existing queues but does not accept new connections. Once the queues have drained, which can be checked with Get-Queue, do steps 3 to 9.

(Added 5 Feb 2013): To speed up draining of the queues, Exchange 2013 lets you move the messages to another server with Redirect-Message (step 2). The Target in Redirect-Message must be an FQDN, and if the Server parameter (i.e. where the queue is sourced) is missing then the local server is used. Only active queues are moved by this command; poison and shadow queues are not moved. (End of update)

Steps 3 to 6 take the DAG node offline and move mailbox databases onto other nodes in the DAG. Steps 7 to 9 confirm these changes with a report to the screen.
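As an illustration of the drain check, here is a minimal sketch run from the Exchange Management Shell on the server being drained (mbx2.contoso.com is a placeholder FQDN for another Mailbox server in your organisation):

# Watch the queues empty; shadow and poison queues can be ignored as they are not drained
Get-Queue -Server $env:COMPUTERNAME | Format-Table Identity,Status,MessageCount -AutoSize

# Optionally push the remaining active messages to another Mailbox server to speed things up
Redirect-Message -Server $env:COMPUTERNAME -Target mbx2.contoso.com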
Note these cmdlets all use $env:COMPUTERNAME so they act on the local machine, i.e. the one you want to place into Maintenance Mode. You can replace $env:COMPUTERNAME with the name of the server you want to affect if you want to run the cmdlets remotely.
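For example, a sketch of running the first step remotely (EXCH01 is a placeholder server name):

# Put a named server's transport component into draining mode from another machine
$Server = "EXCH01"
Set-ServerComponentState $Server -Component HubTransport -State Draining -Requester Maintenance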
CAS-only servers have only one step, and it is the same as step 6 in the mailbox/multi-role server process.
Ending Maintenance Mode
On a CAS server, to return to functional mode, run the following:
Set-ServerComponentState $env:COMPUTERNAME -Component ServerWideOffline -State Active -Requester Maintenance
On a mailbox or multi-role server run the following:
Set-ServerComponentState $env:COMPUTERNAME -Component ServerWideOffline -State Active -Requester Maintenance
Resume-ClusterNode $env:COMPUTERNAME
Set-MailboxServer $env:COMPUTERNAME -DatabaseCopyActivationDisabledAndMoveNow $False
Set-MailboxServer $env:COMPUTERNAME -DatabaseCopyAutoActivationPolicy Unrestricted
Set-ServerComponentState $env:COMPUTERNAME -Component HubTransport -State Active -Requester Maintenance
Once the mailbox or multi-role server steps are complete, you need to move back any databases that you want active on this server, or start maintenance on another server (as that might move databases onto this server for you).
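For example, a minimal sketch of moving a database back by hand ("DB01" is a placeholder database name):

# Activate a specific database copy on the server that has just left maintenance mode
Move-ActiveMailboxDatabase "DB01" -ActivateOnServer $env:COMPUTERNAME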
Finally, note that going into maintenance mode is not an immediate step. It takes somewhere between 5 and 10 minutes (in my tests) for the Health Service to pick up these changes and implement them. Also note that where you only have one server or one DAG node available, the Health Service will not action maintenance mode, as doing so would reduce availability to the point where service fails: for example, if you only have one CAS server then the above command will not stop connections to OWA or Frontend Transport through that one CAS server.
Building Exchange 2013 Lab Environments
All of the information for managing maintenance mode above is valid for lab environments, but it's also worth considering the following cmdlet:
Set-ServerComponentState $env:COMPUTERNAME -Component RecoveryActionsEnabled -State Inactive -Requester Sidelined
The above tells Managed Availability not to take any recovery actions in the event of an issue. Therefore if your lab is slow (for example because you are overworking the disks), your Exchange servers don't blue screen and add to the load on the disks.
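If you later want recovery actions back, the reverse of the above should be the same cmdlet with the state set back to Active:

# Re-enable Managed Availability recovery actions once the lab is behaving again
Set-ServerComponentState $env:COMPUTERNAME -Component RecoveryActionsEnabled -State Active -Requester Sidelined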
If you see that your lab environment is regularly reporting that a server recovered from an unexpected failure, check the Event Viewer for the following bugcheck codes. I've seen these occur when Managed Availability forced a bugcheck and reboot of some of my lab machines while I was installing Exchange on other servers on the same disk.
- 0x000000ef (i.e. CRITICAL_PROCESS_DIED)
- 0x000000F4 (i.e. CRITICAL_OBJECT_TERMINATION)
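As a rough way of spotting these after the fact, something like the following should pull the bugcheck records out of the System log (Event ID 1001 is the "computer has rebooted from a bugcheck" event; treat this as a sketch rather than a definitive query):

# List recent "rebooted from a bugcheck" events and the stop code recorded in each
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 1001 } |
    Where-Object { $_.Message -like '*bugcheck*' } |
    Select-Object TimeCreated, Message | Format-List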