Placing Exchange 2013 Into Maintenance Mode

Updated 5 Feb 2013 to include Redirect-Message cmdlet
Exchange 2013 has a feature called Managed Availability. This feature detects problems with a server and, when it finds one, attempts to fix the affected component. Fixes range from simply restarting the component (for example, restarting a service) to a bugcheck, which forces the server to “blue screen” and therefore reboot. Bugchecks occur when the earlier, simpler fixes do not work. For example, if a service cannot be restarted, then the service is moved to another node in the DAG, or the Exchange 2013 aware load balancer takes the CAS server out of service. If Managed Availability still cannot fix the server, it is bugchecked.
There are one or two obvious issues with this, though: the first is when you are upgrading or patching the server, and the second is in a lab environment. In both scenarios Managed Availability could consider a server unresponsive when it's only because a patch or Exchange cumulative update (CU, previously known as Rollup Updates) is being installed.
This post discusses how to tell Managed Availability not to trigger actions such as reboots during updates or in low-spec lab environments.

Patching Exchange 2013 Servers

When the patch process starts on Windows, or a Cumulative Update for Exchange is installed, services are stopped and possibly disabled. Disk I/O might be higher and your underlying disk subsystem might not cope well (though this is more likely to be an issue in a lab environment). The last thing you want is services being restarted, those restarts failing, and Managed Availability therefore deciding that the server is dead and needs a reboot, so that it blue screens in the middle of an update.
To place a server into Maintenance Mode before you upgrade it, run the following Exchange Management Shell cmdlets:

Maintenance Mode on Mailbox or Multi-Role Servers

  1. Set-ServerComponentState $env:COMPUTERNAME -Component HubTransport -State Draining -Requester Maintenance
  2. Redirect-Message -Server $env:COMPUTERNAME -Target <target server FQDN>
  3. Suspend-ClusterNode $env:COMPUTERNAME
  4. Set-MailboxServer $env:COMPUTERNAME -DatabaseCopyActivationDisabledAndMoveNow $True
  5. Set-MailboxServer $env:COMPUTERNAME -DatabaseCopyAutoActivationPolicy Blocked
  6. Set-ServerComponentState $env:COMPUTERNAME -Component ServerWideOffline -State Inactive -Requester Maintenance
  7. Get-ServerComponentState $env:COMPUTERNAME | Format-Table Component,State -Autosize
  8. Get-MailboxServer $env:COMPUTERNAME | Format-Table DatabaseCopy* -Autosize
  9. Get-ClusterNode $env:COMPUTERNAME | Format-List

Maintenance Mode on CAS Servers

Set-ServerComponentState $env:COMPUTERNAME -Component ServerWideOffline -State Inactive -Requester Maintenance

For mailbox or multi-role servers, step 1 should be done independently of the other steps. Step 1 places the transport queues into “draining” mode, which means the server processes existing queues but does not accept new connections. Once the queues have drained, which can be checked with Get-Queue, do steps 3 to 9. (Added 5 Feb 2013) To speed up draining of the queues, Exchange 2013 can move the messages to another server using Redirect-Message (step 2). The Target in Redirect-Message must be an FQDN, and if the Server (i.e. where the queue is sourced) is omitted then the local server is used. Only active queues are moved with this command; poison and shadow queues are not moved. (End of update) Steps 3 to 6 take the DAG node offline and move mailbox databases onto other nodes in the DAG. Steps 7 to 9 confirm these changes with a report to the screen.

Note that these cmdlets all use $env:COMPUTERNAME, so they run against the local machine that you want to place into Maintenance Mode. You can replace $env:COMPUTERNAME with the name of the server you want to affect if you want to run the cmdlets remotely.

CAS-only servers have just one step, and that is the same as step 6 in the mailbox/multi-role server process.
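For the wait between step 1 and step 3 on mailbox or multi-role servers, the queue check can be scripted. The following is a minimal sketch, not a tested procedure. Get-PendingMessageCount is a hypothetical helper name, and it assumes that queue objects expose Identity and MessageCount properties and that poison and shadow queues can be recognised by \Poison and \Shadow\ in the identity.

```powershell
# Sketch only: sums the messages still waiting in a server's active queues.
# Poison and shadow queues are skipped because they are not drained or redirected.
function Get-PendingMessageCount {
    param(
        # Objects shaped like Get-Queue output (Identity and MessageCount properties).
        [object[]]$Queue = @()
    )

    # Assumption: identities look like SERVER\Poison and SERVER\Shadow\<id>.
    $active = @($Queue | Where-Object {
        $_.Identity -notlike '*\Poison' -and $_.Identity -notlike '*\Shadow\*'
    })

    $total = 0
    foreach ($q in $active) { $total += [int]$q.MessageCount }
    return $total
}

# Intended use in the Exchange Management Shell (not run here):
#   while ((Get-PendingMessageCount -Queue (Get-Queue -Server $env:COMPUTERNAME)) -gt 0) {
#       Start-Sleep -Seconds 30
#   }
```

The helper takes queue objects as input rather than calling Get-Queue itself, so the counting logic can be tried with made-up objects before it is pointed at a real server.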

Ending Maintenance Mode

On a CAS server, to return to functional mode, run the following:

Set-ServerComponentState $env:COMPUTERNAME -Component ServerWideOffline -State Active -Requester Maintenance

On a mailbox or multi-role server run the following:

Set-ServerComponentState $env:COMPUTERNAME -Component ServerWideOffline -State Active -Requester Maintenance

Resume-ClusterNode $env:COMPUTERNAME

Set-MailboxServer $env:COMPUTERNAME -DatabaseCopyActivationDisabledAndMoveNow $False

Set-MailboxServer $env:COMPUTERNAME -DatabaseCopyAutoActivationPolicy Unrestricted

Set-ServerComponentState $env:COMPUTERNAME -Component HubTransport -State Active -Requester Maintenance

Once mailbox or multi-role server steps are complete you need to move databases that you want back to this server, or start maintenance on another server (as that might move databases to this server for you).

Finally, note that going into maintenance mode is not an immediate step. It takes somewhere between 5 and 10 minutes (in my tests) for the Health Service to pick up these changes and implement them. Also note that where you only have one server or one DAG node available, the Health Service will not action maintenance mode, as doing so would reduce availability to a point where service fails. For example, if you only have one CAS server then the above command will not stop connections to OWA or Frontend Transport through that one CAS server.

Building Exchange 2013 Lab Environments

All of the information for managing maintenance mode above is valid for lab environments, but it's also worth considering the following cmdlet:

Set-ServerComponentState $env:COMPUTERNAME -Component RecoveryActionsEnabled -State Inactive -Requester Sidelined

The above will tell Managed Availability not to do any recovery actions in the event of an issue. Therefore if your lab is (for example) slow because you are overworking the disks, then your Exchange Servers don’t blue screen and add to the load on the disk.

If your lab environment regularly reports that the server recovered from an unexpected failure, check whether the following bugcheck codes are in the Event Viewer. I’ve seen these caused by attempts to force a bugcheck and reboot on some of my lab machines whilst I was installing Exchange on other servers on the same disk.

  • 0x000000ef (i.e. CRITICAL_PROCESS_DIED)
  • 0x000000F4 (i.e. CRITICAL_OBJECT_TERMINATION)
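Looking for these two codes can be partly automated. The sketch below is illustrative only: Get-BugcheckName is a hypothetical helper, and it assumes the stop code is the first hexadecimal value in the event message text.

```powershell
# Sketch only: pulls the stop code out of a bugcheck message and names the
# two codes listed above. Any other code is reported as 'Other'.
function Get-BugcheckName {
    param([string]$Message)

    # Assumption: the first 0x... value in the message is the stop code.
    if ($Message -notmatch '0x([0-9a-fA-F]{1,8})') { return $null }
    $code = [Convert]::ToInt32($Matches[1], 16)

    if ($code -eq 0xEF) { return 'CRITICAL_PROCESS_DIED' }
    if ($code -eq 0xF4) { return 'CRITICAL_OBJECT_TERMINATION' }
    return 'Other'
}

# Intended use (not run here; assumes bugchecks are logged as Event ID 1001 in the System log):
#   Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 1001 } |
#       ForEach-Object { Get-BugcheckName $_.Message }
```

A run of CRITICAL_PROCESS_DIED or CRITICAL_OBJECT_TERMINATION results would be a hint, not proof, that Managed Availability forced the reboots.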

Access Is Denied Message After Sysprep–How To Fix

If, before you use Sysprep to prepare a Windows machine for imaging, you set the “User cannot change password” option on the administrator account, then Sysprep will not clear this setting but will also set the “User must change password at next logon” setting. Normally these two settings are mutually exclusive, but with Sysprep it seems they can both end up being set.

This means you get prompted to reset your password at first logon after Sysprep completes, and then find you get “Access Denied” as the response. There is seemingly no way around this Catch-22.

That is unless you use the Offline NT Password and Registry Editor. This tool allows password resets when booting the server from a CD or USB key (so physical access to the server is required). As the download for this is an iso file, it can also be used in virtual environments by configuring your virtual machine to boot from the iso you have downloaded.

To allow you to log on to your machine following the above issue, all you need to do in the Offline NT Password tool is blank out the administrator's password and unlock the account. These are options 1 and 4 during the password reset stage. Full instructions follow:

  1. Boot the affected server from the Offline NT Password and Registry Editor ISO file.
  2. Choose the correct boot option (or just press Enter for the defaults).
  3. For Vista and earlier select the default of Option 1. For Windows 7 and Windows 2008 and later select Option 2 (to use the second partition on the disk). You might need to select a different option if you have more partitions. You need to select the partition that Windows is installed on.
  4. If the disk is marked as Read-Only, ensure that the server went through a clean boot and was not shut down incorrectly. Once the messages indicate a writable partition, continue.
  5. Select the presented folder (by pressing Enter again). You can typically just press Enter through most of these stages. You will be asked what you want to do; we want to reset passwords.
  6. Select Option 1 to Edit user data and passwords.
  7. Press Enter to choose the Administrator account.
  8. Type 1 to Clear (blank) user password. You should get back the message “Password cleared!”.
  9. Press Enter again to reselect the Administrator account, and this time select Option 4 to unlock the account (even though this program tells you the account is already unlocked).
  10. Once you see “Unlocked!” you can quit from this program. The process to quit requires you to save your changes. Note that the default setting is not to save changes, so you cannot now use Enter to select the default option.
  11. Enter ! to quit from the password reset program.
  12. Enter q to quit from the script and to be asked about saving changes.
  13. Enter y to write back the files that have been changed.
  14. You should be told “***** EDIT COMPLETE *****”. Press Enter to finish the program scripts.
  15. At this final screen you can remove the CD or unmount the ISO image from your virtual machine and press CTRL+ALT+DEL to restart the server. The server should now boot into Windows and auto-logon as it has a blank password.
  16. Change the password and optionally untick the “User cannot change password” setting.

Exchange Log Truncation Failure in a DAG

Today I visited a client who had noticed that no log files had ever been removed after any backup within Exchange 2010 SP1. It was fortuitous that they had enough log disk space for about eight months of log generations. The disadvantage was that we were four months into this time period, so it was a ticking clock, and that the nightly incremental backups were taking longer and longer.

They were getting the following error in their backup datacentre:

“Unable to communicate with the Microsoft Exchange Information Store service to coordinate log truncation for database ‘name’ due to an RPC communication failure. Error 3355379671 Extended Error: 0”. This was logged as Event ID 2136 from the MSExchangeRepl service in the Application event log.

What the error does not clearly say is that the Microsoft Exchange Replication service (MSExchangeRepl) on the server in the DR site (a passive node in the DAG) needs to communicate via RPC with the Microsoft Exchange Information Store service on the server holding the active copy of the database.

In the case of my client, the Exchange team is not the same people as the network team or indeed the firewall team, and these teams are in different countries. In the case of the network for this client, the Replication network for the DAG had been opened to allow RPC traffic, but the MAPI (Client) network had not.

When Exchange in the DR site needed to check which logs it could truncate (a process it performs every 15 minutes), it needed to talk to the Microsoft Exchange Information Store service on the server holding the active copy of the database, and name resolution was returning (as expected) the IP address of that server on the MAPI/Client network. This network blocked RPC between servers, and so (as one of the many issues they now attribute to this problem) logs could not be truncated and Event ID 2136 was posted once per database on the passive node in the DR site. The two servers in the primary site could reach each other over RPC, so this event was not logged in the primary site.

To solve this log growth problem without waiting for a response from the firewall team, we added a record to the hosts file on the passive server to override DNS name resolution, and within 15 minutes 2 TB of log files had disappeared across all servers. Name resolution was then reverted to DNS and the firewall team contacted.
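A hosts-file override of this kind is a single line mapping an address to a name. The sketch below shows the shape of such an entry only. New-HostsEntry is a hypothetical helper, and the address and server name are placeholders, since the post does not give the real ones. The address to use would be whichever one the passive server is allowed to reach over RPC.

```powershell
# Sketch only: builds one hosts-file line that pins a server name to an address.
function New-HostsEntry {
    param(
        [Parameter(Mandatory = $true)][string]$IPAddress,
        [Parameter(Mandatory = $true)][string]$HostName
    )

    # Fail early on a malformed address rather than writing a bad entry.
    $parsed = [System.Net.IPAddress]::Parse($IPAddress)
    return ('{0}    {1}' -f $parsed.ToString(), $HostName)
}

# Hypothetical values: 192.0.2.10 is a documentation address, not the client's.
New-HostsEntry -IPAddress '192.0.2.10' -HostName 'mbx-active.example.com'

# To apply it (elevated prompt, not run here):
#   Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" `
#       -Value (New-HostsEntry -IPAddress '192.0.2.10' -HostName 'mbx-active.example.com')
```

As in the post, an entry like this is a stopgap, and it should be removed once the firewall rules are corrected.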

Scheduling Backup on Microsoft Hyper-V Server

To do a backup of the virtual machines installed on your Hyper-V Server (2008 or 2008 R2 editions) you need to complete the following steps.

  1. Install the backup feature by typing start /w ocsetup WindowsServerBackup from the command prompt.
  2. Get a list of the drives on which Hyper-V Server has stored virtual machines. This will be C: unless you have made changes.
  3. Determine the times you want to run the backup at.
  4. Determine the drive letter of the removable disk by typing each of the following commands at the command prompt:
    1. diskpart
    2. list volume
    3. The disk drive letter will be displayed for the disk that matches the size of your removable disk.
    4. Type exit to exit diskpart.
  5. From the command prompt type wbadmin enable backup -addtarget:x: -schedule:hh:mm,h2:m2 -include:y:,z: -systemState -allCritical to back up to drive X: the contents of drives Y: and Z:, the system state and all drives critical to the running of the server.
  6. Confirm you want to schedule the backup at times HH:MM and H2:M2 (for twice a day). If you want one backup a day use HH:MM and if you want more than two just comma separate a group of times. Enter times as per local timezone. Check the current time on the Hyper-V Server by typing time from the command prompt.
  7. Start a backup now if you want by typing wbadmin start backup and confirming to use the same settings as the scheduled backup.
  8. Backup will proceed in the console. If you log out backup will remain running.
  9. Enter wbadmin enable backup to see the settings you have enabled.
  10. Type wbadmin get versions to see what backups have completed.
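The command in step 5 is easy to mistype, so it can help to assemble it from its parts first. The following is a minimal sketch. Get-WbadminScheduleCommand is a hypothetical helper that only builds the command text and does not run wbadmin, and the 24-hour HH:MM check is an assumption about the time format.

```powershell
# Sketch only: assembles the "wbadmin enable backup" command line from step 5.
function Get-WbadminScheduleCommand {
    param(
        [string]$Target,      # backup destination, e.g. 'X:'
        [string[]]$Include,   # volumes to back up, e.g. 'Y:','Z:'
        [string[]]$Time       # start times, assumed to be 24-hour HH:MM
    )

    foreach ($t in $Time) {
        if ($t -notmatch '^([01][0-9]|2[0-3]):[0-5][0-9]$') {
            throw "Invalid time '$t'; expected 24-hour HH:MM."
        }
    }

    $schedule = $Time -join ','
    $volumes  = $Include -join ','
    return "wbadmin enable backup -addtarget:$Target -schedule:$schedule -include:$volumes -systemState -allCritical"
}

# Example with made-up drive letters and times.
Get-WbadminScheduleCommand -Target 'E:' -Include 'C:','D:' -Time '13:00','23:00'
```

The resulting line can be pasted at the Hyper-V Server command prompt once the drive letters have been checked with diskpart as in step 4.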

Hyper-V and VSS Backups Cause Bluescreen

I found the other week that my Hyper-V server, running Server Core and nothing else, was restarting of its own accord. As this is just a server at home, and the monitor is switched off 99% of the time, I had not noticed it blue screening.

So looking in the event log (remotely of course, as it was running Server Core) to see why, I noticed it had done the same thing every day at a few minutes past 1pm – one of my scheduled backup times during the day.

I was getting Event ID 1001 at about 1:03pm each day. So I changed the time of the backup (using Windows Server Backup, command line) to 11pm and I got 1001 bugchecks at 11:03pm each day.

There was nothing else recorded in the event log, apart from the usual system start, TCP/IP and similar messages, and no clue as to the reason for the failure. All I had was the BugCheck, an example being 0x0000007e (0xffffffffc0000047, 0xfffff80003676b48, 0xfffffa60019ff5c0, 0xfffffa60019ff660).

A bit of research later, and ignoring most of the posts regarding VSS and Hyper-V, I came across http://support.microsoft.com/kb/958662/en-us and http://support.microsoft.com/kb/960038/en-us (the latter of these is a hotfix), which I applied, and that solved the problem.

It would seem that Hyper-V and VSS-based backups have an issue if a virtual machine is in a running state. It is possible to save the Hyper-V guest machine and then back it up without issue, but of course this kicks people off the virtual machine, which is a bit pointless unless it's a development machine. To stop the server bluescreening, either turn off backup for the Hyper-V guest by disabling the Backup (volume snapshot) option under Integration Services in the guest machine settings, or install the hotfix and reboot once.

SBS and WEBS 2008 Backup Fails to Backup Exchange Server

The following errors are reported in the Event Log Windows Logs/Application when you run the built-in backup that is part of Small Business Server 2008 (SBS) or Windows Essential Business Server 2008 (WEBS):

Event ID 565 – Consistency check for component StorageGroup-GUID\’Microsoft Exchange Server\Microsoft Information Store\SERVER’ failed. Application ‘Exchange’ will not be available in the backup done at time ‘date time’

The Event Viewer log at Application and Services Logs/Microsoft/Windows/Backup/Operational shows that everything completed fine but the Windows Server Backup administrative tool says backup completed with warnings. Double-clicking the backup record shows:

Application will not be available for recovery from this backup. Consistency
check failed for component Microsoft Exchange Server\Microsoft Information
Store\Server-Name\Store-GUID

This seems to be related to having enabled Local Continuous Replication (LCR) on the Exchange mailbox database. This is unfortunate, as LCR is such a useful tool in recovery for Exchange Servers that I would want to enable it as a matter of course, and spec my SBS servers to have enough disk space to store LCR copies. Note that the actual Exchange databases and log files are backed up as part of the volume backup, just not as part of the application-aware backup, and that might result in invalid restores as the volume-level backup is not Exchange aware.

Please, Microsoft, will you make the VSS backup for Exchange 2007 that is included in SBS and WEBS LCR-aware? Thanks.