Exchange Move Requests | Large Items | And Setting TCP KeepAliveTime To A Large Value

Posted on Posted in exchange online, Exchange Server, mailbox, move, networking

I have seen this situation a number of times. A large mailbox (or mailbox and archive) wont move to the target because the process of checking what the changes are in the mailbox take too long, the network or Exchange Server times out the users move and then reports the mailbox is locked.

The fix for this is counter though to everything else you read online about this. Often you will see to reduce the TCP KeepAliveTime and reboot the server. This is the opposite – increase the value and do not reboot the server. Here is why:

First make sure no bad items in your failed moves – this is not a fix for bad items, this is a fix where things timeout:

Get-MoveRequest -MoveStatus failed | Get-MoveRequestStatistics | fl badite*

View the Move Request Statistics log for one of your failed mailbox moves:

Get-MoveRequestStatistics "<name>" -IncludeReport | fl | Out-File movereport.txt

Search the report that you have saved in the above cmdlet and search for “Error” in the text file. If you get the following then the mailbox is probably too large for a successful move, which means the source server or network has not got the resources. What can happen is the move is progressing and a check happens for changes to the source mailbox – this takes a long time to complete and something times out. When target Exchange tries to connect again, the source has lost the TCP port and so a new move is started, but the mailbox is still locked for the old move. Therefore the move cannot continue.

I have found that by increasing TCP KeepAliveTime (contrary to all the advise online) that this solves the issue. Now I need to be clear here – all I am doing is changing the registry key for this setting and restarting the MRS service on the source Exchange Server. I am NOT restarting Windows, and so I am not changing the KeepAliveTime for the entire network stack. I think MRS checks the registry key to see the KeepAliveTime and sets this to the lock time on the mailbox during the move. If I can lock the mailbox for longer, moves don’t timeout and fail is the theory behind why this happens

The error I get in the MailboxStatistics report (see above for cmdlet) reads:

Message                                : Error: Couldn’t switch the mailbox into Sync Source mode.
                                          This could be because of one of the following reasons:
                                            Another administrator is currently moving the mailbox.
                                            The mailbox is locked.
                                            The Microsoft Exchange Mailbox Replication service (MRS) doesn’t have the correct permissions.
                                            Network errors are preventing MRS from cleanly closing its session with the Mailbox server. If this is the case, MRS may continue to encounter this error for up to 2 hours – this duration is controlled by the TCP KeepAlive settings on the Mailbox server.
                                          Wait for the mailbox to be released before attempting to move this mailbox again. –> An error occurred while saving the changes on the folder “FolderID/”. Error details: Failed, Property: [0x66180003]
                                          InTransitStatus, PropertyErrorCode: AccessDenied, PropertyErrorDescription: .
                                          –> Property: [0x66180003] InTransitStatus, PropertyErrorCode: AccessDenied,
                                          PropertyErrorDescription: .

Of interest in the error is the point that says “MRS may continue to encounter this error for up to 2 hours ”. This time value matches the default TCP KeepAliveTime value. Raising this in the registry and restarting the MRS service (not the server) changes the lock timout, which means that when the long job that is happening on the target finishes (and takes longer than two hours), the source server is still waiting for the connection and does not throw the above error.

Once you have your mailboxes moved, delete the registry value (to put it back to the default of two hours) and avoid rebooting the server when this key is set to a different value. If you started with a different value return to that one instead of deleting the registry value.

The KeepAliveTime setting is found at \HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, and its a DWORD value called KeepAliveTime. The value is in milliseconds, so 7200000 is two hours and 86400000 is 24 hours (which is the value I tend to use to resolve this issue). This change is made on the mailbox server and the service restarted on that server (or servers if you have more than one).

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.