Failover

Overview

Automate provides failover capability for servers, systems, or networks that require continuous accessibility and a high degree of reliability. Failover lets agents automatically switch over to a redundant standby server upon failure or abnormal termination of a previously active server component. Failover happens without human intervention: the second server takes over the work of the first as soon as a loss of communication is detected.

Requirements

Automate failover requires the following in order to work properly:

  • 2 Server Licenses

  • 2 Agent Licenses (1 Agent License Per Server)  

  • 1 Datastore used as the Database Server for both the primary and secondary Automate components.

Although not a requirement, it is recommended that Automate development tools, remote agents, and the external database be installed on a computer separate from the ones where the primary and secondary Automate Execution Server components are installed.

Concept

Failover functionality is primarily agent-based: remote agents are responsible for all connections to the Automate Execution Server component and can connect to a secondary or tertiary execution server if the connection to the primary one is suddenly lost. This is performed independently of the Automate Execution Server. Agents can be configured with up to three server host names (or IP addresses) to allow a hierarchical order of primary, secondary, and tertiary servers. The host names are stored in the registry under a single value, separated by semicolons. The agent always attempts to connect to the primary host name first. If that is not available, it attempts the secondary and then the tertiary server. This process cycles until the agent successfully connects to an execution server.
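The connection order described above can be sketched in Python. This is only an illustration under stated assumptions, not the agent's actual implementation: the function names, the host names, and port 9000 are hypothetical placeholders, and try_connect stands in for whatever mechanism the agent really uses to reach a server.

```python
from itertools import cycle
from typing import Callable, List, Optional, Tuple

Server = Tuple[str, int]  # (host name or IP address, port)


def parse_host_value(value: str) -> List[Server]:
    """Split 'host:port;host:port;...' into (host, port) pairs in priority order."""
    servers = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        if ":" not in entry:
            raise ValueError(f"missing port in entry: {entry!r}")
        host, _, port = entry.rpartition(":")
        servers.append((host, int(port)))
    return servers


def connect_in_order(
    servers: List[Server],
    try_connect: Callable[[Server], bool],
    max_attempts: Optional[int] = None,
) -> Optional[Server]:
    """Try primary, then secondary, then tertiary, cycling until one accepts.

    max_attempts is a hypothetical safety limit for this sketch; the text
    above describes the agent as cycling until a connection succeeds.
    """
    for attempt, server in enumerate(cycle(servers)):
        if max_attempts is not None and attempt >= max_attempts:
            return None
        if try_connect(server):
            return server
    return None  # only reached when the server list is empty


# Hypothetical host names and port, used for illustration only.
HOSTS = parse_host_value(
    "primary.example:9000;secondary.example:9000;tertiary.example:9000"
)
```

With a try_connect that accepts only the secondary server, connect_in_order(HOSTS, try_connect) tries the primary once, then returns the secondary entry.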

If an agent suddenly loses connection to the primary execution server, it waits 15 seconds and attempts to connect to the secondary and then tertiary execution server as previously described.

If the agent successfully connects to a secondary or tertiary execution server, a 30-second timer is set. When the timer expires, a separate thread is started to test the connection to the primary execution server. If the test succeeds, the connection to the secondary or tertiary execution server is closed and the agent reconnects to the primary. If the test fails, the agent remains connected to the secondary or tertiary execution server, and the connection to the primary execution server is retried every 30 seconds until it succeeds.
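The timing behavior described in the two paragraphs above can be modeled with a small Python sketch. It is a simplified, single-threaded model under assumptions: the AgentSession class, its method names, and the is_reachable callback are hypothetical, and the probe runs inline here rather than on a separate thread as the agent is described as doing. Only the 15-second and 30-second intervals come from the text.

```python
import time
from typing import Callable, Optional, Sequence

RECONNECT_DELAY_S = 15    # wait after losing the primary (from the text)
FAILBACK_INTERVAL_S = 30  # interval between primary probes (from the text)


class AgentSession:
    """Hypothetical model of an agent's failover and failback behavior."""

    def __init__(
        self,
        servers: Sequence[str],
        is_reachable: Callable[[str], bool],
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.servers = list(servers)      # primary first, then standbys
        self.is_reachable = is_reachable  # stand-in for a real connection test
        self.sleep = sleep                # injectable so tests need not wait
        self.current = self.servers[0]

    def handle_primary_loss(self, max_rounds: Optional[int] = None) -> Optional[str]:
        """Wait 15 s, then try the standbys in order, cycling back to the primary."""
        self.sleep(RECONNECT_DELAY_S)
        order = self.servers[1:] + self.servers[:1]
        rounds = 0
        # max_rounds is a sketch-only limit; None means cycle until success.
        while max_rounds is None or rounds < max_rounds:
            for server in order:
                if self.is_reachable(server):
                    self.current = server
                    return server
            rounds += 1
        return None

    def fail_back_to_primary(self) -> int:
        """While on a standby, probe the primary every 30 s; return probes made."""
        primary = self.servers[0]
        probes = 0
        while self.current != primary:
            self.sleep(FAILBACK_INTERVAL_S)  # timer expires
            probes += 1
            if self.is_reachable(primary):
                self.current = primary       # drop the standby, rejoin primary
        return probes
```

Passing a fake sleep function makes it possible to inspect the sequence of waits without actually pausing, which is how the sketch can be exercised in a test.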

Instructions

  1. On each computer where an Automate agent is installed, shut down the agent service.
  2. Create a backup of the registry.
  3. After backing up the registry, go to the following registry location: HKEY_LOCAL_MACHINE\SOFTWARE\AutoMate\Automate 2024 Agent\TaskService\Agent
  4. Right-click the Host string, and then select Modify.
  5. Enter the primary, secondary, and tertiary execution server names or IP addresses and their communication ports. Separate each server name from its port with a colon, and separate each server/port combination with a semicolon. For example, when specifying server names (as opposed to IP addresses), the value takes the form: server1:port;server2:port;server3:port

  6. Exit the registry for the current Agent computer.
  7. Restart the Agent service.

Repeat steps 2-7 for all other agent computers. The value data entered in the Host registry string must be identical on all agent computers to ensure that failover works properly.
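Because the Host value must follow a strict format and match on every agent computer, it can help to generate and compare it with a short script instead of typing it by hand. The Python sketch below is one possible approach under assumptions: build_host_value and agents_out_of_sync are hypothetical helper names, the host names and port are placeholders, and the script does not read or write the registry itself. It also rejects host entries that contain a colon, so it does not handle IPv6 addresses.

```python
from typing import Dict, Iterable, List, Tuple

MAX_SERVERS = 3  # primary, secondary, tertiary (from the text above)


def build_host_value(servers: Iterable[Tuple[str, int]]) -> str:
    """Join (host, port) pairs as 'host:port;host:port;...' in priority order."""
    entries = []
    for host, port in servers:
        host = host.strip()
        if not host or ":" in host or ";" in host:
            raise ValueError(f"invalid host name: {host!r}")
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port for {host}: {port}")
        entries.append(f"{host}:{port}")
    if not 1 <= len(entries) <= MAX_SERVERS:
        raise ValueError("expected between 1 and 3 servers")
    return ";".join(entries)


def agents_out_of_sync(expected: str, values_by_agent: Dict[str, str]) -> List[str]:
    """Return the agent computers whose Host value differs from the expected one."""
    return sorted(
        name for name, value in values_by_agent.items() if value != expected
    )


# Placeholder values for illustration; substitute your own servers and port.
EXPECTED = build_host_value(
    [("primary.example", 9000), ("secondary.example", 9000)]
)
```

The values passed to agents_out_of_sync would have to be collected separately, for example by noting the Host string shown on each agent computer.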

WARNING: Improper registry editing can make your system malfunction or even keep you from starting Windows. Proceed with caution.