From Zero to Fault Tolerance Hero with AMPS Replication

[Comic: an xkcd-style stick-figure strip in which a boss asks a programmer how their AMPS application handles fault tolerance.]

In real-world systems, networks fail, components need to be replaced, and servers need maintenance. Successful enterprise-grade applications need to be designed with fault tolerance in mind! AMPS sets you up for success with features designed for robust fault tolerance and high availability.

Key to these features is AMPS replication – ensuring that messages are reliably distributed to more than one server. Planning for a disaster can be hard, but AMPS replication doesn’t have to be daunting.

That’s why we are bringing you a series of detailed blog posts that will take you from fault tolerance newbie to replication pro.

Starting Simple

AMPS supports complex replication topologies containing many AMPS instances, but first, we have to start with the fundamental building blocks. In this first post, we are going to describe the basic configuration settings needed to bring your AMPS deployment into a replicated configuration.

We’ll set up a simple configuration with a single topic replicated between two AMPS instances.

The First Instance

We start by giving the AMPS instance a name, as usual. When we’re replicating an instance, the name is important, because it is what identifies this instance to other instances of AMPS. The name must be unique among the set of replicated instances, and it’s safest to avoid spaces and unusual characters, but otherwise it doesn’t have to be especially exciting. So we’ll use AMPS-Replication-A for this one.

Next, we give the instance a Group tag. AMPS uses the Group tag to represent regions or clusters of instances. As you will see when we define the replication destination, the Group name is the primary way AMPS verifies that it has connected to the correct destination.

Note: The Group tag is verified by the upstream AMPS instance at the time of connection, not when the server first starts up. (And if you don’t set a Group, AMPS uses the Name of the instance as the group – a group of one, if you will.)

<AMPSConfig>

  <Name>AMPS-Replication-A</Name>
  <Group>DataCenter-1</Group>

  ...
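To make the default described in the note concrete, here is a sketch with a hypothetical third instance, AMPS-Replication-C, that omits the Group element. Because AMPS then uses the instance Name as its group, an upstream Destination that targets this instance would have to use that Name as the Group. The instance name is illustrative only.

```xml
<!-- Hypothetical downstream instance with no Group element:
     AMPS uses its Name, AMPS-Replication-C, as the group. -->
<AMPSConfig>
  <Name>AMPS-Replication-C</Name>
  <!-- Transports, TransactionLog, Admin, Logging, and so on. -->
</AMPSConfig>

<!-- Inside an upstream instance's replication Destination:
     the Group must match the downstream group, which here
     is simply the downstream instance's Name. -->
<Group>AMPS-Replication-C</Group>
```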

Let’s Define Some Transports

Transports allow connections into the AMPS instance. For this sample, we will create two transports.

The first transport is for standard client traffic to this instance. There is nothing special here. Pay attention to the second transport we define. This is the important one for replication.

AMPS uses a dedicated protocol to replicate messages between instances. The amps-replication protocol is a proprietary format that allows AMPS to efficiently compress and multiplex the replicated message streams so you get the most out of your network.

The instance can receive messages from any number of upstream instances on this transport, so we only need one incoming transport, regardless of how many servers will replicate to this instance.

The Name field of a transport is an identifier string used for debugging purposes, such as in log messages. The Name can be anything, but to make replication easier to debug, we recommend naming the replication transport after its Type; that is, give the amps-replication transport the Name amps-replication.

  <Transports>

    <!-- Transport for clients: accept any known message
         type over tcp. -->
    <Transport>
      <Name>any-tcp</Name>
      <Type>tcp</Type>
      <InetAddr>9007</InetAddr>
      <Protocol>amps</Protocol>
    </Transport>

    <!-- The amps-replication transport is required. 
         This AMPS instance will receive replication messages
         on this transport. The instance can receive messages from any
         number of upstream AMPS instances on this transport.
         However, regular clients cannot connect
         on this port, since this port uses the replication protocol. -->
    <Transport>
        <Name>amps-replication</Name>
        <Type>amps-replication</Type>
        <InetAddr>localhost:10004</InetAddr>
    </Transport>
  </Transports>

Notice that we’re not saying anything here about what the incoming replication messages contain, where they’re from, or anything like that. That’s not the concern of this instance – in AMPS, the source of the data controls replication, as we’ll see later on.

Topics and Transaction logs

This is a standard pub/sub topic that is configured to be stored in a transaction log. The one requirement for a replicated topic is that it must be stored in a transaction log. AMPS replicates exactly the information that is written to the local transaction log. The topic is not required to be declared anywhere else.

  <TransactionLog>
    <JournalDirectory>./journal-A/</JournalDirectory>
    <Topic>
      <Name>orders</Name>
      <MessageType>json</MessageType>
    </Topic>
  </TransactionLog>

Look Mom, no SOW

If you’re used to other systems where you have to predeclare topics, or if you typically use topics in the SOW (which have to be defined so that AMPS knows how to associate messages with records), let’s pause here for a minute.

[Comic: an xkcd-style strip showing a surprised programmer.]

Let me repeat that last point, “the topic is not required to be declared anywhere else!”

If it’s in the transaction log, you can replicate it.

The Replication Block

All of the building blocks are in place for replication! Now, all we need to do is tell AMPS how to replicate the topic. This is where the magic happens.

AMPS replication is push-based. Each replication Destination block describes which messages will be replicated and which destination those messages will be pushed to.

The most important configuration is the name (and message type) of the topic we want to replicate. For this example we define a single topic, but you can define any number of topics or even a regular expression. All matching topics will be replicated to the destination node.

  <Replication>
      <Destination>
          <Topic>
              <MessageType>json</MessageType>

              <!-- The Name definition specifies the name of the topic
                   or topics to be replicated. The Name option can 
                   be either a specific topic name or a regular expression
                   that matches a set of topic names. -->
              <Name>orders</Name>

          </Topic>
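As a sketch of the options described above, a Destination can hold several Topic blocks, and a Topic Name can be a regular expression. The fills- topics below are hypothetical, and the exact regular expression syntax is an assumption worth checking against the AMPS documentation for your version. Keep in mind that every topic the expression matches also needs to be recorded in the transaction log on each instance.

```xml
<!-- Within the Replication Destination. -->

<!-- A specific topic, as configured in this post. -->
<Topic>
  <MessageType>json</MessageType>
  <Name>orders</Name>
</Topic>

<!-- Hypothetical: a regular expression matching every json
     topic whose name starts with "fills-", such as
     fills-us or fills-eu. -->
<Topic>
  <MessageType>json</MessageType>
  <Name>^fills-.*</Name>
</Topic>
```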

Group

Since we’re sending business data to another instance of AMPS so that it is available there when we need it, it’s important to be sure that this connection reaches the expected downstream instance.

The group needs to match the name of the group that the downstream instance has defined (that is, the top level Group tag in that instance’s configuration). At connection time, AMPS will report an error and not replicate if the group specified here does not match the group of the downstream instance.

          <!-- within the Replication Destination -->
          <!-- The group name of the destination instance (or instances).
               The name specified here must match the Group defined 
               for the remote AMPS instance, or AMPS reports
               an error and refuses to connect to the remote instance. -->
          <Group>DataCenter-2</Group>

Sync or Async

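The choice between sync and async determines when the server acknowledges to the message’s publisher that the message has been persisted. With sync, that acknowledgment waits until the replication destination has also stored the message; with async, it does not wait for the destination.

The choice has no effect on how messages are transferred from the server to the replication destination.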

As a good default, 60East recommends starting with sync. There are powerful use cases that take advantage of async replication, but that is for a future blog post.

<SyncType>sync</SyncType>
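For comparison, switching the same Destination to asynchronous acknowledgment is a one-element change. This is a sketch for illustration; the comments summarize the behavior described in this section rather than the full semantics of each mode.

```xml
<!-- Within the Replication Destination. -->
<!-- sync:  the publisher's message is acknowledged as persisted
            only after the destination has stored it too.
     async: the acknowledgment does not wait for the destination.
     Either way, messages are pushed to the destination the same way. -->
<SyncType>async</SyncType>
```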

Destination Transports

Last, we set the network properties for the outgoing connection. This final part of the Replication destination is probably the most straightforward.

We need to define the address and type of the destination. The Type for both the source and the destination should always be amps-replication.

Remember that the amps-replication protocol is a proprietary format that allows AMPS to efficiently compress and multiplex the replicated message streams so you get the most out of your network.

          <!-- The Transport definition defines the location
               to which this AMPS instance will replicate messages.
               The InetAddr points to the hostname and port of the
               downstream replication instance. The Type for a 
               replication instance should always be amps-replication. -->
          <Transport>
              <InetAddr>localhost:10005</InetAddr>
              <Type>amps-replication</Type>
          </Transport>
      </Destination>
  </Replication>
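Because each Destination describes a single place that messages are pushed to, replicating to one more instance means adding one more Destination block inside the same Replication element. In this sketch the second Destination, the DataCenter-3 group, and port 10006 are all hypothetical.

```xml
<Replication>
  <!-- Destination for DataCenter-2, as configured in this post. -->
  <Destination>
    <Topic>
      <MessageType>json</MessageType>
      <Name>orders</Name>
    </Topic>
    <Group>DataCenter-2</Group>
    <SyncType>sync</SyncType>
    <Transport>
      <InetAddr>localhost:10005</InetAddr>
      <Type>amps-replication</Type>
    </Transport>
  </Destination>

  <!-- Hypothetical second Destination: a third instance in group
       DataCenter-3 whose amps-replication transport listens on 10006. -->
  <Destination>
    <Topic>
      <MessageType>json</MessageType>
      <Name>orders</Name>
    </Topic>
    <Group>DataCenter-3</Group>
    <SyncType>sync</SyncType>
    <Transport>
      <InetAddr>localhost:10006</InetAddr>
      <Type>amps-replication</Type>
    </Transport>
  </Destination>
</Replication>
```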

Finish off this config

There are some parts of the configuration that we always include, no matter what the purpose of the instance is. This includes settings such as the admin interface, standard logging, and the closing AMPSConfig tag.

Note: This is a simple example to demonstrate replication. Your production config will likely have different logging settings and many other elements configured.

  <Admin>
    <InetAddr>localhost:8085</InetAddr>
  </Admin>

  <Logging>
    <Target>
      <Protocol>file</Protocol>
      <Level>info</Level>
      <FileName>./logs/instance-A-%Y%m%d-%n.log</FileName>
    </Target>
  </Logging>

</AMPSConfig>

Take a Breath

Congratulations! You have set up the first instance in a replicated pair. Remember, most of it is the same as a standard AMPS configuration.

Let’s summarize the big highlights for this configuration:

Define a group along with your instance name
Define the incoming replication transport; we only need one for the entire instance
Define the topic in the Transaction Log
Define the replication destination block: where we are pushing messages to (one for each place we replicate to)
Define the standard logging and admin configuration
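For reference, here are the fragments for the first instance assembled into a single file. Nothing is added beyond what this post has already shown, but treat it as a sketch: it has not been checked against a specific AMPS release, and a production configuration will need more.

```xml
<AMPSConfig>

  <Name>AMPS-Replication-A</Name>
  <Group>DataCenter-1</Group>

  <Transports>
    <!-- Transport for clients: accept any known message type over tcp. -->
    <Transport>
      <Name>any-tcp</Name>
      <Type>tcp</Type>
      <InetAddr>9007</InetAddr>
      <Protocol>amps</Protocol>
    </Transport>

    <!-- Incoming replication: one transport for all upstream instances. -->
    <Transport>
      <Name>amps-replication</Name>
      <Type>amps-replication</Type>
      <InetAddr>localhost:10004</InetAddr>
    </Transport>
  </Transports>

  <!-- Replicated topics must be recorded in the transaction log. -->
  <TransactionLog>
    <JournalDirectory>./journal-A/</JournalDirectory>
    <Topic>
      <Name>orders</Name>
      <MessageType>json</MessageType>
    </Topic>
  </TransactionLog>

  <!-- Push the orders topic to instance B (group DataCenter-2). -->
  <Replication>
    <Destination>
      <Topic>
        <MessageType>json</MessageType>
        <Name>orders</Name>
      </Topic>
      <Group>DataCenter-2</Group>
      <SyncType>sync</SyncType>
      <Transport>
        <InetAddr>localhost:10005</InetAddr>
        <Type>amps-replication</Type>
      </Transport>
    </Destination>
  </Replication>

  <Admin>
    <InetAddr>localhost:8085</InetAddr>
  </Admin>

  <Logging>
    <Target>
      <Protocol>file</Protocol>
      <Level>info</Level>
      <FileName>./logs/instance-A-%Y%m%d-%n.log</FileName>
    </Target>
  </Logging>

</AMPSConfig>
```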

The Second Instance

The second instance looks much like the first, with only a few differences.

The Name and Group are defined just as in the first instance. Note that the name is different, since these are different instances (and the instance name needs to be unique across all replicated instances). Also notice that the group matches the Group we put in the Destination for the first instance; after all, this is the group we’re planning for that Destination to reach!

<AMPSConfig>

  <Name>AMPS-Replication-B</Name>
  <Group>DataCenter-2</Group>

Just like the first instance, this one defines a transport for normal clients and a single, separate replication transport.

Note: the ports are unique, to allow both AMPS instances to run on a single host. In a production system, with multiple hosts, it’s common to use the same ports for the same purpose in every configuration.

  <Transports>
    <Transport>
      <Name>any-tcp</Name>
      <Type>tcp</Type>
      <InetAddr>9008</InetAddr>
      <Protocol>amps</Protocol>
    </Transport>

    <Transport>
        <Name>amps-replication</Name>
        <Type>amps-replication</Type>
        <InetAddr>localhost:10005</InetAddr>
    </Transport>
  </Transports>

The Transaction log is defined exactly the same way as the first instance. Every replicated topic has to have the transaction log defined for that topic on every node. This is the one requirement for topics.

Remember, AMPS replicates exactly the information that is written to the local transaction log.

  <TransactionLog>
    <JournalDirectory>./journal-B/</JournalDirectory>
    <Topic>
      <Name>orders</Name>
      <MessageType>json</MessageType>
    </Topic>
  </TransactionLog>

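AMPS replication is one-directional and push-based. Messages always flow from the instance that defines a replication Destination to the amps-replication transport of the instance that Destination points to. To replicate in both directions, each instance needs its own Destination pointing at the other.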

The second instance has configured a replication destination that mirrors back to the first instance.

The Topic name in the destination must match the topic configured on the first instance.

  <!-- All replication destinations are defined inside
       the Replication block. -->
  <Replication>
      <!-- Each individual replication destination
           requires a Destination block. -->
      <Destination>
          <!-- The replicated topics and their respective
               message types are defined here.
               AMPS allows any number of Topic definitions
               in a Destination. -->
          <Topic>
              <MessageType>json</MessageType>

              <!-- The Name definition specifies the name of the
                   topic or topics to be replicated. The Name option
                   can be either a specific topic name or a
                   regular expression that matches a set of
                   topic names. -->
              <Name>orders</Name>
          </Topic>

We use the group of the first AMPS instance here, since that is where this destination will deliver messages.

Just like the first instance, we use sync acknowledgment to be sure the other instance has the message.

          <!-- The group name of the destination instance (or instances).
               The name specified here must match the Group defined for
               the remote AMPS instance, or AMPS reports an error and
               refuses to connect to the remote instance. -->
          <Group>DataCenter-1</Group>

          <!-- Since we're building a configuration to replicate to the
               other instance in a high availability pair,
               we specify sync. We want the other instance to also store
               the message before the message is considered
               safely written. -->
          <SyncType>sync</SyncType>

We set the address and port of the first instance – this tells the second instance to connect to the first instance and deliver messages there.

          <!-- The Transport definition defines the location to which
               this AMPS instance will replicate messages.
               The InetAddr points to the hostname and port of the
               downstream instance that will receive messages.
               The Type for a replication instance needs to be
               amps-replication, to match the type on
               the downstream instance. -->
          <Transport>
              <!-- The address, or list of addresses,
                   for the replication destination. -->
              <InetAddr>localhost:10004</InetAddr>
              <Type>amps-replication</Type>
          </Transport>
      </Destination>
  </Replication>

Other configuration for the instance, such as logging, is independent of replication. This configuration does not have to match the replicated instance. As before, we turn on the admin interface and add logging.

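  <!-- The admin port differs from instance A (8085) so that
       both instances can run on the same host. -->
  <Admin>
    <InetAddr>localhost:8086</InetAddr>
  </Admin>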

  <Logging>
    <Target>
      <Protocol>file</Protocol>
      <Level>info</Level>
      <FileName>./logs/instance-B.%Y%m%d-%n.log</FileName>
    </Target>
  </Logging>

</AMPSConfig>

Some Notes

For this example, both instances of AMPS reside on the same physical host. Don’t use this configuration for performance testing!

Two instances sharing one machine compete for the same CPU, memory, disk, and network resources, so their performance characteristics will differ from production. Running both instances on one machine is more useful for testing configuration correctness than for testing overall performance.

To get the best performance when running more than one instance of AMPS on the same machine, 60East recommends disabling AMPS NUMA tuning in the AMPS configuration file and relying on the operating system NUMA management. See the AMPS Configuration Guide for details on how to disable NUMA in your configuration file.

Double-check all your port numbers!

When running multiple AMPS instances on the same host, it’s important to make sure there are no conflicting ports. AMPS will emit an error message and will not start properly if it detects that a port is already in use. That’s why these samples use different ports for each instance. If the instances were on different systems, we would likely use the same port on each instance for a given purpose.
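As a sketch of the multi-host case, instance A’s replication settings might look like this when each instance runs on its own machine. The host names are hypothetical. Note that an InetAddr of localhost:10004 only accepts connections on the loopback interface, so it cannot receive replication from another machine; this sketch assumes, as with the client transport in this post, that a port-only InetAddr listens on all interfaces. Instance B would mirror it, with its Destination pointing at amps-a.example.com:10004.

```xml
<!-- Instance A, on hypothetical host amps-a.example.com. -->

<!-- Incoming replication transport (inside Transports):
     the same port on every host, listening on all interfaces. -->
<Transport>
  <Name>amps-replication</Name>
  <Type>amps-replication</Type>
  <InetAddr>10004</InetAddr>
</Transport>

<!-- Outgoing Transport (inside the replication Destination):
     instance B's host, on the same replication port. -->
<Transport>
  <InetAddr>amps-b.example.com:10004</InetAddr>
  <Type>amps-replication</Type>
</Transport>
```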

Conclusion

That’s it! You now have a replicated AMPS configuration. This will get you started with the basics, but there is a lot more to come.

Look for the coming posts in the series, where we build on these fundamentals to make your AMPS deployment bulletproof in the face of failures:

Capacity Planning for replication
The AMPS High Availability client
Distributed SOWs and Distributed Queues
Server maintenance advice for replication such as optimizing disk space utilization and administrative actions
Complex replication topologies with multiple regions, and strategies for mitigating network degradation

[Comic: an xkcd-style strip in which the boss suggests more and more features while the programmer runs away.]

For more detailed information, check out the AMPS user manual, including the chapter on replication and the advice on capacity planning. And as always, support@crankuptheamps.com is available to help with specific needs.