One of the most enjoyable parts of AMPS is how easy it is to create a client, connect to an AMPS instance, and start building an application. In just a few minutes, you can have applications communicating through AMPS and start working out your application’s message flow. (In fact, we’ve demonstrated live-coding a basic chat application from scratch in under 30 minutes!)

If you’re responsible for keeping a common development instance of AMPS running, that joy can sometimes turn to frustration as misbehaving applications can end up consuming resources on the instance. In extreme cases, misbehaving applications can consume enough resources that other applications are affected.

We’ve seen instances where a misconfigured application generated millions of failed connections in less than two hours. While AMPS did a reasonably good job of managing the flood, the overhead involved in managing those failed connections consumed bandwidth and CPU time that would have been better spent doing real work for the instance.

There’s a simple solution, though.

fail2ban: Better than a Wizard

Managing misbehaving connections isn’t a problem unique to AMPS. Web servers, SMTP servers, ssh servers, databases: any software that accepts connections over a network needs to handle this problem, ideally with as little CPU and network overhead as possible.

In fact, most modern Linux distributions already provide a service that offers this protection: fail2ban. Commonly used to protect critical services like SSH access, Apache, nginx and so on, fail2ban has exactly the features we need to protect an AMPS instance.

The idea behind fail2ban is simple. The fail2ban service monitors logs for entries that indicate a problem from a specific remote IP address. If the number of problem entries exceeds a configured threshold, fail2ban updates the firewall rules for the system to block access from the problematic system to the affected port for a period of time.

With some simple configuration, you can easily configure fail2ban to protect AMPS instances.

The recipe for using fail2ban has a few pieces.

  • Setting policy: First, we need to decide which client behavior we want to protect AMPS from.
  • Logging events: Then, we need to log the information that we’ll use to identify when connections are misbehaving.
  • Identifying problems: Next, we need to let fail2ban know what events should be considered bad behavior.
  • Configuring protection: Finally, we’ll configure how fail2ban protects AMPS from bad behavior.

Setting Policy

For the purposes of this blog, we’ll set a policy that protects AMPS from two common problems: repeated connections from clients that have the same name (“name in use” collisions) and repeated connections from clients that do not successfully authenticate to AMPS.

When we see a client misbehaving in either of these ways, we want fail2ban to block connections from that client for two minutes. If a client has 20 or more authentication or name in use failures within 10 seconds, we’ll trigger the ban.

Logging Events

Now that we have a policy to enforce, we need to capture the events for fail2ban.

To enforce the policy we just set, we need to know when a client disconnects and why that client disconnected. For fail2ban to successfully ban the client, we’ll also need information on the IP address that the client is connecting from.

Recent patch versions of AMPS 5.2 (that is, 5.2.1.37, 5.2.0.87, and subsequent versions) contain all of the information we need for this monitoring in the 07-0013 log message.

To capture this information in a place that’s easy for fail2ban to monitor, we add a distinct logging target to AMPS configuration.

<Target>
  <Protocol>file</Protocol>
  <IncludeErrors>07-0013,00-0015</IncludeErrors>
  <FileName>/var/log/amps/ban.log</FileName>
  <RotationThreshold>1MB</RotationThreshold>
</Target>

Notice that there’s no logging level specified in this Target. The ban.log file will contain only the two messages listed in IncludeErrors: 07-0013, the message that we need fail2ban to monitor, and 00-0015, which ensures that AMPS writes at least one event to the log file even if there are no disconnects happening. Also notice that the file has a consistent filename, so fail2ban can easily find it. AMPS will write 1MB of logs and then roll the log over onto the same filename, which keeps the file size at approximately 1MB.

This file captures disconnect events from the server. For example, the following set of events shows three clients disconnecting from the server.

2017-12-07T16:55:49.9395590-08:00 [26] info: 07-0013 client[queue-reader-dirkm](192.168.0.7:50336) disconnected: connection closed.
2017-12-07T16:56:12.2582980-08:00 [26] info: 07-0013 client[daily-report](192.168.1.1:50344) disconnected: connection closed.
2017-12-07T16:59:57.2596670-08:00 [27] info: 07-0013 client[fast-message-loader-3218](192.168.2.0:50342) disconnected: connection closed.

Each message ends with the reason for the disconnect (in this example, connection closed). The reasons that we’re looking for are auth (indicating that a client isn’t authorized to log on) and name in use (indicating that a client was disconnected due to the name being in use).
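Before settling on thresholds, it can help to see which addresses are disconnecting and why. The following is a minimal Python sketch that tallies disconnect reasons per IP address from lines in ban.log. The regular expression is an assumption inferred from the three sample lines above, not from the AMPS documentation, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: this pattern is inferred from the sample 07-0013 lines in this
# post. It is not taken from the AMPS documentation.
DISCONNECT = re.compile(
    r"07-0013 client\[(?P<client>[^\]]*)\]"
    r"\((?P<ip>[^:()]+):(?P<port>\d+)\) disconnected: (?P<reason>.+?)\.?$"
)

# The three sample lines shown in this post.
SAMPLE_LINES = [
    "2017-12-07T16:55:49.9395590-08:00 [26] info: 07-0013 "
    "client[queue-reader-dirkm](192.168.0.7:50336) disconnected: connection closed.",
    "2017-12-07T16:56:12.2582980-08:00 [26] info: 07-0013 "
    "client[daily-report](192.168.1.1:50344) disconnected: connection closed.",
    "2017-12-07T16:59:57.2596670-08:00 [27] info: 07-0013 "
    "client[fast-message-loader-3218](192.168.2.0:50342) disconnected: connection closed.",
]


def parse_disconnect(line):
    """Return the client, ip, port and reason of a 07-0013 line, or None."""
    match = DISCONNECT.search(line.strip())
    return match.groupdict() if match else None


def tally(lines):
    """Count disconnects per (ip, reason) pair, skipping lines that don't parse."""
    counts = Counter()
    for line in lines:
        event = parse_disconnect(line)
        if event:
            counts[(event["ip"], event["reason"])] += 1
    return counts


if __name__ == "__main__":
    for (ip, reason), count in tally(SAMPLE_LINES).items():
        print(f"{ip}: {count} x {reason}")
```

To run this against a real instance, pass the lines of /var/log/amps/ban.log to tally() instead of SAMPLE_LINES.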

Identifying Problems

Once AMPS is recording messages in a format that fail2ban can consume, the rest of the configuration is easy.

The log message is built to be relatively easy to scan for, so the filter for fail2ban is simple. Create this amps.conf file in the /etc/fail2ban/filter.d directory (or the equivalent directory for your Linux distribution):

# Fail2ban filter for AMPS (action-based file format)

[INCLUDES]

before = common.conf

[Definition]

# Notice that you could also update the regex to
# track specific reasons for disconnection (for
# example, you could specify that fail2ban only
# tracks name in use or entitlement failures).

failregex = info: 07-0013.*\(<HOST>:[\d]+\) disconnected: (auth|name in use)

The regular expression here indicates that a 07-0013 message with a disconnection reason of auth or name in use is an event that should be monitored. Further, the IP address for the event is the first part of the host:port string that is in parentheses just before the word disconnected.
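To get a feel for what the filter matches before deploying it, here is a minimal Python sketch that applies the same failregex to a few log lines. Two things in it are assumptions: the HOST_PATTERN is a simplified, IPv4-only stand-in for fail2ban's <HOST> tag (the real tag accepts more than this), and the auth and name in use lines are hypothetical, built from the log format shown earlier, since the post does not include real examples of them.

```python
import re

# The failregex from amps.conf, copied verbatim.
FAILREGEX = r"info: 07-0013.*\(<HOST>:[\d]+\) disconnected: (auth|name in use)"

# Assumption: a simplified, IPv4-only stand-in for fail2ban's <HOST> tag.
HOST_PATTERN = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

SAMPLE_LINES = [
    # Taken from the post: a normal disconnect, which should NOT match.
    "2017-12-07T16:55:49.9395590-08:00 [26] info: 07-0013 "
    "client[queue-reader-dirkm](192.168.0.7:50336) disconnected: connection closed.",
    # Hypothetical lines, constructed to follow the same format.
    "2017-12-07T16:56:12.2582980-08:00 [26] info: 07-0013 "
    "client[daily-report](192.168.1.1:50344) disconnected: auth",
    "2017-12-07T16:59:57.2596670-08:00 [27] info: 07-0013 "
    "client[fast-message-loader-3218](192.168.2.0:50342) disconnected: name in use",
]


def compile_failregex(failregex):
    """Substitute the <HOST> tag and compile the result as a Python regex."""
    return re.compile(failregex.replace("<HOST>", HOST_PATTERN))


def offending_host(line, pattern):
    """Return the IP address a line would count against, or None if no match."""
    match = pattern.search(line)
    return match.group("host") if match else None


if __name__ == "__main__":
    pattern = compile_failregex(FAILREGEX)
    for line in SAMPLE_LINES:
        print(offending_host(line, pattern), "<-", line)
```

This only approximates the matching step. The fail2ban-regex tool that ships with fail2ban tests a filter file against a real log and is the more reliable check on a live system.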

Configuring Protection

Now that we have a rule that tells fail2ban how to identify problem disconnects, all that’s left is to configure fail2ban to protect the host from the problem.

We add an amps rule to /etc/fail2ban/jail.local (or the equivalent file in your distribution). That file defines rules that are customized for the local system, rather than provided by default with the distribution.

[amps]
  enabled = true
  port = 9007
  logpath = /var/log/amps/ban.log
  maxretry = 20
  bantime = 120
  findtime = 10

The name at the top of the section, [amps], matches the name of the filter file we created for AMPS in the filter.d directory (amps.conf), so fail2ban applies that filter to this jail by default.

The maxretry setting specifies the number of failures that a given host is allowed to have during the findtime period. The bantime specifies how long to block connections.

The settings here say that if an IP address has logged 20 or more matching disconnect messages (that is, auth or name in use disconnects) in the last 10 seconds, ban that IP address for 120 seconds. Naturally, you can tune these settings to best fit your installation. The ban applies only to the specified port. If your AMPS instance uses different (or additional) ports, replace the port parameter in the configuration.
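To make the interaction between maxretry, findtime and bantime concrete, here is a small Python sketch of the policy as a sliding window per IP address. It is a simplified model for reasoning about the settings, not fail2ban's actual implementation, and the BanTracker class and its method names are hypothetical.

```python
from collections import defaultdict, deque

# Values from the [amps] jail in this post (times are in seconds).
MAXRETRY = 20
FINDTIME = 10
BANTIME = 120


class BanTracker:
    """Simplified model of the jail: not fail2ban's real implementation."""

    def __init__(self, maxretry=MAXRETRY, findtime=FINDTIME, bantime=BANTIME):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.failures = defaultdict(deque)  # host -> recent failure times
        self.banned_until = {}              # host -> time the ban expires

    def is_banned(self, host, now):
        return self.banned_until.get(host, 0) > now

    def record_failure(self, host, now):
        """Record one matching log line; return True if it triggers a ban."""
        if self.is_banned(host, now):
            return False
        window = self.failures[host]
        window.append(now)
        # Forget failures that are findtime seconds old or older.
        while window and window[0] <= now - self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned_until[host] = now + self.bantime
            window.clear()
            return True
        return False


if __name__ == "__main__":
    tracker = BanTracker()
    # A client failing twice per second reaches 20 failures within 10 seconds.
    for i in range(20):
        if tracker.record_failure("192.168.1.1", now=i // 2):
            print("banned at t =", i // 2)
```

In this model, a client that fails once per second never has more than 10 failures inside the 10-second window, so it is never banned. If that slower pattern is also a problem for your instance, lengthen findtime or lower maxretry.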

Easy As A Banhammer

That’s all there is to it. Restart fail2ban, and misbehaving clients will no longer be able to disrupt your AMPS instance.