Reloaded: Monitor Your AMPS Instances with Prometheus and Grafana

Jul 7, 2022 | Pavel K. and Dirk Myers

amps admin api prometheus grafana visualization

This post updates one of our most popular blog articles

We wrote this several years ago, and it remains true: modern data processing systems are complex and often consist of several subsystems from various vendors, where each individual subsystem typically exposes some sort of monitoring interface with its own metrics, format, authentication, and access control. To keep that complexity under control and to be able to monitor the state of the whole system, both in real time and historically, standard monitoring packages have emerged. More than ever, most customers we work with no longer build end-to-end monitoring systems themselves, but instead build custom dashboards using off-the-shelf monitoring software. It makes sense to focus on the metrics that are important to the business and the application rather than on the low-level details of creating the framework.

One popular package is Prometheus. Prometheus is an open-source product created to collect and aggregate monitoring data and stats from different systems in a single place. Prometheus conveniently integrates with Grafana, another open source tool that can visualize and present monitoring stats organized in dashboards. In this post, we demonstrate how to integrate the built-in AMPS Admin API with Prometheus, and thus, with Grafana in order to monitor and visualize various AMPS metrics.

To integrate AMPS data into Grafana, we’re going to need to do a few things:

  • configure the AMPS Admin API
  • create a data exporter that exposes data from AMPS Admin API in a format recognized by Prometheus
  • configure Prometheus to use the AMPS data exporter
  • configure Grafana to use Prometheus as a data source

As usual, all of the files used in this article are available on GitHub.

If you’ve never worked with Prometheus or Grafana before, the detailed quick start guides on the Prometheus and Grafana sites are a good place to start.

Configure AMPS Admin API

AMPS has a built-in Admin module that provides metrics using a RESTful interface. All it takes to enable the monitoring and statistics interface is to add the Admin section in the AMPS configuration file:

<AMPSConfig>
    ...

    <Admin>
        <InetAddr>8085</InetAddr>
        <FileName>stats.db</FileName>
        <Interval>1s</Interval>
    </Admin>

    ...
</AMPSConfig>

If your configuration already has the Admin API enabled, just take note of the port number used for the Admin API.

Otherwise, you can simply add the Admin section shown above, which exposes the Admin API at the specified address (http://localhost:8085) and also stores stats in a file (stats.db).

Once you’ve prepared the configuration file, start AMPS. The full configuration file for the demo is available on GitHub for your convenience.
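
Before going further, it can be helpful to confirm that the Admin API is responding. Here’s a minimal check, assuming the host and port from the sample configuration above:

import requests

# Fetch the full statistics document from the AMPS Admin API.
# The port (8085) matches the InetAddr in the sample configuration.
response = requests.get('http://localhost:8085/amps.json')
response.raise_for_status()

# Print the top-level sections of the statistics document.
print(sorted(response.json()['amps'].keys()))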

The detailed description of every metric AMPS provides is available in the Monitoring guide here.

Plan Data Collection

AMPS offers a wide variety of metrics, and not all metrics will be useful for every installation. (For example, although there is a lot of information about message queues available, those metrics aren’t useful for applications that use a fan-out messaging pattern rather than using message queues for competitive consumption.)

In the updated sample dashboard, we include basic metrics for:

  • Host level load
    • Memory
    • I/O
    • Disk usage
    • CPU load
  • Instance level metrics
    • Overall incoming messages by processor type
  • Metrics for topics in the SOW (including views and queues)
    • Insert, update and delete counts (numbers since AMPS started)
    • Insert, update, query, and delete counts per second (averaged over each sample interval)
  • Metrics for views
    • Number of in-flight updates for each view
  • Metrics for queues
    • Age of oldest message in the queue
    • Current queue depth
    • Replication-related statistics
  • Metrics for replication destinations
    • Connection state (currently connected or not)
    • Transaction log replay point for this destination (seconds_behind)
    • Messages sent per second (averaged over each sample interval)
  • Metrics for client connections
    • Bytes in and out per second (averaged over each sample interval)
    • Network buffer metrics (for both send and receive buffer)
    • Messages buffered in AMPS for this client (oldest message and current count)

These metrics provide a general-purpose minimal dashboard for AMPS. We encourage you to use this as a starting point. Remove any statistics that don’t make sense for your installation, and add any statistics that are important for your application and environment.
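
To get a feel for where these metrics live, it can help to walk the statistics document that the Admin API returns. The section names below are a sketch based on the Monitoring guide rather than an exhaustive map, so confirm them against the amps.json produced by your own instance:

import requests

stats = requests.get('http://localhost:8085/amps.json').json()

# Sections of the statistics document that roughly correspond to the
# metric groups above. These paths are assumptions -- verify them
# against your instance and the Monitoring guide.
sections = [
  ('host load', ['amps', 'host']),
  ('processors', ['amps', 'instance', 'processors']),
  ('sow topics', ['amps', 'instance', 'sow']),
  ('views', ['amps', 'instance', 'views']),
  ('queues', ['amps', 'instance', 'queues']),
  ('replication', ['amps', 'instance', 'replication']),
  ('clients', ['amps', 'instance', 'clients']),
]

for name, path in sections:
  node = stats
  for key in path:
    node = node.get(key, {}) if isinstance(node, dict) else {}
  print(name, '->', len(node), 'entries')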

Create an AMPS data exporter for Prometheus

In order to add AMPS monitoring stats to Prometheus, we will need a custom exporter. Exporters are applications that convert monitoring data into a format recognized by Prometheus. A detailed guide on how to write exporters is available here. Depending on the language you want to use, you might utilize one of the official client libraries available here. In our demo, we will be using Python, since Python is simple to use and allows us to focus on the exporter’s logic. As with the configuration file, all the files mentioned in this section are on GitHub.

Now we’ll create the exporter application, which you can run as a Python app.

First, make sure you’ve installed the dependencies for the Prometheus client:

pip install requests prometheus_client

Our exporter will need a custom collector – a special class that collects data from AMPS upon receiving a scrape event from Prometheus:

from prometheus_client.core import GaugeMetricFamily
import requests


class AMPSCollector(object):
  def get_stats(self):
    """
    This method collects stats from AMPS at the moment 
    of the scrape event from Prometheus. It can also 
    handle all the required authentication / custom HTTP 
    headers, if needed.
    """
    return requests.get(
      'http://localhost:8085/amps.json'
    ).json()

  def collect(self):
    # load current stats from AMPS first
    stats = self.get_stats()

    # update the metrics -- add
    # whichever metrics you need to
    # monitor here.

    yield GaugeMetricFamily(
      'amps_instance_clients',
      'Number of currently connected clients',
      value=len(stats['amps']['instance']['clients'])
    )

    yield GaugeMetricFamily(
      'amps_instance_subscriptions',
      'Number of currently active subscriptions',
      value=len(stats['amps']['instance']['subscriptions'])
    )

    yield GaugeMetricFamily(
      'amps_host_memory_in_use',
      'The amount of memory currently in use.',
      value=stats['amps']['host']['memory']['in_use']
    )

    # The repository has more metrics with more
    # advanced collection -- check it out!

To add an exposed metric, we use a GaugeMetricFamily object. For example, in the sample above we expose the metric amps_instance_clients, which corresponds to the number of Client objects reported in the Admin API at the /amps/instance/clients path.

Most AMPS metrics can use the gauge metric type, since each is a simple value that can be set at each interval. You can read more about Prometheus metric types here.

The collector class has only a single required method – collect(). The collect() method is called upon a scrape event. Once called, the method is responsible for populating metric values, which are gathered from AMPS via a simple GET request to the AMPS Admin API. We request data in JSON format by adding .json at the end of the URL, since JSON is easily convertible into native Python lists and dictionaries.
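
Some metrics, such as queue depth, need one time series per topic rather than a single value for the whole instance. The Prometheus client handles this through labels: a GaugeMetricFamily created with label names accepts one sample per label value. Here’s a sketch of a helper that collect() could delegate to with yield from; the queues section and its field names are assumptions, so check them against your amps.json:

from prometheus_client.core import GaugeMetricFamily


def collect_queue_metrics(stats):
  # One gauge family with one sample per message queue, keyed by topic.
  queue_depth = GaugeMetricFamily(
    'amps_queue_depth',
    'Current depth of each AMPS message queue',
    labels=['topic']
  )

  # The 'queues' section and its field names are assumptions --
  # verify them against the amps.json from your instance.
  for queue in stats['amps']['instance'].get('queues', []):
    queue_depth.add_metric([queue['topic']], float(queue['queue_depth']))

  yield queue_depth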

Second, we need to register our AMPS collector within the Prometheus client:

from prometheus_client.core import REGISTRY


REGISTRY.register(AMPSCollector())

Finally, we start the HTTP server supplied by the client that will serve the exporter’s data:

from prometheus_client import start_http_server
import time


if __name__ == '__main__':
  # Start up the server to expose the metrics.
  start_http_server(8000)

  # keep the server running
  while True:
    time.sleep(10)

The above code uses a custom collector to properly request data from AMPS and expose it to Prometheus at the moment of a scrape event. Depending on the policies at your site, you might modify the get_stats() method to add authentication / entitlement handling, if needed. More information about securing AMPS Admin API is available here.
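
For instance, if the Admin API requires HTTP Basic Authentication, only get_stats() needs to change. The credentials and header below are placeholders, not part of the demo:

  def get_stats(self):
    # Placeholder credentials and header -- substitute whatever
    # mechanism actually secures your Admin API.
    return requests.get(
      'http://localhost:8085/amps.json',
      auth=('monitoring-user', 'replace-me'),
      headers={'X-Custom-Header': 'replace-me'},
      timeout=5
    ).json()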

Start the exporter application and it will expose an HTTP interface at localhost:8000 for Prometheus to scrape:

python amps-exporter.py

That’s it: our custom exporter is complete!
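
Before pointing Prometheus at it, you can sanity-check the exporter by fetching its metrics endpoint directly; the prometheus_client server publishes the standard text exposition format:

import requests

# Fetch the exporter's output and show only the AMPS metrics.
body = requests.get('http://localhost:8000/metrics').text
for line in body.splitlines():
  if line.startswith('amps_'):
    print(line)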

For more details on the Prometheus Python client, see the manual, available here.

Configure Prometheus to use the AMPS data Exporter

Now we need to configure Prometheus to utilize the new scrape target (that is, the service provided by the exporter) that we just created. To do this, add a new job to the configuration file:

global:
  # Set the scrape interval to every 10 seconds. 
  # Default is every 1 minute.
  scrape_interval:     10s 

scrape_configs:
  - job_name: 'amps_stats'

    # Override the global default 
    # and scrape targets every 1 second.
    # (should match AMPS > Admin > Interval settings)
    scrape_interval: 1s

    static_configs:
      - targets: ['localhost:8000']
        labels:
          group: 'AMPS'

In the above example, we add the job and also override the scrape_interval value to match the AMPS Admin statistics interval value we set in the first step. Since that’s the interval at which AMPS refreshes statistics, it’s not especially useful for Prometheus to ask for statistics on a more frequent interval (though if the visualization does not need to be as granular as the statistics interval, it could be reasonable to ask for statistics less frequently).

We set the `scrape_interval` at the job level since several AMPS instances can be monitored, and each instance might have a different statistics interval.
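
With a job per instance, each exporter gets its own scrape_interval as shown above. Another option (a sketch, not the approach used in the demo repository) is to parameterize the collector so a single exporter serves several instances, labeling each metric with the instance name; the hostnames below are hypothetical:

from prometheus_client.core import GaugeMetricFamily, REGISTRY
import requests


class LabeledAMPSCollector(object):
  def __init__(self, admin_url, instance_label):
    # admin_url and instance_label identify one monitored AMPS instance.
    self.admin_url = admin_url
    self.instance_label = instance_label

  def collect(self):
    stats = requests.get(self.admin_url).json()
    clients = GaugeMetricFamily(
      'amps_instance_clients',
      'Number of currently connected clients',
      labels=['instance']
    )
    clients.add_metric(
      [self.instance_label],
      len(stats['amps']['instance']['clients'])
    )
    yield clients


# Hypothetical hostnames -- register one collector per AMPS instance.
REGISTRY.register(LabeledAMPSCollector('http://amps-primary:8085/amps.json', 'primary'))
REGISTRY.register(LabeledAMPSCollector('http://amps-backup:8085/amps.json', 'backup'))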

Once configured, Prometheus can be started with this configuration file:

./prometheus --config.file=prometheus.yml

That’s all it takes to start collecting AMPS statistics into Prometheus!
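
To confirm that the metrics are making it into Prometheus, you can query its HTTP API (Prometheus listens on port 9090 by default):

import requests

# Ask Prometheus for the most recent value of one of the AMPS metrics.
response = requests.get(
  'http://localhost:9090/api/v1/query',
  params={'query': 'amps_instance_clients'}
)
for result in response.json()['data']['result']:
  print(result['metric'], result['value'])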

Configure Grafana to use Prometheus as a data source

Of course, statistics are more useful if there’s a way to visualize them. That’s where Grafana comes in.

Once the data is in Prometheus, adding it to Grafana is straightforward. Navigate to Grafana and add Prometheus as a Data Source. The detailed instructions on how to do this are available here.

The only setting you’ll need to modify for our example is the URL: http://localhost:9090. After the data source is added, building the dashboard is pretty straightforward – you can choose different graphs and thresholds, and re-arrange widgets on the page.

In this version of the dashboard, we show results for the metrics discussed above.

Here’s a screenshot of the dashboard:

AMPS Grafana Dashboard with metrics displayed

The dashboard is included in the GitHub repository. Note that when you load it, you will need to replace the UID of the data source in the sample dashboard with the UID of the data source you created in Grafana, since Grafana does not adjust the reference.

To Infinity and Beyond!

In this post, we’ve just scratched the surface of how the AMPS Admin API can be integrated with Prometheus and Grafana. Many additional metrics are available and there are a wide variety of ways those metrics can be visualized. Since Prometheus can collect data from a wide variety of sources, you can also combine data on the AMPS instance with data about other parts of the application, giving you full end-to-end monitoring.

For further reading, here are some more articles about AMPS monitoring:

Know a great trick for monitoring AMPS with Prometheus, or have a cool technique that isn’t mentioned here? What dashboard would you build? What other systems would you monitor together with AMPS?

Let us know in the comments!

