This post updates one of our most popular blog articles.
We wrote this several years ago, and it remains true: modern data processing systems are complex, often consisting of several sub-systems from various vendors, where each individual subsystem typically exposes some sort of monitoring interface with its own metrics, format, authentication, and access control. To keep that complexity under control, and to be able to monitor the state of the whole system both in real time and historically, standard monitoring packages have emerged. More than ever, most customers we work with no longer build end-to-end monitoring systems themselves, but instead build custom dashboards using off-the-shelf monitoring software. It makes sense to focus on the metrics that are important to the business and the application rather than the low-level details of creating the framework.
One popular package is Prometheus. Prometheus is an open-source product created to collect and aggregate monitoring data and stats from different systems in a single place. Prometheus conveniently integrates with Grafana, another open-source tool that can visualize and present monitoring stats organized in dashboards. In this post, we demonstrate how to integrate the built-in AMPS Admin API with Prometheus, and thus with Grafana, in order to monitor and visualize various AMPS metrics.
To integrate AMPS data into Grafana, we’re going to need to do a few things:
- configure AMPS Admin API
- create a data exporter that exposes data from AMPS Admin API in a format recognized by Prometheus
- configure Prometheus to use the AMPS data exporter
- configure Grafana to use Prometheus as a data source
As usual, all of the files used in this article are available on GitHub.
If you’ve never worked with Prometheus or Grafana before, you can find detailed quick start guides here:
- Prometheus: https://prometheus.io/docs/prometheus/latest/getting_started/
- Grafana: http://docs.grafana.org/guides/getting_started/
Configure AMPS Admin API
AMPS has a built-in Admin module that provides metrics using a RESTful interface. All it takes
to enable the monitoring and statistics interface is to add the Admin
section in the AMPS configuration file:
<AMPSConfig>
    ...
    <Admin>
        <InetAddr>8085</InetAddr>
        <FileName>stats.db</FileName>
        <Interval>1s</Interval>
    </Admin>
    ...
</AMPSConfig>
If your configuration already has the Admin API enabled, just take note of the port number used for the Admin API. Otherwise, you can simply add the Admin section that exposes the Admin API at the specified URL (http://localhost:8085) and also stores stats in a file (stats.db).
Once you’ve prepared the configuration file, start AMPS. The full configuration file for the demo is available on GitHub for your convenience.
The detailed description of every metric AMPS provides is available in the Monitoring guide here.
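Once AMPS is running, it’s easy to confirm that the Admin API is reachable before wiring anything else up. Here’s a minimal sketch, assuming the Admin interface is on localhost:8085 as configured above:
import requests

# The Admin API serves the statistics tree over HTTP; appending .json
# to a path returns the results as JSON.
response = requests.get('http://localhost:8085/amps.json')
response.raise_for_status()
print('Admin API is up; top-level keys:', list(response.json().keys()))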
Plan Data Collection
AMPS offers a wide variety of metrics, and not all metrics will be useful for every installation. (For example, although there is a lot of information about message queues available, those metrics aren’t useful for applications that use a fan-out messaging pattern rather than using message queues for competitive consumption.)
In the updated sample dashboard, we include basic metrics for:
- Host level load
  - Memory
  - I/O
  - Disk usage
  - CPU load
- Instance level metrics
  - Overall incoming messages by processor type
- Metrics for topics in the SOW (including views and queues)
  - Insert, update and delete counts (numbers since AMPS started)
  - Insert, update, query, and delete counts per second (averaged over each sample interval)
- Metrics for views
  - Number of in-flight updates for each view
- Metrics for queues
  - Age of oldest message in the queue
  - Current queue depth
- Replication-related statistics
  - Metrics for replication destinations
    - Connection state (currently connected or not)
    - Transaction log replay point for this destination (seconds_behind)
    - Messages sent per second (averaged over each sample interval)
- Metrics for client connections
  - Bytes in and out per second (averaged over each sample interval)
  - Network buffer metrics (for both send and receive buffer)
  - Messages buffered in AMPS for this client (oldest message and current count)
These metrics provide a general-purpose minimal dashboard for AMPS. We encourage you to use this as a starting point. Remove any statistics that don’t make sense for your installation, and add any statistics that are important for your application and environment.
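A convenient way to decide which of these metrics to collect is to browse the JSON view of the Admin API and see what your instance actually reports. The sketch below simply walks the instance section and prints the categories it finds; the exact entries vary by AMPS version and configuration, so treat the Monitoring guide and your own amps.json output as the authoritative list:
import requests

stats = requests.get('http://localhost:8085/amps.json').json()
instance = stats['amps']['instance']

# Print the categories of instance-level statistics this instance
# reports (clients, subscriptions, and so on) and how many entries
# each category currently has.
for section, value in instance.items():
    size = len(value) if isinstance(value, (list, dict)) else 1
    print(section, '-', size, 'entries')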
Create an AMPS data exporter for Prometheus
In order to add AMPS monitoring stats to Prometheus, we will need a custom exporter. Exporters are applications that convert monitoring data into a format recognized by Prometheus. The detailed guide on how to write exporters is available here. Depending on the language you want to use, you might utilize one of the official client libraries available here. In our demo, we will be using Python, since Python is simple to use and allows us to focus on the exporter’s logic. As with the configuration file, all the files mentioned in this section are on GitHub.
Now we’ll need to create the exporter application that you can run as a Python app.
First, make sure you’ve installed the dependencies for the Prometheus client:
pip install requests prometheus_client
Our exporter will need a custom collector – a special class that collects data from AMPS upon receiving a scrape event from Prometheus:
from prometheus_client.core import GaugeMetricFamily
import requests

class AMPSCollector(object):
    def get_stats(self):
        """
        This method collects stats from AMPS at the moment
        of the scrape event from Prometheus. It can also
        handle all the required authentication / custom HTTP
        headers, if needed.
        """
        return requests.get(
            'http://localhost:8085/amps.json'
        ).json()

    def collect(self):
        # load current stats from AMPS first
        stats = self.get_stats()

        # update the metrics -- add
        # whichever metrics you need to
        # monitor here.
        yield GaugeMetricFamily(
            'amps_instance_clients',
            'Number of currently connected clients',
            value=len(stats['amps']['instance']['clients'])
        )
        yield GaugeMetricFamily(
            'amps_instance_subscriptions',
            'Number of currently active subscriptions',
            value=len(stats['amps']['instance']['subscriptions'])
        )
        yield GaugeMetricFamily(
            'amps_host_memory_in_use',
            'The amount of memory currently in use.',
            value=stats['amps']['host']['memory']['in_use']
        )
        # The repository has more metrics with more
        # advanced collection -- check it out!
To add an exposed metric, we use a GaugeMetricFamily object. For example, in the above sample we expose the metric amps_instance_clients, which corresponds to the number of Client objects reported in the Admin API at the /amps/instance/clients path.
Most AMPS metrics can use the gauge metric type, since a gauge is a simple value that can be set at each interval. You can read more about Prometheus metric types here.
The collector class only has a single required method – collect(). The collect() method is called upon a scrape event. Once called, the method is responsible for populating metric values, which are gathered from AMPS via a simple GET request to the AMPS Admin API. We request data in JSON format by adding .json to the end of the URL, since JSON is easily convertible into native Python lists and dictionaries.
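When a single number per instance isn’t enough, GaugeMetricFamily also accepts labels, so one metric family can carry a separate sample per client, per queue, or per replication destination. Here’s a sketch of how that might look for per-client statistics; the field names used here (client_name, bytes_out_per_sec) are illustrative assumptions, so check them against the fields your instance actually reports under /amps/instance/clients:
from prometheus_client.core import GaugeMetricFamily

def collect_client_metrics(stats):
    # One gauge family with one sample per connected client.
    bytes_out = GaugeMetricFamily(
        'amps_client_bytes_out_per_sec',
        'Bytes sent to each client per second',
        labels=['client_name'])
    for client in stats['amps']['instance']['clients']:
        # Field names are illustrative -- confirm them against the
        # amps.json output from your own instance.
        bytes_out.add_metric([client['client_name']],
                             float(client.get('bytes_out_per_sec', 0)))
    yield bytes_out
Inside collect(), the samples from a helper like this can be emitted with yield from collect_client_metrics(stats).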
Second, we need to register our AMPS collector within the Prometheus client:
from prometheus_client.core import REGISTRY
REGISTRY.register(AMPSCollector())
Finally, we start the HTTP server supplied by the client that will serve the exporter’s data:
from prometheus_client import start_http_server
import time

if __name__ == '__main__':
    # Start up the server to expose the metrics.
    start_http_server(8000)
    # keep the server running
    while True:
        time.sleep(10)
The above code uses a custom collector to properly request data from AMPS and expose it to Prometheus at the moment of a scrape event.
Depending on the policies at your site, you might modify the get_stats() method to add authentication / entitlement handling, if needed. More information about securing the AMPS Admin API is available here.
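As a sketch of what that might look like, here is a get_stats() variant that sends HTTP Basic Authentication credentials with the request; the user name, password, and whether your Admin API requires them at all depend entirely on how the instance is secured:
import requests

def get_stats(self):
    # Illustrative only: the credentials (and any custom headers)
    # should match however your site secures the Admin API.
    return requests.get(
        'http://localhost:8085/amps.json',
        auth=('monitor-user', 'monitor-password'),
        timeout=5
    ).json()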
Start the exporter application, and it will expose an HTTP interface at localhost:8000 for Prometheus to scrape:
python amps-exporter.py
That’s it: our custom exporter is complete!
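Before pointing Prometheus at it, you can check the exporter’s output by hand. The Prometheus client serves the metrics in the text exposition format at the /metrics path, so a quick sketch like this shows the AMPS gauges we just defined:
import requests

# Fetch the exporter's output the same way Prometheus will.
text = requests.get('http://localhost:8000/metrics').text

# Show just the metrics defined by our collector.
for line in text.splitlines():
    if line.startswith('amps_'):
        print(line)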
For more details on the Prometheus Python client, see the manual, available here.
Configure Prometheus to use the AMPS data Exporter
Now we need to configure Prometheus to utilize the new scrape target (that is, the service provided by the exporter) that we just created. To do this, add a new job to the configuration file:
global:
  # Set the scrape interval to every 10 seconds.
  # Default is every 1 minute.
  scrape_interval: 10s

scrape_configs:
  - job_name: 'amps_stats'

    # Override the global default
    # and scrape targets every 1 second.
    # (should match AMPS > Admin > Interval settings)
    scrape_interval: 1s

    static_configs:
      - targets: ['localhost:8000']
        labels:
          group: 'AMPS'
In the above example, we add the job and also override the scrape_interval value to match the AMPS Admin statistics interval we set in the first step. Since that’s the interval at which AMPS refreshes statistics, it’s not especially useful for Prometheus to ask for statistics more frequently (though if the visualization does not need to be as granular as the statistics interval, it could be reasonable to ask for statistics less frequently). We set the scrape_interval at the job level because several AMPS instances can be monitored, and each instance might have a different statistics interval.
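For example, monitoring a second instance (with its own exporter and its own statistics interval) is just a matter of adding another job with its own scrape_interval. The job names, ports, and intervals below are placeholders for whatever your instances actually use:
scrape_configs:
  - job_name: 'amps_stats_primary'
    scrape_interval: 1s           # matches this instance's Admin Interval
    static_configs:
      - targets: ['localhost:8000']
        labels:
          group: 'AMPS'
  - job_name: 'amps_stats_secondary'
    scrape_interval: 5s           # this instance publishes stats less often
    static_configs:
      - targets: ['localhost:8001']
        labels:
          group: 'AMPS'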
Once configured, Prometheus can be started with this configuration file:
./prometheus --config.file=prometheus.yml
That’s all it takes to start collecting AMPS statistics into Prometheus!
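To confirm that Prometheus is actually collecting the AMPS statistics, you can check the Targets page in the Prometheus UI, or query its HTTP API directly. Here is a minimal sketch, assuming Prometheus is on its default port (9090):
import requests

# Ask Prometheus for the current value of one of our AMPS gauges.
result = requests.get(
    'http://localhost:9090/api/v1/query',
    params={'query': 'amps_instance_clients'}
).json()

print(result['data']['result'])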
Configure Grafana to use Prometheus as a data source
Of course, statistics are more useful if there’s a way to visualize them. That’s where Grafana comes in.
Once the data is in Prometheus, adding it to Grafana is straightforward. Navigate to Grafana and add Prometheus as a Data Source. The detailed instructions on how to do this are available here.
The only setting you’ll need to modify for our example is the URL: http://localhost:9090. After the data source is added, building the dashboard is simple – you can choose different graphs, set thresholds, and re-arrange widgets on the page.
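Each panel in the dashboard is driven by a PromQL query against the Prometheus data source. For instance, a graph of connected clients can use the gauge we exported earlier, filtered by the label added in the Prometheus job configuration:
amps_instance_clients{group="AMPS"}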
In this version of the dashboard, we show results for the metrics discussed above.
Here’s a screenshot of the dashboard:
The dashboard is included in the GitHub repository. Note that when you load it, you will need to replace the UID of the data source in the sample dashboard with the UID of the data source you created in Grafana – Grafana does not adjust the reference.
To Infinity and Beyond!
In this post, we’ve just scratched the surface of how the AMPS Admin API can be integrated with Prometheus and Grafana. Many additional metrics are available and there are a wide variety of ways those metrics can be visualized. Since Prometheus can collect data from a wide variety of sources, you can also combine data on the AMPS instance with data about other parts of the application, giving you full end-to-end monitoring.
For further reading, here are some more articles about AMPS monitoring:
Have a recipe that isn’t listed here? Know a great trick for monitoring AMPS with Prometheus, or have a cool technique we didn’t cover? What dashboard would you build? What other systems would you monitor together with AMPS?
Let us know in the comments!