Managing Large Topics in the SOW

One of the most popular features of AMPS is the State-of-the-World (or SOW), which allows applications to quickly retrieve the most current version of a message. Many applications use the SOW as a high-performance streaming database, quickly retrieving the current results of the query and then automatically receiving updates as changes occur to the data.

For maximum performance, 60East recommends making sure that all topics in the SOW (including views and conflated topics) fit into memory.

Some applications, though, require larger data sets. We’ve seen applications in production with data sets several times as large as physical memory. We’ve successfully tested SOW topics as large as 8TB on a system with 128GB of memory (a topic that is more than 60 times as large as physical memory).

AMPS version 5.2.2 introduces a new algorithm for State-of-the-World memory utilization that can greatly improve the memory efficiency and decrease the memory pressure of large SOW topics on hosts. This new algorithm applies to persistent SOW topics covered by the Transaction Log.

File-backed SOW topics that are not covered by a Transaction Log are memory mapped such that as a SOW topic grows beyond the available physical memory of the host environment, the OS will “page” the SOW topic data from memory out to the storage device and from the storage device into memory on-demand.

SOW topics covered by a Transaction Log are mapped to memory as “private” mappings. Any modified topic memory in a private mapping is treated as “anonymous dirty” data, which counts towards AMPS’ Out-of-memory score (aka “OOM score”) and puts the system at increased risk of swapping.

Prior to AMPS version 5.2.2, Transaction Log covered SOW topics were left in the dirty state once modified, limiting the growth of a SOW topic to the amount of memory available to the AMPS process (typically the amount of physical memory on the host plus configured swap memory).

Starting with version 5.2.2, Transaction Log covered SOW topics are “cleansed” periodically as AMPS flushes the pages to disk manually. This “anonymous clean” data does not count towards AMPS’ OOM score and thus allows AMPS SOW topics to far exceed the memory available on the host.
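One rough way to watch this effect from outside AMPS is to total the Private_Dirty and Private_Clean fields that the Linux kernel reports for each mapping in /proc/<pid>/smaps. The sketch below is an observation aid only. It assumes those fields approximate what this post calls "anonymous dirty" and "anonymous clean" data, and AMPS_PID is a placeholder for the PID of your AMPS server.

```shell
# Sum private dirty and private clean memory (in kB) from smaps-format
# text on stdin. Assumption: these kernel counters are a reasonable proxy
# for the "dirty" and "clean" states described above.
summarize_private_kb() {
    awk '
        /^Private_Dirty:/ { dirty += $2 }
        /^Private_Clean:/ { clean += $2 }
        END { printf "private_dirty_kb=%d private_clean_kb=%d\n", dirty, clean }
    '
}

# Usage (AMPS_PID is a placeholder, not something AMPS sets for you):
#   summarize_private_kb < /proc/$AMPS_PID/smaps
#   cat /proc/$AMPS_PID/oom_score
```

Reading another process's smaps file requires running as the same user or as root.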

Considerations for Large SOW Topics

SOW topics that far exceed host memory work well for many use cases, but there are several tradeoffs to consider:

Decreased Publish Performance: When publishing to a topic that isn’t currently resident in memory, there could be significant VM “page-outs” or “page-ins” that impede publishing performance. Capacity planning up to the host memory limit could show acceptable performance that quickly degrades once the SOW topic exceeds the size of the physical memory of the host environment.

Decreased Query Performance: When querying data where the query result contains records that are not resident, the OS pages those records into memory before the query result can be constructed and sent to the querying client. Additionally, AMPS auto-indexes fields not previously queried, so executing a query with a new field XPath in a content filter can force a re-indexing of a SOW topic, creating extensive “page-in” activity while the SOW data is brought into resident memory to be parsed and indexed.

Increased Recovery Times: When AMPS starts, it checks the CRC of every record within a SOW topic to guard against corruption. To do the CRC check, AMPS needs to bring the SOW topic into memory. If you have a 1TB SOW topic stored on a fast storage device that can read 1GB/s, it could take 17 minutes (or more) for AMPS to recover the SOW topic.
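The recovery estimate above is simple division, and a small helper makes it easy to repeat for your own topic sizes. This is a back-of-the-envelope sketch that assumes recovery is bound purely by sequential read throughput. The function name is hypothetical, and real recovery includes CRC and indexing work on top of the read time.

```shell
# Estimate the minimum time to read a SOW topic once at a sustained rate.
# $1 = topic size in GB, $2 = sustained read throughput in GB/s.
# Whole numbers only; shell integer division truncates the result.
estimate_recovery() {
    seconds=$(( $1 / $2 ))
    echo "${seconds} s (about $(( seconds / 60 )) min)"
}

estimate_recovery 1024 1   # 1TB at 1GB/s: prints "1024 s (about 17 min)"
```

Treat the result as a lower bound when planning restart windows.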

Tuning Tips

If you’re contemplating running large SOW topics that exceed the memory of your host environment, here are some tips on how to tune your host environment and configurations to get the optimal performance out of your large SOW use.

Hardware

Use Fastest Available Storage: We recommend you place the SOW topic on the fastest storage available in your environment to minimize the paging costs and recovery times. Expect increased VM paging activity in these configurations: the faster the storage device where these large topics are persisted, the better the performance will be. See Is Your App Solid? (Does it need SSD?) for a discussion on storage options for AMPS applications.

Operating System

Apply Recommended OS Tunings: There are OS tuning parameters that can benefit large SOW use cases, which we list here. Engage your system administration teams to set these when appropriate.

Turn Off Transparent Huge Pages: Transparent Huge Pages can carry a high cost for memory management. We recommend disabling the feature or setting it to “madvise”.

$ echo never > /sys/kernel/mm/transparent_hugepage/enabled

Reduce Swappiness: This setting controls the preference of dropping non-dirty pages over swapping. With large SOW files, we always want to prefer dropping rarely touched, non-dirty pages over the system swapping. Therefore, we set this value to the lowest setting without disabling it entirely.

$ sudo sysctl -w vm.swappiness=1

Increase Minimum Free Bytes: Interrupt handlers and other critical OS components require that memory always be available. In large enterprise host environments, the default minimum free memory reserved for these critical operations is often not large enough. We recommend setting this value to 1% of the available memory, rounded up to the nearest 1GB.

$ sudo sysctl -w vm.min_free_kbytes=2097152
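The value for a particular host can be computed rather than guessed. The sketch below applies the 1% rule with round-up to a whole GB. It assumes "available memory" means the host's total physical memory (the MemTotal figure in /proc/meminfo), and the function name is hypothetical.

```shell
# Compute 1% of total memory, rounded up to the next whole GB, in kB.
# $1 = total memory in kB (assumption: the MemTotal value from /proc/meminfo).
min_free_kbytes() {
    gb_in_kb=1048576
    one_percent=$(( $1 / 100 ))
    echo $(( (one_percent + gb_in_kb - 1) / gb_in_kb * gb_in_kb ))
}

min_free_kbytes 134217728   # 128GB host: prints 2097152

# To use the running host's memory instead of a literal:
#   min_free_kbytes "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

For a 128GB host, 1% is about 1.28GB, which rounds up to 2GB (2097152 kB), the value used in the command above.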

Disable Zone Memory Reclaim: AMPS is already NUMA aware for its low-latency components and doesn’t benefit from the built-in NUMA optimizations within the Linux OS for large SOWs. We recommend turning off the Zone Memory Reclaim feature.

$ sudo sysctl -w vm.zone_reclaim_mode=0

Increase Maximum Memory Mappings: One of the thresholds in the Linux OS configuration that can cause requests for more memory to fail prematurely is max_map_count. When it is set too low, AMPS can fail to increase the size of a SOW topic. Increasing this value allows SOW topics to grow further.

$ sudo sysctl -w vm.max_map_count=500000
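Before changing anything, it can help to see how a host is currently configured. The read-only sketch below compares the fixed-value sysctl settings from this section against the values suggested here. It needs no root access, and the report_setting helper is a hypothetical name. vm.min_free_kbytes is left out because its target depends on the host's memory.

```shell
# Compare one kernel setting against a recommended value.
# $1 = setting name, $2 = current value, $3 = recommended value
report_setting() {
    if [ "$2" = "$3" ]; then
        echo "$1: $2 (matches)"
    else
        echo "$1: $2 (recommended $3)"
    fi
}

# Read the current values from /proc/sys/vm; print "unknown" where the
# kernel does not expose a setting (for example, on non-NUMA builds).
for pair in swappiness=1 zone_reclaim_mode=0 max_map_count=500000; do
    name=${pair%%=*}
    want=${pair#*=}
    current=$(cat "/proc/sys/vm/$name" 2>/dev/null || echo unknown)
    report_setting "vm.$name" "$current" "$want"
done
```

Values set with sysctl -w do not survive a reboot. To make them permanent, add them to /etc/sysctl.conf or a file under /etc/sysctl.d.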

AMPS Configuration

Use a Large SlabSize: Using a large SlabSize for your SOW topic can minimize the overhead of memory map maintenance and rate of SOW topic file extensions when growing. We recommend using 1GB slab sizes for any of your large topics.

Use a Durability of persistent: For large topics, you want to use topics with persistent durability (the default). Large topics with transient durability increase swap usage and are limited in size to the sum of the physical and swap memory on the host.

Monitoring

When AMPS version 5.2.2 or greater is running on a Linux OS with kernel version 3.14 or greater, we recommend monitoring how much memory the host has available before swapping. This metric is exposed in the AMPS HTTP Admin at the /amps/host/memory/available path.
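A minimal sketch of polling this value from a script follows. Linux kernels from 3.14 onward publish their own MemAvailable estimate in /proc/meminfo, which is useful as a cross-check. Whether the AMPS metric is derived from that field is an assumption here. The admin host and port passed to amps_mem_available are placeholders for wherever your AMPS Admin interface listens, and the response format is not covered in this post.

```shell
# Extract the kernel's MemAvailable estimate (in kB) from meminfo-format
# text on stdin.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }'
}

# Fetch the AMPS metric over HTTP.
# $1 = host:port of the AMPS Admin interface (placeholder; use your own).
amps_mem_available() {
    curl -s "http://$1/amps/host/memory/available"
}

# Usage:
#   mem_available_kb < /proc/meminfo
#   amps_mem_available localhost:8085    # hypothetical address
```

A monitoring job can call either function on an interval and alert when the value falls below a threshold chosen during capacity planning.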

Testing

When testing the runtime and recovery performance of large SOWs, it’s easy to be fooled by effects of the page cache that make test results inconsistent between consecutive runs. When testing for recovery performance, we advise that you purge the page caches between AMPS lifetimes to achieve worst-case testing with your data, AMPS version, and host environment. Dropping the Linux page caches requires root privileges and can be done as follows:

$ echo 1 > /proc/sys/vm/drop_caches
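The kernel only drops clean pages, so running sync first makes the purge more complete. The sketch below wraps the two steps in one function. The function name is hypothetical, and the optional argument exists only so the function can be tried against a scratch file without root.

```shell
# Flush dirty pages to disk, then ask the kernel to drop the page cache.
# $1 = control file to write to; defaults to the real one, which only
#      root can write. Pass a scratch file to try the function safely.
drop_page_cache() {
    target=${1:-/proc/sys/vm/drop_caches}
    sync                 # dirty pages cannot be dropped until written out
    echo 1 > "$target"
}

# Usage, as root, after stopping AMPS and before starting it again:
#   drop_page_cache
```

Run it between AMPS lifetimes so that each recovery test starts with a cold page cache.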

Go Big

The AMPS State-of-the-World is suitable for large caching and computation modeling applications, for example market data caches or XVA modeling. In this post, you’ve seen the best practices and tuning recommendations 60East has developed over the course of working with many large data applications.

For any system, we highly recommend capacity planning to ensure that the resource needs of the system are well understood, and we recommend spending time carefully testing the throughput and capacity of the system. As always, if you have questions about how best to design your application or configure AMPS, let us know at support@crankuptheamps.com.