Preprocessing and Enrichment

Apr 6, 2017 | David Noor

enrichment preprocessing transformation


In AMPS 5.2, we’ve introduced a new set of capabilities for modifying messages as they are published to AMPS: Message Preprocessing and Message Enrichment. Both features are configured in your AMPS configuration file, on the individual SOW Topics where you would like to use them. These new capabilities can streamline applications that need complex message flows, producing higher performance and easier administration.

Here is a brief example of configuring Preprocessing and Enrichment on a SOW topic named Orders, to both add a new field and validate existing fields:

<SOW>
  <Topic>
    <Name>Orders</Name>
    <MessageType>json</MessageType>
    <Preprocessing>
      <Field>CONCAT(/customer_id,"-",/order_id) as /order_key</Field>
    </Preprocessing>
    <Key>/order_key</Key>
    <Enrichment>
      <Field>IF(/qty &lt; 0, /qty OF PREVIOUS, /qty) as /qty</Field>
      <Field>IF(/price &lt; 0, /price OF PREVIOUS, /price) as /price</Field>
    </Enrichment>
  </Topic>
</SOW>

With this configuration, a message published to AMPS that looks like this:

{"customer_id":"A-111", "order_id":1000, "qty":-1, "price":100}

will be transformed into the following before it is stored in the AMPS SOW, written to the transaction log, or delivered to subscribers:

{"customer_id":"A-111", "order_id":1000, "order_key":"A-111-1000",
  "qty":null, "price":100}

Note the new Enrichment and Preprocessing elements in this SOW topic definition. If you have used AMPS Views before, the syntax of each of these features may seem familiar: you define one or more Field elements based on AMPS expressions using a SQL-like syntax to define the content of each field.

Unlike View projections, enrichments and preprocessing are evaluated as you publish a message into the SOW. The message received by AMPS is amended or changed based on the preprocessing and enrichment rules defined in your configuration file. The altered message is the one stored in the SOW and sent to subscribers.

Preprocessing versus Enrichment

Preprocessing and Enrichment are very similar but run at different stages of message processing, allowing you to accomplish unique things with each. Here’s an abstract outline of when these steps occur:

[Figure: outline of AMPS message processing, showing when Preprocessing and Enrichment run]

Preprocessing occurs before we evaluate the message’s SOW key. This means you can use preprocessing to clean or trim the message’s key fields before we use them. In the above example, we compute a brand-new field /order_key which is then used as the SOW key of the message.

Enrichment runs later in message processing, after the SOW key has been located and we can load the existing message in the topic for that key, if any. If the publish was a delta_publish, we merge the message into the existing one as expected, but the previous values for that record are available to enrichment fields via a new OF PREVIOUS syntax.
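To make the ordering concrete, here is a minimal Python sketch that imitates these stages for the Orders topic configured earlier. It is not AMPS code and uses no AMPS API: the `sow` dictionary, the `preprocess`, `enrich`, and `publish` functions, and the treatment of a missing field as "not negative" are all assumptions made for illustration.

```python
import json

# Hypothetical in-memory stand-in for a SOW topic: SOW key -> stored record.
sow = {}

def preprocess(msg):
    """Imitates the Preprocessing field: CONCAT(/customer_id,"-",/order_id) as /order_key."""
    out = dict(msg)
    out["order_key"] = f'{msg["customer_id"]}-{msg["order_id"]}'
    return out

def enrich(msg, previous):
    """Imitates the Enrichment fields: use the previous value when a field is negative."""
    out = dict(msg)
    for field in ("qty", "price"):
        value = out.get(field)
        # Assumption: a missing value is not treated as negative.
        if value is not None and value < 0:
            out[field] = previous.get(field)  # None when there is no previous record
    return out

def publish(msg, delta=False):
    msg = preprocess(msg)           # 1. preprocessing runs first
    key = msg["order_key"]          # 2. the SOW key comes from the preprocessed message
    previous = sow.get(key, {})     # 3. load the existing record for that key, if any
    if delta:
        msg = {**previous, **msg}   #    delta publish: merge into the existing record
    msg = enrich(msg, previous)     # 4. enrichment can still see the previous values
    sow[key] = msg                  # 5. the altered message is what gets stored
    return msg

first = publish({"customer_id": "A-111", "order_id": 1000, "qty": -1, "price": 100})
print(json.dumps(first))
```

Because no record exists yet for this key, the sketch has no previous /qty to fall back on, which corresponds to the null /qty in the transformed message shown earlier.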

OF PREVIOUS allows Enrichment rules to use the previous values in a message as part of enriching a new message. In the above example, we use OF PREVIOUS to ensure /qty and /price are not updated to values < 0, and default them to whatever they were previously. Taking /qty as an example, the rule checks whether the newly supplied /qty value is less than 0. If it is, the IF clause evaluates to the previous value of /qty for that record, and the /qty field in the updated message has the value of /qty from the previous record. Enrichments are a powerful tool for implementing data quality rules that cannot be implemented at the publisher, since the publisher may not have access to the entire state of the record.
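As a rough illustration of what such a rule does when a record already exists, the following Python sketch applies the same "keep the previous value if the new one is negative" logic to a stored record and an incoming update. The function name and the sample values are hypothetical, and this only imitates the behavior described above rather than showing how AMPS evaluates expressions.

```python
def if_negative_use_previous(new_value, previous_value):
    """Imitates IF(/f < 0, /f OF PREVIOUS, /f) for a single field."""
    # Assumption: a missing value is passed through unchanged.
    if new_value is not None and new_value < 0:
        return previous_value
    return new_value

stored = {"qty": 5, "price": 100}     # record already in the topic (sample values)
incoming = {"qty": -3, "price": 105}  # new publish for the same key

result = {
    field: if_negative_use_previous(incoming[field], stored.get(field))
    for field in incoming
}
print(result)  # {'qty': 5, 'price': 105}
```

The invalid /qty is replaced by the stored value, while the valid /price update goes through untouched.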

Uses

Preprocessing and Enrichment are useful for computing new fields and validating or changing existing ones. Here are a few ideas of how you might be able to use them:

  • Use Preprocessing rules to construct unique SOW keys when publishers do not provide suitable values.
  • Enforce business logic/business rules for important fields at the server with Preprocessing and/or Enrichment.
  • Allow or reject different types of updates to a record based on that record’s other fields (for example, a state field on an order might be used to determine whether the price can still be modified).
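The last idea can be sketched in Python as follows. The `state` field, the `LOCKED_STATES` values, and the `enrich_price` function are all hypothetical and chosen only for illustration. In AMPS itself, a rule like this would be written as an Enrichment Field expression in the topic configuration, in the same spirit as the /qty rule shown earlier.

```python
# Hypothetical states after which the price may no longer change.
LOCKED_STATES = {"filled", "cancelled"}

def enrich_price(incoming, previous):
    """Keep the previous price once the stored order is in a locked state."""
    out = dict(incoming)
    if previous.get("state") in LOCKED_STATES:
        out["price"] = previous.get("price")  # ignore the attempted change
    return out

open_order = {"state": "open", "price": 100}
filled_order = {"state": "filled", "price": 100}

accepted = enrich_price({"state": "open", "price": 110}, open_order)
ignored = enrich_price({"state": "filled", "price": 110}, filled_order)
print(accepted["price"], ignored["price"])  # 110 100
```

The publisher does not need to know the order's current state; the decision is made where the full record is available.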

More

Additional options on Enrichment and Preprocessing let you remove fields from a message and control the order in which individual Fields are executed. For more on these options and on everything else you can do with Preprocessing and Enrichment, visit the Enrichment Chapter of our User Guide!

