Protobuf: Battle of the Syntaxes

Google Protocol Buffers, or protobuf for short, is a method for serializing a message using a strict schema. AMPS has supported the Proto2 syntax of protobuf since AMPS 5.0, but until now has not supported the Proto3 syntax, for reasons we will discuss shortly. With the release of support for the Proto3 syntax in AMPS 5.2.1.0, I’d like to highlight the differences between the two syntaxes and the impact they have on AMPS. Let’s start with a brief overview of protobuf.

What is a Protobuf?

As stated earlier, Protobuf messages must follow a specific schema. Message schemas are defined in a .proto file. The protoc compiler is then used to generate a language-specific implementation of your message schema. Here is an example of a .proto file written using the Proto3 syntax.

syntax = "proto3";
package MyNamespace;

message Message {
  string Foo = 1;
  string Bar = 2;
  int32 ID   = 3;
}

As you can see, this schema defines a message with three fields: two strings and an integer. Fields may also consist of sub-messages that are defined in the same .proto file or in any imported .proto files. If this were a schema for a Proto2 message, each field would need to be tagged with either optional or required, but more on that later. In the Proto3 syntax, everything is optional. If you do not specify a syntax, Proto2 will be used and a warning similar to the following will be displayed in the log file.

No syntax specified for the proto file: example.proto.
Please use 'syntax = "proto2";' or 'syntax = "proto3";'
to specify a syntax version. (Defaulted to proto2 syntax.)
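
As a rough sketch of the compile step described above, the snippet below builds a protoc command line that generates Python code from a schema. The helper name protoc_python_command and the file name person.proto are illustrative assumptions; --proto_path and --python_out are standard protoc options.

```python
def protoc_python_command(proto_file, proto_path=".", out_dir="."):
    """Build the argument list for compiling one .proto file to Python.

    The function name and defaults are hypothetical; only the protoc
    options themselves come from the protobuf toolchain.
    """
    return [
        "protoc",
        f"--proto_path={proto_path}",  # directory searched for .proto files and imports
        f"--python_out={out_dir}",     # directory that receives the generated module
        proto_file,
    ]


# Assumed file name, matching the module imported later in this post.
command = protoc_python_command("person.proto")
print(command)
```

On a machine where protoc is installed, the list can be handed to subprocess.run(command, check=True). For an input named person.proto, the generated module is named person_pb2.py.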

Now, let’s go over some of the highlights of the Proto2 and Proto3 syntaxes.

Proto2 vs. Proto3

The Proto2 and Proto3 syntaxes are quite similar, but have some distinct differences. To begin this comparison, I’m going to highlight some of the features of Proto2 that are not present in Proto3. These features include, but are not limited to:

Required message fields. This can be useful for things like SOW keys.
Ability to set custom default values for a field.
Ability to determine if a missing field was not included, or was assigned the default value. This is needed for AMPS to use delta messaging.
Support for nested groups.

The Proto3 syntax adds several features that were not present in the Proto2 syntax and removes some existing ones. These changes include:

Removal of required fields.
Removal of custom default values. Primitive fields set to their default value are not serialized.
Addition of a JSON encoding as an alternative to the binary protobuf encoding.
Extensions have been replaced by the Any type.
Addition of well-known type protos. These include any.proto, timestamp.proto, duration.proto, etc.
Strict UTF-8 checking is enforced.

At the time of writing, several Proto3 features have been back-ported to Proto2. These features are the following:

Addition of Maps.
Addition of a small set of standard types for time, dynamic data, etc.

A complete list of the changes between Proto2 and Proto3 can be found in the Proto3 Release Notes.

Limitations of Protobuf

Deciding which version of protobuf to use really comes down to your application’s needs. Both versions of Protobuf have limitations in AMPS, but Proto3 has some more serious limitations to consider. Before addressing those, let’s take a look at the restrictions that apply to both Proto2 and Proto3.

A protobuf message can be an UnderlyingTopic for a View, but it cannot be the outbound message type.
Subscriptions to AMPS internal topics, such as the /AMPS/ClientStatus topic, cannot use a Protobuf message type.
The Protobuf module will not be loaded into AMPS if your host does not have GLIBC version 2.5 or newer.

In short, these restrictions exist because of the strict, fixed definition of a message.

In addition to the restrictions above, Proto3 message types cannot use delta publish or delta subscribe. This is because Proto3 has fixed default values and does not serialize any field whose value matches the default. This leaves AMPS unable to determine whether an omitted field was intentionally left out or was set to its pre-defined default value. For this reason, we made the decision not to support delta messages with Proto3. If you send AMPS a delta publish or delta subscribe command, the command is converted to its non-delta form. In the case of subscriptions, we will log a message similar to the following:

warning: 02-0027 client[client_name] delta_subscribe: message 
type protobuf-message-type does not support delta; request 
converted to a subscribe.

This is important to keep in mind should you decide to convert from Proto2 to Proto3. With the exception of these limitations, Protobuf is a fully supported message type inside of AMPS.
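
To make the delta limitation concrete, here is a minimal, hand-rolled sketch of the wire encoding for the example schema (Foo = 1, Bar = 2, ID = 3). It is not the protobuf library and not AMPS code; the function names and the proto3 flag are assumptions made for illustration. With proto3=True, fields equal to their default are skipped, as described above; with proto3=False, any field that was explicitly set is written, which is how a Proto2 reader can tell the two cases apart.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer as a protobuf base-128 varint."""
    value &= (1 << 64) - 1  # negative values are sent as 64-bit two's complement
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_message(foo=None, bar=None, id_=None, *, proto3=True) -> bytes:
    """Encode the example Message by hand. None means the field was never set.

    With proto3=True, fields holding their default ("" or 0) are dropped as
    well, so "set to the default" and "never set" produce the same bytes.
    """
    out = bytearray()
    for number, text in ((1, foo), (2, bar)):
        if text is None or (proto3 and text == ""):
            continue
        data = text.encode("utf-8")
        out += encode_varint((number << 3) | 2)  # wire type 2: length-delimited
        out += encode_varint(len(data))
        out += data
    if id_ is not None and not (proto3 and id_ == 0):
        out += encode_varint((3 << 3) | 0)  # wire type 0: varint
        out += encode_varint(id_)
    return bytes(out)


# Proto3 style: ID explicitly set to 0 is indistinguishable from ID never set.
print(encode_message(foo="60East", id_=0) == encode_message(foo="60East"))

# Proto2 style: the explicit 0 is written, so the two messages differ.
print(encode_message(foo="60East", id_=0, proto3=False)
      == encode_message(foo="60East", proto3=False))
```

The first comparison prints True and the second prints False. A delta update that sets ID to 0 therefore looks, on the wire, exactly like an update that does not mention ID at all when Proto3 is used.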

Choosing a Syntax

It is hard to make a general recommendation about which version of protobuf to use, but there are some guidelines. For those of you already using Proto2 successfully, it is recommended that you stay with the Proto2 syntax, since there is some API incompatibility between Proto2 and Proto3. Beyond that, here are a few questions that you should ask yourself before choosing a syntax:

Do you intend on using delta messaging?
Are there any fields that you require in every message, such that a message should fail if one of them is missing?
Do you need to assign custom default values to fields?
If a default value is received, custom or otherwise, would you need to know whether the field was omitted or explicitly set to the default?

If you require any of these features, then you’ll need to use Proto2. If your application’s needs do not include these features, then the choice is yours.
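
The checklist above amounts to a simple rule, sketched here in Python. The class and function names are hypothetical and exist only to restate the four questions; they are not part of AMPS or protobuf.

```python
from dataclasses import dataclass


@dataclass
class ApplicationNeeds:
    """One flag per question in the checklist (names are illustrative)."""
    delta_messaging: bool = False
    required_fields: bool = False
    custom_defaults: bool = False
    field_presence: bool = False  # need to tell "omitted" from "set to default"


def recommend_syntax(needs: ApplicationNeeds) -> str:
    """Return "proto2" if any Proto2-only feature is needed, else either."""
    if any((needs.delta_messaging, needs.required_fields,
            needs.custom_defaults, needs.field_presence)):
        return "proto2"
    return "proto2 or proto3"


print(recommend_syntax(ApplicationNeeds(delta_messaging=True)))
print(recommend_syntax(ApplicationNeeds()))
```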

Configuring AMPS

In order to configure AMPS to use protobuf, you will need to import a .proto file containing the schema for the messages that will be sent over a transport. This means that you will need to define a MessageType for each .proto file that you intend to use. If your .proto file uses the protobuf method of including another .proto file, then you will only need to specify the main .proto file. That being said, all .proto files must be located at one of the ProtoPath locations. Here we only have one ProtoPath.

<MessageType>
    <Name>my-protobuf-messages</Name>
    <Module>protobuf</Module>
    <ProtoPath>proto-archive;/mnt/shared/protofiles</ProtoPath>
    <ProtoFile>proto-archive/person.proto</ProtoFile>
    <Type>MyNamespace.Message</Type>
</MessageType>

This example, taken from the AMPS User Guide, shows a standard way to define your protobuf message type. AMPS will import the .proto file and identify the syntax of the file automatically. There is no change to the configuration between the Proto2 and Proto3 syntaxes.

Publishing a Message

In order to publish a protobuf message into AMPS, you must first create and serialize the message. This is done using the protobuf-generated files. Once you have serialized your data, publish it to AMPS like you would any other message type. Here is an example using Python:

# Import the generated file
import person_pb2

# First we need to create an object of the protobuf message
protobuf_record = person_pb2.Message()

# Now we need to populate it
protobuf_record.Foo = "60East"
protobuf_record.Bar = "AMPS"
protobuf_record.ID  = 60

# Serialize the message
serialized_message = protobuf_record.SerializeToString()

# Let's publish it. This assumes `client` is an AMPS client
# that has already connected and logged on.
client.publish("my-protobuf-messages-topic", serialized_message)

Closing Thoughts

Though AMPS supports both Proto2 and Proto3, Proto2 offers more features in AMPS because of the additional restrictions on Proto3 message types, such as the lack of delta messaging. Both versions of protobuf have their advantages and disadvantages, and the best choice really does come down to the needs of your application. If you are already successful with Proto2 and do not require a feature added in Proto3, then you should stick with Proto2. If you’re just starting out with Protobuf and the restrictions don’t affect you, then you should consider Proto3. Much of the information provided here, and more, is available in the Protobuf section of the User Guide.