Lessons learned from Protocol Buffers, part 1: messages, builders and immutability

My port of the Protocol Buffers project has proved pretty interesting. I thought I’d share some of the lessons I’ve learned along the way, as well as some of the frustrations at concepts I still can’t express in C#.

This was originally all going to be in one post, but I’m becoming acutely aware of how long some posts can grow. I don’t know about you, but I find very long blog posts quite intimidating, so I’ve decided to split this one into a series, one post per topic. You’ll probably still need to read the posts in sequence to understand them, though – and this introductory post is the most important one in that respect.

Messages and Builders

The Protocol Buffers project (or PB for short) is basically another serialization technology, putting emphasis on efficiency, platform neutrality, and backward/forward compatibility. The normal set of steps in using PB is something like this:

1. Write a .proto file describing your data in terms of messages.
2. Run protoc to generate C# (and Java/C++ if you so wish).
3. In your application, use the builder associated with the message type to create an instance of a message.
4. Serialize the data to a stream.
5. At some other point in the application (or in a different app), deserialize the data.

The idea is that builders are mutable, while the messages they build are immutable. You can use builders either via Set* methods, each of which returns the same builder so that calls can be chained, or via properties, which can be used within C# 3 object initializers. For example:

// Syntax available in C# 2
Person john = new Person.Builder()
    .SetFirstName("John")
    .SetLastName("Doe")
    .Build();

// Using an object initializer
Person jane = new Person.Builder
    { FirstName="Jane", LastName="Doe" }
    .Build();

Of course, you don’t have to do all the building in one expression; it’s just a handy option in many cases.
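To make the two calling styles concrete, here’s a minimal hand-written sketch of a message with a nested builder. It’s hypothetical: the names Person, Builder, SetFirstName and so on just mirror the example above, real generated code is considerably larger, and for simplicity this Build() copies the builder’s state into a new message rather than avoiding the copy.

```csharp
using System;

// Hypothetical hand-written message; generated code would differ in detail.
public sealed class Person
{
    private readonly string firstName;
    private readonly string lastName;

    // Private constructor: only the nested Builder can create messages.
    private Person(string firstName, string lastName)
    {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public string FirstName { get { return firstName; } }
    public string LastName { get { return lastName; } }

    public sealed class Builder
    {
        // Settable properties are what allow object initializers (C# 3).
        public string FirstName { get; set; }
        public string LastName { get; set; }

        // Set* methods return this builder, so calls can be chained (C# 2 style).
        public Builder SetFirstName(string value)
        {
            FirstName = value;
            return this;
        }

        public Builder SetLastName(string value)
        {
            LastName = value;
            return this;
        }

        // Simplification: copies the builder's state into a new message.
        public Person Build()
        {
            return new Person(FirstName, LastName);
        }
    }
}

// Hypothetical harness exercising both styles.
public static class BuilderDemo
{
    public static Person[] Run()
    {
        Person john = new Person.Builder()
            .SetFirstName("John")
            .SetLastName("Doe")
            .Build();

        Person jane = new Person.Builder { FirstName = "Jane", LastName = "Doe" }
            .Build();

        // Building across several statements works just as well.
        Person.Builder builder = new Person.Builder();
        builder.SetFirstName("Jim");
        builder.LastName = "Doe";
        Person jim = builder.Build();

        return new[] { john, jane, jim };
    }
}
```

The Set* methods give the chained C# 2 style, while the settable properties are what make the object initializer legal; the message itself exposes only getters.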

As you can see, the builder is generated as a nested type of the message. That’s handy, as it means the builder has access to the private members of the message. To avoid lots of data copying we employ popsicle immutability: the builder directly manipulates the message until it’s built, at which point it makes sure that nothing will change it afterwards. If that makes you uncomfortable in terms of it not being “true” immutability, I sympathise – but I’d also offer String as a counterexample: the StringBuilder in .NET 2.0 and 3.5 works in exactly this way, modifying a string directly until it exposes it to the outside world.
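Here’s a minimal sketch of the popsicle approach, assuming a single-use builder: the builder mutates the message’s private fields directly (legal because it’s a nested type), then Build() hands out that same instance and drops its own reference. The class and member names are illustrative only, not the generated code from the port.

```csharp
using System;

// Hypothetical sketch of popsicle immutability.
public sealed class Person
{
    // Not readonly: the nested Builder writes these before the message escapes.
    private string firstName = "";
    private string lastName = "";

    private Person() { }

    public string FirstName { get { return firstName; } }
    public string LastName { get { return lastName; } }

    public sealed class Builder
    {
        // The message under construction; null once Build() has handed it out.
        private Person result = new Person();

        private Person Result
        {
            get
            {
                if (result == null)
                {
                    throw new InvalidOperationException("Build() has already been called");
                }
                return result;
            }
        }

        public Builder SetFirstName(string value)
        {
            // Nested type, so it may touch the message's private fields.
            Result.firstName = value;
            return this;
        }

        public Builder SetLastName(string value)
        {
            Result.lastName = value;
            return this;
        }

        public Person Build()
        {
            // No copying: return the instance we've been mutating, then forget
            // it so nothing can change it afterwards.
            Person built = Result;
            result = null;
            return built;
        }
    }
}
```

The cost of this design is that each builder produces exactly one message; building another means creating another builder.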

Other than the copying – and the fact that all the code exists explicitly, and the caller has to know about the builder – this is quite similar to the suggestion I made about C# immutability a while ago. One thing that makes it all simpler is that every data type in Protocol Buffers is itself immutable (strings, byte strings and nested messages included) – so we don’t need to worry about deep copies and the like.
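Because every field value is immutable, deriving a modified copy of a message only ever needs a shallow copy. The sketch below assumes a hypothetical ToBuilder() method that uses MemberwiseClone() to start a new builder from an existing message; the Address and Person types are illustrative stand-ins, and the old and new messages end up sharing field values, which is safe precisely because nothing can mutate them.

```csharp
using System;

// Hypothetical immutable nested message type.
public sealed class Address
{
    private readonly string city;

    public Address(string city) { this.city = city; }

    public string City { get { return city; } }
}

// Hypothetical message whose fields all hold immutable values.
public sealed class Person
{
    private string name = "";
    private Address homeAddress;

    private Person() { }

    public string Name { get { return name; } }
    public Address HomeAddress { get { return homeAddress; } }

    public static Builder CreateBuilder()
    {
        return new Builder(new Person());
    }

    // Starts a builder from this message. MemberwiseClone is a shallow copy:
    // the clone shares the same string and Address references, which is safe
    // only because those values can never change.
    public Builder ToBuilder()
    {
        return new Builder((Person) MemberwiseClone());
    }

    public sealed class Builder
    {
        private Person result;

        internal Builder(Person start) { result = start; }

        public Builder SetName(string value) { result.name = value; return this; }
        public Builder SetHomeAddress(Address value) { result.homeAddress = value; return this; }

        // Popsicle-style: hand out the instance and forget it, so later
        // calls on this builder fail instead of mutating the message.
        public Person Build()
        {
            Person built = result;
            result = null;
            return built;
        }
    }
}
```

If Address were mutable, that shared reference would be a bug waiting to happen, and ToBuilder() would need a deep copy instead.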

Unfortunately the current implementation doesn’t support collection initializers: if your message has a repeated field, you have to call Add* to populate it. The Add* methods return the builder just like the Set* methods, so you can still do it all in one expression, but it’s not terribly neat. Using a collection initializer compiles, but fails at execution time because the properties for repeated fields always return immutable lists. This is by design, to stop callers from creating a builder, fetching the list property, calling Build and then adding to the list, which would modify the supposedly immutable message. A better solution (and one which I plan to implement soon) is to have a PopsicleList<T> which is initially mutable but becomes immutable at the appropriate time (i.e. when Build() is called). At that point we’ll be able to write:

Person jane = new Person.Builder
    { FirstName="Jane", LastName="Doe",
      Friends = { "Tom", "Dick", "Harry" } }
    .Build();
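Here’s a rough sketch of what such a PopsicleList<T> might look like, assuming it only needs to implement ICollection<T>: that’s enough for a collection initializer, which requires IEnumerable plus an Add method. MakeReadOnly and ValidateMutable are names invented for this sketch, as is the minimal Person message around it. Mutating methods throw NotSupportedException once the list is frozen, which is what ICollection<T> documents for a read-only collection.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical PopsicleList<T>: mutable until MakeReadOnly() is called.
public sealed class PopsicleList<T> : ICollection<T>
{
    private readonly List<T> items = new List<T>();
    private bool readOnly;

    public void MakeReadOnly() { readOnly = true; }

    public bool IsReadOnly { get { return readOnly; } }
    public int Count { get { return items.Count; } }

    public void Add(T item) { ValidateMutable(); items.Add(item); }
    public void Clear() { ValidateMutable(); items.Clear(); }
    public bool Remove(T item) { ValidateMutable(); return items.Remove(item); }
    public bool Contains(T item) { return items.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { items.CopyTo(array, arrayIndex); }

    public IEnumerator<T> GetEnumerator() { return items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private void ValidateMutable()
    {
        if (readOnly)
        {
            throw new NotSupportedException("List has been made read-only");
        }
    }
}

// Hypothetical message with one repeated field backed by a PopsicleList<string>.
public sealed class Person
{
    private string firstName = "";
    private string lastName = "";
    private readonly PopsicleList<string> friends = new PopsicleList<string>();

    private Person() { }

    public string FirstName { get { return firstName; } }
    public string LastName { get { return lastName; } }
    public ICollection<string> Friends { get { return friends; } }

    public sealed class Builder
    {
        private Person result = new Person();

        private Person Result
        {
            get
            {
                if (result == null) throw new InvalidOperationException("Already built");
                return result;
            }
        }

        public string FirstName
        {
            get { return Result.firstName; }
            set { Result.firstName = value; }
        }

        public string LastName
        {
            get { return Result.lastName; }
            set { Result.lastName = value; }
        }

        // Get-only: a collection initializer calls Add on the returned list.
        public ICollection<string> Friends { get { return Result.friends; } }

        public Person Build()
        {
            Person built = Result;
            built.friends.MakeReadOnly(); // freeze the repeated field
            result = null;
            return built;
        }
    }
}
```

A list fetched from the builder before Build() is the very object that ends up in the message, which is why it has to freeze in place rather than be swapped for a read-only copy.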

There’s quite a lot more to messages and builders than this – things like the reflection-like API to query properties of the message based on fields in the message descriptor – but what I’ve described so far ought to be enough for the rest of this series, most of which relates to generics. In the next part, I’ll talk about self-referential generic types.

10 thoughts on “Lessons learned from Protocol Buffers, part 1: messages, builders and immutability”

Hi man!

I’m new to PB (I only found out about it yesterday) and I’ve finally found a GREAT project porting this protocol to C#. But I don’t know how to run protoc to generate a .cs file.

What parameters does it need? Could you please give me some advice :)

Thanks a lot :X

P.S. You can send your answer to my email: cnttvn.com@gmail.com. That would be even better.


(I’ll copy this by email.)

The project’s home page is now http://code.google.com/p/protobuf-csharp-port/

We no longer generate the code directly from protoc – instead, protoc compiles the .proto file into a binary version, and ProtoGen.exe generates the code.

See http://code.google.com/p/protobuf-csharp-port/wiki/GettingStarted for more information and a tutorial. (I must check it still actually works…)


Darn it. URL got mangled. Try this:

http://code.google.com/p/protobuf-csharp-port/wiki/GettingStarted


Thank you so much!

This morning I received your email and tried building my own class. It works perfectly :X

I have another question; could you please answer it for me? :X (I think you’ll know the answer.)

I have a very large list of objects of my class (about 2 million or more). My problem is this: once I’ve loaded the list into memory (using PB, of course), I want to add new objects, but a new object must not already appear in the list. That is, each object in my list must be unique!

Do you have any solution for this case?

Thanks again :)


(Also cc’d)

Once you’ve loaded the list into memory, you should use some sort of set type (e.g. HashSet<T> in .NET 3.5). You’ll want to specify some way of determining whether one object is “the same as” another, unless you want to use the default implementation of Equals/GetHashCode in Protocol Buffers, which compares every field.

At this point it’s not really a Protocol Buffers problem so much as one for the normal .NET collection classes.
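As a minimal sketch of that advice, assuming each object has a unique integer Id field (Record and RecordIdComparer are hypothetical stand-ins for the generated message and your own notion of identity):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a deserialized message with an identifying field.
public sealed class Record
{
    public Record(int id, string name) { Id = id; Name = name; }

    public int Id { get; private set; }
    public string Name { get; private set; }
}

// Treats two records as "the same" when their Ids match, ignoring other fields.
public sealed class RecordIdComparer : IEqualityComparer<Record>
{
    public bool Equals(Record x, Record y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Record obj)
    {
        return obj.Id.GetHashCode();
    }
}

public static class UniqueRecords
{
    // Build the set once after loading; later Add calls are O(1) on average.
    public static HashSet<Record> Load(IEnumerable<Record> loaded)
    {
        return new HashSet<Record>(loaded, new RecordIdComparer());
    }
}
```

HashSet<T>.Add returns false when an equivalent element is already present, so the uniqueness check and the insertion are a single call. Building the set from the loaded list also silently drops any duplicates that were already there.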


Hi,

If the repeated field is a collection of IMessage (basically, a collection of another protobuf message type), SetValue throws an exception saying it isn’t implemented. Any suggestions?

fields[FieldNumber].Accessor.SetValue(message, new List<IMessage>());


jonskeet says:

December 23, 2024 at 10:00 pm

You’re not intended to set a repeated field to a new collection. Mind you, it’s not clear which implementation you’re talking about, given that you’re replying to a post from 16 years ago, and that version hasn’t been supported for ages.

jonskeet says:

December 23, 2024 at 10:03 pm

(Please file an issue on GitHub, if you still want to – but reflection on repeated fields is basically clear/add)

Hi Jonskeet,

Sorry, I wasn’t clear in my previous question.

My question was generic: in C#, if a field is a collection of a particular type, how do we add items to it? For example, I have a field descriptor for a RepeatedField<specific type>. Since I’m doing this via reflection (I don’t know the specific type statically), I need to use the IMessage type. The field is readonly and not null in the object, but GetValue is returning null. Is my typecast wrong here? What cast should I use to get a valid value?

var repeatedField = fieldDescriptor.Accessor.GetValue(message) as RepeatedField<Google.Protobuf.IMessage>;

var subMessage = (Google.Protobuf.IMessage)Activator.CreateInstance(fieldDescriptor.MessageType.ClrType);

repeatedField.Add(subMessage);


jonskeet says:

December 24, 2024 at 1:03 am

Your question is not generic – it’s specific to the Protobuf reflection API. That’s not the same as the C# reflection API. For protobuf questions, ask on GitHub. For general reflection questions, ask on Stack Overflow. A 16-year-old blog post is not a good place to ask.
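The null in the code above can be reproduced without protobuf at all: generic classes are invariant, so a list of a specific message type is not a list of IMessage, and an as cast to the latter yields null even though the value is not null. A minimal sketch, using List<T> and a hypothetical IMessageLike interface and Friend class as stand-ins, and assuming (not showing) that the real repeated-field type also implements the non-generic IList; check the Google.Protobuf documentation before relying on that:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-ins for IMessage and a generated message type.
public interface IMessageLike { }
public sealed class Friend : IMessageLike { }

public static class RepeatedFieldCasts
{
    // Generic classes are invariant: a List<Friend> is not a List<IMessageLike>,
    // so this returns null even though fieldValue itself is not null.
    public static List<IMessageLike> AsListOfInterface(object fieldValue)
    {
        return fieldValue as List<IMessageLike>;
    }

    // The non-generic IList works whatever the element type; Add checks the
    // element's type at execution time instead (ArgumentException if wrong).
    public static void ClearAndAdd(object fieldValue, object element)
    {
        IList list = (IList) fieldValue;
        list.Clear();
        list.Add(element);
    }
}
```

This is the clear/add pattern in its most general form: the element type is checked when Add runs rather than at compile time.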


Jon Skeet's coding blog
