[ANN] ProtoBuf.jl 1.0.0

ProtoBuf.jl 1.0.0

At RelationalAI, we created a new Julia package for working with the Protocol Buffers format, and I’m happy to announce that this complete rewrite will soon be registered as a (very) breaking version of the current ProtoBuf.jl: 1.0.0. This new release brings the following benefits:

  • The generated Julia structs and codec methods often result in noticeably faster serialization with lower memory overhead.*
  • We dropped the protoc_jll dependency, which currently carries a large (100+MB) binary. Time to first protojl (our variant of protoc) is also much lower.
  • Enumerations are now generated using the EnumX.jl package, meaning they are proper subtypes of Base.Enum.

* For example, see the following benchmark from issue #179 “Deserialization is extremely slow for messages with many small sub-messages.”

Here’s the pre-1.0 ProtoBuf.jl:

julia> BenchmarkTools.@benchmark(read_example_proto()) |> display
BenchmarkTools.Trial: 71 samples with 1 evaluation.
 Range (min … max):  66.590 ms … 76.673 ms  ┊ GC (min … max): 0.00% … 10.80%
 Time  (median):     71.134 ms              ┊ GC (median):    5.37%
 Time  (mean ± σ):   70.467 ms ±  2.704 ms  ┊ GC (mean ± σ):  4.42% ±  3.48%

  █ ▁▆ ▁▁                      ▆   ▁  ▁  ▃  ▃
  █▄██▇██▄▄▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▄▄▄▇▇█▄▄▄█▇▁█▇▄█▇▇█▄▇▇▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  66.6 ms         Histogram: frequency by time        75.8 ms <

 Memory estimate: 33.81 MiB, allocs estimate: 625475.

and here is ProtoBuf.jl 1.0.0:

julia> BenchmarkTools.@benchmark(read_example_proto_new()) |> display
BenchmarkTools.Trial: 741 samples with 1 evaluation.
 Range (min … max):  6.541 ms …   9.074 ms  ┊ GC (min … max): 0.00% … 26.07%
 Time  (median):     6.647 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.748 ms ± 386.710 μs  ┊ GC (mean ± σ):  1.16% ±  4.39%

  ▄▆█▇▅▃▂▁▂
  ██████████▄▅▅▇▇▆▅▁▁▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▄▅▄▁▅▄▄▁▄▄▆▅▄▇ ▇
  6.54 ms      Histogram: log(frequency) by time      8.65 ms <

 Memory estimate: 1.85 MiB, allocs estimate: 2668.

As mentioned earlier, this release is very breaking, and migrating to 1.0.0 will require some effort. Here are some differences from the pre-1.0.0 version that you should take into account:

  • Services and RPCs are not yet implemented. We will focus on these in the near future as part of our effort to build native gRPC libraries for Julia.
  • All generated structs are immutable and don’t share a common abstract type. By default, no convenience constructors are generated for these structs, but you can pass the add_kwarg_constructors=true option to protojl.
  • oneof fields are now translated to OneOf{T} types with fields name::Symbol and val::T containing the name and the value of the chosen member.
  • Nested definitions (e.g. message Parent { message Child {} }) were previously given names like Parent_Child. Now the Child message is translated to a struct called var"Parent.Child".
  • When translating proto files with a package directive, protojl will generate a directory structure that mirrors the levels of said package. In the future, we want to support generation of full-blown Julia packages that are easy to register in (private) registries.
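
To make the struct-related points above a bit more concrete, here is a rough sketch in plain Julia of what such definitions could look like. It does not use ProtoBuf.jl at all: the OneOf struct below is a local stand-in that follows the description above (fields name and val), and the Parent/Child layout, the describe helper, and the keyword constructor are hypothetical illustrations, not actual protojl output.

```julia
# Local stand-in for the OneOf wrapper described above (NOT the package's own type).
struct OneOf{T}
    name::Symbol   # which member of the oneof is set
    val::T         # the value of that member
end

# A nested message `message Parent { message Child {} }` gets a dotted name,
# which Julia can express with the var"..." syntax.
struct var"Parent.Child"
    id::Int64
end

# Immutable struct with a oneof field that can hold a String or an Int64.
struct Parent
    child::var"Parent.Child"
    payload::OneOf{<:Union{String,Int64}}
end

# Hypothetical hand-written keyword constructor, to show what a
# "convenience constructor" means here.
Parent(; child, payload) = Parent(child, payload)

p = Parent(child = var"Parent.Child"(1), payload = OneOf(:text, "hello"))

# Branch on the name of the chosen oneof member.
describe(p::Parent) =
    p.payload.name === :text ? "text: $(p.payload.val)" : "number: $(p.payload.val)"

println(describe(p))
```

Because the struct is immutable, changing a field means constructing a new Parent rather than assigning to p.payload.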

Please see the docs for more information about the package.

Also, please note that in this case the 1.0.0 version is not meant to signal stability of the package. There are no planned breaking changes, but there might be some rough edges here and there, so please report any issues you encounter.

Any particular reason to not call the package ProtocolBuffers.jl?

Huh, I just noticed that the package name is spelled ProtoBuf, not ProtoBuff. If you’re going to abbreviate Protocol Buffer, I would have expected it to be spelled ProtoBuff.

protobuf is how it’s usually shortened: GitHub - protocolbuffers/protobuf: Protocol Buffers - Google's data interchange format

FWIW, during its development, the package was called ProtocolBuffers.jl, but since ProtoBuf.jl was already established, we chose not to fragment the ecosystem with a competing implementation.

How does it compare to the reference C++ implementation?

Would it make sense to merge this into the existing ProtoBuf.jl package as v2.0?

How does it compare to the reference C++ implementation?

Not sure! The Google repo does contain some benchmarking code that I was able to run today, but I’m not sure I can easily make an apples-to-apples comparison with our code; the setup they use seemed quite involved… I’ll look into it again at some point.

Would it make sense to merge this into the existing ProtoBuf.jl package as v2.0?

We bumped the existing ProtoBuf.jl package from 0.11.5 → 1.0.0, sorry if that was not clear!
Edit: Assuming you brought this up for semver/compat reasons, this version change is as breaking as a bump to 2.0 would be.

Do you plan to add a json parser/writer? There is a related git issue but not much information there (or under the linked issues).

Hello, thanks for working on this package. I am a user of this package and I noticed that encoding of unpacked representations is withheld (commented out here → Allow unpacked repeated primitives). Is there any reason it’s not included yet? I can see it needs a test set. Any other reasons?

At some point yes, but it’s not a priority, as one can use protoc_jll as a workaround, see e.g. Parsing ProtoBuf Text Format - #3 by FireCrumb

This would be better to discuss in an issue, but the reason is that I don’t understand the use case – it is strictly more efficient to encode the array as packed. Maybe I’m missing something.

We need it when supporting an existing protocol that uses the unpacked representation for encoding and decoding. If the other end strictly encodes and decodes, it is not important. But if a hash is used to verify the received object before decoding, we run into issues: the hash of the encoded representation will be different, and the message will be rejected in our use case. For now I had to make a fork with these changes and continue using it. It would be ideal to have the changes upstream. I assumed you must have had a reason, so I didn’t bother to raise an issue. I hope you are convinced about the use case. If I were the protocol designer and implementer, I would definitely avoid the unpacked representation, but to adhere to an existing protocol it makes sense to support it.

Raised a pull request here: Allow unpacked repeated primitives by arhik · Pull Request #224 · JuliaIO/ProtoBuf.jl (github.com)
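
For readers following the packed vs. unpacked discussion, here is a minimal sketch of the two wire encodings for a repeated varint field, written in plain Julia without ProtoBuf.jl. The helper names (varint, tag, encode_packed, encode_unpacked) and the field number 4 are made up for illustration; the byte layout follows the Protocol Buffers encoding rules (wire type 0 for varints, wire type 2 for length-delimited payloads).

```julia
# Encode an unsigned integer as a protobuf base-128 varint (least significant group first).
function varint(x::Unsigned)
    out = UInt8[]
    while x >= 0x80
        push!(out, UInt8(x & 0x7f) | 0x80)  # low 7 bits plus continuation bit
        x >>= 7
    end
    push!(out, UInt8(x))
    return out
end

# A field key is (field_number << 3) | wire_type, itself encoded as a varint.
tag(field_number, wire_type) = varint((UInt64(field_number) << 3) | UInt64(wire_type))

# Unpacked: every element is written as its own key/value pair (wire type 0).
function encode_unpacked(field_number, xs)
    out = UInt8[]
    for x in xs
        append!(out, tag(field_number, 0))
        append!(out, varint(UInt64(x)))
    end
    return out
end

# Packed: one key (wire type 2), a byte length, then all elements back to back.
function encode_packed(field_number, xs)
    payload = UInt8[]
    for x in xs
        append!(payload, varint(UInt64(x)))
    end
    return vcat(tag(field_number, 2), varint(UInt64(length(payload))), payload)
end

xs = [1, 2, 300]
unpacked = encode_unpacked(4, xs)
packed = encode_packed(4, xs)

println("unpacked: ", bytes2hex(unpacked))
println("packed:   ", bytes2hex(packed))
```

The two byte sequences carry the same values but differ, so any hash or signature computed over the encoded bytes differs as well. That is why a decoder accepting both forms is not enough when the bytes themselves are verified.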

Just to note that I’ve updated the code in the post above to ProtoBuf 1.0.0, and am using it – in case someone wants to discuss/ask/collaborate.

It does require the protoc binary, protoc_jll, and the .proto files to all be accessible at runtime.

Thanks for the pointer and thanks for the update @FireCrumb.

Maybe I am missing something, but I would like to parse a JSON file into proto, not a protobuf text file. We are using JSON as the human-readable intermediary and several systems already depend on it. I checked the protobuf text format, but it is quite different from JSON.

Do you have a suggestion for reading/parsing a json file (or a Dictionary object) into a ProtoType object directly or via the binary encoding in Julia?

Ah, there is no automatic mapping between JSON and ProtoBuf (that I know of); the closest to that was a recent proposal to integrate tightly with StructTypes.jl. In the end we decided that this would be best implemented in a separate package.

Thanks @drvi! Yeah, I was misled by other protobuf APIs having JSON parsers/writers and the C++ API referring to JSON as “proto3 JSON format”, but they are all custom parsers, I guess.

Useful references I could find

TL;DR: unrelated to Julia, functionality exists in C++ API but not in protoc, some people implemented their cli tools with this functionality


Edit: seems they are using Google::Protobuf::DescriptorPool.generated_pool.lookup to load google.protobuf.json_format
Perhaps it can be done with Julia Interop?
