Protocol Buffers Interview Questions - Hard

Hard-level Protocol Buffers interview questions covering advanced topics, performance optimization, and complex scenarios.

Q1: How does Protocol Buffer encoding work internally?

Answer:

Varint Encoding:

  • Variable-length encoding for integers
  • Smaller numbers use fewer bytes
  • Most significant bit indicates continuation

Example:

300 = 0x012C = 0b100101100

Encoded as: 10101100 00000010
            ^        ^
            The most significant bit of each byte is the continuation
            bit (1 = more bytes follow). The remaining 7 bits of each
            byte carry the value, least-significant group first:
            0101100 + 0000010 -> 0000010 0101100 = 300

Wire Types:

  • VARINT (0): int32, int64, uint32, uint64, sint32, sint64, bool, enum
  • FIXED64 (1): fixed64, sfixed64, double
  • LENGTH_DELIMITED (2): string, bytes, embedded messages, packed repeated
  • START_GROUP (3): groups (deprecated)
  • END_GROUP (4): groups (deprecated)
  • FIXED32 (5): fixed32, sfixed32, float

Field Encoding:

field tag = (field_number << 3) | wire_type

Example Encoding:

message Test {
  int32 a = 1;  // Field 1, VARINT
  string b = 2; // Field 2, LENGTH_DELIMITED
}

For a=150, b="testing":

08 96 01                    // Field 1 (08 = field 1, VARINT), value 150
12 07 74 65 73 74 69 6e 67  // Field 2 (12 = field 2, LENGTH_DELIMITED), length 7, "testing"

ZigZag Encoding (for sint32/sint64):

Signed value -> Unsigned value
 0 -> 0
-1 -> 1
 1 -> 2
-2 -> 3
 2 -> 4

Formula: (n << 1) ^ (n >> 31) for int32
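The formula and its inverse are straightforward to implement and verify; `zigzag32`/`unzigzag32` are our own names for the mapping:

```go
package main

import "fmt"

// zigzag32 maps signed ints to unsigned so that values of small
// magnitude (positive or negative) encode to small varints:
// (n << 1) ^ (n >> 31), where >> is an arithmetic shift.
func zigzag32(n int32) uint32 {
	return uint32(n<<1) ^ uint32(n>>31)
}

// unzigzag32 inverts the mapping.
func unzigzag32(u uint32) int32 {
	return int32(u>>1) ^ -int32(u&1)
}

func main() {
	for _, n := range []int32{0, -1, 1, -2, 2} {
		fmt.Println(n, "->", zigzag32(n))
	}
}
```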

Documentation: Encoding


Q2: How do you implement custom serialization for performance?

Answer:

Problem: Standard serialization may not be optimal for specific use cases.

Custom Marshaler:

type CustomUser struct {
    ID    int64
    Name  string
    Email string
}

func (u *CustomUser) Marshal() ([]byte, error) {
    // Custom binary format
    buf := make([]byte, 0, 64)

    // Encode ID (varint)
    buf = appendVarint(buf, uint64(u.ID))

    // Encode name (length-prefixed)
    nameBytes := []byte(u.Name)
    buf = appendVarint(buf, uint64(len(nameBytes)))
    buf = append(buf, nameBytes...)

    // Encode email
    emailBytes := []byte(u.Email)
    buf = appendVarint(buf, uint64(len(emailBytes)))
    buf = append(buf, emailBytes...)

    return buf, nil
}

func appendVarint(buf []byte, v uint64) []byte {
    for v >= 0x80 {
        buf = append(buf, byte(v)|0x80)
        v >>= 7
    }
    buf = append(buf, byte(v))
    return buf
}

Deserialization (near zero-copy; note that the string conversions below still copy — keeping []byte fields that alias data would avoid even that, at the cost of pinning the input buffer):

func (u *CustomUser) Unmarshal(data []byte) error {
    var offset int

    // Decode ID
    id, n := decodeVarint(data[offset:])
    offset += n
    u.ID = int64(id)

    // Decode name
    nameLen, n := decodeVarint(data[offset:])
    offset += n
    u.Name = string(data[offset : offset+int(nameLen)])
    offset += int(nameLen)

    // Decode email
    emailLen, n := decodeVarint(data[offset:])
    offset += n
    u.Email = string(data[offset : offset+int(emailLen)])

    return nil
}
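The Unmarshal sketch above assumes a `decodeVarint` helper, the inverse of `appendVarint`; a minimal stdlib version might look like this:

```go
package main

import "fmt"

// decodeVarint reads a base-128 varint from data and returns the
// decoded value and the number of bytes consumed (0 if truncated).
func decodeVarint(data []byte) (uint64, int) {
	var v uint64
	var shift uint
	for i, b := range data {
		v |= uint64(b&0x7f) << shift
		if b < 0x80 { // continuation bit clear: last byte
			return v, i + 1
		}
		shift += 7
	}
	return 0, 0 // truncated input
}

func main() {
	v, n := decodeVarint([]byte{0xac, 0x02})
	fmt.Println(v, n) // 300 2
}
```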

Documentation: Custom Types


Q3: How do you handle very large schemas and code generation?

Answer:

Problem: Large schemas generate huge code files and managing compilation becomes complex.

Modern Solution: Use buf (Recommended)

buf is the modern build system for Protocol Buffers - it's like "makefiles but for protobufs". It handles large schemas, dependencies, and code generation automatically.

buf Configuration:

# buf.yaml
version: v1
name: buf.build/acme/api
deps:
  - buf.build/googleapis/googleapis
  - buf.build/acme/common-proto
lint:
  use:
    - DEFAULT
  except:
    - PACKAGE_VERSION_SUFFIX
breaking:
  use:
    - FILE

buf.gen.yaml Template:

version: v1
plugins:
  - plugin: buf.build/protocolbuffers/python
    out: gen/python
  - plugin: buf.build/protocolbuffers/go
    out: gen/go
    opt:
      - paths=source_relative
  - plugin: buf.build/grpc/go
    out: gen/go
    opt:
      - paths=source_relative

buf Workflow:

# Initialize project
buf mod init

# Update dependencies
buf mod update

# Lint all proto files
buf lint

# Check for breaking changes
buf breaking --against '.git#branch=main'

# Generate code for all languages
buf generate

# Format proto files
buf format -w

# Build and validate
buf build

Split into Multiple Files:

// user.proto
syntax = "proto3";
package api;

import "common.proto";
import "user_types.proto";

message User {
  int64 id = 1;
  common.Metadata metadata = 2;
  user_types.UserProfile profile = 3;
}

// user_types.proto
syntax = "proto3";
package api.user_types;

message UserProfile {
  string name = 1;
  string email = 2;
}

Benefits of buf:

  • Automatic Dependency Management: No need to manage --proto_path flags
  • Consistent Generation: Same command generates code for all languages
  • Breaking Change Detection: Catches incompatible changes automatically
  • Linting: Built-in linting ensures code quality
  • CI/CD Integration: Easy to integrate into build pipelines
  • Schema Registry: Can publish/consume schemas from Buf Schema Registry

Legacy Solution: Using protoc with Makefiles:

If you must use protoc directly, use Makefiles for incremental generation:

# Makefile
PROTO_FILES := $(shell find . -name '*.proto')
GO_FILES := $(PROTO_FILES:.proto=.pb.go)

%.pb.go: %.proto
	protoc --go_out=. --go_opt=paths=source_relative $<

generate: $(GO_FILES)

.PHONY: generate

Recommendation: Use buf for all new projects. It's the modern standard and significantly simplifies protobuf workflow.



Q4: How do you implement Protocol Buffer reflection and dynamic messages?

Answer:

Reflection API:

import (
    "fmt"

    "google.golang.org/protobuf/reflect/protoreflect"
    "google.golang.org/protobuf/reflect/protoregistry"
)

// Look up the message type by its full name
mt, err := protoregistry.GlobalTypes.FindMessageByName("user.User")
if err != nil {
    // type not registered
}

// Create a new message instance
msg := mt.New()

// Get the field descriptor
fd := mt.Descriptor().Fields().ByName("name")

// Set a field value (reflection setters live on the message, not the field)
msg.Set(fd, protoreflect.ValueOfString("John Doe"))

// Get a field value
value := msg.Get(fd)
fmt.Println(value.String())  // "John Doe"

Dynamic Message Creation:

func CreateDynamicMessage(typeName string, data map[string]interface{}) (proto.Message, error) {
    // Find message type
    mt, err := protoregistry.GlobalTypes.FindMessageByName(protoreflect.FullName(typeName))
    if err != nil {
        return nil, err
    }

    // Create instance
    msg := mt.New()

    // Set fields dynamically
    for key, value := range data {
        fd := mt.Descriptor().Fields().ByName(protoreflect.Name(key))
        if fd == nil {
            continue
        }

        switch fd.Kind() {
        case protoreflect.StringKind:
            msg.Set(fd, protoreflect.ValueOfString(value.(string)))
        case protoreflect.Int32Kind:
            msg.Set(fd, protoreflect.ValueOfInt32(value.(int32)))
        // ... handle other kinds
        }
    }

    return msg.Interface(), nil
}

JSON to Protobuf (Dynamic):

func JSONToProtobuf(typeName string, jsonData []byte) (proto.Message, error) {
    // Parse JSON
    var jsonMap map[string]interface{}
    if err := json.Unmarshal(jsonData, &jsonMap); err != nil {
        return nil, err
    }

    // Create dynamic message
    return CreateDynamicMessage(typeName, jsonMap)
}

Documentation: Reflection


Q5: How do you optimize Protocol Buffer performance for high-throughput systems?

Answer:

1. Pool Message Instances:

var userPool = sync.Pool{
    New: func() interface{} {
        return &pb.User{}
    },
}

func GetUser() *pb.User {
    return userPool.Get().(*pb.User)
}

func PutUser(u *pb.User) {
    u.Reset()
    userPool.Put(u)
}

2. Pre-allocate Slices:

// ❌ BAD: Growing slice reallocates repeatedly
users := []*pb.User{}
for i := 0; i < 1000; i++ {
    users = append(users, &pb.User{})
}

// ✅ GOOD: Pre-allocate capacity up front
users := make([]*pb.User, 0, 1000)
for i := 0; i < 1000; i++ {
    users = append(users, &pb.User{})
}

3. Reuse Buffers:

var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 0, 1024)
    },
}

// Marshal into a pooled buffer, hand the bytes to a consumer, then
// return the buffer to the pool. The pooled bytes must not outlive
// this call, so they are never returned to the caller.
func MarshalUserTo(user *pb.User, send func([]byte) error) error {
    buf := bufferPool.Get().([]byte)
    data, err := proto.MarshalOptions{}.MarshalAppend(buf[:0], user)
    if err != nil {
        bufferPool.Put(buf[:0])
        return err
    }
    err = send(data)
    bufferPool.Put(data[:0]) // keep the (possibly grown) backing array
    return err
}

4. Use Streaming for Large Datasets:

func StreamUsers(userCh <-chan *pb.User, stream pb.UserService_ListUsersServer) error {
    // Accumulate messages, then flush in batches
    batch := make([]*pb.User, 0, 100)

    flush := func() error {
        for _, u := range batch {
            if err := stream.Send(u); err != nil {
                return err
            }
        }
        batch = batch[:0]
        return nil
    }

    for user := range userCh {
        batch = append(batch, user)
        if len(batch) >= 100 {
            if err := flush(); err != nil {
                return err
            }
        }
    }

    // Send remaining
    return flush()
}

5. Profile and Optimize Hot Paths:

import _ "net/http/pprof"  // exposes /debug/pprof on the default HTTP mux

Then profile serialization from a shell:

go tool pprof http://localhost:6060/debug/pprof/profile

Documentation: Performance Best Practices


Q6: How do you implement Protocol Buffer schema migration at scale?

Answer:

Migration Strategy:

1. Versioned Schemas:

// v1/user.proto
syntax = "proto3";
package user.v1;

message User {
  int64 id = 1;
  string name = 2;
}

// v2/user.proto
syntax = "proto3";
package user.v2;

message User {
  int64 id = 1;
  string name = 2;
  string email = 3;  // New field
}

2. Adapter Pattern:

type UserAdapter struct {
    v1User *v1.User
    v2User *v2.User
}

func (a *UserAdapter) ToV2() *v2.User {
    return &v2.User{
        Id:    a.v1User.Id,
        Name:  a.v1User.Name,
        Email: "",  // Default value
    }
}

func (a *UserAdapter) ToV1() *v1.User {
    return &v1.User{
        Id:   a.v2User.Id,
        Name: a.v2User.Name,
        // Email field ignored
    }
}

3. Gradual Migration:

type UserService struct {
    v1Enabled bool
    v2Enabled bool
}

func (s *UserService) GetUser(req *pb.GetUserRequest) (*pb.User, error) {
    // Check client version
    if s.supportsV2(req) && s.v2Enabled {
        return s.getUserV2(req)
    }
    return s.getUserV1(req)
}

4. Schema Registry:

type SchemaRegistry struct {
    schemas  map[string]*Schema
    versions map[string][]string
}

func (r *SchemaRegistry) GetSchema(name string, version string) (*Schema, error) {
    key := fmt.Sprintf("%s:%s", name, version)
    return r.schemas[key], nil
}

func (r *SchemaRegistry) Migrate(data []byte, fromVersion, toVersion string) ([]byte, error) {
    // Deserialize from old version
    oldSchema, _ := r.GetSchema("User", fromVersion)
    oldMsg := oldSchema.Deserialize(data)

    // Convert to new version
    newSchema, _ := r.GetSchema("User", toVersion)
    newMsg := adapt(oldMsg, newSchema)

    // Serialize to new version
    return newSchema.Serialize(newMsg)
}

Documentation: Schema Evolution


Q7: How do you implement Protocol Buffer compression and encryption?

Answer:

Compression:

import (
    "bytes"
    "compress/gzip"
    "io"
)

func CompressProtobuf(msg proto.Message) ([]byte, error) {
    // Marshal
    data, err := proto.Marshal(msg)
    if err != nil {
        return nil, err
    }

    // Compress
    var buf bytes.Buffer
    writer := gzip.NewWriter(&buf)
    if _, err := writer.Write(data); err != nil {
        return nil, err
    }
    if err := writer.Close(); err != nil {
        return nil, err
    }

    return buf.Bytes(), nil
}

func DecompressProtobuf(data []byte, msg proto.Message) error {
    // Decompress
    reader, err := gzip.NewReader(bytes.NewReader(data))
    if err != nil {
        return err
    }
    defer reader.Close()

    // Read decompressed data
    decompressed, err := io.ReadAll(reader)
    if err != nil {
        return err
    }

    // Unmarshal
    return proto.Unmarshal(decompressed, msg)
}

Encryption:

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
)

// Note: CFB mode provides confidentiality only, with no integrity
// protection; prefer an authenticated mode such as AES-GCM in new code.
func EncryptProtobuf(msg proto.Message, key []byte) ([]byte, error) {
    // Marshal
    data, err := proto.Marshal(msg)
    if err != nil {
        return nil, err
    }

    // Create cipher
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }

    // Generate IV
    iv := make([]byte, aes.BlockSize)
    if _, err := rand.Read(iv); err != nil {
        return nil, err
    }

    // Encrypt
    stream := cipher.NewCFBEncrypter(block, iv)
    encrypted := make([]byte, len(data))
    stream.XORKeyStream(encrypted, data)

    // Prepend IV
    return append(iv, encrypted...), nil
}

Documentation: Security Best Practices


Q8: How do you implement Protocol Buffer validation at the transport layer?

Answer:

Middleware Pattern:

type ValidationMiddleware struct {
    validator *Validator
}

func (m *ValidationMiddleware) Intercept(
    ctx context.Context,
    req interface{},
    info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler,
) (interface{}, error) {
    // Validate request
    if err := m.validator.Validate(req); err != nil {
        return nil, status.Error(codes.InvalidArgument, err.Error())
    }

    // Call handler
    resp, err := handler(ctx, req)
    if err != nil {
        return nil, err
    }

    // Validate response
    if err := m.validator.Validate(resp); err != nil {
        return nil, status.Error(codes.Internal, "invalid response")
    }

    return resp, nil
}

Stream Validation:

type ValidatingStream struct {
    grpc.ServerStream
    validator *Validator
}

func (s *ValidatingStream) SendMsg(m interface{}) error {
    if err := s.validator.Validate(m); err != nil {
        return status.Error(codes.Internal, err.Error())
    }
    return s.ServerStream.SendMsg(m)
}

func (s *ValidatingStream) RecvMsg(m interface{}) error {
    if err := s.ServerStream.RecvMsg(m); err != nil {
        return err
    }
    if err := s.validator.Validate(m); err != nil {
        return status.Error(codes.InvalidArgument, err.Error())
    }
    return nil
}

Documentation: gRPC Interceptors


Q9: How do you implement Protocol Buffer message routing and versioning?

Answer:

Message Router:

type MessageRouter struct {
    handlers map[string]MessageHandler
    versions map[string]string
}

type MessageHandler func(proto.Message) (proto.Message, error)

func (r *MessageRouter) Route(msg proto.Message) (proto.Message, error) {
    // Get message type
    msgType := string(proto.MessageName(msg))

    // Get version
    version := r.versions[msgType]
    if version == "" {
        version = "v1"  // Default
    }

    // Get handler
    key := fmt.Sprintf("%s:%s", msgType, version)
    handler, exists := r.handlers[key]
    if !exists {
        return nil, fmt.Errorf("no handler for %s", key)
    }

    // Route message
    return handler(msg)
}

Version Detection:

func DetectVersion(msg proto.Message) string {
    // Check for version field
    if m, ok := msg.(interface{ GetVersion() string }); ok {
        return m.GetVersion()
    }

    // Check message type name
    name := string(proto.MessageName(msg))
    if strings.Contains(name, ".v2.") {
        return "v2"
    }
    if strings.Contains(name, ".v1.") {
        return "v1"
    }

    return "v1"  // Default
}

Documentation: API Versioning


Q10: How do you optimize Protocol Buffer for embedded systems?

Answer:

1. Minimize Code Size:

// Use smallest types
message SensorData {
  sint32 temperature = 1;  // Instead of int64
  uint32 timestamp = 2;    // Instead of int64
  fixed32 sensor_id = 3;   // Fixed size
}

2. Use the Lite Runtime:

// In the .proto file: generate against the lite runtime, which
// produces much smaller code (honored by the C++ and Java generators)
option optimize_for = LITE_RUNTIME;

For microcontrollers, consider a minimal implementation such as nanopb instead of the full runtime.

3. Use Packed Repeated:

message Data {
  repeated sint32 values = 1 [packed = true];  // Default in proto3; must be explicit in proto2
}
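To see the saving, compare hand-encoded packed vs unpacked forms of a `repeated int32 values = 1` field holding [1, 2, 3] (the helper is illustrative; real encoders do the same thing):

```go
package main

import "fmt"

// appendVarint writes v as a base-128 varint.
func appendVarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80)
		v >>= 7
	}
	return append(buf, byte(v))
}

func main() {
	values := []uint64{1, 2, 3}

	// Unpacked: one tag byte (field 1, VARINT) per element
	var unpacked []byte
	for _, v := range values {
		unpacked = append(unpacked, 0x08) // (1 << 3) | 0
		unpacked = appendVarint(unpacked, v)
	}

	// Packed: a single LENGTH_DELIMITED field containing all varints
	var payload []byte
	for _, v := range values {
		payload = appendVarint(payload, v)
	}
	packed := append([]byte{0x0a}, appendVarint(nil, uint64(len(payload)))...) // (1 << 3) | 2
	packed = append(packed, payload...)

	fmt.Println(len(unpacked), len(packed)) // 6 5
}
```

The gap widens with element count: packed pays the tag and length once, unpacked pays a tag per element.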

4. Custom Allocator:

type PoolAllocator struct {
    pool *sync.Pool  // pool.New must return a zero-length []byte
}

func (a *PoolAllocator) Alloc(size int) []byte {
    buf := a.pool.Get().([]byte)
    if cap(buf) < size {
        a.pool.Put(buf[:0])  // too small: return it and allocate fresh
        return make([]byte, size)
    }
    return buf[:size]
}

func (a *PoolAllocator) Free(buf []byte) {
    a.pool.Put(buf[:0])
}

Documentation: Embedded Systems


Q11: How do you implement Protocol Buffer message queuing and batching?

Answer:

Message Queue:

var ErrTimeout = errors.New("batch timeout")

type MessageQueue struct {
    queue     chan proto.Message
    batchSize int
    timeout   time.Duration
}

func (q *MessageQueue) Enqueue(msg proto.Message) {
    select {
    case q.queue <- msg:
    default:
        // Queue full: drop, block, or report backpressure
    }
}

func (q *MessageQueue) Batch() ([]proto.Message, error) {
    batch := make([]proto.Message, 0, q.batchSize)
    timeout := time.After(q.timeout)

    for {
        select {
        case msg := <-q.queue:
            batch = append(batch, msg)
            if len(batch) >= q.batchSize {
                return batch, nil
            }
        case <-timeout:
            if len(batch) > 0 {
                return batch, nil
            }
            return nil, ErrTimeout
        }
    }
}

Batch Serialization:

func SerializeBatch(msgs []proto.Message) ([]byte, error) {
    var buf bytes.Buffer

    for _, msg := range msgs {
        data, err := proto.Marshal(msg)
        if err != nil {
            return nil, err
        }

        // Write length prefix (4-byte little-endian), then the payload
        length := uint32(len(data))
        binary.Write(&buf, binary.LittleEndian, length)
        buf.Write(data)
    }

    return buf.Bytes(), nil
}
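The consumer side splits the length-prefixed stream back into raw frames before protobuf decoding. A stdlib-only sketch (`SplitBatch` is our own name, matching the 4-byte little-endian prefix used above):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// SplitBatch walks a stream of [4-byte little-endian length][payload]
// frames and returns the raw payloads.
func SplitBatch(data []byte) ([][]byte, error) {
	var frames [][]byte
	for len(data) > 0 {
		if len(data) < 4 {
			return nil, errors.New("truncated length prefix")
		}
		length := binary.LittleEndian.Uint32(data[:4])
		data = data[4:]
		if uint32(len(data)) < length {
			return nil, errors.New("truncated frame")
		}
		frames = append(frames, data[:length])
		data = data[length:]
	}
	return frames, nil
}

func main() {
	// Two frames: "ab" and "xyz"
	stream := []byte{2, 0, 0, 0, 'a', 'b', 3, 0, 0, 0, 'x', 'y', 'z'}
	frames, _ := SplitBatch(stream)
	fmt.Println(len(frames), string(frames[0]), string(frames[1])) // 2 ab xyz
}
```

Each returned frame can then be passed to proto.Unmarshal individually.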

Documentation: Message Queuing Patterns


Q12: How do you implement Protocol Buffer schema validation and linting?

Answer:

Modern Approach: Using buf Lint (Recommended)

buf provides built-in, powerful linting capabilities that are much easier to use than custom linters.

buf.yaml Lint Configuration:

# buf.yaml
version: v1
name: buf.build/your-org/your-repo
lint:
  use:
    - DEFAULT  # all default rules, including the naming rules
               # (FIELD_LOWER_SNAKE_CASE, MESSAGE_PASCAL_CASE,
               # ENUM_PASCAL_CASE, SERVICE_PASCAL_CASE, RPC_PASCAL_CASE,
               # ENUM_VALUE_UPPER_SNAKE_CASE)
  except:
    - PACKAGE_VERSION_SUFFIX  # allow packages without a version suffix

Run Linting:

# Lint all proto files
buf lint

# Lint a specific directory
buf lint proto/

# Lint with a specific config
buf lint --config buf.lint.yaml

# Fix formatting issues (lint findings themselves must be fixed by hand)
buf format -w

Common Lint Rules:

  • FIELD_LOWER_SNAKE_CASE: Fields must be snake_case
  • MESSAGE_PASCAL_CASE: Messages must be PascalCase
  • ENUM_PASCAL_CASE: Enums must be PascalCase
  • RPC_PASCAL_CASE: RPC methods must be PascalCase
  • PACKAGE_LOWER_SNAKE_CASE: Packages must be lower_snake_case
  • IMPORT_USED: All imports must be used
  • FIELD_NO_DELETE: Cannot delete fields (a breaking-change rule, enforced by buf breaking rather than buf lint)

Tuning Lint Rules:

# buf.yaml
version: v1
lint:
  use:
    - DEFAULT
  # Disable specific rules entirely
  except:
    - PACKAGE_VERSION_SUFFIX
  # Disable specific rules for specific paths only
  ignore_only:
    ENUM_ZERO_VALUE_SUFFIX:
      - proto/legacy

Breaking Change Detection:

# Check against main branch
buf breaking --against '.git#branch=main'

# Check against remote
buf breaking --against 'buf.build/your-org/your-repo:main'

# Check specific file
buf breaking --against '.git#branch=main' --path proto/user.proto

CI/CD Integration:

# .github/workflows/lint.yml
name: Lint Protobuf
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # full history so the main branch is available for comparison
      - uses: bufbuild/buf-setup-action@v1
      - run: buf lint
      - run: buf breaking --against '.git#branch=main'

Legacy: Custom Linter (if needed):

Only use custom linters if you need very specific rules not covered by buf:

type SchemaLinter struct {
    rules []LintRule
}

type LintRule func(*FileDescriptor) []LintError

func (l *SchemaLinter) Lint(fd *FileDescriptor) []LintError {
    var errors []LintError

    for _, rule := range l.rules {
        errors = append(errors, rule(fd)...)
    }

    return errors
}

// Example rule
func FieldNamingRule(fd *FileDescriptor) []LintError {
    var errors []LintError

    for _, msg := range fd.Messages {
        for _, field := range msg.Fields {
            if !isSnakeCase(field.Name) {
                errors = append(errors, LintError{
                    Field:   field,
                    Message: "field name must be snake_case",
                })
            }
        }
    }

    return errors
}

Recommendation: Use buf lint for all linting needs. It's comprehensive, fast, and well-maintained.


