
When we change an application or the format of its data, we need:
- Backward compatibility - newer code can read data that was written by older code
- Forward compatibility - older code can read data that was written by newer code
Formats for encoding data
We work with data in two representations:
- In memory, data is kept in objects, structs, lists, arrays, etc., optimized for efficient access by the CPU.
- When you write data to a file or send it over the network, you encode it as a self-contained sequence of bytes (e.g., JSON).
- Language-specific formats
- E.g., Java has java.io.Serializable, …
- They are tricky because the encoding is tied to a particular programming language. Also, to restore data in the same object types, the decoding process needs to be able to instantiate arbitrary classes, which is a source of security problems. Efficiency and versioning are also problems.
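A minimal Python sketch of the instantiation and security point, using `pickle` (Python's counterpart to java.io.Serializable): the byte stream names the class to rebuild by module path, and a crafted object can make `pickle.loads` call any importable function. The `Sneaky` class is hypothetical, and `eval` is applied to a harmless expression only.

```python
import pickle
from fractions import Fraction

# Round trip: the bytes record the class by module path (fractions.Fraction),
# so only a Python process that can import that class can decode them.
data = pickle.dumps(Fraction(1, 3))
restored = pickle.loads(data)
print(restored)                 # 1/3
print(b"Fraction" in data)      # True


class Sneaky:
    """Hypothetical malicious object: __reduce__ tells pickle how to 'rebuild' it."""

    def __reduce__(self):
        # On loads, pickle calls eval("6 * 7") instead of creating a Sneaky.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Sneaky())
result = pickle.loads(payload)  # the expression is evaluated during decoding
print(result)                   # 42
```

Replace `"6 * 7"` with anything else and `loads` will run it, which is why untrusted input should never be decoded this way.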
- JSON, XML, and Binary Variants
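- They can be written and read by many programming languages
- They also have problems, such as ambiguity around the encoding of numbers: JSON doesn't distinguish integers from floating-point numbers, and parsers that use IEEE 754 doubles (e.g., JavaScript) can't represent integers greater than 2^53 exactly
- JSON and XML have good support for Unicode character strings, but don't support binary strings (sequences of bytes without a character encoding); the usual workaround is Base64, which increases the data size by 33%
- Binary encoding → a good choice for data used only within an organization, especially when you have terabytes of data and compactness and parsing speed matter
- Encoding the example JSON record with MessagePack → the binary encoding is 66 bytes, only a little less than the 81 bytes taken by the textual form with whitespace removed (arguably not worth the loss of human readability)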
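A quick Python illustration of the number and binary-string problems. Python's own `json` module parses integers exactly, so `parse_int=float` is used here only to imitate a parser that turns every number into an IEEE 754 double; the Base64 part shows the 4-characters-per-3-bytes overhead.

```python
import base64
import json

big = 2**53 + 1
text = json.dumps({"id": big})  # '{"id": 9007199254740993}'

exact = json.loads(text)["id"]                       # Python keeps the exact integer
as_double = json.loads(text, parse_int=float)["id"]  # imitate a double-only parser

print(exact == big)       # True
print(as_double == big)   # False: 2**53 + 1 is not representable as a double
print(int(as_double))     # 9007199254740992

# Binary data has to be smuggled through JSON/XML as text, e.g. with Base64.
raw = bytes(range(256)) * 3  # 768 bytes
encoded = base64.b64encode(raw)
print(len(raw), len(encoded))  # 768 1024, i.e. about 33% larger
```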
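- Thrift (originally from Facebook) and Protocol Buffers (originally from Google). Both require a schema for any data that is encoded, and both come with a code generation tool that produces classes from the schema definition. The encoded data refers to fields by numeric tags instead of by name, which keeps it compact.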
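To see how the field tags appear on the wire, here is a hand-rolled Python sketch of the Protocol Buffers encoding (not the protobuf library): each field starts with a key combining its tag number and a wire type, followed by a varint or a length-prefixed byte string. The `Person` schema and values are assumptions modeled on the example record from *Designing Data-Intensive Applications*, chapter 4.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte but the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


VARINT, LENGTH_DELIMITED = 0, 2  # wire types used below


def field_key(tag: int, wire_type: int) -> bytes:
    return encode_varint((tag << 3) | wire_type)


def encode_int_field(tag: int, value: int) -> bytes:
    return field_key(tag, VARINT) + encode_varint(value)


def encode_string_field(tag: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return field_key(tag, LENGTH_DELIMITED) + encode_varint(len(data)) + data


# Hypothetical schema:
# message Person {
#   required string user_name       = 1;
#   optional int64  favorite_number = 2;
#   repeated string interests       = 3;
# }
person = (
    encode_string_field(1, "Martin")
    + encode_int_field(2, 1337)
    + encode_string_field(3, "daydreaming")  # a repeated field is just the same tag again
    + encode_string_field(3, "hacking")
)
print(len(person))        # 33
print(person[:2].hex())   # 0a06: tag 1 / wire type 2, then length 6
```

Field names never reach the wire, so a field can be renamed freely and new fields can be added under new tag numbers; the wire type tells old code how many bytes to skip for tags it doesn't recognize.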
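- Avro was started in 2009 as a subproject of Hadoop. It also uses a schema to specify the structure of the data, with two schema languages (Avro IDL for human editing, and a JSON-based one that is more easily machine-readable). The binary encoding of the example record is just 32 bytes long, because it contains no field names or tags, only the values in schema order. The writer's schema and the reader's schema don't need to be the same, but they need to be compatible. Avro is also friendlier to dynamically generated schemas (e.g., one generated from a relational database schema), since there are no tag numbers to assign by hand.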
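A hand-rolled Python sketch (not the official `avro` library) of Avro's binary encoding for a small subset of types: values are concatenated in schema order, with lengths and integers written as zig-zag varints. The schema and record are assumptions modeled on the same Person example; unions are simplified to "null or one other type". `resolve` sketches the reader/writer schema-resolution rules on an already-decoded dict, and the `photoURL` field in the reader's schema is hypothetical.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)


def encode_long(n: int) -> bytes:
    """Zig-zag (0→0, -1→1, 1→2, ...) so small negative numbers stay short."""
    return encode_varint((n << 1) ^ (n >> 63))


def encode(schema, value) -> bytes:
    if schema == "null":
        return b""
    if schema == "long":
        return encode_long(value)
    if schema == "string":
        data = value.encode("utf-8")
        return encode_long(len(data)) + data
    if isinstance(schema, list):  # simplified union: branch index, then the value
        for index, branch in enumerate(schema):
            if (branch == "null") == (value is None):
                return encode_long(index) + encode(branch, value)
        raise ValueError("value matches no union branch")
    if schema["type"] == "array":  # one block of items, then a zero count
        out = b""
        if value:
            out += encode_long(len(value))
            out += b"".join(encode(schema["items"], v) for v in value)
        return out + encode_long(0)
    if schema["type"] == "record":  # no names or tags: just the fields in order
        return b"".join(encode(f["type"], value[f["name"]]) for f in schema["fields"])
    raise ValueError(f"unsupported schema: {schema!r}")


PERSON_V1 = {  # writer's schema (assumed)
    "type": "record", "name": "Person",
    "fields": [
        {"name": "userName", "type": "string"},
        {"name": "favoriteNumber", "type": ["null", "long"], "default": None},
        {"name": "interests", "type": {"type": "array", "items": "string"}},
    ],
}
record = {"userName": "Martin", "favoriteNumber": 1337, "interests": ["daydreaming", "hacking"]}
encoded = encode(PERSON_V1, record)
print(len(encoded))  # 32


def resolve(reader_schema, decoded):
    """Match fields by name: writer-only fields are ignored, reader-only ones get defaults."""
    result = {}
    for field in reader_schema["fields"]:
        if field["name"] in decoded:
            result[field["name"]] = decoded[field["name"]]
        elif "default" in field:
            result[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value or default for field {field['name']!r}")
    return result


PERSON_V2 = {  # hypothetical newer reader's schema: drops 'interests', adds 'photoURL'
    "type": "record", "name": "Person",
    "fields": [
        {"name": "userName", "type": "string"},
        {"name": "favoriteNumber", "type": ["null", "long"], "default": None},
        {"name": "photoURL", "type": ["null", "string"], "default": None},
    ],
}
# Here `record` stands in for what decoding with the writer's schema would produce.
print(resolve(PERSON_V2, record))
```

Because the bytes carry no field identifiers, they can only be decoded with the exact writer's schema at hand; resolution then maps that result onto whatever schema the reader expects.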
Modes of Dataflow
- Through Databases
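- The process that writes to the database encodes the data, and the process that reads from it decodes it. Backward compatibility is necessary because a value may be read long after it was written by older code; forward compatibility is necessary because during a rolling upgrade, older code may read values written by newer code. Data outlives code.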
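One pitfall of forward compatibility in a database is worth a sketch: when older code reads a record written by newer code, updates it, and writes it back, decoding into a fixed model class silently drops the fields it doesn't know. The `UserV1` model and the `phone` field are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UserV1:
    """Hypothetical model used by the *older* code; it knows nothing about 'phone'."""
    name: str
    email: str


KNOWN_FIELDS = {"name", "email"}


def lossy_update(raw: str, new_email: str) -> str:
    data = json.loads(raw)
    user = UserV1(name=data["name"], email=data["email"])  # 'phone' is dropped here
    user.email = new_email
    return json.dumps(asdict(user))


def preserving_update(raw: str, new_email: str) -> str:
    data = json.loads(raw)
    # Keep the fields this version doesn't understand and write them back unchanged.
    unknown = {k: v for k, v in data.items() if k not in KNOWN_FIELDS}
    user = UserV1(name=data["name"], email=data["email"])
    user.email = new_email
    return json.dumps({**asdict(user), **unknown})


# Record written by *newer* code, which added a 'phone' field.
stored = json.dumps({"name": "Ada", "email": "ada@old.example", "phone": "555-0100"})

lossy = lossy_update(stored, "ada@new.example")
preserved = preserving_update(stored, "ada@new.example")
print(lossy)      # {"name": "Ada", "email": "ada@new.example"}
print(preserved)  # {"name": "Ada", "email": "ada@new.example", "phone": "555-0100"}
```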
- Through Service calls
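- Clients make requests (e.g., HTTP GET or POST) to servers, which expose an API over the network.
- REST and RPC.
- The RPC model, which tries to make a remote request look like a local function call, is flawed: a local function call is predictable (it succeeds or fails depending only on parameters under your control), whereas a network request is unpredictable (the request or response may be lost, or the remote machine may be slow or unavailable).
- Other problems: a request may time out without returning a result, so you don't know whether it was processed; retrying may perform the action more than once unless the operation is idempotent or duplicates are detected; latency varies widely; all parameters must be encoded into bytes; and client and server may be written in different languages, so datatypes must be translated.
- Newer RPC frameworks, such as Finagle and Rest.li, make the difference between a remote request and a local function call explicit by using futures (promises) to represent asynchronous actions that may fail.
- Building an application as a set of services that communicate this way is known as service-oriented architecture (SOA), more recently rebranded as microservices architecture.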
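A Python sketch of why retries need care and how futures make a remote call explicit: the hypothetical `PaymentServer` applies a charge but its response is "lost", the client retries with the same request ID, and the server deduplicates so the charge happens only once. The class, method names, and the simulated lost response are assumptions rather than any framework's API; `concurrent.futures` stands in for the futures used by frameworks such as Finagle.

```python
import concurrent.futures
import uuid


class PaymentServer:
    """Hypothetical remote service that deduplicates requests by request ID."""

    def __init__(self, balance):
        self.balance = balance
        self.processed = {}             # request_id -> stored result
        self.lose_next_response = True  # simulate one response lost in the network

    def charge(self, request_id, amount):
        if request_id in self.processed:
            result = self.processed[request_id]  # duplicate: don't charge again
        else:
            self.balance -= amount
            result = self.balance
            self.processed[request_id] = result
        if self.lose_next_response:
            self.lose_next_response = False
            # The charge was applied, but the client never learns that.
            raise TimeoutError("response lost")
        return result


def call_with_retries(executor, server, amount, attempts=3, timeout=1.0):
    request_id = str(uuid.uuid4())  # same ID on every retry so the server can deduplicate
    for _ in range(attempts):
        # The call returns a future: the caller sees that it may be slow or fail.
        future = executor.submit(server.charge, request_id, amount)
        try:
            return future.result(timeout=timeout)
        except (TimeoutError, concurrent.futures.TimeoutError):
            continue  # outcome unknown: retry with the same request ID
    raise RuntimeError(f"no response after {attempts} attempts")


server = PaymentServer(balance=100)
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    result = call_with_retries(executor, server, amount=10)
print(result, server.balance)  # 90 90
```

Without the `processed` lookup, the retry would charge a second time and leave the balance at 80.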
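- Through asynchronous message passing
- The broker can act as a buffer if the recipient is unavailable or overloaded
- It can automatically redeliver messages to a process that has crashed, so messages are not lost
- The sender doesn't need to know the IP address and port of the recipient
- It allows one message to be sent to several recipients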
- Message brokers - one process sends a message to a named queue or topic, and the broker ensures that the message is delivered to one or more consumers or subscribers
- Past: TIBCO, IBM WebSphere, …
- Now: RabbitMQ, ActiveMQ, HornetQ, Kafka
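- A topic provides only one-way dataflow, but a consumer may itself publish messages to another topic, or to a reply queue consumed by the original sender.
- Brokers typically don't enforce any particular data model: a message is just a sequence of bytes with some metadata, so you can use any encoding format.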
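A toy in-memory broker in Python illustrating the points above: named topics with several subscribers, per-subscriber buffering while a consumer is away, redelivery of messages that were never acknowledged, and payloads treated as opaque bytes. It is a single-process sketch with hypothetical method names, not the API of RabbitMQ, Kafka, or any real broker.

```python
import itertools
from collections import defaultdict, deque


class ToyBroker:
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> subscriber names
        self._queues = defaultdict(deque)      # subscriber -> buffered (id, payload)
        self._in_flight = defaultdict(dict)    # subscriber -> {id: payload} not yet acked
        self._ids = itertools.count(1)

    def subscribe(self, topic, subscriber):
        self._subscribers[topic].append(subscriber)

    def publish(self, topic, payload):
        # The broker never inspects the payload, so any encoding works.
        if not isinstance(payload, (bytes, bytearray)):
            raise TypeError("payload must be bytes")
        for subscriber in self._subscribers[topic]:  # fan-out: one copy per subscriber
            self._queues[subscriber].append((next(self._ids), bytes(payload)))

    def receive(self, subscriber):
        """Deliver the next message, keeping it in flight until it is acknowledged."""
        if not self._queues[subscriber]:
            return None
        message_id, payload = self._queues[subscriber].popleft()
        self._in_flight[subscriber][message_id] = payload
        return message_id, payload

    def ack(self, subscriber, message_id):
        self._in_flight[subscriber].pop(message_id, None)

    def redeliver_unacked(self, subscriber):
        """Called when a consumer is detected as crashed: requeue what it never acked."""
        pending = sorted(self._in_flight[subscriber].items())
        self._in_flight[subscriber].clear()
        self._queues[subscriber].extendleft(reversed(pending))  # keep original order


broker = ToyBroker()
broker.subscribe("orders", "billing")
broker.subscribe("orders", "shipping")
broker.publish("orders", b'{"order_id": 1}')  # buffered for both subscribers

msg_id, payload = broker.receive("billing")
broker.ack("billing", msg_id)                 # billing processed it

first = broker.receive("shipping")            # shipping crashes before acking...
broker.redeliver_unacked("shipping")
again = broker.receive("shipping")            # ...so the same message is delivered again
print(first == again)                         # True
```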
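- Distributed actor frameworks - the actor model is a programming model for concurrency in a single process; distributed actor frameworks use the same message passing to scale an application across multiple nodes.
- Rather than dealing directly with threads (and the associated problems of race conditions, locking, and deadlock), logic is encapsulated in actors
- Each actor typically represents one client or entity, may have some local state that is not shared with any other actor, and communicates with other actors by sending and receiving asynchronous messages, processing only one message at a time
- Message delivery is not guaranteed: in certain error scenarios, messages may be lost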
- Popular frameworks
- Akka (JVM: Java and Scala)
- Orleans (.NET)
- Erlang OTP
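A single-process Python sketch of the actor idea: the actor's state is touched only by its own thread, which takes messages from a mailbox one at a time, so senders never need locks. `CounterActor` and its message format are hypothetical; real frameworks such as Akka or Orleans additionally distribute actors across nodes and encode messages for the network.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: its state is owned by one thread that reads a mailbox."""

    def __init__(self):
        self._mailbox = queue.Queue()  # thread-safe FIFO of incoming messages
        self._count = 0                # local state, never touched by other threads
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue and return immediately."""
        self._mailbox.put(message)

    def _run(self):
        while True:
            kind, arg = self._mailbox.get()  # one message at a time
            if kind == "increment":
                self._count += arg
            elif kind == "get":
                arg.put(self._count)         # arg is a reply queue
            elif kind == "stop":
                return

    def join(self):
        self._thread.join()


actor = CounterActor()


def client(n):
    for _ in range(n):
        actor.send(("increment", 1))


# Four concurrent senders, no locks around the counter.
senders = [threading.Thread(target=client, args=(250,)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()

reply = queue.Queue()
actor.send(("get", reply))  # queued after all increments, so it sees them all
total = reply.get(timeout=5)
actor.send(("stop", None))
actor.join()
print(total)  # 1000
```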