Overview
One of the problems with the distributed systems being built today is that they are fragile.
As one part of the system slows down, that tends to ripple out and cripple the entire system.
One of the primary design goals of nServiceBus was to eliminate that, guiding developers
to writing code that would be robust in production environments. That robustness also needs
to prevent data loss under various failure conditions.
In order to make use of nServiceBus effectively, you need to understand the
distributed systems architecture it is designed to support. In other words,
if you design your system according to the principles laid out below, nServiceBus
will make your life a lot easier. On the other hand, if you do not follow
these principles, nServiceBus will probably make them harder.
The extensibility features in nServiceBus will enable you tweak its behavior to
suit your specific needs, yet those will be documented separately.
The communications pattern that enables robustness is one-way messaging - also
known as "fire and forget". That will be discussed in more detail shortly.
Since the amount of time it can take to communicate with another machine across
the network is both unknown and unbounded, communications are based on a Store-and-Forward
model as shown in the following diagram:
In this model, when the client process calls an API to send a message to the server
process, the API returns control to the calling thread before the message is sent.
At that point, the transfer of the message across the network becomes the responsibility
of the messaging technology. There may be various kinds of communications interference,
the server machine may simply be down, or a firewall may be slowing down the transfer.
Also, even though the message may have reached the target machine, the target process
may currently be down.
While all of this is going on, the client process is oblivious. Critical resources
like threads (and it's allocated memory) are not being held waiting for the call
to complete. This prevents the client process from losing stability as a result
of having many threads and all their memory used up waiting for a response from
the other machine or process.
The common pattern of Request/Response, which is more accurately described as Synchronous
Remote Procedure Call, is handled differently when using one way messaging. Instead
of letting the stack of the calling thread manage the state of the communications
interaction, that is done explicitly. From a network perspective, request/response
is just two one-way interactions as shown in the next figure:
This communication is especially critical for servers as client behind problematic
network connections now have little effect on the server's stability.
If a client crashes between the time that it sent the request until the server sends
a response, the server will not have resources tied up waiting minutes and minutes
until the connection times out.
When used in concert with Durable Messaging, system-wide robustness increases even
more.
Durable messaging differs from regular store-and-forward messaging in that the messages
are persisted to disk locally before attempting to be sent.
What this means is that if after the calling thread has had control returned to
it the process crashes, the message sent will not have been lost. In server to server
scenarios, where a server can complete a local transaction but might crash a second
later, one-way durable messaging makes it easier to create an overall robust system
even in the face of unreliable building blocks.
A different communication style involves one-to-many communication.
In this style, the sender of the message often does not know about the specifics of
those that wish to receive the message. This additional loose coupling comes at the cost
of subscribers explicitly opting-in to receiving messages as shown in the following diagram:
Subscriptions
Subscribers need to know about which endpoint is responsible for a given message.
This information is usually made available as part of the contract, specifying
to which endpoint a subscriber should send its request. As a part of the subscription
message, a subscriber passes its "return address", the endpoint at which it wants
to receive messages.
Keep in mind that the publisher may choose to store the information about
which subscriber is interested in which message in a highly available manner.
This would allow multiple processes on multiple machines to publish messages
to all subscribers, regardless if one had received the subscription message or not.
Subscribers don't necessarily have to subscribe themselves. Through the use of the Return Address pattern, one central configuration station
could send multiple messages to each publisher specifying which subscriber endpoints
to subscribe to which message.
Another option that can be used is for multiple physical subscribers to make themselves
appear as one single logical subscriber. This makes it possible to load balance the handling
of messages between multiple physical subscribers without any explicit coordination on either the part of the
publisher or the part of any one subscriber. All that is needed is for all subscribers to
specify the same return address in the subscription message.
Publishing
Publishing a message involves having the message arrive at all endpoints which
had previously subscribed to that type of message.
Messages which are published often represent events - things that have happened,
for instance Order Cancelled, Product Out of Stock, and Shipping Delayed. Sometimes,
the cause of an event is the handling of a previous command message, for instance Cancel Order.
A publisher is not required to publish a message as a part of handling a command message
although it is the simplest solution.
Since many command messages can be received in a short period of time, publishing
a message to all subscribers for every command message multiplies the incoming load
and, as such, is a less than optimal solution. A better solution would have the publisher
roll up all the changes that had occurred in a given period of time into a single published
message. The appropriate period of time is dependent on the Service Level Agreement of
the publisher - its commitment to the freshness of the data.
For instance, in the financial domain the publishing period may be 10 ms while in
the business to consumer e-commerce domain a minute may be acceptable.
Another advantage of publishing messages on a timer is that that activity can
be offloaded from the endpoint/server processing command messages effectively
scaling out over more servers.
Many systems provide users with the ability to search, filter, and sort data.
While one-way messaging and publish/subscribe are core components of the
implementation of these features, the way they are combined is not at all like
regular client-server request/response.
In regular client-server development, the server is responsible for providing
the client with all CRUD (create, read, update, and delete) capabilities.
However, when users look at data they do not often require it to be up to date to the second
(given that they often look at the same screen for several seconds to minutes at a time).
As such, retrieving data from the same table as that being used for highly consistent
transaction processing creates contention resulting in poor performance for all CRUD actions
under higher load.
A solution that avoids this problem separates commands and queries at the system-level,
even above that of client and server. In this solution there are two "services" that span
both client and server - one in charge of commands (create, update, delete), the other in
charge of queries (read). These services communicate only via messages - one cannot access
the database of the other, as shown in the following diagram:
The command service publishes messages about changes to data, to which the query service
subscribes. When the query service receives such notifications, it saves the data in its
own data store which may well have a different schema (optimized for queries like a star schema).
The query service may also keep all data in memory if the data is small enough.