Load Balancing with the Distributor
Similar in behavior to standard load balancers
the NServiceBus Distributor is the key to scaling out
message processing over many machines transparently.
As a standard NServiceBus process, the distributor maintains
all the fault-tolerant and performance characteristics of
NServiceBus but also is designed never to overwhelm any
of the worker nodes configured to receive work from it.
Why use it
When starting to use NServiceBus, you'll see that you can easily run multiple instances of the same process
with the same input queue. This may look like scaling-out at first, but is really no different than running
multiple threads within the same process. If you try to do this with multiple machines, you'll see that
you can't share a single input queue across multiple machines.
The distributor gets around this limitation.
What about MSMQ v4
In version 4 of MSMQ, made available with Vista and Server 2008, there is now the ability to perform
'remote transactional receive' which wasn't possible in previous versions.
What this means is that processes on other machines can transactionally pull work from a queue on a different
machine. If the machine processing the message were to crash, the message would roll back to the queue and
other machines could then process it.
Even though the distributor provided similar functionality even before Vista was released, there are other
reasons to use it even on the newer OS.
The problem with 'remote transactional receive' is that it gets proportionally slower as more worker nodes
are added. This is due to the overhead of managing more transactions, as well as the longer period of time
that these transactions are open.
In short, the scale-out benefits of MSMQ v4 by itself are quite limited.
How does it work
Worker nodes send messages to the distributor, telling it when they're ready for work.
These messages arrive at the distributor via a separate 'control' queue.
The distributor stores this information.
When applicative messages arrive at the distributor, it uses the previously stored information to find a free
worker node, and sends the message on to it.
If no worker nodes are free, the distributor waits a bit and then repeats the previous step.
This way, all pending work stays in the distributor's queue (rather than building up in each of the workers'
queues) giving visibility into how long messages are actually waiting. This is important for complying with
time-based service level agreements (SLAs).
For more information on monitoring, see Monitoring NServiceBus Endpoints.
Where is it
You can find the distributor under the /processes/distributor directory in the download.
In there you'll see NServiceBus.Host.exe - the generic NServiceBus host process used
to run the distributor.
If you run the exe by double-clicking on it, you'll see a great deal of debug information.
This is standard NServiceBus host behavior. The common use of the distributor in production
will be as a Windows Service and not as a console application. In order to see how to install
the distributor as a Windows Service, see here.
Configuration
The distributor can be configured via the NServiceBus.Distributor.dll.config file:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
<appSettings>
<add key="NumberOfWorkerThreads" value="1"/>
<add key="ErrorQueue" value="error"/>
<add key="Serialization" value="xml"/>
<!-- can be either "xml", or "binary" -->
<add key="NameSpace" value="http://www.UdiDahan.com"/>
<!-- relevant for a Serialization value of "xml" only -->
<add key="StorageQueue" value="distributorStorage"/>
<add key="DataInputQueue" value="distributorDataBus"/>
<add key="ControlInputQueue" value="distributorControlBus"/>
</appSettings>
</configuration>
Like any NServiceBus process, you can control the number of threads it runs with the
'NumberOfWorkerThreads' property, and the error queue to which it sends messages it
wasn't able to successfully process.
Similar to the way serialization is controlled in code with NServiceBus, the distributor exposes
these properties as configuration - the NameSpace value is only applicable when the value of
the Serialization property is set to "xml". Administrators should set these values to be the
same as those used in the rest of the application.
The StorageQueue property indicates which queue will be used by the distributor to
store the state it needs for its operation. This queue should be local - on the same
machine as the distributor process.
The last two properties are the most relevant to the use of a distributor by other processes.
Routing with the Distributor
The distributor makes use of two queues for its runtime operation.
The DataInputQueue is the queue that client processes will be sending their applicative
messages to. The ControlInputQueue is the queue that worker nodes will be sending their
control messages to.
Configuring a worker node to work with a distributor is done by setting the values of the
following attributes of the UnicastBusConfig section:
<UnicastBusConfig
DistributorControlAddress="distributorControlBus@Cluster1"
DistributorDataAddress="distributorDataBus@Cluster1">
<MessageEndpointMappings>
<!-- regular entries -->
</MessageEndpointMappings>
</UnicastBusConfig>
Similar to standard NServiceBus routing, you wouldn't want high priority messages to get stuck
behind lower priority messages, so just as you would have separate NServiceBus processes for
different message types, you would also set up different distributor instances (with separate
queues) for different message types.
In this case, you could name the queues just like the messages - for example,
SubmitPurchaseOrder.StrategicCustomers.Sales. This would be the name of the distributor's
data queue and that of the input queues of each of the workers. The distributor's control
queue would best be named with a prefix of 'control' as follows: Control.SubmitPurchaseOrder.StrategicCustomers.Sales.
When using the distributor in a full publish/subscribe deployment, what we'll see is
a distributor within each subscriber balancing the load of events being published as follows:
Keep in mind that the distributor is designed for load balancing within a single site
and should not be used between sites. In the image above, all publishers and subscribers
are within a single physical site.
For information on using NServiceBus across multiple physical sites, see the gateway.
High Availability
If the distributor goes down, even if its worker nodes remain running they will not receive any messages.
Therefore, it is important to run the distributor on a cluster configuring its queues as clustered resources.
Since the distributor doesn't do CPU or memory intensive work, you can often put several distributor processes
on the same clustered server. Be aware the network IO may end up being the bottleneck for the distributor
so take into account message sizes and throughput when sizing your infrastructure.