Making Distributed Computing Relevant and Accessible

First, let us assume that distributed computing is generally that area of developing and running software designed to process large numbers of long running tasks on servers that are optimally proximal to the data being processed.

Second, let us agree, if for this discussion only, that distributed computing is NOT your collection of services on back end servers that support your service oriented architecture (SOA) for your web and mobile apps.

Third, let us presume that you are NOT already blessed with a job where you write distributed computing software.

How then can distributed computing be relevant to you? And how can you take advantage of distributed computing without becoming an expert in one of the several well known distributed computing platforms on the market today?

Both are excellent questions. Thank you for asking. Let’s try a practical approach.

Imagine you are at your desk and your boss comes to you and ask how fast your web servers respond to the customer. Of course, your first instinct is to write this program to find out:

private static void DoTenUrlsInParallel()
{
   Console.WriteLine("Do 10 urls in parallel");
   var sw = Stopwatch.StartNew();
   ISpeedTest test = new SpeedTest();
   Parallel.ForEach(TestUrls, (url) =>
   {
      var result = test.GetSpeed(url);
      Console.WriteLine("r:{0}, s:{1}, b:{2}, u:{3}",
         result.ResponseTimeMs, result.ReadStreamTimeMs, 
         result.ResponseLength, result.Url);
   });
   sw.Stop();
   Console.WriteLine("Total elapsed time: {0}", 
      sw.ElapsedMilliseconds);
   Console.WriteLine(string.Empty);
}

You take him the results and he says, “But isn’t this from your desk? I want to know what these numbers look like from all around the world. East and west U.S. North and west Europe. And east and south east Asia. And I want a regular stream of these numbers fed into a spreadsheet for me every day.”

Do you say, “No problem.” You do if you have a Windows Azure account and you know about the distributed task parallel library from DuoVia called DuoVia.Net.Distributed. You go back to your desk and modify the code to look like this:

private static void DoTenUrlsThreeTimesEachAroundTheWorldInParallel(bool runLocal = false)
{
   var serverEndpoints = new IPEndPoint[0];
   if (runLocal)
   {
      serverEndpoints = new IPEndPoint[] { new IPEndPoint(IPAddress.Parse("127.0.0.1"), 9096) };
   }
   else
   {
      //these server names are temporary - to run this test use your own
      var servers = new string[]
      {
         "myaz-westus.cloudapp.net",
         "myaz-eastus.cloudapp.net",
         "myaz-northeu.cloudapp.net",
         "myaz-westeu.cloudapp.net",
         "myaz-soeastasia.cloudapp.net",
         "myaz-eastasia.cloudapp.net"
      };

      serverEndpoints = new IPEndPoint[servers.Length];
      for (int i = 0; i < servers.Length; i++)
      {
         var host = Dns.GetHostAddresses(servers[i]);
         var ip = (from n in host 
                   where n.AddressFamily == AddressFamily.InterNetwork 
                   select n).First();
         serverEndpoints[i] = new IPEndPoint(ip, 9096);
      }
   }

   float subscriptionRate = 2.0f; //oversubscribed 
   int logPollingIntervalSeconds = 2;
   using (DistributedClient<ISpeedTest> client = 
          Distributor.Connect<ISpeedTest>(typeof(SpeedTest),
          subscriptionRate,
          logPollingIntervalSeconds,
          LogLevel.Debug,
          serverEndpoints))
   {
      for (int i = 0; i < 3; i++)
      {
         var sw = Stopwatch.StartNew();
         Console.WriteLine(@"round:{0}", i + 1);
         var loopResult = client.ForEach(TestUrls, (url, proxy) => proxy.GetSpeed(url));
         foreach (var result in loopResult.Results)
         {
            Console.WriteLine(@"r:{0}, s:{1}, b:{2}, on: {3}, u:{4}",
               result.ResponseTimeMs, result.ReadStreamTimeMs, 
			   result.ResponseLength, result.MachineName, result.Url);
         }
         sw.Stop();
         Console.WriteLine("Total elapsed time: {0}", sw.ElapsedMilliseconds);
         Console.WriteLine(string.Empty);
      }
   }
}

And you and your boss are happy.

Sometimes distributed computing is more about location and proximity to data or infrastructure than it is to getting massive amounts of data processed in as little time as possible.

You can find the full demo source code here.

Distributed Parallel Processing in Simple .NET Console Application

You write a simple console application. Debug it locally. And flip a switch and run it on all your servers and workstations running MpiVisor Server. This demo is part of the original source on GitHub. It is has a small Main class, a "master" runner class and a "spawned" runner class. More on how it works in the next post. For now, just enjoy how easy the code is. Read on to see the code.

static void Main(string[] args)
{
    //connect agent and dispose at end of execution
    //use forceLocal to run in a single process with internal visor
    using (Agent.Connect(forceLocal: false)) 
    {
        //default is File only - spawned agents shuttle logs back to master
        Log.LogType = LogType.Both; 
        if (Agent.Current.IsMaster)
        {
            try
            {
                //keep Main clean with master message loop class
                MasterRunner.Run(args);
            }
            catch (Exception e)
            {
                Log.Error("Agent master exception: {0}", e);
            }
        }
        else
        {
            try
            {
                //keep Main clean with spawned agent message loop class
                SpawnRunner.Run(args);
            }
            catch (Exception e)
            {
                Log.Error("spawn agent exception: {0}", e);
                Agent.Current.Send(new Message(Agent.Current.SessionId, 
                    Agent.Current.AgentId,
                    MpiConsts.MasterAgentId, 
                    SystemMessageTypes.Aborted, 
                    e.ToString()));
            }
        }
    }
} //standard ending - will force service to kill spawned agents  


internal static class MasterRunner
{
    private static ushort numberOfAgentsToSpawn = 2;

    //use to stop message processing loop
    private static bool continueProcessing = true;

    //additional means of determining when to stop processing loop
    private static ushort spawnedAgentsThatHaveStoppedRunning = 0;

    public static void Run(string[] args)
    {
        //spawn worker agents, send messages and orchestrate work
        Agent.Current.SpawnAgents(numberOfAgentsToSpawn, args);
        Message msg;
        do
        {
            msg = Agent.Current.ReceiveAnyMessage();
            if (msg == null) continue; //we timed out
            switch (msg.MessageType)
            {
                //handle content types > -1 which are application specific
                //case 0-~:
                    //handle messages from master or other agents here
                    //break;
                case 2:
                    //handle messages from master or other agents here
                    Log.Info("AgentId {0} sent message type 2 with {1}", 
                        msg.FromId, msg.Content);

                    //this test/demo just sends the message back to the sender
                    Agent.Current.Send(new Message
                        {
                            FromId = Agent.Current.AgentId,
                            SessionId = Agent.Current.SessionId,
                            ToId = msg.FromId,
                            MessageType = SystemMessageTypes.Shutdown,
                            Content = null
                        });
                    break;

                //handle internal messages and unknown
                case SystemMessageTypes.Started:
                    Log.Info("AgentId {0} reports being started.", msg.FromId);
                    //send demo/test content message
                    Agent.Current.Send(new Message
                    {
                        FromId = Agent.Current.AgentId,
                        SessionId = Agent.Current.SessionId,
                        ToId = msg.FromId,
                        MessageType = 1,  //custom app type
                        Content = "hello from 1"
                    });
                    break;
                case SystemMessageTypes.Stopped:
                    Log.Info("AgentId {0} reports being stopped.", msg.FromId);
                    spawnedAgentsThatHaveStoppedRunning++;
                    break;
                case SystemMessageTypes.Aborted:
                    Log.Info("AgentId {0} reports being aborted.", msg.FromId);
                    spawnedAgentsThatHaveStoppedRunning++;
                    break;
                case SystemMessageTypes.Error:
                    Log.Info("AgentId {0} reports an error.", msg.FromId);
                    break;
                default:
                    Log.Info("AgentId {0} sent {1} with {2}", 
                        msg.FromId, msg.MessageType, msg.Content);
                    break;
            }
        }
        while (continueProcessing 
            && spawnedAgentsThatHaveStoppedRunning < numberOfAgentsToSpawn);
        
        //change while logic as desired to keep master running 
        //or shut it down and all other agents will as well
        Log.Info("done master");
    }
	
	
internal static class SpawnRunner	
{	
    private static bool continueProcessing = true;	

    public static void Run(string[] args)
    {
        Message msg;
        do
        {
            msg = Agent.Current.ReceiveAnyMessage();
            if (msg == null) continue; //we timed out
            switch (msg.MessageType)
            {
                //handle content types > -1 which are application specific
                case 1:
                    //handle messages from master or other agents here
                    Log.Info("AgentId {0} sent message type 1 with {1}", 
                        msg.FromId, msg.Content);

                    //this test/demo just sends the message back to the sender
                    Agent.Current.Send(new Message
                        {
                            FromId = Agent.Current.AgentId,
                            SessionId = Agent.Current.SessionId,
                            ToId = msg.FromId,
                            MessageType = 2,
                            Content = msg.Content.ToString() + " received"
                        });
                    break;

                //handle internal messages and unknown
                case SystemMessageTypes.Shutdown:
                    Log.Info("AgentId {0} sent shut down message", msg.FromId);
                    continueProcessing = false;
                    break;
                default:
                    Log.Info("AgentId {0} sent {1} with {2}", 
                        msg.FromId, msg.MessageType, msg.Content);
                    break;
            }
        }
        while (continueProcessing);
        Log.Info("work done");
    }
}

Distributed Parallel Computing for .NET with DuoVia.MpiVisor

I have not posted on my blog for some time. I’ve been busy. Work. Family. And DuoVia.Net. DuoVia Inc. has been my little corp-to-corp contracting vehicle which largely lays dormant while I’m employed in any other arrangement. I have no plans to change my current employment arrangement but I am transforming DuoVia into an open source services and tools for .NET organization.

It has taken a large number of weekends and evenings to pull it off, but I’m am now very excited to present DuoVia’s first three open source projects, new web site, and new vision. The site is nothing really to brag about. It is simple and clean. The vision is evolving. But the three very cool projects are something else.

DuoVia.MpiVisor
Don’t let anybody lie to you. Doing distributed parallel computing (DPC) on HPC or Hadoop or MPI can be very hard. Very complex. And for a advanced DPC systems that will not likely change.

But let’s face it. Many of the chores we need a distributed computing system for are small, down and dirty, one-off projects where something hard needs to be done once or perhaps occasionally. But the problem just doesn’t merit the hard work or the budget required to use an enterprise class system. So often as not the project just doesn’t get done or you solve the problem in a far less efficient and satisfying way.

You can get the “agent” library as a NuGet package. This will give you everything you need to write and debug your DPC application in Visual Studio without having to deploy it to your distributed processing servers or workstations. Once you’re ready to rock and roll, just get the code for the server at GitHub and run it on as many machines and you need.

That’s where DuoVia.MpiVisor comes in. Read more on the new web site www.duovia.net.

DuoVia.Net
Did you ever need to just write a simple interface, a couple of DTOs and a simple implementation of service, stand it up and use it without any configuration fuss? Don’t want to bother configuring WCF or spinning up a RESTful interface just to have to .NET apps exchange some data fast and easy? Look no further. Get the source code at GitHub or download the NuGet package today.

DuoVia.FuzzyStrings
Do you need to know if “Jensn” is probably equal to “Jensen”? Then the DuoVia.FuzzyStrings library is the perfect choice. You can download the source at GitHub or get the NuGet package.

Demo and Test Code on GitHub
If you want to learn how to use these libraries before I have time to fully document them and publish that on the site, please do go to GitHub and download the source. In each project you will find the source for a demo or test which shows just how easy these libraries are to use.

Apologies for the sales pitch, but I really do want you to visit the new web site and check it all out.

Of course I will continue to blog here about various and random .NET development topics. But I invite you to check out the new web site and more especially to give the new projects a try. You may find them useful and when you do, I hope to hear from you about your triumphs.