First, let us assume that distributed computing is, generally, the area of developing and running software designed to process large numbers of long-running tasks on servers located as close as practical to the data being processed.
Second, let us agree, if only for this discussion, that distributed computing is NOT the collection of back-end services that support the service-oriented architecture (SOA) behind your web and mobile apps.
Third, let us presume that you are NOT already blessed with a job where you write distributed computing software.
How, then, can distributed computing be relevant to you? And how can you take advantage of it without becoming an expert in one of the several well-known distributed computing platforms on the market today?
Both are excellent questions. Thank you for asking. Let’s try a practical approach.
Imagine you are at your desk and your boss comes over and asks how fast your web servers respond to customers. Naturally, your first instinct is to write this program to find out:
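// Requires: using System; using System.Diagnostics; using System.Threading.Tasks;
// TestUrls is a collection of ten URL strings defined elsewhere in the demo.
private static void DoTenUrlsInParallel()
{
    Console.WriteLine("Do 10 urls in parallel");
    var sw = Stopwatch.StartNew();
    ISpeedTest test = new SpeedTest();
    // Request every URL concurrently from this machine and print each timing:
    // r = response time (ms), s = stream read time (ms), b = response length, u = URL.
    Parallel.ForEach(TestUrls, (url) =>
    {
        var result = test.GetSpeed(url);
        Console.WriteLine("r:{0}, s:{1}, b:{2}, u:{3}",
            result.ResponseTimeMs, result.ReadStreamTimeMs,
            result.ResponseLength, result.Url);
    });
    sw.Stop();
    Console.WriteLine("Total elapsed time: {0} ms",
        sw.ElapsedMilliseconds);
    Console.WriteLine(string.Empty);
}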
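The program above relies on ISpeedTest, SpeedTest and TestUrls, which the post does not show; the real versions are in the demo source. For readers who want something runnable, here is a minimal hypothetical sketch of the service. It assumes GetSpeed times two phases with a Stopwatch: the wait for the response (ResponseTimeMs) and the read of the response body (ReadStreamTimeMs). The property names come from the code above; everything else, including the use of WebRequest, is an assumption.

```csharp
using System;
using System.Diagnostics;
using System.Net;

// Hypothetical result type: property names mirror those printed by the demo,
// but the real DuoVia demo types may differ.
public class SpeedResult
{
    public string Url { get; set; }
    public string MachineName { get; set; }      // machine that ran the test
    public long ResponseTimeMs { get; set; }     // time until GetResponse returns
    public long ReadStreamTimeMs { get; set; }   // time to read the response body
    public long ResponseLength { get; set; }     // number of body bytes read
}

// Hypothetical service contract; the distributed client proxies an interface like this.
public interface ISpeedTest
{
    SpeedResult GetSpeed(string url);
}

public class SpeedTest : ISpeedTest
{
    public SpeedResult GetSpeed(string url)
    {
        var sw = Stopwatch.StartNew();
        var request = WebRequest.Create(url);
        using (var response = request.GetResponse())
        {
            // Phase 1: for HTTP, GetResponse returns once the response headers arrive.
            long responseMs = sw.ElapsedMilliseconds;
            sw.Restart();

            // Phase 2: read (and count) the whole body.
            long length = 0;
            using (var stream = response.GetResponseStream())
            {
                var buffer = new byte[8192];
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    length += read;
                }
            }

            return new SpeedResult
            {
                Url = url,
                MachineName = Environment.MachineName,
                ResponseTimeMs = responseMs,
                ReadStreamTimeMs = sw.ElapsedMilliseconds,
                ResponseLength = length
            };
        }
    }
}
```

WebRequest is marked obsolete on current .NET, where HttpClient with HttpCompletionOption.ResponseHeadersRead gives the same split between waiting for headers and reading the body. WebRequest is used here because it also accepts file:// URIs, which makes the sketch easy to try without a network.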
You take him the results and he says, “But isn’t this from your desk? I want to know what these numbers look like from all around the world: East and West US, North and West Europe, and East and Southeast Asia. And I want a regular stream of these numbers fed into a spreadsheet for me every day.”
Do you say, “No problem”? You do if you have a Windows Azure account and know about DuoVia.Net.Distributed, the distributed task parallel library from DuoVia. You go back to your desk and modify the code to look like this:
// Additionally requires: using System.Linq; using System.Net; using System.Net.Sockets;
// plus a reference to the DuoVia.Net.Distributed library.
private static void DoTenUrlsThreeTimesEachAroundTheWorldInParallel(bool runLocal = false)
{
    var serverEndpoints = new IPEndPoint[0];
    if (runLocal)
    {
        // A single distributed host on this machine, useful for testing.
        serverEndpoints = new IPEndPoint[] { new IPEndPoint(IPAddress.Parse("127.0.0.1"), 9096) };
    }
    else
    {
        // These server names are placeholders; to run this test, use your own.
        var servers = new string[]
        {
            "myaz-westus.cloudapp.net",
            "myaz-eastus.cloudapp.net",
            "myaz-northeu.cloudapp.net",
            "myaz-westeu.cloudapp.net",
            "myaz-soeastasia.cloudapp.net",
            "myaz-eastasia.cloudapp.net"
        };
        // Resolve each host name to its first IPv4 address on port 9096.
        serverEndpoints = new IPEndPoint[servers.Length];
        for (int i = 0; i < servers.Length; i++)
        {
            var host = Dns.GetHostAddresses(servers[i]);
            var ip = (from n in host
                      where n.AddressFamily == AddressFamily.InterNetwork
                      select n).First();
            serverEndpoints[i] = new IPEndPoint(ip, 9096);
        }
    }
    float subscriptionRate = 2.0f; //oversubscribed
    int logPollingIntervalSeconds = 2;
    using (DistributedClient<ISpeedTest> client =
        Distributor.Connect<ISpeedTest>(typeof(SpeedTest),
            subscriptionRate,
            logPollingIntervalSeconds,
            LogLevel.Debug,
            serverEndpoints))
    {
        // Run the full URL list three times; the client distributes the calls across the servers.
        for (int i = 0; i < 3; i++)
        {
            var sw = Stopwatch.StartNew();
            Console.WriteLine("round:{0}", i + 1);
            var loopResult = client.ForEach(TestUrls, (url, proxy) => proxy.GetSpeed(url));
            foreach (var result in loopResult.Results)
            {
                // MachineName identifies the server (and so the region) that ran this call.
                Console.WriteLine("r:{0}, s:{1}, b:{2}, on: {3}, u:{4}",
                    result.ResponseTimeMs, result.ReadStreamTimeMs,
                    result.ResponseLength, result.MachineName, result.Url);
            }
            sw.Stop();
            Console.WriteLine("Total elapsed time: {0} ms", sw.ElapsedMilliseconds);
            Console.WriteLine(string.Empty);
        }
    }
}
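The boss also asked for a daily feed into a spreadsheet, which the demo code does not cover. One minimal approach, sketched here as an assumption rather than as part of DuoVia.Net.Distributed, is to append each round's results to a CSV file that Excel or any other spreadsheet can open. SpeedRow and SpeedCsv are hypothetical names; SpeedRow holds the same fields the demo prints, plus a UTC timestamp for the run.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical flat row: the fields the demo prints, plus when the run happened.
public class SpeedRow
{
    public DateTime RunUtc;
    public string MachineName;
    public string Url;
    public long ResponseTimeMs;
    public long ReadStreamTimeMs;
    public long ResponseLength;
}

public static class SpeedCsv
{
    public const string Header = "run_utc,machine,url,response_ms,read_stream_ms,bytes";

    // Quote a field that contains a comma, quote or line break, doubling any quotes (RFC 4180 style).
    static string Escape(string value)
    {
        if (value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Format one row with the invariant culture so numbers and dates parse the same everywhere.
    public static string ToLine(SpeedRow r)
    {
        return string.Join(",",
            r.RunUtc.ToString("o", CultureInfo.InvariantCulture),
            Escape(r.MachineName),
            Escape(r.Url),
            r.ResponseTimeMs.ToString(CultureInfo.InvariantCulture),
            r.ReadStreamTimeMs.ToString(CultureInfo.InvariantCulture),
            r.ResponseLength.ToString(CultureInfo.InvariantCulture));
    }

    // Append rows to a CSV file, writing the header only when the file does not exist yet.
    public static void Append(string path, IEnumerable<SpeedRow> rows)
    {
        IEnumerable<string> lines = rows.Select(ToLine);
        if (!File.Exists(path)) lines = new[] { Header }.Concat(lines);
        File.AppendAllLines(path, lines);
    }
}
```

Calling SpeedCsv.Append after each round, with a date in the file name, yields one file per day; scheduling the run itself (for example with Windows Task Scheduler) is outside this sketch.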
And you and your boss are happy.
Sometimes distributed computing is more about location and proximity to data or infrastructure than about processing massive amounts of data in as little time as possible.
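Because each distributed result records the MachineName of the server that produced it, turning the raw output into a per-location comparison is a simple group-by. The sketch below assumes one server per region, so the machine name stands in for the location; Sample, MachineSummary and LocationReport are hypothetical names, not part of the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal result: where the test ran and how long the response took.
public class Sample
{
    public string MachineName;
    public long ResponseTimeMs;
}

// Per-machine statistics for one batch of results.
public class MachineSummary
{
    public string MachineName;
    public int Count;
    public double AverageResponseMs;
    public long MaxResponseMs;
}

public static class LocationReport
{
    // Group results by the machine (and therefore the region) that produced them,
    // then order the regions from fastest to slowest average response time.
    public static List<MachineSummary> Summarize(IEnumerable<Sample> samples)
    {
        return samples
            .GroupBy(s => s.MachineName)
            .Select(g => new MachineSummary
            {
                MachineName = g.Key,
                Count = g.Count(),
                AverageResponseMs = g.Average(s => s.ResponseTimeMs),
                MaxResponseMs = g.Max(s => s.ResponseTimeMs)
            })
            .OrderBy(m => m.AverageResponseMs)
            .ToList();
    }
}
```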
You can find the full demo source code here.