Category Archive

The following is a list of all entries from the Parallelism category.

Cloudera building a business around open source Map Reduce

The heavy-hitting ex-executives behind start-up Cloudera are banking on a business based around Hadoop, the open source Map Reduce implementation, with a distribution capable of running on Amazon’s EC2.  Google is credited with popularising (inventing) Map Reduce and has been tuning its own implementation for many years.  It gave insights into the origin and future research direction of the technique in a round table video last year.

Increasingly, companies need to make sense of terabytes or even petabytes of data.  This information is stored across many machines on many disks, and distributed algorithms are needed to sift through it in any reasonable time.  This is where Map Reduce comes in.
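To make the idea concrete, here is a minimal single-process sketch of the Map Reduce pattern in Python, counting words across a handful of documents.  It is not Hadoop's or Google's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real framework would run the map and reduce steps on many machines and handle the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (a framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on an iterable of strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each call to map_phase only looks at one document and each call to reduce_phase only looks at one key, both steps can in principle be spread across as many machines as there are documents or keys.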

Interestingly, Microsoft has taken a step back from this direction by deciding that its SDS offering should support standard ‘relational’ features, in effect turning the product into a hosted SQL Server cloud.

It has, however, been active in this research field.  It released its functional programming language F#, and it runs its ad serving on Dryad – a distributed execution engine.  DryadLINQ combines the power of this engine with the simplicity of LINQ by creating a SQL-like execution plan for distributed processing, very cool!

Large scale distributed processing software typically runs on many low grade Linux servers running open source software, so that licensing costs are kept low.  However, with the army of MS developers out there, companies are springing up to provide software that makes the most of idle cycles on Windows boxes around the network.  Manjrasoft, a recent graduate from Melbourne University’s GridBus laboratory, has released an alpha of its Aneka software – a .NET Map Reduce implementation.


The future is parallel

Although most of us are only running dual core computers, if you have a spare £2000 you can pick up this G71 quad core notebook from Asus, and if you believe the hype we will be hitting 80 cores by 2014.

While hardware plays catch-up with Moore’s law, Microsoft’s next generation .NET 4 platform will have a native concurrency run-time.  This is already available in the form of the VS 2010 CTP, which can be configured to run with Windows 2008 Hyper-V to test code running on multiple virtualized cores.

Parallel programming brings a host of problems for developers when it comes to debugging and diagnosing faults.  It is great to see Microsoft helping to ease this with initiatives such as PLINQ, and the interesting (albeit experimental) Transactional Memory group, which looks at how code might be modelled to run as efficiently as database logic, for example.

F# will ship with VS 2010.  It is a powerful functional language that is immutable by default, compiles to .NET IL and supports Erlang-style message passing.

MPI.NET library enabling scientific cloud computing

The Message Passing Interface (MPI) is a standard that supports writing software that runs across many machines.  It has been used by the scientific community for high performance numerical libraries, typically written in C or FORTRAN.

Indiana University recently released MPI.NET, which requires the Microsoft Computer Server Cluster SDK on Windows 2003, or is natively supported on Windows Server 2008 HPC.

Cloud computing is shifting the way software is developed, so it probably won’t be long before native MPI implementations are supported on the respective platforms, such as Amazon’s EC2 with Python, or natively on Microsoft’s forthcoming clustered offering.

Most software leveraging MPI, such as PETSc, is programmed in C.  Object-oriented libraries have proved popular with programmers, and more recently functional languages such as Microsoft’s new F# (based on OCaml) have taken off as an alternative for mathematical libraries, as seen in this blog post by Matthew Podwysocki.

High performance Pub/Sub .NET libraries

In my search for a high-performance publish/subscribe library I again stumbled across the MS Robotics CCR (Concurrency and Coordination Runtime).  The April CTP was just released and includes a number of new features, including LINQ support.  The CCR remains stable and relatively unchanged, but the new C# language features of lambdas and inline iterators open up new possibilities for writing streamlined async code, and the notion of separate Ports for success and failure scenarios reduces problems with handling exceptions across threads.
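The idea of separate ports for success and failure can be sketched outside .NET.  The Python below is only an analogy and not the CCR API: it uses standard library queues as stand-in "ports", and the names run_async and demo are hypothetical.  A worker thread posts its result to one queue or its exception to the other, so the caller never has to catch an exception thrown on another thread.

```python
import queue
import threading


def run_async(task, success_port, failure_port):
    """Run task on a worker thread and post the outcome to one of two ports."""
    def worker():
        try:
            success_port.put(task())
        except Exception as exc:  # the error travels as a message instead
            failure_port.put(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def demo():
    """Start one task that succeeds and one that fails, then read both ports."""
    success_port = queue.Queue()
    failure_port = queue.Queue()
    threads = [
        run_async(lambda: 6 * 7, success_port, failure_port),
        run_async(lambda: 1 / 0, success_port, failure_port),
    ]
    for thread in threads:
        thread.join()
    return success_port.get_nowait(), failure_port.get_nowait()
```

The design choice worth noting is that failure becomes ordinary data: whoever reads the failure port decides how to react, on whichever thread suits them.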

Developers have typically steered clear of the CCR given it is only bundled with MS Robotics Studio (opting for alternatives such as the new TPL concurrency library).  However, the 200k CCR library can be shipped with your software, at worst costing US$2 per unit.

Some great examples include using the CCR with ASP.NET to handle async requests, showcasing iterators with the yield keyword.  Another is a web crawler by Angel ‘Java’ Lopez, which also leverages DSS (Decentralized Software Services).

An alternative to this in-memory concurrent processing is publish/subscribe messaging over a reliable transport such as MSMQ.  Some libraries include:

Async or concurrent programming is getting easier, and importantly more reliable.  Expect more posts and samples in the future.

Concurrent programming lesson with Erlang

I recently watched a very good presentation by Joe Armstrong – the inventor of Erlang, the functional language originally developed for Ericsson telecom infrastructure products.  He is a very eloquent speaker, taking the audience through the history of computing and explaining that the Moore’s law trajectory may in fact be restored if Network On Chip architectures take off in a big way.

I really like the messaging framework approach to parallelism.  Immutable types are at the heart of this approach – a native construct in functional languages such as F# on the .NET platform.  Joe Duffy has written an article on achieving a degree of immutability in C#.
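As a rough illustration of why immutability and messaging fit together, here is a small Python sketch (not Erlang, F# or any of the libraries mentioned in these posts).  The Deposit message type, the STOP sentinel and the account_actor function are all invented for the example.  The only state that changes lives inside one thread, and everything that crosses threads is a frozen message that cannot be modified after it is sent.

```python
import queue
import threading
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Deposit:
    """An immutable message: fields cannot be reassigned after creation."""
    amount: int


STOP = object()  # sentinel telling the actor to report its state and exit


def account_actor(mailbox, replies):
    """Own the balance privately and change it only in response to messages."""
    balance = 0
    while True:
        message = mailbox.get()
        if message is STOP:
            replies.put(balance)
            return
        balance += message.amount


def run(amounts):
    """Send one Deposit per amount to the actor and return the final balance."""
    mailbox = queue.Queue()
    replies = queue.Queue()
    actor = threading.Thread(target=account_actor, args=(mailbox, replies))
    actor.start()
    for amount in amounts:
        mailbox.put(Deposit(amount))
    mailbox.put(STOP)
    actor.join()
    return replies.get()
```

No locks appear anywhere in the sketch: the sender cannot alter a Deposit once it is in the mailbox (an attempt raises FrozenInstanceError), and nobody but the actor can touch the balance.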

I dug a little deeper and came across a great open source library, ‘Retlang’, written by Mike Rettig (its name borrowing heavily from Erlang).  It currently uses native .NET 2 constructs, but I’m sure it would benefit from running on top of Microsoft’s new parallel initiatives, especially as they mature in the future.