Fluent.Interface


Cloudera building a business around open source Map Reduce

The heavy-hitting ex-executives behind startup Cloudera are banking on a business built around Hadoop, the open source Map Reduce implementation, with a distribution capable of running on Amazon’s EC2. Google is credited with inventing and popularising Map Reduce and has been tuning its own implementation for many years. It gave insights into the technique’s origins and future research directions in a round table video last year.

Increasingly, companies need to make sense of terabytes or even petabytes of data. This information is stored across many machines on many disks, and sifting through it in any reasonable time requires distributed algorithms. This is where Map Reduce comes in.
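
To make the idea concrete, here is a minimal single-machine sketch of the Map Reduce pattern in Python, using the classic word count. It is an illustration only: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this post and are not Hadoop's or Google's API. A real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents.

    On a cluster each document (or block of data) would be mapped on the
    machine that stores it; here everything runs in one process.
    """
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
```

The key point is that each map call looks at only one document and each reduce call looks at only one key, so both steps can be spread over as many machines as the data requires.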

Interestingly, Microsoft has taken a step back from this direction by deciding that its SDS offering should support standard ‘relational’ features, in effect turning the product into a hosted SQL Server cloud.

Microsoft has, however, been active in this research field. It released its functional programming language F#, and it runs its ad serving on Dryad, a distributed execution engine. DryadLINQ combines the power of this engine with the simplicity of LINQ by creating a SQL-like execution plan for distributed processing. Very cool!

Large-scale distributed processing software typically runs on many low-grade Linux servers running open source software, so that licensing costs are kept low. However, with the army of MS developers out there, companies are springing up to provide software that makes the most of idle cycles on Windows boxes around the network. Manjrasoft, a recent graduate of Melbourne University’s GridBus laboratory, has released an alpha of its Aneka software, a .NET Map Reduce implementation.