Wednesday, June 20, 2012

MapReduce illustrated

This is a great illustration of the MapReduce concept for anyone trying to understand the algorithm intuitively.  I saw it at a Hadoop talk by Salesforce.com.



Basically, it's a laundry operation that sorts socks by color first and then washes them with "like colors", of course.  :)  The sorting tables are the Map step, and the washers carry out the Reduce step.  One important concept is that the Map step usually runs on generic processors that don't mind working on any subset of the data.  The Reduce step, on the other hand, is usually key-specific: all items with the same key go to the same reducer, which in this example means each washer handles only one color, e.g. red washers run only with red socks.  The whole operation scales horizontally in a near-linear fashion.  To handle a larger volume, just add more processing power (in this case, more people and equipment, i.e. tables or washers).
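To make the laundry analogy concrete, here is a minimal single-process sketch in Python.  It is not Hadoop's API; the names `map_sock`, `partition`, `reduce_wash` and `map_reduce` are hypothetical, and the sock data is made up for illustration.  It shows the three pieces the picture implies: a generic map function that works on any chunk of input, a partitioning ("shuffle") step that sends every key to exactly one reducer, and a reduce function that only ever sees one color at a time.

```python
from collections import defaultdict

# Hypothetical input: a pile of socks, each tagged with its color.
socks = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "white"},
    {"id": 3, "color": "red"},
    {"id": 4, "color": "black"},
    {"id": 5, "color": "white"},
]

def map_sock(sock):
    """Map step (a sorting table): generic, works on any sock,
    and emits a (key, value) pair keyed by color."""
    yield sock["color"], sock["id"]

def partition(key, num_reducers):
    """Send every key to exactly one reducer, so all socks of one color
    end up at the same washer. Python's built-in str hash is randomized
    per process, so a stable sum of character codes is used instead."""
    return sum(ord(c) for c in key) % num_reducers

def reduce_wash(color, sock_ids):
    """Reduce step (a washer): only sees the values for a single key."""
    return color, sorted(sock_ids)

def map_reduce(records, num_mappers=2, num_reducers=3):
    # Split the input into chunks; any mapper can take any chunk.
    chunks = [records[i::num_mappers] for i in range(num_mappers)]
    # Shuffle: each reducer collects key -> list of values for its keys.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for chunk in chunks:
        for record in chunk:
            for key, value in map_sock(record):
                reducer_inputs[partition(key, num_reducers)][key].append(value)
    # Each reducer processes its own keys independently of the others.
    results = {}
    for groups in reducer_inputs:
        for key in sorted(groups):
            color, ids = reduce_wash(key, groups[key])
            results[color] = ids
    return results

print(map_reduce(socks))
print(map_reduce(socks, num_mappers=4, num_reducers=5))
```

Both calls group the same socks by color (red: 1 and 3, white: 2 and 5, black: 4).  Changing `num_mappers` or `num_reducers` only changes how the work is spread out, not the result, which is the property that lets you scale by adding more tables and washers.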

Here's the original talk; the MapReduce part was presented by Jed Crosby.
