Sevenforge by Curtis Spencer

Meguro JavaScript Documentation

JavaScript Map/Reduce Functions

map

function map(key,value)

map gets passed a key and a value from the iterator. If you are iterating through a Tokyo Cabinet hash file, the key and value parameters are set to the key and value of the current entry in the .tch file. When you are iterating line by line through a text file, there is no meaningful key, so the key is empty and the value is the content of the line.

In the map step, call Meguro.emit or Meguro.set zero to N times, depending on the task at hand. Pure map tasks, in which emit or set is called at most once per unique key, are also possible (and encouraged!).
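A pure map task might look like the sketch below. It assumes text-file input where each line has the form id, tab, name; that line format is invented for illustration. The Meguro object defined at the top is a hypothetical stand-in so the snippet runs outside Meguro, and it should be dropped when the script runs under Meguro itself.

```javascript
// Hypothetical stand-in for the Meguro global, only so this sketch runs
// outside Meguro. It converts both arguments to strings, as emit is documented to do.
var emitted = [];
var Meguro = {
    emit: function (key, value) {
        emitted.push([String(key), String(value)]);
    }
};

// Pure map: each input line produces at most one emit, so no reduce is needed.
// Assumes lines look like "id<TAB>name" (an invented format).
function map(key, value) {
    var fields = value.split("\t");
    if (fields.length < 2) {
        return; // malformed line: emitting zero times is allowed
    }
    Meguro.emit(fields[0], fields[1].toUpperCase());
}

// Text-file input has no meaningful key, so the key is passed as empty.
map("", "42\tbackpack");
map("", "malformed line");
```

Because every id is emitted only once, there is nothing to combine afterwards, which is what makes the task a pure map.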

reduce

function reduce(key,values)

reduce gets passed a key and an array of values from the map step. In the reduce step, the values array should be converted to a single value. Inside reduce, call Meguro.save to save the key and its final value into the reducer's output file. If you call Meguro.save multiple times with the same key, the stored value is overwritten by the most recent call.
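As an illustration of collapsing the values array to a single value, here is a minimal sketch of a reduce that keeps the numeric maximum. The Meguro stub is a hypothetical stand-in that mimics the overwrite-on-repeat behaviour described above so the snippet can run outside Meguro; the key name and sample values are invented.

```javascript
// Hypothetical stand-in for Meguro.save: a later save for the same key
// replaces the earlier one, and both arguments are stored as strings.
var saved = {};
var Meguro = {
    save: function (key, value) {
        saved[String(key)] = String(value);
    }
};

// Collapse the values array to one value: here, the numeric maximum.
function reduce(key, values) {
    var max = -Infinity;
    for (var i = 0; i < values.length; i++) {
        var n = parseFloat(values[i]); // values arrive as strings
        if (n > max) {
            max = n;
        }
    }
    Meguro.save(key, max);
}

reduce("temperature", ["12.5", "30", "7"]);
```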

JavaScript Meguro Object Methods

Meguro.log

Meguro.log(str)

Logging is currently the best (and only) way to debug. Pass Meguro.log a string and it will log it to the shiny console. One tip: if you have something more complex than a string, pass JSON.stringify(yourObj) so you can see your object's structure in clean JSON.
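The JSON.stringify tip can be sketched as follows. The Meguro stub here is hypothetical and only records what would be logged, so the snippet can run outside Meguro; the record fields are invented for illustration.

```javascript
// Hypothetical stand-in for Meguro.log so the sketch runs outside Meguro.
var logged = [];
var Meguro = {
    log: function (str) {
        logged.push(String(str));
    }
};

function map(key, value) {
    var record = { key: key, length: value.length };
    // Meguro.log expects a string, so serialize the object first.
    Meguro.log(JSON.stringify(record));
}

map("", "hello world");
```

Without the JSON.stringify call, JavaScript's default string conversion of a plain object yields "[object Object]", which says nothing about its contents.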

Meguro.emit

Meguro.emit(key,value)

emit is the core of Map/Reduce. This method only works inside of map calls. It writes a key/value pair, and the value is appended to the values array that the reducer receives for that key. For example, the following code emits a count of 1 for every word in the value. If the word "backpack" were encountered in the mapper 5 times, the reduce function would receive a values parameter of ['1','1','1','1','1'].

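function map(key,value) {
    // Split the line on whitespace and emit a count of 1 per word
    var words = value.split(/\s+/);
    for(var i=0; i < words.length; i++) {
        // Leading or trailing whitespace produces empty strings; skip them
        if(words[i] !== "") {
            Meguro.emit(words[i],1);
        }
    }
}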

IMPORTANT: Meguro converts both parameters to strings, so don't expect the object to be serialized. If you want to get back an object in the reduce step, use JSON.stringify to serialize the object you pass in as the second parameter to emit, and then JSON.parse in your reduce step.
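The JSON round trip described above might look like this sketch. The Meguro stub is a hypothetical stand-in that groups emitted strings by key, and the "user bytes" line format, the field names, and the driver loop at the bottom are all invented for illustration; under Meguro itself only the map and reduce functions would be needed.

```javascript
// Hypothetical stand-in for Meguro: emit groups values per key as strings,
// save records the final string per key.
var groups = {};
var saved = {};
var Meguro = {
    emit: function (key, value) {
        key = String(key);
        (groups[key] = groups[key] || []).push(String(value));
    },
    save: function (key, value) {
        saved[String(key)] = String(value);
    }
};

function map(key, value) {
    // value is assumed to look like "user bytes" (an invented line format)
    var parts = value.split(" ");
    var record = { bytes: parseInt(parts[1], 10), hits: 1 };
    // Serialize explicitly; emit would otherwise stringify the object itself
    Meguro.emit(parts[0], JSON.stringify(record));
}

function reduce(key, values) {
    var total = { bytes: 0, hits: 0 };
    for (var i = 0; i < values.length; i++) {
        var record = JSON.parse(values[i]); // recover the object
        total.bytes += record.bytes;
        total.hits += record.hits;
    }
    Meguro.save(key, JSON.stringify(total));
}

// Local driver standing in for the map and reduce phases.
map("", "alice 100");
map("", "alice 250");
map("", "bob 7");
for (var k in groups) {
    reduce(k, groups[k]);
}
```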

Meguro.save

Meguro.save(key,value)

save is used inside the reduce function. It is the final output step: it tells Meguro to save the key/value pair to the reducer's output Tokyo Cabinet file. If you are doing a pure map task (one that has only one output per key), you do not need to write a reduce function or use save. However, if you emit more than once for a given key K, you will want to reduce the multiple values of K to a single value in the reduce step, and that is when you use save. Both parameters of save are converted to strings for the output step. If you want to save complex objects, the common approach is to serialize them with JSON.stringify.

function reduce(key,values) {
    // Values arrive as strings, so parse each one (base 10) before summing
    var sum = 0;
    for(var i=0; i < values.length; i++) {
        sum += parseInt(values[i], 10);
    }
    Meguro.save(key,sum);
}
