<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: More Erlang Fun: Distributed, Fault Tolerant MapReduce</title>
	<atom:link href="http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/feed/" rel="self" type="application/rss+xml" />
	<link>http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/</link>
	<description>Adventures in Open Source Erlang</description>
	<lastBuildDate>Sun, 06 Dec 2009 10:29:08 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Tim Dysinger</title>
		<link>http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/comment-page-1/#comment-269450</link>
		<dc:creator>Tim Dysinger</dc:creator>
		<pubDate>Mon, 06 Oct 2008 19:47:29 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/#comment-269450</guid>
		<description>The other things that are needed are:

1- ability to add new nodes during the (sometimes) long run - &quot;alternates&quot; to step in for failed nodes.

2- ability to &quot;combine&quot; (local mini-reduce - post map/pre-partition)

3- ability to partition (pre-reduce function that determines reduce grouping)

4- doing the reduction on remote nodes (not all on the master)</description>
		<content:encoded><![CDATA[<p>The other things that are needed are:</p>
<p>1- ability to add new nodes during the (sometimes) long run &#8211; &#8220;alternates&#8221; to step in for failed nodes.</p>
<p>2- ability to &#8220;combine&#8221; (local mini-reduce &#8211; post map/pre-partition)</p>
<p>3- ability to partition (pre-reduce function that determines reduce grouping)</p>
<p>4- doing the reduction on remote nodes (not all on the master)</p>
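<p>A minimal sketch of the combine and partition steps listed above, assuming the map step emits {Key, Value} pairs. The module mr_combine and its functions combine/2 and partition/2 are hypothetical names, not part of the mapreduce module from the post.</p>
<pre>
```erlang
%% Hypothetical helpers (not part of the mapreduce module from the post).
%% Assumes map output is a list of {Key, Value} pairs.
-module(mr_combine).
-export([combine/2, partition/2]).

%% combine(Combine, Pairs): local mini-reduce run on the mapping node.
%% Pairs with the same Key are folded with Combine(Value, AccValue),
%% so at most one pair per key is sent back to the master.
combine(Combine, Pairs) ->
    Dict = lists:foldl(
             fun({K, V}, D) ->
                     case dict:find(K, D) of
                         {ok, Acc} -> dict:store(K, Combine(V, Acc), D);
                         error     -> dict:store(K, V, D)
                     end
             end, dict:new(), Pairs),
    dict:to_list(Dict).

%% partition(Pairs, N): split pairs into N reduce groups by hashing the
%% key, so every value for a given key ends up in the same group.
%% Returns a list of N lists, ordered by group index 0..N-1.
partition(Pairs, N) when is_integer(N), N > 0 ->
    Empty = dict:from_list([{I, []} || I <- lists:seq(0, N - 1)]),
    Dict = lists:foldl(
             fun({K, _} = Pair, D) ->
                     %% phash2/2 returns an integer in 0..N-1
                     dict:append(erlang:phash2(K, N), Pair, D)
             end, Empty, Pairs),
    [Group || {_Index, Group} <- lists:keysort(1, dict:to_list(Dict))].
```
</pre>
<p>Running combine/2 on each mapping node before its results are sent back means the master receives one pair per key from each node rather than one per input element. partition/2 hashes keys with erlang:phash2/2, so all values for a key land in the same group and the groups can then be reduced independently, for example on different nodes.</p>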
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Dysinger</title>
		<link>http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/comment-page-1/#comment-267483</link>
		<dc:creator>Tim Dysinger</dc:creator>
		<pubDate>Sat, 04 Oct 2008 15:06:30 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/#comment-267483</guid>
		<description>Here&#039;s the Hadoop word count example ported to Erlang as an EUnit test for the mapreduce module (sorry for the formatting, Yariv; maybe you can fix it):

%%% Port of -&gt; http://wiki.apache.org/hadoop/WordCount
word_count_test() -&gt;
    Words = [the, quick, brown, fox, jumped, over, the, lazy, dog],
    Result =
        mapreduce(fun(X) -&gt; {X, 1} end,
                  %% add each emitted count to the running total for Word
                  fun({Word, Count}, Dict) -&gt;
                          dict:update_counter(Word, Count, Dict)
                  end,
                  dict:new(),
                  Words,
                  [node()]),
    [ ?assertEqual(case Word of the -&gt; 2; _ -&gt; 1 end,
                   dict:fetch(Word, Result)) &#124;&#124; Word &lt;- Words ].</description>
		<content:encoded><![CDATA[<p>Here&#8217;s the Hadoop word count example ported to Erlang as an EUnit test for the mapreduce module (sorry for the formatting, Yariv; maybe you can fix it):</p>
<pre>%%% Port of -&gt; <a href="http://wiki.apache.org/hadoop/WordCount" rel="nofollow">http://wiki.apache.org/hadoop/WordCount</a>
word_count_test() -&gt;
    Words = [the, quick, brown, fox, jumped, over, the, lazy, dog],
    Result =
        mapreduce(fun(X) -&gt; {X, 1} end,
                  %% add each emitted count to the running total for Word
                  fun({Word, Count}, Dict) -&gt;
                          dict:update_counter(Word, Count, Dict)
                  end,
                  dict:new(),
                  Words,
                  [node()]),
    [ ?assertEqual(case Word of the -&gt; 2; _ -&gt; 1 end,
                   dict:fetch(Word, Result)) || Word &lt;- Words ].</pre>
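<p>The Hadoop WordCount example starts from lines of text rather than a ready-made list of atoms. A minimal sketch of that tokenizing step, assuming plain-string input and OTP 20 or later (for string:lowercase/1 and string:lexemes/2); wc_input:words/1 is a hypothetical helper, not part of the mapreduce module.</p>
<pre>
```erlang
%% Hypothetical input helper (not part of the mapreduce module): split
%% lines of text into lowercase words, like the tokenizing step of the
%% Hadoop WordCount example. Requires OTP 20+ for string:lowercase/1
%% and string:lexemes/2.
-module(wc_input).
-export([words/1]).

words(Lines) ->
    %% tokenize each line on spaces and tabs, then flatten one level
    lists:append([string:lexemes(string:lowercase(Line), " \t")
                  || Line <- Lines]).
```
</pre>
<p>The result is a list of strings, which works as dict keys just as well as atoms; the expected counts in the test would then be keyed on "the" instead of the.</p>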
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Dysinger</title>
		<link>http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/comment-page-1/#comment-267430</link>
		<dc:creator>Tim Dysinger</dc:creator>
		<pubDate>Sat, 04 Oct 2008 13:51:52 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/#comment-267430</guid>
		<description>I made a small change (the monitor_node call needs a boolean argument) and added some EUnit tests:


-module(mapreduce).
-export([mapreduce/5]).

-ifdef(TEST).
-include_lib(&quot;eunit/include/eunit.hrl&quot;).
-endif.

mapreduce(F1, F2, Acc, L, Nodes) -&gt;
    %% map F1 to the elements of L on nodes from Nodes
    Results = do_map(F1, L, Nodes),

    %% collect the results, retrying in the event of failures
    Vals = collect(F1, Results, Nodes),

    %% perform the reduce operation
    lists:foldl(F2, Acc, Vals).


%% exit if we ran out of good nodes
do_map(_F1, _L, []) -&gt; exit(no_nodes);
do_map(F1, L, Nodes) -&gt;
    Parent = self(),

    %% apply F1 to values of L on remote nodes in a rotating fashion
    {Results, _Nodes1} =
        lists:foldl(
          fun(X, {Acc, [Node &#124; Rest]}) -&gt;
                  erlang:monitor_node(Node, true),
                  Fun = fun() -&gt; Parent ! {ok, self(), (catch F1(X))} end,
                  Pid = spawn(Node, Fun),
                  {[{Pid, X} &#124; Acc], Rest ++ [Node]}
          end, {[], Nodes}, L),
    Results.

collect(F1, Results, Nodes) -&gt;
    collect(F1, Results, Nodes, []).
collect(F1, Results, Nodes, Acc) -&gt;
    {Successes, Failures, RemainingNodes} =
        lists:foldl(
          fun({Pid, X}, {Successes1, Failures1, Nodes1}) -&gt;
                  Node = node(Pid),
                  receive
                      {ok, Pid, Val} -&gt;
                          {[Val &#124; Successes1], Failures1, Nodes1};

                      %% we may receive this message because of call to
                      %% monitor_node()
                      {nodedown, Node} -&gt;
                          {Successes1, [X &#124; Failures1], Nodes1 -- [Node]}
                  end
          end, {[], [], Nodes}, Results),

    if Failures =/= [] -&gt;
            %% retry the failed computations on the remaining nodes
            %% and add the results to the current list of successes
            Results2 = do_map(F1, Failures, RemainingNodes),
            collect(F1, Results2, RemainingNodes, Successes ++ Acc);
       true -&gt;
            Successes ++ Acc
    end.

-ifdef(TEST).

pass_through_test() -&gt;
    ?assertEqual([one, two, three],
                 mapreduce(fun(X) -&gt; X end,
                           fun(X, Acc) -&gt; lists:append(Acc, [X]) end,
                           [],
                           [one, two, three],
                           [node()])).

multiply_by_2_no_reduce_test() -&gt;
    ?assertEqual([2, 4, 6],
                 mapreduce(fun(X) -&gt; X * 2 end,
                           fun(X, Acc) -&gt; lists:append(Acc, [X]) end,
                           [],
                           [1, 2, 3],
                           [node()])).

multiply_by_2_and_reduce_test() -&gt;
    ?assertEqual([2, 4, 6],
                 mapreduce(fun(X) -&gt; X * 2 end,
                           fun(X, Acc) -&gt;
                                   case lists:member(X, Acc) of
                                       false -&gt; lists:append(Acc, [X]);
                                       _ -&gt; Acc
                                   end
                           end,
                           [],
                           [1, 2, 2, 1, 3],
                           [node()])).

-endif.
</description>
		<content:encoded><![CDATA[<p>I made a small change (the monitor_node call needs a boolean argument) and added some EUnit tests:</p>
<pre>-module(mapreduce).
-export([mapreduce/5]).

-ifdef(TEST).
-include_lib("eunit/include/eunit.hrl").
-endif.

mapreduce(F1, F2, Acc, L, Nodes) -&gt;
    %% map F1 to the elements of L on nodes from Nodes
    Results = do_map(F1, L, Nodes),

    %% collect the results, retrying in the event of failures
    Vals = collect(F1, Results, Nodes),

    %% perform the reduce operation
    lists:foldl(F2, Acc, Vals).

%% exit if we ran out of good nodes
do_map(_F1, _L, []) -&gt; exit(no_nodes);
do_map(F1, L, Nodes) -&gt;
    Parent = self(),

    %% apply F1 to values of L on remote nodes in a rotating fashion
    {Results, _Nodes1} =
        lists:foldl(
          fun(X, {Acc, [Node | Rest]}) -&gt;
                  erlang:monitor_node(Node, true),
                  Fun = fun() -&gt; Parent ! {ok, self(), (catch F1(X))} end,
                  Pid = spawn(Node, Fun),
                  {[{Pid, X} | Acc], Rest ++ [Node]}
          end, {[], Nodes}, L),
    Results.

collect(F1, Results, Nodes) -&gt;
    collect(F1, Results, Nodes, []).
collect(F1, Results, Nodes, Acc) -&gt;
    {Successes, Failures, RemainingNodes} =
        lists:foldl(
          fun({Pid, X}, {Successes1, Failures1, Nodes1}) -&gt;
                  Node = node(Pid),
                  receive
                      {ok, Pid, Val} -&gt;
                          {[Val | Successes1], Failures1, Nodes1};

                      %% we may receive this message because of call to
                      %% monitor_node()
                      {nodedown, Node} -&gt;
                          {Successes1, [X | Failures1], Nodes1 -- [Node]}
                  end
          end, {[], [], Nodes}, Results),

    if Failures =/= [] -&gt;
            %% retry the failed computations on the remaining nodes
            %% and add the results to the current list of successes
            Results2 = do_map(F1, Failures, RemainingNodes),
            collect(F1, Results2, RemainingNodes, Successes ++ Acc);
       true -&gt;
            Successes ++ Acc
    end.

-ifdef(TEST).

pass_through_test() -&gt;
    ?assertEqual([one, two, three],
                 mapreduce(fun(X) -&gt; X end,
                           fun(X, Acc) -&gt; lists:append(Acc, [X]) end,
                           [],
                           [one, two, three],
                           [node()])).

multiply_by_2_no_reduce_test() -&gt;
    ?assertEqual([2, 4, 6],
                 mapreduce(fun(X) -&gt; X * 2 end,
                           fun(X, Acc) -&gt; lists:append(Acc, [X]) end,
                           [],
                           [1, 2, 3],
                           [node()])).

multiply_by_2_and_reduce_test() -&gt;
    ?assertEqual([2, 4, 6],
                 mapreduce(fun(X) -&gt; X * 2 end,
                           fun(X, Acc) -&gt;
                                   case lists:member(X, Acc) of
                                       false -&gt; lists:append(Acc, [X]);
                                       _ -&gt; Acc
                                   end
                           end,
                           [],
                           [1, 2, 2, 1, 3],
                           [node()])).

-endif.</pre>
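<p>The failure-detection step can alternatively be sketched with per-process monitors (erlang:monitor/2) in place of monitor_node/2. This is a swapped-in technique, not the code from the post: a 'DOWN' message arrives if the worker dies or its node becomes unreachable (reason noconnection), and erlang:demonitor/2 with the flush option cleans up after a successful reply so no stale monitor or message is left behind. The module mr_monitor and run_on/3 are hypothetical names; as with the original, the module must be loaded on every node the fun is spawned on.</p>
<pre>
```erlang
%% Hypothetical alternative (not the code from the post): watch each
%% worker with erlang:monitor/2 instead of erlang:monitor_node/2.
-module(mr_monitor).
-export([run_on/3]).

%% run_on(F, X, Node): evaluate F(X) in a process spawned on Node and
%% wait for the outcome. Returns {ok, Value}, where Value is
%% {'EXIT', Reason} if F crashed (because of the catch), or
%% {error, Reason} if the worker disappeared without replying
%% (Reason is noconnection if Node was lost).
run_on(F, X, Node) ->
    Parent = self(),
    Pid = spawn(Node, fun() -> Parent ! {ok, self(), (catch F(X))} end),
    Ref = erlang:monitor(process, Pid),
    receive
        {ok, Pid, Val} ->
            %% stop monitoring and discard a 'DOWN' message already queued
            erlang:demonitor(Ref, [flush]),
            {ok, Val};
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    end.
```
</pre>
<p>A monitor reference identifies exactly one worker, so a lost node produces one 'DOWN' per affected worker and a successful reply never leaves a monitor behind, unlike a monitor_node/2 instance that stays active after the worker has answered.</p>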
]]></content:encoded>
	</item>
	<item>
		<title>By: Yariv</title>
		<link>http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/comment-page-1/#comment-98873</link>
		<dc:creator>Yariv</dc:creator>
		<pubDate>Wed, 13 Feb 2008 04:31:25 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/#comment-98873</guid>
		<description>Oops, it was a late change that I didn&#039;t check with the compiler. Thanks for pointing it out.</description>
		<content:encoded><![CDATA[<p>Oops, it was a late change that I didn&#8217;t check with the compiler. Thanks for pointing it out.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Duane Johnson</title>
		<link>http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/comment-page-1/#comment-98845</link>
		<dc:creator>Duane Johnson</dc:creator>
		<pubDate>Wed, 13 Feb 2008 03:07:11 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/#comment-98845</guid>
		<description>I&#039;m fairly new to Erlang, so please excuse my ignorance.  I was wondering if the module should export mapreduce/5 since there doesn&#039;t seem to be a 4-argument mapreduce?</description>
		<content:encoded><![CDATA[<p>I&#8217;m fairly new to Erlang, so please excuse my ignorance.  I was wondering if the module should export mapreduce/5 since there doesn&#8217;t seem to be a 4-argument mapreduce?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: links for 2008-02-12 &#171; Bloggitation</title>
		<link>http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/comment-page-1/#comment-98284</link>
		<dc:creator>links for 2008-02-12 &#171; Bloggitation</dc:creator>
		<pubDate>Tue, 12 Feb 2008 00:19:53 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/#comment-98284</guid>
		<description>[...] More Erlang Fun: Distributed, Fault Tolerant MapReduce (tags: erlang programming blog cluster 247up) [...]</description>
		<content:encoded><![CDATA[<p>[...] More Erlang Fun: Distributed, Fault Tolerant MapReduce (tags: erlang programming blog cluster 247up) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Luke Hoersten</title>
		<link>http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/comment-page-1/#comment-98186</link>
		<dc:creator>Luke Hoersten</dc:creator>
		<pubDate>Mon, 11 Feb 2008 19:41:05 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/articles/2008/02/10/more-erlang-fun-distributed-fault-tolerant-mapreduce/#comment-98186</guid>
		<description>I noticed this is kind of an extended version of the question I asked you over Facebook. Very nice.</description>
		<content:encoded><![CDATA[<p>I noticed this is kind of an extended version of the question I asked you over Facebook. Very nice.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
