<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Parallel merge sort in Erlang</title>
	<atom:link href="http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/feed/" rel="self" type="application/rss+xml" />
	<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/</link>
	<description>Adventures in Open Source Erlang</description>
	<lastBuildDate>Sun, 06 Dec 2009 10:29:08 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Jim Dukhovny</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-439879</link>
		<dc:creator>Jim Dukhovny</dc:creator>
		<pubDate>Mon, 19 Oct 2009 05:07:18 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-439879</guid>
		<description>Can you just take a &quot;penalty&quot; on store rather than on retrieve?
If you need to keep the top 100 posts of friends sorted by date, then...
For every user, keep a &quot;Recent top 100&quot; list of posts, and every time someone makes a post, update all their friends&#039; &quot;Recent top 100&quot; lists by putting it at #1 and kicking the last one out.
I am a front-end guy :) so users will perceive Facebook as much faster if retrieval is fast rather than saving...
Just a subjective thought :)</description>
		<content:encoded><![CDATA[<p>Can you just take a &#8220;penalty&#8221; on store rather than on retrieve?<br />
If you need to keep the top 100 posts of friends sorted by date, then&#8230;<br />
For every user, keep a &#8220;Recent top 100&#8221; list of posts, and every time someone makes a post, update all their friends&#8217; &#8220;Recent top 100&#8221; lists by putting it at #1 and kicking the last one out.<br />
I am a front-end guy :) so users will perceive Facebook as much faster if retrieval is fast rather than saving&#8230;<br />
Just a subjective thought :)</p>
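<p>A minimal sketch of this write-side approach, assuming a hypothetical in-memory model: a map from user id to that user's feed (newest first), capped at the 100 posts mentioned in the comment. The module and function names (feed_fanout, post/3, feed/2) are made up for illustration; a real system would persist the feeds rather than keep them in a map.</p>
<pre>
```erlang
-module(feed_fanout).
-export([new/0, post/3, feed/2]).

%% Cap taken from the comment's "Recent top 100".
-define(MAX_FEED, 100).

%% Hypothetical model: Feeds is a map of UserId => [Post], newest first.
new() -> #{}.

%% Write path: push Post onto the front of every friend's feed and drop
%% whatever falls past ?MAX_FEED. The cost is one update per friend.
post(Post, Friends, Feeds) ->
    lists:foldl(
      fun(Friend, Acc) ->
              Old = maps:get(Friend, Acc, []),
              maps:put(Friend, lists:sublist([Post | Old], ?MAX_FEED), Acc)
      end,
      Feeds, Friends).

%% Read path: the feed is already sorted and capped, so reading is one lookup.
feed(User, Feeds) -> maps:get(User, Feeds, []).
```
</pre>
<p>This assumes posts are fanned out in the order they are created, so the head of each list is always the newest; posts arriving out of order would need an insert by timestamp instead of a plain prepend.</p>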
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Hagan</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-402598</link>
		<dc:creator>Chris Hagan</dc:creator>
		<pubDate>Sun, 26 Apr 2009 04:06:34 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-402598</guid>
		<description>Would you mind going into a little more detail about timing spawned processes in Erlang?  I&#039;m trying at the moment to benchmark a heavily parallel server, but what I&#039;m seeing is that timer:tc/3 seems to time the first process until it terminates, which isn&#039;t necessarily all the work that needed to be executed - the spawned processes continue until they also terminate, but the stopwatch has stopped already.

In your map/reduce case, of course, you&#039;ve got your parent receiving so it&#039;s certain that the spawned processes are finished before the stopwatch stops, but that is a side effect of those spawned processes notifying their parent.

I don&#039;t want to have to end every process throughout my system by signalling a termination just so that I can benchmark.  What are your thoughts on a general practice for accurate timing of the interval until all work initiated by a method call is complete?</description>
		<content:encoded><![CDATA[<p>Would you mind going into a little more detail about timing spawned processes in Erlang?  I&#8217;m trying at the moment to benchmark a heavily parallel server, but what I&#8217;m seeing is that timer:tc/3 seems to time the first process until it terminates, which isn&#8217;t necessarily all the work that needed to be executed &#8211; the spawned processes continue until they also terminate, but the stopwatch has stopped already.</p>
<p>In your map/reduce case, of course, you&#8217;ve got your parent receiving so it&#8217;s certain that the spawned processes are finished before the stopwatch stops, but that is a side effect of those spawned processes notifying their parent.</p>
<p>I don&#8217;t want to have to end every process throughout my system by signalling a termination just so that I can benchmark.  What are your thoughts on a general practice for accurate timing of the interval until all work initiated by a method call is complete?</p>
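<p>One possible approach, sketched under the assumption that the work runs in processes the timed function spawns itself: start each one with spawn_monitor/1 and stop the clock only after a 'DOWN' message has arrived for every monitor. The spawned processes do not have to send anything; the runtime delivers 'DOWN' when they exit. The module name tc_all and the shape of run/1 are hypothetical.</p>
<pre>
```erlang
-module(tc_all).
-export([run/1]).

%% Hypothetical helper. Jobs is a list of zero-arity funs. Each one runs in
%% its own monitored process; the result is {Microseconds, ok}, measured
%% until every one of those processes has exited.
run(Jobs) ->
    timer:tc(fun() ->
                     Refs = [element(2, spawn_monitor(Job)) || Job <- Jobs],
                     wait_all(Refs)
             end).

%% Block until a 'DOWN' message has arrived for every monitor reference.
%% No cooperation from the workers is needed: exit triggers 'DOWN'.
wait_all([]) ->
    ok;
wait_all([Ref | Refs]) ->
    receive
        {'DOWN', Ref, process, _Pid, _Reason} -> wait_all(Refs)
    end.
```
</pre>
<p>Monitors only cover the processes spawned directly here. If those processes spawn further workers, each level has to wait on its own children the same way, and work handed off to long-lived server processes is not covered at all.</p>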
]]></content:encoded>
	</item>
	<item>
		<title>By: tmd</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-381953</link>
		<dc:creator>tmd</dc:creator>
		<pubDate>Thu, 19 Mar 2009 16:36:34 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-381953</guid>
		<description>&quot;When a user visits the site, you want to show her a list of all the recent updates from her friends, sorted by date.&quot;

Is Facebook just for women?</description>
		<content:encoded><![CDATA[<p>&#8220;When a user visits the site, you want to show her a list of all the recent updates from her friends, sorted by date.&#8221;</p>
<p>Is Facebook just for women?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Witek BARYLUK</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-381888</link>
		<dc:creator>Witek BARYLUK</dc:creator>
		<pubDate>Thu, 19 Mar 2009 12:05:14 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-381888</guid>
		<description>You actually don&#039;t need a fully sorted list in this case, just the first N elements, so your mergers can stop once they have collected those N elements. The same optimization works for qsort (after partitioning, only recurse into the right part if needed).

I&#039;m also interested in using a more complex key for comparison, so that the comparison cost becomes dominant (something like now() for a date).</description>
		<content:encoded><![CDATA[<p>You actually don&#8217;t need a fully sorted list in this case, just the first N elements, so your mergers can stop once they have collected those N elements. The same optimization works for qsort (after partitioning, only recurse into the right part if needed).</p>
<p>I&#8217;m also interested in using a more complex key for comparison, so that the comparison cost becomes dominant (something like now() for a date).</p>
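<p>A sketch of a merge that stops after the first N elements, assuming every input list is already sorted by the same ordering fun. The fun follows the convention of lists:sort/2 (Le(A, B) is true when A may come before B), which also leaves room for the more complex comparison key. The names merge_top/4 and top/3 are hypothetical.</p>
<pre>
```erlang
-module(top_n).
-export([merge_top/4, top/3]).

%% merge_top(N, Le, Xs, Ys): the first N elements of the merge of the
%% sorted lists Xs and Ys. The rest of the merge is never built.
merge_top(N, _Le, _Xs, _Ys) when N =< 0 -> [];
merge_top(N, _Le, [], Ys) -> lists:sublist(Ys, N);
merge_top(N, _Le, Xs, []) -> lists:sublist(Xs, N);
merge_top(N, Le, [X | Xs], [Y | Ys]) ->
    case Le(X, Y) of
        true -> [X | merge_top(N - 1, Le, Xs, [Y | Ys])];
        false -> [Y | merge_top(N - 1, Le, [X | Xs], Ys)]
    end.

%% top(N, Le, Lists): the first N elements across many sorted lists. The
%% running result never holds more than N elements.
top(N, Le, Lists) ->
    lists:foldl(fun(L, Acc) -> merge_top(N, Le, L, Acc) end, [], Lists).
```
</pre>
<p>For example, with posts stored as {Timestamp, Text} tuples and shown newest first, Le could be fun({T1, _}, {T2, _}) -> T1 >= T2 end.</p>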
]]></content:encoded>
	</item>
	<item>
		<title>By: Ulf Wiger</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-379581</link>
		<dc:creator>Ulf Wiger</dc:creator>
		<pubDate>Sun, 15 Mar 2009 21:32:37 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-379581</guid>
		<description>lists:sort/1 is not a BIF, actually. It&#039;s a merge sort algorithm implemented completely in Erlang. The one &quot;dirty trick&quot; it uses is calling lists:reverse(L1, L2), which is the same as doing lists:reverse(L1) ++ L2, but much faster (this is not especially dirty, since the function is documented...)

Your quicksort most likely takes a beating since it uses append.</description>
		<content:encoded><![CDATA[<p>lists:sort/1 is not a BIF, actually. It&#8217;s a merge sort algorithm implemented completely in Erlang. The one &#8220;dirty trick&#8221; it uses is calling lists:reverse(L1, L2), which is the same as doing lists:reverse(L1) ++ L2, but much faster (this is not especially dirty, since the function is documented&#8230;)</p>
<p>Your quicksort most likely takes a beating since it uses append.</p>
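<p>A sketch of both points, assuming plain Erlang terms compared in standard term order: a tail-recursive merge that finishes with lists:reverse/2, and a quicksort that threads an accumulator through the recursion instead of appending. The module name no_append is hypothetical, and neither function is the actual lists:sort/1 source.</p>
<pre>
```erlang
-module(no_append).
-export([merge/2, qsort/1]).

%% Merge two sorted lists. Taken elements are pushed onto Acc (so Acc is
%% reversed); when one side runs out, lists:reverse(Acc, Rest) reverses Acc
%% onto the remainder in one pass instead of lists:reverse(Acc) ++ Rest.
merge(Xs, Ys) -> merge(Xs, Ys, []).

merge([], Ys, Acc) -> lists:reverse(Acc, Ys);
merge(Xs, [], Acc) -> lists:reverse(Acc, Xs);
merge([X | Xs], [Y | _] = Ys, Acc) when X =< Y -> merge(Xs, Ys, [X | Acc]);
merge(Xs, [Y | Ys], Acc) -> merge(Xs, Ys, [Y | Acc]).

%% Quicksort without ++: qsort(L, Acc) returns sorted L followed by Acc,
%% where every element of Acc is >= every element of L, so each pivot is
%% simply consed onto the front of the larger side's result.
qsort(List) -> qsort(List, []).

qsort([], Acc) -> Acc;
qsort([Pivot | Rest], Acc) ->
    {Smaller, Larger} = lists:partition(fun(X) -> X < Pivot end, Rest),
    qsort(Smaller, [Pivot | qsort(Larger, Acc)]).
```
</pre>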
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael S.</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-379550</link>
		<dc:creator>Michael S.</dc:creator>
		<pubDate>Sun, 15 Mar 2009 19:01:19 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-379550</guid>
		<description>Switching sort technique once the lists get small isn&#039;t really &quot;cheating&quot;--glibc&#039;s qsort switches to insertion sort when the sublists get small.  See

http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12.2.1&amp;content-type=text/x-cvsweb-markup&amp;cvsroot=glibc</description>
		<content:encoded><![CDATA[<p>Switching sort technique once the lists get small isn&#8217;t really &#8220;cheating&#8221;&#8211;glibc&#8217;s qsort switches to insertion sort when the sublists get small.  See</p>
<p><a href="http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12.2.1&amp;content-type=text/x-cvsweb-markup&amp;cvsroot=glibc" rel="nofollow">http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12.2.1&amp;content-type=text/x-cvsweb-markup&amp;cvsroot=glibc</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex Arnon</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-377275</link>
		<dc:creator>Alex Arnon</dc:creator>
		<pubDate>Thu, 12 Mar 2009 12:34:18 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-377275</guid>
		<description>Shifting around all these lists must also incur a penalty. What would happen when we&#039;re close to completion and start sending lists of hundreds of thousands of elements?

Maybe each worker should actually keep its sorted list and then forward it directly to another worker (chosen arbitrarily by the master) once its work queue is truly empty. It could then wait for either more work or a signal to terminate.</description>
		<content:encoded><![CDATA[<p>Shifting around all these lists must also incur a penalty. What would happen when we&#8217;re close to completion and start sending lists of hundreds of thousands of elements?</p>
<p>Maybe each worker should actually keep its sorted list and then forward it directly to another worker (chosen arbitrarily by the master) once its work queue is truly empty. It could then wait for either more work or a signal to terminate.</p>
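<p>A sketch of this variation, under several assumptions: the input is split round-robin into K chunks, each sorted and held by its own long-lived worker, and workers are then combined pairwise by having one worker send its sorted list straight to its partner, so the large lists never pass through the master. The master only coordinates; it waits for an acknowledgement before starting the next round, because Erlang guarantees message order only between a single pair of processes. All names here are hypothetical.</p>
<pre>
```erlang
-module(forward_merge).
-export([sort/2]).

%% Hypothetical sketch. K workers each sort one chunk and keep it; workers
%% are then folded pairwise by forwarding lists directly between them.
sort(List, K) when is_list(List), is_integer(K), K > 0 ->
    Master = self(),
    Workers = [spawn(fun() -> worker(Master, lists:sort(Chunk)) end)
               || Chunk <- split(List, K)],
    reduce(Workers).

%% One worker left: ask it for the final list.
reduce([Last]) ->
    Last ! {result, self()},
    receive {sorted, Last, Sorted} -> Sorted end;
%% Otherwise pair workers up, wait until every receiver has merged its
%% partner's list, and repeat with the receivers plus any odd one out.
reduce(Workers) ->
    {Receivers, Rest} = pair_up(Workers),
    lists:foreach(fun(R) -> receive {merged, R} -> ok end end, Receivers),
    reduce(Receivers ++ Rest).

%% The first worker of each pair forwards its list to the second.
pair_up([A, B | T]) ->
    A ! {forward_to, B},
    {Receivers, Rest} = pair_up(T),
    {[B | Receivers], Rest};
pair_up(Rest) ->
    {[], Rest}.

worker(Master, Sorted) ->
    receive
        {forward_to, Peer} ->
            Peer ! {merge, Sorted};          % hand off the list, then exit
        {merge, Other} ->
            Merged = merge(Sorted, Other),
            Master ! {merged, self()},       % acknowledge before next round
            worker(Master, Merged);
        {result, From} ->
            From ! {sorted, self(), Sorted}
    end.

%% Round-robin split into exactly K (possibly empty) chunks.
split(List, K) ->
    Indexed = lists:zip(lists:seq(0, length(List) - 1), List),
    [[X || {I, X} <- Indexed, I rem K =:= J] || J <- lists:seq(0, K - 1)].

merge([], Ys) -> Ys;
merge(Xs, []) -> Xs;
merge([X | Xs], [Y | _] = Ys) when X =< Y -> [X | merge(Xs, Ys)];
merge(Xs, [Y | Ys]) -> [Y | merge(Xs, Ys)].
```
</pre>
<p>The pairing rounds still run in lock-step, so this only removes the copying through the master; whether that pays off would have to be measured.</p>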
]]></content:encoded>
	</item>
	<item>
		<title>By: Jebu Ittiachen</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-377091</link>
		<dc:creator>Jebu Ittiachen</dc:creator>
		<pubDate>Thu, 12 Mar 2009 05:48:41 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-377091</guid>
		<description>Nothing significant, but combining the collect step with the merge shaves a second off the merge sort, which now beats the quicksort :) This is in line with what Jake mentioned about losing parallelism between the merge and collect phases.

% mmerge_all/2 takes two lists off the front of the pool and spawns a
% process that merges them (with merge/2 from the post) and sends the
% result back to the master, which appends it to the tail of the pool.
% It stops when one list is left and no mergers are outstanding (N = 0).
mmerge_all([], 0) -&gt;
  [];
mmerge_all([List&#124;[]], 0) -&gt;
  List;
mmerge_all([L1, L2 &#124; Tl], N) -&gt;
  Parent = self(),
  spawn(
    fun() -&gt;
            Res = merge(L1, L2),
            Parent ! {list, Res}
    end),
  mmerge_all(Tl, N + 1);
mmerge_all(List, N) -&gt;
  L =
    receive
      {list, L1} -&gt;
        lists:append(List, [L1])
    end,
  mmerge_all(L, N - 1).</description>
		<content:encoded><![CDATA[<p>Nothing significant, but combining the collect step with the merge shaves a second off the merge sort, which now beats the quicksort :) This is in line with what Jake mentioned about losing parallelism between the merge and collect phases.</p>
<p>% mmerge_all/2 takes two lists off the front of the pool and spawns a<br />
% process that merges them (with merge/2 from the post) and sends the<br />
% result back to the master, which appends it to the tail of the pool.<br />
% It stops when one list is left and no mergers are outstanding (N = 0).<br />
mmerge_all([], 0) -&gt;<br />
  [];<br />
mmerge_all([List|[]], 0) -&gt;<br />
  List;<br />
mmerge_all([L1, L2 | Tl], N) -&gt;<br />
  Parent = self(),<br />
  spawn(<br />
    fun() -&gt;<br />
            Res = merge(L1, L2),<br />
            Parent ! {list, Res}<br />
    end),<br />
  mmerge_all(Tl, N + 1);<br />
mmerge_all(List, N) -&gt;<br />
  L =<br />
    receive<br />
      {list, L1} -&gt;<br />
        lists:append(List, [L1])<br />
    end,<br />
  mmerge_all(L, N - 1).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Williamson</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-376596</link>
		<dc:creator>Matt Williamson</dc:creator>
		<pubDate>Wed, 11 Mar 2009 11:22:57 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-376596</guid>
		<description>It would be nice to see a run with far more than 1,000,000 elements, spread across multiple machines.</description>
		<content:encoded><![CDATA[<p>It would be nice to see a run with far more than 1,000,000 elements, spread across multiple machines.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jake Donham</title>
		<link>http://yarivsblog.com/articles/2009/03/09/parallel-merge-sort-in-erlang/comment-page-1/#comment-376026</link>
		<dc:creator>Jake Donham</dc:creator>
		<pubDate>Tue, 10 Mar 2009 17:53:36 +0000</pubDate>
		<guid isPermaLink="false">http://yarivsblog.com/?p=234#comment-376026</guid>
		<description>Seems like master / worker parallelism could be effective here. Instead of spawning a process for every pair of lists, keep a pool of processes, and have them request pairs of lists from the master then return the merged list. You really only need as many processes in the pool as you have processors.

Furthermore, in your code you proceed in phases (issue a bunch of merges, collect them all, repeat), so you lose some parallelism at the end of the phase when there are fewer active processes than processors. Instead, the master could return merged lists to the pool of lists needing merging (until there is just one list in the pool when you&#039;re done) so it&#039;s only at the end of the whole run that you have fewer tasks than processors.

As you discovered, the granularity of the tasks is important to balance parallelism against communication cost. There is a great but little-known book that you might be interested in on these topics: Carriero and Gelernter&#039;s How to Write Parallel Programs (http://www.amazon.com/dp/026203171X).</description>
		<content:encoded><![CDATA[<p>Seems like master / worker parallelism could be effective here. Instead of spawning a process for every pair of lists, keep a pool of processes, and have them request pairs of lists from the master then return the merged list. You really only need as many processes in the pool as you have processors.</p>
<p>Furthermore, in your code you proceed in phases (issue a bunch of merges, collect them all, repeat), so you lose some parallelism at the end of the phase when there are fewer active processes than processors. Instead, the master could return merged lists to the pool of lists needing merging (until there is just one list in the pool when you&#8217;re done) so it&#8217;s only at the end of the whole run that you have fewer tasks than processors.</p>
<p>As you discovered, the granularity of the tasks is important to balance parallelism against communication cost. There is a great but little-known book that you might be interested in on these topics: Carriero and Gelernter&#8217;s How to Write Parallel Programs (<a href="http://www.amazon.com/dp/026203171X" rel="nofollow">http://www.amazon.com/dp/026203171X</a>).</p>
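<p>A sketch of the master/worker scheme described above, assuming a fixed pool of P workers that ask the master for a pair of runs, merge them, and hand the result back to the pool. The pool is kept as a FIFO queue, so merged runs go to the back and merges stay roughly balanced. The module name pool_merge is hypothetical, and the pool is seeded with single-element runs only to keep the sketch short.</p>
<pre>
```erlang
-module(pool_merge).
-export([sort/2]).

%% Hypothetical sketch: merge sort driven by a fixed pool of P workers.
sort([], _P) -> [];
sort(List, P) when is_integer(P), P > 0 ->
    Caller = self(),
    Ref = make_ref(),
    %% Run the master in its own process so stray {ready, _} messages die
    %% with it instead of piling up in the caller's mailbox.
    spawn(fun() -> Caller ! {Ref, master(List, P)} end),
    receive {Ref, Sorted} -> Sorted end.

master(List, P) ->
    Master = self(),
    Workers = [spawn(fun() -> worker(Master) end) || _ <- lists:seq(1, P)],
    %% Seed the pool with one-element runs (very fine-grained tasks).
    Pool = queue:from_list([[X] || X <- List]),
    loop(Pool, 0, [], Workers).

loop(Pool0, Busy0, Idle0, Workers) ->
    {Pool, Busy, Idle} = dispatch(Pool0, Busy0, Idle0),
    case {queue:len(Pool), Busy} of
        {1, 0} ->
            %% One run left and nothing in flight: done.
            [W ! stop || W <- Workers],
            {{value, Sorted}, _} = queue:out(Pool),
            Sorted;
        _ ->
            receive
                {ready, W} ->
                    loop(Pool, Busy, [W | Idle], Workers);
                {merged, Run} ->
                    loop(queue:in(Run, Pool), Busy - 1, Idle, Workers)
            end
    end.

%% Hand pairs of runs to idle workers while both are available.
dispatch(Pool, Busy, [W | Idle] = AllIdle) ->
    case queue:len(Pool) >= 2 of
        true ->
            {{value, A}, Pool1} = queue:out(Pool),
            {{value, B}, Pool2} = queue:out(Pool1),
            W ! {pair, A, B},
            dispatch(Pool2, Busy + 1, Idle);
        false ->
            {Pool, Busy, AllIdle}
    end;
dispatch(Pool, Busy, []) ->
    {Pool, Busy, []}.

%% A worker asks for work, merges the pair it gets, and asks again.
worker(Master) ->
    Master ! {ready, self()},
    receive
        {pair, A, B} ->
            Master ! {merged, merge(A, B)},
            worker(Master);
        stop ->
            ok
    end.

merge([], Ys) -> Ys;
merge(Xs, []) -> Xs;
merge([X | Xs], [Y | _] = Ys) when X =< Y -> [X | merge(Xs, Ys)];
merge(Xs, [Y | Ys]) -> [Y | merge(Xs, Ys)].
```
</pre>
<p>Seeding with one-element runs makes every task tiny, which is exactly the granularity problem mentioned above; seeding with larger pre-sorted chunks would be the first thing to tune.</p>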
]]></content:encoded>
	</item>
</channel>
</rss>
