It is not possible to add the new link because it would replace an existing, longer link.
Line 7: | Line 7: | ||
In practice, the input file will not have the exact number of runs needed for a perfect distribution. One way to deal with this is by padding the actual distribution with imaginary "dummy runs" to simulate an ideal run distribution.<ref name=Knuth1973/> A dummy run behaves like a run with no records in it. Merging one or more dummy runs with one or more real runs just merges the real runs, and merging one or more dummy runs with no real runs results in a single dummy run. Another approach is to emulate dummy runs as needed during the merge operations.<ref>{{Cite web| title=The F-fibonaccl numbers and polyphase sorting | url=https://www.fq.math.ca/Scanned/8-1/lynch.pdf | archive-url=https://web.archive.org/web/20150704151507/http://www.fq.math.ca/Scanned/8-1/lynch.pdf | archive-date=2015-07-04}}</ref> |
In practice, the input file will not have the exact number of runs needed for a perfect distribution. One way to deal with this is by padding the actual distribution with imaginary "dummy runs" to simulate an ideal run distribution.<ref name=Knuth1973/> A dummy run behaves like a run with no records in it. Merging one or more dummy runs with one or more real runs just merges the real runs, and merging one or more dummy runs with no real runs results in a single dummy run. Another approach is to emulate dummy runs as needed during the merge operations.<ref>{{Cite web| title=The F-fibonaccl numbers and polyphase sorting | url=https://www.fq.math.ca/Scanned/8-1/lynch.pdf | archive-url=https://web.archive.org/web/20150704151507/http://www.fq.math.ca/Scanned/8-1/lynch.pdf | archive-date=2015-07-04}}</ref> |
||
"Optimal" distribution algorithms require knowing the number of runs in advance. Otherwise, in the more common case where the number of runs is not known in advance, "near optimal" distribution algorithms are used. Some distribution algorithms include rearranging runs.<ref>{{Cite web| title=Optimal polyphase sorting | url=http://i.stanford.edu/pub/cstr/reports/cs/tr/76/543/CS-TR-76-543.pdf | archive-url=https://web.archive.org/web/20170809021713/http://i.stanford.edu/pub/cstr/reports/cs/tr/76/543/CS-TR-76-543.pdf | archive-date=2017-08-09}}</ref> If the number of runs is known in advance, only a partial distribution is needed before starting the merge phases. For example, consider the 3 file case, starting with ''n'' runs in File_1. Define ''F<sub>i</sub>'' = ''F''<sub>''i''−1</sub> + F<sub>''i''−2</sub> as the ''i''th [[Fibonacci |
"Optimal" distribution algorithms require knowing the number of runs in advance. Otherwise, in the more common case where the number of runs is not known in advance, "near optimal" distribution algorithms are used. Some distribution algorithms include rearranging runs.<ref>{{Cite web| title=Optimal polyphase sorting | url=http://i.stanford.edu/pub/cstr/reports/cs/tr/76/543/CS-TR-76-543.pdf | archive-url=https://web.archive.org/web/20170809021713/http://i.stanford.edu/pub/cstr/reports/cs/tr/76/543/CS-TR-76-543.pdf | archive-date=2017-08-09}}</ref> If the number of runs is known in advance, only a partial distribution is needed before starting the merge phases. For example, consider the 3 file case, starting with ''n'' runs in File_1. Define ''F<sub>i</sub>'' = ''F''<sub>''i''−1</sub> + F<sub>''i''−2</sub> as the ''i''th [[Fibonacci]] number. If ''n'' = ''F<sub>i</sub>'', then move ''F''<sub>''i''−2</sub> runs to File_2, leaving ''F''<sub>''i''−1</sub> runs remaining on File_1, a perfect run distribution. If ''F<sub>i</sub>'' < ''n'' < ''F''<sub>''i''+1</sub>, move ''n''−''F<sub>i</sub>'' runs to File_2 and ''F''<sub>''i''+1</sub>−''n'' runs to File_3. The first merge iteration merges ''n''−''F<sub>i</sub>'' runs from File_1 and File_2, appending the ''n''−''F<sub>i</sub>'' merged runs to the ''F''<sub>''i''+1</sub>−''n'' runs already moved to File_3. File_1 ends up with ''F''<sub>''i''−2</sub> runs remaining, File_2 is emptied, and File_3 ends up with ''F''<sub>''i''−1</sub> runs, again a perfect run distribution. For 4 or more files, the math is more complicated, but the concept is the same. |