Step to UEFI (159）UEFI 下的排序

大多数情况下，2层循环的冒泡排序对于一般用户已经足够，但是如果对于速度有要求，那就需要考虑效率更高的排序算法了。最近偶然看到 MdeModulePkg\Include\Library\SortLib.h 提供的排序功能，研究之后编写如下的例子。
这个库的核心是PerformQuickSort() 函数，调用者给出需要排序的 buffer，然后给出其中每一个元素的大小和数量，然后这个函数还会调用作为参数指定的CompareFunction函数。通过这个函数，不但可以比较数字或者字符串，还可以比较自动定义的规则，比方说设定书记>厂长>科长之类的……

示例代码如下：

/**
  Function to perform a Quick Sort on a buffer of comparable elements.

  Each element must be equally sized.

  If BufferToSort is NULL, then ASSERT.
  If CompareFunction is NULL, then ASSERT.

  If Count is < 2 , then perform no action.
  If Size is < 1 , then perform no action.

  @param[in, out] BufferToSort   On call, a Buffer of (possibly sorted) elements;
                                 on return, a buffer of sorted elements.
  @param[in]  Count              The number of elements in the buffer to sort.
  @param[in]  ElementSize        The size of an element in bytes.
  @param[in]  CompareFunction    The function to call to perform the comparison
                                 of any two elements.
**/
VOID
EFIAPI
PerformQuickSort (
  IN OUT VOID                   *BufferToSort,
  IN CONST UINTN                Count,
  IN CONST UINTN                ElementSize,
  IN       SORT_COMPARE         CompareFunction
  );
代码如下：
#include <stdlib.h>
#include <stdio.h>
#include <Uefi.h>
#include <Library/UefiLib.h>
#include <Library/ShellCEntryLib.h>
#include <Library/MemoryAllocationLib.h>
#include <Library/BaseMemoryLib.h>
#include <Library/SortLib.h>

#define TOTAL           100
#define NAMELENGTH        4

/** Expands to an integer constant expression that is the maximum value
    returned by the rand function.
**/
#define RAND_MAX  0x7fffffff
static UINT32 next = 1;

typedef struct {
            char Name[NAMELENGTH+1];
            int  Score;
} TScoreRecord;

/** Compute a pseudo-random number.
  *
  * Compute x = (7^5 * x) mod (2^31 - 1)
  * without overflowing 31 bits:
  *      (2^31 - 1) = 127773 * (7^5) + 2836
  * From "Random number generators: good ones are hard to find",
  * Park and Miller, Communications of the ACM, vol. 31, no. 10,
  * October 1988, p. 1195.
**/
int
rand()
{
  INT32 hi, lo, x;

  /* Can't be initialized with 0, so use another value. */
  if (next == 0)
    next = 123459876;
  hi = next / 127773;
  lo = next % 127773;
  x = 16807 * lo - 2836 * hi;
  if (x < 0)
    x += 0x7fffffff;
  return ((next = x) % ((UINT32)RAND_MAX + 1));
}

/**
  Test comparator.

  @param[in] b1   The first INTN
  @param[in] b2   The other INTN

  @retval 0       They are the same.
  @retval -1      b1 is less than b2
  @retval 1       b1 is greater then b2
**/
INTN
EFIAPI
Compare (CONST TScoreRecord *b1, CONST TScoreRecord *b2)
{
  if ((b1->Score)<(b2->Score)) {
          return -1;
  }
  if ((b1->Score)>(b2->Score)) {
          return 1;
  }
  return (0);
}

int main()
{       
        TScoreRecord   *SR;
        int             Index,NameIndex;
        
        SR = AllocatePool (sizeof(TScoreRecord)*TOTAL);
        
        for (Index=0;Index<TOTAL;Index++) {
            //Generate a name
            for (NameIndex=0;NameIndex<NAMELENGTH;NameIndex++) {
                        SR[Index].Name[NameIndex]=(rand()%26)+'A';
                }
            SR[Index].Name[NAMELENGTH]=0;
            //Generate a score
            SR[Index].Score=rand()%150;
            //Output
            printf("[%d] %s %d\n",Index,SR[Index].Name,SR[Index].Score);
        }

        //Sort
        PerformQuickSort (SR,TOTAL,sizeof(TScoreRecord),Compare);

        //Show the sort result
        for (Index=0;Index<TOTAL;Index++) {
            printf("[%d] %s %d\n",Index,SR[Index].Name,SR[Index].Score);
        }        
        
        FreePool(SR);
        
        return 0;
}

运行结果：

完整代码和EFI 下载：

SortTest

此外，还有一个有意思的问题：如果Compare (CONST TScoreRecord *b1, CONST TScoreRecord *b2) 这里定义为Compare (TScoreRecord *b1, TScoreRecord *b2)，那么在编译时，会在PerformQuickSort (SR,TOTAL,sizeof(TScoreRecord),Compare); 报告C4090错误。因为编译器报告错误位置和实际位置着实不同，为了找到原因着实让我花费了不少时间。

《Step to UEFI (159）UEFI 下的排序》有3个想法

C 标准库有 qsort：https://en.cppreference.com/w/c/algorithm/qsort
C++ 标准库有 std::sort：https://en.cppreference.com/w/cpp/algorithm/sort

据说 benchmark 下来 C++ 的 std::sort 比 C 标准库的 qsort 快，原因在于 C++ 会 inline 这个 compare：https://stackoverflow.com/q/4708105/1190191

说到性能，算法是第一的。在算法一样的情况下，实现以及编译器优化也会影响性能。前阶段看见一个 benchmark：
https://benchmarksgame-team.pages.debian.net/benchmarksgame/which-programs-are-fastest.html
发现（比 C 更“高级”，有更多抽象）Rust 语言平均性能竟然可以比 C 更好（通常我们认为高级语言性能比不过底层语言）。我最近学了一下，随便写了一个性能测试的代码，正好 Rust 版本跑得比 C 快。我估计和数值计算里面 Fortran 可以优化得性能超过 C 类似，一些语言性能能辅助编译器优化（比如 Rust 里面的 immutable 和 borrowing）。建议有空学一下：https://www.rust-lang.org/learn

之前我知道我手写汇编性能不如 C，但不知道更高级的语言现在也能有类似现象（同样算法 Java 一般还是比不过 C 的）。在 C 这个层面，编译器很多优化我手写很难做（而且容易出错）。最简单的是整数除法 int x = y/5; 这种编译器用（溢出）的乘法和位运算能高效计算除以常数，比 div 高效得多。还有 vectorization，充分利用新的指令集等等。

比如说除以 29，clang 编译出来的汇编，edi 输入，eax 输出：

   0:	48 63 c7             	movsxd rax,edi
   3:	48 69 c8 09 cb 3d 8d 	imul   rcx,rax,0xffffffff8d3dcb09
   a:	48 c1 e9 20          	shr    rcx,0x20
   e:	01 c8                	add    eax,ecx
  10:	89 c1                	mov    ecx,eax
  12:	c1 e9 1f             	shr    ecx,0x1f
  15:	c1 f8 04             	sar    eax,0x4
  18:	01 c8                	add    eax,ecx

ziv2013说道：

2019年6月28日上午 10:35

不错，长知识。谢谢！

回复

《Step to UEFI (159）UEFI 下的排序》有3个想法

发表回复 取消回复

发表回复取消回复