内存池意义

内存的分配与释放会向系统进行申请，陷入内核，而频繁的内存分配、释放会降低应用程序性能。应用程序通常以特定方式使用内存，因此，能通过开发专门的内存管理器来解决这种情况。内存管理器，通常也称为内存池。

内存管理器需要考虑两方面：大小、并发。

大小

1）固定大小：分配单一固定大小内存块对内存管理器；
2）可变大小：分配任意大小内存块对内存管理器。适用于预先不知道所需内存大小场景。

并发

1）单线程：内存管理器受限于单个线程，内存由单个线程使用，不会跨越线程边界。
2）多线程：用于并发执行的多线程环境。实现中包含互斥执行的代码段，但在任何时刻，代码段中任何一个只能有一个线程执行。

=> 4种风格专用内存管理器。

下面讨论单线程内存管理器，目标是比默认管理器更快。

单线程内存池

全局函数new(), delete()局限性

默认内存管理器为用户（进程）提供接口new()、delete()，用于申请、回收内存，可用于多线程环境，且大小可能各不相同。这种灵活性牺牲了速度：向系统频繁申请、释放内存，线程安全。
通常，用户端不需要全局函数new(), delete()的全部功能，可能只需要特定大小的内存块，只在单线程中执行。可通过定制内存分配模式，满足这种特定需求，提升速度。

下面，以用户代码要求为表示有理数类Rational的对象频繁分配、释放内存为例：

class Rational
{
public:
    Rational(int a = 0, int b = 1) : n(a), d(b) {}

private:
    int n; // Numerator
    int d; // Denominator
};

为了测试new, delete的基准性能，执行以下测试代码：

#include <iostream>
#include <chrono>
int main()
{
    Rational *array[1000];

    // Start timing here
    auto t1 = std::chrono::system_clock::now();
    for (int j = 0; j < 1000; j++) {
        for (int i = 0; i < 1000; i++) {
            array[i] = new Rational(i);
        }
        for (int i = 0; i < 1000; ++i) {
            delete array[i];
        }
    }
    // Stop timing here
    auto t2 = std::chrono::system_clock::now();
    std::cout << "Use time: "
    << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count()
    << " us" << std::endl;
   return 0;
}

根据测试结果，这段代码花费约20ms。

版本1：专用Rational内存管理器

基本思想：利用一个链表，来管理空闲地Rational类对象，从而避免频繁调用默认管理器的接口new, delete。

具体做法：Rational类维护一个预先分配的Rational对象的静态链表，用作可用对象的空闲列表。当需要Rational对象时，从空闲列表中取出一个；当对象操作完成后，再将其放回空闲列表。

需要一个辅助结构来表示空闲链表：

class NextOnFreeList
{
public:
    NextOnFreeList *next;
};

声明空闲链表freeList，用于链接Rational对象：

class Rational
{
...
private:
    static NextOnFreeList *freeList;
};

因为空闲链表freeList并非属于某个具体Rational对象，因此声明为static。

用户如何使用这个专用内存管理器，避免用全局new, delete？
1）定义用于替换全局new, delete的成员函数operator new, operator delete。
2）定义专用内存管理器初始化、释放、扩充方法，用于与默认管理器的交互。

完整Rational定义

class Rational
{
public:
    Rational(int a = 0, int b = 1) : n(a), d(b) {}

    void *operator new(size_t size); // Apply an object from free list
    void operator delete(void *doomed, size_t size); // Retrieve an object to free list

    static void newMemPool() { expandTheFreeList(); }
    static void deleteMemPool();

private:
    static NextOnFreeList *freeList; // A free list of Rational objects
    static void expandTheFreeList();
    enum { EXPANSION_SIZE = 32 };

    int n; // Numerator
    int d; // Denominator
};
NextOnFreeList *Rational::freeList = nullptr;

operator new成员函数实现

从空闲列表中分配一个新的Rational对象，每次只提供列表头部给用户，然后调整列表。如果空闲列表空，那么扩展之。

inline void *Rational::operator new(size_t size)
{
    if (!freeList) { // If the list is empty, fill it up
        expandTheFreeList();
    }
    NextOnFreeList *head = freeList;
    freeList = head->next;
    return head;
}

operator delete成员函数实现

回收用户不用的Rational对象，返还给空闲列表。添加到列表头部即可。

inline void Rational::operator delete(void *doomed, size_t size)
{
    NextOnFreeList *head = static_cast<NextOnFreeList*>(doomed);
    head->next = freeList;
    freeList = head;
}

空闲列表的扩充

空闲列表为空时，必须从堆里分配更多的Rational对象。
注意：在Rational和NextOnFreeList类型间进行转换，非常危险，需要确保空闲链表的每个元素都足够大，确保不会发生截断。

expandTheFreeList的本质，是每次申请一块内存作为Rational对象使用，其缺点也是每个元素都要调用一次operator new。如果只调用一次，得到一块大内存，然后自信分成多个元素，效率会更高。

// invoke operator new() one time for every Rational in the free list
void Rational::expandTheFreeList()
{
    // We must allocate an object large enough to contain the next pointer
    size_t size = (sizeof(Rational) > sizeof(NextOnFreeList *)) ?
            sizeof(Rational) : sizeof(NextOnFreeList*);
    // FIXME: better choice : use uint8_t instead of char?
    NextOnFreeList *runner = reinterpret_cast<NextOnFreeList *>(new char[size]);
    freeList = runner;
    for (int i = 0; i < EXPANSION_SIZE; i++) {
        runner->next = reinterpret_cast<NextOnFreeList *>(new char[size]);
        runner = runner->next;
    }
    runner->next = nullptr;
}

内存管理器的申请、释放

申请newMemPool，前面已经实现。
释放是遍历空闲链表，将内存返还给系统。

void Rational::deleteMemPool()
{
    for (NextOnFreeList* nextPtr = freeList; nextPtr != NULL; nextPtr = freeList) {
        freeList = freeList->next;
        delete[] nextPtr;
    }
}

测试方法基本不变，只是将全局new, delete替换成使用内存管理器来进行，而且多了内存管理器的初始化、释放。

int main()
{
    Rational *array[1000];
    Rational::newMemPool();

    // Start timing here
    auto t1 = std::chrono::system_clock::now();
    for (int j = 0; j < 1000; j++) {
        for (int i = 0; i < 1000; i++) {
            array[i] = new Rational(i);
        }
        for (int i = 0; i < 1000; ++i) {
            delete array[i];
        }
    }
    // Stop timing here
    auto t2 = std::chrono::system_clock::now();
    std::cout << "Use time: "
    << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count()
    << " us" << std::endl;
    Rational::deleteMemPool();
   return 0;
}

测试代码执行时间直接变成了10ms左右，节省了一半以上时间。

版本2：固定大小对象的内存池

版本1的管理器存在一个很大局限性，就是每个类都需要开发一个内存管理器，这显然不合理。一种解决办法是，将管理器的实现模板化，实例化的模板用于管理某种具体类型对象。

模板化的内存池管理类MemoryPool，专门用于管理空闲列表

template<class T>
class MemoryPool
{
public:
    MemoryPool(size_t size = EXPANSION_SIZE);
    ~MemoryPool();

    // Allocate a T element from the free list
    void *alloc(size_t size);
    // Return a T element to the free list
    void free(void *someElement);

private:
    // next element on the free list
    MemoryPool<T> *next;
    // If the freeList is empty, expand it by this amount
    enum { EXPANSION_SIZE = 32 };
    // Add free elements to the free list
    void expandTheFreeList(int howMany = EXPANSION_SIZE);
};

构造函数对空闲列表进行初始化，size指定空闲列表初始长度（元素个数），而模板参数T指定元素类型（含每个元素长度）。

template<class T>
MemoryPool<T>::MemoryPool(size_t size)
{
    expandTheFreeList(static_cast<int>(size));
}

析构函数，遍历空闲列表，删除全部元素

template<class T>
MemoryPool<T>::~MemoryPool()
{
    MemoryPool<T> *nextPtr = next;
    for (nextPtr = next; nextPtr != nullptr; nextPtr = next) {
        next = next->next;
        delete []nextPtr;
    }
}

alloc()是提供给用户的接口，为用户分配类型为T的元素空间
注意：实际上只返回一个元素的空间，因此参数并没有用到。

template<class T>
inline void *MemoryPool<T>::alloc(size_t)
{
    if (!next) {
        expandTheFreeList();
    }

    MemoryPool<T> *head = next;
    next = head->next;
    return head;
}

free()是提供给用户端接口，用于回收用户不用的内存。

template<class T>
inline void MemoryPool<T>::free(void *doomed)
{
    MemoryPool<T> *head = static_cast<MemoryPool<T> *>(doomed);
    head->next = next;
    next = head;
}

expandTheFreeList()向空闲列表添加新元素。当空闲列表为空时，会调用此函数。

template<class T>
void MemoryPool<T>::expandTheFreeList(int howMany)
{
    // We must allocate an object large enough to contain the next pointer
    size_t size = (sizeof(T) > sizeof(MemoryPool<T>*)) ?
            sizeof(T) : sizeof(MemoryPool<T> *);

    MemoryPool<T> *runner = reinterpret_cast<MemoryPool<T> *>(new char[size]);
    next = runner;
    for (int i = 0; i < howMany; i++) {
        runner->next = reinterpret_cast<MemoryPool<T> *>(new char[size]);
        runner = runner->next;
    }
    runner->next = nullptr;
}

由于空闲列表交给内存池维护，Rational类不再需要维护自己的空闲列表。

class Rational
{
public:
    Rational(int a = 0, int b = 1) : n(a), d(b) {}
    void *operator new(size_t size) { return memPool->alloc(size); }
    void operator delete(void *doomed, size_t size)
    { memPool->free(doomed); }

    static void newMemPool() { memPool = new MemoryPool<Rational>; }
    static void deleteMemPool() { delete memPool; }

private:
    int n; // Numerator
    int d; // Denominator
    static MemoryPool<Rational> *memPool;
};
MemoryPool<Rational> *Rational::memPool = nullptr;

测试代码与版本1的一样。测试结果显示为15ms左右，效率介于版本0与版本1之间。

版本3：单线程可变大小内存管理器

版本2内存池有局限性，即只能分配固定大小的内存。如果应用程序需要可变内存，那么版本2显然不合适。

下面是一种可变大小的内存管理器，使用MemoryChunk类替换NextOnFreeList类（空闲列表），将各种大小的内存块串成一个块序列。

MemoryChunk内存块列表类

MemoryChunk类对象代表一个内存块，可用链表形式串起来表示一个块序列。相当于前面的NextOnFreeList，但MemoryChunk将next从版本1的已分配对象内存、版本2的内存池对象中分离开来。

// 内存块列表
class MemoryChunk
{
public:
    MemoryChunk(MemoryChunk *nextChunk, size_t chunkSize);
    ~MemoryChunk() { delete mem; }

    void *alloc(size_t size);
    void free(void *someElement);

    // Pointer to next memory chunk on the list
    MemoryChunk *nextMemChunk() { return next; }

    // How much space do we have left on this memory chunk?
    size_t spaceAvailable()

    { return chunkSize - bytesAlreadyAllocated;}
    // this is the default size of a single memory chunk
    enum { DEFAULT_CHUNK_SIZE = 4096 };

private:
    MemoryChunk *next;
    uint8_t *mem; // pointer to allocated memory block
    // The size of a single memory chunk
    size_t chunkSize;
    // This many bytes already allocated on the current memory chunk
    size_t bytesAlreadyAllocated;
};

构造函数

申请一块内存作为块序列的头部，参数reqSize可用于指定内存块初始大小。 mem指向申请到的内存块，本质是一段连续的内存。
注意：构造一个内存块MemoryChunk对象，会自动作为块序列头部，将当前next域指向传入的MemoryChunk对象（旧的块序列头部）。

MemoryChunk::MemoryChunk(MemoryChunk *nextChunk, size_t reqSize)
{
    chunkSize = std::max(reqSize, static_cast<size_t>(DEFAULT_CHUNK_SIZE));
    next = nextChunk;
    bytesAlreadyAllocated = 0;
    mem = reinterpret_cast<uint8_t *>(new uint8_t[chunkSize]);
}

析构函数

内存块对析构非常简单，直接delete用new申请的数组即可。

~MemoryChunk() { delete []mem; }

用户接口：alloc向内存池申请内存

用户使用alloc向内存块申请内存，参数requestSize指定需要的内存大小。
注意：由内存池ByteMemoryPool，来保证内存块有足够大内存空间供用户申请。

void *MemoryChunk::alloc(size_t requestSize)
{
    void *addr = static_cast<void *>(mem + bytesAlreadyAllocated);
    bytesAlreadyAllocated += requestSize;
    return addr;
}

用户接口：free回收用户不用的内存块

用户申请的内存，是由alloc分配，在内存段mem基础上，划分指定大小的空间。
注意：出于简便考虑，版本3并没有根据alloc反向回收部分内存，而是作为一个整体，在内存块对象不被需要时，一起释放。因此，free函数为空。

void MemoryChunk::free(void *doomed)
{
    // null
}

ByteMemoryPool字节内存池

内存池ByteMemoryPool管理着块序列MemoryChunk，每次对块序列的操作，针对头部即可。因为只有块序列头部，可用于分配内存。其他块表示已经分配了的内存，即使没有用完。

// 内存池
class ByteMemoryPool
{
public:
    ByteMemoryPool(size_t initSize = MemoryChunk::DEFAULT_CHUNK_SIZE);
    ~ByteMemoryPool();
    // Allocate memory from private pool
    void *alloc(size_t size);
    // Free memory previously allocated from the pool
    void free(void *someElement);

private:
    // A list of memory chunks. This is our private storage
    MemoryChunk *listOfMemoryChunks;
    // Add one memory chunk to our private storage
    void expandStorage(size_t reqSize);
};

构造函数

ByteMemoryPool的构造很简单，根据指定的初始大小initSize，初始化块序列头部单个内存块的大小。

ByteMemoryPool::ByteMemoryPool(size_t initSize)
{
    expandStorage(initSize);
}

析构函数

前面提到过，块序列所代表的内存块的释放，是由ByteMemoryPool内存池管理着，待内存池析构时，遍历块序列，用delete将块序列逐个释放。

ByteMemoryPool::~ByteMemoryPool()
{
    MemoryChunk *memChunk = listOfMemoryChunks;
    while (memChunk) {
        listOfMemoryChunks = memChunk->nextMemChunk();
        delete memChunk;
        memChunk = listOfMemoryChunks;
    }
}

alloc分配内存

用户向内存池申请内存时，会将块序列第一个内存块的可用空间分配给用户；当可用空间不足时，会扩展出一个新的内存块作为块序列头。

void *ByteMemoryPool::alloc(size_t requestSize)
{
    size_t space = listOfMemoryChunks->spaceAvailable();
    if (space < requestSize) {
        expandStorage(requestSize);
    }


    return listOfMemoryChunks->alloc(requestSize);
}

free回收内存

用户返还不用的内存时，会将内存块返还给第一个块序列序列，调用内存块MemoryChunk对象的free。当然，实际上什么也没做。

void ByteMemoryPool::free(void *doomed)
{
    listOfMemoryChunks->free(doomed);
}

思考：为什么不急于释放内存？
因为ByteMemoryPool的实现，不需要重用以前分配的内存；如果需要更多的内存，那么创建一个新的内存块并用于将来的分配。而内存在内存池析构时，会统一释放，因此不会造成内存泄露。
当然，这种做法对内存重用会有影响。

expandStorage扩展内存块

当块序列头部内存块空间不足时，需要扩展一个新的内存块作为序列头。

void ByteMemoryPool::expandStorage(size_t reqSize)
{
    listOfMemoryChunks = new MemoryChunk(listOfMemoryChunks, reqSize);
}

Rational测试内存池

Rational做了部分修改，内存池类型由版本2的MemoryPool，换成了ByteMemoryPool。

class Rational
{
public:
    Rational(int a = 0, int b = 1) : n(a), d(b) {}
    void *operator new(size_t size) { return memPool->alloc(size); }
    void operator delete(void *doomed, size_t size)
    { memPool->free(doomed); }

    static void newMemPool() { memPool = new ByteMemoryPool; }
    static void deleteMemPool() { delete memPool; }

private:
    int n; // Numerator
    int d; // Denominator
    static ByteMemoryPool *memPool;
};
ByteMemoryPool *Rational::memPool = nullptr;

测试代码没有修改。测试结果显示用了30ms，比版本1、版本2都慢，因为分配内存的逻辑更复杂，不过使用场景更广。

小结

内存池的设计，需要中灵活性与速度间做好取舍。
默认内存管理器（全局new, delete）代价昂贵。
专用内存管理器的效率通常更高，避免了全局new、delete频繁调用。
根据所需要场景，选择使用分配固定大小的内存块、不固定大小的内存块。

多线程内存池

单线程内存分配器无法在多线程环境中工作，多线程内存分配器能有效解决这个问题，通过在分配器中添加互斥锁，允许多个线程并发分配、释放内存。

然而，仅仅只是为了给alloc(), free()（分配、释放动作）加锁，就重新造一个线程池类，有些小题大做，完全可以将线程池类型、锁类型作为模板参数，这样便于用户能以不同的锁方案实例化内存池。

—— 这是一种常见的适用于单线程/多线程环境的技术，让用户通过模板参数来决定使用哪种方案，代码又不会繁琐。

版本4：多线程内存池

对内存池的alloc、free进行包装，对其进行加锁。为使用单线程、多线程环境，将内存池类型、锁类型作为模板参数，传给包装器类MTMemoryPool，为单线程内存池添加新的职责。

多线程内存池MTMemoryPool

适用于单线程、多线程组合方案的线程池MTMemoryPool

template<class POOLTYPE, class LOCK>
class MTMemoryPool
{
public:
    // Allocate an element from the freeList
    void *alloc(size_t size);
    // Return an element to the freeList
    void free(void *someElement);

private:
    POOLTYPE stPool; // Single-thread pool
    LOCK theLock;
};

要求：实例化MTMemoryPool时，提供的模板参数POLLTYPE必须为单线程内存池类型；LOCK可以是互斥锁，也可以是空锁，具体类型由用户决定，只需要支持加锁、解锁方法即可（lock, unlock）。

多线程alloc

MTMemoryPool的本质，是对单线程线程池的alloc()进行包装，利用锁theLock进行加锁、解锁。

template<class POOLTYPE, class LOCK>
void *MTMemoryPool<POOLTYPE, LOCK>::alloc(size_t size)
{
    void *mem = nullptr;
    theLock.lock();
    mem = stPool.alloc(size);
    theLock.unlock();
    return mem;
}

多线程free

包装单线程内存池的free，为其加锁、解说。

template<class POOLTYPE, class LOCK>
void MTMemoryPool<POOLTYPE, LOCK>::free(void *doomed)
{
    theLock.lock();
    stPool.free(doomed);
    theLock.unlock();
}

自定义锁方案：
要实例化MTMemoryPool模板，就要提供内存池类型+锁类型。对于内存池，可以用单线程内存池，如MemoryPool；对于锁，为了适应不同锁方案，可以定义一个锁的抽象基类，然后根据不同需求实现锁。

锁抽象基类

class ABClock
{
public:
    virtual ~ABClock() {}
    virtual void lock() = 0;
    virtual void unlock() = 0;
};

互斥锁

用POSIX互斥锁pthread_mutex_t实现互斥锁MutexLock，适用于Unix系列平台多线程环境。

class MutexLock : public ABClock
{
public:
    MutexLock() { pthread_mutex_init(&mutex_lock); }
    ~MutexLock() { pthread_mutex_destroy(&mutex_lock); }

    void lock() { pthread_mutex_lock(&mutex_lock); }
    void unlock() { pthread_mutex_unlock(&mutex_lock); }
private:
    pthread_mutex_t mutex_lock{};
};

空锁

lock、unlock操作为空，适用于单线程环境。

class NullLock : public ABClock
{
public:
    void lock() { }
    void unlock() { }
};

如何使用多线程内存池MTMemoryPool？
以Rational为例，只需要修改内存池类型即可，代码十分简洁。
下面代码中，内存池类型MemoryPool可用于分配固定大小（Rational）内存，MutexLock确保可用于多线程环境。

class Rational
{
public:
    Rational(int a = 0, int b = 1) : n(a), d(b) {}
    void *operator new(size_t size) { return memPool->alloc(size); }
    void operator delete(void *doomed, size_t size)
    { memPool->free(doomed); }

    static void newMemPool() { memPool = new MTMemoryPool<MemoryPool<Rational>, MutexLock>; }
    static void deleteMemPool() { delete memPool; }

private:
    int n; // Numerator
    int d; // Denominator
    static MTMemoryPool<MemoryPool<Rational>, MutexLock> *memPool;
};
MTMemoryPool<MemoryPool<Rational>, MutexLock> *Rational::memPool = nullptr;

测试程序不变。测试结果约55ms。

小结

单线程内存管理器要比多线程内存管理器更快，非必要，建议使用单线程内存管理器。
可以通过模板技术 + 包装器模式，将单线程内存管理器扩展成多线内存程管理器，在代码量改动较小前提下，适应不同使用环境。

参考

[1]DovBulka, DavidMayhew, 布尔卡,等. 提高C++性能的编程技术[M]. 清华大学出版社, 2003.

原文链接: https://www.cnblogs.com/fortunely/p/17248785.html

欢迎关注

微信关注下方公众号，第一时间获取干货硬货；公众号内回复【pdf】免费获取数百本计算机经典书籍;

也有高质量的技术群，里面有嵌入式、搜广推等BAT大佬

原创文章受到原创版权保护。转载请注明出处：https://www.ccppcoding.com/archives/402407

非原创文章文中已经注明原地址，如有侵权，联系删除

关注公众号【高性能架构探索】，第一时间获取最新文章

转载文章受原作者版权保护。转载请注明原作者出处！

C++ 内存池技术初探