Old Dlmalloc Unlink Tricks -REMASTERED-

References

这个 trick 应该早就被修了，所以应该没有必要再浪费时间在这个上了吧…

这两篇文章也是 2001 年的东西…

但是真的很好奇具体的实现，所以还是浪费了很多时间去理这个…

起笔的时候并没有理清其实，但是就是因为信息量太大了所以才试着弄懂多少记下来多少

~~毕竟本来这里也只是 Senri 的笔记本~~

总之全部写完之后自己也懂了不少…

当然不敢说自己全懂，不然就和那些说自己「精通 C++」的人一样了x

总之，这个 trick 的效果是「可以让你往任意地址上写一个值」（除非那段内存只读

Intro

GNU C 库所使用的 Memory Allocator 是 Doug Lea’s Malloc，简称为 dlmalloc，本文使用的是以下版本：

This is a version (aka dlmalloc) of malloc/free/realloc written by Doug Lea and released to the public domain. Use, modify, and redistribute this code without permission or acknowledgement in any way you wish. Send questions, comments, complaints, performance data, etc to dl@cs.oswego.edu

VERSION 2.7.0 Sun Mar 11 14:14:06 2001 Doug Lea (dl at gee)

Structure

#define INTERNAL_SIZE_T size_t

struct malloc_chunk {

/* Size of previous chunk (if free).  */
INTERNAL_SIZE_T      prev_size;  
/* Size in bytes, including overhead. */
INTERNAL_SIZE_T      size;       

/* double links -- used only if free. */
struct malloc_chunk* fd;         
struct malloc_chunk* bk;
}

在 32 位的 x86 架构上，INTERNAL_SIZE_T 为 4 bytes；

Before free()

一个 Heap Chunk 即堆块，看起来是这样：

Name	Description	Pointer	Size
prev_size	仅当前一个 chunk 已被释放 (free) 时：保存前一个 chunk 的大小	^chunk	4 bytes
size	当前 chunk 的大小：这个大小指的是 chunk 到 nextchunk 之间的大小最后两比特用于标记特殊的状态信息		4 byte
data	这里是数据…	^mem
prev_size	仅当前一个 chunk 已被释放 (free) 时：保存前一个 chunk 的大小从这里开始就是下一个 heap chunk 了	^nextchunk	4 bytes

也就是说，当执行 a = malloc(n) 时，a 实际指向的位置在 mem 处；

size 处提到：最后两比特用于标记特殊的状态信息，他们是：

#define PREV_INUSE 0x1
#define IS_MMAPPED 0x2

#define chunk_is_mmapped( p ) \
    ( (p)->size & IS_MMAPPED )
#define prev_inuse( p ) \
    ( (p)->size & PREV_INUSE )

分别代表着：

前一个 chunk 是否在使用中
是否通过 mmap 分配

例，若 size 为 0x29 即 00101101，那么它代表：

chunk 实际大小：00101100 即 0x28 = 40 bytes
mem 大小：40 bytes 减去 prev_size 和 size 都占用了的 4 bytes = 32 bytes
前一个块是否使用中？00101101 的最后一位为 1：确实在使用
当前块是通过 mmap 分配？00101101 的倒数第二位为 0：否。是通过 malloc() 分配的

After free()

当我们 free(mem) 释放一个 chunk 时，dlmalloc 会检查前一个 chunk 是否也被 free：使用的就是上面提到的 PREV_INUSE 标签

如果是的话，dkmalloc 会试图合并这两个 chunk：其目的是尽量减少堆中「可重复利用块」的数量

但是，如果前一个 chunk 正在使用中的话，dlmalloc 就无法做到去合并，这时候这个 chunk 会变成这样：

Name	Description	Pointer	Size
prev_size	仅当前一个 chunk 已被释放 (free) 时：保存前一个 chunk 的大小	^chunk	4 bytes
size	当前 chunk 的大小：这个大小指的是 chunk 到 nextchunk 之间的大小最后两比特用于标记特殊的状态信息		4 bytes
fd	指向已 free 的 chunk 双链表中的下一个 chunk	^mem	4 bytes
bk	指向已 free 的 chunk 双链表中的上一个 chunk		4 bytes
	这里的数据不会被修改…
prev_size	因为前一个 chunk 已被释放：所以现在这里保存前一个 chunk 的大小从这里开始就是下一个 heap chunk 了	^nextchunk	4 bytes

可以发现，mem 的位置现在多出来了两个值（或者说，是覆盖了原有的数据），分别都是 4 bytes，被称为 fd (forward) 和 bk (backward)

只在这个 chunk 被 free 之后，mem 这两个地方才会用来保存指针，否则这两个指针不存在

这两个指针指向保存着「可被重复利用的块」的一个环状双链表

如果在 bk 后面还有很多数据的话，这些数据并不会被清零，除非下一次 malloc() 使用了这块地方

Fastbins

Fastbin: An array of lists holding recently freed small chunks.

Fastbin 并不使用双链表的结构

Fastbin 的默认大小是 80 bytes:

/* The maximum fastbin request size we support */
#define MAX_FAST_SIZE     80

如果使用了 fastbin，那么就不会使用双链表，而那个 trick 恰好是与双链表有关的

所以，我们需要让 size > 80 才能触发 trick（无论本来 size 就大于 80，还是我们使用溢出的方式把 size 覆盖上了大于 80 的数值

Precondition: free()

free() frees the memory space pointed to by ptr, which must have been returned by a previous call to malloc(), calloc() or realloc(). Otherwise, or if free(ptr) has already been called before, undefined behavior occurs. If ptr is NULL, no operation is per formed.

free() 的完整源码会放在文章最底下

void fREe(Void_t* mem)
{
  mstate av = get_malloc_state();

  mchunkptr       p;           /* chunk corresponding to mem */
  INTERNAL_SIZE_T size;        /* its size */
  mfastbinptr*    fb;          /* associated fastbin */
  mchunkptr       nextchunk;   /* next contiguous chunk */
  INTERNAL_SIZE_T nextsize;    /* its size */
  int             nextinuse;   /* true if nextchunk is used */
  INTERNAL_SIZE_T prevsize;    /* size of previous contiguous chunk */
  mchunkptr       bck;         /* misc temp for linking */
  mchunkptr       fwd;         /* misc temp for linking *

  ......

当即将被 free 的 chunk 满足以下条件：

不是 fastbin
不是通过 mmap 分配

则会执行以下判断

先给出需要用的源码，再分别解释好了

/* consolidate backward */ 
/* if #1 */
if (!prev_inuse(p)) {
  prevsize = p->prev_size;
  size += prevsize;
  p = chunk_at_offset(p, -((long) prevsize));
  unlink(p, bck, fwd);
}

/* if #2 */
if (nextchunk != av->top) {
  /* get and clear inuse bit */
  nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
  set_head(nextchunk, nextsize);

  /* consolidate forward */
  if (!nextinuse) {
    unlink(nextchunk, bck, fwd);
    size += nextsize;
  }
  ......

First if

if (!prev_inuse(p)) {
  prevsize = p->prev_size;
  size += prevsize;
  p = chunk_at_offset(p, -((long) prevsize));
  unlink(p, bck, fwd);
}

这里是第一次出现 unlink，也就是问题的根源

这段代码很容易能看出来，它所做的事情就是，若前一个块不在使用中（判断依据是 size 的最后一个比特）时，把 chunk 向前合并

这里出现了从来没见过的另一个东西：

/* Treat space at ptr + offset as a chunk */
#define chunk_at_offset(p, s)  ((mchunkptr)(((char*)(p)) + (s)))

其实也可以看出来，

p = chunk_at_offset(p, -((long) prevsize)) 相当于 p = p - prev_size

p = p - prev_size 的结果就是「上一个 chunk 的位置」，这里应该不难理解

接下来，把「上一个 chunk」给 unlink(p, bck, fwd);

关于 unlink() 会在下文详细说明

Second if

if (nextchunk != av->top) {
  /* get and clear inuse bit */
  nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
  set_head(nextchunk, nextsize);

  /* consolidate forward */
  if (!nextinuse) {
    unlink(nextchunk, bck, fwd);
    size += nextsize;
  }
  ......

以及其中出现的两个宏：

/* check inuse bits in known places */
#define inuse_bit_at_offset(p, s)\
  (((mchunkptr)(((char*)(p)) + (s)))->size & PREV_INUSE) 

/* Set size/use field */
#define set_head(p, s)\
  ((p)->size = (s))

inuse_bit_at_offset(nextchunk, nextsize) 即：

(nextchunk + nextsize)-> size & PREV_INUSE

即获得了 下个 chunk 的 再下个 chunk 的 PREV_INUSE 这个比特

（即当前 chunk 的下一个临近 chunk 是否在被使用）

最后保存到 nextinuse 中
set_head(nextchunk, nextsize) 即：

(nextchunk->size = nextsize)

这句有点点意义不明，但是并不是很重要

执行完上面两句后是另一个 if 判断：

  /* consolidate forward */
  if (!nextinuse) {
    unlink(nextchunk, bck, fwd);
    size += nextsize;
  }

即当前面获得的 下个 chunk 的 再下个 chunk 的 PREV_INUSE 这个比特为假（即当前 chunk 的下一个临近 chunk 不在被使用）时：

unlink(nextchunk, bck, fwd)，然后再把当前的 size 增加 nextsize

结论是，在 free(aChunk) 时，想要让 unlink() 执行的话：

必须满足的条件：

不是 fastbin
不是通过 mmap 分配

接下来的条件满足一个就能执行 unlink():

aChunk 的 PREV_INUSE 被标记为否：
=> 会对 aChunk 的 相邻上一个 chunk 执行 unlink()
aChunk 的 相邻下一个 chunk 并不是最后堆中的最后一个
且，aChunk 的 相邻下一个 chunk 的 再相邻下一个 chunk 的 PREV_INUSE 为否
(也就相当于 aChunk 的 相邻下一个 chunk 不在被使用)
=> 会对 aChunk 的 相邻下一个 chunk 执行 unlink()

在这一段文字里，相邻上/下一个 chunk 的计算方式是 chunk_at_offset(p, s)，也就是说

相邻上一个 chunk = chunk_at_offset(p, -prev_size)
也就是 P 减去 prev_size
相邻下一个 chunk = chunk_at_offset(p, size)
也就是 P 加上 size

（别忘了，size 一直都指的是「包含 chunk metadata 的 chunk 大小」而不是「chunk 中 mem 部分的大小」

Tricking unlink

接着终于来到了出问题的地方呢….

/* Take a chunk off a bin list */
#define unlink(P, BK, FD) {
[1]  BK = P->bk;     \
[2]  FD = P->fd;     \
[3]  FD->bk = BK;    \
[4]  BK->fd = FD;    \
}

bk 和 fd 指针到 chunk 指针的偏移量，分别是 0xc 和 0x8

即：

[1] *BK = *(P+0xc)
[2] *FD = *(P+0x8)

那么很容易就能发现问题了，同样的事情套到 [3], [4] 上，就会变成：

[3] *(FD+0xc) = *BK
[4] *(BK+0x8) = *FD
          ↓
[3] *(*(P+0x8) + 0xc) = *(P+0xc)
[4] *(*(P+0xc) + 0x8) = *(P+0x8)
          ↓
[3] *fd + 0xc = *bk
[4] *bk + 0x8 = *fd

其中，fd 和 bk 是可以被我们伪造的，而这就导致了：我们可以用 unlink() 往自己想要的地址上写值

但有两个小问题是，1. 这个写是双向的；2. 这个写的操作会有一点点的偏移

例如，我们让 fd 指向动态链接表上某个函数的地址，让 bk 指向堆上的某段 shellcode，那么：

addr of func + 0xc = addr of shellcode
addr of shllcode + 0x8 = = addr of funcs

可以看到，func + 0xc 的位置被写了 shellcode，偏移了整整 0xc

以及，shellcode + 0x8 的位置也被写了 func 的地址，这绝对会导致 shellcode 到了这段时会无法正常执行

所以解决方法分别是

一开始就让想写入的地址减去 0xc
因为 shellcode + 0x8 的位置才会被覆盖，那么只要让 shellcode 小于 8 bytes 就好
如果需要执行非常长的一段 shellcode，那么可以写 PUSH 真实地址 & RET，这个只需要 6 bytes 就能完成

所以稍微改写一下，我们就可以得到这个公式：

令 fd = addr of func - 0xc
令 bk = addr of shllcode
那么
我们所想要的：
fd + 0xc = addr of shllcode
我们必须得纳入考虑并避免的：
bk + 0x8 = addr of func

Trick: use negative size

前面提到，如果大小小于 80 的话会使用 fastbin，这样就没法使用这个 trick 了；

所以我们必须得考虑到覆盖 size 的情况

但如果输入方式（把数据写入 mem 的方式）不允许输入 NULL BYTE 的话，只能在这个 4 bytes 上写一个特别特别大的数值么？

确实是这样的，可以写一个特别大，以至于是负数的值：例如 0xfffffffc

那么比如当他加上 0x80 的时候，

0xfffffffc + 0x80 = 0x10000007C =
0000,0000,0000,0001
0000,0000,0000,0000
0000,0000,0111,1100

而这是一台 32 位的机器，第33位的那个 1 会被舍去，也就变成了：
0000,0000,0000,0000
0000,0000,0000,0000
0000,0000,0111,1100

而这个值为 0x7c

所以当 0x80 加上 0xfffffffc 时，实际的结果却相当于减去了 0x4，变成了 0x7c，相当于我们可以直接把它看作一个负数：-0x4

同样的，0xfffffff8 也可以达到类似的效果，相当于 -0x8

这两个负数还有额外的 buff：由于他们的最低两个比特都是 0，所以就可以混过 PREV_INUSE 和 IS_MMAPPED 的检查

以及，当 malloc 计算临近的 chunk 时，使用的是 nextchunk = p + size；prevchunk = p - prev_size

那么假设这两个值都是 0xfffffff8 就会：

nextchunk = p + size = p + -8 = p - 8

prevchunk = p - prev_size = p - -8 = p + 8

变成以下的效果：

next chunk’s next’s PREV_INUSE

另外，还有一个需要提的：

nextsize = chunksize(nextchunk);
......
nextchunk = chunk_at_offset(p, size);
nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

其中出现的宏 chunksize(p)：

/* Get size, ignoring use bits */
#define chunksize(p)         ((p)->size & ~(SIZE_BITS))

向后 unlink 的条件是 当前 chunk 的 相邻下一个 chunk 的 再相邻下一个 chunk 的 PREV_INUSE 为否

如果我们是向后 unlink，而且把相邻后一个 chunk 的 size 写上了一个负数 0xfffffffc

则 nextsize = ((nextchunk)->size & ~(SIZE_BITS)) 也是这个负数 0xfffffffc

那么 inuse_bit_at_offset(nextchunk, nextsize) 本来应该是 下下个 chunk 的 size

/* check inuse bits in known places */
#define inuse_bit_at_offset(p, s)\
  (((mchunkptr)(((char*)(p)) + (s)))->size & PREV_INUSE)

但现在却变成了

((nextchunk + nextsize) -> size) & PREV_INUSE

=> ((nextchunk + -0x4) -> size) & PREV_INUSE

=> (nextchunk - 0x4 + 0x4) & PREV_INUSE….

竟然就是 nextchunk & PREV_INUSE？！

这样就根本不用覆写 nextchunk 的 nextchunk 的 PREV_INUSE 了，好神奇啊…

Proof of Concept

本来想自己写例子的，但想想 Protostar 已经提供了那么好的题目那就用他们的吧x

Unlink previous

Protostar Heap Three Walkthrough

Unlink next

Protostar Final Two Walkthrough

Appendix

Source code of free()

展开...

/*
  -------------------- free ----------------------
*/

#if __STD_C
void fREe(Void_t* mem)
#else
void fREe(mem) Void_t* mem;
#endif
{
  mstate av = get_malloc_state();

  mchunkptr       p;           /* chunk corresponding to mem */
  INTERNAL_SIZE_T size;        /* its size */
  mfastbinptr*    fb;          /* associated fastbin */
  mchunkptr       nextchunk;   /* next contiguous chunk */
  INTERNAL_SIZE_T nextsize;    /* its size */
  int             nextinuse;   /* true if nextchunk is used */
  INTERNAL_SIZE_T prevsize;    /* size of previous contiguous chunk */
  mchunkptr       bck;         /* misc temp for linking */
  mchunkptr       fwd;         /* misc temp for linking */


  /* free(0) has no effect */
  if (mem != 0) {
    p = mem2chunk(mem);
    size = chunksize(p);

    check_inuse_chunk(p);

    /*
      If eligible, place chunk on a fastbin so it can be found
      and used quickly in malloc.
    */

    if ((unsigned long)(size) <= (unsigned long)(av->max_fast)

#if TRIM_FASTBINS
        /* 
           If TRIM_FASTBINS set, don't place chunks
           bordering top into fastbins
        */
        && (chunk_at_offset(p, size) != av->top)
#endif
        ) {

      set_fastchunks(av);
      fb = &(av->fastbins[fastbin_index(size)]);
      p->fd = *fb;
      *fb = p;
    }

    /*
       Consolidate other non-mmapped chunks as they arrive.
    */

    else if (!chunk_is_mmapped(p)) {
      nextchunk = chunk_at_offset(p, size);
      nextsize = chunksize(nextchunk);

      /* consolidate backward */
      if (!prev_inuse(p)) {
        prevsize = p->prev_size;
        size += prevsize;
        p = chunk_at_offset(p, -((long) prevsize));
        unlink(p, bck, fwd);
      }

      if (nextchunk != av->top) {
        /* get and clear inuse bit */
        nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
        set_head(nextchunk, nextsize);

        /* consolidate forward */
        if (!nextinuse) {
          unlink(nextchunk, bck, fwd);
          size += nextsize;
        }

        /*
          Place the chunk in unsorted chunk list. Chunks are
          not placed into regular bins until after they have
          been given one chance to be used in malloc.
        */

        bck = unsorted_chunks(av);
        fwd = bck->fd;
        p->bk = bck;
        p->fd = fwd;
        bck->fd = p;
        fwd->bk = p;

        set_head(p, size | PREV_INUSE);
        set_foot(p, size);
        
        check_free_chunk(p);
      }

      /*
         If the chunk borders the current high end of memory,
         consolidate into top
      */

      else {
        size += nextsize;
        set_head(p, size | PREV_INUSE);
        av->top = p;
        check_chunk(p);
      }

      /*
        If freeing a large space, consolidate possibly-surrounding
        chunks. Then, if the total unused topmost memory exceeds trim
        threshold, ask malloc_trim to reduce top.

        Unless max_fast is 0, we don't know if there are fastbins
        bordering top, so we cannot tell for sure whether threshold
        has been reached unless fastbins are consolidated.  But we
        don't want to consolidate on each free.  As a compromise,
        consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
        is reached.
      */

      if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) { 
        if (have_fastchunks(av)) 
          malloc_consolidate(av);

#ifndef MORECORE_CANNOT_TRIM        
        if ((unsigned long)(chunksize(av->top)) >= 
            (unsigned long)(av->trim_threshold)) 
          sYSTRIm(av->top_pad, av);
#endif
      }

    }
    /*
      If the chunk was allocated via mmap, release via munmap()
      Note that if HAVE_MMAP is false but chunk_is_mmapped is
      true, then user must have overwritten memory. There's nothing
      we can do to catch this error unless DEBUG is set, in which case
      check_inuse_chunk (above) will have triggered error.
    */

    else {
#if HAVE_MMAP
      int ret;
      INTERNAL_SIZE_T offset = p->prev_size;
      av->n_mmaps--;
      av->mmapped_mem -= (size + offset);
      ret = munmap((char*)p - offset, size + offset);
      /* munmap returns non-zero on failure */
      assert(ret == 0);
#endif
    }
  }
}

negative_size.png (using draw.io)

old_dlmalloc_unlink_tricks