Understanding Malloc – Part2(SLOB)

The previous section helped us understand two major syscalls namely brk and sbrk. I also implemented a basic malloc technique using these syscalls and  addressed the problem on direct accessing these  syscalls, i.e. The order of Free should be exact reverse of malloc, else would lead to segmentation violation.

Another problem worth mentioning is, system calls are expensive due to context switch between kernel space and user space; Moreover this might also lead to unnecessary suspension of process.

Approach to solution. To address the mentioned problems we should reduce the number of time syscall’s is made and secondly rather than reverting or contracting the data segment we should create a hole in the heap which could be reused.

Solution. To make the above challenges  addressable we need to use a notion called memory pooling, i.e. pre-fetch a big chunk of memory and cut a piece from it when required. This cut piece is given back to the chunk on free function( not given back  to the Kernel, but rather pooled).

Challenge. The mentioned solution is not straight forward; first allocation should not provide you with memory very big compared to the requested size( causing Internal Fragmentation) and  the time taken to process the request should be reasonable. Second the freeing of memory should happen in constant time, i.e.O(1).

Simplest solution that addresses the mentioned challenges is SLOB, i.e. Simple List Of Blocks. As name suggests its a list holding reference to “set of chunks” of same size.The list is arranged  in ascending order based on size of the chunk it holds.To make this approach work, the memory needed for the process is approximately precomputed , prefetched and made into SLOB before main function is called.

Untitled Diagram.png

The Above figure is a simple representation of SLOB.

On Malloc the list is searched for the first fit( which in SLOB is also the best fit because of its ascending property).  If the desired requested size  can be allocated, then the chunk is removed from the list and its starting address is  returned, else NULL is returned. The complexity of the fetch is O(N), where N is the length of the list, i.e. linear complexity.

Free just takes the starting address of the chunk and provides no other information about the size or other information about the cache( or element) to which the chunk belongs; but on free the released chunk needs to identify the desired cache to which the chunk belongs.With just starting address as parameter,it is not possible to return the chunk to the desired cache in constant time, unless the chunk holds some reference of the ownership. To make this work a notion called boundary tag is used.

Boundary tag: Either before or after each chunk a record is added, holding the reference of the element to which the chunk belongs. This boundary tag is provided to each chunk. The total size of each chunk is chunk-size + boundary tag. The size of the boundary tag is of great importance; Bigger it becomes , overhead of book keeping data becomes proportionately larger, but at the same time the size should be efficient enough to hold required data so that chunk can be returned to the respective element it belongs. Study suggests that boundary tag should not be greater than 10% of the total payload memory. To implement SLOB the size of the tag would be a word length, i.e. 32bits in case of 32bit system to hold the reference of the owned element.

BoundaryTag.png

The diagram on the left is an abridged representation of the boundary tag.

Abstraction.When malloced the address at the end of the tag is returned; therefore, the user is abstracted from underneath implementation of the allocation.

In many scientific publications the boundary tagging is called hidden header field.

when free is called  it gets end address of the boundary tag, i.e. the starting address of the  chunk. To this address when the size of the boundary tag is negated we would get the reference of the chunk’s tag. This provides us with the required information about the element to which the chunk should be returned. The complexity of the Free is O(1).

Usage: 

This notion of SLOB is used in many RTOS that works on flat memory model. among many, One worth mentioning is OSE166’s redarrow platform. This platform was majorly used on many primitive mobile devices.

Another place with prominent usage is SLAB implementation in Linux kernel;  Where the kernel data structures are alloced using such an implementation, but with a slight twist; which we will discuss in much detail as we progress.

Many Middlewares  uses this approach to build a wrapper around the standard library to make access to memory faster.

Conclusion.This article provided us the semantic understanding of SLOB, In the next article we would discuss a method to implement the SLOB and would also discuss the pros and cons of this algorithm.

Definitions used in this articles:

Internal fragmentation:  is the wasted space within each allocated block because of rounding up from the actual requested allocation to the allocation granularity.

Flat memory model or linear memory model refers to a memory addressing paradigm in which “memory appears to the program as a single contiguous address space.”The CPU can directly (and linearly) address all of the available memory locations without having to resort to any sort of memory segmentation or paging schemes.

References and further study:

  1. https://en.wikipedia.org/wiki/Big_O_notation#Big_Omega_notation
  2. https://en.wikipedia.org/wiki/Fragmentation_(computing)
  3. https://en.wikipedia.org/wiki/SLOB
  4. https://en.wikipedia.org/wiki/Flat_memory_model

 

 

 

Understanding Malloc-Part1

The Word Dynamism in memory has always been a fascinating area to understand. For many beginner level programmers, the allocation of dynamic memory using library calls like malloc have been a magic like complex implementation.

This series of articles tries to provide a profound insight on how the dynamic allocation works. The article starts with very basic implementation and walks towards an advanced version which is currently used in large servers, embedded devices and general purpose operating systems.

The whole article uses Linux as the basis to explain the dynamic allocation.

To start with,  let’s try to understand the need for such a dynamic mechanism; Assume user has to enter an input for a programme, which in turn is just a reference to a file holding variable length data, but the length is not known in advance. This scenario  could be effectively handled with dynamic allocation, else programmer have to pre compute maximum needed data length and statically allocate them. This static allocation leads to unnecessary blow up of programme size. Other scenarios like reading data from sensors whose lifetime is short lived; but the length of the data produced is not known in advance would be justified efficiently with dynamic allocation.

Lets now start with a very basic implementation of  dynamic allocation .


#include <stdio.h>
#include <unistd.h> // unix standard library

/*
 * @size: size needs to be allocated
 * return: starting address of
 *         allocated memory else NULL
*/
void *Malloc(unsigned int size)
{
     int *tmp;
     if(size <= 0)
        return NULL;
     tmp = sbrk(size);
     if(tmp > 0)
        return tmp;
     else
        return NULL;
}

/*
 * @addr: starting addressed that needs to be freed
 */
void Free(void *addr)
{
   brk(addr);
} 

int main()
{
    int *p = NULL; 

    p = Malloc(sizeof(*p));
    *p = 7568;
    printf("Value in the pointer is %d\n", *p);
    Free(p); 

    return 0;
}

If the above snippet is noticed two different syscalls are used namely brk and sbrk. To understand the intuitiveness of these syscalls we would first skew through different memory segments of a basic C programme in unix like systems.

The above-mentioned diagram is a simplified view of segmentation in Unix like systems. To brief on the segments: stack holds the automatic variables, Text holds the code of the executing process, followed by data segment which in turn holds initialized data, uninitialized data and heap. The role of the data segment is to hold global variables.In which uninitialized data segment holds variables that are not initialized by user, the Initialised segment contains data that is initialised by the user  and Finally heap is the segment that provides dynamic memory during execution. The end of the data segment is named break which can be incremented and decremented as per the requirement.

Coming back to syscalls.

brk:  Gets an address as a parameter and sets the data segment to the corresponding address. On Success returns 0, else -1 and sets errno to indicate error.

sbrk: Gets size  as parameter and increments or decrements the data segment to the desired size and returns the  previous break address of the segment. If incr is negative, the amount of allocated space is decreased by incr bytes. sbrk(0) returns the current value of the break.

To understand sbrk more intuitively,  lets assume a programme that allocates memory using the above mentioned Malloc for a size of int. With the figure on left as a reference, old break is the boundary of the data segment before sbrk is applied and new break is the boundary after sbrk is applied, so sbrk returns old break address ( P.S. The figure is just much simplified version of segment heap within data segment ).

Now, its pretty straight forward to understand the above code snippet on basic malloc and free library functions. Malloc gets the size as argument and internally uses sbrk to increment the size of heap and returns old break address.  free gets the address and resets the data segment to the given address.

Problem.The above snippet would work only when the sequence of free is exactly the reverse order of malloc allocation. If done any other way would generate SIGSEGV leading to segmentation violation.

To understand this problem lets extend only the main function as


int main()
{
   int *p = NULL;
   int *q = NULL;

   p = Malloc(sizeof(*p));
   *p = 7568;
   q = Malloc(sizeof(*q));
   *q = 5525;
   Free(p); // free p before q
   printf("in the pointer is %d\n", *q);
   Free(q); 

    return 0;
}

The above snippet will produce segmentation violation when variable “q” is tried to print( though it is programmatically right ). This behavior is as per the definition of brk, which in this scenario will revert the break to the starting of variable “p”, but variable “q” is placed after variable “p”, which now falls into an unmapped segment,  generating SIGSEGV when “q” is tried to access.

But, as per the definition of dynamic allocation the order of malloc and free does not matter.

We will address the mentioned problem in next section by adding another layer around the syscalls called SLOB( Simple List Of Blocks ). This will take us a step closer to understanding practical libraries like uCLibC, GLibC…

 Some definitions for better understanding:

syscalls:  a system call is a programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on

Segments or Segmentation: Memory segmentation is the division of a computer’s primary memory into segments or sections.  This notion provides a modular approach for loader along with a layer of protection. The segmentation also enables the operating system to provide a defined behavior. For Example, Segmentation Violation is one such example enabling OS to prevent user program corrupting other segments or programs.

References and further study:

[1] . http://www.bravegnu.org/gnu-eprog/c-startup.html

[2]. https://en.wikipedia.org/wiki/Data_segment

[3]. http://linux.die.net/man/2/sbrk

[4]. https://en.wikipedia.org/wiki/Memory_segmentation

[5]. https://en.wikipedia.org/wiki/System_call