Understanding Malloc-Part1

A series of articles that tries to provide understanding on Dynamic allocation and related algorithms.

Advertisements

The Word Dynamism in memory has always been a fascinating area to understand. For many beginner level programmers, the allocation of dynamic memory using library calls like malloc have been a magic like complex implementation.

This series of articles tries to provide a profound insight on how the dynamic allocation works. The article starts with very basic implementation and walks towards an advanced version which is currently used in large servers, embedded devices and general purpose operating systems.

The whole article uses Linux as the basis to explain the dynamic allocation.

To start with,  let’s try to understand the need for such a dynamic mechanism; Assume user has to enter an input for a programme, which in turn is just a reference to a file holding variable length data, but the length is not known in advance. This scenario  could be effectively handled with dynamic allocation, else programmer have to pre compute maximum needed data length and statically allocate them. This static allocation leads to unnecessary blow up of programme size. Other scenarios like reading data from sensors whose lifetime is short lived; but the length of the data produced is not known in advance would be justified efficiently with dynamic allocation.

Lets now start with a very basic implementation of  dynamic allocation .


#include <stdio.h>
#include <unistd.h> // unix standard library

/*
 * @size: size needs to be allocated
 * return: starting address of
 *         allocated memory else NULL
*/
void *Malloc(unsigned int size)
{
     int *tmp;
     if(size <= 0)
        return NULL;
     tmp = sbrk(size);
     if(tmp > 0)
        return tmp;
     else
        return NULL;
}

/*
 * @addr: starting addressed that needs to be freed
 */
void Free(void *addr)
{
   brk(addr);
} 

int main()
{
    int *p = NULL; 

    p = Malloc(sizeof(*p));
    *p = 7568;
    printf("Value in the pointer is %d\n", *p);
    Free(p); 

    return 0;
}

If the above snippet is noticed two different syscalls are used namely brk and sbrk. To understand the intuitiveness of these syscalls we would first skew through different memory segments of a basic C programme in unix like systems.

The above-mentioned diagram is a simplified view of segmentation in Unix like systems. To brief on the segments: stack holds the automatic variables, Text holds the code of the executing process, followed by data segment which in turn holds initialized data, uninitialized data and heap. The role of the data segment is to hold global variables.In which uninitialized data segment holds variables that are not initialized by user, the Initialised segment contains data that is initialised by the user  and Finally heap is the segment that provides dynamic memory during execution. The end of the data segment is named break which can be incremented and decremented as per the requirement.

Coming back to syscalls.

brk:  Gets an address as a parameter and sets the data segment to the corresponding address. On Success returns 0, else -1 and sets errno to indicate error.

sbrk: Gets size  as parameter and increments or decrements the data segment to the desired size and returns the  previous break address of the segment. If incr is negative, the amount of allocated space is decreased by incr bytes. sbrk(0) returns the current value of the break.

To understand sbrk more intuitively,  lets assume a programme that allocates memory using the above mentioned Malloc for a size of int. With the figure on left as a reference, old break is the boundary of the data segment before sbrk is applied and new break is the boundary after sbrk is applied, so sbrk returns old break address ( P.S. The figure is just much simplified version of segment heap within data segment ).

Now, its pretty straight forward to understand the above code snippet on basic malloc and free library functions. Malloc gets the size as argument and internally uses sbrk to increment the size of heap and returns old break address.  free gets the address and resets the data segment to the given address.

Problem.The above snippet would work only when the sequence of free is exactly the reverse order of malloc allocation. If done any other way would generate SIGSEGV leading to segmentation violation.

To understand this problem lets extend only the main function as


int main()
{
   int *p = NULL;
   int *q = NULL;

   p = Malloc(sizeof(*p));
   *p = 7568;
   q = Malloc(sizeof(*q));
   *q = 5525;
   Free(p); // free p before q
   printf("in the pointer is %d\n", *q);
   Free(q); 

    return 0;
}

The above snippet will produce segmentation violation when variable “q” is tried to print( though it is programmatically right ). This behavior is as per the definition of brk, which in this scenario will revert the break to the starting of variable “p”, but variable “q” is placed after variable “p”, which now falls into an unmapped segment,  generating SIGSEGV when “q” is tried to access.

But, as per the definition of dynamic allocation the order of malloc and free does not matter.

We will address the mentioned problem in next section by adding another layer around the syscalls called SLOB( Simple List Of Blocks ). This will take us a step closer to understanding practical libraries like uCLibC, GLibC…

 Some definitions for better understanding:

syscalls:  a system call is a programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on

Segments or Segmentation: Memory segmentation is the division of a computer’s primary memory into segments or sections.  This notion provides a modular approach for loader along with a layer of protection. The segmentation also enables the operating system to provide a defined behavior. For Example, Segmentation Violation is one such example enabling OS to prevent user program corrupting other segments or programs.

References and further study:

[1] . http://www.bravegnu.org/gnu-eprog/c-startup.html

[2]. https://en.wikipedia.org/wiki/Data_segment

[3]. http://linux.die.net/man/2/sbrk

[4]. https://en.wikipedia.org/wiki/Memory_segmentation

[5]. https://en.wikipedia.org/wiki/System_call

 

Author: gokulvasanblog

Gokul Vasan

One thought on “Understanding Malloc-Part1”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s