DOT NET TRICKS: ValueTypes and ReferenceTypes : Under the Hood

In .NET, value types and reference types are a source of confusion for both developers and students. Many of us take it for granted that value types are allocated on the thread stack (typically a 1 MB stack created per thread), that on each method call the local value types are allocated on that stack, and that they are deallocated when the call ends. Reference types, on the other hand, are the ones we know to be always allocated on the heap (which is not always true, as I will discuss later), even when they are used as locals. They are deallocated only some time later by the Garbage Collector, which runs inside every .NET process, occasionally scans the heap, keeps only the reachable objects and compacts the memory. (The finalizer thread is a separate thing: it only runs the finalizers of objects that the GC has found unreachable.) In this post I am not going to cover the details of garbage collection. Instead I will focus on what I personally know about value types and reference types, and try to clear up any doubt about them in terms of IL constructs. So after reading the post, you will know some of the basics of IL too.

To tell you the truth, I have wanted to blog about this topic for a long time but never got the chance to finish it. When my buddy @ZenWalker asked me a couple of questions that I had to answer, I decided to write a post on it so that it benefits both him and everyone else. His questions go like this:

If System.ValueType inherits System.object, then all valuetypes are object. right?
If so,
1) Why do we say c# isnt pure Object Oriented because Value Types are not objects.
2) Value Types gets created over stack because to save space over heap or may be for performance right because heap creation takes time.
3) Why are Value Types struct and not class if they both are objects. To avoid complexity for simple types??
4) Why we have to do boxing and unboxing if valuetypes are objects too, or just to copy from stack to heap n back on??

The questions sounded interesting enough to answer, so let's answer them by looking at what goes on under the hood of a normal C# application, and let's demonstrate it with code.

Let's start a console application and write code like the following:

int x = 10;

object y = 10;

string s = string.Format("x :{0}, y : {1}", x, y);

Console.WriteLine(s);
Console.ReadKey(true);

This is very simple code. I first create a value type x (an int), then an object y, which is a reference type, and finally format both into a string and store the result in the string variable s.

Now let's see what the IL for this code looks like:

Let's take a look at the IL. The first line says .entrypoint, which identifies the method where the program starts. If you declare a static method named Main in .NET, the compiler automatically writes .entrypoint on it for you, so this too is a bit of compiler work.

The second line says .maxstack 3. maxstack declares the maximum number of items the method will ever hold on the IL evaluation stack at one time. It does not count the local variables, which are declared separately. In our code the deepest point is the call to string.Format, which needs three items on the evaluation stack:

1. The reference to the format string.
2. The reference to the boxed copy of the integer x.
3. The reference held in y, whose value is stored in the heap.

Hence you should remember: even though you create an object or a memory block that is allocated in the heap, the reference to it is still held in a local on your stack, and that reference goes away once the method finishes executing. When no reachable reference to a block of memory is left, the block becomes eligible for collection by the GC.

The next statement, .locals init, declares three locals for the method: x, y and s. These local slots live in the method's stack frame and are separate from the evaluation stack that maxstack describes.
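
If you want to check these numbers without opening a disassembler, reflection exposes the same metadata. The following is only a sketch: the class MethodBodyDemo and the method Sample are names I made up for illustration, and the exact values depend on the compiler and on Debug versus Release settings (an optimized build may drop locals, and very small methods report a default maxstack of 8), so I am not listing an expected output.

```csharp
using System;
using System.Reflection;

public static class MethodBodyDemo
{
    // Same statements as the sample above, minus the console calls.
    public static string Sample()
    {
        int x = 10;
        object y = 10;
        string s = string.Format("x :{0}, y : {1}", x, y);
        return s;
    }

    // Returns the metadata the runtime keeps for the body of Sample.
    public static MethodBody GetSampleBody()
    {
        MethodInfo method = typeof(MethodBodyDemo).GetMethod("Sample");
        return method.GetMethodBody();
    }

    public static void Main()
    {
        MethodBody body = GetSampleBody();

        // Corresponds to the .maxstack directive in the IL.
        Console.WriteLine("maxstack: " + body.MaxStackSize);

        // Corresponds to the .locals directive in the IL.
        foreach (LocalVariableInfo local in body.LocalVariables)
        {
            Console.WriteLine("local " + local.LocalIndex + ": " + local.LocalType);
        }
    }
}
```

Whatever the build settings, maxstack should never be reported as less than 3 here, because the call to string.Format needs all three arguments on the evaluation stack at once.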

Now let's skip the first expression and concentrate only on the highlighted lines.
stloc.0 means: pop the value from the top of the evaluation stack and store it into local variable 0. The line just above it says ldc.i4.s 10, which pushes the constant 10 as a 4-byte (32-bit) integer (i4). So, without all the complexity, the two lines together initialize local 0 (which is in fact an integer) with the value 10.

The next highlighted lines show a box on Int32. box converts the loaded value into a reference: it allocates an object on the managed heap that is large enough for the 4-byte integer (plus the usual object header), copies 10 into it, and pushes the reference to that heap object onto the evaluation stack. The next line stores that reference into a local on the stack.

Finally, the last three highlighted lines load the locals at positions 0 and 1 back onto the evaluation stack. Position 0 holds the value itself and position 1 holds a reference, so the value from position 0 is boxed before both are passed to the static string.Format method.

So up to this point it should be clear that "a value type local is stored directly on the stack, while an instance of a reference type is stored somewhere else (the managed heap), and only the reference is stored on your stack, so that the Garbage Collector can determine whether the memory is still in use."
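
One way to convince yourself that box really creates a separate heap object is to observe it from C#. This is a minimal sketch; BoxingDemo and its method names are made up for illustration.

```csharp
using System;

public static class BoxingDemo
{
    // Changing x after boxing does not affect the boxed copy.
    public static bool BoxIsIndependentCopy()
    {
        int x = 10;
        object y = x;   // box: the value 10 is copied into a new heap object
        x = 20;         // only the stack local changes
        return (int)y == 10;
    }

    // Boxing the same value twice produces two distinct heap objects.
    public static bool EachBoxIsNewObject()
    {
        int x = 10;
        object a = x;
        object b = x;
        return !object.ReferenceEquals(a, b) && a.Equals(b);
    }

    public static void Main()
    {
        Console.WriteLine(BoxIsIndependentCopy()); // True
        Console.WriteLine(EachBoxIsNewObject());   // True
    }
}
```

The two boxes compare equal by value (Equals) but are different objects by reference, which is what you would expect if each box allocates its own storage.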

Now let's see how the compiler behaves when you create your own types:

public class MyType
{
}
public struct MySType
{
}

/////////////////////////////////////

MyType t = new MyType();
Console.WriteLine(t);

MySType s = new MySType();
Console.WriteLine(s);

Console.ReadKey(true);

Here I have created two types of my own, MyType and MySType. The former is a class and inherits directly from System.Object, while the latter is a struct and inherits from System.ValueType, which in turn inherits from System.Object.

So both of them inherit all the members of System.Object. Then why this discrimination between the two?
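
The inheritance chain can be checked with reflection. This sketch reuses MyType and MySType from the code above; HierarchyDemo and BaseOf are names I added for illustration.

```csharp
using System;

public class MyType { }
public struct MySType { }

public static class HierarchyDemo
{
    // Returns the direct base type of the given type.
    public static Type BaseOf(Type type)
    {
        return type.BaseType;
    }

    public static void Main()
    {
        Console.WriteLine(BaseOf(typeof(MyType)));      // System.Object
        Console.WriteLine(BaseOf(typeof(MySType)));     // System.ValueType
        Console.WriteLine(BaseOf(typeof(ValueType)));   // System.Object

        Console.WriteLine(typeof(MyType).IsValueType);  // False
        Console.WriteLine(typeof(MySType).IsValueType); // True
    }
}
```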

The answer is that the language implementers built this in for us. In IL, declaring a local of a value type means that the value itself is created directly in your stack frame, with no separate heap object and no reference to follow. If you look at the IL, it looks like this:

If you look at the locals, instead of int32 or object you now see class, for the local that holds an object reference, and valuetype, for the local that holds the actual value on the stack. A valuetype local is copied by value, and its storage goes away as soon as control leaves the method.

The next line creates an object of MyType by calling its default constructor with newobj. Remember, newobj allocates the object on the heap and returns the reference to that memory. The box instruction is similar: it also allocates a new object on the heap, copies the value into it and gives you back the reference. You can see this happen for our struct at L_0017, where the already loaded variable is boxed so that it can be passed to Console.WriteLine.

initobj, on the other hand, initializes all the fields of the value type to their default values, directly in the local's storage on the stack. So yes, a value type does not require a default constructor.
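
The effect of this default initialization is visible from C#. In this sketch, Pair is a hypothetical struct that I added only so that there are fields to look at.

```csharp
using System;

// Hypothetical struct with no constructor of its own.
public struct Pair
{
    public int Number;
    public string Text;
}

public static class DefaultInitDemo
{
    // new Pair() runs no user code; every field gets its default value.
    public static Pair Create()
    {
        return new Pair();
    }

    public static void Main()
    {
        Pair p = Create();
        Console.WriteLine(p.Number);        // 0
        Console.WriteLine(p.Text == null);  // True
    }
}
```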

So, if you think about it, IL treats a class local as a reference to an object that is constructed with newobj, and a valuetype local as a stack object that can be initialized in place with initobj. We have now gone one step deeper and removed some of the abstraction that the language provides.

Hence we can say that the C# compiler, like any other .NET compiler, contains logic that says: if a type inherits from System.ValueType, treat it differently (for example, use initobj to create it).

C# uses the struct keyword for this. If you declare a struct, the compiler writes the IL in such a way that locals of that type are copied by value and stored on the stack, which is fast memory and needs no GC work to reclaim.
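
The practical difference between the two kinds of locals shows up on assignment. The sketch below uses two hypothetical types, CounterClass and CounterStruct, that differ only in the class/struct keyword.

```csharp
using System;

public class CounterClass { public int Value; }
public struct CounterStruct { public int Value; }

public static class CopySemanticsDemo
{
    // Assigning a struct copies the whole value, so a and b are independent.
    public static int ChangeCopyOfStruct()
    {
        CounterStruct a = new CounterStruct();
        a.Value = 1;
        CounterStruct b = a;
        b.Value = 2;
        return a.Value;
    }

    // Assigning a class copies only the reference, so a and b share one object.
    public static int ChangeCopyOfClass()
    {
        CounterClass a = new CounterClass();
        a.Value = 1;
        CounterClass b = a;
        b.Value = 2;
        return a.Value;
    }

    public static void Main()
    {
        Console.WriteLine(ChangeCopyOfStruct()); // 1
        Console.WriteLine(ChangeCopyOfClass());  // 2
    }
}
```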

Now that this is clear, let's answer the questions that I mentioned before:

Why do we say c# isnt pure Object Oriented because Value Types are not objects?

In fact, it should be clear by now that the type system is internally purely object oriented, and value types themselves inherit from System.Object. C# simply writes special IL for value types so that their memory does not involve an extra trip to the heap, and they can be created and discarded much more quickly. A compiler or language that did not use this facility of IL for creating stack values could have allocated everything on the heap instead. It is an added facility that the language provides to us.

Value Types gets created over stack because to save space over heap or may be for performance right because heap creation takes time.

Value types are specially treated objects that are created directly on the stack for performance. Each thread gets its own stack, which is limited in size (typically 1 MB), but allocating and deallocating on it is very fast because it only moves the stack pointer back and forth. Hence it saves processor cycles.

We do not really need to save space on the heap. The heap belongs to the process, not to an individual thread, and it can grow when needed. We use the stack rather than the heap because it is faster and is cleaned up more easily than the latter.

Why are Value Types struct and not class if they both are objects. To avoid complexity for simple types?

Well, I think this question is already answered. Value types are defined syntactically with struct in C#, but in theory you could build your own language that marks value types in some other way, for instance by deriving the type from System.ValueType, as long as its compiler writes initobj rather than newobj for those types.

Since we already knew struct from C, the language team used that construct to define a value type. So yes, it is there to keep programming simple.

Why we have to do boxing and unboxing if valuetypes are objects too, or just to copy from stack to heap and back on?

If by object you mean the System.Object type, then yes, value types are objects, special ones that are created on the local stack and can be allocated and released quickly. But if you mean object in terms of storage, a value type is not a normal object: it is just the raw value, copied on assignment, with no heap allocation and no reference to it. Hence it needs to be boxed, that is, copied into a heap object, when it has to be used as a reference type, and unboxed when the value has to be copied back to the stack.
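
The round trip can be sketched as follows. UnboxingDemo and its methods are names made up for illustration; the second method shows a detail that often surprises people, namely that unboxing only works to the exact type that was boxed.

```csharp
using System;

public static class UnboxingDemo
{
    // Box the value onto the heap, then unbox it back into a stack local.
    public static int RoundTrip(int value)
    {
        object boxed = value;   // stack to heap (box)
        int copy = (int)boxed;  // heap to stack (unbox)
        return copy;
    }

    // A boxed int can only be unboxed as int, not directly as long.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 10;
        try
        {
            long wrong = (long)boxed;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(10));             // 10
        Console.WriteLine(UnboxToWrongTypeThrows());  // True
    }
}
```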

I hope you enjoyed this post.

By the way, it is not always true that every value type is allocated on the stack. You can go to the second part of the series from here. And if you are eager to know more about the internals of C#, please read my posts on the internals of C# language constructs.

Please leave your feedback in the comments.

Thank you for reading.

Saturday, July 16, 2011
