Monday, 12 September 2016

C++ vector: what happens when it expands/reallocates on the stack?



I'm new to C++ and I'm using the vector class in my project. I find it quite useful because I get an array that automatically reallocates whenever necessary (i.e., if I push_back an item and the vector has reached its maximum capacity, it reallocates itself, asking the OS for more memory), so access to an element of the vector is very quick (unlike a list, where reaching the n-th element means walking through the first n elements).



I found this question very useful, because its answers explained perfectly how the "memory allocator" works when I want to store my vector on the heap/stack:



[1] vector<Type> vect;
[2] vector<Type> *vect = new vector<Type>;
[3] vector<Type*> vect;



However, one doubt has been bugging me for a while, and I can't find the answer:
Whenever I construct a vector and begin pushing a lot of items in, there comes a moment when the vector is full, so to continue growing it needs to reallocate, copy itself to a new location and then continue pushing_back items (obviously, this reallocation is hidden in the implementation of the class, so it is completely transparent to me).



Fine, if I have created the vector on the heap [2], I have no trouble imagining what may be happening: class vector calls malloc, acquires new space, copies itself into the new memory and finally releases the old memory by calling free.



However, a veil hides what is happening when I construct a vector on the stack [1]: what happens when the vector must reallocate? AFAIK, whenever you enter a new function in C/C++, the computer looks at the variable declarations and expands the stack to get the space needed for these variables, but you can't allocate more space on the stack when the function is already running. How does the class vector solve this problem?


Answer



You wrote





[...] copy itself to a new location [...]




which is not the way a vector works. The vector data is copied to a new location, not the vector itself.



My answer should give you an idea of how a vector is designed.






Note: The std::allocator is actually likely to be an empty class and std::vector will probably not contain an instance of this class. This may not be true for an arbitrary allocator.



std::vector layout



In most implementations it consists of three pointers where




  • begin points to the start of the data memory of the vector on the heap (always on the heap if not nullptr)

  • end points one memory location past the last element of the vector data

    -> size() == end-begin

  • capacity points one memory location past the end of the allocated vector memory

    -> capacity() == capacity-begin





We declare a variable of type std::vector<T,A> where T is any type and A is an allocator type for T (e.g. std::allocator<T>).



std::vector<T,A> vect1;



What does this look like in memory?



std::vector on the stack



As we see: Nothing happens on the heap but the variable occupies the memory that is necessary for all of its members on the stack.
There it is and it will stay there until vect1 goes out of scope, since vect1 is just an object like any other object of type double, int or whatever. It will sit there on its stack position and wait to get destroyed, regardless of how much memory it handles itself on the heap.



The pointers of vect1 do not point anywhere, since the vector is empty.






Now we need a pointer to a vector and use some dynamic heap allocation to create the vector.



std::vector<T,A> * vp = new std::vector<T,A>;


Let's again look at the memory.



std::vector on the heap




We have our vp variable on the stack and our vector is on the heap now. Again the vector itself will not move on the heap since its size is constant. Only the pointers (begin, end, capacity) will move to follow the data position in memory if a reallocation takes place. Let's have a look at that.





Now we can start pushing elements to a vector. Let's look at vect1.



T a;
vect1.push_back(a);



std::vector after single push_back



The variable vect1 is still where it has been but memory on the heap was allocated to contain one element of T.



What happens if we add one further element?



vect1.push_back(a);



std::vector after second push




  • The space allocated on the heap for the data elements will not be enough (since it holds only one memory position so far).

  • A new memory block will be allocated for two elements.

  • The first element will be copied/moved to the new storage.

  • The old memory will be deallocated.



We see: The new memory location is different.




To gain additional insight, let's look at the situation if we destroy the last element.



vect1.pop_back();


The memory allocated won't change but the last element will have its destructor called and the end pointer moves one position down.



std::vector after 2x push and 1x pop




As you can see: capacity() == capacity-begin == 2 while size() == end-begin == 1

