Hands-on C++ Tutorial: Pointers, References, Memory, and Segfaults
0.1 Goal
Learn and immediately try in practice how to use C++ pointers, references, and related memory access concepts effectively and safely.
0.2 Prerequisites
You are familiar with the basics of C++, and you may have already used pointers, but you would like to clarify and organize your knowledge. This tutorial is not intended for absolute beginners.
0.3 Programming Environment
This tutorial assumes you will use ROOT as a C++ command-line interpreter. It is not required, but it can be very, very helpful: you can type in a single line of code and immediately see its effect, without compiling or any other fuss and hassle.
Where to get ROOT?
ROOT is a physics analysis framework; you can get it here:
http://ROOT.cern.ch/. It is very easy to install, so do not let its title page scare you. This tutorial uses only a tiny bit of it, but it is worth it, trust me.
1 Terminology
1.1 Variables
syntax:
[const] type (identifier [= starting_value])*;
Example (type it into your ROOT command prompt, we will use it later; of course, omit the ROOT prompt itself, I will omit it in the following examples as well):
ROOT [0] int variable1 = 42, tmp_variable = 0;
Declares a regular variable. It can store an integer value whose exact size depends on your system and compiler. As soon as we declare a variable, the required amount of memory is reserved for it. We cannot choose its location; that is decided by the compiler (together with the linker and the operating system). Statically declared variables often end up at the same address on every run until you recompile your program, but on systems with address space layout randomization (ASLR) even that is not guaranteed.
#define ARRAY_SIZE 5
Declares a constant. It is in fact a preprocessor directive (hence the leading #), which is expanded, i.e. replaced by its value, before the code is actually compiled. It is a good habit to write such constants in uppercase.
int my_array[ARRAY_SIZE];
Declares an array of five integers: a contiguous piece of memory made of five consecutive integer slots.
The name my_array is not itself a pointer, but in most expressions the compiler implicitly converts it (the array "decays") into a plain pointer to the first slot. We will return to this topic later.
for (tmp_variable = 0; tmp_variable < ARRAY_SIZE; tmp_variable ++) my_array[tmp_variable] = tmp_variable * tmp_variable;
Fill the array with some values (the square of tmp_variable).
Note 1: Type this in even if you already know it, because it will be used later.
Note 2: ROOT checks whether your array index is out of bounds (with a tolerance of 1). That is not standard behavior for compilers, so do not count on it.
1.2 Pointers
The name pointer comes from point to: a pointer is a place in memory that holds the address of some other place in memory, which actually stores some useful data. It can be used as a handle to the variable whose address it holds.
syntax:
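[const] type (\* [const] name)*;
//where name is a valid identifier and \* stands for the asterisk character itself, not a pattern repetition; a const after the asterisk makes the pointer itself constant (see Constant Pointers)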
For example:
int * pointer1 = NULL;
Declares a pointer. Under the hood, a pointer holds a memory address, which on common platforms is stored like an unsigned integer of the machine's word size (no matter whether it points to a float, an object, or a char). Because its use is notably different, it is nevertheless a type of its own, and that type tells the compiler something about the data being pointed to, especially its size.
Note 1: Both int* pointer1; and int *pointer1; work as well; spaces are not important.
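Note 2: The asterisk belongs to each declared name, not to the type, so you need an asterisk for every pointer in a declaration, i.e. int * p1, * p2;. Writing int* p1, p2; declares p1 as a pointer but p2 as a plain int.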
Note 3: NULL is a special value which indicates that a pointer does not point to any valid object or memory address; such a pointer is called a null pointer. This value is the result of converting the integer value 0 to a pointer type. Many programming environments define a constant of this kind; ROOT uses NULL. Local (non-static) pointers are not initialized to NULL automatically: until you assign them, they contain garbage.
It is of course possible to declare an array of pointers.
int * pointer_array[ARRAY_SIZE];
Declares an array of 5 pointers, each to an int variable.
1.3 Operators
1.3.1 Reference Operator (&) - an Address of
"Tell me your address in memory."
Returns the address of a variable in memory, which can then, for example, be stored in a pointer, like this:
pointer1 = &variable1; //assuming we declared pointer1 before, as we did
Then we can do:
cout << "Address of variable1 is: " << (void *) &variable1 << endl;
cout << "Content of pointer1 is: " << (void *) pointer1 << endl;
Note: (void *) casts the address to a generic pointer type, which cout prints as a raw memory address (usually in hexadecimal). Do not cast a pointer to (int) to print it: on 64-bit systems an int is too small to hold an address, so the value gets truncated.
Not surprisingly, the results are the same: pointer1 points to the same place where variable1 is, and the content of a pointer is an address.
However, this is not all. Check out this:
for (tmp_variable = 0; tmp_variable < ARRAY_SIZE; tmp_variable ++){
cout << "Address of array element " << tmp_variable << " is: " << (void *) &my_array[tmp_variable] << endl;
}
You will get something like this (addresses may differ):
Address of array element 0 is: 0x335f170
Address of array element 1 is: 0x335f174
Address of array element 2 is: 0x335f178
Address of array element 3 is: 0x335f17c
Address of array element 4 is: 0x335f180
Array elements are stored at consecutive addresses, and we can see that an int in my ROOT takes 4 bytes.
Summary: the reference operator gives you the address at which the data of a given variable reside. This address can be stored in a pointer or used for de-referencing.
1.3.2 De-reference Operator (*) - a Value Pointed by
"Tell me what is on the place you point to."
Retrieves the contents of the memory pointed to by the pointer. The amount of memory read (or written) depends on the pointer type.
cout << "Content of variable1 is: " << dec << variable1 << endl;
cout << "Value pointed by pointer1 is: " << *pointer1 << endl;
Note: << dec switches number output back to decimal format, in case an earlier << hex is still in effect.
Not surprisingly, the results are the same: pointer1 points to the very same place where variable1 is. We set it up like this, remember?
Summary: if you de-reference a memory address (stored in a pointer, for example), you get the data stored there. This is the inverse operation to referencing.
For completeness, I briefly mention one more operator, which will be explained later.
1.3.3 Offset Operator ([]) - a Special Version of a De-reference Operator
It is a mere shortcut that combines a de-reference operation with an offset calculation: "give me the content of the memory cell at the address of the pointer plus an offset." It will be explained in detail in the section Pointer Arithmetic.
2 Operators in Action
2.1 What Can Be Done (and Why)
We can also do the following:
int * pointer_to_array = my_array;
We are storing the address of the first element of the array in a pointer; it is exactly the same as:
int * pointer_to_array = &my_array[0];
Check it out:
cout << "Content of a pointer to array is: " << dec << *pointer_to_array << endl;
cout << "Content of my_array[0] is: " << my_array[0] << endl;
You can see that the content of my_array[0] is the same as the data pointed to by pointer_to_array: both refer to the same location (they were defined like this).
... and get the addresses (using the reference operator):
cout << "Address of my_array is: " << (void *) &my_array << endl;
cout << "Address of the 0th element of my_array is: " << (void *) &my_array[0] << endl;
cout << "pointer_to_array points to: " << (void *) pointer_to_array << endl;
All are the same, which is not so surprising, considering how they were defined. (Strictly speaking, &my_array has a different type, a pointer to the whole array of five ints, but it holds the same address.)
Next, try this:
cout << "Content of my_array[1] is: " << dec << my_array[1] << endl;
cout << "Content of pointer_to_array[1] is: " << pointer_to_array[1] << endl;
Both are 1; we stored 1*1 = 1 there, remember?
And how does the compiler know that pointer_to_array is an array? It does not! This is just a language construct called de-referencing an address with an offset. I told you so.
Try this:
cout << "Content of my_array[8] is: " << my_array[8] << endl;
ROOT complains:
Error: Array index out of range my_array[8] -> [8] valid upto my_array[4]
but a normal compiler will usually NOT stop you; depending on its warning level and cleverness, it may at most print a warning. Normal compilers are usually not as clever as ROOT. Praise ROOT.

But do not get used to it too much.
cout << "Content of pointer_to_array[8] is: " << pointer_to_array[8] << endl;
Now we have cheated even ROOT and read memory we have never initialized; in standard C++ terms this is undefined behavior. ROOT is not so smart after all, or we are doing too much magic with memory. Using a large enough index will cause an invalid memory access, and your OS will kill the ROOT process.
Note: Valgrind can often trace issues like this; it is even smarter than ROOT. Keep its name in mind, it will save you many sleepless nights.
cout << "Content of pointer1[3] is: " << dec << pointer1[3] << endl;
4? What the hell? Maybe we have actually hit our own my_array. So yes, the code compiles, but now you are really asking for trouble. If you write to this place, strange things will happen; I mean:
http://xkcd.com/371/.
However, tricks like this can be useful too. Would you like a handy byte-level view of our integer? Remember, an int should be 4 bytes (at least with my ROOT on my machine), but check it out:
cout << "sizeof(int) is:" << sizeof(int) << endl;
cout << "sizeof(char) is:" << sizeof(char) << endl;
Set variable1 to something big, like 1 167 291 123, and check whether it fits:
variable1 = 1167291123;
cout << "variable1 is: " << dec << variable1 << endl;
cout << "variable1 is: " << hex << variable1 << endl;
Are we a big-endian or a little-endian machine? Let's find out. Create a char pointer to our variable1 (the explicit cast is needed, because an int * does not convert to a char * implicitly):
char *pointer2 = (char *) &variable1;
And check it (note: (int) casts a character to a number, so that its content is printed as a number; casting is described in depth later. hex switches the output to hexadecimal format):
cout << "Slices of variable1 are: ";
for (tmp_variable = 0; tmp_variable < sizeof(int); tmp_variable ++) cout << hex << (int) (pointer2[tmp_variable]) << ", ";
cout << endl;
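Now we have just sliced our integer variable into single bytes and printed their contents. My Intel Mac is a little-endian machine, so the bytes are stored in reverse order, least significant byte first. There are a few extra ff's printed out: char is signed on this platform, so bytes of 0x80 and above become negative numbers and are sign-extended by the (int) cast; casting through (unsigned char) first, as in (int) (unsigned char) pointer2[tmp_variable], avoids this. Such operations can be very handy if you need to manipulate a bit stream. But be a good programmer and always comment what you are doing; such pieces of code can be hell to decipher.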
2.2 Pointer Arithmetic
Can you type pointer2++;? Oh yes, you can. But what happens? You do not increase the content of the memory pointed to, but the pointer itself. Remember, pointer2 is a char pointer pointing to the beginning of an int, so now it points to the next byte.
Increments and decrements always move the pointer by sizeof(pointed-to type) bytes (for an int *, by sizeof(int)), so you do not need to worry about it.
In fact, ptr[x] is a mere shortcut notation for *(ptr + x). Remember that pointer_to_array = my_array; sets pointer_to_array to &my_array[0]; check it out:
cout << "my_array[2] is equal to: " << my_array[2] << ", and *(pointer_to_array +2) is equal to: " << *(pointer_to_array +2) << endl;
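Remember: the assignment pointer_to_array = my_array; is valid, because the array name decays to a pointer to its first element. That's how close they are. They are not exactly the same, though: the opposite assignment my_array = pointer_to_array; does not compile, because an array cannot be reassigned, and sizeof(my_array) gives the size of the whole array, not the size of a pointer.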
Keep in Mind: the postfix increment (++) and decrement (--) operators have higher precedence than the de-reference operator (*); the prefix forms have the same precedence as * and are applied right to left. In addition, a postfix operator (variable++) evaluates to the value the variable had before the change. Do not be confused.
So...
cout << *pointer2++ << endl;
...would do this: take pointer2 and increase it by one, but print the content of its old target. Thus you get the old target value, while the pointer is increased.
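Only addition and subtraction are defined for pointers: you can add an integer to a pointer or subtract an integer from it, and you can subtract two pointers into the same array to get the number of elements between them. Adding two pointers together is not allowed; the compiler will refuse it, and it would not make sense anyway.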
2.3 Functions and Parameter Passing
By default, parameters of functions like this
int add (int x, int y);
are passed by value: a local copy of the input data is created (typically on the stack) and the function works with that copy. This has the following properties:
- The function cannot change the value of the caller's variables (safe).
- Only data structures which are copyable can be passed.
- The function cannot return multiple values through its parameters.
- Copying can be slow for large objects.
There are two ways to pass a parameter which can be modified from inside the function:
Passing Pointer
This is a classical C style:
int add (int *x, int *y);
The function gets a regular pointer, and it must use it as a pointer in its body:
return *x + *y;
The caller must pass the address of an existing object; to pass an ordinary variable, take its address with the reference operator in the function call:
result = add (&ax, &ay);
If you return a pointer to a local variable from the body of a function, you are in trouble: the variable no longer exists once the function returns.
Passing Reference
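This is a C++ feature (plain C has no references):
int add (int &x, int &y);
The variables are passed by reference. You do not need to take their addresses in the function call,
result = add (xa, ya);
and they can be used as regular variables in the function body:
return x + y;
This is both much easier and safer, and thus the recommended way; it is fast as well. For parameters that the function only reads, use const references (const int &x), which also accept temporaries such as add (1, 2).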
3 Extras (Specials Which Might Be Handy)
3.1 Constant Pointers
A constant pointer is a normal pointer with one restriction: depending on where you put the const keyword, either you cannot change the pointer itself once it is set, or you cannot change the data it points to through it (or both). That is all, folks.
Changeable pointer to constant int
Let's have:
const int const_integer1 = 42, const_integer2 = 666;
const int *not_so_const_pointer = &const_integer1;
... will work, and you can even do:
not_so_const_pointer = &const_integer2;
... and you can actually also do:
const int *pointer_to_const = &variable1;
... but typing:
*pointer_to_const = 3;
... will not work, because the pointer is declared as pointing to a constant, even though variable1 itself is not constant.
To summarize: putting const first makes a pointer which can be changed, but which points to a location that cannot be changed through this pointer. It says nothing about the location itself, so keep in mind that its content can still be changed, for example directly or through some other pointer.
Constant pointer to changeable int
Type:
int *const const_pointer = &variable1;
... and you can:
*const_pointer = 13;
... but cannot:
const_pointer = & const_integer2;
This pointer points to a fixed location whose content can be changed freely, even through this pointer. The my_array identifier behaves much like this kind of pointer (int * const ptr): it always refers to the same place and cannot be reassigned, although, as mentioned before, an array is not really a pointer.
cout << "variable1 is: " << *const_pointer << endl;
Try:
const_pointer = &tmp_variable;
...and the compiler will complain: this is not permitted.
Const pointer to const int
const int* const totally_constant_pointer = &const_integer1;
You can change neither the pointer target nor the target content. There are ways to trick it (const_cast, for example), but that's only asking for trouble.
Why use all of this const mess? Please read resource 6 (Const Correctness in C++ in the Links section); I would sign every word of it. To summarize: at the expense of a little extra thinking, you can write code which is less prone to accidental errors, easier to debug, and easier to optimize. In particular, function parameters should be declared const if they are not changed inside the function; passing large objects as const references also avoids copying them, which can speed the code up noticeably.
3.2 Void Pointers
The void type pointer has no type, surprisingly. Void pointers point to a value that has no type (and thus also an undetermined length and undetermined de-reference properties). Therefore they cannot be de-referenced without being properly type cast first. On the other hand, they can point to any data type, from an integer value or a float, to an object or a string of characters.
void * void_pointer;
They are useful for input or output values of functions, where they impose no limitations on data types. On the other hand, the user must be told what to expect and what to cast the pointer to. Using them is like balancing on the edge: it is easy to fall and segfault.
Do not use them unless you have to.
3.3 Pointers to Pointers
We can have pointers which point to pointers pointing to data (or even to other pointers). To declare them, we only need to add one asterisk (*) for each level of indirection. Type:
int **pointer_to_pointer = &pointer1;
... and:
cout << "pointer1 points to a content: " << dec << *pointer1 << ", and has itself an address: " << (void *) &pointer1 << endl;
cout << "pointer_to_pointer points to content: " << dec << **pointer_to_pointer << endl;
cout << "pointer_to_pointer points to an address: " << (void *) *pointer_to_pointer << ", and has itself an address of: " << (void *) &pointer_to_pointer << endl;
Quite logical, I assume. Why use it? It might not look useful at first, but it helps when you already have several versions of some content referenced by several pointers and need to pick one and pass it on, or when a function has to change which object the caller's pointer points to.
3.4 Pointers to Functions
Yes, it can be done: you can point to a function. A typical use is to pass a function as an argument to another function or to an object, for example a comparison function to a sorting routine such as qsort(). You declare it like the prototype of the function, but enclose the name in parentheses () and add an asterisk.
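Create a function pointer (note: no quotes and no parentheses after the function name):
Note: sqrt is the standard square root function from <cmath>; ROOT knows it without any include. Since it takes a double, the pointer must take a double as well.
double (*function_pointer)(double) = sqrt;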
Call a function in a pointer (works in ROOT):
(*function_pointer)(25);
And output is:
(const double)5.00000000000000000e+00
Correct!
4 A Few Bits on Dynamic Memory
4.1 Operators
4.1.1 New and Delete
C++ has the built-in operators new and delete. Why use these instead of malloc()?
- new is type-safe: malloc() returns a void pointer which needs to be cast, whereas new returns a pointer of the specified type.
- malloc() only reserves memory, while new also calls the object's constructor.
You actually have two versions of each, one for single variables and one for arrays:
- new and delete for single variables,
- new[] and delete[] for arrays.
Memory obtained with new[] must be released with delete[], not with plain delete.
Note: there is also a version of new called placement new, which constructs an object at a given, already allocated address. I assume that if you are reading this tutorial, you are not interested in such an eccentric feature; you might use it when programming DMA for devices or that kind of thing.
If new fails, it throws a std::bad_alloc exception. This can be prevented by writing (nothrow) between new and the type; new then returns a null pointer on failure instead. Older C++ compilers may not accept (nothrow) and simply return 0 if new fails.
//remember int *pointer1;
//char *pointer2;
pointer1 = new (nothrow) int; //check that pointer1 != NULL before using it
pointer2 = new char[10] ;
*pointer1 = 112;
Check it:
cout << "Address pointed to by pointer1 is: " << pointer1 << ", and content is: " << *pointer1 << endl;
And never forget to clean up your memory:
delete pointer1;
delete[] pointer2;
4.1.2 Member Access from an Object (.)
Accesses a member (data or function) of a struct or class object.
Let's define a new struct item:
struct item
{
int weight;
float price;
} hammer;
//immediately also creates a static instance of an item object.
hammer.weight = 1;
hammer.price = 35.7;
4.1.3 Member Access from a Pointer (->)
This is nothing more than a shortcut for the notation (*identifier).member, which means: give me a member of the object pointed to by identifier. Because the member access operator (.) has higher precedence than the de-reference operator (*), the parentheses are required, which makes the notation awkward. But it is the same.
Let's create another item dynamically (using the previous item struct definition):
item * dynamic_hammer = new item;
dynamic_hammer -> weight = 2;
dynamic_hammer -> price = 22.3;
//clean the mess, free memory and set pointer to point nowhere
delete dynamic_hammer;
dynamic_hammer = NULL;
... To be continued in C++ Objects and Dynamic Memory Tutorial
5 Good Habits
- Always initialize pointer variables, or set them to NULL.
- Do not forget that C strings (char arrays) have a terminating 0, so they need one byte more than the number of visible characters.
- Check whether new succeeded.
- Clean up your memory after yourself!
- Set unused pointers to NULL.
6 Useful Notes
6.1 sizeof()
An operator built into the C++ language that returns the size in bytes of its operand (a type or an expression). The value is known at compile time; sizeof(char) is always 1.
For ROOT:
sizeof(char)
sizeof(short)
sizeof(int)
sizeof(long)
sizeof(Long64_t)
For more:
List of ROOT data types
6.2 Casting
Casting means telling the compiler to treat a value as a specific type, regardless of its original type. It can be a very powerful tool, but it can lead to pesky problems, especially if you port your code between different machines or architectures.
This tutorial already used casting in display routines.
cout << (int) *pointer2 << endl;
This is nothing more than telling the compiler to treat a character variable like an integer: cout will then display it as a number instead of an ASCII-coded letter.
7 Links
7.1 Detailed
- C++ : Documentation : C++ Language Tutorial : Pointers
- C++ Tutorial - About Pointers
7.2 Really for Dummies
- Learning C++ Pointers for REAL Dummies
7.3 Additional
- C++ Tutorial - Storage Specifiers
- C++ Operator Precedence
- Const Correctness in C++
--
TomasKubes - 29 Jul 2008