(Significantly More, but Honestly Never Enough) Comprehensive Notes on C++
2024-05-14
The following notes are based on tutorials on LearnCpp.com. I take zero credit on any of the following contents except summarizing stuff here and there to fit them into one single post (barely). If you find it useful, please feel free to express your gratitude toward Alex and other authors of the original tutorial website.
Introduction / Getting Started
OK. This is a tediously long post but let’s start from the very basics.
Introduction to programming languages
A computer program is a set of instructions that the computer can perform. There are different levels of programming languages that a computer can run:
- Machine language: The limited set of instructions that a CPU can understand directly is called machine code (or machine language, or instruction set), which is composed of a sequence of binary digits (aka bits). Machine codes are usually not portable, namely, you have to rewrite them for every different system.
- Assembly language: To make machine language more readable, assembly language was invented with short abbreviations of instructions, and numbers are supported now. However, assembly language is still likely not portable. In order for CPU to understand the assembly language, the program must be translated to machine language by using an assembler.
- High-level language: To address the problem that programs are not generally portable across systems, there we have high-level programming languages, which must still be translated into a format the computer can understand. There are two ways in general:
- Through a compiler, which is a program that reads source code and produces a standalone executable program that can then be run. Compilers can produce very fast and optimized code, sometimes faster than hand-written assembly. Another benefit of compiled programs is that distributing a compiled program does not require distributing the source code. In a non-open-source environment, this is important for IP protection purposes.
- Through an interpreter, which is a program that directly executes the instructions in the source code without requiring them to be compiled into an executable first. Interpreters tend to be more flexible than compilers, but are less efficient when running programs because the interpreting process needs to be done every time the program is run. This also means the interpreter is needed every time the program is run.
Introduction to C/C++
The C language was developed in 1972 by Dennis Ritchie at Bell Labs. C ended up being so efficient and flexible that in 1973, Ritchie and Ken Thompson rewrote most of the Unix operating system in C. In 1978, Brian Kernighan and Dennis Ritchie published the book The C Programming Language (later known as K&R) which provided an informal standard. In 1983, the American National Standards Institute (ANSI) formed a committee to establish a formal standard for C. They finished in 1989 with the C89 standard. In 1990, the International Organization for Standardization (ISO) adopted C89 and published C90, and later in 1999 the C99 standard.
C++ was developed by Bjarne Stroustrup at Bell Labs as an extension to C, starting in 1979. C++ is often considered a superset of C, but this is not strictly true as C99 introduced a few features that do not exist in C++. C++ was standardized in 1998 by the ISO and got a minor update in 2003, thereafter called C++03. Five major updates to the C++ language (C++11, C++14, C++17, C++20 and C++23) have been made since then, each adding new functionality. C++11 in particular added a huge number of new capabilities and is widely considered to be the new baseline version of the language.
The underlying design philosophy of C/C++ can be summed up as “trust the programmer”, which is both wonderful and dangerous.
Compiler, linker and libraries
The compiler translates *.cpp
to corresponding *.o
object files. The linker combines all *.o
object files, together with C++ Standard Library and other libraries, and then generates an executable file e.g. something.exe
on Windows. This whole process is usually referred to as building. The specific executable produced as the result of building is sometimes called a build.
By combining the software (editor, compiler, linker, debugger) involved in the above steps into one package, we have an integrated development environment (IDE).
C++ Basics
Preprocessor directive
The preprocessor directives are the include lines on top of source files e.g.
#include <iostream>
which here indicates that we would like to use the contents of the iostream
library (which is the part of the C++ standard library that allows us to read and write text from/to the console).
Statements
A computer program is a sequence of instructions that tell the computer what to do. A statement is a type of instruction that causes the program to perform some action. Most (but not all) statements in C++ end in a semicolon. There are many different kinds of statements in C++:
- Declaration statements
- Jump statements
- Expression statements
- Compound statements
- Selection statements (conditionals)
- Iteration statements (loops)
- Try blocks
Functions and the main
function
A function is a collection of statements that get executed sequentially (in order, from top to bottom). The name of a function is called its identifier.
Every C++ program must have a main
function.
Comments
std::cout << "Hello world!"; // this is a single-line comment
/* this is a multi-line
comment in C++ */
Data, values, objects and variables
Data on a computer is typically stored in a format that is efficient for storage or processing (and is thus not human readable). A single piece of data is called a value. An object in C++ is a region of storage on RAM that can store a value, and has other associated properties. Although objects in C++ can be unnamed (anonymous), more often we name our objects using an identifier. An object with a name is called a variable.
Variable instantiation
We use a special kind of declaration statement called a definition to create a variable, e.g.
int x;
At compile-time, when the compiler sees this statement, it makes a note to itself that we are defining a variable named x
of type int
. Moving forward, whenever the compiler sees x
, it knows that we’re referencing this variable. In C++, the type of a variable must be known at compile-time and cannot be changed without recompiling the whole program.
At runtime, the variable will be instantiated, which means the object will be created and assigned a memory address.
You can define multiple variables in either of the ways below:
int a;
int b;
// or
int a, b;
Variable Initialization
There are 6 basic ways to initialize a variable in C++:
int a; // default initialization
int b = 5; // copy initialization
int c(6); // direct initialization
int d {7}; // direct list initialization
int e = {8}; // copy list initialization
int f {}; // empty list / value initialization
The modern way to initialize objects in C++ is to use a form of initialization that makes use of curly braces. Informally, this is called list initialization (or uniform initialization or brace initialization). List initialization has an added benefit besides providing a uniform interface (for both single values and lists of values): it disallows narrowing conversions. This means that if you try to brace initialize a variable using a value that the variable cannot safely hold, the compiler will produce an error instead of a warning or, even worse, silently keeping only the part of the value that fits:
int width = 4.5; // warning: implicit conversion from 'double' to 'int' changes value from 4.5 to 4
int width { 4.5 }; // error: a number with a fractional value can't fit into an int
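You can initialize multiple variables on one line in the following ways. Note that in the last three, only b gets an initializer:
int a = 5, b = 6;
int a(5), b(6);
int a{5}, b{6};
int a = {5}, b = {6};
int a {}, b {};
int a, b(5); // pitfall: a is left uninitialized
int a, b{5}; // pitfall: a is left uninitialized
int a, b = 5; // pitfall: a is left uninitialized (this still compiles)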
Unused initialized variables warning
Compilers will typically generate a warning if a variable is initialized but not used. For example:
int main() {
int x { 5 };
return 0;
}
When compiling the program above, the following warning is generated
main.cpp:2:9: warning: unused variable 'x' [-Wunused-variable]
int x { 5 };
^
1 warning generated.
There are a few ways to fix this. First, if the variable is really unused, we can just remove its definition from the program. Second, we can use it somewhere e.g. std::cout << x. When neither is a desirable solution, C++17 introduced the [[maybe_unused]] attribute, which allows us to tell the compiler that we’re okay with these variables being unused.
int main() {
[[maybe_unused]] int x { 5 };
return 0;
}
Introduction to iostream
: cout
, cin
and endl
We need to include iostream
library so that we have access to std::cout
etc. We can use std::cout
with an insertion operator <<
to send the string to the console to be printed. Similarly, we can print a newline whenever a line of output is complete by using std::endl
.
It’s good to know that std::cout
is buffered, meaning that the slow transferring of a batch of data to an output device is optimized with a potential risk of not printing everything if the program crashes, aborts or is paused (e.g. for debugging) halfway before the buffer is flushed. Also, for the same underlying reason, using \n
instead of std::endl
is actually preferred for better performance, as the cursor is moved to the next line of console without flushing the buffer.
std::cout << "x is good" << std::endl; // this is slower than below
std::cout << "y is better\n"; // no flushing, just newline
Whereas std::cout
uses an insertion operator <<
, std::cin
reads the input using an extraction operator >>
. The input must be stored in a variable to be used later.
int x{};
std::cin >> x;
It’s (also) good to know that the C++ I/O library does not provide a way to capture keyboard input without the user pressing ENTER. In order to do that, one might need to resort to third-party libraries e.g. pdcurses
, FTXUI
, cpp-terminal
or notcurses
.
Let’s take a look at the following program
#include <iostream>
int main() {
std::cout << "Enter a number: ";
int x{};
std::cin >> x;
std::cout << "You entered " << x << '\n';
return 0;
}
What happens when we enter
- 123: You entered 123
Works fine.
- a: You entered 0
Nothing to extract. std::cin goes into failure mode and x ends up as 0 (since C++11, a failed extraction writes 0 to the variable).
- 123.1: You entered 123
Extraction stops at the first character that cannot be part of an int (the '.'), so x gets 123 and .1 is left in the input buffer. No narrowing conversion is involved here; list initialization only guards the line where x is defined.
- -3: You entered -3
Works fine for negative numbers as well.
- aaa: You entered 0
Nothing to extract.
- a123: You entered 0
Nothing to extract because the input is led by the character a.
- 123a: You entered 123
Extracting everything until it can’t.
Uninitialized variables
In C++, leaving the variables uninitialized can be dangerous, so we need to know what will happen when e.g. an integer is defined yet uninitialized:
int globalVar; // [GOOD] global variable is zero-initialized
void function() {
static int staticVar; // [GOOD] static variable is zero-initialized
}
int main() {
int localVar1; // [BAD] local variable is indeterminate when uninitialized
std::cout << localVar1; // [BAD] could be any value or lead to crash
int localVar2{}; // [GOOD] local variable is zero-initialized w/ curly braces
int array[5] = {42}; // [GOOD] unspecified array values are zero-initialized
}
Keywords (aka reserved words)
There are 92 keywords as of C++23, which all have special meanings in the C++ language:
alignas | const_cast | int | static_assert |
alignof | continue | long | static_cast |
and | co_await (C++20) | mutable | struct |
and_eq | co_return (C++20) | namespace | switch |
asm | co_yield (C++20) | new | template |
auto | decltype | noexcept | this |
bitand | default | not | thread_local |
bitor | delete | not_eq | throw |
bool | do | nullptr | true |
break | double | operator | try |
case | dynamic_cast | or | typedef |
catch | else | or_eq | typeid |
char | enum | private | typename |
char8_t (C++20) | explicit | protected | union |
char16_t | export | public | unsigned |
char32_t | extern | register | using |
class | false | reinterpret_cast | virtual |
compl | float | requires (C++20) | void |
concept (C++20) | for | return | volatile |
const | friend | short | wchar_t |
consteval (C++20) | goto | signed | while |
constexpr | if | sizeof | xor |
constinit (C++20) | inline | static | xor_eq |
C++ also defines special identifiers: override
, final
, import
and module
. These have special meanings when used in certain contexts but are not reserved otherwise.
Identifier naming best practices
- If the variable/function name is one word, the whole thing should be in lower case e.g.
int value;
int function();
- If the variable or function name is multi-word, either snake case or camel case is acceptable.
int some_value; // snake case
int some_function(); // snake case
int someValue; // camel case
int someFunction(); // camel case
- Avoid naming identifiers starting with an underscore, as these are often reserved for OS, library and/or compiler use.
- Make identifier names as meaningful as possible, except for trivial variables that only serve an intermediate purpose.
Whitespace rules
- Whitespace is used as a separator in C++, but the compiler doesn’t care how much whitespace is used.
- Sometimes newlines are used as separators e.g. in single-line comments and preprocessor directives (e.g. #include <iostream>).
- Quoted text takes whitespace literally and does not allow newlines. Quoted text separated by nothing but whitespace or newlines is automatically concatenated:
std::cout << "this is not the same";
std::cout << "this is     not the same"; // extra spaces inside the quotes are kept
std::cout << "this is
    not allowed"; // broken by newline, not allowed
std::cout << "this is "
    "allowed though"; // separated by whitespace
Basic coding style
- By converting tabs into spaces automatically, the formatting is self-describing: code that is spaced using spaces will always look correct regardless of editor. There is no right answer as to whether tabs are better or worse than spaces though.
- The Google C++ style guide recommends putting the opening curly brace on the same line as the statement; the other commonly accepted alternative is to align the opening and closing braces on the same level of indentation.
- Each statement within curly braces should start one tab in from the opening brace.
- Lines should not be too long. Typically by “too long” we mean more than 80 characters.
- If a long line is split with an operator (e.g. << or +), the operator should be placed at the beginning of the next line, not the end of the current line:
std::cout << 3 + 4
    + 5 + 6
    * 7 * 8;
- Use whitespace to make your code easier to read by aligning values or comments or adding spacing between blocks of code:
// 1A: hard to read:
cost = 57;
pricePerItem = 24;
value = 5;
numberOfItems = 17;

// 1B: easier to read:
cost          = 57;
pricePerItem  = 24;
value         = 5;
numberOfItems = 17;

// 2A: hard to read:
std::cout << "Hello world!\n"; // cout lives in the iostream library
std::cout << "It is very nice to meet you!\n"; // these comments make the code hard to read
std::cout << "Yeah!\n"; // especially when lines are different lengths

// 2B: easier to read:
std::cout << "Hello world!\n";                  // cout lives in the iostream library
std::cout << "It is very nice to meet you!\n";  // these comments are easier to read
std::cout << "Yeah!\n";                         // especially when all lined up

// 3A: hard to read:
// cout lives in the iostream library
std::cout << "Hello world!\n";
// these comments make the code hard to read
std::cout << "It is very nice to meet you!\n";
// especially when all bunched together
std::cout << "Yeah!\n";

// 3B: easier to read:
// cout lives in the iostream library
std::cout << "Hello world!\n";

// these comments are easier to read
std::cout << "It is very nice to meet you!\n";

// when separated by whitespace
std::cout << "Yeah!\n";
Literals
Literals are values that are inserted directly into the source code. These values usually appear directly in the executable code (unless they are optimized out). In contrast, objects and variables represent memory locations that hold values and these values can be fetched on demand.
Operators
In mathematics, an operation is a process involving zero or more input values (called operands) that produce a new value (called an output value). The specific operation to be performed is denoted by a symbol usually, called an operator. This is the same as in C++. There are operators that are represented by symbols e.g. +
, -
, *
, /
, =
and insertion <<
, extraction >>
and equality ==
. There are also operators that are reserved words e.g. new
, delete
and throw
.
The number of operands an operator takes as input is called the operator’s arity. Operators in C++ come in four different arities:
- Unary operators act on one operand e.g. negative - in -5.
- Binary operators act on two operands (often called left and right operands) e.g. addition + in 3 + 4, or insertion << as in std::cout << "something".
- Ternary operators act on three operands. There is only one ternary operator in C++, namely the conditional operator.
- Nullary operators act on zero operands. There is only one nullary operator in C++, namely the throw operator.
Note that some operators have different meanings depending on the use case e.g. operator -
can mean both negative and subtract.
There is a famous abbreviation for the order in which the arithmetic operators are executed: PEMDAS (parentheses > exponents > multiplication/division > addition/subtraction).
Functions and Files
Introduction to functions
The syntax/form of a general value-returning function is
return_type function_name() {
// function body
return return_value;
}
and for a non-value-returning function
void function_name() {
// function body
}
The C++ standard only defines the meaning of 3 status codes for programs: 0, EXIT_SUCCESS and EXIT_FAILURE. Both 0 and EXIT_SUCCESS mean the program executed successfully. EXIT_FAILURE means the program did not execute successfully.
#include <cstdlib> // for EXIT_SUCCESS and EXIT_FAILURE
int main() {
return EXIT_SUCCESS;
}
For maximum portability, we only use 0 or EXIT_SUCCESS to indicate a successful termination, and EXIT_FAILURE to indicate an unsuccessful termination. Function main
will implicitly return 0 if no return statement is provided. That being said, the best practice is still to return a value from main
explicitly.
C++ disallows calling the main
function directly.
Function parameters and arguments
When a function is called, all of its parameters, as defined in the function header, are created as variables, and the value of each argument is copied into the matching parameter (using copy initialization). This process is called pass by value. Function parameters that use pass by value are called value parameters.
Local variables
Variables defined inside the body of a function are called local variables (as opposed to global variables), and they’re destroyed in the opposite order of creation at the end of the set of curly braces in which they are defined. The local scope is thereby defined as where a local variable’s lifetime lasts.
Forward declarations
The following program will fail to compile
#include <iostream>
int main() {
std::cout << "The sum of 3 and 4 is " << add(3, 4) << '\n';
return 0;
}
int add(int x, int y) {
return x + y;
}
because the C++ compiler compiles code sequentially. By the time main uses add, the identifier add has not even been declared yet, which causes a compile error complaining identifier not found. To address this problem, there are two options:
- Option 1: reorder the function definitions
#include <iostream>
int add(int x, int y) {
    return x + y;
}
int main() {
    std::cout << "The sum of 3 and 4 is " << add(3, 4) << '\n';
    return 0;
}
- Option 2: use a forward declaration
This is important as option 1 is not always possible, e.g. when function A calls function B and vice versa. In that case we can never reorder them to allow sequential compiling.
#include <iostream>
int add(int x, int y); // with or without parameter names, both are fine
int main() {
    std::cout << "The sum of 3 and 4 is " << add(3, 4) << '\n';
    return 0;
}
int add(int x, int y) {
    return x + y;
}
It’s crucial to note that if we forget to actually define the function body, the program will still compile just fine, but will eventually fail at the link stage.
Declaration and definition
In C++, all definitions are declarations, but not always the other way around.
Term | Definition | Examples |
---|---|---|
Definition | Implements a function or instantiates a variable. Definitions are always declarations. | void foo() {} // function definition int x; // variable definition |
Declaration | Tells compiler about an identifier and its associated type information | void foo(); // function declaration int x; // variable declaration |
Pure declaration | A declaration that is not a definition | void foo(); |
Initializer | Provides an initial value for a defined object | int x {2}; // 2 here is the initializer |
The one definition rule (ODR)
The one definition rule (aka ODR) is a well-known rule in C++, which consists of three parts:
- Within a file, each function, variable, type or template can only have one definition. Definitions occurring in different scopes do not violate this rule.
- Within a program, each function or variable can only have one definition. This rule exists because a program can have more than one file. Functions and variables not visible to the linker are excluded from this rule.
- Types, templates, inline functions and inline variables are allowed to have duplicate definitions in different files, so long as each definition is identical.
Violating the first part will issue a redefinition compile error. Violating part two will cause the linker to issue a redefinition error. Violating part three will cause undefined behavior.
Functions that share the same identifier but different sets of parameters are considered to be different functions, so won’t violate part one/two of ODR above.
Programs with multiple files
In order to split the following program (which will not compile as written, see above) into a multi-file program
#include <iostream>
int main()
{
std::cout << "The sum of 3 and 4 is: " << add(3, 4) << '\n';
return 0;
}
int add(int x, int y)
{
return x + y;
}
we might just write an app.cpp
as
int add(int x, int y) {
return x + y;
}
and main.cpp
as
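#include <iostream>
int add(int x, int y); // forward declaration, so the compiler knows about add while compiling main.cpp
int main() {
    std::cout << "The sum of 3 and 4 is: " << add(3, 4) << '\n';
    return 0;
}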
Naming collisions and namespaces
Most naming collisions occur in two ways:
- Two (or more) identically named functions (or global variables) are introduced into separate files belonging to the same program. This will result in a linker error.
- Two (or more) identically named functions (or global variables) are introduced into the same file. This will result in a compiler error.
C++ provides plenty of mechanisms for avoiding naming collisions. Besides local scope, we have namespaces:
- The global namespace (aka the global scope)
- The std namespace
- Other namespaces
There are two ways to use a namespace e.g. std:
- Explicit namespace qualifier :: e.g. std::cout
- Using directive (not recommended) e.g. using namespace std;
Preprocessors
The preprocessor makes various changes to the text of the code file, and only after that does compilation happen. In modern compilers, the preprocessor is usually built right into the compiler itself. Most of what the preprocessor does is fairly uninteresting, e.g. stripping out comments, ensuring each code file ends in a newline, and (most importantly) processing the #include directives.
Preprocessor directives
When the preprocessor runs, it scans through the code file (from top to bottom), looking for preprocessor directives. Preprocessor directives (often called directives) are instructions that start with a #
symbol and end with a newline (NOT a semicolon). These directives tell the preprocessor to perform certain text manipulation tasks. Note that the preprocessor does not understand the C++ syntax – instead, the directives have their own syntax (which in some case resembles C++ syntax, and in other cases, not so much).
Below are some of the most common preprocessor directives:
- #include to include the contents of files
- #define to create macros (in C++ a macro is a rule that defines how text is converted into replacement output text)
There are two types of macros: object-like macros and function-like macros. The latter are usually considered unsafe, and almost anything they can do can be done by a normal function. The former can be defined in one of two ways:
#define identifier
#define identifier substitution_text
Note that the substitution only happens in normal code, i.e. it does not apply to other preprocessor directives.
#define FOO 9 // Here's a macro substitution
#ifdef FOO // This FOO does not get replaced because it’s part of another preprocessor directive
    std::cout << FOO << '\n'; // This FOO gets replaced with 9 because it's part of the normal code
#endif
We recommend avoiding object-like macros with substitution text (the second form) altogether, as there are better ways to do the same thing e.g. using named constants. Object-like macros without substitution text (the first form) are more generally accepted.
- #ifdef, #ifndef and #endif for conditional compilation
The conditional compilation preprocessor directives allow one to specify under what conditions something will or will not compile
#include <iostream>
#define PRINT_JOE
int main() {
#ifdef PRINT_JOE
    std::cout << "Joe\n"; // will be compiled
#endif
#ifdef PRINT_BOB
    std::cout << "Bob\n"; // will not be compiled at all
#endif
#ifndef PRINT_BOB
#define PRINT_BOB // will define PRINT_BOB
#endif
#ifdef PRINT_BOB
    std::cout << "Bob\n"; // will compile this time
#endif
    return 0;
}
- #if 0 can be used to exclude a block of code from being compiled as if it’s inside a comment block:
#include <iostream>
int main() {
    std::cout << "Joe\n"; // will be compiled
#if 0
    std::cout << "Bob\n"; // will not be compiled at all
    /* this is some previously existing block comment and because of that,
       we cannot use block comment again to "comment out" this chunk of codes.
       using #if 0 will be the only solution to avoid this part
       to be compiled or run by C++ */
#endif
    return 0;
}
The scope of #define
The preprocessor doesn’t understand C++, so all preprocessor directives will be resolved before compilation from top to bottom on a file-by-file basis. The following macro MY_NAME
, as a result, will be resolved without actually calling foo
at all.
#include <iostream>
void foo() {
#define MY_NAME "Alex"
}
int main() {
std::cout << "My name is: " << MY_NAME << '\n';
return 0;
}
and the following code will print Not printing!
because the directives are only valid from the point of definition to the end of the exact file.
function.cpp:
#include <iostream>
void doSomething() {
#ifdef PRINT
std::cout << "Printing!\n";
#endif
#ifndef PRINT
std::cout << "Not printing!\n";
#endif
}
main.cpp:
void doSomething(); // forward declaration for function doSomething()
#define PRINT
int main() {
doSomething();
return 0;
}
Header files
The main purpose of a header file (.h or .hpp) is to propagate declarations to code (.cpp) files. Source files (.cpp) should always include their own paired header files (.h or .hpp) if they exist. Source files (.cpp) should never be #included, although the preprocessor is able to do that.
We use angled brackets <>
for header files that are not written by ourselves. The preprocessor will search for the header only in the directories specified by the include directories
. The include directories
are configured as part of the project/IDE/compiler settings, and typically default to the directories containing the header files that come with your compiler and/or OS.
When we use double-quotes ""
for the header file, we’re telling the preprocessor that the header file is written by us and should be first searched in the current directory. If it can’t find the file, it will then search the include directories
.
When trying to include header files from another directory, you can do the following, but it is not recommended
#include "headers/myHeader.h"
#include "../moreHeaders/myOtherHeader.h"
Instead, it’s advised to change the IDE/compiler setting to include the extra include directories e.g.
g++ -o main -I/folder/other/headers main.cpp
This is the same in VS Code, as you can add -I/folder/other/headers
in the Args
section in tasks.json
of your project.
The #include
order of headers
The best practice is to include headers in the following order:
- The paired header file
- Other headers from your project
- 3rd party library headers
- Standard library headers
The headers for each group should be sorted alphabetically unless documentation for a 3rd party library instructs you to do otherwise.
Header file best practices (summary)
- Always include header guards
- Do not define variables or functions inside header files – only declaration
- Give a header file the same name as the paired source file
- Each header file should have a specific job and be as independent as possible
- A header file should #include all other headers it needs so that it can function independently
- Only #include headers that we actually need
- Do not #include .cpp files
- Prefer putting documentation on
  - what something does or how to use it in the header as it’s more likely to be seen there
  - how something actually works in the source files
Header guards
The following project has header guards to prevent including duplicate definitions
square.h:
#ifndef SQUARE_H
#define SQUARE_H
int getSquareSides() {
return 4;
}
#endif
wave.h:
#ifndef WAVE_H
#define WAVE_H
#include "square.h"
#endif
main.cpp:
#include "square.h"
#include "wave.h"
int main() {
return 0;
}
However, header guards only prevent a header from being included more than once in the same file. If we add a source file square.cpp as follows, square.h is included once in square.cpp and once in main.cpp, so getSquareSides ends up defined in both files, causing a linker error:
square.cpp:
#include "square.h"
int getSquarePerimeter(int sideLength) {
return sideLength * getSquareSides();
}
In order to solve this issue, we can put all definitions into square.cpp and leave only function declarations in square.h:
square.h:
#ifndef SQUARE_H
#define SQUARE_H
int getSquareSides();
int getSquarePerimeter(int sideLength);
#endif
square.cpp:
#include "square.h"
int getSquareSides() {
return 4;
}
int getSquarePerimeter(int sideLength) {
return sideLength * getSquareSides();
}
How about #pragma once
In modern compilers there is a simpler (but not as safe) way to do header guard:
#pragma once
// define whatever
Note: if a header file is copied so that it exists in multiple places on the file system, if somehow both copies of the header get included, header guards will successfully de-dupe the identical headers, but #pragma once
won’t because the compiler won’t realize they are actually identical content.
Debugging C++ Programs
Debugging tactics
Several tips on debugging a C++ program:
- Conditionalizing the code using preprocessor directives e.g. #ifdef, #endif and #define
- Using a logger e.g. plog, glog or spdlog
- Using an integrated debugger
- Step into: run line by line inside function
- Step over: run and skip the whole function
- Step out: when you accidentally step into a function, use this to run the rest of that function and pause right after it returns to the caller
- Start: starting from the beginning
- Continue: continue until the end or the next breakpoint
- Breakpoints: tell debugger to stop no matter what
- Variable watchers: monitoring the values of variables
Fundamental Data Types
Bits, bytes and memory addressing
The smallest unit of memory is a binary bit (aka a bit) which can hold 0 or 1. A modern de-facto standard byte is 8 bits.
Fundamental data types
Here is a list of fundamental data types in C++
Category | Data Type (Minimum Size in Bytes) | Meaning | Example |
---|---|---|---|
Floating Point | float (4), double (8), long double (8) | a number with a fractional part | 3.14159 |
Integral (Boolean) | bool (1) | true or false | true |
Integral (Character) | char (1), wchar_t (1), char8_t (1, C++20), char16_t (2, C++11), char32_t (4, C++11) | a single character of text | 'c' |
Integral (Integer) | short int (2), int (2), long int (4), long long int (8, C++11) | whole numbers including zero | 64 |
Null Pointer | std::nullptr_t (4, C++11) | a null pointer | nullptr |
Void | void | no type | n/a |
The void
type
There are three use cases of the void
type:
- Functions that do not return anything
void writeSomething(int x) { std::cout << x << '\n'; }
- Functions that take no parameters (discouraged in C++)
int getValue(void) { return 0; }
This is compilable in C++ (mostly for backwards compatibility with C) but not recommended. You can just leave the parentheses empty instead.
- Void pointers (advanced, covered later)
The sizeof operator
See the table above for the minimum sizes in bytes of the different fundamental data types; sizeof reports the actual size on the current platform. Notice that using sizeof on an incomplete type, e.g. void, results in a compilation error. Also, sizeof doesn't take dynamically allocated memory used by an object into consideration, which needs a separate discussion later.
On a side note, how fast a type is doesn't simply depend on how much memory it uses. Instead of “smaller is faster”, CPUs are often optimized for data of a particular size, so a 32-bit int being faster than a 16-bit short or an 8-bit char is entirely possible on a 32-bit CPU.
Signed integer ranges
Again, in C++ only the minimum sizes of fundamental data types are specified: short and int are at least 2 bytes, long is at least 4 bytes and long long is at least 8 bytes. This means the actual size of the integers can vary based on implementation. The ranges corresponding to these minimum sizes are listed below:
Size / Type | Range |
---|---|
8 bit / ? | -128 to 127 |
16 bit / short, int | -32,768 to 32,767 |
32 bit / long | -2,147,483,648 to 2,147,483,647 |
64 bit / long long | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
When an arithmetic operation produces a value outside the range of a signed type, we get integer overflow, which results in undefined behavior.
Unsigned integer ranges
The corresponding range table for unsigned integer types is:
Size / Type | Range |
---|---|
8 bit / ? | 0 to 255 |
16 bit / unsigned short, unsigned int | 0 to 65,535 |
32 bit / unsigned long | 0 to 4,294,967,295 |
64 bit / unsigned long long | 0 to 18,446,744,073,709,551,615 |
Unsigned integer overflow: if an unsigned value is out of range, it is divided by one greater than the largest number of the type, and only the remainder is kept. For example, the number 280 is too big for a 1-byte unsigned integer (range 0 to 255), and thus 280 % 256 = 24 is what ends up in the variable. Unlike signed overflow, this wrap-around is well-defined.
Fixed-width integers, fast and least integers
Different from the native C/C++ integer types, which have variable sizes (only minimum sizes are specified in the table above), fixed-width integers have been available since C99 in the stdint.h header, which C++11 adopted as the <cstdint> header:
#include <cstdint>
#include <iostream>
int main() {
std::int16_t i{5};
std::cout << i << '\n';
return 0;
}
However, the fixed-width integers are not guaranteed to be defined on all architectures, nor are they necessarily faster than the native C/C++ integer types. This is where the fast and least integer types come in:
#include <cstdint>
#include <iostream>
int main() {
std::cout << "least 8: " << sizeof(std::int_least8_t) * 8 << " bits\n";
std::cout << "least 16: " << sizeof(std::int_least16_t) * 8 << " bits\n";
std::cout << "least 32: " << sizeof(std::int_least32_t) * 8 << " bits\n";
std::cout << "fast 8: " << sizeof(std::int_fast8_t) * 8 << " bits\n";
std::cout << "fast 16: " << sizeof(std::int_fast16_t) * 8 << " bits\n";
std::cout << "fast 32: " << sizeof(std::int_fast32_t) * 8 << " bits\n";
return 0;
}
where the least integer types are the smallest types with at least the given number of bits, and the fast integer types are the fastest types with at least the given number of bits.
The downsides of these fast and least integers are that few programmers actually use them, and that their sizes vary between architectures, so a program may behave differently from one platform to another.
Best practice for integral types
Best practice:
- Prefer int when the size of the integer doesn’t matter
- Prefer std::int#_t when storing a quantity that needs a guaranteed range
- Prefer std::uint#_t when doing bit manipulation or where well-defined wrap-around behavior is required
Avoid the following if possible:
- short and long integers - use a fixed-width type instead
- Unsigned types for holding quantities
- The 8-bit fixed-width integer types
- The fast and least fixed-width types
- Any compiler-specific fixed-width integers e.g. __int8 and __int16 in Visual Studio
What is std::size_t
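std::size_t is an unsigned integral type, and it is the type of the value that sizeof returns. Its width puts an upper limit on the size of any object in the system, since the size of an object in bytes must be representable as a std::size_t.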
Floating point numbers
Category | Data Type (Minimum Size in Bytes) | Typical Size in Bytes |
---|---|---|
floating point | float (4) | 4 |
 | double (8) | 8 |
 | long double (8) | 8, 12 or 16 |
Remember to match the type of literal in initialization to the type of the variable, e.g.
int x{5};
double y{5.0};
float z{5.0F}; // notice the suffix F
When we print floating point numbers, there are a few interesting behaviors that are worth attention:
#include <iostream>
int main()
{
std::cout << 5.0 << '\n';
std::cout << 6.7F << '\n';
std::cout << 9876543.21 << '\n';
return 0;
}
The output of above program is
5
6.7
9.87654e+06
because
- std::cout by default doesn’t print the fractional part of a number if the fractional part is 0
- the number 6.7F was printed as expected because it needs fewer significant digits than the default precision
- std::cout has a default precision of 6 significant digits and switches to scientific notation when the number doesn’t fit, so 9876543.21 is shown as 9.87654e+06
We can override the default precision of std::cout using an output manipulator named std::setprecision():
#include <iomanip> // for output manipulator std::setprecision()
#include <iostream>
int main()
{
std::cout << std::setprecision(17); // show 17 digits of precision
std::cout << 3.33333333333333333333333333333333333333f <<'\n'; // f suffix means float
std::cout << 3.33333333333333333333333333333333333333 << '\n'; // no suffix means double
return 0;
}
which will now print
3.3333332538604736
3.3333333333333335
However, floating point types can only hold a limited number of significant digits, which makes precision handling a headache. See this example:
#include <iomanip> // for std::setprecision()
#include <iostream>
int main()
{
float f { 123456789.0f }; // f has 10 significant digits
std::cout << std::setprecision(9); // to show 9 digits in f
std::cout << f << '\n';
return 0;
}
The output of the above program is 123456792, which is not the number we stored. The original number 123456789.0 has 10 significant digits, but a float typically holds only about 7, so the value is rounded to the nearest number a float can represent and the trailing digits are lost. A corollary of this is to be wary of using floating point numbers for financial or currency data.
The typical ranges and precisions of floating point numbers (assuming IEEE 754 representation) are listed below
Size in Bytes | Range | Precision |
---|---|---|
4 | ±1.18 x 10^-38 to ±3.4 x 10^38 and 0.0 | 6-9 significant digits, typically 7 |
8 | ±2.23 x 10^-308 to ±1.80 x 10^308 and 0.0 | 15-18 significant digits, typically 16 |
80-bits (typically uses 12 or 16 bytes) | ±3.36 x 10^-4932 to ±1.18 x 10^4932 and 0.0 | 18-21 significant digits |
16 | ±3.36 x 10^-4932 to ±1.18 x 10^4932 and 0.0 | 33-36 significant digits |
NaN and infinity
There are two special categories of floating point numbers, NaN (not a number) and infinity.
#include <iostream>
int main()
{
double zero {0.0};
double posinf { 5.0 / zero }; // positive infinity
std::cout << posinf << '\n';
double neginf { -5.0 / zero }; // negative infinity
std::cout << neginf << '\n';
double nan { zero / zero }; // not a number (mathematically invalid)
std::cout << nan << '\n';
return 0;
}
The output of above program is
inf
-inf
nan
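Best practice is to avoid division by zero altogether rather than rely on NaN or infinity: the output above assumes IEEE 754 floating point, while the C++ standard itself leaves the result of dividing by zero undefined.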
Boolean values
bool b;
bool b1{true};
bool b2{false};
b1 = false;
bool b3{}; // default is false <- 0
bool b4{!true}; // initialized to false
bool b5{!false}; // initialized to true
Notice that printing boolean values is the same as printing the integer representation of the variables, i.e. only 0
and 1
will be printed. If you want std::cout
to print actual true
or false
, you need to use another manipulator:
#include <iostream>
int main() {
std::cout << true << '\n';
std::cout << false << '\n';
std::cout << std::boolalpha; // from now on, true/false will be printed
std::cout << true << '\n';
std::cout << false << '\n';
std::cout << std::noboolalpha; // from now on, 1/0 will be printed
std::cout << true << '\n';
std::cout << false << '\n';
}
The conversion between a boolean and an integer variable is possible:
#include <iostream>
int main() {
bool bFalse{0}; // ok
bool bTrue{1}; // ok
bool bNo{2}; // error (narrowing conversion is disallowed)
bool b1 = 2; // ok (copy initialization allows implicit conversion)
}
Inputting boolean values can be a bit unintuitive. See example below:
#include <iostream>
int main() {
bool b{};
std::cout << "Enter a boolean value: ";
std::cin >> b;
std::cout << "You entered " << b << '\n';
return 0;
}
The output (if you typed true) is
Enter a boolean value: true
You entered 0
because std::cin only accepts 0 or 1 for boolean variables by default. The input true makes the extraction fail silently, and a failed extraction sets b to false (it also leaves std::cin in a failed state). To allow std::cin to accept true and false, we need to again turn on the alphabetical mode of boolean variables:
#include <iostream>
int main() {
bool b{};
std::cout << "Enter a boolean value: ";
std::cin >> std::boolalpha;
std::cin >> b;
std::cout << "You entered " << b << '\n';
return 0;
}
Notice when std::boolalpha
is enabled for std::cin
, 0
and 1
will no longer work.
Chars
Some information worth noting:
- ASCII codes 0-31 (and 127) are unprintable chars, 32 is the space character, 48 is 0, 65 is A and 97 is a.
- Multi-character input is allowed, but only one character at a time can be read into a char variable using std::cin. The remaining characters stay in the input buffer, so subsequent std::cin extractions can still read them.
- Escape sequences are listed below
  - \a: makes a beep alert
  - \b: moves the cursor back one space
  - \f: moves the cursor to the next logical page
  - \n: moves the cursor to the next line
  - \r: moves the cursor to the beginning of the line
  - \t: prints a horizontal tab
  - \v: prints a vertical tab
  - \': prints a single quote
  - \": prints a double quote
  - \\: prints a backslash
  - \?: prints a question mark (no longer relevant, as you can now always use question marks)
  - \(number): prints a char represented by an octal number
  - \x(number): prints a char represented by a hex number
Implicit type conversion vs explicit static_cast
Implicit type conversion happens in the below program:
#include <iostream>
void print(int x) {
std::cout << x << '\n';
}
int main() {
print(5.5); // warning: we're passing in a double value
return 0;
}
The output is 5 in this case, as an integer cannot hold a fractional part. Brace initialization, on the other hand, doesn’t allow narrowing conversions like this one:
int main() {
double d { 5 }; // okay: int to double is safe
int x { 5.5 }; // error: double to int not safe
return 0;
}
Explicit type conversion, meanwhile, is supported via the static_cast operator:
#include <iostream>
void print(int x) {
std::cout << x << '\n';
}
int main() {
print( static_cast<int>(5.5) ); // explicitly convert double value 5.5 to an int
return 0;
}
and the output is still 5 here as expected. It’s worth noting that by converting a char into an int we can get the ASCII code of a character:
#include <iostream>
int main() {
char ch{97}; // 97 is ASCII code for 'a'
std::cout << ch << " has value " << static_cast<int>(ch) << '\n'; // print value of variable ch as an int
return 0;
}
where the output is
a has value 97
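Meanwhile, static_cast does no range checking. Converting an unsigned integer that is too large for the signed type gives an implementation-defined result before C++20 (and a wrapped-around value since C++20), and converting an out-of-range floating point value to an integer is undefined behavior, so such conversions are definitely not recommended.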
Constants and Strings
Two types of constants
C++ supports two different types of constants:
- Named constants are constant values that are associated with an identifier, aka symbolic constants
- Literal constants are constant values that are not associated with an identifier
Three types of named constants
There are three ways to define a named constant in C++:
- Constant variables (most common)
- Object-like macros with substitution text
- Enumerated constants (covered later)
In the first way, namely constant variables, remember that a constant variable must be initialized when it is defined. Function parameters can also be declared const, but this is only worthwhile when passing by reference or by pointer/address. A parameter passed by value is a copy, so the caller’s argument cannot be changed anyway. A function’s return value can also be declared const, but only when it’s not a fundamental type. For fundamental type return values, the const qualifier is ignored.
In the second way, we can define a constant via macro e.g.
#include <iostream>
#define MY_NAME "Allen"
int main() {
std::cout << "My name is " << MY_NAME << '\n';
return 0;
}
There are, however, at least three major reasons why people prefer the first method over using preprocessor macros:
- Macros don’t follow normal C++ scoping rules - once defined, every later occurrence of the name in the file is replaced, regardless of the scope it appears in
- It’s harder to debug code with macros
- Macro substitution behaves differently than everything else in C++ and thus may cause inadvertent mistakes easily
Type qualifiers
We have known that const
is a type qualifier. As a matter of fact, in C++ there are only two type qualifiers as of C++23, namely const
and volatile
. The latter is rarely used; it tells the compiler that an object may change its value at any time, which in exchange disables certain types of optimizations.
The as-if rule
In C++, compilers are given a lot of leeway to optimize programs. The as-if rule says that the compiler can modify a program however it likes in order to produce more optimized code, so long as those modifications do not affect the program’s observable behavior. The one notable exception is that unnecessary calls to a copy constructor can be elided even if those copy constructors do have observable behavior.
For example, the following program can be optimized step by step:
Version 0 (original program):
#include <iostream>
int main() {
int x { 3 + 4 };
std::cout << x << '\n';
return 0;
}
Version 1:
#include <iostream>
int main() {
int x { 7 };
std::cout << x << '\n';
return 0;
}
Version 2:
#include <iostream>
int main() {
std::cout << 7 << '\n';
return 0;
}
Compile-time and runtime constants
Constants that are fixed at compile-time are called compile-time constants (like above). Constant variables that are only initialized at runtime are called runtime constants (see below).
#include <iostream>
int get_number() {
std::cout << "Enter a number: ";
int y{};
std::cin >> y;
return y;
}
int main() {
const int x{3}; // x is compile-time constant
const int y{get_number()}; // y is runtime constant
const int z{x + y}; // z is runtime constant
return 0;
}
Compile-time constants help optimization, whereas runtime constants merely ensure that the objects’ values are not changed after initialization.
The constexpr
keyword
It’s sometimes hard to tell whether a variable will end up being compile-time constant or runtime constant, for example:
int x { 5 }; // not const at all
const int y { x }; // obviously a runtime const (since initializer is non-const)
const int z { 5 }; // obviously a compile-time const (since initializer is a constant expression)
const int w { getValue() }; // not obvious whether this is a runtime or compile-time const
Notice that, depending on how the function getValue is defined, w could actually be a compile-time constant here, but we cannot tell from this line alone. Fortunately, we can enlist the compiler’s help to ensure we get a compile-time constant when we want one, simply by using constexpr instead of const. If we declare a variable constexpr and its initializer is not a constant expression, the compiler reports an error. As a result, the best practice is to use constexpr for any variable that should not be modifiable after initialization and whose initializer is known at compile-time.
There are, however, some types that are currently not fully compatible with the constexpr keyword, including std::string, std::vector and other types that use dynamic memory allocation. For those objects we need to use const instead. Also, function parameters cannot be declared constexpr, because their values are only known when the function is called, whereas constexpr objects must be initialized with a compile-time constant.
Constant folding
Example 1:
#include <iostream>
int main() {
constexpr int x { 3 + 4 }; // 3 + 4 is a constant expression
std::cout << x << '\n'; // this is a runtime expression
return 0;
}
Example 2:
#include <iostream>
int main() {
std::cout << 3 + 4 << '\n'; // this is a runtime expression
return 0;
}
In example 1 above, x is a constexpr variable, so its initializer 3 + 4 must be evaluated at compile-time and x is a compile-time constant. In example 2, 3 + 4 is still a constant expression, but it is only a subexpression of a runtime expression, so the compiler is not required to evaluate it at compile-time. Thanks to an optimization called constant folding, it will almost certainly be replaced by 7 at compile-time anyway.
Literal suffixes
Data type | Suffix | Meaning |
---|---|---|
integral | U | unsigned int |
 | L | long |
 | UL | unsigned long |
 | LL | long long |
 | ULL | unsigned long long |
 | Z | the signed version of std::size_t (C++23) |
 | UZ | std::size_t (C++23) |
floating point | F | float |
 | L | long double |
string | s | std::string |
 | sv | std::string_view |
Lower-case versions of the numeric suffixes are accepted as well, but they’re not recommended as lower-case l looks similar to 1. The string suffixes s and sv, on the other hand, must be lower-case, and they are only available after a using-directive such as using namespace std::literals. Another thing to notice is that string literals in C++ are essentially C-style arrays of char with an extra null terminator. These C-style string literals are objects that are created at the start of the program and guaranteed to exist for the entirety of the program. In contrast, std::string and std::string_view literals create temporary objects that have to be used immediately, as they’re destroyed at the end of the full expression in which they’re created.
Magic numbers
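Magic numbers are literals (usually numbers) whose meaning is unclear from the context or that may need to change later. They are usually not recommended; use named constants instead.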
Numeral systems: decimal, binary, hexadecimal and octal
By default, integer literals in C++ are decimal numbers. Including decimal, there are 4 main numeral systems available in C++, which are decimal (base 10), binary (base 2), hexadecimal (base 16) and octal (base 8).
Decimal | Binary | Hexadecimal | Octal |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 |
2 | 10 | 2 | 2 |
3 | 11 | 3 | 3 |
4 | 100 | 4 | 4 |
5 | 101 | 5 | 5 |
6 | 110 | 6 | 6 |
7 | 111 | 7 | 7 |
8 | 1000 | 8 | 10 |
9 | 1001 | 9 | 11 |
10 | 1010 | A | 12 |
11 | 1011 | B | 13 |
12 | 1100 | C | 14 |
13 | 1101 | D | 15 |
14 | 1110 | E | 16 |
15 | 1111 | F | 17 |
16 | 10000 | 10 | 20 |
… | … | … | … |
To use an octal literal, prefix the literal number with a 0
(zero):
#include <iostream>
int main() {
int x{012}; // octal 12
std::cout << x << '\n';
return 0;
}
and the output would be 10
, as numbers are output in decimal by default.
To use a hexadecimal literal, prefix your literal number with a 0x
(zero x):
#include <iostream>
int main() {
int x{0xF}; // hex F
std::cout << x << '\n';
return 0;
}
and the output would be 15
. One hexadecimal digit represents 16 = 2^4 possible values, i.e. 4 bits, and thus two digits make a full byte. A 32-bit integer, as a result, can be represented concisely by eight hexadecimal digits.
To use a binary literal, prefix your literal number with a 0b
(zero b):
#include <iostream>
int main() {
int x1{0b1}; // binary 1
int x2{0b11110001}; // binary 11110001
int x3{0b1111'0001}; // binary 11110001 with a single quote as digit separator
std::cout << x1 << ' ' << x2 << ' ' << x3 << '\n';
return 0;
}
and the output is 1 241 241
. Notice how we used a single quote as a digit separator (available since C++14, like binary literals themselves) - this is also supported for other numeral systems, e.g. in a long decimal integer such as 1'000'000.
In order to print the values in other numeral systems than the current one, we can use std::dec
, std::oct
, std::hex
manipulators:
#include <iostream>
int main() {
int x{12};
std::cout << x << '\n'; // default is dec
std::cout << std::hex << x << '\n'; // now hex
std::cout << x << '\n'; // still hex
std::cout << std::oct << x << '\n'; // now oct
std::cout << x << '\n'; // still oct
std::cout << std::dec << x << '\n'; // now back to dec
std::cout << x << '\n'; // still dec
return 0;
}
As you may have guessed, outputting values in binary is a little harder than in the other three systems. We need to use a type called std::bitset from the C++ standard library (specifically, the <bitset> header) that gets the job done:
#include <bitset>
#include <iostream>
int main() {
std::bitset<8> x1{0b1100'0101}; // an 8-digit binary number
std::bitset<8> x2{0xC5}; // also 8-bit
std::cout << x1 << ' ' << x2 << '\n'; // printing the variables
std::cout << std::bitset<4>{0b1010} << '\n'; // printing a literal
return 0;
}
There is an even better solution in C++20 and C++23, where we can use the <format> header and the <print> header respectively:
#include <format> // c++20
#include <iostream>
#include <print> // c++23
int main() {
std::cout << std::format("{:b}\n", 0b1010); // c++20
std::cout << std::format("{:#b}\n", 0b1010); // c++20
std::print("{:b} {:#b}", 0b1010, 0b1010); // c++23
return 0;
}
Conditional operator
The conditional operator is used in the form condition ? expression1 : expression2. Remember to parenthesize the entire conditional expression to avoid unexpected behavior, as the operator has very low precedence. Also, remember that expression1 and expression2 must match in type (or be convertible to a common type), namely the following won’t compile:
#include <iostream>
int main() {
constexpr int x{5};
std::cout << (x != 5 ? x : "x is 5") << '\n';
return 0;
}
Inline functions and variables
Every time a function is called, there is a certain amount of performance overhead that occurs, specifically:
- The CPU must store the address of the current instruction it is executing (so that it knows where to return)
- The parameters must be instantiated and then initialized
- The execution path must then jump to the code in the function body
- Finally, the program needs to jump back to the location of the function call and hand the return value over to the caller
This overhead is significant for small functions that are called frequently. Fortunately, in C++ the compiler has a trick that can avoid such overhead: inline expansion. This is a process where a function call is replaced by the code from the called function’s definition. Inline expansion, however, doesn’t guarantee performance improvements as there’s a tradeoff between removal of the function call overhead vs the cost of a larger executable.
Historically, the inline keyword was a hint asking the compiler to inline expand a function. In modern C++ it is no longer used for this purpose, because:
- Using inline to request inline expansion is a form of premature optimization, and misuse might actually harm performance
- The inline keyword is just a hint, and compilers are free to ignore it - they’ll decide whether to inline expand a function whether or not it is marked inline
- The inline keyword is defined at the wrong level of granularity, as the keyword goes with the function definition but the expansion actually happens per function call
In modern C++, inline instead means that multiple definitions are permitted, which is what allows a function to be defined in a header that is included into multiple source files. Inline variables (C++17), similarly, allow the same variable to be defined in a header that is included into multiple source files. The following, particularly, are implicitly inline:
- Functions defined inside a class, struct or union type definition (covered later)
- constexpr / consteval functions (covered later)
- Functions implicitly instantiated from function templates (covered later)
- static constexpr data members (but not other constexpr variables)
constexpr
and consteval
Remember we can use constexpr variables to enable compile-time evaluation:
#include <iostream>
int main() {
constexpr int x{5};
constexpr int y{6};
std::cout << (x > y ? x : y) << " is greater!\n";
return 0;
}
However, if we replace the conditional expression by a call to an ordinary function, the call is evaluated at runtime, which means trading performance for modularity. This isn’t ideal. We can instead declare functions as constexpr so that they can be evaluated at compile-time:
#include <iostream>
constexpr int greater(int x, int y) {
return (x > y ? x : y);
}
int main() {
constexpr int x{5};
constexpr int y{6};
constexpr int g{greater(x, y)}; // evaluated at compile-time
std::cout << g << " is greater!\n";
return 0;
}
It is worth noting that, despite the example above, constexpr doesn’t guarantee compile-time evaluation, but instead just makes the function eligible for it. In the following example, because x and y are not constexpr, the function is still evaluated at runtime:
#include <iostream>
constexpr int greater(int x, int y) {
return (x > y ? x : y);
}
int main() {
int x{5};
int y{6};
std::cout << greater(x, y) << " is greater!\n";
return 0;
}
Prior to C++20, there were no standard language tools available to test whether a function call is being evaluated at compile-time or runtime. In C++20, the std::is_constant_evaluated function defined in the <type_traits> header returns a bool indicating whether the current function call is executing in a constant-evaluated context. Since std::cout cannot be used during compile-time evaluation, the example below only prints in the runtime branch:
#include <iostream>
#include <type_traits> // for std::is_constant_evaluated (C++20)
constexpr int greater(int x, int y) {
    if (!std::is_constant_evaluated()) {
        // printing is only possible at runtime, so a compile-time
        // evaluation must never reach this branch
        std::cout << "greater(" << x << ", " << y << ") is being"
                  << " evaluated at runtime!\n";
    }
    return (x > y ? x : y);
}
In addition to testing, we can also force a constexpr function call to be evaluated at compile-time by ensuring the return value is used where a constant expression is required. This, however, needs to be done on a per-call basis and can be tedious. In C++20, there is a better way to enforce this - consteval, which is used to indicate that a function must be evaluated at compile-time, otherwise a compile error will result. Such functions are called immediate functions. For example:
#include <iostream>
consteval int greater(int x, int y) { // consteval! not constexpr
return (x > y ? x : y);
}
int main() {
constexpr int g {greater(5, 6)}; // evaluated at compile-time because return goes into constexpr
std::cout << g << '\n';
std::cout << greater(5, 6) << '\n'; // still at compile-time, guaranteed!
int x{5};
std::cout << greater(x, 6) << '\n'; // compile error!
return 0;
}
The downside of the consteval keyword is that such a function can no longer be evaluated at runtime even if we want it to be. To get the best of both worlds, we can keep the function constexpr and define a consteval helper function, using a C++20 abbreviated function template with the auto type (no need to fully understand at this stage):
#include <iostream>
consteval auto eval_at_compile_time(auto value) { // consteval
return value;
}
constexpr int greater(int x, int y) { // constexpr
return (x > y ? x : y);
}
int main() {
std::cout << greater(5, 6) << '\n'; // runtime / compile-time -> we don't have full control
std::cout << eval_at_compile_time(greater(5, 6)) << '\n'; // guaranteed compile-time
int x{5};
std::cout << greater(x, 6) << '\n'; // runtime, no error!
return 0;
}
Input with std::string
#include <iostream>
#include <string>
int main() {
std::cout << "Enter your full name: ";
std::string name{};
std::cin >> name; // this won't work as expected since std::cin breaks on whitespace
std::cout << "Enter your favorite color: ";
std::string color{};
std::cin >> color;
std::cout << "Your name is " << name << " and your favorite color is " << color << '\n';
return 0;
}
The result from a sample run of the above program gives
Enter your full name: John Doe
Enter your favorite color: Your name is John and your favorite color is Doe
This is because std::cin breaks on whitespace, and thus John and Doe are extracted separately into the name and color strings. To remedy this issue, we need to use the std::getline function:
#include <iostream>
#include <string>
int main() {
std::cout << "Enter your full name: ";
std::string name{};
std::getline(std::cin >> std::ws, name); // read a full line
std::cout << "Enter your favorite color: ";
std::string color{};
std::getline(std::cin >> std::ws, color); // read a full line
std::cout << "Your name is " << name << " and your favorite color is " << color << '\n';
return 0;
}
and this time the output is
Enter your full name: John Doe
Enter your favorite color: blue
Your name is John Doe and your favorite color is blue
What is std::ws
? It’s an input manipulator that tells std::cin
to ignore any leading whitespace before extraction. Noticing that whitespace characters include spaces, tabs and newlines, we know the following program won’t work as expected:
#include <iostream>
#include <string>
int main() {
std::cout << "Pick 1 or 2: ";
int choice{};
std::cin >> choice;
std::cout << "Now enter your name: ";
std::string name{};
std::getline(std::cin, name); // note: no std::ws here
std::cout << "Hello, " << name << ", you picked " << choice << '\n';
return 0;
}
Pick 1 or 2: 2
Now enter your name: Hello, , you picked 2
as we entered only 2\n, where 2 was extracted into choice and \n was left in the input buffer. std::getline then sees the newline right away, concludes that the line is empty, and assigns an empty string to name. The best practice is, therefore, to use std::ws with every std::getline.
The length of strings
To get the length of a std::string
variable, we can
#include <iostream>
#include <string>
int main() {
std::string name{ "Alex" };
std::cout << name << " has " << name.length() << " characters\n";
return 0;
}
Notice the return type is not a regular integer but an unsigned integral type (std::size_t in practice). If you want to use this value in further computation, it’s best to convert the type right away:
int length { static_cast<int>(name.length()) };
In C++20, there’s an easier way to get the size of a string as a signed integral type:
#include <iostream>
#include <string>
int main() {
std::string name{ "Alex" };
std::cout << name << " has " << std::ssize(name) << " characters\n";
return 0;
}
Initializing a std::string
is expensive
…and thus don’t needlessly create duplicate string variables, or pass them by value into functions. However, it’s okay to return a string by value when the expression of the return statement resolves to any of the following
- A local variable of type std::string
- A std::string that has been returned by value from a function call or operator
- A std::string that is created as part of the return statement
In these cases the string can be returned without making an expensive copy. A std::string may also be returned by (const) reference, which will be covered later.
To solve the problem of expensive copies, we have std::string_view (C++17). Instead of copying e.g. a C-style string into a new std::string only to destroy it right away, we can create read-only access to the original string and do whatever we want with it (as long as it’s read-only, like printing):
#include <iostream>
#include <string_view>
void print_string(std::string_view str) {
std::cout << str << '\n';
}
int main() {
std::string_view str{"Hello world"};
print_string(str);
return 0;
}
Note that implicit conversion from std::string_view
to a std::string
is not allowed and will give a compile error. However, explicitly initializing a std::string
with a std::string_view
is possible.
#include <iostream>
#include <string>
#include <string_view>
void print_string(std::string str) {
std::cout << str << '\n';
}
int main() {
std::string_view str{"Hello world"};
print_string(str); // compile error: no implicit conversion to std::string
std::string str2{str}; // okay: explicit initialization
print_string(str2); // okay
print_string(static_cast<std::string>(str)); // okay: explicit conversion
return 0;
}
Assigning a new string to a std::string_view, regardless of what it was originally viewing, simply makes it view the new string. The string that was previously viewed is not modified, and the content being viewed remains read-only.
Also, unlike std::string
, std::string_view
supports constexpr
:
#include <iostream>
#include <string_view>
int main() {
constexpr std::string_view s{ "Hello, world!" }; // s is a string symbolic constant
std::cout << s << '\n'; // s will be replaced with "Hello, world!" at compile-time
return 0;
}
That being said, it’s best to just use std::string_view as a read-only function parameter rather than as a general-purpose variable in most cases. Specifically, when the underlying object is destroyed, using the view is undefined behavior (so don’t return a std::string_view from a function in most cases); when the underlying object is modified, the view is invalidated and, again, using it may be undefined behavior (every time the underlying object is modified, you need to revalidate the view with an assignment).
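Additionally, a std::string_view can be narrowed in place to view a substring with .remove_prefix(#) and .remove_suffix(#), with # being the number of characters to remove from the left or right side of the view. Both functions modify the view itself (they return nothing) and leave the underlying string untouched.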
Operators
There are two important properties of operators:
- Precedence: how much priority an operator is given
- Associativity: given a chain of operators under the same priority, how should we evaluate them, left-to-right or right-to-left?
The following is an exhaustive list of operators with their precedence and associativity:
Prec/Asso | Operator | Description | Pattern |
---|---|---|---|
1 / LR | :: | Global scope (unary) | ::name |
1 / LR | :: | Namespace scope (binary) | class_name::member_name |
2 / LR | () | Parentheses | (expression) |
2 / LR | () | Function call | function_name(parameters) |
2 / LR | () | Initialization | type_name(expression) |
2 / LR | {} | List initialization (C++11) | type_name{expression} |
2 / LR | type() | Functional cast | new_type(expression) |
2 / LR | type{} | Functional cast (C++11) | new_type{expression} |
2 / LR | [] | Array subscript | pointer[expression] |
2 / LR | . | Member access from object | object.member_name |
2 / LR | -> | Member access from object pointer | object_pointer->member_name |
2 / LR | ++ | Post-increment | lvalue++ |
2 / LR | -- | Post-decrement | lvalue-- |
2 / LR | typeid | Run-time type information | typeid(type) or typeid(expression) |
2 / LR | const_cast | Cast away const | const_cast<type>(expression) |
2 / LR | dynamic_cast | Run-time type-checked cast | dynamic_cast<type>(expression) |
2 / LR | reinterpret_cast | Cast one type to another | reinterpret_cast<type>(expression) |
2 / LR | static_cast | Compile-time type-checked cast | static_cast<type>(expression) |
2 / LR | sizeof... | Get parameter pack size | sizeof...(expression) |
2 / LR | noexcept | Compile-time exception check | noexcept(expression) |
2 / LR | alignof | Get type alignment | alignof(type) |
3 / RL | + | Unary plus | +expression |
3 / RL | - | Unary minus | -expression |
3 / RL | ++ | Pre-increment | ++lvalue |
3 / RL | -- | Pre-decrement | --lvalue |
3 / RL | ! | Logical NOT | !expression |
3 / RL | not | Logical NOT | not expression |
3 / RL | ~ | Bitwise NOT | ~expression |
3 / RL | (type) | C-style cast | (new_type)expression |
3 / RL | sizeof | Size in bytes | sizeof(type) or sizeof(expression) |
3 / RL | co_await | Await asynchronous call (C++20) | co_await expression |
3 / RL | & | Address of | &lvalue |
3 / RL | * | Dereference | *expression |
3 / RL | new | Dynamic memory allocation | new type_name |
3 / RL | new[] | Dynamic array allocation | new type_name[expression] |
3 / RL | delete | Dynamic memory deletion | delete pointer |
3 / RL | delete[] | Dynamic array deletion | delete[] pointer |
4 / LR | ->* | Member pointer selector | object_pointer->*pointer_to_member |
4 / LR | .* | Member object selector | object.*pointer_to_member |
5 / LR | * | Multiplication | expression * expression |
5 / LR | / | Division | expression / expression |
5 / LR | % | Remainder | expression % expression |
6 / LR | + | Addition | expression + expression |
6 / LR | - | Subtraction | expression - expression |
7 / LR | << | Bitwise shift left / insertion | expression << expression |
7 / LR | >> | Bitwise shift right / extraction | expression >> expression |
8 / LR | <=> | Three-way comparison (C++20) | expression <=> expression |
9 / LR | < | Comparison less than | expression < expression |
9 / LR | <= | Comparison less than or equals | expression <= expression |
9 / LR | > | Comparison greater than | expression > expression |
9 / LR | >= | Comparison greater than or equals | expression >= expression |
10 / LR | == | Equality | expression == expression |
10 / LR | != | Inequality | expression != expression |
11 / LR | & | Bitwise AND | expression & expression |
12 / LR | ^ | Bitwise XOR | expression ^ expression |
13 / LR | \| | Bitwise OR | expression \| expression |
14 / LR | && | Logical AND | expression && expression |
14 / LR | and | Logical AND | expression and expression |
15 / LR | \|\| | Logical OR | expression \|\| expression |
15 / LR | or | Logical OR | expression or expression |
16 / RL | throw | Throw expression | throw expression |
16 / RL | co_yield | Yield expression (C++20) | co_yield expression |
16 / RL | ?: | Conditional | expression ? expression : expression |
16 / RL | = | Assignment | lvalue = expression |
16 / RL | *= | Multiplication assignment | lvalue *= expression |
16 / RL | /= | Division assignment | lvalue /= expression |
16 / RL | %= | Remainder assignment | lvalue %= expression |
16 / RL | += | Addition assignment | lvalue += expression |
16 / RL | -= | Subtraction assignment | lvalue -= expression |
16 / RL | <<= | Bitwise shift left assignment | lvalue <<= expression |
16 / RL | >>= | Bitwise shift right assignment | lvalue >>= expression |
16 / RL | &= | Bitwise AND assignment | lvalue &= expression |
16 / RL | \|= | Bitwise OR assignment | lvalue \|= expression |
16 / RL | ^= | Bitwise XOR assignment | lvalue ^= expression |
17 / LR | , | Comma operator | expression, expression |
Remainder and exponent operators
The remainder operator in C++ is operator%, whose result takes the sign of the first operand. C++ has no exponent operator; exponentiation is instead provided by the std::pow function in the <cmath> header.
#include <iostream>
#include <cmath>
int main() {
std::cout << 21 % 4 << '\n';
std::cout << -21 % 4 << '\n'; // the remainder's sign follows that of the first operand
std::cout << std::pow(3, 4) << '\n';
return 0;
}
In order to avoid overflow, we sometimes can manually check the limit of the integral types:
#include <cassert> // for assert
#include <cstdint> // for std::int64_t
#include <limits> // for std::numeric_limits
#include <iostream>
int main() {
std::int64_t max_int64 = std::numeric_limits<std::int64_t>::max();
std::int64_t min_int64 = std::numeric_limits<std::int64_t>::min();
assert((10 > 9) && "assert message"); // just showing how assert() works
std::cout << max_int64 << ' ' << min_int64 << '\n';
return 0;
}
Increment/decrement operators
There are 4 increment/decrement operators
Operator | Form | Operation |
---|---|---|
Prefix increment | ++x | Increment x, then return x |
Prefix decrement | --x | Decrement x, then return x |
Postfix increment | x++ | Copy x, then increment x, then return the copy |
Postfix decrement | x-- | Copy x, then decrement x, then return the copy |
Best practice is to use the prefix versions most of the time as they are more performant (no copy) and less surprising (thus less likely to cause bugs). In certain cases, it’s even recommended to avoid increment/decrement operators altogether:
int x{1};
x + ++x
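The above code evaluates as 2 + 2 in Visual Studio and GCC, but 1 + 2 in Clang. This is because C++ doesn’t define the order in which the operands of + are evaluated, so an expression that both reads and modifies x has undefined behavior. To avoid confusion and potential bugs, we should never use a variable that has a side effect applied to it more than once in the same expression.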
The comma operator
The comma operator works very differently than in Python, so it deserves extra attention. The formal definition of the operation x,y
is “evaluate x
and then y
, then return the value of y
”. For example, the following code will print 3
:
#include <iostream>
int main() {
int x{1};
int y{2};
std::cout << (++x, ++y) << '\n';
return 0;
}
Even worse, check these two lines:
z = (a, b);
z = a, b;
The first line is “evaluate a
first, then evaluate b
, and assign b
to z
”. The second line is “assign a
to z
, then evaluate b
and discard it right away”.
As a result, the best practice is DON’T USE COMMA OPERATORS AT ALL.
Relational operators and floating point comparison
The first and third lines below are redundant and should be replaced by the second and last lines instead:
if (b1 == true) {};
if (b1) {};
if (b1 == false) {};
if (!b1) {};
The following will print d1 > d2 even though d1 and d2 are mathematically equal. This is because of rounding errors in floating point arithmetic:
#include <iostream>
int main() {
double d1{ 100.0 - 99.99 }; // should equal 0.01 mathematically
double d2{ 10.0 - 9.99 }; // should equal 0.01 mathematically
if (d1 == d2)
std::cout << "d1 == d2" << '\n';
else if (d1 > d2)
std::cout << "d1 > d2" << '\n';
else if (d1 < d2)
std::cout << "d1 < d2" << '\n';
return 0;
}
As a result, floating point comparison using relational operators can be dangerous sometimes. Instead of using these native operators, we can define our own “approximately equal” function:
// C++23 version -> so that std::abs is constexpr
#include <algorithm> // for std::max
#include <cmath> // for std::abs (constexpr in C++23)
// Return true if the difference between a and b is within epsilon percent of the larger of a and b
constexpr bool approximatelyEqualRel(double a, double b, double relEpsilon) {
return (std::abs(a - b) <= (std::max(std::abs(a), std::abs(b)) * relEpsilon));
}
// Return true if the difference between a and b is less than or equal to absEpsilon, or within relEpsilon percent of the larger of a and b
constexpr bool approximatelyEqualAbsRel(double a, double b, double absEpsilon, double relEpsilon) {
// Check if the numbers are really close -- needed when comparing numbers near zero.
if (std::abs(a - b) <= absEpsilon)
return true;
// Otherwise fall back to Knuth's algorithm
return approximatelyEqualRel(a, b, relEpsilon);
}
With that, we can compare like this:
int main() {
constexpr double a{ 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 }; // almost 1
constexpr double relEps { 1e-8 };
constexpr double absEps { 1e-12 };
constexpr bool same { approximatelyEqualAbsRel(a, 1.0, absEps, relEps) };
std::cout << same << '\n';
return 0;
}
We’re using C++23 so that std::abs is constexpr; otherwise the compiler will report an error when initializing same. With an older standard, we can fix the above program by defining our own constexpr abs function.
Logical operators
There are three kinds of logical operators: logical NOT (!), logical OR (||) and logical AND (&&). The latter two share a property called short circuit evaluation: if the left operand already determines the final result, evaluation stops early and the right operand is not evaluated at all. If we overload these operators with our own function definitions, this short circuit behavior is gone.
Another thing worth noting is that && actually has higher precedence than ||. As a result, when mixing && and ||, it’s best to use explicit parentheses to make the intended grouping obvious.
Bit Manipulations
Bit flags and bit manipulation via std::bitset
When defining bit flags, we use std::bitset
from <bitset>
header:
#include <bitset>
std::bitset<8> mybitset {}; // 8 bits = 8 flags
The class std::bitset
provides some key functions that are useful for bit manipulation:
test() // query whether a bit is a 0 or 1
set() // turn a bit on (this will do nothing if the bit is already on)
reset() // turn a bit off (same as above)
flip() // flip a bit from 0 to 1 (or vice versa)
all() // check if all are 1
any() // check if any is 1
none() // check if none is 1
Bitwise operators
C++ provides 6 bit manipulation operators, often called bitwise operators:
Operator | Symbol | Form | Operation |
---|---|---|---|
left shift | << | x << y | all bits in x shift left by y bits |
right shift | >> | x >> y | all bits in x shift right by y bits |
bitwise NOT | ~ | ~x | all bits in x get flipped |
bitwise AND | & | x & y | each bit in x AND each bit in y |
bitwise OR | \| | x \| y | each bit in x OR each bit in y |
bitwise XOR | ^ | x ^ y | each bit in x XOR each bit in y |
Each of the operators above, except bitwise NOT (which is unary), has a corresponding assignment version: <<=, >>=, &=, |= and ^=.
Scope, Duration and Linkage
User-defined namespaces
Imagine we have the project structure as below:
foo.cpp:
int do_something(int x, int y) {
return x + y;
}
goo.cpp:
int do_something(int x, int y) {
return x - y;
}
Each of these two files compiles just fine on its own. Now, let’s say we build both of them into the same program together with another file main.cpp:
#include <iostream>
int do_something(int, int);
int main() {
std::cout << do_something(1, 2) << '\n';
return 0;
}
We would get a linker error right away for defining the same function twice. Notice that this error doesn’t require main to actually call the do_something function: the collision happens as soon as both definitions are linked into the same program. To resolve this issue (without renaming one of the two function definitions), we can define our own namespaces:
foo.cpp:
namespace Foo {
int do_something(int x, int y) {
return x + y;
}
}
goo.cpp:
namespace Goo {
int do_something(int x, int y) {
return x - y;
}
}
We can, if needed, define nested namespaces in either of the following ways:
namespace Foo {
namespace Goo {
...
}
}
// or
namespace Foo::Goo {
...
}
When using the nested namespaces a lot, we can define namespace aliases like below:
namespace Temp = Foo::Goo;
Local variables and global variables
- Local variables have block scope: they’re in scope from the point of definition to the end of the block
- Local variables have automatic storage duration: they’re created at the point of definition and destroyed at the end of the block
- Best practice is to define local variables in the smallest scope allowed
On the contrary:
- Global variables are defined outside all functions including main
- Global variables can also be defined inside a user-defined namespace
- Best practice is to give a global variable, if not constant, a g or g_ prefix
Internal linkage and internal variables
An identifier with internal linkage can be seen and used within a single translation unit, but it is not accessible from other translation units (that is, it is not exposed to the linker). This means that if two source files linked into the same program each define an identifier with internal linkage under the same name, the two are treated as independent entities and thus don’t violate the ODR with duplicate definitions.
To make a global variable internal (and thus making it an internal variable), we can use the keyword static. Notice that static is redundant for const global variables as they’re internal by default already.
There are typically two reasons to give an identifier internal linkage:
- There’s an identifier we want to make sure isn’t accessible to other files
- To be pedantic about avoiding naming collisions
External linkage and external variables
An identifier with external linkage can be seen and used by other files. In that sense, external variables are more “global” than internal variables. Functions and non-constant global variables are external by default. Constant global variables can be defined as external using the extern keyword.
Sharing global constants across files
Prior to C++17, the following is the best practice:
- Create a header file
- Inside the header file, define a namespace
- Inside the namespace, define all the constants (make sure they’re constexpr)
- #include the header wherever you need those constants
Because constant global variables are internal by default, each .cpp file that includes the header gets its own independent copy of all these constants, and header guards won’t stop this from happening. Changing any of these constants, therefore, requires recompiling every file that includes the header, which can mean lengthy rebuild times for large projects. In addition, if the constants are large and can’t be optimized away, the duplicated copies can use a lot of memory.
Instead of the above, we can
- Create a header file and a corresponding cpp file
- Inside both files, define a namespace
- Inside the namespace of the header file, declare all the constants as extern const (using const instead of constexpr here because constexpr variables cannot be forward declared)
- Inside the namespace of the cpp file, define the constants
Since C++17, once we have learned about inline functions and variables, we can define the constants as inline constexpr:
- Create a header file
- Inside the header file, define a namespace
- Inside the namespace, define all the constants as inline constexpr
Static local variables
We have covered what a static global variable means; now let’s talk about static local variables. Using the static keyword on a local variable changes its duration from automatic duration (from definition to end of block) to static duration (initialized once, the first time the definition is reached, and destroyed only when the program ends), just like a global variable. One of the most common use cases of static local variables is unique ID generation:
int generate_unique_id() {
static int s_item_id {0};
return s_item_id++; // make copy; inc original; return copy
}
Another use case is a function that needs a constant which is expensive to compute: a static local is initialized only on the first call, and the cached value is reused on every later call.
Scope, duration and linkage summary
One last time let’s go over these concepts:
- scope: where the identifier is accessible
- duration: birth to death of an identifier
- linkage: whether multiple declarations of an identifier refer to the same entity or not
In table format:
Type | Example | Scope | Duration | Linkage | Notes |
---|---|---|---|---|---|
Local variables | int x; | Block | Automatic | None | |
Static local variables | static int s_x; | Block | Static | None | |
Dynamic local variables | int* x {new int{}}; | Block | Dynamic | None | |
Function parameters | void foo(int x) | Block | Automatic | None | |
External non-constant global variables | int g_x; | Global | Static | External | Initialized or uninitialized |
Internal non-constant global variables | static int g_x; | Global | Static | Internal | Initialized or uninitialized |
Internal constant global variables | constexpr int g_x{1}; | Global | Static | Internal | Must be initialized |
External constant global variables | extern const int g_x{1}; | Global | Static | External | Must be initialized |
Inline constant global variables (C++17) | inline constexpr int g_x{1}; | Global | Static | External | Must be initialized |
Unnamed and inline namespaces
We have a few different namespaces available:
- Global namespace: no need for any :: accessor
- Normal named namespace: need the name accessor name::
- Unnamed namespace (aka anonymous namespace): anything defined inside an unnamed namespace is treated as “static”, that is, it has internal linkage and can only be accessed within the current source file; no need for any :: accessor
- Inline namespace: its members are treated as part of the enclosing namespace, so they can be used without an accessor as well
Using inline namespaces allows us to “switch” easily between multiple versions of functions etc. without changing the old code.
Control Flow and Error Handling
Categories of flow control statements
Category | Meaning | Implemented in C++ by |
---|---|---|
Conditional statements | Execute only if some condition is met | if , switch |
Jumps | Start executing the statement at some other location | goto , break , continue |
Function calls | Jump to some other location and back | Function call, return |
Loops | Repeatedly execute some sequence of code until end condition is met | while , do-while , for , ranged-for |
Halts | Quit running the whole program | std::exit , std::abort |
Exceptions | Error handling | try , throw , catch |
If conditions
Only thing that needs attention is the following example:
#include <iostream>
int main() {
if (true)
int x{5};
else
int x{6};
std::cout << x << '\n';
return 0;
}
The program above won’t compile because both definitions of x are inside the implicit blocks of the if and else branches, and thus x is already destroyed by the time std::cout needs it.
Null statements
A null statement is an expression statement that consists of just a semicolon:
if (x > 10)
;
which is equivalent to
if (x > 10);
and thus can be dangerous, for example:
if (nuclear_code_activated());
blow_up_the_world();
The world will be blown up no matter whether the button is pressed or not, because what’s actually going on in the example above is
if (nuclear_code_activated()) {
;
}
blow_up_the_world();
constexpr
if statements
Check this example:
#include <iostream>
int main(){
constexpr double g {9.8};
if (g == 9.8) {
std::cout << "gravity is normal\n";
} else {
std::cout << "gravity is " << g << '\n';
}
return 0;
}
The above example will compile, but the condition of the if statement is evaluated at runtime, which is wasteful as g is constexpr and the result is already known at compile time. In C++17 we can optimize the flow using constexpr if statements:
#include <iostream>
int main() {
constexpr double g {9.8};
if constexpr (g == 9.8) {
std::cout << "gravity is normal\n";
} else {
std::cout << "gravity is " << g << '\n';
}
return 0;
}
Modern compilers may or may not issue a warning suggesting such a conversion of the if statement, and they may or may not automatically apply the constexpr treatment for you. So when such optimization is required, it’s better to be explicit than to let the compiler decide.
switch
statements
A switch
statement is nothing but a chain of if-else-if
statements:
switch (statement) {
case 1:
...
case 2:
...
...
default:
...
}
Notice that once a case label matches, execution continues sequentially through all the statements that follow, including those under later case labels; it won’t automatically break out of the switch block without an explicit break or return. As a result, the best practice is to end every case with a break or return, and to keep a default at the end. For example, here is what happens without them:
#include <iostream>
int main() {
switch (2) {
case 1: // Does not match
std::cout << 1 << '\n'; // Skipped
case 2: // Match!
std::cout << 2 << '\n'; // Execution begins here
case 3:
std::cout << 3 << '\n'; // This is also executed
case 4:
std::cout << 4 << '\n'; // This is also executed
default:
std::cout << 5 << '\n'; // This is also executed
}
return 0;
}
This is called switch overflow (or fall-through). When fall-through happens, compilers usually give warnings as it’s usually not intentional. That being said, if it is indeed desired, we can use the [[fallthrough]] attribute (C++17) to tell the compiler that a warning isn’t necessary:
#include <iostream>
int main() {
switch (2) {
case 1:
std::cout << 1 << '\n';
break;
case 2:
std::cout << 2 << '\n';
[[fallthrough]]; // fall through starting from here
case 3:
std::cout << 3 << '\n';
break;
}
return 0;
}
Sequential case
labels
Because case
labels are not statements (they’re labels), we can stack them if desired:
bool is_vowel(char c) {
return (c == 'a' || c == 'e' || c == 'i' || c =='o' || c == 'u' ||
c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U');
}
The above can be rewritten using switch
statement as
bool is_vowel(char c) {
switch (c) {
case 'a':
case 'e':
case 'i':
case 'o':
case 'u':
case 'A':
case 'E':
case 'I':
case 'O':
case 'U':
return true;
default:
return false;
}
}
This is not considered fall-through because there are no statements between the stacked case labels; they all lead to the same statement.
switch
case scoping
It’s worth noting that, unlike if statements where each branch has an implicit block, the cases of a switch don’t get individual blocks. Instead, all cases share the same switch block:
switch (1) {
int a; // okay: definition is allowed before the case labels
int b{ 5 }; // illegal: initialization is not allowed before the case labels
case 1:
int y; // okay but bad practice: definition is allowed within a case
y = 4; // okay: assignment is allowed
break;
case 2:
int z{ 4 }; // illegal: initialization is not allowed if subsequent cases exist
y = 5; // okay: y was declared above, so we can use it here too
break;
case 3: { // note addition of explicit block here
int x{ 4 }; // okay, variables can be initialized inside a block inside a case
std::cout << x;
break;
}
}
while
and do-while
statements
Skipped.
for
loops
The general template is
for (init-statement; condition; end-expression) {
statement;
}
which can be very confusing when all three parts are omitted:
for (;;) {
statement;
}
The above is equivalent to
while (true) {
statement;
}
A for
loop can also have multiple variables in the init-statement:
for (int x{0}, y{10}; x < 10; ++x, --y) {
std::cout << x << ' ' << y << '\n';
}
break
, continue
and early return
Skipped.
Halts
In <cstdlib>
we have std::exit
and std::atexit
defined. std::exit
terminates the program with the given exit status code. It also performs a number of cleanup functions:
- objects with static storage duration are destroyed
- file cleanup if files are used in the program
- return the control back to the OS with the status code
However, std::exit doesn’t clean up any local variables, and because of that, it’s generally advised to avoid std::exit. If we really need to use std::exit and have such concerns, we can use std::atexit to register our own custom cleanup functions before calling std::exit.
For multi-threaded programs, calling std::exit in one thread can crash the program, because the static objects it destroys may still be in use by other threads. To remedy this problem, C++ provides another pair of halting functions that don’t clean up static objects: std::quick_exit and std::at_quick_exit.
In addition, C++ provides std::abort for abnormal termination, and std::terminate, which is typically used in conjunction with exceptions. Notice that std::terminate calls std::abort by default.
std::cout
vs std::cerr
We know that std::cout
is buffered. This means the output may or may not be printed to console by the time the program crashes (if ever). In contrast, std::cerr
is unbuffered and thus can always print the needed error message to console.
Clearing buffer in std::cin
We can simply
std::cin.ignore(100, '\n');
which ignores up to 100 characters in the buffer, stopping early once the next '\n' has been removed. Even better, we can ignore up to the maximum stream size:
#include <iostream> // for std::cin
#include <limits> // for std::numeric_limits
void ignore_line() {
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
The failure mode of std::cin
When an input is invalid and extraction fails, the wrong input is kept in the buffer and std::cin
enters the failure mode. In order to continue input, we need to
if (std::cin.fail()) {
std::cin.clear();
ignore_line();
}
Because std::cin
has an automatic boolean conversion indicating whether the last input succeeded, we can also
if (!std::cin) {
std::cin.clear();
ignore_line();
}
One last corner case is when an end-of-file (EOF) is passed (e.g. via ctrl-D on Unix-like systems), which closes the whole input stream. This is something that can’t be fixed by std::cin.clear(), and thus we need to modify the above resetting logic to
if (!std::cin) {
if (std::cin.eof()) {
std::exit(0); // we can just shut down the program (std::exit lives in <cstdlib>)
}
std::cin.clear(); // back to normal input stream
ignore_line(); // ignore bad characters remaining in the buffer
}
It’s worth noting that when the user input overflows the range of the target variable, e.g. passing 40,000 to a 16-bit signed integer (whose range is -32,768 to 32,767), an assignment still happens:
- The closest limit is assigned to the variable
- The input stream goes into failure mode
assert
and static_assert
In C++, runtime assertions are implemented via the assert
preprocessor macro, which lives in the <cassert>
header:
#include <cassert> // for assert
#include <cmath> // for std::sqrt
#include <iostream>
double f(double x, double y) {
assert(y > 0);
if (x <= 0)
return 0;
return std::sqrt(2 * x / y);
}
int main() {
std::cout << "The answer is " << f(100, -9) << '\n';
return 0;
}
We can make our assertion more descriptive by adding a string literal to it
assert(condition && "my assert message");
This works because a string literal always evaluates to true, so the logical AND (&&) leaves the result of the condition unchanged. When the assertion fails, the whole expression inside the parentheses, including the added string literal, is shown in the error message.
It’s worth noting that the assert macro comes with a small performance cost each time the condition is checked, and thus in production code it’s usually advised to turn assertions off, as your code should be fully tested already. Defining the NDEBUG macro disables all assert statements; many IDEs define it by default for release builds.
C++ also has another type of assertion called static_assert. This is basically the compile-time version of assert, which works at runtime. Unlike assert, which is a macro defined in <cassert>, static_assert is actually a keyword and thus no header is needed. A static_assert takes the following form:
static_assert(condition, message);
where the condition must be a constant expression, and a failed condition results in a compile error. Prior to C++17, the message was required, but this is no longer the case.
Generating random numbers
Using Mersenne Twister:
#include <iostream>
#include <random> // for std::mt19937
int main() {
std::mt19937 mt{}; // instantiate a 32-bit Mersenne Twister
for (int count{1}; count <= 40; ++count) {
std::cout << mt() << '\t';
if (count % 5 == 0)
std::cout << '\n';
}
return 0;
}
which will print 40 pseudo-random 32-bit numbers as a table with 8 rows and 5 columns.
We can also generate numbers that follow a uniform distribution:
#include <iostream>
#include <random>
int main() {
std::mt19937 mt{};
std::uniform_int_distribution die6{1, 6};
for (int count{1}; count <= 40; ++count) {
std::cout << die6(mt) << '\t';
if (count % 10 == 0) std::cout << '\n';
}
return 0;
}
We can seed the random number generator with our own value, e.g. mt{42}, instead of the default seed (5489 for std::mt19937), but preferably we use std::random_device{} to get a seed that differs from run to run:
#include <iostream>
#include <random>
int main() {
std::mt19937 mt{ std::random_device{}() };
// other following code
return 0;
}
Notice that random_device only gives a 32-bit (4 bytes) integer to seed the Mersenne Twister, but the internal state of the generator consists of 624 32-bit integers. This means we’re effectively underseeding the generator, potentially making the result less random. To cope with that we can use std::seed_seq, which combines a sequence of seed numbers, each of 4 bytes, into one seeding object to be passed to the generator:
#include <iostream>
#include <random>
int main() {
std::random_device rd{}; // just device, not called yet
std::seed_seq ss { rd(), rd(), rd(), rd(), rd(), rd(), rd() }; // you can go all the way to 624 of rd() if you want
std::mt19937 mt{ss};
std::uniform_int_distribution die6{1, 6};
for (int count{1}; count <= 40; ++count) {
std::cout << die6(mt) << '\t';
if (count % 10 == 0) std::cout << '\n';
}
return 0;
}
Type Conversion and Function Overloading
Implicit type conversion
Implicit type conversion happens in all of the following cases:
When initializing (or assigning to) a variable with a value of a different type:
double d {3}; // int converted to double
d = 6; // int converted to double
When the type of a return value is different from the function’s declared return type:
float something() {
return 3.0; // double 3.0 converted to float
}
When using certain binary operators with operands of different types:
double division {4.0 / 3}; // int 3 converted to double
When an argument passed to a function is a different type than the function parameter:
void something(long l) {
}
something (3); // int 3 converted to long
Numeric promotion
Because C++ is designed to be portable and performant across a wide range of architectures, the language designers did not want to assume a given CPU would be able to efficiently manipulate values that were narrower than the natural data size for that CPU.
To help address this challenge, C++ defines a category of type conversions informally called the numeric promotions. A numeric promotion is the type conversion of certain narrower numeric types (such as a char) to certain wider numeric types (typically int or double) that can be processed efficiently and is less likely to have a result that overflows.
All numeric promotions are value-preserving, which means that the converted value will always be equal to the source value (it will just have a different type). Since all values of the source type can be precisely represented in the destination type, value-preserving conversions are said to be “safe conversions”.
Because promotions are safe, the compiler will freely use numeric promotion as needed, and will not issue a warning when doing so.
#include <iostream>
void printInt(int x) {
    std::cout << x << '\n';
}
void printDouble(double d) {
    std::cout << d << '\n';
}
int main() {
    printInt(2); // no conversion necessary
    printInt('a'); // numeric promotion of char to int
    printInt(true); // numeric promotion of bool to int
    printDouble(5.0); // no conversion necessary
    printDouble(4.0f); // numeric promotion of float to double
    return 0;
}
Narrowing conversion
In contrast to widening numeric conversions (note: numeric promotion, which widens a type to the native type, is just a subset of general numeric conversion), narrowing conversions may lose data and are in most cases not recommended. If we really need one, we’d better make it explicit:
void some_function(int i) {
}
int main() {
    double d{5.0};
    some_function(d); // bad: implicit conversion
    some_function(static_cast<int>(d)); // good: explicit conversion
    return 0;
}
It’s worth noting (and praising) that brace initialization doesn’t allow implicit narrowing conversions:
int main() {
int i {3.65}; // won't compile at all!
return 0;
}
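…except when the value being converted is constexpr and can be represented in the destination type (this exception does not cover floating point to integer conversions):
int main() {
    constexpr int n1 {5}; // note: constexpr
    unsigned int u1 {n1}; // ok: not considered narrowing due to the constexpr exclusion clause
    int n2 {5}; // note: non-constexpr
    unsigned int u2 {n2}; // error: brace initialization disallows narrowing
    return 0;
}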
Arithmetic conversion
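What happens when we evaluate e.g. 2 + 5.5? Operators that need both operands to have the same type apply the usual arithmetic conversions, which follow a (simplified) prioritized list of types:
- long double (highest)
- double
- float
- unsigned long long
- long long
- unsigned long
- long
- unsigned int
- int (lowest)
If the operands are not of the same type, the operand with the lower priority is converted to the type of the operand with the higher priority, and that type is also the type of the result. Integral types narrower than int are first promoted to int (or unsigned int), which is why they do not appear in the list.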
Explicit type conversion (casting)
C++ supports 5 different types of casts: C-style casts, static casts, const casts, dynamic casts and reinterpret casts. The latter four are often referred to as named casts.
C-style casts:
#include <iostream>
int main(){
int x {10};
int y {4};
double d {(double)x / y}; // convert x to double
std::cout << d << '\n'; // prints: 2.5
return 0;
}
We can also use a more function-call like syntax:
double d {double(x) / y}; // same as above
Static cast:
#include <iostream>
int main() {
char c {'a'};
std::cout << c << ' ' << static_cast<int>(c) << '\n'; // prints: a 97
return 0;
}
The static_cast is intentionally less powerful than the C-style cast. For example, it cannot cast away const:
int main() {
const int x {5};
int& ref{static_cast<int&>(x)}; // error: dropping the const qualifier here
ref = 6;
return 0;
}
typedef and type aliases
There are two ways to create type aliases: an old, backward-compatible way using typedef, and a more modern way with using.
typedef long Miles; // typedef { some existing type } { (as) some name }
using Miles = long; // using { some name (for) } = { some existing type }
The old typedef syntax can be hard to read sometimes:
typedef int (*FuncType)(double, char);
using FuncType = int(*)(double, char);
Type aliases are very useful when we want to hide platform-specific details of types. For example, on some platforms an int is 2 bytes while on others it is 4. We can define the explicit types in a header file like this:
#ifdef INT_2_BYTES
using int8_t = char;
using int16_t = int;
using int32_t = long;
#else
using int8_t = char;
using int16_t = short;
using int32_t = int;
#endif
Using type aliases can make complex types easier to read and use:
using VectPairSI = std::vector<std::pair<std::string, int>>;
Type deduction using auto
Type deduction (also sometimes called type inference) is a feature that allows the compiler to deduce the type of an object from the object’s initializer. To use type deduction, the auto keyword is used in place of the variable’s type:
int add(int x, int y) {
return x + y;
}
int main() {
auto d {5.0}; // double
auto i {1 + 2}; // int
auto x {i}; // int as well
auto sum {add(5, 6)}; // compiler knows add() returns an int, so auto works with function calls as well
auto a; // invalid: no initializer to deduce the type from
auto b {}; // invalid: same as above
return 0;
}
It’s worth noting that type deduction drops the const qualifier (and constexpr is never deduced), so we have to reapply them ourselves if we want them:
int main() {
const int x {5}; // x has type const int (compile-time)
auto y {x}; // y is just an int
constexpr auto z {x}; // z is constexpr int (constexpr reapplied explicitly)
}
For strings, it’s a bit tricky, as auto by default deduces the type of a string literal as const char*. In order to make auto deduce std::string or std::string_view instead, we need to use literal suffixes:
#include <string>
#include <string_view>
int main() {
using namespace std::literals;
auto s1 {"goo"s}; // string
auto s2 {"foo"sv}; // string_view
return 0;
}
For function return types (note: this is different from deducing a variable’s type from a function’s return value, see above) it’s even trickier. When every return statement in the function returns the same type, we can use the auto keyword in a similar way as for variables:
auto add(int x, int y) {
return x + y;
} // this works
auto something(bool b) {
if (b) {
return 5;
} else {
return 6.5;
}
} // won't compile
auto foo(); // this compiles
int main() {
    foo(); // but causes a compile error here, because the definition (and thus the return type) hasn't been seen yet
}
auto foo() {
    return 5;
} // only defined here, after main
To solve the forward declaration problem in the above example, we can use the trailing return type syntax:
auto add(int, int) -> int;
int main() {
add(5, 2); // won't throw compiler error anymore
}
auto add(int x, int y) -> int {
return x + y;
} // even though the definition is here
Note that type deduction doesn’t apply to function parameters. If you use auto like in the example below, it only compiles in C++20 and beyond, and it does so because of a different feature called function templates rather than type deduction:
#include <iostream>
void add_and_print(auto x, auto y) {
std::cout << x + y << '\n';
}
int main() {
add_and_print(2, 3); // int
add_and_print(2.5, 3.5); // double
return 0;
}
Function overloading
int add(int x, int y) { // integer version
return x + y;
}
double add(double x, double y) { // floating point version
return x + y;
}
int main() {
return 0;
}
The compiler knows how to differentiate the overloaded functions. Specifically:
Property | Used for differentiation? | Notes |
---|---|---|
Number of parameters | Yes | |
Type of parameters | Yes | Excludes typedef , type alias and const qualifier. Includes ellipses. |
Return type | No | |
Note that for member functions, const/volatile and ref-qualifiers are also used for overload differentiation.
Given above we define a function’s type signature as the combination of the following
- function name
- number of parameters
- parameter types
- function-level qualifiers
which doesn’t include the return type.
Deleting functions
When we want to avoid undesired usage of functions e.g.
#include <iostream>
void print_int(int x) {
std::cout << x << '\n';
}
int main() {
print_int(5); // prints 5; ok
print_int('a'); // prints 97; ???
print_int(true); // prints 1; ???
return 0;
}
We can define the function as deleted by using the = delete specifier:
void print_int(char) = delete;
void print_int(bool) = delete;
This way, when we call e.g. print_int('a'), there will be a compile error. This is explicit but sometimes too verbose. If we want to delete all other overloads of the function, we can use function templates:
#include <iostream>
void print_int(int x) {
std::cout << x << '\n';
}
template <typename T>
void print_int(T x) = delete; // deleting all other types
int main() {
print_int(5); // ok
print_int('a'); // compile error
print_int(true); // compile error
return 0;
}
Default arguments
Functions can have default arguments and that can be really convenient:
#include <iostream>
void print(int x, int y=10, int z=20) {
std::cout << "x=" << x << ", y=" << y << ", z=" << z << '\n';
}
int main() {
print(1, 2, 3); // prints: x=1, y=2, z=3
print(1, 2); // prints: x=1, y=2, z=20
return 0;
}
However, default arguments cannot be re-declared:
void print(int x, int y=10); // forward declaration
void print(int x, int y=20); // error
Therefore, the best practice is to define default arguments in header files as forward declarations:
in foo.h:
#ifndef FOO_H
#define FOO_H
void print(int x, int y=10);
#endif
in main.cpp:
#include "foo.h"
#include <iostream>
void print(int x, int y) { // no default arguments defined
std::cout << "x=" << x << ", y=" << y << '\n';
}
int main() {
print(5); // still prints x=5, y=10 as expected
return 0;
}
Templated functions
In C++, the template system was designed to simplify the process of creating functions (or classes) that are able to work with different data types. For example, below is a templated max function
template <typename T>
T max(T x, T y) {
return (x < y) ? y : x;
}
which is actually very similar to the templated/overloaded std::max function defined in the STL (one of its overloads is shown below):
template<class T, class Compare>
const T& max(const T& a, const T& b, Compare comp);
In order to use a templated function, we need to instantiate it like below
#include "max.h" // assuming our templated max function is defined here
#include <iostream>
int main() {
std::cout << max<int>(1, 2) << '\n';
return 0;
}
Thanks to template argument deduction, we can also do one of below instead:
max<>(1, 2);
max(1, 2);
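This deduction doesn’t always give a sensible result, e.g.
max("hello", "world"); // compiles with T = const char*, but compares the pointers rather than the text
but we can effectively avoid this kind of unintended behavior by deleting certain instantiations of templated functions: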
template <>
const char* max(const char*, const char*) = delete;
Another headache happens in the following example:
template <typename T, typename U>
T max(T x, U y) {
return (x < y) ? y : x;
}
int main() {
std::cout << max(2, 2.5) << '\n';
return 0;
}
What’s gonna happen? In this case, by type deduction, type T is int and U is double, and because double takes precedence over int in the arithmetic conversion rules, we’re effectively comparing 2.0 versus 2.5. However, the return type is T, namely int, so the result 2.5 is converted to int and the program prints 2. We cannot simply use U as the return type either, as the user can always just switch the two input arguments. The better way is to use auto as the return type, so that the compiler deduces it from the return statement (double here):
template <typename T, typename U>
auto max(T x, U y) {
return (x < y) ? y : x;
}
Starting from C++20, we can even use auto as parameter types to create templated functions, which are called abbreviated function templates:
auto max(auto x, auto y) {
return (x < y) ? y : x;
}
which the compiler automatically translates into the templated max function above, with a separate type parameter for each auto.
Non-type template parameters
In addition to type parameters, in C++ we can also pass non-type parameters to templates. A non-type template parameter is a template parameter with a fixed type that serves as a placeholder for a constexpr value passed in as a template argument. A non-type template parameter can be any of the following types:
- An integral type
- An enumeration type
- std::nullptr_t
- A floating point type (since C++20)
- A literal class type (since C++20)
- A pointer or reference to
  - an object
  - a function
  - a member function
We have seen non-type template parameters before, when we introduced std::bitset:
#include <bitset>
int main() {
std::bitset<8> bits{ 0b0000'0101 }; // The <8> is a non-type template parameter
return 0;
}
Here is a simple example involving a non-type template parameter:
template <int N>
void print() {
std::cout << N << '\n';
}
It might seem trivial in this case, but it becomes useful when we want to static_assert on a value passed to a function: function parameters cannot be constexpr and thus cannot be checked at compile time, whereas a non-type template argument can.
Just like auto for variables, we can also use auto to let the compiler deduce the type of a non-type template parameter (since C++17):
#include <iostream>
template <auto N>
void print() {
std::cout << N << '\n';
}
int main() {
print<5>();
print<'c'>();
}
Compound Types: References and Pointers
What are compound data types in C++?
Compound data types (also sometimes called composite data types) are data types that can be constructed from fundamental data types (or other compound data types). Each compound data type has its own unique properties as well. C++ supports the following compound types:
- Functions
- Arrays
- Pointer types
  - to object
  - to function
  - to data member
  - to member function
- Reference types
  - lvalue references
  - rvalue references
- Enumerated types
  - Unscoped enumerations
  - Scoped enumerations
- Class types
  - Structs
  - Classes
  - Unions
Value categories (lvalue and rvalue)
Prior to C++11, there were only two value categories: lvalue and rvalue. In C++11, three additional value categories (glvalue, prvalue and xvalue) were added to support a new feature called move semantics.
An lvalue is an expression that evaluates to an identifiable object or function (or bit-field). Entities with identities can be accessed via an identifier, reference or pointer. Lvalues come in two subtypes: modifiable and non-modifiable (e.g. const) lvalues.
An rvalue is an expression that is not an lvalue. Commonly seen rvalues include literals (except C-style string literals, which are lvalues) and the return value of functions and operators that return by value. Rvalues are not identifiable and only exist within the scope of the expression in which they are used.
Lvalue references
In C++, a reference is an alias for an existing object. Once a reference has been defined, any operation on the reference is applied to the object being referenced. An lvalue reference (commonly just called a reference since prior to C++11 there was only one type of reference) acts as an alias for an existing lvalue (such as a variable).
int // int type
int& // an lvalue reference to an int object
double& // an lvalue reference to a double object
For those who already know about pointers: the ampersand symbol here does not mean “the address of”, but “lvalue reference to” an object.
A reference, once initialized, cannot be reseated, meaning that we cannot reassign it to reference another object:
#include <iostream>
int main() {
int x{5};
int y{6};
int& ref{x}; // ref now is an alias of x
ref = y; // assign 6 to x! not reseating ref
std::cout << x << '\n';
return 0;
}
It’s worth noting that, although reference variables follow the same scoping and duration rules as normal variables, references and referents can have independent lifetimes. A reference can be destroyed before the object it refers to, and the object can be destroyed before the reference as well.
#include <iostream>
int main() {
int x{5};
{
int& ref{x}; // ref is an alias to x
std::cout << ref << '\n'; // prints 5
} // ref destroyed here
std::cout << x << '\n'; // prints 5
return 0;
} // x destroyed here
Dangling references, namely references whose referents have already been destroyed, are dangerous and should be avoided. Luckily, they mostly occur only when a function returns a local object by reference or by address.
Last and perhaps surprisingly, references are not objects in C++, and they may or may not require storage. As a result, we cannot define a reference to another reference, as an lvalue reference must reference an identifiable object.
int var{};
int& ref1{var};
int& ref2{ref1}; // referencing var, not ref1
Lvalue references to const
A non-const lvalue reference cannot bind to a const object:
int main() {
const int x { 5 }; // x is a non-modifiable (const) lvalue
int& ref { x }; // error: ref can not bind to non-modifiable lvalue
return 0;
}
We can use the const keyword on the reference to do that:
int main() {
const int x{5};
const int& ref{x};
return 0;
}
in which case we call ref a const reference.
We can also bind a const reference to a modifiable lvalue, which just means we’re not going to modify the original object.
Furthermore, we can also bind a const lvalue reference to an rvalue, in which case we effectively extend the lifetime of the temporary rvalue object and can safely use the value until the end of the current scope. This does not, however, work with return by (even const) references.
Last and least used, we can define constexpr references in certain cases:
int g_x {5};
int main() {
constexpr int& ref1{g_x}; // ok: constexpr ref to global
static int s_x {6};
constexpr int& ref2{s_x}; // ok: constexpr ref to static local
static const int s_xc{6};
constexpr const int& ref2c{s_xc}; // ok: constexpr const ref to static const local (...)
int x{7};
constexpr int& ref3{x}; // compile error: constexpr ref to non-static local
}
Pass by lvalue references
Some objects are expensive to copy, so we prefer to pass them to functions by lvalue reference. Also, passing by (non-const) reference allows us to modify the original object within the function.
#include <iostream>
void printValue(int& y) {
std::cout << y << '\n';
}
int main() {
int x { 5 };
printValue(x); // ok: x is a modifiable lvalue
const int z { 5 };
printValue(z); // error: z is a non-modifiable lvalue, cannot be passed as regular lvalue ref
printValue(5); // error: 5 is an rvalue, cannot be passed as lvalue reference
return 0;
}
We can fix it by passing by const lvalue reference:
#include <iostream>
void printValue(const int& y) {
std::cout << y << '\n';
}
int main() {
const int z{5};
printValue(z); // ok
printValue(5); // ok
}
We can now answer the question of why we don’t pass everything by reference:
- For objects that are cheap to copy, the cost of copying is similar to the cost of binding, so we favor pass by value so the code generated will be faster.
- For objects that are expensive to copy, the cost of the copy dominates, so we favor pass by (const) reference to avoid making a copy.
How do we tell if a type is “cheap to copy” or not? Below is a function-like macro that tries to answer this question:
#include <iostream>
// Function-like macro that evaluates to true if the type (or object) is
// equal to or smaller than the size of two memory addresses
#define isSmall(T) (sizeof(T) <= 2 * sizeof(void*))
struct S {
double a;
double b;
double c;
};
int main()
{
std::cout << std::boolalpha; // print true or false rather than 1 or 0
std::cout << isSmall(int) << '\n'; // true
std::cout << isSmall(double) << '\n'; // true
std::cout << isSmall(S) << '\n'; // false
return 0;
}
A last note: it’s usually better (unless you’re using C++14 or older, where std::string_view is not available) to use std::string_view rather than const std::string& for string parameters, because copying/initializing a std::string_view is cheaper than the conversion to a temporary std::string that happens when we pass e.g. a C-style string to a const std::string& parameter.
Introduction to pointers and the address-of operator &
The & symbol tends to cause confusion because it has different meanings depending on context:
- When following a type name, & denotes an lvalue reference: int& ref
- When used in a unary context in an expression, & is the address-of operator: std::cout << &x
- When used in a binary context in an expression, & is the bitwise AND operator: std::cout << (x & y)
Paired with the address-of operator &, we have the dereference operator *:
#include <iostream>
int main() {
int x{5};
std::cout << x << '\n'; // prints 5
std::cout << &x << '\n'; // prints address of x
std::cout << *(&x) << '\n'; // prints the value at the memory address of x, which is 5
}
In short, & gives the address of its operand, and * gives the object at the address held by its operand.
With this concept in mind, we define a pointer as an object that holds a memory address as its value:
int x{}; // a normal int
int& y{x}; // an lvalue reference to an int (must be initialized)
int* z{&x}; // a pointer to an int
Notice we generally recommend putting the asterisk * next to the type, just like the ampersand & of a reference. But when it comes to multiple definitions in the same line, it becomes somewhat awkward:
int* x, * y; // both are pointers; with int* x, y; only x would be a pointer and y a plain int
but we would still recommend placing * right next to the type and keeping a space between it and the variable, although a better suggestion is to not define multiple variables in the same line.
The size of a pointer depends on the architecture the program is compiled for, not on the type being pointed to, as a pointer is just a memory address. So a 32-bit executable typically uses 32-bit (4-byte) pointers, and a 64-bit executable uses 64-bit (8-byte) pointers.
Pointer initialization and assignment
Initialization:
int main() {
int x{5};
double d{2.2};
int* p1; // uninitialized pointer (holds garbage address)
int* p2{}; // a null pointer
int* p3{&x}; // a pointer holding the address of x
int* p4{&d}; // error: initializing an int pointer with the address of a double
int* p5{5}; // error: cannot initialize with an int literal
int* p6{ 0x0012FF7C}; // error: 0x0012FF7C is here just an integer literal
return 0;
}
Assignment:
int main() {
int x{5};
int* ptr {&x}; // initializing ptr to address of x
int y{6};
ptr = &y; // re-assign ptr to address of y
*ptr = 7; // assigning 7 to variable y
}
Differences between pointers and references
- References must be initialized, pointers are not required to (but are recommended to)
- References are not objects, pointers are
- References can not be reseated, pointers can
- References must always be bound to an object, pointers can point to nothing (nullptr)
- References are “safe” (except dangling references), pointers are inherently dangerous
Null pointers
A null pointer (often shortened to just null) is a pointer that points at nothing, and the easiest way to create it is by initializing with empty braces:
int* ptr {};
int* ptr2 { nullptr }; // more explicit, same result
Notice that dereferencing a null pointer leads to undefined behavior. Luckily, we can always check before doing anything:
#include <iostream>
int main() {
int x { 5 };
int* ptr { &x };
if (ptr == nullptr)
std::cout << "ptr is null\n";
else
std::cout << "ptr is " << *ptr << '\n';
return 0;
}
Even simpler, we can rely on the implicit conversion of a pointer to bool: a null pointer converts to false, and any other pointer to true.
Notice that we cannot use this trick to detect dangling pointers, as a dangling pointer is not necessarily null; most of the time it still holds the (now invalid) address of the destroyed object.
Pointers and const
int main() {
    const int x{5};
    int* bad {&x}; // compile error: const int* -> int*
    const int* ptr1 {&x}; // okay: pointer to const int
    *ptr1 = 6; // compile error: cannot modify through a pointer to const
    const int y{6};
    ptr1 = &y; // okay: just reseating, the pointer itself is not const
    int z{7};
    const int* ptr2 {&z}; // okay: a pointer to const can point at a non-const int
    *ptr2 = 8; // error: cannot change the value through a pointer to const
    z = 8; // okay: z itself is not const
    int w{9};
    int* const ptr3 {&z}; // int pointer that is constant
    ptr3 = &w; // error: cannot reseat a const pointer
    *ptr3 = 1; // okay: ptr3 points to a non-const int, can change value
    const int* const ptr4{&z}; // const pointer to const int
}
Pass by value, reference and address
#include <iostream>
#include <string>
void print_by_value(std::string s) {
std::cout << s << '\n';
}
void print_by_reference(std::string& s) {
std::cout << s << '\n';
}
void print_by_address(std::string* s_ptr) {
std::cout << *s_ptr << '\n';
}
int main() {
std::string s{"hello"};
print_by_value(s); // making a copy
print_by_reference(s); // not making a copy
print_by_address(&s); // not making a copy
std::string* s_ptr {&s};
print_by_address(s_ptr); // not making a copy
}
As the caller might pass a null pointer to such a function, passing by address is still considered more dangerous, so we should prefer passing by (const) reference.
A little more tricky: what if we want to set a pointer to null by using a function? One might try this:
void nullify([[maybe_unused]] int* ptr) {
ptr = nullptr;
}
int main() {
int x {5};
int* ptr{&x};
nullify(ptr);
return 0;
}
However, ptr will not be nullptr after calling the function. This is because changing the address held by the pointer parameter has no impact on the address held by the original argument. When the function is called, the parameter ptr receives a copy of the address (yes, a copy does happen, just of the address), and thus whatever happens within the function stays with the copied pointer, not the original.
We can instead pass the pointer itself by reference, as pointers are objects:
void nullify(int*& refptr) {
refptr = nullptr;
}
and it works perfectly. The type int*& may look funky, but the order actually makes a lot of sense, since we’re passing a (reference to) a (pointer to int). Even better, the compiler will throw an error if we accidentally get the order wrong, as int&* doesn’t make sense: a pointer must hold the address of an object, and a reference is not an object.
Return by reference and address
Returning by reference avoids making a copy:
#include <iostream>
#include <string>
const std::string& get_program_name() {
static const std::string s_program_name {"computer"};
return s_program_name;
}
int main() {
std::cout << "the name of the program is " << get_program_name() << '\n';
return 0;
}
It’s generally recommended not to return a non-const static local variable by reference. It’s also worth knowing that assigning/initializing a normal (non-reference) variable with a returned reference makes a copy:
#include <iostream>
#include <string>
const int& get_next_id() {
static int s_x {0}; // non-const
++s_x;
return s_x; // returns a reference to s_x; any copy is made by the caller
}
int main() {
const int id1 { get_next_id() };
const int id2 { get_next_id() }; // id1 and id2 are copies, NOT references to the same variable
std::cout << id1 << id2 << '\n';
return 0;
}
The output would be 12 and that’s desired. It’s important to realize that id1 and id2 above are not two references to the same variable. Since they are not references themselves, each one holds a copy of the value that the returned reference referred to at the time.
One other case where we return by reference is to return the original parameters of the function by reference (when we’ve passed them by reference already).
Return by address is almost identical to return by reference in terms of use cases, except that we can also return nullptr to indicate that “nothing” is returned. A big disadvantage of returning by address, therefore, is that the caller has the obligation to check whether the returned pointer is null before dereferencing it.
In and out parameters
Parameters that are used only for receiving input from the caller are sometimes called in parameters:
void print(int x) { // x is an in parameter
std::cout << x << '\n';
}
In parameters are usually passed by value or const reference.
A function parameter that is used only for the purpose of returning information back to the caller is called an out parameter:
#include <cmath> // for sin and cos
void get_sin_cos(double degrees, double& sin_out, double& cos_out) {
constexpr double pi { 3.14159265358979323846 };
double radians = degrees * pi / 180.0;
sin_out = std::sin(radians);
cos_out = std::cos(radians);
} // not returning anything
We should generally avoid using out parameters except in the rare case where no better options exist.
Type deduction with pointers, references and const
Type deduction, in addition to dropping const qualifiers, also drops references:
std::string& get_string_ref();
int main() {
auto ref {get_string_ref()}; // ref's type deduced as string instead of string&
auto& ref2 {get_string_ref()}; // ref2's type is string&
return 0;
}
In terms of const dropping, type deduction drops the top-level const qualifiers (and leaves the low-level ones unchanged):
int v{1};
const int x{5}; // top-level
int* const p1{&v}; // top-level (the pointer itself is const)
const int& ref{v}; // low-level, but if the reference is dropped... then it becomes top-level
const int* p2{&v}; // low-level (the pointee is const)
const int* const p3{&v}; // left const is low-level, right const is top-level
// top-level const will be dropped, and low-level const left unchanged:
auto a {x}; // int
auto b {ref}; // int (reference dropped first, then the now top-level const)
auto c {p1}; // int*
auto d {p2}; // const int*
auto e {p3}; // const int*
Specifically, for const references we have example below:
#include <string>
const std::string& getConstRef(); // some function that returns a const reference
int main() {
auto ref1{ getConstRef() }; // std::string (reference and top-level const dropped)
const auto ref2{ getConstRef() }; // const std::string (reference dropped, const reapplied)
auto& ref3{ getConstRef() }; // const std::string& (reference reapplied, low-level const not dropped)
const auto& ref4{ getConstRef() }; // const std::string& (reference and const reapplied)
return 0;
}
The best practice is to reapply both the reference and const like in ref4 to make the intent explicit, although reapplying const isn’t strictly necessary.
The same goes for constexpr: it is not deduced, so it has to be reapplied explicitly (e.g. constexpr auto).
When it comes to type deduction with pointers, things are different this time: type deduction doesn’t drop pointers at all:
std::string* get_string_ptr();
int main() {
auto ptr { get_string_ptr() }; // ptr is string*
auto* ptr2 { get_string_ptr() }; // also string*, just more explicit
return 0;
}
In the above example, ptr and ptr2 are effectively the same. However, as auto* must be initialized with a pointer, they can differ in certain situations, for example:
int main() {
auto ptr {*get_string_ptr()}; // type is string
auto* ptr2 {*get_string_ptr()}; // compile error, as initializer is not a pointer
}
Another difference occurs when we introduce const
pointers:
int main() {
auto const ptr2 {get_string_ptr()}; // string* const (const pointer)
const auto ptr1 {get_string_ptr()}; // string* const (const pointer)
auto* const ptr4 {get_string_ptr()}; // string* const (const pointer)
const auto* ptr3 {get_string_ptr()}; // const string* (pointer to const)
}
A maybe better example:
#include <string>
int main() {
std::string s{};
const std::string* const ptr {&s}; // const string* const
auto ptr1{ptr}; // const string* (top-level const dropped)
auto* ptr2{ptr}; // const string*
auto const ptr3{ptr}; // const string* const (top-level const reapplied)
const auto ptr4{ptr}; // const string* const
auto* const ptr5{ptr}; // const string* const
const auto* ptr6{ptr}; // const string* (only const auto* is different... const for low-level only)
const auto const ptr7{ptr}; // error: (const auto) = (const string* const) already. applying duplicate const
const auto* const ptr8{ptr}; // const string* const (the right const is for top-level reapplication)
}
Compound Types: Enums and Structs
Unscoped enumerations
Let’s say we want to define a bunch of colors. The most basic way is
int main() {
int red {0};
int green {1};
int blue {2};
int apple_color {red};
int shirt_color {green};
return 0;
}
which is not intuitive and involves magic numbers. We can define the colors as constexpr variables instead, to make the code easier to read:
constexpr int red {0};
constexpr int green {1};
constexpr int blue {2};
int main() {
int apple_color {red};
int shirt_color {green};
return 0;
}
Or even, to avoid spelling out the type int, we can use a type alias:
using Color = int;
constexpr Color red {1};
constexpr Color green {2};
constexpr Color blue {3};
int main() {
Color apple_color {red};
Color shirt_color {green};
}
However, this doesn’t prevent somebody from initializing a Color variable like
Color something {9};
as Color is nothing but an alias of int.
We can use an enumeration (also known as enum) in this case. C++ supports two kinds of enumerations, unscoped and scoped. Here we talk about unscoped enums:
enum Color {
red,
green,
blue, // trailing comma optional but recommended
}; // the enum definition ends with a semicolon, it's just a definition statement
enum Pet {
cat,
dog,
pig,
};
int main() {
Color apple_color {red};
Color shirt_color {green};
Color cup_color {blue};
Color socks_color {white}; // error: white is not an enumerator of Color
Color hat_color {9}; // error: 9 is an int, not an enumerator of Color
Color some_color {pig}; // error: pig is an enumerator of Pet, not Color
return 0;
}
It’s worth noting that by default the enumerators are automatically assigned integer values starting from 0. If we explicitly assign a value to an enumerator, the enumerators after it continue incrementing from that value:
enum Color {
red = -3,
blue, // assigned -2
green, // assigned -1
yellow = 3, // assigned 3
orange = 3, // also assigned 3
black, // assigned 4
};
and the corresponding integral values will be automatically used when e.g. passed to std::cout. If we want to print the name instead of the integer value, we need to explicitly tell the compiler how to do so:
constexpr std::string_view get_color(Color color) {
switch (color) {
case black:
return "black";
case red:
return "red";
// etc
default:
return "???";
}
}
Even better, we may just directly overload the <<
operator for std::cout
:
#include <iostream>
std::ostream& operator<<(std::ostream& out, Color color) {
switch (color) {
case black:
return out << "black";
case red:
return out << "red";
// etc
default:
return out << "???";
}
}
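The two snippets above can be combined into one small sketch. The three-enumerator Color and the helper color_to_string are assumptions made for illustration (they are not part of the original notes); the point is only that operator<< can delegate to the name lookup so the mapping lives in one place.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <string_view>

enum Color {
    black,
    red,
    blue,
};

// Maps each enumerator to its name; anything else maps to "???".
constexpr std::string_view get_color_name(Color color) {
    switch (color) {
    case black: return "black";
    case red:   return "red";
    case blue:  return "blue";
    default:    return "???";
    }
}

// Lets a Color be streamed like a built-in type by reusing the lookup above.
std::ostream& operator<<(std::ostream& out, Color color) {
    return out << get_color_name(color);
}

// Hypothetical helper (illustration only): formats a Color through operator<<.
std::string color_to_string(Color color) {
    std::ostringstream out;
    out << color;
    return out.str();
}
```

With this in place, std::cout << red prints the name rather than the underlying integer, because the user-defined overload is a better match than the built-in one for int.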
We can specify the underlying integral type of an enum if we want to:
#include <cstdint>
enum Color : std::int8_t {
black,
red,
white,
};
Another potentially relevant topic is how we convert integer to enumerator (the other way round is easy):
enum Color {
red,
black,
yellow,
orange,
blue,
green,
};
int main() {
Color color {5}; // compile error: cannot implicitly convert
color = 3; // compile error: cannot implicitly convert
Color color2 {static_cast<Color>(3)}; // ok
color2 = static_cast<Color>(2); // ok
return 0;
}
If we set an explicit underlying type for the enum, then an integer can be used to brace-initialize it (since C++17); other forms of initialization and assignment still fail:
enum Color : int {
red,
black,
yellow,
orange,
blue,
green,
};
int main() {
Color color {5}; // ok for brace initialization
Color color2(2); // compile error
Color color3 = 3; // compile error
color = 2; // compile error
return 0;
}
For the same reason we cannot simply >>
input an enumerator. We can use static_cast
to explicitly convert an inputted integer to an enumerator type, or we can overload the >>
operator:
std::istream& operator>>(std::istream& in, Color& color) {
int input{};
in >> input;
color = static_cast<Color>(input);
return in;
}
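The overload above accepts any integer, including ones that name no enumerator. A slightly more defensive sketch is shown below; the range check, the use of failbit, and the helper parse_color are my own assumptions for illustration, not something the original tutorial prescribes.

```cpp
#include <istream>
#include <sstream>
#include <string>

enum Color {
    red,
    black,
    yellow,
};

// Extracts an integer and accepts it only if it names an enumerator.
std::istream& operator>>(std::istream& in, Color& color) {
    int input{};
    if (in >> input) {
        if (input >= red && input <= yellow)
            color = static_cast<Color>(input);
        else
            in.setstate(std::ios_base::failbit); // signal bad input the usual stream way
    }
    return in;
}

// Hypothetical helper (illustration only): parses a Color from a string.
bool parse_color(const std::string& text, Color& color) {
    std::istringstream in{text};
    return static_cast<bool>(in >> color);
}
```

On failure the target variable is left untouched, which mirrors how extraction into user-defined types is commonly written.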
Enumerated types can be useful in production for error codes:
enum FileReadResult {
readResultSuccess,
readResultErrorFileOpen,
readResultErrorFileRead,
readResultErrorFileParse,
};
FileReadResult readFileContents() {
if (!openFile())
return readResultErrorFileOpen;
if (!readFile())
return readResultErrorFileRead;
if (!parseFile())
return readResultErrorFileParse;
return readResultSuccess;
}
int main() {
if (readFileContents() == readResultSuccess) {
// do something
} else {
// print error message
}
return 0;
}
and for limited parameter values:
enum SortOrder {
alphabetical,
alphabeticalReverse,
numerical,
};
void sort_data(SortOrder order) {
switch (order) {
case alphabetical:
// something
break;
case alphabeticalReverse:
// something
break;
case numerical:
// something
break;
default:
// raise error?
break;
}
}
Note that when the scope is valid, we can use both red
and Color::red
per our preference, because the enumerators of an unscoped enum are also placed in the enclosing scope. This means the enclosing scope is, unfortunately, always polluted by such enums. We can avoid that by wrapping enums inside corresponding namespaces
:
namespace Color {
enum Color {
red,
green,
blue,
};
}
Scoped enumerations (enum class)
Check the following example:
#include <iostream>
int main() {
enum Color {
red,
blue,
};
enum Fruit {
banana,
apple,
};
Color color { red };
Fruit fruit { banana };
if (color == fruit) // The compiler will compare color and fruit as integers
std::cout << "color and fruit are equal\n"; // and find they are equal!
else
std::cout << "color and fruit are not equal\n";
return 0;
}
To solve this problem, we can either rely on users to use explicit Color::red
everywhere and that they won’t compare Color::red
versus Fruit::banana
, or, we can instead use the scoped enumerations (also known as enum classes):
#include <iostream>
int main() {
enum class Color { // "enum class" defines this as a scoped enumeration rather than an unscoped enumeration
red, // red is considered part of Color's scope region
blue,
};
enum class Fruit {
banana, // banana is considered part of Fruit's scope region
apple,
};
Color color { Color::red }; // note: red is not directly accessible, we have to use Color::red
Fruit fruit { Fruit::banana }; // note: banana is not directly accessible, we have to use Fruit::banana
if (color == fruit) // compile error: the compiler doesn't know how to compare different types Color and Fruit
std::cout << "color and fruit are equal\n";
else
std::cout << "color and fruit are not equal\n";
return 0;
}
Notice that here class
is just an “overly” overloaded keyword and doesn’t suggest Color
and Fruit
are of “class type”. They are just scoped enumeration types, and the real class types are struct
, class
and union
.
Another way scoped enums differ from unscoped enums is that they don’t implicitly convert to integers; an explicit static_cast is required.
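When the integer value of a scoped enumerator is needed, the conversion has to be spelled out. Below is a minimal sketch; to_integer is a hypothetical helper name of mine (C++23 offers std::to_underlying in the utility header for the same purpose).

```cpp
#include <type_traits>

enum class Color {
    red,
    blue,
};

// Hypothetical helper (illustration only): converts any enum to its underlying type.
template <typename E>
constexpr auto to_integer(E e) noexcept {
    return static_cast<std::underlying_type_t<E>>(e);
}

static_assert(to_integer(Color::blue) == 1, "blue is the second enumerator");

// int n = Color::blue;   // would not compile: no implicit conversion
```

Keeping the cast in one named helper makes the few places that really need the integer easy to spot.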
Last but not least, if we find ourselves in a situation where we really have to “import” all the enumerators into a scope, we can always un-scope an enum with a using enum
statement (available since C++20):
constexpr std::string_view get_color(Color color) {
using enum Color; // just in this scope, get rid of all Color:: prefixes
switch (color) {
case black: // instead of Color::black
return "black";
case red:
return "red";
// etc
default:
return "???";
}
}
Introduction to structs, members and member selection
Below is an example of a struct definition:
struct Employee {
int id {};
int age {};
double wage {};
};
int main() {
Employee aaron; // no initializer; the default member initializers still apply, so every member is 0
Employee joe {}; // value initialization (empty braces)
joe.id = 0; // updating member variables
joe.age = 32;
joe.wage = 50000.0;
Employee frank {1, 22, 60000.0}; // aggregate initialization
Employee bob {2, 50}; // aggregate initialization; the missing member falls back to its default; ok
const Employee lucy{}; // const struct; must be initialized
Employee zoe (10, 38, 30000.0); // direct initialization (C++20, not recommended)
return 0;
}
The last type of initialization is only valid since C++20 and is not recommended, as it doesn’t currently work with aggregates that utilize brace elision (notably std::array
).
In the cases where we have a number of members to initialize in order, it might be messy when we all of a sudden introduce a new member:
Old:
struct Foo {
int a {};
int c {};
};
int main() {
Foo foo {1, 2}; // foo.a=1, foo.c=2
return 0;
}
New:
struct Foo {
int a {};
int b {}; // just added
int c {};
};
int main() {
Foo foo {1, 2}; // note now foo.a=1 foo.b=2, foo.c=0!
return 0;
}
To avoid this kind of bug-prone design, we can use designated initializers (available since C++20):
int main() {
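Foo foo {.a{1}, .c{3}}; // ok: foo.a=1, foo.b=0, foo.c=3
Foo foo2 {.a=1, .c=3}; // ok: same values, using = instead of braces
Foo foo3 {.b{2}, .a{1}}; // error: designators must appear in declaration order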
foo = {1, 32, 20}; // ok
foo = {.a=1, .b=32, .c=100}; // ok
return 0;
}
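To see why designated initializers protect against the “new member in the middle” problem, here is a small sketch assuming the three-member Foo from above; make_foo is a hypothetical helper name and the code needs C++20.

```cpp
struct Foo {
    int a {};
    int b {}; // the member that was added later
    int c {};
};

// Each value is tied to a member name, so inserting b cannot shift the value meant for c.
constexpr Foo make_foo() {
    return Foo{.a = 1, .c = 3}; // b is omitted and keeps its default of 0
}
```

Compare this with Foo{1, 3}, which would silently assign 3 to b once b exists.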
Besides designated initialization, we can also initialize a struct using another:
Foo x = foo; // copy initialization
Foo y(foo); // direct initialization
Foo z {foo}; // list initialization
Default member initialization
When we define a struct (or class) type, we can provide a default initialization value for each member as part of the type definition. This process is called non-static member initialization, and the initialization value is called a default member initializer.
struct Something {
int x;
int y {};
int z {5};
};
int main() {
Something s1; // s1.x is uninitialized; s1.y=0; s1.z=5
Something s2 { 5, 6, 7}; // s2.x=5; s2.y=6; s2.z=7
return 0;
}
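As a complement, the sketch below shows what happens with empty or partial braces, assuming the same Something struct; make_empty and make_partial are hypothetical names used only for illustration.

```cpp
struct Something {
    int x;      // no default member initializer
    int y {};   // defaults to 0
    int z {5};  // defaults to 5
};

// Empty braces: x is value-initialized to 0, y and z use their default member initializers.
constexpr Something make_empty() { return Something{}; }

// One explicit value: x takes it, the remaining members fall back to their defaults.
constexpr Something make_partial() { return Something{1}; }
```

So preferring Something s {}; over Something s; avoids the uninitialized x without having to give every member a default.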
Passing and returning structs
We can pass structs by const references to avoid making copies:
#include <iostream>
struct Employee {
int id {};
int age {};
double wage {};
};
void print_employee(const Employee& employee) {
std::cout << "id=" << employee.id
<< ", age=" << employee.age
<< ", wage=" << employee.wage
<< '\n';
}
We can also return structs, this time usually by value so as not to return a dangling reference. Note that because the return type is already stated in the function declaration, we don’t need to name the type again in the return statement: we can return just the braced initializer list and let the compiler initialize the returned struct from it:
Employee get_employee() {
return Employee {1, 23, 10'000};
} // is the same as
Employee get_employee() {
return {1, 23, 10'000};
}
Nested structs
Once we have a struct defined, we can define another that houses it:
// assuming we have `Employee` defined already
struct Company {
int number_of_employees {};
Employee CEO {};
};
int main() {
Company my_company {7, {1, 23, 10'000}}; // nested list initialization
}
We can even wrap the type definition inside the second one, if we believe it’s not necessary to be exposed to global level:
struct Company {
struct Employee {
int id {};
int age {};
double wage {};
};
int number_of_employees {};
Employee CEO {};
};
With this structure, we can now only access the Employee
struct via Company::Employee
.
The size of structs
The size of a struct is not necessarily equal to the sum of the sizes of its members; it is only guaranteed to be at least as large as that sum:
#include <iostream>
struct Foo1 {
int a {};
short b {};
short c {};
};
struct Foo2 {
short a {};
int b {};
short c {};
};
int main() {
std::cout << "the size of int is " << sizeof(int) << " bytes\n";
std::cout << "the size of short is " << sizeof(short) << " bytes\n";
std::cout << "the size of Foo1 is " << sizeof(Foo1) << " bytes\n";
std::cout << "the size of Foo2 is " << sizeof(Foo2) << " bytes\n";
return 0;
}
On a typical platform with a 4-byte int and a 2-byte short, the output would be
the size of int is 4 bytes
the size of short is 2 bytes
the size of Foo1 is 8 bytes
the size of Foo2 is 12 bytes
The sizes of Foo1
and Foo2
differ merely because of the order of member declarations. This is due to data structure alignment: the compiler inserts padding so that each member starts at a suitably aligned address, which is a more advanced topic.
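The padding can be made visible with sizeof, alignof and offsetof. This is only a sketch: the constants foo1_padding and foo2_padding are names I made up, and the actual numbers are implementation-defined.

```cpp
#include <cstddef>

struct Foo1 { int a {}; short b {}; short c {}; };
struct Foo2 { short a {}; int b {}; short c {}; };

// Bytes of each struct that belong to no member, i.e. padding.
constexpr std::size_t foo1_padding = sizeof(Foo1) - (sizeof(int) + 2 * sizeof(short));
constexpr std::size_t foo2_padding = sizeof(Foo2) - (sizeof(int) + 2 * sizeof(short));

// The size of a struct is always a multiple of its alignment ...
static_assert(sizeof(Foo2) % alignof(Foo2) == 0, "size is a multiple of alignment");
// ... and every member starts at an offset suitable for its own type.
static_assert(offsetof(Foo2, b) % alignof(int) == 0, "b is aligned for int");
```

On the typical platform from the output above, Foo1 needs no padding while Foo2 needs two bytes after a and two more after c. Declaring members from largest to smallest is a common way to keep padding low.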
Member selection with pointer and references
We can access the members of a struct via itself (object) using .
, via its reference using .
, and via its pointer using ->
:
// assuming we have `Employee` defined
int main() {
Employee john {1, 23, 10'000};
std::cout << john.id << '\n'; // object.member
const Employee& john2 {john};
std::cout << john2.id << '\n'; // reference.member
const Employee* ptr {&john};
std::cout << ptr->id << '\n'; // pointer->member
return 0;
}
Class templates
template <typename T>
struct Pair {
T x {};
T y {};
};
template <> // full specialization for int: no template parameters remain
struct Pair<int> {
int x {};
int y {};
int n {};
};
We can define function templates that operate on the class template alongside the struct:
template <typename T>
struct Pair {
T x {};
T y {};
};
template <typename T>
constexpr T max(Pair<T> p) {
return (p.x < p.y ? p.y : p.x);
}
We can use class templates across multiple files, as long as we put the full definitions in a header file and include that header in every source file that uses them:
pair.h:
#ifndef PAIR_H
#define PAIR_H
template <typename T>
struct Pair {
T first {};
T second {};
};
template <typename T>
constexpr T max(Pair<T> p) {
return (p.first < p.second ? p.second : p.first);
}
#endif
foo.cpp:
#include "pair.h"
#include <iostream>
void foo() {
Pair<int> p1{1,2};
std::cout << max(p1) << " is larger\n";
}
main.cpp:
#include "pair.h"
#include <iostream>
void foo(); // forward declaration
int main() {
foo();
Pair<double> p2{1.2, 2.3};
std::cout << max(p2) << " is larger\n";
return 0;
}
Starting from C++17, when instantiating an object from a class template, the compiler can deduce the template types from the types of the object’s initializer, which is called class template argument deduction or CTAD:
#include <utility> // for std::pair
int main() {
std::pair<int, int> p1 {1, 2}; // p1 has explicit type std::pair<int, int>
std::pair p2 {1,2}; // p2 is deduced to be of type std::pair<int, int>
std::pair<> p3 {1,2}; // error: too few template arguments, both arguments not deduced
std::pair<int> p4 {1,2}; // error: too few template arguments, second argument not deduced
}
We can omit the template argument list entirely to utilize this feature, but notice in the above example that we cannot omit only some of the arguments, nor can we leave empty angle brackets <>
.
Another thing to keep in mind is that, although C++17 introduced CTAD, it doesn’t work all the time in C++17. In the following example, without the deduction guide the program doesn’t compile in C++17 and fails with an error like “class template argument deduction failed” (but it compiles fine in C++20, which can automatically generate deduction guides for aggregates). This is when we need to hint the compiler with a deduction guide:
template <typename T, typename U>
struct Pair {
T first {};
U second {};
};
// without the following deduction guide, we will have a compile error:
template <typename T, typename U>
Pair(T, U) -> Pair<T, U>;
int main() {
Pair p {1,2};
return 0;
}
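A quick way to check what CTAD actually deduced is a static_assert on the resulting type. The sketch below assumes C++17 or later; make_mixed_pair is a hypothetical helper that exists only to give the deduced object a name.

```cpp
#include <type_traits>

template <typename T, typename U>
struct Pair {
    T first {};
    U second {};
};

// Deduction guide: required in C++17, redundant but harmless in C++20.
template <typename T, typename U>
Pair(T, U) -> Pair<T, U>;

// Hypothetical helper (illustration only) that relies on CTAD.
inline auto make_mixed_pair() {
    Pair p {1, 2.5}; // deduced as Pair<int, double>
    return p;
}

static_assert(std::is_same_v<decltype(make_mixed_pair()), Pair<int, double>>,
              "CTAD picked T=int and U=double");
```

The guide simply says: when a Pair is initialized from a T and a U, the type to construct is Pair<T, U>.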
In addition to deduction guides, we can also provide default types for type template parameters, which the compiler falls back to when a template argument is neither specified nor deducible:
template <typename T=int, typename U=int>
struct Pair {
T first {};
U second {};
};
These defaults are what the compiler uses when we define a variable as just Pair p
without initializer.
One last thing to remember is that CTAD doesn’t work for non-static data members. In other words, the template arguments of a member must be spelled out when it is declared inside a class type:
struct Foo {
std::pair p {1,2}; // error: CTAD doesn't work here
};
Alias templates
We can create a type alias for a class template where all template arguments are specified:
// assuming we have defined `Pair` struct
int main() {
using Point = Pair<int>;
Point p {1,2};
return 0;
}
Instead of aliasing one fully specified type, we can also create an alias template, which is an alias that takes template parameters of its own:
// assuming we have defined `Pair` struct
template <typename T>
using Coord = Pair<T>;
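An alias template does not create a new type; it is just another name for the same instantiation. A minimal sketch, assuming the single-parameter Pair and with coord_sum as a hypothetical function name:

```cpp
#include <type_traits>

template <typename T>
struct Pair {
    T first {};
    T second {};
};

template <typename T>
using Coord = Pair<T>;

// Coord<int> and Pair<int> are the very same type.
static_assert(std::is_same_v<Coord<int>, Pair<int>>, "an alias is not a distinct type");

// Hypothetical function (illustration only): T is deduced through the alias.
template <typename T>
constexpr T coord_sum(const Coord<T>& c) {
    return c.first + c.second;
}
```

Because the alias is transparent, a function written against Coord<T> also accepts a plain Pair<T>.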
Introduction to Classes
Classes vs structs
A class is defined as follows
class Employee {
int m_id {};
int m_age {};
double m_wage {};
};
Notice the ending semicolon as class declaration is also considered a statement. It’s apparent that classes are very similar to structs:
struct Employee {
int id {};
int age {};
double wage {};
};
Both classes and structs can have member functions, including constructors. Keep in mind, though, that a struct with a user-provided constructor is no longer an aggregate.
The difference lies in the accessibility of their members, which we will cover in the following sections.
Const class objects and const member functions
There are some rules concerning the const-ness of the class object and its members.
- You cannot modify the members of a const class object.
- You cannot call a non-const member function of a const class object. So make the function const if you need to.
- You can call a const member function of a non-const class object.
- When passing a class object by const reference, it’s considered a const class object inside the function.
- You can overload a non-const function with its const version and vice versa.
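The rules above can be sketched with a small class. Name, get and name_length are hypothetical names chosen for illustration; the point is which overload gets picked.

```cpp
#include <cstddef>
#include <string>

class Name {
private:
    std::string m_value {"none"};
public:
    // Chosen for const objects and const references: read-only access.
    const std::string& get() const { return m_value; }
    // Chosen for non-const objects: allows modification.
    std::string& get() { return m_value; }
};

// name is a const reference here, so only the const overload can be called.
std::size_t name_length(const Name& name) {
    return name.get().size();
}
```

Calling get() on a non-const Name returns a modifiable reference, so n.get() = "Joe"; compiles, while the same statement inside name_length would not.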
Public and private members and access specifiers
By default, the members of a struct are public and the members of a class are private. That is why, by convention, it is recommended to name private member variables of classes with a prefix m_
suggesting that they’re private members (same logic goes with s_
for static and g_
for global variables).
We can set the access levels via access specifiers:
class Date {
public:
void print() const {
std::cout << m_year << '/' << m_month << '/' << m_day;
}
private:
int m_year {2024};
int m_month {2};
int m_day {28};
};
Together with protected members, we have the full summary of access levels within classes:
Access | Member Access | Derived Class Access | Public Access |
---|---|---|---|
public | yes | yes | yes |
protected | yes | yes | no |
private | yes | no | no |
where “derived class access” means whether the derived class object can access its parent’s member.
In practice, it is best to avoid using access specifiers for structs altogether, while keeping all data members of a class private (or protected) if possible. Classes normally have public member functions if those are intended to be part of the user interface.
The benefits of encapsulation
There are quite a few benefits in encapsulating data and functions inside a class:
- Class becomes easier to use with less complexity
- Data hiding allows us to maintain invariants
- Easier error detection/handling and debugging!
- Encapsulation makes it possible to change implementation without breaking existing programs
That being said, when a function can be implemented entirely as a non-member function, it is generally recommended to do so instead of blindly adding to the encapsulation. This keeps the class lightweight and straightforward.
Introduction to constructors
A constructor is a special member function that is automatically called after a non-aggregate class type object is created.
Constructors must have the same name as the class (with the same capitalization). For template classes, this name excludes the template parameters. Constructors have no return type (not even void). Below is a simple example of a class with constructor defined:
#include <iostream>
class Foo {
private:
int m_x {};
int m_y {};
public:
Foo(int x, int y) { // here's our constructor function that takes two initializers
std::cout << "Foo(" << x << ", " << y << ") constructed\n";
}
void print() const {
std::cout << "Foo(" << m_x << ", " << m_y << ")\n";
}
};
int main() {
Foo foo{ 6, 7 }; // calls Foo(int, int) constructor
foo.print();
return 0;
}
We can initialize the members of the class with a member initializer list in the constructor:
...
Foo(int x, int y):
m_x {x}, m_y {y} {
std::cout << "Foo(" << x << ", " << y << ") constructed\n";
}
Notice that the members are initialized in the order in which they are declared in the class, not in the order in which they appear in the member initializer list.
Default constructors
If a class declares no constructors at all, the compiler implicitly generates a default constructor that is equivalent to one with no parameters and an empty body:
class Foo {
public:
Foo() {}
};
We can customize the default constructor by modifying the constructor body. However, a class should have only one default constructor: if we define both a constructor with no parameters and another one whose parameters all have default arguments, a call without arguments is ambiguous and fails to compile.
In practice, it is recommended to use an explicitly defaulted default constructor instead of writing an empty body:
class Foo {
public:
Foo() = default; // instead of empty body
};
It’s more than just an aesthetic difference: with a user-provided default constructor that has an empty body, value initialization won’t zero-initialize members that lack an initializer, whereas with a defaulted one it will:
class UserDefault {
private:
int m_x; // not initialized
int m_y {};
public:
UserDefault() {}
void print() {
std::cout << m_x << ' ' << m_y << '\n';
}
};
class ExplicitDefault {
private:
int m_x; // not initialized
int m_y {};
public:
ExplicitDefault() = default;
void print() {
std::cout << m_x << ' ' << m_y << '\n';
}
};
int main() {
UserDefault ud{};
ud.print();
ExplicitDefault ed{};
ed.print();
return 0;
}
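The first call may print something like 85 0
(m_x is left uninitialized, and reading it is undefined behavior) instead of 0 0
, which is what the second call prints.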
Delegating constructors
Constructors are allowed to delegate (transfer responsibility for) initialization to another constructor from the same class type. This process is sometimes called constructor chaining and such constructors are called delegating constructors.
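class Employee {
private:
std::string m_name {};
int m_id {};
public:
Employee(std::string_view name, int id)
: m_name {name}, m_id {id} {
}
Employee(std::string_view name)
: Employee(name, 0) { // delegates to Employee(std::string_view, int)
}
};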
In this case, however, we can collapse the two constructors into one by using default arguments:
class Employee {
private:
std::string m_name;
int m_id {};
public:
Employee(std::string_view name, int id = 0)
: m_name {name}
, m_id {id} {
std::cout << "Employee(" << name << ", " << id << ")\n";
}
};
Copy constructors
If you don’t provide one explicitly, C++ will create a public implicit copy constructor for you. By default, a copy constructor does memberwise initializations and nothing else.
If you decide to provide a customized copy constructor (not recommended), you should avoid doing anything other than copying, and remember to always pass a const reference in the parameter.
You can explicitly use the default
specifier just like the default constructor, if you prefer to have an explicit copy constructor.
You can also use delete
to prevent copies and throw compile errors if a copy on the underlying class isn’t desired:
class Fraction {
public:
Fraction(const Fraction& fraction) = delete; // throw compile error if copy
};
There is a concept called “copy elision”, which basically describes an optimization by the compiler that turns
Something s {Something (5)};
into
Something s {5};
so that the redundant copy is skipped (or rather, elided). Copy elision was optional before C++17 and became mandatory in certain cases, including this one, in C++17.
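One way to observe elision is to count copy-constructor calls. The sketch below assumes C++17 (for the guaranteed elision and the inline static member); s_copies, value and make_something are hypothetical names added for illustration.

```cpp
class Something {
private:
    int m_value {};
public:
    static inline int s_copies {0}; // counts copy-constructor calls

    explicit Something(int value) : m_value {value} {}
    Something(const Something& other) : m_value {other.m_value} { ++s_copies; }

    int value() const { return m_value; }
};

// Returns a temporary; since C++17 it is constructed directly in the caller's object.
Something make_something(int value) {
    return Something {value};
}
```

After Something s = make_something(5); the counter is still 0, because no copy constructor ran even though a temporary appears in the source.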
Explicit constructors
While C++ supports implicit conversions, only one user-defined conversion may be applied in a single implicit conversion sequence, so some conversions have to be spelled out explicitly anyway. Meanwhile, a constructor that can be called with a single argument is used for implicit conversions by default, which can silently turn a user-passed argument into a different type and cause confusion. To solve such problems, we can mark a constructor as explicit:
class Dollars {
private:
int m_dollars {};
public:
explicit Dollars(int d)
: m_dollars{d} {
}
int getDollars() const { return m_dollars; } // used by print(Dollars) below
};
void print(Dollars d) {
std::cout << '$' << d.getDollars() << '\n';
}
void print(int x) {
std::cout << x << '\n';
}
int main() {
print(5); // prints 5, no confusion/conversion
print(Dollars{5}); // prints $5
}
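The effect of explicit can be checked at compile time with type traits. This is a sketch assuming C++17; total is a hypothetical function added only to show a call site.

```cpp
#include <type_traits>

class Dollars {
private:
    int m_dollars {};
public:
    explicit Dollars(int d) : m_dollars {d} {}
    int getDollars() const { return m_dollars; }
};

// Direct construction from int is still allowed ...
static_assert(std::is_constructible_v<Dollars, int>, "Dollars{5} compiles");
// ... but an int no longer converts to Dollars implicitly.
static_assert(!std::is_convertible_v<int, Dollars>, "passing 5 where Dollars is expected fails");

// Hypothetical function (illustration only): callers must write Dollars{...} themselves.
inline int total(Dollars a, Dollars b) {
    return a.getDollars() + b.getDollars();
}
```

A call such as total(2, 3) is rejected by the compiler, while total(Dollars{2}, Dollars{3}) states the intent at the call site.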
More on Classes
The hidden this
pointer and member function chaining
Inside every member function, the keyword this
is a const pointer that holds the address of the current implicit object.
class Simple {
private:
int m_id {};
public:
Simple(int id)
: m_id {id}{
}
void setID(int id) {
m_id = id;
}
void print() {
std::cout << m_id << ' ' // this is the same as
<< this->m_id // this with explicit this->
<< '\n';
}
};
As a refresher of pointers, this->m_id
is equivalently (*this).m_id
.
The way the compiler conceptually rewrites the setID
function is to turn it into a static function with this
being its first argument:
static void setID(Simple* const this, int id) {
this->m_id = id;
}
For my fellow Python users, here static
is the same as @staticmethod
in concept. That means the function behaves much like a normal (non-member) function.
In general, despite being more explicit, it is not recommended to use this->
everywhere. Instead, using the m_
prefix is a much more concise way of achieving the same effect.
In addition to using this
to access and set member variables, we can also use this
to chain up consecutive function calls. For example:
class Calc {
private:
int m_value {};
public:
void add(int value) {
m_value += value;
}
void sub(int value) {
m_value -= value;
}
void mul(int value) {
m_value *= value;
}
int getValue() {
return m_value;
}
};
The above class requires us to do consecutive calculation in the following manner:
int main() {
Calc calc{};
calc.add(5);
calc.sub(3);
calc.mul(2);
std::cout << calc.getValue() << '\n';
return 0;
}
Instead of the above, we can make the calculation much more concise by returning *this from each member function:
class Calc {
private:
int m_value {};
public:
Calc& add(int value) {
m_value += value;
return *this;
}
Calc& sub(int value) {
m_value -= value;
return *this;
}
Calc& mul(int value) {
m_value *= value;
return *this;
}
int getValue() {
return m_value;
}
};
int main() {
Calc calc{};
calc
.add(5)
.sub(3)
.mul(2);
std::cout << calc.getValue() << '\n';
return 0;
}
We can also reset a class object back to its default state by
void reset() {
*this = {};
}
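Putting chaining and reset together gives the following sketch. It assumes a trimmed-down Calc with only the members shown; reset works here because Calc has a default constructor and a copy/move assignment operator generated by the compiler.

```cpp
class Calc {
private:
    int m_value {};
public:
    // Returning *this by reference lets calls be chained on the same object.
    Calc& add(int value) { m_value += value; return *this; }
    Calc& sub(int value) { m_value -= value; return *this; }
    Calc& mul(int value) { m_value *= value; return *this; }

    // Assigns a freshly value-initialized Calc to the current object.
    void reset() { *this = {}; }

    int getValue() const { return m_value; }
};
```

For example, calc.add(5).sub(3).mul(2) leaves the value at 4, and a following calc.reset() brings it back to 0.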
It’s worth noting that this
was added to C++ before references were a thing; were it designed today, it would most likely be a self reference rather than a pointer, just like in many more modern languages.
Classes and header files
To address the problem of larger and larger classes, C++ allows us to separate the declaration from class member function definitions. For example,
class Date {
private:
int m_year;
int m_month;
int m_day;
public: // declarations only below, except for the trivial getters
Date(int year, int month, int day);
void print() const;
int getYear() const { return m_year; }
int getMonth() const { return m_month; }
int getDay() const { return m_day; }
};
Date::Date(int year, int month, int day)// actual definition
: m_year{year},
m_month{month},
m_day{day} {
}
Even further, we can put declarations and definitions in different files. For example, inside date.h
we can have
#pragma once
class Date {
private:
int m_year {};
int m_month {};
int m_day {};
public:
Date(int year, int month, int day);
void print() const;
int getYear() const { return m_year; }
int getMonth() const { return m_month; }
int getDay() const { return m_day; }
};
and in date.cpp
we have
#include "date.h"
#include <iostream>
Date::Date(int year, int month, int day)
: m_year {year},
m_month {month},
m_day {day} {
}
void Date::print() const {
std::cout << "Date(" << m_year <<
", " << m_month <<
", " << m_day << ")\n";
}
If you’re concerned about the one-definition rule (ODR), rest easy: types are exempt from the part of the rule that allows only one definition per program, so there is no issue with including a class definition in multiple source files. Meanwhile, the part that forbids multiple definitions inside the same file still applies, so we cannot include the same class definition into the same file multiple times, hence the header guards or #pragma once
.
Nested types
#include <iostream>
enum class FruitType {
apple,
banana,
cherry
};
class Fruit {
private:
using PercentageType = int;
FruitType m_type { };
PercentageType m_percentageEaten { 0 };
public:
Fruit(FruitType type) : m_type { type } {}
FruitType getType() { return m_type; }
PercentageType getPercentageEaten() { return m_percentageEaten; }
bool isCherry() { return m_type == FruitType::cherry; }
};
int main() {
Fruit apple { FruitType::apple };
if (apple.getType() == FruitType::apple)
std::cout << "I am an apple";
else
std::cout << "I am not an apple";
return 0;
}
Note how the type alias defined in the private section can still be used in the signatures of public member functions.
Another interesting feature of nested classes is that they can access the private members of their enclosing class, as long as they’re defined inside it:
class Employee {
private:
std::string m_name;
int m_id;
public:
class Printer {
public:
void print(const Employee& e) {
std::cout << e.m_name << '\n';
}
};
};
Notice how the Printer
class can access the private members of Employee
(although the this
pointer of the enclosing class is not implicitly accessible, so an Employee object must be passed in).
Introduction to destructors
There are several rules about destructors:
- The destructor must have the same name as the class, preceded by a tilde (~)
- The destructor can not take arguments
- The destructor has no return type
The following is an example of a destructor:
class Simple {
private:
int m_id {};
public:
Simple(int id)
: m_id {id} {
}
~Simple() {
std::cout << "Destructing Simple " << m_id << '\n';
}
};
If a non-aggregate class type object has no user-declared destructor, the compiler will generate a destructor with an empty body. This destructor is called an implicit destructor, and it is effectively just a placeholder.
It’s important to note that when std::exit()
is called, the program terminates without unwinding the stack, so the destructors of local objects won’t run and their cleanup work won’t happen (which is bad).
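To make the timing of destructor calls visible, the sketch below records construction and destruction in a string. The log parameter and the function run_scope are assumptions added for illustration; the original Simple class prints to std::cout instead.

```cpp
#include <string>

class Simple {
private:
    int m_id {};
    std::string& m_log; // where this object records its lifetime events
public:
    Simple(int id, std::string& log)
        : m_id {id}, m_log {log} {
        m_log += "+" + std::to_string(m_id); // constructed
    }
    ~Simple() {
        m_log += "-" + std::to_string(m_id); // destroyed
    }
};

// Hypothetical function (illustration only): creates two objects in an inner scope.
std::string run_scope() {
    std::string log;
    {
        Simple first {1, log};
        Simple second {2, log};
    } // both destructors run here, in reverse order of construction
    return log;
}
```

The recorded sequence is +1+2-2-1: local objects are destroyed in the reverse order of their construction when the scope ends.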
Class template with member functions
Member functions of a class template can be defined outside the class; each such definition needs its own template parameter declaration:
template <typename T>
class Pair {
private:
T m_first {};
T m_second {};
public:
Pair(const T& first, const T& second)
: m_first {first},
m_second {second} {
}
bool isEqual(const Pair<T>& pair);
};
template <typename T>
bool Pair<T>::isEqual(const Pair<T>& pair) {
return m_first == pair.m_first && m_second == pair.m_second;
}
int main() {
Pair p1 { 5, 6 };
std::cout << std::boolalpha << "isEqual(5,6): "
<< p1.isEqual(Pair{5,6}) << '\n';
std::cout << std::boolalpha << "isEqual(5,7): "
<< p1.isEqual(Pair{5,7}) << '\n';
return 0;
}
Static member variables
Static member variables are shared by all objects of the class.
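struct Something {
static int s_value; // declaration only
};
int Something::s_value {1}; // definition; without it the program fails to link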
int main() {
Something first {};
Something second {};
first.s_value = 2;
std::cout << first.s_value << ' ' << second.s_value << '\n';
return 0;
}
It’s worth noting that static member variables are not associated with the objects at all, which actually makes sense, since they exist even before a class gets instantiated. In order to access the member, we can just use Something::s_value
:
class Something {
public:
static int s_value;
};
int Something::s_value {1};
int main() {
Something::s_value = 2;
std::cout << Something::s_value << '\n';
return 0;
}
See, no object involved in this example.
By default, a static member variable declared inside the class is only a declaration, so it must be defined (and may be initialized) outside the class, hence the extra line. There are two exceptions, though:
- A const integral (or constexpr) static member can be initialized inside the class definition
- An inline static member (since C++17) can be initialized inside the class definition (this is the preferred way if we have a non-const variable and want it static)
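Both exceptions can be sketched in one class. Counter, s_limit, s_count and limit_reached are hypothetical names of mine, and the inline member requires C++17.

```cpp
class Counter {
public:
    static constexpr int s_limit {3}; // constant: may be initialized in the class
    static inline int s_count {0};    // inline variable: also initialized in the class

    Counter() { ++s_count; }          // every object bumps the shared counter

    static bool limit_reached() { return s_count >= s_limit; }
};
```

No out-of-class definition line is needed for either member, and all Counter objects share the same s_count.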
Static member functions
Note that static member variables are not accessible from outside the class if they’re private, not even via the class name. To solve this problem, we need static member functions:
class Something {
private:
static inline int s_value {1};
public:
static int getValue() {
return s_value;
}
};
int main() {
std::cout << Something::s_value << '\n'; // compile error
std::cout << Something::getValue() << '\n'; // ok
return 0;
}
It’s also worth noting (and quite natural) that a static member function has no access to the this
pointer, since it is not called on any object. Also, C++ doesn’t support static constructors (which makes sense too).
Friend non-member functions
When we want a non-member function to be able to access the private variables, we need friendship. Inside the body of a class, a friend declaration using the friend
keyword can be used to tell the compiler that some other class or function is now a friend, which in C++ means they have been granted access to the private and protected members of the said class.
class Accumulator {
private:
int m_value {};
public:
void add(int value) {
m_value += value;
}
friend void print(const Accumulator& accumulator);
};
void print(const Accumulator& accumulator) {
std::cout << accumulator.m_value;
}
int main() {
Accumulator acc {};
acc.add(5);
print(acc);
return 0;
}
Instead of declaring a friend non-member function and defining it outside the class, we can also just define the friend function inside the class:
class Accumulator {
private:
int m_value {};
public:
void add(int value) {
m_value += value;
}
friend void print(const Accumulator& accumulator) {
std::cout << accumulator.m_value;
}
};
A function can be a friend of multiple classes.
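A function befriended by two classes might look like the sketch below. Temperature, Humidity and comfort_index are hypothetical names chosen for illustration; note the forward declaration that the first friend declaration needs.

```cpp
class Humidity; // forward declaration so Temperature can mention it

class Temperature {
private:
    int m_temp {};
public:
    explicit Temperature(int temp) : m_temp {temp} {}
    friend int comfort_index(const Temperature& t, const Humidity& h);
};

class Humidity {
private:
    int m_humidity {};
public:
    explicit Humidity(int humidity) : m_humidity {humidity} {}
    friend int comfort_index(const Temperature& t, const Humidity& h);
};

// Friend of both classes: may read the private members of each.
int comfort_index(const Temperature& t, const Humidity& h) {
    return t.m_temp + h.m_humidity;
}
```

Each class has to grant the friendship itself; being a friend of one class gives no access to the other.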
Friend classes and friend member functions
Just like friend non-member functions, we can define friend classes and member functions.
class Storage {
private:
int m_value {};
public:
Storage(int value) : m_value{value} {}
friend class Display;
};
class Display {
private:
bool m_display {};
public:
Display(bool display) : m_display {display} {}
void displayStorage(const Storage& storage) {
if (m_display) {
std::cout << storage.m_value << '\n';
}
}
};
which declares the class Display
to be a friend of Storage
and thus allows the former to access the private members of the latter. Instead of doing that for the whole class, we can also declare only a single member function as a friend of Storage
. It’s worth noting that in order to avoid a compile error, we need to have Storage
declared before Display
. If that’s not possible for the full class, we can at least do forward declaration and just write this on the top:
class Storage;
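Befriending a single member function requires a particular ordering of declarations, sketched below. The member function read is a hypothetical stand-in for displayStorage that returns the value instead of printing it, with -1 as an arbitrary “display disabled” result.

```cpp
class Storage; // forward declaration: Display only needs the name at first

class Display {
private:
    bool m_enabled {};
public:
    explicit Display(bool enabled) : m_enabled {enabled} {}
    int read(const Storage& storage) const; // defined after Storage is complete
};

class Storage {
private:
    int m_value {};
public:
    explicit Storage(int value) : m_value {value} {}
    // Only this one member function of Display is a friend, not the whole class.
    friend int Display::read(const Storage& storage) const;
};

// Storage is complete here, so its private member can be accessed.
int Display::read(const Storage& storage) const {
    return m_enabled ? storage.m_value : -1;
}
```

The compiler must have seen the full definition of Display before the friend declaration, and the full definition of Storage before the body of read, which is why the member function is defined last.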
Ref qualifier
Let’s say we have the following member function that returns a const reference:
auto& getName() const { return m_name; }
and want to avoid the potential problem of dangling references. We could ref-qualify this function by adding a &
qualifier to the overload that will match only lvalue implicit objects, and a &&
qualifier to the overload that will match only rvalue implicit objects:
const auto& getName() const & {return m_name;} // & for lvalue implicit (return by reference)
auto getName() const && {return m_name;} // && for rvalue implicit (return by value)
With the above overloading, we can safely run the below program without worrying about dangling references:
int main() {
Employee joe{};
joe.setName("Joe");
std::cout << joe.getName() << '\n'; // joe is an lvalue, so the name is returned by reference
std::cout << createEmployee("Frank").getName() << '\n'; // createEmployee() returns an rvalue, so the name is returned by value
return 0;
}
Two things to note about ref-qualifying:
- The non-ref-qualified overloads will conflict with ref-qualified overloads, so you must use one or the other
- If only a non-const lvalue-qualified overload is provided (a single & and no rvalue overload), then any call to the function with an rvalue implicit object will result in a compilation error. A const & overload on its own still accepts rvalues, so to detect and prevent the use of rvalue implicit objects we can explicitly delete the && overload.
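The pair of ref-qualified overloads can be verified with a self-contained sketch. The constructor and createEmployee are assumptions made so the example compiles on its own; the static_asserts check which overload is selected for each value category (C++17).

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

class Employee {
private:
    std::string m_name;
public:
    explicit Employee(std::string_view name) : m_name {name} {}

    // lvalue implicit object: safe to hand out a reference
    const std::string& getName() const & { return m_name; }
    // rvalue implicit object: return a copy so nothing dangles
    std::string getName() const && { return m_name; }
};

// Hypothetical factory (illustration only) returning a temporary Employee.
inline Employee createEmployee(std::string_view name) {
    return Employee {name};
}

static_assert(std::is_same_v<decltype(std::declval<const Employee&>().getName()),
                             const std::string&>, "lvalues get a reference");
static_assert(std::is_same_v<decltype(std::declval<Employee>().getName()),
                             std::string>, "rvalues get a value");
```

With these overloads, binding the result of createEmployee("Frank").getName() to a const reference extends the lifetime of the returned copy instead of referring to a destroyed member.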
Dynamic Arrays: std::vector
Introduction to containers and arrays
The following types are containers under the general programming definition, but are not considered to be containers by the C++ standard:
- C-style arrays
std::string
std::vector<bool>
To be a container in C++, the container must implement all of the requirements here. That being said, since std::string
and std::vector<bool>
implement most of the requirements, they behave like containers in most circumstances and are sometimes called “pseudo-containers”.
Of the provided container classes, std::vector
and std::array
see the most use and will be where we focus the bulk of our attention.
An array is a container data type that stores a sequence of values contiguously (meaning each element is placed in an adjacent memory location with no gaps). Arrays allow fast, direct access to any element and are conceptually simple and easy to use.
There are three primary array types in C++: C-style arrays, the std::vector
container class, and the std::array
container class. C-style arrays were inherited from the C language. For backwards compatibility, these arrays are defined as part of the core C++ language (much like the fundamental data types). They are also sometimes called “naked arrays”, “fixed-sized arrays” or “built-in arrays”. To help make arrays safer and easier to use in C++, the std::vector
container class was introduced with the first C++ standard (C++98). It is the most flexible of the three array types and has a bunch of useful capabilities that the other array types don’t. Finally, the std::array
container class was introduced in C++11 as a direct replacement for C-style arrays. It is more limited than std::vector
but can also be more efficient, especially for smaller arrays.
Introduction to std::vector
and list constructors
The std::vector
container is defined in the <vector>
header as a class template with a template type parameter that defines the type of the elements. For example:
#include <vector>
int main() {
std::vector<int> empty{};
return 0;
}
You can initialize the std::vector
with a list of values also:
std::vector<int> primes {2, 3, 5, 7, 11};
std::vector vowels {'a', 'e', 'i', 'o', 'u'}; // type char deduced by CTAD
The above is called a list constructor and is provided often by container classes. The list constructor does three things:
- Ensures the container has enough storage to hold all the initialization values
- Sets the length of the container to the number of elements in the initializer list
- Initializes the elements to the values in the initializer list (in sequential order)
We can access the elements like the following:
int pp = primes[0] + primes[1];
Notice operator[]
does not do any bounds checking and thus passing an invalid index (negative / greater than or equal to the length of the array) will result in undefined behavior.
One of the defining characteristics of arrays is that the elements are always allocated contiguously in memory, meaning the elements are all adjacent in memory (with no gaps between them):
#include <iostream>
#include <vector>
int main() {
std::vector primes {2, 3, 5, 7, 11};
std::cout << "An int is " << sizeof(int) << " bytes\n";
std::cout << &(primes[0]) << '\n';
std::cout << &(primes[1]) << '\n';
std::cout << &(primes[2]) << '\n';
return 0;
}
This produces the following output:
An int is 4 bytes
00DBF720
00DBF724
00DBF728
We can construct a vector of specific length to avoid the following kind of work:
std::vector<int> data = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; // ten zeros
by using the explicit constructor explicit std::vector<T>(std::size_t)
:
std::vector<int> data(10); // vector containing ten int elements, value-initialized to zero
Notice the above initialization uses direct initialization with parentheses instead of braces – there is apparent ambiguity from using braces in this case since both list constructor and the explicit constructor can take this form. The reality is, C++ has a special rule to select a matching list constructor over other constructors (but we should always just avoid such confusion).
Lastly, the following is a list of examples of different ways to initialize a vector:
std::vector<int> v1 = 10; // compile error
std::vector<int> v2(10); // explicit constructor; ten zeros
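std::vector<int> v3{10}; // list constructor; a single element with value 10
std::vector<int> v4 = {10}; // copy-list initialization, list constructor; a single element with value 10
std::vector<int> v5({10}); // {10} is treated as an initializer list; a single element with value 10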
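To make the difference between parentheses and braces concrete, here is a small sketch that reports the resulting length and first element for both forms. The function names are made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Parentheses select the explicit size constructor: ten value-initialized ints.
inline std::size_t lengthWithParentheses() {
    std::vector<int> v(10);
    return v.size();
}

// Braces select the list constructor: a single element holding 10.
inline std::size_t lengthWithBraces() {
    std::vector<int> v{10};
    return v.size();
}

// First element of each form, to show that the stored values differ too.
inline int firstWithParentheses() { std::vector<int> v(10); return v[0]; }
inline int firstWithBraces()      { std::vector<int> v{10}; return v[0]; }
```

Called from main(), the first pair returns 10 and 1, and the second pair returns 0 and 10.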
The unsigned length and subscript problem
Before everything, let’s talk about history.
As Bjarne Stroustrup recalls, when the container classes in the C++ standard library were being designed (circa 1997), the designers had to choose whether to make the length (and array subscripts) signed or unsigned. They chose unsigned because lengths cannot naturally be negative, and an unsigned type allows a greater maximum value (important back in the 16-bit days).
In retrospect, this is generally regarded as having been the wrong choice, because a negative signed integer is implicitly converted to a large unsigned integer (producing a garbage result), and the commonly used operator[]
doesn’t do range-checking at all.
That being said, we can ask a container for its length using size()
member function (which returns the length as unsigned size_type
):
#include <iostream>
#include <vector>
int main() {
std::vector prime{2, 3, 5, 7, 11};
std::cout << "length: " << prime.size() << '\n';
return 0;
}
Unlike std::string
and std::string_view
which have both length()
and size()
member functions (that do the same thing), std::vector
(together with most other container types) only has size()
. Starting from C++17, we can also use std::size
non-member function to do the same thing:
std::size(prime)
Then, starting from C++20, we have std::ssize()
to get the length as a large signed integral type (usually std::ptrdiff_t
) which is the type normally used as the signed counterpart to std::size_t
. This is the only function of the three that returns the length as a signed type. When saving the returned value to an integral type, you can either use int
(but with static_cast
to avoid an implicit narrowing conversion warning/error), or you can use auto
to let the compiler deduce the type of the variable.
On the other hand, while operator[]
does no bounds checking, we have at()
member function that does runtime bounds checking:
prime.at(5); // throws exception of type std::out_of_range
Alternatively, we can also define the index as a constexpr int
or std::size_t
to avoid implicit narrowing conversion (but alas, no bounds checking).
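As a rough sketch of how the runtime check of at() can be put to use, the helper below (tryGet is a made-up name) turns the std::out_of_range exception into an empty std::optional. Using std::optional here is my own choice for the illustration, not something the notes prescribe.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the element at `index`, or std::nullopt
// when at() reports that the index is out of range.
inline std::optional<int> tryGet(const std::vector<int>& v, std::size_t index) {
    try {
        return v.at(index);              // runtime bounds check
    } catch (const std::out_of_range&) {
        return std::nullopt;             // invalid index: no undefined behavior
    }
}
```

With the prime vector from above, tryGet(prime, 1) holds 3, while tryGet(prime, 5) comes back empty instead of invoking undefined behavior as prime[5] would.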
Passing std::vector
An object of type std::vector
can be passed to a function just like any other object. That means if we pass it by value, an expensive copy will be made, hence we should typically pass std::vector
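by (const) reference to avoid such copies. Note that the parameter type must spell out the element type (e.g. const std::vector<int>&), because CTAD does not work for function parameters. To accept vectors of any element type, we can turn the function into a template that parameterizes the element type.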
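A minimal sketch of passing a vector by const reference through a function template follows. The name countMatches is made up for this illustration; the element type T is deduced from the argument at the call site.

```cpp
#include <cstddef>
#include <vector>

// Taking the vector by const reference avoids copying its elements.
// T is deduced from the argument, so any element type is accepted.
template <typename T>
std::size_t countMatches(const std::vector<T>& arr, const T& value) {
    std::size_t count{0};
    for (const auto& element : arr) {
        if (element == value)
            ++count;
    }
    return count;
}
```

The same function then works for a std::vector<int> and a std::vector<char> alike, without writing one overload per element type.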
Returning std::vector
, and an introduction to move semantics
Check the following example:
#include <iostream>
#include <vector>
int main() {
std::vector arr1 {1, 2, 3, 4, 5};
std::vector arr2 { arr1 }; // copy of arr1
arr1[0] = 6;
arr2[0] = 7;
std::cout << arr1[0] << arr2[0] << '\n';
return 0;
}
The above prints 67 (arr1[0] is 6 and arr2[0] is 7) because arr2
is nothing but a copy of arr1
. The term copy semantics refers to the rules that determine how copies of objects are made. When we say a type supports copy semantics, we mean that objects of that type are copyable, because the rules for making such copies have been defined. When we say copy semantics are being invoked, that means we’ve done something that will make a copy of an object.
For class types, copy semantics are typically implemented via the copy constructor (and copy assignment operator).
Let’s now check another example:
#include <iostream>
#include <vector>
std::vector<int> generate() {
std::vector arr1 {1, 2, 3, 4, 5};
return arr1;
}
int main() {
std::vector arr2 { generate() }; // the return value of generate() dies after this line
arr2[0] = 7; // nothing to do with arr1
std::cout << arr2[0] << '\n';
return 0;
}
When arr2
is initialized this time, it is being initialized using a temporary object returned from function generate()
and, unlike the prior case, the temporary is destroyed right after the initialization, so copying it would be pointlessly costly. That is why we need move semantics, which refers to the rules that determine how the data from one object is moved to another object. When move semantics is invoked, any data member that can be moved is moved, and any data member that can’t be moved is copied. Normally, when an object is being initialized with or assigned an object of the same type, copy semantics will be used (assuming the copy isn’t elided). However, when all of the following conditions are true, move semantics will be invoked instead:
- The type of the object supports move semantics
- The initializer or object being assigned from is an rvalue (temporary) object
- The move isn’t elided
The sad news is that not many types support move semantics – however, std::vector
and std::string
both do. That means we can return move-capable types like std::vector
by value just fine. That being said, we should still pass these move-capable objects by const reference, because move semantics won’t be invoked when we pass them by value: the arguments are usually lvalues at that point, so a copy is made. So in short: pass by const reference, return by value.
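One way to observe a move is to check that the destination vector takes over the source's buffer instead of allocating a new one. The sketch below does that with std::move (from <utility>), which the notes have not introduced at this point; I use it here only to turn an lvalue into an rvalue so that the move constructor is selected. The function names are made up for this illustration.

```cpp
#include <utility>
#include <vector>

// Returned by value: the caller's vector is initialized from this result
// through move semantics (or the copy/move is elided entirely).
inline std::vector<int> generate() {
    std::vector<int> arr{1, 2, 3, 4, 5};
    return arr;
}

// Returns true if move-constructing a vector reused the source's buffer
// instead of copying the elements into a new one.
inline bool moveReusesBuffer() {
    std::vector<int> source{generate()};
    const int* bufferBefore{source.data()};
    std::vector<int> dest{std::move(source)}; // std::move turns source into an rvalue
    return dest.data() == bufferBefore;
}
```

A copy would have to allocate a second buffer and copy five ints into it; the move only transfers ownership of the existing buffer.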
Arrays and loops
Arrays provide a way to store multiple objects without having to name each element. Loops provide a way to traverse an array without having to explicitly list each element. Templates provide a way to parameterize the element type. Together, they allow us to write code that can operate on a container of elements, regardless of the element type or number of elements in the container.
#include <iostream>
#include <vector>
template <typename T>
T calculateAverages(const std::vector<T>& arr) {
std::size_t length {arr.size()};
T average{0};
for (std::size_t i{0}; i < length; ++i) {
average += arr[i];
}
average /= static_cast<int>(length);
return average;
}
It’s good to remember (yes) how the simple for loop is written: initialize the index to zero, and keep looping while i < length
.
Arrays, loops and sign challenge solutions
We can use int
as the type of loop index (and it’s preferred, in fact). If you’re lazy, you can use auto
which will deduce the type for you. In C++23, you can even add the Z
suffix to define a literal to be signed:
for (auto i{0Z}; i < static_cast<std::ptrdiff_t>(arr.size()); ++i)
or utilizing std::ssize()
introduced in C++20:
for (auto i{0Z}; i < std::ssize(arr); ++i)
Notice that since these index variables are now signed, using them triggers sign conversion warnings when they are implicitly converted to unsigned inside operator[]
. We can use a number of ways to avoid such warnings (and no, we’re not gonna introduce all of them here cuz they’re annoyingly ugly imo) but the preferred solution is actually surprisingly simple: we don’t index arrays at all. In fact, C++ provides several other methods for traversing arrays that do not use indices, and if we don’t have indices, we don’t need to worry about these signed/unsigned issues. The two most common methods are range-based for loops and iterators.
Range-based for loops
The range-based for statement has the following syntax:
for (element_declaration : array_object)
statement;
For example:
#include <iostream>
#include <vector>
int main() {
std::vector fibonacci { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 };
for (int num : fibonacci)
std::cout << num << ' ';
std::cout << '\n';
return 0;
}
We can even use auto
for type deduction:
for (auto num: fibonacci)
std::cout << num << ' ';
For classes with expensive copying e.g. strings, we may also want to avoid copying altogether in the range-based loops by using const references:
#include <iostream>
#include <string>
#include <vector>
int main() {
using namespace std::literals; // for s suffix as string literals
std::vector words {"peter"s, "likes"s, "frozen"s, "yogurt"s};
for (const auto& word : words)
std::cout << word << ' ';
std::cout << '\n';
return 0;
}
It’s important to notice that range-based loop doesn’t give you the index automatically, so you might want to keep your own index counter variable if needed (or just use the classic for loop).
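As an example of keeping your own counter alongside a range-based loop, the sketch below finds the position of the largest element. The function name indexOfLargest is made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Returns the position of the largest element (0 for an empty vector),
// tracking the index by hand because the loop only hands us the elements.
inline std::size_t indexOfLargest(const std::vector<int>& arr) {
    std::size_t best{0};
    std::size_t index{0};
    for (int value : arr) {
        if (value > arr[best])
            best = index;
        ++index;
    }
    return best;
}
```

Once the index itself is the thing you need, as it is here, the classic for loop is arguably just as readable.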
Last but not least, if you want to do range-based loop in reverse:
#include <iostream>
#include <ranges> // C++20
#include <string_view>
#include <vector>
int main() {
using namespace std::literals; // for sv suffix as string_view
std::vector words {"alex"sv, "bobby"sv, "chad"sv, "dave"sv};
for (const auto& word : std::views::reverse(words)) // from <ranges>
std::cout << word << ' ';
std::cout << '\n';
return 0;
}
Array indexing and length using enumerators
One of the bigger documentation problems with arrays is that their indices don’t normally provide any real information. For example:
#include <vector>
int main() {
std::vector testScores {78, 79, 71, 56, 55};
testScores[2] = 77; // whose score is it?
return 0;
}
We can use unscoped enumerators instead:
namespace Students {
enum Names {
Sam,
Allen,
Billy,
Wonka,
Zoe
};
}
int main() {
std::vector testScores {78, 79, 71, 56, 55};
testScores[Students::Allen] = 78;
return 0;
}
Because unscoped enumerators are implicitly convertible to std::size_t
, we don’t need to worry about using them as indices. Meanwhile, as they’re constexpr, we don’t need to worry about the signed/unsigned indexing problem either. Cool.
If we define a non-constexpr variable of the enumeration type and use it as an index, the constexpr guarantee no longer applies, and we may get a sign conversion warning (when the underlying type is signed). If we really need such a variable, we can explicitly specify an unsigned underlying type:
namespace Students {
enum Names : unsigned int {
Sam,
Allen,
Billy,
Wonka,
Zoe
};
}
To avoid the size of the array being shorter than the enumeration, we can add an extra enumerator in the end for array initialization:
namespace Students {
enum Names {
Sam,
Allen,
Billy,
Wonka,
Zoe,
num_students
};
}
int main() {
std::vector<int> testScores(Students::num_students); // properly initialized
...
}
or assert using this extra enumerator:
#include <cassert>
int main() {
std::vector testScores { 78, 79, 71, 56, 55 };
assert(std::size(testScores) == Students::num_students);
return 0;
}
std::vector
resize and capacity
You can resize a std::vector
at runtime:
#include <iostream>
#include <vector>
int main() {
std::vector v{0, 1, 2};
std::cout << "the length of v is " << v.size() << '\n';
v.resize(5);
std::cout << "the length of v is " << v.size() << '\n';
for (auto i : v)
std::cout << i << ' ';
std::cout << '\n';
return 0;
}
This prints:
the length of v is 3
the length of v is 5
0 1 2 0 0
Another important concept is the capacity of std::vector
. This is the number of elements the vector has allocated storage for, which can be larger than its length. When we resize a vector down to a smaller length, the memory does not get reallocated right away, so the capacity stays above the size. Indexing is valid only within the size, not the capacity. To ask the vector to release the unused capacity, we can make a (non-binding) request via the shrink_to_fit()
member function.
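The relationship between size and capacity can be sketched as follows. SizeAndCapacity and the two function names are made up for this illustration, and the exact capacity values depend on the standard library implementation, so the comments only state lower bounds.

```cpp
#include <cstddef>
#include <vector>

struct SizeAndCapacity {
    std::size_t size{};
    std::size_t capacity{};
};

// Grows a vector to 5 elements, then shrinks it to 2 and reports what is left.
inline SizeAndCapacity shrinkWithResize() {
    std::vector<int> v(5);   // size 5, capacity at least 5
    v.resize(2);             // size drops to 2; the allocation is kept
    return {v.size(), v.capacity()};
}

// Same as above, but also asks the vector to give back the unused storage.
inline SizeAndCapacity shrinkWithShrinkToFit() {
    std::vector<int> v(5);
    v.resize(2);
    v.shrink_to_fit();       // non-binding request: capacity may or may not drop
    return {v.size(), v.capacity()};
}
```

In the first function the capacity is still at least 5 even though only two elements are accessible; in the second it is typically (but not necessarily) reduced to 2.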
std::vector
and stack behavior
In programming, a stack is a container data type where the insertion and removal of elements occurs in a LIFO (last-in-first-out) manner. This is commonly implemented via two operations named push and pop.
In C++, stack-like operations were added (as member functions) to the existing standard library container classes that support efficient insertion and removal of elements at one end (std::vector
, std::deque
and std::list
). This allows any of these containers to be used as stacks in addition to their native capabilities.
For std::vector
, we have the following member functions to support stack-like behavior:
- push_back(): insert a new element on top of the stack
- pop_back(): remove the top element from the stack
- back(): get the top element of the stack
- emplace_back(): alternate form of push_back() that can be more efficient
Here emplace_back
is mostly the same as push_back
except when we’re creating a temporary variable to be pushed into a stack, emplace_back
skips the potentially expensive copying and just passes the bare arguments to the constructor:
class Foo {
private:
std::string m_a{};
int m_b{};
public:
Foo(std::string_view a, int b)
: m_a{a}, m_b{b} {
}
explicit Foo(int b)
: m_a {}, m_b {b} {
}
};
int main() {
std::vector<Foo> stack{};
Foo f{"a", 2};
stack.push_back(f); // preferred
stack.emplace_back(f); // works too
stack.push_back({"a", 2}); // create and then copy
stack.emplace_back("a", 2); // create, no copy
stack.push_back({ 2 }); // compile error: Foo(int) is explicit
stack.emplace_back(2); // ok
}
Unlike the subscript operator (operator[]
and at()
), here push_back
and pop_back
(and emplace_back
) will actively change the size of the array (but pop_back
is still lazy on capacity).
To avoid repeated reallocations while pushing elements, we can use the reserve()
member function, which reallocates a std::vector
without changing its current size.
int main() {
std::vector<int> stack{};
stack.reserve(6); // size: 0, capacity: 6
stack.push_back(2); // size: 1, capacity: 6
stack.push_back(1); // size: 2, capacity: 6
stack.push_back(4); // size: 3, capacity: 6
stack.push_back(7); // size: 4, capacity: 6
stack.push_back(8); // size: 5, capacity: 6
stack.push_back(9); // size: 6, capacity: 6
stack.pop_back(); // size: 5, capacity: 6
}
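To see what reserve() buys us, the sketch below counts how often the capacity changes while pushing elements, which serves as a simple proxy for the number of reallocations. The function name and the counting approach are my own for this illustration; the count without reserve() depends on the implementation's growth strategy.

```cpp
#include <cstddef>
#include <vector>

// Pushes `count` ints and returns how many times the capacity changed,
// which is a simple proxy for the number of reallocations.
inline std::size_t countReallocations(std::size_t count, bool reserveFirst) {
    std::vector<int> stack{};
    if (reserveFirst)
        stack.reserve(count);

    std::size_t reallocations{0};
    std::size_t lastCapacity{stack.capacity()};
    for (std::size_t i{0}; i < count; ++i) {
        stack.push_back(static_cast<int>(i));
        if (stack.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = stack.capacity();
        }
    }
    return reallocations;
}
```

With reserveFirst set to true the function returns 0, since the storage reserved up front is enough for every push_back().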
std::vector<bool>
vs std::bitset
std::vector<bool>
works similarly to std::bitset
with potentially better space efficiency. However, sizeof(std::vector<bool>)
is fairly large (around 40 bytes on some implementations), so the saved memory won’t mean much unless we’re storing more boolean values than that overhead. Meanwhile, the space optimization is implementation-dependent and thus not always trustworthy. Lastly, we should note that std::vector<bool>
is actually not a vector (not contiguous in memory) nor does it hold real boolean values, nor does it meet the C++ standard of containers. In short, we should maybe just avoid using std::vector<bool>
altogether.
Instead, three alternative containers are recommended:
- Use (constexpr) std::bitset when the number of bits needed is known at compile time
- Use std::vector<char> when a resizable container is needed
- Use a third-party dynamic bitset implementation, e.g. boost::dynamic_bitset, when bit operations are needed
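For the first alternative, a std::bitset sized by an enumerator might look like the sketch below. The Option namespace and its flag names are hypothetical, invented for this illustration.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical flag positions, named with an unscoped enum inside a namespace.
namespace Option {
    enum Flag : std::size_t { bold, italic, underline, num_flags };
}

// Builds a fixed-size set of flags whose length (3 bits) is known at compile time.
inline std::bitset<Option::num_flags> makeFlags() {
    std::bitset<Option::num_flags> flags{};  // all bits start at 0
    flags.set(Option::bold);
    flags.set(Option::underline);
    return flags;
}
```

The result can be queried with test() for a single bit or count() for the number of bits set.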
Fixed-size Arrays: std::array
and C-style Arrays
Introduction to std::array
There are two reasons why we still need std::array
when we’ve already got std::vector
:
- std::vector is slightly less performant than the fixed-size std::array
- std::vector only supports constexpr in very limited contexts
The second reason is the one that matters most. In short, whenever we need constexpr arrays, we should use std::array
over std::vector
.
In order to declare a std::array
:
#include <array>
int main() {
std::array<int, 5> a{};
}
The size of a std::array
must be a constant expression, e.g. an integer literal, a constexpr variable or an unscoped enumerator.
The initialization of a std::array
does not involve a constructor, because std::array is an aggregate. This means that if a std::array
is defined without an initializer, the elements will be default initialized (left uninitialized for fundamental types such as int).
std::array<int, 5> a{1,2,3,4,5}; // list initialization
std::array<int, 5> b = {1,2,3,4,5}; // copy-list initialization
std::array<int, 5> c; // uninitialized!
std::array<int, 5> d{}; // zero initialized; ok
std::vector<int> v(5); // vector can be value initialized
We can define a std::array
as const or constexpr (latter preferred all the time, otherwise ask yourself if you really need a std::array
rather than a std::vector
):
const std::array<int, 5> a{1,2,3,4,5};
constexpr std::array<int, 5> b{1,2,3,4,5};
Starting from C++17, we can use CTAD for type deduction (and are recommended to use it as much as possible):
constexpr std::array a {1,2,3,4,5}; // <int, 5> deduced
constexpr std::array<int> b{1,2,3,4,5}; // error: cannot partially omit template parameter!
The most common way to access the elements of a std::array
is through subscripts/indices (via operator[]
). There is also an at()
member function, but it only does bounds checking at runtime (not at compile time) and thus is less commonly recommended.
std::array
length and indexing
There are three common ways to get the length of a std::array
:
- using the size() member function
- using the std::size() function for unsigned length (since C++17)
- using the std::ssize() function for signed length (since C++20)
All of the above return the length as a constant expression, even when the std::array
itself is not constexpr.
In addition to operator[]
(no bounds checking) and at()
(runtime bound checking), std::array
also supports std::get()
non-member function that does compile time bound checking:
#include <array>
#include <iostream>
int main() {
constexpr std::array prime{2, 3, 5, 7, 11};
std::cout << std::get<3>(prime); // prints 7
std::cout << std::get<9>(prime); // compile error
return 0;
}
Passing and returning std::array
CTAD doesn’t work for function parameters, so when passing std::array
to a function, we need to explicitly specify the element type and array length. If we really want the function to be able to accept general array type, we can create a function template that parameterizes both the element type and array length on the function level.
Before indexing the array inside the function, we need to validate the indices of interest against the length at compile time. There are two ways:
- use std::get<i>(array) to raise a compile error when the index is out of range
- use static_assert(N > i) to explicitly validate the precondition
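Both approaches can be sketched with a function template that parameterizes the element type and the length. The names thirdElement and thirdElementViaGet are made up for this illustration.

```cpp
#include <array>
#include <cstddef>

// Works for any element type T and any length N; both are deduced from the
// argument. The static_assert rejects arrays that are too short at compile time.
template <typename T, std::size_t N>
constexpr T thirdElement(const std::array<T, N>& arr) {
    static_assert(N > 2, "thirdElement() needs an array with at least 3 elements");
    return arr[2];
}

// Same precondition enforced by std::get, which checks the index at compile time.
template <typename T, std::size_t N>
constexpr T thirdElementViaGet(const std::array<T, N>& arr) {
    return std::get<2>(arr);
}

constexpr std::array prime{2, 3, 5, 7, 11};
static_assert(thirdElement(prime) == 5);
static_assert(thirdElementViaGet(prime) == 5);
```

Passing a std::array with fewer than three elements to either function fails to compile rather than misbehaving at runtime.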
In terms of returning the std::array
, there are three options:
- returning by value, if:
- the array isn’t huge
- the element type is cheap to copy or move
- the code isn’t being used in a performance-sensitive context
- returning via an out parameter, if the above conditions aren’t satisfied
- returning a std::vector instead, since it is move-capable and can be returned by value without expensive copies
std::array
of class types, and brace elision
We can define a std::array
of class types and assign elements as follows:
#include <array>
#include <iostream>
struct House {
int number {};
int stories {};
int roomsPerStory {};
};
int main() {
std::array<House, 3> houses {};
houses[0] = {12, 1, 7};
houses[1] = {11, 2, 5};
houses[2] = {10, 2, 3};
for (const auto& house : houses) {
std::cout << "House number " << house.number
<< " has " << (house.stories * house.roomsPerStory)
<< " rooms.\n";
}
return 0;
}
Alternatively, we can also explicitly initialize a std::array
of structs:
constexpr std::array houses {
House{12, 1, 7},
House{11, 2, 5},
House{10, 2, 3}
};
However, do remember that the following way of initialization does not work:
constexpr std::array<House, 3> houses {
{12, 1, 7},
{11, 2, 5},
{10, 2, 3}
};
Instead, a compile error will be thrown because the std::array
implementation is nothing but a struct with a C-style array as its first member, thus the second and third pair of braces are not recognized. To remedy this problem, we can add an additional set of braces:
constexpr std::array<House, 3> houses {
{
{12, 1, 7},
{11, 2, 5},
{10, 2, 3}
}
};
In fact, you can always add this extra set of braces in the initialization of std::array
, and that is totally legal in C++. The truth is that aggregates in C++ support a concept called brace elision, which defines rules for when some of these nested braces may be omitted.
Arrays of references via std::reference_wrapper
Because references are not objects, you cannot make an array of references. The elements of an array must be assignable, and references simply can’t be reseated.
int x{1};
int y{2};
[[maybe_unused]] std::array<int&, 2> refarr{ x, y }; // compile error
int& rx{x};
int& ry{y};
[[maybe_unused]] std::array valarr{rx, ry}; // ok, but this is just <int, 2>
If we do want an array of references, there is a workaround by using std::reference_wrapper
which lives in the <functional>
header and takes a type template parameter T
and then behaves like a modifiable lvalue reference to T
. Several things to pay attention to:
- operator= reseats a std::reference_wrapper (changes which object is being referenced)
- std::reference_wrapper<T> will implicitly convert to T&
- the get() member function can be used to get a T&, which we can then use to update the value of the object being referenced
Here’s a simple example:
#include <array>
#include <functional>
#include <iostream>
int main() {
int x{1};
int y{2};
int z{3};
std::array<std::reference_wrapper<int>, 3> arr{x, y, z};
arr[1].get() = 5; // you can modify the object being referenced
std::cout << arr[1] << y << '\n'; // prints 55 because y has been modified
return 0;
}
Prior to C++17, CTAD didn’t exist, so the template argument had to be specified explicitly. To make things easier, C++11 provided the std::ref()
and std::cref()
functions, which serve as shorthands for creating a std::reference_wrapper
:
int x{1};
std::reference_wrapper<int> rx{x}; // C++11 explicit
auto rx2{std::reference_wrapper<int>{x}}; // C++11 explicit
auto ref{std::ref(x)}; // C++11 shorthand for std::reference_wrapper<int>
auto cref{std::cref(x)}; // C++11 shorthand for std::reference_wrapper<const int>
std::reference_wrapper rx3{x}; // C++17 with CTAD
auto rx4{std::reference_wrapper{x}}; // C++17 with CTAD
Introduction to C-style arrays
Being part of the core language, C-style arrays have their own special declaration syntax. In a C-style array declaration, we use square brackets []
to tell the compiler that a declared object is a C-style array. Inside the brackets, we can optionally provide the length of the array, which is an integer value of type std::size_t
that tells the compiler how many elements are in the array.
int main() {
int testScores[30] {};
return 0;
}
The length of a C-style array must be at least 1, or the compiler will throw an error. The length must also be a constant expression. Also, in contrast to e.g. std::vector
, the indices here don’t need to be unsigned.
There are several different ways to initialize a C-style array:
int fibonacci[5] = {1, 1, 2, 3, 5}; // copy-list initialization
int prime[5] {2, 3, 5, 7, 11}; // list initialized (preferred)
int prime[5] {2, 3, 5, 7, 11, 13}; // compile error: too many initializers
auto prime[5] {2, 3, 5, 7, 11}; // compile error: type deduction doesn't work for C-style arrays
int prime[] {2, 3, 5, 7, 11}; // length deduced (preferred)
int prime[] {}; // compile error: zero-length array
int arr[5]; // default initialized with elements uninitialized
int arr[5] {}; // value initialized with elements zero initialized (preferred)
As for getting the size of a C-style array, we can use the sizeof()
operator or std::size()
and std::ssize()
non-member functions :
const int prime[] { 2, 3, 5, 7, 11 };
sizeof(prime); // returns 20 (assuming 4 bytes each int)
std::size(prime); // C++17, returns 5 as an unsigned integral value (std::size_t)
std::ssize(prime); // C++20, returns 5 as a signed integral value (usually std::ptrdiff_t)
For C++14 or older versions, we can also use the following custom function to get the length of an array:
#include <iostream>
template <typename T, std::size_t N>
constexpr std::size_t length(const T(&)[N]) noexcept {
return N;
}
int main() {
int prime[5] {2, 3, 5, 7, 11};
std::cout << "length: " << length(prime) << '\n';
return 0;
}
C-style array decay
In most cases, when a C-style array is used in an expression, the array will be implicitly converted into a pointer to the element type, initialized with the address of the first element (with index 0). Colloquially, this is called array decay.
#include <typeinfo> // for typeid (std::boolalpha comes with <iostream>)
#include <iostream>
int main() {
int arr[5] {1, 2, 3, 4, 5};
auto ptr {arr}; // ptr is of type int*
std::cout << std::boolalpha << (typeid(ptr) == typeid(int*)) << '\n'; // prints true
std::cout << std::boolalpha << (&arr[0] == ptr) << '\n'; // prints true
return 0;
}
There are only a few cases in C++ where a C-style array doesn’t decay:
- when used as an argument to sizeof() or typeid()
- when taking the address of the array using operator&
- when passed as a member of a class type
- when passed by reference
A decayed array pointer does not know the length of itself and thus the term “decay”. This decay behavior actually solves the problem of passing a huge array as argument – when passed as decayed array, it’s actually the pointer that holds the address of the first element of the array that gets passed. Therefore, no copy is made in the process.
void printElementZero(const int* arr) {
std::cout << arr[0];
} // same as below
void printElementZero(const int arr[]) {
std::cout << arr[0];
}
The problem with array decay is, as the length information is lost during the process, the following function won’t work correctly:
void printArraySize(const int arr[]) {
std::cout << sizeof(arr) << '\n'; // prints the size of a pointer (e.g. 4 or 8), not of the array
}
Fortunately, C++17’s better replacement std::size()
(and C++20’s std::ssize()
) won’t compile in this case:
void printArraySize(const int arr[]) {
std::cout << std::ssize(arr) << '\n'; // compile error: std::ssize() doesn't work on pointers
}
In general, we can just avoid using good ol’ C-style arrays nowadays.
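If we do have to work with a C-style array, passing it by reference (one of the non-decaying cases listed earlier) keeps the length information. A minimal sketch, with the function name sum made up for this illustration:

```cpp
#include <cstddef>

// The parameter is a reference to an array of N ints, so the argument does
// not decay and N is deduced from the array's type.
template <std::size_t N>
constexpr int sum(const int (&arr)[N]) {
    // sizeof sees the whole array here, not a pointer.
    static_assert(sizeof(arr) == N * sizeof(int), "arr should not have decayed");

    int total{0};
    for (int value : arr)   // range-based for works because arr is still an array
        total += value;
    return total;
}
```

Calling sum() on int prime[] {2, 3, 5, 7, 11} instantiates the template with N = 5, whereas passing an int* would simply fail to compile.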
Pointer arithmetic and subscripting
Pointer arithmetic is a feature that allows us to apply certain integer arithmetic operators (addition, subtraction, increment, or decrement) to a pointer to produce a new memory address.
We can subscript a pointer holding the address of an array’s first element to get other elements of the array:
#include <iostream>
int main() {
const int arr[] {1,2,3,4,5};
const int* ptr{arr};
std::cout << ptr[2] << '\n'; // prints 3
std::cout << *(ptr + 2) << '\n'; // prints 3
return 0;
}
Following this trick we can traverse through an array using pointers:
#include <iostream>
#include <iterator> // for std::size
int main() {
constexpr int arr[] = {1,2,3,4,5};
const int* begin{arr};
const int* end{arr + std::size(arr)};
for (const int* i{begin}; i != end; ++i) { // terminate when i == end
std::cout << *i << ' ';
}
std::cout << '\n';
return 0;
}
Not surprisingly, the range-based for loops over C-style arrays are exactly implemented using pointer arithmetic:
for (auto i : arr) {
std::cout << i << ' ';
}
std::cout << '\n';
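The same begin/end pointer pair can drive a simple linear search. The sketch below, with the made-up name findValue, walks the half-open range [begin, end) using only pointer arithmetic.

```cpp
// Searches the half-open range [begin, end) and returns a pointer to the
// first element equal to `value`, or `end` if there is none.
inline const int* findValue(const int* begin, const int* end, int value) {
    for (const int* p{begin}; p != end; ++p) {
        if (*p == value)
            return p;
    }
    return end;
}
```

Subtracting the array's first-element pointer from the result gives the index of the match, and a result equal to end means the value was not found.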
C-style strings
Although C-style strings have fallen out of favor in modern C++ for being hard to use and dangerous compared with std::string
and std::string_view
, we’re going through the basics of them here. To define a classic C-style string variable, we simply declare a C-style array variable of char
(or const char
/ constexpr char
) type:
char str[8]{};
const char str[]{"this is a string"};
constexpr char str[]{"hello world"};
Remember that there is an extra character for the implicit null terminator. For this particular reason, it’s highly recommended to omit the length upon declaration and let the compiler calculate the length for you.
To print a C-style string, we can simply std::cout
it because the output streams (e.g. std::cout
) make some assumptions about your intent (printing the address for non-char pointers, and the whole string for char pointers):
#include <iostream>
void print(char ptr[]) {
std::cout << ptr << '\n';
}
int main() {
char str[] {"this is fucked up"};
print(str);
return 0;
}
Since the output streams always print the whole string when the operand is a char*
, weird things like the following can happen:
#include <iostream>
int main() {
char c{'Q'}; // just regular char
std::cout << &c << '\n'; // trying to print the address of a char
std::cout << static_cast<const void*>(&c) << '\n'; // this would work as expected
return 0;
}
The output is
Q
0x16d64b08b
but the first line can be literally anything, like Q╠╠╠╠╜╡4;¿■A
– that’s what undefined behavior means.
To read a C-style string:
#include <iostream>
#include <iterator>
int main() {
char str[255]{}; // an array large enough to hold 254 characters + null terminator
std::cout << "Enter your string: ";
std::cin.getline(str, std::size(str));
std::cout << "You entered: " << str << '\n';
return 0;
}
To modify a C-style string, you can only assign values to the elements one by one:
char str[]{"what?"};
str[2] = 'o'; // a char literal uses single quotes
To get the length of a C-style string, the previously mentioned functions don’t work:
char str[255]{"string"}; // 254 available characters + null terminator; using 6+1 only
std::size(str); // returns 255
char* ptr{str}; // decayed C-style string
std::size(ptr); // compile error
Luckily, we can use the strlen()
function in the <cstring>
header:
#include <cstring>
#include <iostream>
int main() {
char str[255] {"string"};
std::cout << std::strlen(str) << '\n'; // prints 6
char* ptr {str};
std::cout << std::strlen(ptr) << '\n'; // prints 6
return 0;
}
Some other C-style string manipulating functions:
- strcpy, strncpy, strcpy_s: overwrite one C-style string with another
- strcat, strncat: append one C-style string to the end of another
- strcmp, strncmp: compare two C-style strings (return 0 if equal)
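As a rough sketch of how these functions are typically combined (the helper names copy_truncated and same_text are made up for illustration): std::strncpy does not write a null terminator when the source is longer than the limit, so the sketch adds it by hand.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy src into dest (capacity dest_size, must be > 0),
// truncating if needed and always leaving dest null-terminated.
void copy_truncated(char* dest, std::size_t dest_size, const char* src) {
    std::strncpy(dest, src, dest_size - 1); // copies at most dest_size - 1 chars
    dest[dest_size - 1] = '\0';             // strncpy does not terminate on truncation
}

// Hypothetical helper: true when both strings hold the same characters.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0; // strcmp returns 0 for equal strings
}
```

With a char buf[6], calling copy_truncated(buf, sizeof(buf), "hello world") should leave "hello" in buf, and same_text(buf, "hello") then returns true.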
C-style string symbolic constants
Although seemingly producing the same strings, C++ deals with the memory allocation differently in the following two ways to define a string symbolic constant:
const char name[] {"Allen"}; // one copy in global memory, one for `name`
const char* name{"Allen"}; // only one copy and a pointer; more efficient
Type deduction for a C-style string is fairly straightforward:
auto s1{"Allen"}; // const char*
auto* s2{"Allen"}; // const char*
auto& s3{"Allen"}; // const char(&)[6]: five letters plus the null terminator
Multidimensional C-style arrays
We can define a two-dimensional C-style array as:
int a[2][3];
C++ uses row-major order (some languages, e.g. Fortran, use column-major order), meaning that the elements of an array are placed sequentially in memory row by row, ordered from left to right, top to bottom. For example, the elements of the above array are placed in memory in the following order:
[0][0] [0][1] [0][2] [1][0] [1][1] [1][2]
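The row-major rule boils down to a small formula: in an array with C columns, element [r][c] is the (r * C + c)-th element counting from zero. A minimal sketch that applies the formula to a flat one-dimensional array (flat_index and at are hypothetical helper names, not standard functions):

```cpp
#include <cstddef>

// Hypothetical helper: position of element [row][col] in the flat,
// row-major layout of a grid with `cols` columns.
constexpr std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Element (row, col) of a grid stored row by row in a flat array.
int at(const int* flat, std::size_t cols, std::size_t row, std::size_t col) {
    return flat[flat_index(row, col, cols)];
}
```

For a 2x3 grid stored as int flat[6]{1,2,3,4,5,6}, at(flat, 3, 1, 0) picks the first element of the second row, which is 4.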
Initializing a two-dimensional array is as easy as
int a[2][3] {
{1,2,3},
{4,5,6}
};
The missing initializers will be value-initialized:
int a[2][3] {
{1,2},
{4},
}; // result in {{1,2,0},{4,0,0}}
You can also omit (only the leftmost) length in the declaration:
int a[][3] {
{1,2,3},
{4}
};
Lastly, just like regular one-dimensional arrays, you can value-initialize the whole array to zeros:
int a[2][3]{};
Multidimensional std::array
Is there a standard library class for multidimensional arrays? Sadly, the answer is no. However, we can still define one as follows:
std::array<std::array<int, 3>, 2> a {{
{1,2,3},
{4,5,6}
}}; // notice the double braces!
This syntax is verbose and hard to read and requires double braces (for reasons mentioned in previous sections) and swaps the row- and column-number awkwardly. To make it easier to use, we can create an alias template like the following:
template <typename T, std::size_t Row, std::size_t Col>
using Array2d = std::array<std::array<T, Col>, Row>;
Array2d<int, 2, 3> a {{
{1,2,3},
{4,5,6}
}};
In C++23 we have a new std::mdspan
which provides a simple way to reshape a one-dimensional array to a multidimensional one:
std::array<int, 6> a{1,2,3,4,5,6};
std::mdspan mdView { a.data(), 2, 3 }; // first arg is a pointer to the array data
std::size_t row {mdView.extents().extent(0)};
std::size_t col {mdView.extents().extent(1)};
// print in 1d
for (std::size_t i=0; i < mdView.size(); ++i)
std::cout << mdView.data_handle()[i] << ' ';
std::cout << '\n';
// print in 2d
for (std::size_t r=0; r < row; ++r) {
for (std::size_t c=0; c < col; ++c)
std::cout << mdView[r, c] << ' '; // operator[] accepts multiple indices since C++23
std::cout << '\n';
}
std::cout << '\n';
There is also a proposal (originally aimed at C++26) for std::mdarray, which combines std::array with std::mdspan into a multidimensional array that owns its data. Hooray!
Iterators and Algorithms
Sorting an array using selection sort
Selection sort performs the following steps to sort an array from smallest to largest:
- Starting at array index 0, search the entire array to find the smallest value
- Swap the smallest value found in the array with the value at index 0
- Repeat steps 1 & 2 starting from the next index
Here is how this algorithm is implemented in C++:
#include <iostream>
#include <iterator>
#include <utility>
int main() {
int array[] {3,5,2,1,4};
constexpr int length{static_cast<int>(std::size(array))};
for (int i{}; i < length - 1; ++i) {
int i_min {i};
for (int j{i + 1}; j < length; ++j) {
if (array[j] < array[i_min]) i_min = j;
}
std::swap(array[i], array[i_min]);
}
for (int i{}; i < length; ++i) std::cout << array[i] << ' ';
std::cout << '\n';
return 0;
}
Provided in the <algorithm> header we have std::sort that does the tedious work for us:
#include <algorithm>
#include <iostream>
#include <iterator>
int main() {
int array[] {3,5,2,1,4};
std::sort(std::begin(array), std::end(array));
for (auto i : array) std::cout << i << ' ';
std::cout << '\n';
return 0;
}
Introduction to iterators
An iterator is an object designed to traverse through a container (e.g. the values in an array, or the characters in a string), providing access to each element along the way. A container may provide different kinds of iterators. For example, an array container might offer a forwards iterator that walks through the array in forward order, and a reverse iterator that walks through the array in reverse order.
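As a small sketch of the forward/reverse distinction, assuming the container is a std::array: its begin()/end() member functions hand out iterators that walk forwards, while rbegin()/rend() hand out iterators that walk backwards. The helper names forward_digits and reverse_digits are made up for illustration.

```cpp
#include <array>
#include <string>

// Hypothetical helper: concatenate the values of `data` in forward order.
std::string forward_digits(const std::array<int, 4>& data) {
    std::string out{};
    for (auto it{data.begin()}; it != data.end(); ++it)
        out += std::to_string(*it);
    return out;
}

// Same walk, but using the reverse iterators rbegin()/rend().
std::string reverse_digits(const std::array<int, 4>& data) {
    std::string out{};
    for (auto it{data.rbegin()}; it != data.rend(); ++it)
        out += std::to_string(*it);
    return out;
}
```

Note that the loop body is identical in both functions; only the pair of iterators changes.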
We can use a pointer as an iterator:
#include <array>
#include <iostream>
int main() {
std::array data{0,1,2,3,4,5,6};
auto begin{&data[0]};
auto end{begin + std::size(data)}; // this is one after the last element
for (auto ptr{begin}; ptr != end; ++ptr)
std::cout << *ptr << ' ';
std::cout << '\n';
return 0;
}
We can also use standard library iterators:
#include <array>
#include <iostream>
int main() {
std::array data{0,1,2,3,4,5,6};
auto begin{data.begin()};
auto end{data.end()}; // also one after the last element
for (auto ptr{begin}; ptr != end; ++ptr)
std::cout << *ptr << ' ';
std::cout << '\n';
return 0;
}
The <iterator>
header also provides std::begin
and std::end
similarly:
#include <array> // this includes <iterator>
#include <iostream>
int main() {
std::array data{0,1,2,3,4,5,6};
auto begin{std::begin(data)};
auto end{std::end(data)};
for (auto ptr{begin}; ptr != end; ++ptr)
std::cout << *ptr << ' ';
std::cout << '\n';
return 0;
}
Notice we’re not using operator<
in the for loop above, because some iterator types are not relationally comparable and operator!=
works with those types.
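To see why operator!= is the safer habit, here is a sketch using std::list (a container not otherwise used in these notes). Its iterators can be incremented and compared for equality, but they do not support operator<, so the loop below would not compile with it < values.end(). The function name sum_list is made up for illustration.

```cpp
#include <list>

// std::list iterators are bidirectional: they support ++, -- and !=,
// but not <, so `it != end` is the loop condition that compiles.
int sum_list(const std::list<int>& values) {
    int sum{};
    for (auto it{values.begin()}; it != values.end(); ++it)
        sum += *it;
    return sum;
}
```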
All types that have both begin()
and end()
member functions, or that can be used with std::begin()
and std::end()
, are usable in range-based for loops:
#include <array>
#include <iostream>
int main() {
std::array data{1,2,3};
for (auto i : data)
std::cout << i << ' ';
std::cout << '\n';
return 0;
}
There is a concept called iterator invalidation which basically refers to an iterator becoming a dangling pointer. This happens when you modify the container while using the iterator. For example:
#include <vector>
int main() {
std::vector v {0,1,2,3};
for (auto i : v) {
if (i % 2 == 0)
v.push_back(i + 1); // modifying the container here
}
return 0;
}
The above program has undefined behavior: push_back may reallocate the vector’s storage, which invalidates the iterators that the range-based for loop uses behind the scenes. Another example is
#include <iostream>
#include <vector>
int main() {
std::vector v{0,1,2,3,4,5};
auto i {v.begin()};
++i;
std::cout << *i << '\n'; // ok: prints 1
v.erase(i); // modifying the container, iterator invalidated
++i;
std::cout << *i << '\n'; // undefined behavior!
return 0;
}
To fix the above program, we can use the fact that vector’s erase() member function returns an iterator to the element after the erased one (or end() when the last element is erased):
#include <iostream>
#include <vector>
int main() {
std::vector v{0,1,2,3,4,5};
auto i{v.begin()};
++i;
std::cout << *i << '\n'; // ok: prints 1
i = v.erase(i); // i is overridden to the next position
std::cout << *i << '\n'; // ok: prints 2
return 0;
}
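The same idea extends to erasing several elements in one pass. A minimal sketch (erase_evens is a hypothetical helper name): the iterator is either replaced by the return value of erase() or advanced by hand, never both.

```cpp
#include <vector>

// Hypothetical helper: remove every even value from v in place.
void erase_evens(std::vector<int>& v) {
    for (auto it{v.begin()}; it != v.end(); /* advanced in the body */) {
        if (*it % 2 == 0)
            it = v.erase(it); // erase() hands back a valid iterator to the next element
        else
            ++it;             // only advance when nothing was erased
    }
}
```

Applied to a vector holding 0,1,2,3,4,5 this should leave 1,3,5.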
Introduction to standard library algorithms
The functionality provided in the algorithms library generally falls into one of three categories:
- Inspectors: used to view (but not modify) data in a container, e.g. searching and counting
- Mutators: used to modify data in a container, e.g. sorting and shuffling
- Facilitators: used to generate a result based on the values of the data members, e.g. multiplying values or determining what order pairs of elements should be sorted in
Using std::find
to find an element by value:
#include <algorithm>
#include <array>
#include <iostream>
int main() {
std::array data{1,2,3,4,5};
std::cout << "Enter a value to search for and replace with: ";
int search{};
int replace{};
std::cin >> search >> replace;
auto found{std::find(data.begin(), data.end(), search)};
if (found == data.end())
std::cout << "Could not find " << search << '\n';
else
*found = replace; // yes it's that easy
for (int i : data)
std::cout << i << ' ';
std::cout << '\n';
return 0;
}
Using std::find_if
to find an element that matches some condition:
#include <algorithm>
#include <array>
#include <iostream>
#include <string_view>
bool contains_nut(std::string_view str) {
// std::string_view::find returns std::string_view::npos if not found
return (str.find("nut") != std::string_view::npos);
}
int main() {
std::array<std::string_view, 4> arr{"apple", "banana", "walnut", "lemon"};
auto found{std::find_if(arr.begin(), arr.end(), contains_nut)};
if (found == arr.end())
std::cout << "no nuts\n";
else
std::cout << "found " << *found << '\n';
return 0;
}
Using std::count
and std::count_if
to count how many occurrences there are in an array:
#include <algorithm>
#include <array>
#include <iostream>
#include <string_view>
bool contains_nut(std::string_view str) {
// std::string_view::find returns std::string_view::npos if not found
return (str.find("nut") != std::string_view::npos);
}
int main() {
std::array<std::string_view, 4> arr{"apple", "banana", "walnut", "lemon"};
auto nuts {std::count_if(arr.begin(), arr.end(), contains_nut)};
std::cout << "there are " << nuts << " nut(s)\n";
}
Using std::sort
to do (custom) sort:
#include <algorithm>
#include <array>
#include <iostream>
bool greater(int a, int b) {
return (a > b); // true means first!
}
int main() {
std::array arr{1,2,5,3,2};
std::sort(arr.begin(), arr.end(), greater); // this gives DESCENDING order in the result!
for (auto i : arr)
std::cout << i << ' ';
std::cout << '\n';
return 0;
}
Notice we can replace our greater() with the std::greater provided by the standard library in the <functional> header:
std::sort(arr.begin(), arr.end(), std::greater{}); // std::greater is a type, so {} is needed to create an object of it
Using std::for_each
to do something to all elements of a container:
#include <algorithm>
#include <array>
#include <iostream>
void double_number(int& i) {
i *= 2;
}
int main() {
std::array arr{1,2,4,2,1};
std::for_each(arr.begin(), arr.end(), double_number);
for (auto i : arr)
std::cout << i << ' ';
std::cout << '\n';
return 0;
}
Having to explicitly pass arr.begin() and arr.end() to the above algorithms is a bit annoying, but luckily C++20 added ranges, which allow us to simply pass arr. This makes our code even shorter and more readable.
Timing your code
C++11 comes with some functionality in the <chrono>
library to do some simple timing. We can encapsulate the timing inside a class to be used easily:
#include <chrono> // for std::chrono
class Timer {
private:
using Clock = std::chrono::steady_clock;
using Second = std::chrono::duration<double, std::ratio<1>>;
std::chrono::time_point<Clock> m_beg {Clock::now()};
public:
void reset() {
m_beg = Clock::now();
}
double elapsed() const {
return std::chrono::duration_cast<Second>(Clock::now() - m_beg).count();
}
};
Then we can easily use it as follows:
#include <iostream>
int main() {
Timer t;
std::cout << "Time has elapsed " << t.elapsed() << " seconds\n";
return 0;
}
There are three things that may impact timing:
- Make sure you’re using a release build target instead of a debug build target, which typically turns optimization off
- Make sure other system activities are not impacting your program’s performance
- If the program uses a random number generator, the particular random numbers generated may impact timing, which a lot of people overlook
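One common way to soften the second point is to time the same work several times and keep the fastest run. The sketch below is self-contained rather than built on the Timer class above, and time_seconds and best_of are hypothetical helper names.

```cpp
#include <chrono>

// Hypothetical helper: run `work` once and return the elapsed time in seconds.
template <typename Work>
double time_seconds(Work work) {
    using Clock = std::chrono::steady_clock;
    const auto start{Clock::now()};
    work();
    const std::chrono::duration<double> elapsed{Clock::now() - start};
    return elapsed.count();
}

// Hypothetical helper: time `work` `runs` times (at least once) and keep the
// fastest run, which reduces the influence of other system activity.
template <typename Work>
double best_of(int runs, Work work) {
    double best{time_seconds(work)};
    for (int i{1}; i < runs; ++i) {
        const double t{time_seconds(work)};
        if (t < best) best = t;
    }
    return best;
}
```

A call such as best_of(3, []{ /* code to measure */ }) returns the shortest of three measurements in seconds.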
Dynamic Allocation
Dynamic memory allocation with new and delete
C++ supports three basic types of memory allocation, of which you’ve already seen two:
- Static memory allocation
- Automatic memory allocation
- Dynamic memory allocation
Both static and automatic allocation have two things in common:
- The size of the variable/array must be known at compile time
- Memory allocation and deallocation happens automatically (when the variable is instantiated / destroyed)
While we can use static/automatic allocation with a size fixed at compile time that is a guess at the maximum the variable will ever need, there are several drawbacks:
- Wasted memory
- Hard to tell which parts of the array are actually used and which are wasted
- Stack overflow: automatic variables are stored on the stack, which is generally quite small (e.g. 1MB by default for Visual Studio), which means you can overflow it with int array[1000000]
- It can lead to artificial limitations and/or array overflows when the user tries to read in more records than were allocated
Fortunately, we can address these problems by using dynamic memory allocation, which requests memory when needed from a much larger pool of memory called the heap.
We can allocate a single variable dynamically using the new
operator:
new int; // returns a pointer
int* ptr { new int }; // dynamically allocate an integer and store its address in a pointer
int* ptr { new int(5) }; // direct initialization
int* ptr { new int{5} }; // uniform initialization
Note that accessing heap-allocated objects is generally slower than accessing stack-allocated objects because the compiler knows the address of stack-allocated objects and can go directly to the address. For heap-allocated ones, there are two steps: to get the address of the object (from the pointer) and to get the value at that address.
To delete a single variable from memory:
delete ptr; // free the memory used by the object that ptr refers to
ptr = nullptr; // set ptr back to a null pointer
When operator new fails, a std::bad_alloc exception is thrown, which terminates the program if it’s not handled. We can use the following form instead:
int* ptr { new(std::nothrow) int }; // don't throw error if new fails; instead, assign ptr to nullptr
which is kinda shady and thus not recommended. Alternatively we can explicitly “handle” the issue (we’ll cover actual exception handling later):
int* ptr { new(std::nothrow) int{} };
if (!ptr) { // if ptr is nullptr
std::cerr << "Could not allocate memory\n";
}
To free the memory referred to by a pointer (deleting a null pointer is safe and does nothing), we can simply
delete ptr;
Dynamically allocated memory stays allocated until it is explicitly deallocated or until the program ends, assuming your operating system does regular cleanup. This means we can sometimes accidentally cause a memory leak by writing functions like this:
void do_something_stupid() {
int* ptr { new int };
}
Every call to this function allocates a new int, but the only pointer to it, ptr, is destroyed when it goes out of scope at the end of the function. The allocated memory is never freed, so the amount of unreachable memory keeps growing until the program ends.
Another way to cause a memory leak is to re-assign a pointer that holds the only address of some dynamically allocated memory. To fix both problems, delete the pointer before it goes out of scope or gets re-assigned.
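A minimal sketch of the re-assignment case (replace_value is a hypothetical helper name): the old object is freed before the pointer is pointed at a new one.

```cpp
// Leaky version, shown only as a comment:
//     int* ptr{ new int{1} };
//     ptr = new int{2};   // the int holding 1 is now unreachable

// Hypothetical helper: point `ptr` at a fresh int without leaking the old one.
void replace_value(int*& ptr, int value) {
    delete ptr;             // free the old object first (deleting nullptr is a no-op)
    ptr = new int{value};   // then take ownership of the new one
}
```

The caller is still responsible for the final delete once the pointer is no longer needed.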
Dynamically allocating arrays
We can dynamically allocate a C-style array too (a std::array can also be dynamically allocated, but you’re usually better off with a std::vector that is not itself dynamically allocated). To allocate an array, we can
#include <cstddef>
#include <iostream>
int main() {
std::cout << "Enter a positive integer: ";
std::size_t len{};
std::cin >> len;
int* arr{ new int[len]{} }; // notice that length here doesn't need to be const!
std::cout << "I just allocated an array of integers of length " << len << '\n';
arr[0] = 5;
delete[] arr; // delete the whole array
return 0;
}
In the above example, since we’re dynamically allocating a C-style array, there are several differences from our previously covered arrays:
- The length/size doesn’t need to be a compile-time constant any more
- The size of the array can be very large, since it’s now on heap rather than stack
- There is a small performance regression
Notice we’re writing int
twice in the above example. In practice, we can also avoid that by
auto* arr { new int[len]{} };
When it comes to resizing the dynamic array – it’s typically recommended to just go for std::vector
as C++ doesn’t have a built-in way to resize an array that’s been allocated.
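For the curious, resizing by hand means allocating a new array, copying the elements over, and deleting the old one. A rough sketch under those assumptions (resize_array is a hypothetical helper, not a standard function):

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: returns a new zero-initialized array of new_len ints
// holding the first min(old_len, new_len) values of old_arr, and frees old_arr.
int* resize_array(int* old_arr, std::size_t old_len, std::size_t new_len) {
    int* new_arr{ new int[new_len]{} };
    std::copy_n(old_arr, std::min(old_len, new_len), new_arr);
    delete[] old_arr;
    return new_arr;
}
```

The caller must overwrite its pointer with the result, e.g. arr = resize_array(arr, 2, 4), because the old pointer is dangling after the call. std::vector does this bookkeeping for you.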
Destructors
A destructor is another special kind of class member function that is executed when an object of that class is destroyed. Whereas constructors are designed to initialize a class, destructors are designed to help clean up. When your class is holding any resources (e.g. dynamic memory or a file or database handle), or if you need to do any kind of maintenance before the object is destroyed, the destructor is the perfect place to do so.
Like constructors, destructors have specific naming rules:
- The destructor must have the same name as the class, preceded by a tilde
~
- The destructor can not take arguments
- The destructor has no return type
Below is a simple example:
#include <iostream>
#include <cassert>
#include <cstddef>
class IntArray {
private:
int* m_array{};
int m_len{};
public:
IntArray(int len) {
assert(len > 0);
m_array = new int[static_cast<std::size_t>(len)]{};
m_len = len;
}
~IntArray() {
delete[] m_array;
}
// other public member functions
};
The above example adopts a concept called RAII (Resource Acquisition Is Initialization), which states that resource (e.g. memory, file and database handles) acquisition should happen in constructors and resource release in destructors. This helps prevent resource leaks.
Lastly, take special note of std::exit(): when you use it to terminate the program, the destructors of local variables will not be called.
Pointers to pointers and dynamic multidimensional arrays
Since pointers are also objects, it’s natural to think that we can define a pointer that points to another pointer (though a bit tongue-twistery):
int val { 5 }; // just an int
int* ptr { &val }; // ptr is pointing at the address of val
int** ptrptr { &ptr}; // ptrptr is pointing at the address of ptr
std::cout << **ptrptr; // prints 5
With pointers to pointers, we can define dynamic multidimensional arrays:
int rows { 5 };
int cols { 10 };
int** arr { new int*[rows] };
for (int i{}; i < rows; ++i) // one allocation per row
arr[i] = new int[cols];
Note that we can also make the array non-rectangular e.g. triangular, since we’re just iteratively allocating row arrays.
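Deallocation has to mirror the allocation: every row is freed first, then the array of row pointers. A minimal sketch, where make_matrix and free_matrix are hypothetical helper names:

```cpp
#include <cstddef>

// Hypothetical helper: allocate a rows x cols matrix of zeros as an array of row pointers.
int** make_matrix(std::size_t rows, std::size_t cols) {
    int** m{ new int*[rows] };
    for (std::size_t r{0}; r < rows; ++r)
        m[r] = new int[cols]{};
    return m;
}

// Release in the opposite order: every row first, then the array of row pointers.
void free_matrix(int** m, std::size_t rows) {
    for (std::size_t r{0}; r < rows; ++r)
        delete[] m[r];
    delete[] m;
}
```

Deleting only the outer array would leak every row, since nothing else holds their addresses.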
Void pointers
The void pointer, also known as the generic pointer, is a special type of pointer that can be pointed at objects of any data type. A void pointer is declared like any normal pointer with its type being void*
:
int x{};
float y{};
struct S {
int n;
float f;
};
S s{};
void* ptr{};
ptr = &x; // ok
ptr = &y; // ok
ptr = &s; // ok!
However, since a void pointer doesn’t know the actual underlying type it’s pointing to, dereferencing a void pointer is illegal. Instead, it must be first cast to a specific pointer type before being dereferenced:
int x{5};
void* ptr{};
ptr = &x;
int* iptr{static_cast<int*>(ptr)};
std::cout << *iptr;
For the same reason, deleting a void pointer will result in undefined behavior and should be avoided unless you cast it to a specific typed pointer first. Also, pointer arithmetic is not allowed on void pointers, as the compiler doesn’t know what size the underlying object has.
There is no such thing as void reference, as a reference always has an underlying instance/object and thus knows the type.
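Because a void pointer carries no type information, the type has to travel separately. One possible sketch is to pass an enum tag alongside the pointer and cast accordingly (the Type enum and describe function are made up for illustration):

```cpp
#include <string>

enum class Type { t_int, t_double, t_cstring };

// Hypothetical helper: the caller states what ptr really points at,
// and the matching static_cast recovers the typed pointer.
std::string describe(const void* ptr, Type type) {
    switch (type) {
    case Type::t_int:
        return std::to_string(*static_cast<const int*>(ptr));
    case Type::t_double:
        return std::to_string(*static_cast<const double*>(ptr));
    case Type::t_cstring:
        return std::string{static_cast<const char*>(ptr)};
    }
    return {}; // not reached for valid Type values
}
```

Nothing stops a caller from passing the wrong tag, in which case the cast silently produces garbage. That is exactly why void pointers are best avoided when templates or overloads can do the job.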
Functions
Function pointers
An identifier is a function’s name, but what type is the function? Functions have their own lvalue function types: the return type together with the parameter types make up the type. Much like variables, functions live at an assigned address in memory. When we print the function itself via operator<<, the function is implicitly converted to a function pointer and then to a bool, so it always prints 1. Some compilers have an extension that prints the actual address of the function instead. If that’s not done automatically, you can cast the function pointer to a void pointer to let the compiler know it’s the address you want to print:
#include <iostream>
int foo() {
return 5;
}
int main() {
std::cout << reinterpret_cast<void*>(foo) << '\n';
return 0;
}
To define a function pointer:
int (*fcnPtr) (); // notice the difference between this and below
int* fcn (); // this is just a function taking no args and returning int*
Only the first line defines fcnPtr
as a function pointer. The second line is just a function.
To define a const function pointer:
int (*const fcnPtr) (); // correct
const int (*fcnPtr) (); // wrong
The first line defines a const function pointer. The second is a function pointer on a function returning const int.
To assign a function pointer to a function, it’s pretty much what we’ve been doing with pointers all the time:
fcnPtr = &foo; // no parentheses!!
Same with calling a function via its pointer:
(*fcnPtr)(5); // assuming function foo takes a single arg in int
fcnPtr(5); // implicit dereference! works like magic
We can also pass a function pointer to other functions as arguments:
void selection_sort(int* array, int size, bool (*compare)(int, int));
void selection_sort(int* array, int size, bool compare(int, int)); // implicit conversion again!
Notice how on the second line above the compare
function pointer gets implicitly converted for more succinct code. This only works when it’s a function argument inside another function. To make everything more neat, we can define type aliases for function pointers:
using CompareFunction = bool(*)(int, int);
void selection_sort(int* array, int size, CompareFunction compare);
Alternatively, we can also use the std::function
class provided by the standard library in <functional>
:
#include <functional>
void selection_sort(int* array, int size, std::function<bool(int, int)> compare);
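To make the declarations above concrete, here is one possible body for selection_sort using the function pointer alias. The ascending and descending comparison functions are names chosen for this sketch.

```cpp
#include <utility>

// Comparison functions: return true when a should come before b.
bool ascending(int a, int b)  { return a < b; }
bool descending(int a, int b) { return a > b; }

using CompareFunction = bool (*)(int, int);

// Selection sort whose ordering is decided by the function passed in.
void selection_sort(int* array, int size, CompareFunction compare) {
    for (int start{0}; start < size - 1; ++start) {
        int best{start};
        for (int current{start + 1}; current < size; ++current) {
            if (compare(array[current], array[best]))
                best = current;
        }
        std::swap(array[start], array[best]);
    }
}
```

Calling selection_sort(arr, 5, descending) on {3,5,2,1,4} should yield 5 4 3 2 1, and swapping in ascending reverses the order without touching the sorting code.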
The stack and the heap
The memory that a program uses is typically divided into a few different areas called segments:
- The code segment (also known as the text segment), where the compiled program sits in memory and is read-only
- The bss segment (also known as the uninitialized data segment), where zero-initialized global and static variables are stored
- The data segment (also known as the initialized data segment), where initialized global and static variables are stored
- The heap segment (also known as the free store), where dynamically allocated variables are allocated from
- The call stack, where function parameters, local variables and other function related information are stored
Advantages and disadvantages of the heap:
- Allocating memory on the heap is comparatively slow.
- Allocated memory stays allocated until it is specifically deallocated (beware memory leaks) or the application ends (at which point the OS should clean it up).
- Dynamically allocated memory must be accessed through a pointer. Dereferencing a pointer is slower than accessing a variable directly.
- Because the heap is a big pool of memory, large arrays, structures, or classes can be allocated here.
Advantages and disadvantages of the stack:
- Allocating memory on the stack is comparatively fast.
- Memory allocated on the stack stays in scope as long as it is on the stack. It is destroyed when it is popped off the stack.
- All memory allocated on the stack is known at compile time. Consequently, this memory can be accessed directly through a variable.
- Because the stack is relatively small, it is generally not a good idea to do anything that eats up lots of stack space. This includes allocating or copying large arrays or other memory-intensive structures.
Recursion
A recursive function is a function that calls itself. Check the following (poorly written) function:
void count_down(int count) {
std::cout << "push " << count << '\n';
count_down(count - 1);
}
The function calls itself indefinitely and thus will likely cause a stack overflow. When the compiler applies tail call optimization (possible when the recursive call is the last thing the function does), however, the program will run forever instead of overflowing the stack.
To remedy this problem, we can write the termination condition for the recursion:
void count_down(int count) {
if (!count) return;
std::cout << "push " << count << '\n';
count_down(count - 1);
}
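Recursion becomes more useful when the function returns a value built from the smaller sub-problem. A minimal sketch (sum_to is a made-up name) that also uses n <= 0 as the termination condition, so that a negative argument stops immediately instead of recursing until the stack runs out:

```cpp
// Sum of the integers 1..n, computed recursively.
int sum_to(int n) {
    if (n <= 0) return 0;      // termination condition: nothing left to add
    return n + sum_to(n - 1);  // recursive step on a smaller problem
}
```

For example, sum_to(5) unwinds as 5 + 4 + 3 + 2 + 1 + 0 = 15.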
Command line arguments
When we want to utilize the command line arguments of a program, we can explicitly write main()
in the following form:
int main(int argc, char* argv[]); // or
int main(int argc, char** argv);
Where argc
is the count of command line arguments, and argv
is the array of actual arguments passed.
When we want to convert the element in argv
to numeric (since they’re always read as strings), we can
#include <sstream> // for std::stringstream
int num {};
std::stringstream convert { argv[1] };
convert >> num;
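The snippet above silently leaves num at 0 when the argument is not a number. A slightly more defensive sketch checks whether the extraction succeeded and whether anything is left over (parse_int is a hypothetical helper name):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: convert text to int; returns false when the text
// is not a complete integer (e.g. "abc" or "12x"). `out` is only written on success.
bool parse_int(const char* text, int& out) {
    std::stringstream convert{text};
    int value{};
    if (!(convert >> value)) return false; // extraction failed
    char leftover{};
    if (convert >> leftover) return false; // trailing non-space characters
    out = value;
    return true;
}
```

In main, after checking argc, this could be used as if (!parse_int(argv[1], num)) { /* report the bad argument */ }.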
Ellipsis (and why to avoid them)
There are certain cases where it can be useful to be able to pass a variable number of parameters to a function, and C++ provides a special specifier known as ellipsis that allows us to do precisely that. Functions that use ellipsis take the form as below:
return_type function_name(argument_list, ...)
To use ellipsis, we need the <cstdarg> header: std::va_list holds the values, va_start() initializes it, and va_arg() extracts values from the std::va_list. Note that va_start, va_arg and va_end are macros, so they take no std:: prefix. For example, the following function calculates the average of a variable number of integers:
#include <iostream>
#include <cstdarg> // needed for ellipsis
double calc_average(int count, ...) {
int sum{0};
std::va_list list; // to get values in ellipsis
va_start(list, count); // the first is the target list to initialize, the second is the last non-ellipsis arg
for (int arg{}; arg < count; ++arg)
sum += va_arg(list, int); // the first is the target list, the second is element type
va_end(list); // clean up the va_list
return static_cast<double>(sum) / count;
}
int main() {
std::cout << calc_average(5, 1,2,3,4,5) << '\n';
std::cout << calc_average(3, 5,3,1) << '\n';
return 0;
}
The ellipsis is dangerous as it doesn’t do type checking at all when you extract the values using va_arg
. For example, using the calc_average
function above:
std::cout << calc_average(6, 1.0, 2, 3, 4, 5, 6) << '\n';
The output would be surprising:
1.78782e+008
This result epitomizes the phrase “garbage in, garbage out”, which means a computer, unlike humans, will unquestioningly process whatever you give it and potentially produce nonsensical output.
Another reason why ellipsis is dangerous is that it doesn’t know the length of the input. To use it properly we must pass a count argument (like above) or use a sentinel value to terminate.
As a conclusion, it’s generally suggested to avoid using ellipsis altogether.
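For completeness, here is a sketch of the sentinel approach mentioned above, assuming -1 never appears as real data (the function name average_until_sentinel is made up). It shares the same lack of type checking, and forgetting the sentinel leads to undefined behavior.

```cpp
#include <cstdarg>

// Averages the int arguments up to (not including) the sentinel -1.
// Assumption for this sketch: -1 is never a legitimate data value.
double average_until_sentinel(int first, ...) {
    if (first == -1) return 0.0; // no data at all

    int sum{first};
    int count{1};

    std::va_list list;
    va_start(list, first);       // first is the last named parameter
    for (;;) {
        const int arg{va_arg(list, int)};
        if (arg == -1) break;    // sentinel reached
        sum += arg;
        ++count;
    }
    va_end(list);

    return static_cast<double>(sum) / count;
}
```

A call looks like average_until_sentinel(1, 2, 3, 4, 5, -1), which averages the five values before the sentinel.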
Introduction to lambdas (anonymous functions)
A lambda expression (also known as a lambda or closure) allows us to define an anonymous function inside another function. This nesting is important as it allows us to avoid namespace pollution and to define the function as close to where it is used as possible.
The syntax for lambdas in C++ is
[captureClause] (parameters) -> returnType { statements; }
The capture clause can be left empty if not needed. The parameter list can be left empty, or omitted entirely together with its parentheses when no return type is specified. The return type is optional and assumed to be auto if omitted. Therefore, the simplest lambda is just []{}.
Below is another example that passes a lambda to std::find_if
:
#include <algorithm>
#include <array>
#include <iostream>
#include <string_view>
int main() {
constexpr std::array<std::string_view, 4> arr{"apple", "banana", "walnut", "lemon"};
auto found{ std::find_if(
arr.begin(),
arr.end(),
[](std::string_view str){ return str.find("nut") != std::string_view::npos; }
) };
if (found == arr.end())
std::cout << "No nuts\n";
else
std::cout << "Found " << *found << '\n';
return 0;
}
For advanced readers: a lambda is not a function, but a special kind of object called a functor. Functors contain an overloaded operator() that makes them callable like a function.
If for some reason we want to “name” a lambda and save it to a variable, there are three ways:
#include <functional>
int main() {
double (*add_numbers1)(double, double) {
[](double a, double b) {
return a + b;
}
}; // to a function pointer (only works for a lambda with no captures)
std::function add_numbers2 { // note we can omit <double(double, double)> here since C++17
[](double a, double b) {
return a + b;
}
}; // to a std::function
auto add_numbers3 {
[](double a, double b) {
return a + b;
}
}; // to an auto variable, which keeps the lambda's real type
}
We can use auto
for the parameters in a lambda to make our code simpler:
#include <algorithm>
#include <array>
#include <iostream>
#include <string_view>
int main() {
constexpr std::array months {
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December"
};
const auto start_with_same_letter { std::adjacent_find(
months.begin(),
months.end(),
[](const auto& a, const auto& b) { return a[0] == b[0]; }
) };
if (start_with_same_letter != months.end()) {
std::cout << *start_with_same_letter << " and "
<< *std::next(start_with_same_letter)
<< " start with the same letter\n";
}
return 0;
}
However, using auto
can be dangerous as well in some cases, say
const auto five_letter_months { std::count_if(
months.begin(),
months.end(),
[](std::string_view str) { return str.length() == 5; }
) };
If we use auto instead of std::string_view in the lambda, the inferred type would be const char* and thus a lot of functionality (e.g. length()) would be lost.
In terms of constexpr lambdas: all lambdas are implicitly constexpr as of C++17 if the lambda has no captures (or all captures are constexpr) and calls no other functions (or all functions it calls are constexpr). Considering a lot of functions in standard library are made constexpr only since C++20, these conditions are likely not true until C++20.
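A small sketch of what implicit constexpr buys us, assuming a C++17 compiler (square and buffer are names chosen for illustration): the lambda has no captures and only does arithmetic, so it can be called where a constant expression is required.

```cpp
// No captures and only constexpr operations, so (since C++17) the call
// operator is implicitly constexpr and usable in constant expressions.
constexpr auto square{ [](int x) { return x * x; } };

static_assert(square(4) == 16, "evaluated at compile time");

// The result can also size a C-style array, which requires a constant expression.
int buffer[square(3)]{};
```

If the lambda called a non-constexpr function, the static_assert line would fail to compile rather than fall back to a run-time call.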
One thing to be aware of is that a separate lambda is generated for each different type that auto resolves to, so each version gets its own copy of any static local variable:
#include <iostream>
int main() {
auto print {
[](auto value) {
static int count{};
std::cout << count++ << ": " << value << '\n'; // print, then increment the per-type counter
}
};
print("hello");
print("world");
print(1);
print(2);
print("bye");
return 0;
}
and the output for above program is
0: hello
1: world
0: 1
1: 2
2: bye
In terms of return type deduction: we need to make sure all possible return values are of the same type when we don’t specify the return type for a lambda, otherwise there will be a compile error:
auto divide {
[](int x, int y, bool integer_division) {
if (integer_division)
return x / y; // return type is int
else
return static_cast<double>(x) / y; // error: return type doesn't match
}
};
To fix this problem, we need to either explicitly convert and make sure types match, or explicitly specify the return type and let the compiler implicitly convert types for us:
auto divide {
[](int x, int y, bool integer_division) -> double {
if (integer_division)
return x / y; // compiler would implicitly convert int to double for us
else
return static_cast<double>(x) / y;
}
};
Lastly, we don’t need to write lambdas for many simple operations, because the standard library already provides a bunch of them in <functional>. What differentiates them from lambdas is that they need to be instantiated, e.g. std::greater{}, before being used like a function.
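As a quick sketch of two of these ready-made function objects in use: std::greater drives a descending sort, and std::multiplies is handed to std::accumulate from <numeric> (an algorithm not covered in these notes, used here only for illustration). The function names sorted_descending and product are made up.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Sort a copy of the input from largest to smallest using std::greater.
std::vector<int> sorted_descending(std::vector<int> values) {
    std::sort(values.begin(), values.end(), std::greater<int>{});
    return values;
}

// Multiply all elements together using std::multiplies instead of a lambda.
int product(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 1, std::multiplies<int>{});
}
```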
Lambda captures
Lambdas can only access global identifiers, entities that are known at compile time, and entities with static storage duration. That means local variables are not accessible by lambdas and thus we cannot “partially” pass values to a lambda. This brings up the concept of captures. The capture clause is used to indirectly give a lambda access to variables available in the surrounding scope that it normally would not have access to. All we need to do is list the entities we want to access from within the lambda as part of the capture clause. For example:
...
std::string search{};
std::cin >> search; // search is not available at compile time
auto found { std::find_if(
arr.begin(),
arr.end(),
[search](std::string_view str) { return str.find(search) != std::string_view::npos; } // check capture clause!
) };
By default, captures are const copies of the original variables (instead of actual references, for example). While we can mark the lambda with the specifier mutable so that the captures become non-const, they’re still just copies and thus won’t affect the original variables:
#include <iostream>
int main() {
int ammo{10};
auto shoot {
[ammo]() mutable {
--ammo;
std::cout << "Pew! " << ammo << " shot(s) left\n";
}
};
shoot();
shoot();
std::cout << ammo << " shot(s) left\n";
return 0;
}
What the above prints is
Pew! 9 shot(s) left
Pew! 8 shot(s) left
10 shot(s) left
To actually capture a variable by reference, we can prepend the variable name with an ampersand &
:
#include <iostream>
int main() {
int ammo{10};
auto shoot {
[&ammo]() { // no need for mutable any more
--ammo;
std::cout << "Pew! " << ammo << " shot(s) left\n";
}
};
shoot();
shoot();
std::cout << ammo << " shot(s) left\n";
return 0;
}
and the above prints
Pew! 9 shot(s) left
Pew! 8 shot(s) left
8 shot(s) left
Multiple variables can be captured by using commas to separate them. This can include a mix of captures by value and by reference. In the extreme case, a default capture (also known as a capture-default) captures all variables that are mentioned in the lambda. To capture all used variables by value, use =. To capture all used variables by reference, use &:
int main() {
...
int width{};
int height{};
std::cin >> width >> height; // width and height are not known at compile time
auto found { std::find_if(
arr.begin(),
arr.end(),
[=](int area) { return width * height == area; }
) }; // width and height are captured by value to lambda
}
We can even define new variables inside capture:
auto found { std::find_if(
arr.begin(),
arr.end(),
[specified_area{width * height}](int area) { return specified_area == area; }
) };
Operator Overloading
Introduction to operator overloading
In C++, operators are implemented as functions. By using function overloading on the operator functions, we can define our own version of the operators that work with different data types.
There are some limitations on operator overloading:
- Almost all C++ operators can be overloaded. The exceptions are: ?: (conditional), sizeof, :: (scope), . (member selector), .* (pointer member selector), typeid and the casting operators.
- You can only overload existing operators and cannot create e.g. operator** to do exponents.
- At least one of the operands must be a user-defined type, e.g. you cannot overload operator+ on int and double, as both are native types. However, since standard library classes are also considered user-defined, overloading on int and std::string is allowed (but not recommended).
- It’s not possible to change the number of operands an operator supports.
- All operators keep their default precedence and associativity regardless of what they’re used for and this doesn’t change with overloading. For example, while some beginners might want to overload the XOR operator ^ to do exponentiation, since ^ has a lower precedence than the basic arithmetic operators, the expressions in the end might get evaluated incorrectly.
Overloading the arithmetic operators using friend functions
There are three different ways to overload operators:
- The member function way
- The friend function way
- The normal function way
In this lesson we focus on the friend function way because it’s the most intuitive for binary operators.
Say we have a custom class storing how many cents of money we have and we’d like to overload operator+
for multiple cents instances:
#include <iostream>
class Cents {
private:
int m_cents{};
public:
Cents(int cents) : m_cents { cents } {}
friend Cents operator+(const Cents& c1, const Cents& c2);
int getCents() const { return m_cents; }
};
Cents operator+(const Cents& c1, const Cents& c2) {
return c1.m_cents + c2.m_cents;
} // we can also define the friend function inside the class
int main() {
Cents c1{6};
Cents c2{4};
Cents csum{c1 + c2};
std::cout << "I have " << csum.getCents() << " cents in total\n";
return 0;
}
We can also overload other arithmetic operators and even for mixed types. Also, we can call overloaded operators when defining another friend function, e.g. overload operator-
using operator+
.
Overloading operators using normal functions
When we don’t need access to private members, we can just define the overloaded operator as a normal (non-member, non-friend) function:
Cents operator+(const Cents& c1, const Cents& c2) {
return Cents{ c1.getCents() + c2.getCents() };
}
The one difference between a friend function versus a normal function in this case, besides the accessibility, is that when we declare the class, we declare the friend functions in the header file as well, and that serves as a prototype inherently. For a normal function, we need to provide this prototype/declaration inside the header ourselves to make it explicit that this overload exists for the users of the header file.
Overloading the I/O operators
Overloading operator<<
is similar to overloading operator+
since they’re both binary operators, except that the parameter types are a bit different:
// std::ostream is the type of std::cout
friend std::ostream& operator<<(std::ostream& out, const Type& obj);
Overloading operator>> is done in a manner analogous to overloading operator<<:
// std::istream is the type of std::cin
friend std::istream& operator>>(std::istream& in, Type& obj);
Overloading operators using member functions
Overloading operators using a member function is very similar to overloading operators using a friend function. When overloading an operator using a member function:
- The overloaded operator must be added as a member function of the left operand
- The left operand becomes the implicit *this object
- All other operands become function parameters
Re-using the previous example where we implemented using friend functions:
class Cents {
private:
int m_cents{};
public:
Cents(int cents) : m_cents {cents } {};
Cents operator+(int value) const;
int getCents() const {return m_cents;}
};
Cents Cents::operator+(int value) const {
return Cents{m_cents + value};
}
Since they’re just so similar, how on earth should we decide whether we’re doing the friend function or member function way? Well, there are a few more things to note:
- Not everything can be overloaded as a friend function: operator=, operator[], operator() and operator-> can only be overloaded as member functions
- Not everything can be overloaded as a member function: we cannot add a member function when the left operand is not a class (e.g. int) or is a class we cannot modify (e.g. std::ostream)
Typically, the following are some rules of thumb to decide which overloading we should define:
- When dealing with binary operators that don’t modify the left operand (e.g. operator+) it’s recommended to use the normal or friend function overload
- When dealing with binary operators that do modify the left operand (e.g. operator+=) it’s preferred to do it via member functions if you can modify the class of the left operand; otherwise use normal functions
- Unary operators are usually overloaded as member functions since they don’t have parameters in that case
- These operators can only be overloaded as member functions: operator=, operator[], operator() and operator->
Overloading unary operators +
, -
and !
A simple example of how we might define operator-
on the Cents
class:
Cents Cents::operator-() const {
return Cents{-m_cents};
}
Overloading the comparison operators
Because comparison operators are all binary and don’t modify their operands, we can define them as friend functions (or normal functions if possible):
class Car {
private:
std::string m_make;
std::string m_model;
public:
Car(std::string_view make, std::string_view model)
: m_make{make}, m_model{model} {
}
friend bool operator==(const Car& c1, const Car& c2);
friend bool operator!=(const Car& c1, const Car& c2);
};
bool operator==(const Car& c1, const Car& c2) {
return (c1.m_make == c2.m_make) && // && is logical AND; a single & would be bitwise AND
(c1.m_model == c2.m_model);
}
bool operator!=(const Car& c1, const Car& c2) {
return !(c1 == c2);
}
As suggested above, we can minimize redundancy among the comparison operators by:
- Implementing operator!= as !(operator==)
- Implementing operator> as operator< with parameters flipped
- Implementing operator>= as !(operator<)
- Implementing operator<= as !(operator>)
C++20 introduces the spaceship operator operator<=>
which allows us to reduce the number of comparison functions to implement to just 2 or even 1:
(A <=> B) < 0 if A < B
(A <=> B) > 0 if A > B
(A <=> B) == 0 if A == B
Overloading the increment and decrement operators
Since increment and decrement are unary and modify the operand, they’re best implemented as member functions:
class Digit {
private:
int m_digit{};
public:
Digit(int digit) : m_digit {digit} {}
Digit& operator++(); // prefix
Digit& operator--(); // prefix
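Digit operator++(int); // postfix: returns a copy by value
Digit operator--(int); // postfix: returns a copy by value
};
Digit& Digit::operator++() {
if (m_digit == 9)
m_digit = 0;
else
++m_digit;
return *this;
}
Digit& Digit::operator--() {
if (m_digit == 0)
m_digit = 9;
else
--m_digit;
return *this;
}
Digit Digit::operator++(int) {
Digit temp{*this}; // save the current state
++(*this); // reuse the prefix version
return temp; // return the saved state by value (a reference to temp would dangle)
}
Digit Digit::operator--(int) {
Digit temp{*this}; // save the current state
--(*this); // reuse the prefix version
return temp; // return the saved state by value (a reference to temp would dangle)
}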
Notice how we differentiate the prefix and postfix versions with the parameter int
: when there’s an int
parameter, C++ will deem the function to be a postfix instead of prefix.
Overloading the subscript operator
We can overload the subscript operator []
in C++ to allow intuitive access to elements in the private member list:
#include <iostream>
class IntList {
private:
int m_list[10]{};
public:
int& operator[] (int index) {
return m_list[index];
}
};
int main() {
IntList list{};
list[2] = 3;
std::cout << list[2] << '\n'; // prints 3
return 0;
}
Notice operator[] needs to return by reference (const reference where appropriate) because otherwise list[2] would evaluate to a temporary copy of the element, and list[2] = 3 would amount to something like 0 = 3, which causes a compile error. When we need to subscript a const list, we can define a const version of the overload.
When both the const and non-const versions of the overload need the same logic, we can avoid duplicating code by putting the majority of the function body into another function that gets called by both versions.
In C++23, we have an even simpler solution utilizing several new features:
#include <iostream>
class IntList {
private:
int m_list[5]{1,2,3,4,5};
public:
auto&& operator[](this auto&& self, int index) {
return self.m_list[index];
} // && and self to differentiate const vs non-const
};
int main() {
IntList list{};
list[2] = 3; // ok
const IntList clist{};
clist[2] = 3; // compile error
std::cout << clist[2] << '\n';
return 0;
}
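Note that pointers and overloaded subscript operators don’t mix: applying [] to a pointer indexes the pointer itself instead of calling our overload, so we would need to dereference first, as in (*list)[2] = 3. The following does not compile: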
IntList* list { new IntList{} };
list[2] = 3; // compile error
Overloading the parenthesis operator
The parenthesis operator must be implemented as a member function. Taking the following as an example:
#include <cassert> // for assert
class Matrix {
private:
double m_data[4][4]{};
public:
double& operator()(int row, int col);
double operator()(int row, int col) const;
};
double& Matrix::operator()(int row, int col) {
assert(row >= 0 && row < 4);
assert(col >= 0 && col < 4);
return m_data[row][col];
}
double Matrix::operator()(int row, int col) const {
assert(row >= 0 && row < 4);
assert(col >= 0 && col < 4);
return m_data[row][col];
}
int main() {
Matrix matrix;
matrix(1, 2) = 4.5;
return 0;
}
The reason why we didn’t implement something like matrix[1][2]
is because the second subscript operator is harder to implement than just using the parenthesis.
The operator()
is often overloaded to implement functors (also known as function objects), which are classes that behave like functions except that they store data in member variables:
#include <iostream>
class Accumulator {
private:
int m_counter {};
public:
int operator()(int i) { return (m_counter += i); }
void reset() { m_counter = 0; }
};
int main() {
Accumulator acc{};
std::cout << acc(1) << '\n'; // prints 1
std::cout << acc(3) << '\n'; // prints 4
Accumulator acc2{};
std::cout << acc2(10) << '\n'; // prints 10
std::cout << acc2(20) << '\n'; // prints 30
return 0;
}
Overloading typecasts
C++ implicitly converts an int into a Cents object through the constructor, as shown above. We can also “explicitly” define the behavior of typecasting in the other direction, from a Cents into an int:
class Cents {
private:
int m_cents {};
public:
Cents (int cents = 0) : m_cents {cents} {};
operator int() const {return m_cents;}
// ...
};
Note we used the word “explicit” in quotes above, because the behavior we defined would guide C++ how to implicitly convert a Cents
object into an int
. We can also force users to actually “explicitly” perform this typecast:
explicit operator int() const {return m_cents; }
and by specifying explicit
like this, users can only convert using static_cast<int>(cents)
if needed.
Overloading the assignment operator
The purposes of the copy constructor and the copy assignment operator are almost equivalent: both copy one object to another. However, the copy constructor initializes new objects, whereas the assignment operator replaces the contents of existing objects. In sum:
- If a new object is to be created before copying, the copy constructor is used
- If a new object doesn’t have to be created before copying, the assignment operator is used
We can overload the assignment operator as follows:
Cents& operator=(const Cents& cents);
Cents& Cents::operator=(const Cents& cents) {
m_cents = cents.m_cents;
return *this;
}
For simple cases, this implementation works. For cases where dynamic memory allocation is expected, we might want to handle self-assignment explicitly. Say we have a custom MyString
class that does dynamic memory allocation:
MyString& MyString::operator=(const MyString& str) {
if (this == &str)
return *this; // important: avoid self-assignment
if (m_data)
delete[] m_data; // important: delete existing variable and free its memory
m_length = str.m_length;
m_data = nullptr;
if (m_length)
m_data = new char[static_cast<std::size_t>(str.m_length)];
std::copy_n(str.m_data, m_length, m_data); // copy m_length of str.m_data into m_data
return *this;
}
The compiler provides an implicit copy assignment operator by default and you can optionally avoid this behavior by making it private
or using the delete
keyword:
Cents& operator=(const Cents& cents) = delete;
Shallow vs deep copying
Because C++ doesn’t know much about your class, the default copy constructor and default assignment operator use a copying method known as memberwise copy (also known as shallow copy). This means C++ copies each member of the class individually (using the assignment operator for overloaded operator=
; using direct initialization for the copy constructor).
This works until it doesn’t. When designing classes that handle dynamically allocated memory, memberwise/shallow copying can get us into a lot of trouble. This is because a shallow copy of a pointer is just a copy of the pointer itself, without allocating any memory or copying the content it points to, so both objects end up pointing at the same memory.
We can solve this by doing a deep copy, which allocates memory for the copy and then copies the actual value, so that the copy lives in distinct memory from the source.
void MyString::deep_copy(const MyString& str) {
delete[] m_data; // delete existing variable and free its memory (assumes m_data starts out as nullptr or a valid allocation)
m_length = str.m_length;
if (str.m_data) {
m_data = new char[m_length];
for (int i{}; i < m_length; ++i)
m_data[i] = str.m_data[i];
} else
m_data = nullptr;
}
MyString::MyString(const MyString& str) {
deep_copy(str);
} // copy constructor
MyString& MyString::operator=(const MyString& str) {
if (this != &str)
deep_copy(str);
return *this;
} // copy assignment operator
This is exactly the reason why we have the rule of three, namely whenever you need to define a destructor, a copy constructor or a copy assignment operator, you will eventually realize that you need to define all three of them, since very likely you’re handling dynamic memory allocation.
Move Semantics and Smart Pointers
Introduction to smart pointers and move semantics
There are a myriad of ways in which memory gets allocated through a pointer but never properly deallocated, say when we return early or throw an exception before deleting the pointer. To avoid this kind of situation for good, we have smart pointers:
#include <iostream>
template <typename T>
class AutoPtr {
private:
T* m_ptr;
public:
AutoPtr(T* ptr=nullptr) : m_ptr(ptr) {};
~AutoPtr() { delete m_ptr; }
T& operator*() const { return *m_ptr; }
T* operator->() const { return m_ptr; }
};
class Resource {
public:
Resource() { std::cout << "Resource constructed\n"; }
~Resource() { std::cout << "Resource destructed\n"; }
};
int main() {
AutoPtr<Resource> res(new Resource());
return 0;
}
The above program prints
Resource constructed
Resource destructed
As long as the AutoPtr is a local variable, its destructor will properly delete the internal pointer and free the memory when it goes out of scope. In contrast, the built-in pointers are sometimes called “dumb pointers” because they can’t clean up after themselves.
One problem of the above implementation is that it has no properly defined copy constructor and copy assignment operator. That means when we copy-construct one AutoPtr from another, both end up holding the same pointer, and the resource gets deleted twice:
AutoPtr<Resource> ptr1(new Resource());
AutoPtr<Resource> ptr2(ptr1);
We will see one constructed notice and two destructed notices on the same resource object, which is undefined behavior. To fix this problem, we could explicitly define and delete the copy constructor and copy assignment operator, forcing AutoPtr objects to be passed by reference and thereby avoiding copies altogether. But how about returning an AutoPtr from a function?
??? get_auto_ptr() {
...
return AutoPtr(res);
}
We cannot return by copy as it will trigger the same problem as in the copy constructor (multiple pointers to the same resource and destructing the same memory multiple times). Neither can we return by reference, because the AutoPtr object will be destructed once we leave the local function scope and the user will be left with a dangling reference. Returning a pointer to the AutoPtr fails for the same reason.
This is why we need move semantics, which basically means transferring the ownership between objects rather than making copies:
AutoPtr(AutoPtr& ap) {
m_ptr = ap.m_ptr;
ap.m_ptr = nullptr;
}
AutoPtr& operator=(AutoPtr& ap) {
if (&ap != this) { // avoid self assignment
delete m_ptr;
m_ptr = ap.m_ptr;
ap.m_ptr = nullptr;
}
return *this;
}
C++98 introduced std::auto_ptr, which was later removed in C++17 for a few reasons:
- It implements move semantics through the copy constructor and assignment operator (like above), so when you pass it by value to a function, the original pointer loses its ownership automatically and you might dereference a null pointer by accident.
- It always deletes its contents using non-array delete and thus won’t work properly with dynamically allocated arrays.
- It doesn’t work well with a lot of other classes in the standard library, including most of the containers and algorithms, because those classes assume that copying an object actually makes a copy rather than a move.
We should use std::unique_ptr
and std::shared_ptr
moving forward, which are introduced in C++11.
R-value references
Let’s go over some recap on the value categories. Prior to C++11, only one type of reference existed in C++, and so it was just called a reference. Starting from C++11, it is instead called l-value reference which can only be initialized with modifiable l-values:
l-value reference | can be initialized with | can modify |
---|---|---|
modifiable l-values | yes | yes |
non-modifiable l-values | no | no |
r-values | no | no |
L-value references to const objects can be initialized with modifiable and non-modifiable l-values and r-values alike. However, those values can’t be modified:
l-value reference to const | can be initialized with | can modify |
---|---|---|
modifiable l-values | yes | no |
non-modifiable l-values | yes | no |
r-values | yes | no |
C++11 adds a new type of reference called an r-value reference. An r-value reference is a reference that is designed to be initialized with an r-value (only). While an l-value reference is created using a single ampersand, an r-value reference is created using a double ampersand:
int x{5}; // l-value
int& lref{x}; // l-value ref on a l-value
int&& rref{5}; // r-value ref on an r-value
R-value references cannot be initialized with l-values.
r-value reference | can be initialized with | can modify |
---|---|---|
modifiable l-values | no | no |
non-modifiable l-values | no | no |
r-values | yes | yes |
And for r-value reference to const:
r-value reference to const | can be initialized with | can modify |
---|---|---|
modifiable l-values | no | no |
non-modifiable l-values | no | no |
r-values | yes | no |
These r-value references have two properties that are quite useful:
- They extend the lifespan of the object they are initialized with to the lifespan of the r-value reference (though l-value references to const objects can achieve this as well)
- Non-const r-value references allow you to modify the r-values!
When we want to differentiate the behavior of an l-value vs r-value parameter of a function:
void fun(const int& lref) {
// do something
}
void fun(const int&& rref) {
// do something
}
int main() {
int x{5};
fun(x); // x passed as lref
fun(5); // 5 passed as rref
return 0;
}
Move constructors and move assignments
We can define move constructor and move assignment like below:
// not const cuz we need to move the ownership from the original ptr to current
AutoPtr(AutoPtr&& a) noexcept : m_ptr(a.m_ptr) { a.m_ptr = nullptr; }
AutoPtr& operator=(AutoPtr&& a) noexcept {
if (&a != this) {
delete m_ptr;
m_ptr = a.m_ptr;
a.m_ptr = nullptr;
}
return *this;
}
The compiler will create an implicit move constructor and move assignment operator if all of the following are true:
- There are no user-declared copy constructors or copy assignment operators.
- There are no user-declared move constructors or move assignment operators.
- There is no user-declared destructor.
For a move-only class like AutoPtr, we’re better off deleting the copy constructor and copy assignment operator using =delete. Also it’s worth noting that the compiler will not generate an implicit move constructor when we delete the copy constructor, and thus it’s better to be very explicit about what behavior we need from the compiler for the move constructor and move assignment operator, using either =delete or =default (or our own definition).
std::move
There are cases where we want to invoke the move semantics but the compiler does copying for us because the object is an l-value instead of r-value, and thus the double ampersand r-value reference isn’t matched. For example:
#include <iostream>
#include <string>
template<class T>
void my_swap_function(T& a, T& b) {
T tmp{a};
a = b;
b = tmp;
}
int main() {
std::string x{"abc"};
std::string y{"de"};
std::cout << "x: " << x << '\n';
std::cout << "y: " << y << '\n';
my_swap_function(x, y);
std::cout << "x: " << x << '\n';
std::cout << "y: " << y << '\n';
return 0;
}
which prints
x: abc
y: de
x: de
y: abc
So this program works as expected except that a bunch of copies are made, which isn’t necessary now that we know about move semantics. The problem is that both a and b are l-values, so the copy operations are selected instead of the move ones. To solve this problem, we have std::move from the <utility> header:
#include <iostream>
#include <string>
#include <utility>
template <class T>
void my_swap_function(T&a, T&b) {
T tmp{ std::move(a) };
a = std::move(b);
b = std::move(tmp);
}
// same as previous program
Instead of making a copy of a, b and tmp, we use std::move to convert the l-value variables into r-values, and since each argument is now an r-value, move semantics are invoked to avoid expensive copying.
Another example when filling elements of a container:
#include <iostream>
#include <string>
#include <utility> // for std::move
#include <vector>
int main() {
std::vector<std::string> v;
// We use std::string because it is movable (std::string_view is not)
std::string str { "Knock" };
std::cout << "Copying str\n";
v.push_back(str); // calls l-value version of push_back, which copies str into the array element
std::cout << "str: " << str << '\n';
std::cout << "vector: " << v[0] << '\n';
std::cout << "\nMoving str\n";
v.push_back(std::move(str)); // calls r-value version of push_back, which moves str into the array element
std::cout << "str: " << str << '\n'; // The result of this is indeterminate
std::cout << "vector: " << v[0] << ' ' << v[1] << '\n';
return 0;
}
The above program prints:
Copying str
str: Knock
vector: Knock
Moving str
str:
vector: Knock Knock
std::move
is also useful when sorting an array of elements and moving contents managed by one smart pointer to another.
std::unique_ptr
C++11 standard library ships with 4 smart pointer classes: std::auto_ptr
(removed in C++17 for reason mentioned before), std::unique_ptr
, std::shared_ptr
and std::weak_ptr
. Among these, std::unique_ptr
is by far the most commonly used smart pointer class and we’ll cover that one in this section.
std::unique_ptr
(provided in <memory>
) can be seen as a replacement for std::auto_ptr
. It should be used to manage any dynamically allocated object that is not shared by multiple objects. That is, it can and should completely own the object it manages and cannot share the ownership with other classes.
#include <iostream>
#include <memory>
class Resource {
public:
Resource() { std::cout << "Acquired\n"; }
~Resource() { std::cout << "Destroyed\n"; }
};
int main() {
std::unique_ptr<Resource> res{ new Resource() };
return 0;
}
The above program prints
Acquired
Destroyed
Since the std::unique_ptr itself is allocated on the stack here, it is guaranteed to go out of scope eventually, and when it does, it properly destroys the resource it’s managing. Unlike std::auto_ptr, it also implements move semantics properly:
// same definition of Resource
int main() {
std::unique_ptr<Resource> res1 { new Resource() };
std::unique_ptr<Resource> res2 {}; // initialized as nullptr
std::cout << "res1 is " << (res1 ? "not null\n" : "null\n");
std::cout << "res2 is " << (res2 ? "not null\n" : "null\n");
// res2 = res1; // Won't compile: copy assignment is disabled
res2 = std::move(res1); // res2 assumes ownership, res1 is set to null
std::cout << "Ownership transferred\n";
std::cout << "res1 is " << (res1 ? "not null\n" : "null\n");
std::cout << "res2 is " << (res2 ? "not null\n" : "null\n");
return 0;
}
The above prints
Acquired
res1 is not null
res2 is null
Ownership transferred
res1 is null
res2 is not null
Destroyed
std::unique_ptr
has *
and ->
overloaded, and has a cast to bool that returns true if the pointer is managing a resource:
if (res) {
std::cout << *res << '\n';
}
Another improvement over std::auto_ptr
is that std::unique_ptr
knows when to use scalar vs array delete, and hence it’s ok to use std::unique_ptr
with array objects. However, it’s almost always better to just use std::array
or std::vector
than using std::unique_ptr
with a fixed, dynamic or C-style array.
Additionally, in C++14 we have a new function named std::make_unique
which constructs an object of the template type and initializes it with the arguments passed into the function:
std::unique_ptr<Resource> create_resource() {
return std::make_unique<Resource>();
}
std::make_unique distinguishes T from T[] and T[N], and avoids the use of new altogether, so it’s pretty much always preferred over constructing a std::unique_ptr from new explicitly. Also, it’s just more succinct:
f(std::unique_ptr<MyClass>(new MyClass(param)), g());
f(std::make_unique<MyClass>(param), g());
Notice we don’t return std::unique_ptr by pointer or reference in most cases; we return it by value so that ownership moves to the caller. Also, don’t let multiple std::unique_ptr objects manage the same resource, and don’t manually delete a resource that a std::unique_ptr is managing.
std::shared_ptr
Unlike std::unique_ptr
, which is designed to singly own and manage a resource, std::shared_ptr
is meant to solve the case where you need multiple smart pointers co-owning a resource. std::shared_ptr
keeps track of how many of them are sharing ownership on the same resource and won’t deallocate the memory until the last std::shared_ptr
goes out of scope.
// assuming same definition of Resource
int main() {
Resource* res{ new Resource };
std::shared_ptr<Resource> ptr1{ res };
{
std::shared_ptr<Resource> ptr2{ ptr1 }; // share ownership from ptr1 to ptr2
std::cout << "Killing one shared pointer\n";
}
std::cout << "Killing another shared pointer\n";
return 0;
}
This prints:
Acquired
Killing one shared pointer
Killing another shared pointer
Destroyed
Notice we’re sharing the ownership using a copy constructor instead of below
int main() {
Resource* res{ new Resource };
std::shared_ptr<Resource> ptr1{ res };
{
std::shared_ptr<Resource> ptr2{ res }; // set ownership independently
std::cout << "Killing one shared pointer\n";
}
std::cout << "Killing another shared pointer\n";
return 0;
}
which prints
Acquired
Killing one shared pointer
Destroyed
Killing another shared pointer
Destroyed
and the program will crash after the second Destroyed
, as the same resource has now been deallocated twice.
Similar to std::make_unique
, we have std::make_shared
for more succinct coding:
int main() {
auto ptr1 { std::make_shared<Resource>() };
{
auto ptr2 { ptr1 };
std::cout << "Killing one pointer\n";
}
std::cout << "Killing another pointer\n";
return 0;
}
Simpler, and more importantly safer, as there’s now no way to independently create ptr2
based on res
.
It’s worth knowing that a std::unique_ptr
can be converted to a std::shared_ptr
using a special constructor but not vice versa.
std::weak_ptr
Check the following example:
#include <iostream>
#include <memory>
#include <string>
class Person {
private:
std::string m_name{};
std::shared_ptr<Person> m_partner{}; // default to nullptr
public:
Person(const std::string& name) : m_name(name) { std::cout << name << " is born\n"; }
~Person() { std::cout << m_name << " is dead\n"; }
friend bool marry(std::shared_ptr<Person>& p1, std::shared_ptr<Person>& p2) {
if (!p1 || !p2) {
return false;
}
p1->m_partner = p2;
p2->m_partner = p1;
std::cout << p1->m_name << " marries " << p2->m_name << '\n';
return true;
}
};
int main() {
auto allen { std::make_shared<Person>("Allen") };
auto christine { std::make_shared<Person>("Christine") };
marry(allen, christine);
return 0;
}
The above program will print
Allen is born
Christine is born
Allen marries Christine
But that’s it! No deallocations happen. When christine goes out of scope at the end of main, the Person it points to is not destroyed, because allen’s m_partner still co-owns it. Then when allen goes out of scope, its Person is not destroyed either, because Christine’s m_partner still co-owns it. As a result, this circular reference keeps both objects alive and leaks their memory despite the use of std::shared_ptr.
std::weak_ptr was designed to solve this problem and the only thing we need to change is to declare the m_partner member variable as a std::weak_ptr:
std::weak_ptr<Person> m_partner{}; // default to nullptr
and the same program will now print
Allen is born
Christine is born
Allen marries Christine
Christine is dead
Allen is dead
One drawback of std::weak_ptr is that it’s just an observer and has no -> overload. In order to access its resource, we need to convert the std::weak_ptr into a std::shared_ptr using the lock member function. For example:
const std::shared_ptr<Person> get_partner() const {
return m_partner.lock(); // otherwise it's not a shared_ptr
}
Unlike std::shared_ptr
which will keep the underlying resource alive, a std::weak_ptr
won’t and might become a dangling pointer. Luckily, it still has something that makes it better than a dumb pointer, namely the boolean expired
member function, which tells whether the resource has already been destroyed (that is, whether the owning reference count has dropped to zero):
std::weak_ptr<Resource> getWeakPtr() {
auto ptr{ std::make_shared<Resource>() };
return std::weak_ptr<Resource>{ ptr };
} // ptr goes out of scope, Resource destroyed
Resource* getDumbPtr() {
auto ptr{ std::make_unique<Resource>() };
return ptr.get();
} // ptr goes out of scope, Resource destroyed
int main() {
auto dumb{ getDumbPtr() };
std::cout << "Our dumb ptr is: " << ((dumb == nullptr) ? "nullptr\n" : "non-null\n");
auto weak{ getWeakPtr() };
std::cout << "Our weak ptr is: " << ((weak.expired()) ? "expired\n" : "valid\n");
return 0;
}
and it prints
Resource acquired
Resource destroyed
Our dumb ptr is: non-null
Resource acquired
Resource destroyed
Our weak ptr is: expired
Object Relationships
Composition and Aggregation
The process of building complex objects from simpler ones is called composition. There are two basic subtypes of object composition: composition and aggregation.
To qualify as a composition, an object and a part must have the following relationship:
- The part (member) is part of the object (class)
- The part (member) can only belong to one object (class) at a time *
- The part (member) has its existence managed by the object (class) *
- The part (member) does not know about the existence of the object (class)
A real-world example of composition would be the relationship between a person’s body and a heart.
An aggregation must satisfy the following relationships:
- The part (member) is part of the object (class)
- The part (member) can (if desired) belong to more than one object (class) at a time *
- The part (member) does not have its existence managed by the object (class) *
- The part (member) does not know about the existence of the object (class)
A good example would be a person versus their address. Since multiple people can share the same address without the address being managed by the homeowners/leasers, it’s an aggregation.
A further summary:
Compositions:
- Typically use normal member variables
- Can use pointer members if the class handles object allocation/deallocation itself
- Responsible for creation/destruction of parts
Aggregations:
- Typically use pointer or reference members that point to or reference objects that live outside the scope of the aggregation class
- Not responsible for creating/destroying parts
Notice that we cannot create a list/array of references, because container elements must be assignable while references cannot be reseated once initialized:
std::vector<const Teacher&> m_teachers{}; // illegal
What we can do is to use std::reference_wrapper
provided in the <functional>
header which has a get
member function to unwrap and get the underlying reference:
#include <functional>
#include <iostream>
#include <vector>
#include <string>
int main() {
std::string tom{"Tom"};
std::string berta{"Berta"};
std::vector<std::reference_wrapper<std::string>> names {tom, berta}; // list of refs
std::string jim{"Jim"};
names.emplace_back(jim); // convert to ref wrapper and push to the end
for (auto name: names) {
name.get() += " Beam";
}
std::cout << jim << '\n'; // prints Jim Beam
return 0;
}
Association
To qualify as an association, an object and another object must have the following relationship:
- The associated object (member) is otherwise unrelated to the object (class)
- The associated object (member) can belong to more than one object (class) at a time
- The associated object (member) does not have its existence managed by the object (class)
- The associated object (member) may or may not know about the existence of the object (class)
The relationship between doctors and patients is a great example of an association. A doctor clearly has a relationship with their patients, but conceptually it’s not a part/whole relationship and thus is not considered object composition. A doctor can see multiple patients and vice versa. There is naturally a circular dependency here, so we need to take care in how we implement this kind of relationship:
#include <functional>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>
class Patient; // forward declaration
class Doctor {
private:
std::string m_name{};
std::vector<std::reference_wrapper<const Patient>> m_patients{};
public:
explicit Doctor(std::string_view name) : m_name(name) {
}
void add_patient(Patient& patient);
friend std::ostream& operator<<(std::ostream& os, const Doctor& doctor);
[[nodiscard]] const std::string& get_name() const {
return m_name;
}
};
class Patient {
private:
std::string m_name{};
std::vector<std::reference_wrapper<const Doctor>> m_doctors{};
void add_doctor(const Doctor& doctor) {
m_doctors.emplace_back(doctor);
} // this needs to be private because we always prefer Doctor.add_patient
public:
explicit Patient(std::string_view name) : m_name(name) {
}
friend std::ostream& operator<<(std::ostream& os, const Patient& patient);
[[nodiscard]] const std::string& get_name() const {
return m_name;
}
friend void Doctor::add_patient(Patient& patient); // so that it can access Patient.add_doctor
};
void Doctor::add_patient(Patient& patient) {
m_patients.emplace_back(patient);
patient.add_doctor(*this);
}
std::ostream& operator<<(std::ostream& os, const Doctor& doctor) {
if (doctor.m_patients.empty()) {
os << doctor.m_name << " has no patients right now\n";
return os;
}
os << doctor.m_name << " is seeing these patients:\n";
for (const auto patient : doctor.m_patients) {
os << "- " << patient.get().get_name() << '\n';
}
return os;
}
std::ostream& operator<<(std::ostream& os, const Patient& patient) {
if (patient.m_doctors.empty()) {
os << patient.m_name << " has no doctors right now\n";
return os;
}
os << patient.m_name << " is seeing these doctors:\n";
for (const auto doctor : patient.m_doctors) {
os << "- " << doctor.get().get_name() << '\n';
}
return os;
}
int main() {
Patient dave{"Dave"};
Patient frank{"Frank"};
Patient betsy{"Betsy"};
Doctor james{"James"};
Doctor scott{"Scott"};
james.add_patient(dave);
james.add_patient(frank);
scott.add_patient(betsy);
std::cout << james << '\n';
std::cout << scott << '\n';
return 0;
}
Below is a summary table of the three kinds of relationships we’ve talked about so far:
Property | Composition | Aggregation | Association |
---|---|---|---|
relationship type | whole/part | whole/part | otherwise unrelated |
members can belong to multiple classes | no | yes | yes |
members’ existence managed by class | yes | no | no |
directionality | unidirectional | unidirectional | unidirectional or bidirectional |
relationship verb | part of | has-a | uses-a |
Dependencies
A dependency occurs when one object invokes another object’s functionality in order to accomplish some specific task. This is a weaker relationship than an association, but still, any change to the object being depended upon may break functionality in the (dependent) caller. A dependency is always a unidirectional relationship.
In C++, associations are a relationship where one class always directly or indirectly “links” to the associated class as a member, namely one class knows its members. Dependencies, on the other hand, are not memberships and they are typically instantiated as needed only.
Container classes
A container class is a class designed to hold and organize multiple instances of another type (either another class, or a fundamental type). There are many different kinds of container classes, each of which has various advantages, disadvantages and restrictions in their use. By far the most commonly used container in programming is the array, which you have already seen in many examples. Although C++ has built-in array functionality, programmers will often use an array container class (std::array
or std::vector
) instead because of the additional benefits they provide. Unlike built-in arrays, array container classes remember their size when they are passed to functions and can do bounds-checking (for example through the at member function); std::vector additionally provides dynamic resizing when elements are added or removed. This not only makes array container classes more convenient than normal arrays, but safer too.
Most well-defined containers provide these functionalities:
- Create an empty container via constructor
- Insert a new object into the container
- Remove an object from the container
- Report the number of objects currently in the container
- Empty the container of all objects
- Provide access to the stored objects
- Sort the elements (optional)
Sometimes certain classes omit some of these functionalities: for example, arrays often omit insert and remove functionalities because they are slow and not encouraged. Below is an example of an IntArray
container class:
// IntArray.h
#ifndef INTARRAY_H
#define INTARRAY_H
#include <algorithm> // for std::copy_n
#include <cassert> // for assert()
class IntArray {
private:
int m_length{};
int* m_data{};
public:
IntArray() = default;
IntArray(IntArray&&) = delete;
IntArray& operator=(IntArray&&) = delete;
explicit IntArray(int length) : m_length{length} {
assert(length >= 0);
if (length > 0) {
m_data = new int[length]{};
}
}
~IntArray() {
delete[] m_data;
// we don't need to set m_data to null or m_length to 0 here, since the object will be destroyed immediately
// after this function anyway
}
IntArray(const IntArray& a) {
// Set the size of the new array appropriately
reallocate(a.get_length());
std::copy_n(a.m_data, m_length, m_data); // copy the elements
}
IntArray& operator=(const IntArray& a) {
// Self-assignment check
if (&a == this) {
return *this;
}
// Set the size of the new array appropriately
reallocate(a.get_length());
std::copy_n(a.m_data, m_length, m_data); // copy the elements
return *this;
}
void erase() {
delete[] m_data;
// We need to make sure we set m_data to nullptr here, otherwise it will
// be left pointing at deallocated memory!
m_data = nullptr;
m_length = 0;
}
int& operator[](int index) {
assert(index >= 0 && index < m_length);
return m_data[index];
}
// reallocate resizes the array. Any existing elements will be destroyed. This function operates quickly.
void reallocate(int newLength) {
// First we delete any existing elements
erase();
// If our array is going to be empty now, return here
if (newLength <= 0) {
return;
}
// Then we have to allocate new elements
m_data = new int[newLength];
m_length = newLength;
}
// resize resizes the array. Any existing elements will be kept. This function operates slowly.
void resize(int newLength) {
// if the array is already the right length, we're done
if (newLength == m_length) {
return;
}
// If we are resizing to an empty array, do that and return
if (newLength <= 0) {
erase();
return;
}
// Now we can assume newLength is at least 1 element. This algorithm
// works as follows: First we are going to allocate a new array. Then we
// are going to copy elements from the existing array to the new array.
// Once that is done, we can destroy the old array, and make m_data
// point to the new array.
// First we have to allocate a new array
int* data{new int[newLength]};
// Then we have to figure out how many elements to copy from the existing
// array to the new array. We want to copy as many elements as there are
// in the smaller of the two arrays.
if (m_length > 0) {
int elements_to_copy{(newLength > m_length) ? m_length : newLength};
std::copy_n(m_data, elements_to_copy, data); // copy the elements
}
// Now we can delete the old array because we don't need it any more
delete[] m_data;
// And use the new array instead! Note that this simply makes m_data point
// to the same address as the new array we dynamically allocated. Because
// data was dynamically allocated, it won't be destroyed when it goes out of scope.
m_data = data;
m_length = newLength;
}
void insert_before(int value, int index) {
// Sanity check our index value
assert(index >= 0 && index <= m_length);
// First create a new array one element larger than the old array
int* data{new int[m_length + 1]};
// Copy all of the elements up to the index
std::copy_n(m_data, index, data);
// Insert our new element into the new array
data[index] = value;
// Copy all of the values after the inserted element
std::copy_n(m_data + index, m_length - index, data + index + 1);
// Finally, delete the old array, and use the new array instead
delete[] m_data;
m_data = data;
++m_length;
}
void remove(int index) {
// Sanity check our index value
assert(index >= 0 && index < m_length);
// If this is the last remaining element in the array, set the array to empty and bail out
if (m_length == 1) {
erase();
return;
}
// First create a new array one element smaller than the old array
int* data{new int[m_length - 1]};
// Copy all of the elements up to the index
std::copy_n(m_data, index, data);
// Copy all of the values after the removed element
std::copy_n(m_data + index + 1, m_length - index - 1, data + index);
// Finally, delete the old array, and use the new array instead
delete[] m_data;
m_data = data;
--m_length;
}
// A couple of additional functions just for convenience
void insert_at_beginning(int value) {
insert_before(value, 0);
}
void insert_at_end(int value) {
insert_before(value, m_length);
}
[[nodiscard]] int get_length() const {
return m_length;
}
};
#endif
std::initializer_list
We have seen the following examples using initializer lists:
int array[] { 1,2,3,4,5 };
auto* array{new int[5]{ 1,2,3,4,5 }};
However, we cannot use an initializer list to instantiate our IntArray
out of the box:
IntArray array{1,2,3,4,5}; // won't compile
and this is because we need to explicitly define a constructor that takes initializer lists:
#include <cassert> // for assert
#include <initializer_list> // for std::initializer_list
// etc
class IntArray {
private:
// etc
public:
IntArray(std::initializer_list<int> list) // constructor for initializer list
: IntArray(static_cast<int>(list.size())) { // delegating constructor
int count {0};
for (auto element : list) {
m_data[count] = element;
++count;
}
}
};
Note that list initialization prefers list constructors over non-list constructors:
IntArray a1(5); // using IntArray(int) and has length 5
IntArray a2{5}; // using IntArray(std::initializer_list<int>) and has length 1
For this very reason, it’s wise to remember that adding a list constructor to an existing class that did not have one may break existing programs, and hence should generally be avoided unless you know what you’re doing. In the case we really need to implement/add a constructor that takes a std::initializer_list
, we need to ensure that we do at least one of the following:
- Provide an overloaded list assignment operator
- Provide a proper deep-copying copy assignment operator
- Delete the copy assignment operator
Inheritance
Basic inheritance in C++
Inheritance in C++ takes place between classes. In an inheritance relationship, the class being inherited from is called the parent class, base class or superclass. The class doing the inheriting is called the child class, derived class or subclass. For example:
#include <string>
#include <string_view>
class Person {
public: // making our members public just in this example
std::string m_name{};
int m_age{};
Person(std::string_view name = "", int age = 0)
: m_name(name), m_age(age) {
}
const std::string& get_name() const { return m_name; }
int get_age() const { return m_age; }
};
class BaseballPlayer : public Person {
public:
double m_batting_average{};
int m_homeruns{};
BaseballPlayer(double batting_average = 0, int homeruns = 0)
: m_batting_average(batting_average), m_homeruns(homeruns) {
}
};
In the example above, BaseballPlayer
publicly inherits from Person
. We’ll talk more about the different kinds of inheritance later.
Order of construction of derived classes
C++ constructs derived classes in phases, starting with the most-base class and finishing with the most-child class. As each class is constructed, the appropriate constructor from that class is called to initialize that part of the class.
Constructors and initialization of derived classes
Here is what happens when a base class is instantiated:
- Memory for base is set aside
- The appropriate base constructor is called
- The member initializer list initializes variables
- The body of the constructor executes
- Control is returned to the caller
And here is what happens when a derived class is instantiated:
- Memory for derived is set aside (for both base and derived portions)
- The appropriate derived constructor is called
- The base object is constructed first using the appropriate base constructor
- The member initializer list initializes variables
- The body of the constructor executes
- Control is returned to the caller
However, it’s important to realize that a derived class constructor cannot initialize members inherited from the base class in its own member initializer list; only members declared in the derived class itself may appear there. Consider what would happen if the member variable of the base class were const. Since const variables must be initialized with a value at the time of creation, the base class constructor must set its value when the variable is created, and the derived class constructor can no longer modify it. Even if the member variable were not const, allowing this would initialize the same variable twice, which C++ does not permit. That means the following won’t compile:
class Derived : public Base { // assume m_id was declared in Base class
public:
double m_cost{};
Derived(double cost=0., int id=0)
: m_cost{cost}, m_id{id} {
}
};
The following would work but only when m_id
was not declared const, and again it’s not optimal with double assignment:
class Derived : public Base {
public:
double m_cost{};
Derived(double cost=0., int id=0)
: m_cost{cost} {
m_id = id;
}
};
The correct way is to delegate the construction to the base class constructor directly:
class Derived : public Base {
public:
double m_cost{};
Derived(double cost=0., int id=0)
: Base{id}, m_cost{cost} {
}
};
Here is what happened above:
- Memory for derived is allocated
- The derived constructor is called
- The compiler looks to see if we’ve asked for a particular base class constructor – we have
- The base class constructor is explicitly called
- The base class constructor body executes
- The base class constructor returns
- The derived class member initializer list initializes m_cost
- The derived class constructor body executes
- The derived class constructor returns
Inheritance and access specifiers
We’ve seen public
and private
access specifiers. There is one more access specifier called protected
which brings a little bit of complexity to the discussion. The protected
access specifier allows the class the member belongs to, its friends and its derived classes to access the member. However, protected members are not accessible from outside the class. For example:
class Base {
private:
int m_private{}; // can only be accessed by Base members and friends
protected:
int m_protected{}; // can be accessed by Base members, friends and derived classes
public:
int m_public{}; // can be accessed by anybody
};
Correspondingly, there are three types of inheritance in C++:
class PrivateDerived : private Base {
// m_private becomes inaccessible *
// m_protected becomes private *
// m_public becomes private *
};
class ProtectedDerived : protected Base {
// m_private becomes inaccessible *
// m_protected becomes protected *
// m_public becomes protected *
};
class PublicDerived : public Base {
// m_private becomes inaccessible *
// m_protected remains protected
// m_public remains public
};
Adding new functionality to a derived class
One of the biggest benefits of inheritance is the ability to reuse already written code. We can inherit the base class functionality and then add new functionality, modify existing functionality or hide functionality we don’t want. Assume we have the following setup:
class Base {
protected:
int m_value{};
public:
Base(int value) : m_value{value} {}
void identify() const { std::cout << "I am a Base\n"; }
};
class Derived : public Base {
public:
Derived(int value) : Base{value} {}
};
Now, say we want to “modify” the base class so that m_value
is accessible from the public. We can add this new functionality through the derived class:
class Derived : public Base {
public:
// same as above
int get_value() const { return m_value; }
};
Calling inherited functions and overriding behavior
In the example from the previous section, we can observe this possibly unexpected behavior:
int main() {
Base base {5};
base.identify();
Derived deri{8};
deri.identify();
return 0;
}
The output is
I am a Base
I am a Base
How can we correct this? We can modify the function inherited from the base class:
class Derived : public Base {
public:
// same as above
void identify() const { std::cout << "I am a Derived\n"; }
};
We can also add to existing functionality, instead of overriding the member function altogether:
class Derived : public Base {
public:
// same as above
void identify() const {
Base::identify(); // calling the inherited function
std::cout << "...and I am a Derived\n";
}
};
Notice the scope Base::
is necessary to avoid infinite recursion.
It becomes trickier when we want to call the friend function of the base class, as the scope won’t apply in this case. To do that, we can use static_cast
to make our derived class temporarily “look like” a base:
class Base {
public:
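friend std::ostream& operator<<(std::ostream& os, const Base& b) {
os << "Inside Base\n"; // placeholder output; a real class would print its members
return os; // a function returning std::ostream& must actually return the stream
}
};
class Derived : public Base {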
public:
friend std::ostream& operator<<(std::ostream& os, const Derived& d) {
os << "Inside Derived we print:\n";
os << static_cast<const Base&>(d);
return os;
}
};
Hiding inherited functionality
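We can change the access level of an inherited member with a using declaration. For example, we can expose a protected inherited function as public: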
class Base {
private:
int m_value{};
protected:
void print() const { std::cout << "yo!\n"; }
public:
Base(int value) : m_value{value} {}
};
class Derived : public Base {
public:
Derived(int value) : Base{value} {}
using Base::print; // no parentheses here!
};
We can also hide an inherited functionality:
class Derived : public Base {
private:
using Base::print;
};
However, this only changes the access level when the member is reached through Derived. If the member is public in Base (unlike the protected print above), we can still call it by static_cast-ing the Derived instance to a Base. Also, given a set of overloaded functions in the base class, there is no way to change the access specifier for a single overload. You can only change them all at once.
We can also outright delete a function inherited from base class:
class Derived : public Base {
public:
void print() const = delete;
};
Multiple inheritance
Multiple inheritance enables a derived class to inherit members from multiple parents. To use multiple inheritance, simply specify each base class separated by a comma:
#include <string>
#include <string_view>
class Person {
private:
std::string m_name{};
int m_age{};
public:
Person(std::string_view name, int age)
: m_name{ name }, m_age{ age } {
}
const std::string& getName() const { return m_name; }
int getAge() const { return m_age; }
};
class Employee {
private:
std::string m_employer{};
double m_wage{};
public:
Employee(std::string_view employer, double wage)
: m_employer{ employer }, m_wage{ wage } {
}
const std::string& getEmployer() const { return m_employer; }
double getWage() const { return m_wage; }
};
class Teacher : public Person, public Employee {
private:
int m_teachesGrade{};
public:
Teacher(std::string_view name, int age, std::string_view employer, double wage, int teachesGrade)
: Person{ name, age }, Employee{ employer, wage }, m_teachesGrade{ teachesGrade } {
}
};
int main() {
Teacher t{ "Mary", 45, "Boo", 14.3, 8 };
return 0;
}
A mixin (also spelled “mix-in”) is a small class that can be inherited from in order to add properties to a class. The name mixin indicates that the class is intended to be mixed into other classes, not instantiated on its own. In the following example, the BoxMixin
and LabelMixin
classes are mixins that we can inherit from in order to create a Button
class:
struct Point2D {
int x{};
int y{};
};
class BoxMixin {
private:
Point2D m_top_left{};
Point2D m_bottom_right{};
public:
void set_top_left(Point2D point) { m_top_left = point; }
void set_bottom_right(Point2D point) { m_bottom_right = point; }
};
class LabelMixin {
private:
std::string m_text{};
public:
void set_text(std::string_view text) { m_text = text; }
};
class Button : public BoxMixin, public LabelMixin {};
It’s common to see mixins defined using templates:
template <class T>
class Mixin {};
class Derived : public Mixin<Derived> {};
Such inheritance is called the Curiously Recurring Template Pattern (CRTP).
There are several problems with multiple inheritance:
- Ambiguity from same identifiers from multiple parents (can be resolved with explicit scope)
- The diamond problem, namely inheriting from multiple parents that themselves inherit from the same grandparent class, which raises questions such as how many copies of the grandparent class the derived class should hold
Virtual Functions
Pointers and references to the base class of derived objects
Check the following example:
class Base {
protected:
int m_value {};
public:
Base(int value) : m_value{value} {}
std::string_view get_name() const { return "Base"; }
int get_value() const { return m_value; }
};
class Derived : public Base {
public:
Derived(int value) : Base { value } {}
std::string_view get_name() const { return "Derived"; }
int get_value_doubled() const { return m_value * 2; }
};
int main() {
Derived derived { 5 };
std::cout << "derived is a " << derived.get_name() << " and has value " << derived.get_value() << '\n';
Derived& ref_derived { derived };
std::cout << "ref_derived is a " << ref_derived.get_name() << " and has value " << ref_derived.get_value() << '\n';
Derived* ptr_derived { &derived };
std::cout << "ptr_derived is a " << ptr_derived->get_name() << " and has value " << ptr_derived->get_value() << '\n';
Base& ref_base { derived }; // Base type ref to derived obj
std::cout << "ref_base is a " << ref_base.get_name() << " and has value " << ref_base.get_value() << '\n';
Base* ptr_base { &derived }; // Base type ptr to derived obj
std::cout << "ptr_base is a " << ptr_base->get_name() << " and has value " << ptr_base->get_value() << '\n';
return 0;
}
Surprisingly (maybe) we will get the following output:
derived is a Derived and has value 5
ref_derived is a Derived and has value 5
ptr_derived is a Derived and has value 5
ref_base is a Base and has value 5
ptr_base is a Base and has value 5
Notice how only the base part of the derived
instance got referenced/pointed. It also means we cannot access anything under the derived part e.g. get_value_doubled
. This is relevant (and not silly at all) when we want to write one function that takes the base class instead of multiple functions corresponding to the overload of each derived class.
Virtual functions and polymorphism
A virtual function is a special type of member function that, when called, resolves to the most-derived version of the function for the actual type of the object being referenced or pointed to. A derived function is considered a match if it has the same signature (name, parameter types and whether it’s const) and return type as the base version of the function. Such functions are called overrides. For example:
class Base {
public:
virtual std::string_view get_name() const { return "Base"; }
};
class Derived : public Base {
public:
virtual std::string_view get_name() const { return "Derived"; }
};
int main() {
Derived derived {};
Base& ref_base { derived }; // Base type ref to derived obj
std::cout << "ref_base is a " << ref_base.get_name() << '\n';
return 0;
}
Since ref_base
only references to the base part of the object, by default the get_name
resolves to the version under the base class. However, with the virtual
specifier we let the compiler know that the function should instead resolve to the most-derived version, which in this case suits our need.
Notice that virtual function resolution only comes into play when the function is called through a pointer or reference to the class object. When we call a virtual function on an object directly, the compiler already knows the exact type and simply calls that type’s version of the function.
In programming, polymorphism refers to the ability of an entity to have multiple forms. For example:
int add(int, int);
double add(double, double);
There are two types of polymorphism:
- Compile-time polymorphism refers to forms of polymorphism that are resolved by the compiler, which includes function overloads and template resolution
- Runtime polymorphism refers to forms of polymorphism that are resolved at runtime, which includes virtual functions etc
Last notes:
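- If a base function is virtual, all matching overrides are implicitly virtual even without the virtual specifier
- Virtual functions take longer to resolve than regular functions, because each call goes through an extra lookup at runtime
- Don’t call virtual functions inside constructors or destructors: while the base constructor runs, the derived part has not been constructed yet, and while the base destructor runs, it has already been destroyed, so the call resolves to the base version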
The override and final specifiers and covariant return types
To address some common challenges with inheritance, C++ has two inheritance-related identifiers: override
and final
. Note that these identifiers are not keywords; they are normal words that have special meaning only when used in certain contexts. The C++ standard calls them “identifiers with special meaning”, but they’re often just referred to as “specifiers”.
A derived class virtual function is only considered an override if its signature and return types match exactly. To help address the issue of functions that are meant to be overrides but aren’t, the override specifier can be applied to any virtual function by placing the override
specifier after the function signature (the same place a function-level const specifier goes). If the function is not an override to a base class function (or is applied to a non-virtual function), the compiler will flag the function as an error:
#include <string_view>
class A {
public:
virtual std::string_view getName1(int x) { return "A"; }
virtual std::string_view getName2(int x) { return "A"; }
virtual std::string_view getName3(int x) { return "A"; }
};
class B : public A {
public:
std::string_view getName1(short int x) override { return "B"; } // compile error, function is not an override
std::string_view getName2(int x) const override { return "B"; } // compile error, function is not an override
std::string_view getName3(int x) override { return "B"; } // okay, function is an override of A::getName3(int)
};
int main() {
return 0;
}
There are cases where you don’t want someone to be able to override a virtual function, or inherit from a class. The final specifier can be used to enforce this:
#include <string_view>
class A {
public:
virtual std::string_view getName() const { return "A"; }
};
class B : public A {
public:
// note use of final specifier on following line -- that makes this function not able to be overridden in derived classes
std::string_view getName() const override final { return "B"; } // okay, overrides A::getName()
};
class C : public B {
public:
std::string_view getName() const override { return "C"; } // compile error: overrides B::getName(), which is final
};
In the case where we want to prevent inheriting from a class, the final specifier is applied after the class name:
#include <string_view>
class A {
public:
virtual std::string_view getName() const { return "A"; }
};
class B final : public A { // note use of final specifier here
public:
std::string_view getName() const override { return "B"; }
};
class C : public B { // compile error: cannot inherit from final class
public:
std::string_view getName() const override { return "C"; }
};
There is one special case in which a derived class virtual function override can have a different return type than the base class and still be considered a matching override. If the return type of a virtual function is a pointer or a reference to some class, the override can return a pointer or reference to a class derived from it. These are called covariant return types:
#include <iostream>
#include <string_view>
class Base {
public:
// This version of getThis() returns a pointer to a Base class
virtual Base* getThis() { std::cout << "called Base::getThis()\n"; return this; }
void printType() { std::cout << "returned a Base\n"; }
};
class Derived : public Base {
public:
// Normally override functions have to return objects of the same type as the base function
// However, because Derived is derived from Base, it's okay to return Derived* instead of Base*
Derived* getThis() override { std::cout << "called Derived::getThis()\n"; return this; }
void printType() { std::cout << "returned a Derived\n"; }
};
int main() {
Derived d{};
Base* b{ &d };
d.getThis()->printType(); // calls Derived::getThis(), returns a Derived*, calls Derived::printType
b->getThis()->printType(); // calls Derived::getThis(), returns a Base*, calls Base::printType
return 0;
}
Virtual destructors, virtual assignment and overriding virtualization
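Whenever a class is part of an inheritance hierarchy, its destructor should be virtual. Otherwise, deleting a derived object through a base class pointer only calls the base class destructor, and cleanup such as the delete[] below never runs: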
class Derived : public Base {
private:
int* m_array {};
public:
Derived(int length) : m_array{new int[length]} {}
virtual ~Derived() {
delete[] m_array;
}
};
It is not necessary to create an empty derived class destructor just to mark it as virtual: once the base class destructor is virtual, every derived class destructor is implicitly virtual as well.
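To see why the virtual destructor matters, here is a minimal sketch. The global g_log and the function destroy_through_base() are hypothetical names introduced only to record the order in which destructors run when a Derived object is deleted through a Base pointer.

```cpp
#include <string>

// Hypothetical global log, used only to record the order of destructor calls
std::string g_log{};

class Base {
public:
    virtual ~Base() { g_log += "~Base "; } // virtual, so deleting via Base* is safe
};

class Derived : public Base {
public:
    ~Derived() override { g_log += "~Derived "; } // implicitly virtual
};

// Deletes a Derived object through a Base pointer and returns the log
std::string destroy_through_base() {
    g_log.clear();
    Base* b{ new Derived{} };
    delete b; // runs ~Derived() first, then ~Base()
    return g_log;
}
```

If the virtual keyword were removed from ~Base(), deleting through the base pointer would be undefined behavior, and in practice ~Derived() would typically be skipped.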
As for virtual assignment, virtualizing assignment operators is generally neither needed nor recommended, in the interest of simplicity.
When we do want to bypass virtual dispatch and call the base version of a function directly, we can use the scope resolution operator:
int main() {
Derived derived {};
const Base& base { derived };
std::cout << base.Base::get_name() << '\n';
return 0;
}
Finally we have these rules (just as recommendations / rules of thumb):
- If you intend your class to be inherited from, make sure your destructor is virtual
- If you do not intend your class to be inherited from, mark your class as final. This will prevent other classes from inheriting from it in the first place, without imposing any other use restrictions on the class itself
Early binding and late binding
When a program is compiled, the compiler converts each statement in your C++ program into one or more lines of machine language. Each line of machine language is given its own unique sequential address. This is no different for functions – when a function is encountered, it is converted into machine language and given the next available address. Thus, each function ends up with a unique address.
Binding refers to the process that is used to convert identifiers (such as variable and function names) into addresses. Although binding is used for both variables and functions, in this lesson we’re going to focus on function binding.
Early binding (also known as static binding) means the compiler (or linker) is able to directly associate the identifier name (e.g. a function or variable name) with a machine address. This includes cases where we define functions ahead of time and call them directly.
Late binding (also known as dynamic binding, in the case of virtual function resolution) means the function being called is only resolved at runtime. In C++, one way to get late binding is to use function pointers. Calling a function via a pointer is also known as an indirect function call, in which case the compiler cannot tell at compile time which function is pointed to.
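The function-pointer form of late binding can be sketched as follows. The functions add, multiply and compute are hypothetical names; the point is only that the callee is picked from a runtime value, so the call through fn is an indirect call.

```cpp
int add(int x, int y) { return x + y; }
int multiply(int x, int y) { return x * y; }

// The function to call is chosen at runtime from op,
// so the compiler cannot bind the call through fn at compile time
int compute(char op, int x, int y) {
    int (*fn)(int, int){ op == '+' ? &add : &multiply };
    return fn(x, y); // indirect function call
}
```

By contrast, calling add(2, 3) directly is early binding: the compiler knows the target address up front.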
The virtual table
C++ implementations typically implement virtual functions using a form of late binding known as the virtual table, which is a lookup table of functions used to resolve function calls in a dynamic/late binding manner. The virtual table sometimes goes by “vtable” or “dispatch table”.
Every class that uses virtual functions (or is derived from a class that uses virtual functions) has a corresponding virtual table. The table is simply a static array that the compiler sets up at compile time. A virtual table contains one entry for each virtual function that can be called by objects of the class. Each entry in this table is simply a function pointer that points to the most-derived function accessible by the class. The compiler also adds a hidden pointer as a member of the base class, which we call *__vptr. Unlike the this pointer, which is actually a function parameter used by the compiler to resolve self-references, *__vptr is a real pointer member. It makes each object of the class bigger by the size of one pointer. It also means *__vptr is inherited by derived classes, which is important: when an object is created, its *__vptr is set to point at the virtual table of that object’s class.
Pure virtual functions, abstract base classes and interface classes
C++ allows us to create a special kind of virtual function called a pure virtual function (or abstract function) that has no body at all. A pure virtual function simply acts as a placeholder that is meant to be redefined by derived classes.
#include <string>
#include <string_view>
class Animal { // this is an ABC
protected:
std::string m_name {};
public:
Animal(std::string_view name) : m_name{ name } {}
const std::string& getName() const { return m_name; }
virtual std::string_view speak() const = 0; // speak is a pure virtual function
virtual ~Animal() = default;
};
Any class with one or more pure virtual functions becomes an abstract base class, which means that it cannot be instantiated.
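As a hedged illustration, here is how a concrete class might derive from the Animal class above. Cow and describe() are hypothetical names added for this sketch; Animal is repeated so the snippet stands alone.

```cpp
#include <string>
#include <string_view>

class Animal { // abstract base class, as in the notes above
protected:
    std::string m_name{};
public:
    Animal(std::string_view name) : m_name{ name } {}
    const std::string& getName() const { return m_name; }
    virtual std::string_view speak() const = 0; // pure virtual
    virtual ~Animal() = default;
};

class Cow : public Animal { // hypothetical concrete class
public:
    Cow(std::string_view name) : Animal{ name } {}
    std::string_view speak() const override { return "Moo"; } // required override
};

// Works with any Animal; speak() resolves to the most-derived override
std::string describe(const Animal& a) {
    return a.getName() + " says " + std::string{ a.speak() };
}
```

Writing Animal a{ "Generic" }; would not compile because Animal is abstract, while Cow can be instantiated because it overrides every pure virtual function.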
An interface class is a class that has no member variables, and where all of the functions are pure virtual. Interfaces are useful when you want to define the functionality that derived classes must implement, but leave the details of how that functionality is implemented entirely up to the derived class. Interface classes are often named beginning with “I”.
#include <string_view>
class IErrorLog {
public:
virtual bool openLog(std::string_view filename) = 0;
virtual bool closeLog() = 0;
virtual bool writeError(std::string_view errorMessage) = 0;
virtual ~IErrorLog() {} // make a virtual destructor in case we delete an IErrorLog pointer, so the proper *derived* destructor is called
};
With this, we can write functions that accept any class conforming to the IErrorLog interface:
#include <cmath> // for sqrt()
// instead of this, assuming FileErrorLog derives from IErrorLog
double mySqrt(double value, FileErrorLog& log) {
if (value < 0.0) {
log.writeError("Tried to take square root of value less than 0");
return 0.0;
}
return std::sqrt(value);
}
// now we have this
double mySqrt(double value, IErrorLog& log) {
if (value < 0.0) {
log.writeError("Tried to take square root of value less than 0");
return 0.0;
}
return std::sqrt(value);
}
Virtual base classes
We can solve the diamond problem (discussed earlier) by inheriting virtually:
class PoweredDevice {
};
class Scanner: virtual public PoweredDevice {
};
class Printer: virtual public PoweredDevice {
};
class Copier: public Scanner, public Printer {
}; // neither Scanner nor Printer constructs PoweredDevice; Copier, the most derived class, does
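A slightly fuller sketch shows that the shared PoweredDevice is constructed exactly once. The constructor parameters, the static counter s_count and the function count_power_device_constructions() are assumptions made up for this illustration.

```cpp
class PoweredDevice {
public:
    static inline int s_count{ 0 }; // hypothetical counter of constructions
    explicit PoweredDevice(int /*power*/) { ++s_count; }
};

class Scanner : virtual public PoweredDevice {
public:
    // This PoweredDevice initializer is ignored when Scanner is a base of Copier
    Scanner(int /*scanner*/, int power) : PoweredDevice{ power } {}
};

class Printer : virtual public PoweredDevice {
public:
    Printer(int /*printer*/, int power) : PoweredDevice{ power } {}
};

class Copier : public Scanner, public Printer {
public:
    // The most derived class constructs the virtual base directly
    Copier(int scanner, int printer, int power)
        : PoweredDevice{ power }, Scanner{ scanner, power }, Printer{ printer, power } {}
};

// Builds one Copier and reports how many PoweredDevice objects were constructed
int count_power_device_constructions() {
    PoweredDevice::s_count = 0;
    Copier copier{ 1, 2, 3 };
    return PoweredDevice::s_count;
}
```

Without the virtual keyword on the two base specifiers, a Copier would contain two separate PoweredDevice subobjects.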
Object slicing
Object slicing happens when we assign or copy a derived object into a base-type object: only the base portion is copied and the derived portion is “sliced” off. It also happens when we push derived objects into a std::vector whose element type is the base class. We can avoid slicing by using references or pointers to the base class instead.
int main() {
Derived d1 {5};
Derived d2 {6};
Base& b {d2};
b = d1;
return 0;
}
In the above example, b is a Base reference bound to d2, and is then assigned d1. Because operator= is not virtual by default, only the Base portion of d1 is copied over, so d2 ends up with the Base portion of d1 together with its own Derived portion.
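The slicing cases described above can be sketched with a small hierarchy. The Base and Derived classes here, and the helper functions, are hypothetical stand-ins rather than the classes used elsewhere in these notes.

```cpp
#include <functional>
#include <string>
#include <vector>

class Base {
public:
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

class Derived : public Base {
public:
    std::string name() const override { return "Derived"; }
};

// Pass by value: the argument is copied into a Base, so the Derived part is sliced off
std::string name_by_value(Base b) { return b.name(); }

// Pass by reference: no copy is made, so virtual dispatch still sees a Derived
std::string name_by_reference(const Base& b) { return b.name(); }

std::string names_in_vectors() {
    Derived d{};
    std::vector<Base> sliced{};
    sliced.push_back(d); // stores a Base copy of d
    std::vector<std::reference_wrapper<Base>> refs{};
    refs.push_back(d);   // stores a reference to d itself
    return sliced[0].name() + "/" + refs[0].get().name();
}
```

std::reference_wrapper is used because a std::vector cannot hold plain references; a vector of pointers would avoid slicing equally well.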
Dynamic casting
We can use dynamic_cast, with the same syntax as static_cast, to convert a base pointer (pointing at a derived object) to a derived pointer. This is known as downcasting (as opposed to upcasting from derived to base). If the pointer being converted is not pointing to a derived object, the result is a null pointer, so always check the result before using it. There are also several other cases where downcasting with dynamic_cast won’t work:
- With protected or private inheritance
- For classes that do not declare or inherit any virtual functions (and thus have no virtual table, which downcasting needs)
- In certain cases involving virtual base classes, see here
We can also use static_cast for downcasting, which is faster but more dangerous because there is no type checking at runtime. Instead of a null pointer, we get undefined behavior if we downcast a pointer that is not pointing at a derived object and then use the result.
In addition to pointers, dynamic casting also works for references. Because there is no such thing as a null reference, a failed reference cast throws std::bad_cast instead.
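Here is a minimal sketch of both forms of dynamic_cast. The classes and the functions try_downcast() and reference_cast_throws() are hypothetical names for illustration.

```cpp
#include <string>
#include <typeinfo> // for std::bad_cast

class Base {
public:
    virtual ~Base() = default; // at least one virtual function is required
};

class Derived : public Base {
public:
    std::string derivedOnly() const { return "derived-only"; }
};

// Pointer form: a failed cast yields nullptr, so check before use
std::string try_downcast(Base* b) {
    Derived* d{ dynamic_cast<Derived*>(b) };
    if (d)
        return d->derivedOnly();
    return "not a Derived";
}

// Reference form: a failed cast throws std::bad_cast
bool reference_cast_throws(Base& b) {
    try {
        Derived& d{ dynamic_cast<Derived&>(b) };
        (void)d; // silence unused-variable warnings
        return false;
    }
    catch (const std::bad_cast&) {
        return true;
    }
}
```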
In general, using virtual function should be preferred over downcasting. However, there are several cases where downcasting is the better choice:
- When you can not modify the base class to add a virtual function (e.g. because the base class is part of the standard library)
- When you need access to something that is derived-class specific (e.g. an access function that only exists in the derived class)
- When adding a virtual function to your base class doesn’t make sense (e.g. there is no appropriate value for the base class to return). Using a pure virtual function may be an option here if you don’t need to instantiate the base class.
Printing inherited classes using operator<<
Because we typically implement operator<< as a friend instead of a member function, and friends cannot be virtual, we can’t rely on C++ to pick the most derived version of the function when we print an inherited class through a base reference.
Instead of making operator<< virtual, a simple solution is to let it call a function that is virtual:
class Base {
public:
friend std::ostream& operator<<(std::ostream& os, const Base& b) {
os << b.identify();
return os;
}
virtual std::string identify() const {
return "Base";
}
};
class Derived : public Base {
public:
std::string identify() const override {
return "Derived";
}
};
int main() {
Base b{};
std::cout << b << '\n';
Derived d{};
std::cout << d << '\n'; // this works even without explicit << in Derived
Base& bref {d};
std::cout << bref << '\n';
return 0;
}
The above program will print
Base
Derived
Derived
Templates and Classes
Template classes
Let’s say we want to create a template class Array that works with any type of element. In array.h we have
#ifndef ARRAY_H
#define ARRAY_H
#include <cassert>
template <typename T>
class Array {
private:
int m_len{};
T* m_data{};
public:
Array(int len) {
assert(len > 0);
m_data = new T[len]{};
m_len = len;
}
Array(const Array&) = delete;
Array& operator=(const Array&) = delete;
~Array() {
delete[] m_data;
}
void erase() {
delete[] m_data;
m_data = nullptr;
m_len = 0;
}
T& operator[](int i);
int get_len() const {
return m_len;
}
};
template <typename T>
T& Array<T>::operator[](int i) {
assert(i >= 0 && i < m_len);
return m_data[i];
}
#endif
We can then use the template class as follows:
int main() {
const int len{12};
Array<int> int_arr{len};
Array<double> double_arr{len};
for (int i{0}; i < len; ++i) {
int_arr[i] = i;
double_arr[i] = i + 0.5;
}
for (int i{len - 1}; i >= 0; --i) {
std::cout << int_arr[i] << '\t' << double_arr[i] << '\n';
}
return 0;
}
which prints:
11 11.5
10 10.5
9 9.5
8 8.5
7 7.5
6 6.5
5 5.5
4 4.5
3 3.5
2 2.5
1 1.5
0 0.5
Notice that we cannot simply move the definition of Array<T>::operator[] into array.cpp. This is because a template is not a class or a function, but a stencil used to create classes or functions, and the compiler needs to see the full definition wherever the template is instantiated. We can leave everything inside the header file, or put the definitions we originally wanted in array.cpp into array.inl (as in “inline”) and then include array.inl at the bottom of array.h. A third solution is to write declarations in array.h, definitions in array.cpp, and then include both in another file, e.g. templates.cpp:
#include "array.h"
#include "array.cpp"
template class Array<int>; // explicitly instantiate template class here!
template class Array<double>;
Template non-type parameters
A template non-type parameter is a template parameter where the type of the parameter is predefined and is substituted for a constexpr value passed in as an argument. A non-type parameter can be any of the following types:
- An integral type
- An enumeration type
- A pointer or reference to a class object
- A pointer or reference to a function
- A pointer or reference to a class member function
- std::nullptr_t
- A floating point type (since C++20)
For example:
template <typename T, int size> // size here is an integral non-type parameter
class StaticArray {
private:
T m_array[size] {};
public:
T* get_array();
T& operator[](int i) {
return m_array[i];
}
};
template<typename T, int size>
T* StaticArray<T, size>::get_array() {
return m_array;
}
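A short usage sketch of the class above follows. The function sum_of_squares() is a hypothetical helper; StaticArray is repeated in condensed form so the snippet stands alone.

```cpp
template <typename T, int size> // size is an integral non-type parameter
class StaticArray {
private:
    T m_array[size]{};
public:
    T* get_array() { return m_array; }
    T& operator[](int i) { return m_array[i]; }
};

// The size argument must be a compile-time constant
int sum_of_squares() {
    constexpr int len{ 4 };
    StaticArray<int, len> arr{};
    for (int i{ 0 }; i < len; ++i)
        arr[i] = i * i;
    int sum{ 0 };
    for (int i{ 0 }; i < len; ++i)
        sum += arr[i];
    return sum; // 0 + 1 + 4 + 9
}
```

Note that StaticArray<int, 4> and StaticArray<int, 5> are distinct, unrelated types, because the non-type argument is part of the type.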
Function template specialization
Sometimes we want a templated function to behave slightly differently for one particular type. Say we have a templated function print that can print integers and doubles. However, we would now like doubles to be printed in scientific notation:
template <typename T>
void print(const T& t) {
std::cout << t << '\n';
}
template<>
void print<double>(const double& d) {
std::cout << std::scientific << d << '\n';
}
int main() {
print(5);
print(6.7);
return 0;
}
which prints
5
6.700000e+000
This is called function template specialization.
Class template specialization
When we want to “specialize” a class member function, we’re no longer talking about function template specialization. Instead, since the member function resides under the template class, we’re actually trying to implement class template specialization.
While we can always specialize the whole class, C++ also allows us to only specialize a certain member function and automatically implements the rest as the same as the non-specialized class:
template <typename T>
class Storage {
private:
T m_value{};
public:
Storage(T value) : m_value{value} {};
void print() {
std::cout << m_value << '\n';
}
};
template<>
void Storage<double>::print() {
std::cout << std::scientific << m_value << '\n';
} // only specializing this member function here
Partial template specialization
We can partially specialize a class template but not a function template.
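As a rough sketch, a partial specialization fixes some template parameters while leaving others open. The kind() member and the describe() overloads below are hypothetical and exist only to show which version gets selected; since function templates cannot be partially specialized, overloading is the usual substitute.

```cpp
#include <string>

// Primary class template
template <typename T, int size>
class StaticArray {
public:
    std::string kind() const { return "generic"; }
};

// Partial specialization: T is fixed to char, size is still a parameter
template <int size>
class StaticArray<char, size> {
public:
    std::string kind() const { return "char"; }
};

// Function templates are overloaded instead of partially specialized
template <typename T>
std::string describe(const T&) { return "value"; }

template <typename T>
std::string describe(T*) { return "pointer"; } // more specialized overload for pointers
```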
Exceptions
Basic exception handling
There are three keywords that work in conjunction with each other in exception handling:
throw
, try
and catch
.
In C++, a throw statement is used to signal that an exception or error case has occurred. It’s also commonly called raising an exception.
throw -1; // throw a literal integer
throw ENUM_INVALID_INDEX; // throw an enum value
throw "can you be quiet"; // throw a literal C-style string
throw dX; // throw a double variable that was previously defined
throw MyException("deadly dead"); // throw an object of class MyException
We can use the try
keyword to define a block of statements (called a try block) that acts as an observer looking for any exceptions that are thrown within:
try {
throw -1;
}
Notice that the above block doesn’t define how we’re going to handle the exception. The actual handling is done by the catch blocks.
catch (int x) {
std::cerr << "We caught an int exception with value " << x << '\n';
}
Putting them all together, we have the following example:
#include <iostream>
#include <string>
int main() {
try {
throw -1;
}
catch (int x) {
std::cerr << "We caught an int exception with value " << x << '\n';
}
std::cout << "Continuing the program now...\n";
return 0;
}
and the program will print
We caught an int exception with value -1
Continuing the program now...
There are four common things that a catch block may do when they catch an exception:
- Print what the error is
- Return something
- Throw another error
- If in main(), catch fatal errors and terminate the program in a clean way
Exceptions, functions and stack unwinding
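You can throw an exception from within a function and catch it outside, as long as some caller up the call stack made the call from within a try block. When an exception is thrown, the program first looks for a matching handler in the current function; if none is found, it checks the caller, then the caller’s caller, and so on. Once a handler is found, every function between the throw and the handler is terminated and its local variables are destroyed. This process is called stack unwinding.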
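Stack unwinding can be observed with a small tracing class. Tracer, g_trace, inner(), middle() and run() are all hypothetical names for this sketch; each Tracer appends to the log when it is destroyed.

```cpp
#include <string>
#include <utility>

// Hypothetical global log of what happens during unwinding
std::string g_trace{};

class Tracer {
private:
    std::string m_name;
public:
    explicit Tracer(std::string name) : m_name{ std::move(name) } {}
    ~Tracer() { g_trace += m_name + " destroyed; "; }
};

void inner() {
    Tracer t{ "inner" };
    throw -1; // no handler here, so the search moves to the caller
}

void middle() {
    Tracer t{ "middle" };
    inner(); // no try block here either
}

std::string run() {
    g_trace.clear();
    try {
        middle();
    }
    catch (int) {
        g_trace += "caught int";
    }
    return g_trace;
}
```

The locals of inner() and middle() are destroyed, innermost first, before the handler in run() executes.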
Uncaught exceptions and catch-all handlers
When a function throws an exception that it does not handle itself, it is making the assumption that a function somewhere up the call stack will handle the exception. When no exception handler can be found, std::terminate() is called and the application is terminated. In such cases, the call stack may or may not be unwound. If the stack is not unwound, local variables will not be destroyed and any cleanup expected upon destruction of those variables will not happen.
Fortunately, C++ provides us with a mechanism to catch all types of exceptions. This is known as a catch-all handler.
int main() {
try {
throw 5;
}
catch (double x) {
std::cout << "We caught an exception of type double: " << x << '\n';
}
catch (...) { // catch-all handler
std::cout << "We caught an exception of undetermined type\n";
}
}
Exceptions, classes and inheritance
We can throw an exception instead of using assert
to pass specific error code back to the caller. However, what if we still need more information? We resort to exception classes in this case.
class ArrayException {
private:
std::string m_error;
public:
ArrayException(std::string_view error) : m_error{error} {}
const std::string& get_error() const { return m_error; }
};
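A hedged sketch of how such an exception class might be thrown and caught is shown below. IntArray and try_access() are hypothetical; ArrayException is repeated so the snippet stands alone.

```cpp
#include <string>
#include <string_view>

class ArrayException {
private:
    std::string m_error;
public:
    ArrayException(std::string_view error) : m_error{ error } {}
    const std::string& get_error() const { return m_error; }
};

class IntArray { // hypothetical fixed-size array
private:
    int m_data[3]{};
public:
    int& operator[](int index) {
        if (index < 0 || index >= 3)
            throw ArrayException{ "Invalid index" }; // carries a description, not just a code
        return m_data[index];
    }
};

std::string try_access(int index) {
    IntArray array{};
    try {
        array[index] = 1;
        return "ok";
    }
    catch (const ArrayException& e) { // catch class-type exceptions by const reference
        return e.get_error();
    }
}
```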
Because an exception class is a class, and a class can be inherited from, an exception class can be derived too. However, handlers are checked sequentially, and a handler for a base type also matches exceptions of any derived type. So if the base handler is listed first, it catches the derived exception before the derived handler is ever considered:
class Base {
public:
Base() {}
};
class Derived : public Base {
public:
Derived() {}
};
int main() {
try {
throw Derived();
}
catch (const Base& base) {
std::cerr << "Caught Base\n";
}
catch (const Derived& derived) {
std::cerr << "Caught Derived\n";
}
return 0;
}
The above program will print
Caught Base
and to make it work properly we have to swap the order of the two catch blocks.
The C++ standard library provides std::exception (in the <exception> header), which is the base class of all exceptions thrown by the standard library, so we can easily catch all of them using this class:
#include <exception>
int main() {
try {
...
}
catch (const std::exception& exception) {
std::cerr << "Standard exception: " << exception.what() << '\n';
}
return 0;
}
When we want to handle specific standard library exceptions differently, we just add those particular catch blocks before this general block.
We can throw standard library exceptions or derive our own exception class from std::exception however we like, as long as we keep the custom exception class copyable, because the compiler makes a copy of the exception object to some piece of unspecified memory (outside of the call stack) reserved for handling exceptions:
class Base {
public:
Base() {}
};
class Derived : public Base {
public:
Derived() {}
Derived(const Derived&) = delete; // explicitly making it not copyable
};
int main() {
Derived d{};
try {
throw d; // compile error!!
}
...
}
Rethrowing exceptions
When we want to rethrow an exception, we should not just throw x (assuming x is the caught exception): throw x copy-initializes a new exception from the static type of x, so a derived exception caught by base reference gets sliced:
int main() {
try {
try {
throw Derived{};
}
catch (Base& b) {
std::cout << "Caught base b which is actually a " << b << '\n';
throw b; // the Derived obj gets SLICED here!!!
}
}
catch (Base& b) {
std::cout << "Caught base b which is actually a " << b << '\n';
}
return 0;
}
The program above prints
Caught base b which is actually a Derived
Caught base b which is actually a Base
To solve this problem, simply use a bare throw (with no operand) when we want to rethrow the current exception from within a catch block:
int main() {
try {
try {
throw Derived{};
}
catch (Base& b) {
std::cout << "Caught base b which is actually a " << b << '\n';
throw; // the Derived obj is thrown as it is
}
}
catch (Base& b) {
std::cout << "Caught base b which is actually a " << b << '\n';
}
return 0;
}
which prints
Caught base b which is actually a Derived
Caught base b which is actually a Derived
Function try blocks
Consider the following example:
class Base {
private:
int m_x{};
public:
Base(int x) : m_x{x} {
if (x <= 0)
throw 1; // thrown inside Base()
}
};
class Derived : public Base {
public:
Derived(int x) : Base{x} {
// what if we want to catch the error here?
}
};
int main() {
try {
Derived d{0};
}
catch (int) {
// caught here
}
return 0;
}
In order to catch the exception inside the constructor of Derived()
, we have to use a slightly modified try block called a function try block.
// same goes for Base
class Derived : public Base {
public:
Derived(int x) try : Base{x} { // notice the keyword try before the member initializer list
// whatever we want to do inside the constructor
}
catch (...) {
std::cerr << "Exception caught inside the constructor\n";
throw;
}
};
int main() {
try {
Derived b{0};
}
catch (int) {
std::cout << "Oops\n";
}
return 0;
}
A function try block on a constructor cannot resolve the exception or return normally: it must either throw a new exception or rethrow the current one, and reaching the end of the catch block rethrows implicitly (although we rethrew explicitly here). For other kinds of functions the rules differ. In summary we have
function type | can resolve exceptions via return statement | behavior at the end of catch block |
---|---|---|
constructor | no (must throw or rethrow) | implicit rethrow |
destructor | yes | implicit rethrow |
non-value returning function | yes | resolve exception |
value-returning function | yes | undefined behavior |
Exception dangers and downsides
Here are some problems that might occur when using exceptions:
- You might forget to clean up resources when an exception skips the cleanup code: use RAII types such as std::unique_ptr, which automatically deallocates when going out of scope
- Exceptions thrown from within a destructor during stack unwinding terminate the program: avoid throwing from destructors altogether
- Exceptions come with a performance cost
So when should we use exceptions at all?
- When the error being handled is likely to occur only infrequently
- When the error is serious and execution could not continue otherwise
- When the error cannot be handled at the place where it occurs
- When there isn’t a good alternative way to return an error code back to the caller
Exception specifications and noexcept
In C++, all functions are classified as either non-throwing or potentially throwing. To define a function as non-throwing, we use the noexcept specifier at the end of the function declaration:
void some_function() noexcept;
The noexcept here is nothing but a contractual promise that the compiler does not enforce at compile time. If an unhandled exception does try to leave a noexcept function, std::terminate will be called, and the stack may or may not be unwound first.
Functions that are implicitly non-throwing:
- Destructors
Functions that are non-throwing by default for implicitly-declared or defaulted functions:
- Constructors: default, copy, move
- Assignments: copy, move
- Comparison operators (as of C++20)
Functions that are potentially throwing (if not implicitly-declared or defaulted):
- Normal functions
- User-defined constructors
- User-defined operators
In addition to being a specifier, noexcept also serves as an operator that returns true if the compiler thinks the expression within cannot throw an exception, and false otherwise:
void foo() {throw -1;}
void boo() {};
void goo() noexcept {};
struct S{};
constexpr bool b1{ noexcept(5 + 3) }; // true; ints are non-throwing
constexpr bool b2{ noexcept(foo()) }; // false; foo() throws an exception
constexpr bool b3{ noexcept(boo()) }; // false; boo() is implicitly noexcept(false)
constexpr bool b4{ noexcept(goo()) }; // true; goo() is explicitly noexcept(true)
constexpr bool b5{ noexcept(S{}) }; // true; a struct's default constructor is noexcept by default
The noexcept
operator is checked statically at compile-time and doesn’t actually evaluate the input expression.
There are four levels of exception safety guarantees (which is yet another contractual guideline):
- No guarantee: there might be memory leak
- Basic guarantee: no memory will be leaked if an exception is thrown
- Strong guarantee: no memory leak, program state won’t be changed
- No throw / no fail guarantee: the function will always succeed (no fail) or fail without exception (no throw)
Examples of code that should be no throw:
- Destructors and memory deallocation/cleanup functions
- Functions that higher-level no-throw functions need to call
Examples of code that should be no fail:
- Move constructors and move assignments
- Swap functions
- Clear/erase/reset functions
- Operations on std::unique_ptr
- Functions that higher-level no-fail functions need to call
std::move_if_noexcept
std::move_if_noexcept returns a movable r-value if the object has a noexcept move constructor (or is not copyable at all); otherwise it returns a copyable l-value. We can use the noexcept specifier in conjunction with std::move_if_noexcept to use move semantics only when a strong exception guarantee exists (and use copy semantics otherwise).
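The selection rule can be checked with two small types. SafeMove, RiskyMove and transfer() are hypothetical names; the how member just records which constructor built the object.

```cpp
#include <string>
#include <utility> // for std::move_if_noexcept

struct SafeMove {
    std::string how{ "constructed" };
    SafeMove() = default;
    SafeMove(const SafeMove&) : how{ "copied" } {}
    SafeMove(SafeMove&&) noexcept : how{ "moved" } {} // promises not to throw
};

struct RiskyMove {
    std::string how{ "constructed" };
    RiskyMove() = default;
    RiskyMove(const RiskyMove&) : how{ "copied" } {}
    RiskyMove(RiskyMove&&) : how{ "moved" } {} // potentially throwing move
};

// Builds a new T from an existing one and reports which constructor was used
template <typename T>
std::string transfer() {
    T source{};
    T target{ std::move_if_noexcept(source) };
    return target.how;
}
```

This is the same choice standard containers such as std::vector make when they reallocate, which is why marking move constructors noexcept matters for performance.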
Input and Output (I/O)
Input and output (I/O) streams
Input and output functionality is not defined as part of the core C++ language but is provided by the C++ standard library.
We can include the <iostream> header to gain access to the hierarchy of stream classes. A stream is just a sequence of bytes that can be accessed sequentially. Specifically, input streams are used to hold input from a data producer, such as a keyboard, a file, or a network. Output streams are used to hold output for a particular data consumer, such as a monitor, a file, or a printer.
ios is a typedef for std::basic_ios<char> that defines a bunch of stuff that is common to both input and output streams. The istream class is the primary class used when dealing with input streams, in which case the extraction operator >> is used to remove values from the stream. The ostream class is the primary class used when dealing with output streams, in which case the insertion operator << is used to put values into the stream.
A standard stream is a pre-connected stream provided to a computer program by its environment. C++ comes with four predefined standard stream objects that have already been set up for your use:
- cin: an istream object tied to the standard input (typically the keyboard)
- cout: an ostream object tied to the standard output (typically the monitor)
- cerr: an ostream object tied to the standard error (typically the monitor), providing unbuffered output
- clog: an ostream object tied to the standard error (typically the monitor), providing buffered output
Input with istream
When using the extraction operator >>, we can use the std::setw() manipulator to limit the number of characters read in from a stream:
#include <iomanip>
char buf[10]{};
std::cin >> std::setw(10) >> buf;
Notice that >> skips whitespace (spaces, tabs and newlines):
char ch{};
while (std::cin >> ch) {
std::cout << ch;
}
The above prints everything without whitespace. In order to keep the whitespace we can use cin.get(), which doesn’t skip any character:
char ch{};
while (std::cin.get(ch)) {
std::cout << ch;
}
We can also specify the maximum number of characters to read:
char strBuf[11]{};
std::cin.get(strBuf, 11);
std::cout << strBuf << '\n';
Notice that we’re only reading 10 characters here since we have to leave one character for the terminator. The remaining characters were left in the istream.
There is another function, cin.getline(), which works just like cin.get() but also extracts and discards the delimiter instead of leaving it in the stream:
char strBuf[11]{};
std::cin.getline(strBuf, 11);
std::cout << strBuf << '\n';
If we need to know how many characters were extracted by the last call of getline(), we can use gcount():
std::cout << std::cin.gcount() << " characters were read\n";
There is also a special version of getline() which, instead of being a member of istream, is a free function declared in the <string> header and works with std::string:
std::string strBuf{};
std::getline(std::cin, strBuf);
std::cout << strBuf << '\n';
Here are a few more useful istream functions:
- ignore() discards the first character in the stream
- ignore(int nCount) discards the first nCount characters
- peek() allows you to read a character from the stream without removing it from the stream
- unget() returns the last character read back into the stream so it can be read again by the next call
- putback(char ch) allows you to put a character of your choice back into the stream to be read by the next call
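Several of these functions can be tried without a keyboard by reading from a std::istringstream, which is also an istream. The functions istream_demo() and count_line_chars() are hypothetical names for this sketch.

```cpp
#include <ios>     // for std::streamsize
#include <sstream>
#include <string>

std::string istream_demo() {
    std::istringstream in{ "abcdef" };
    std::string out{};
    char ch{};

    in.ignore();                         // discards 'a'
    in.ignore(2);                        // discards 'b' and 'c'
    out += static_cast<char>(in.peek()); // looks at 'd' without removing it
    in.get(ch);                          // extracts 'd'
    out += ch;
    in.unget();                          // puts 'd' back into the stream
    in.get(ch);                          // extracts 'd' again
    out += ch;
    in.get(ch);                          // extracts 'e'
    out += ch;
    return out;
}

// gcount() reports how many characters the last unformatted read extracted;
// for getline() this includes the discarded delimiter
std::streamsize count_line_chars() {
    std::istringstream in{ "hello\nworld" };
    char buf[16]{};
    in.getline(buf, 16);
    return in.gcount();
}
```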
Output with ostream and ios
The insertion operator << is used to put information into an output stream. C++ has predefined insertion operations for all the built-in data types. When using this operator, there are two ways to change the formatting options: flags and manipulators. You can think of flags as boolean variables that can be turned on and off, and manipulators as objects placed in a stream that affect the way things are input and output.
To switch a flag on, we can use the cout.setf() function with the appropriate flag as a parameter, for example:
std::cout.setf(std::ios::showpos);
std::cout << 27 << '\n';
which prints
+27
You can also turn on multiple ios flags at once using the bitwise OR |
operator:
std::cout.setf(std::ios::showpos | std::ios::uppercase);
std::cout << 1234567.89f << '\n';
which prints
+1.23457E+06
To turn a flag off, we can use std::cout.unsetf(). There is one intricacy here: turning a flag on with setf() does not automatically turn off other, mutually exclusive flags. For example:
std::cout.setf(std::ios::hex);
std::cout << 27 << '\n';
which prints
27
Why? This is because the default std::ios::dec
hasn’t been turned off yet. We need to manually turn off the decimal formatter which is mutually exclusive with std::ios::hex
:
std::cout.unsetf(std::ios::dec);
std::cout.setf(std::ios::hex);
std::cout << 27 << '\n';
and now it prints
1b
Alternatively, we can use manipulators to do the same thing without worrying about manually turning on and off these flags:
std::cout << std::hex << 27 << '\n'; // print 27 in hex
std::cout << 28 << '\n'; // we're still in hex
std::cout << std::dec << 29 << '\n'; // back to decimal
Here is a list of useful flags, manipulators and member functions:
- Print booleans as true/false or 1/0:
  - flag: std::ios::boolalpha
  - manipulator: std::boolalpha and std::noboolalpha
- Prefix positive numbers with a plus (+) sign:
  - flag: std::ios::showpos
  - manipulator: std::showpos and std::noshowpos
- Use upper case letters:
  - flag: std::ios::uppercase
  - manipulator: std::uppercase and std::nouppercase
- Print numbers in decimal/hexadecimal/octal:
  - flag: std::ios::dec, std::ios::hex and std::ios::oct
  - manipulator: std::dec, std::hex and std::oct
- Print floating point numbers with different precision and format:
  - flag: std::ios::fixed, std::ios::scientific and std::ios::showpoint
  - manipulator: std::fixed, std::scientific, std::showpoint, std::noshowpoint and std::setprecision(int)
  - function: std::ios_base::precision() (returns the current precision) and std::ios_base::precision(int)
- Width, fill characters and justification:
  - flag: std::ios::internal (left sign, right number), std::ios::left and std::ios::right
  - manipulator: std::internal, std::left, std::right, std::setfill(char), std::setw(int)
  - function: std::basic_ios::fill() (returns the current fill character), std::basic_ios::fill(char), std::ios_base::width() (returns the current width) and std::ios_base::width(int)
Specifically, here is a table showing how different precision settings and formats affect the output:
option | precision | 12345.0 | 0.12345 |
---|---|---|---|
normal | 3 | 1.23e+004 | 0.123 |
normal | 4 | 1.235e+004 | 0.1235 |
normal | 5 | 12345 | 0.12345 |
normal | 6 | 12345 | 0.12345 |
showpoint | 3 | 1.23e+004 | 0.123 |
showpoint | 4 | 1.235e+004 | 0.1235 |
showpoint | 5 | 12345. | 0.12345 |
showpoint | 6 | 12345.0 | 0.123450 |
fixed | 3 | 12345.000 | 0.123 |
fixed | 4 | 12345.0000 | 0.1234 |
fixed | 5 | 12345.00000 | 0.12345 |
fixed | 6 | 12345.000000 | 0.123450 |
scientific | 3 | 1.235e+004 | 1.234e-001 |
scientific | 4 | 1.2345e+004 | 1.2345e-001 |
scientific | 5 | 1.23450e+004 | 1.23450e-001 |
scientific | 6 | 1.234500e+004 | 1.234500e-001 |
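A few of the manipulators listed earlier can be combined in one expression. This sketch writes to a std::ostringstream rather than std::cout so the result can be inspected as a string; format_demo() is a hypothetical name.

```cpp
#include <iomanip> // for std::setprecision, std::setw, std::setfill
#include <sstream>
#include <string>

std::string format_demo() {
    std::ostringstream os{};
    os << std::boolalpha << true << ' ';                   // true
    os << std::hex << std::uppercase << 255 << ' ';        // FF
    os << std::dec << std::showpos << 27 << ' ';           // +27
    os << std::noshowpos << std::fixed
       << std::setprecision(2) << 3.14159 << ' ';          // 3.14
    os << std::setw(6) << std::setfill('*') << 42;         // ****42
    return os.str();
}
```

Most manipulators are sticky and stay in effect until changed, whereas std::setw only applies to the next item written.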
Stream classes for strings
In addition to I/O streams, there is also another set of classes called the stream classes for strings that allow you to use the familiar insertion << and extraction >> operators to work with strings. There are six stream classes for strings: istringstream (derived from istream), ostringstream (derived from ostream) and stringstream (derived from iostream) are used for reading and writing normal character-width strings, while wistringstream, wostringstream and wstringstream are used for reading and writing wide character strings. To use the stringstreams, you need to include the <sstream> header.
#include <iostream>
#include <sstream>
std::stringstream os{};
os << "well well well\n"; // works just like std::cout
constexpr int nValue{123};
constexpr double dValue{4.5};
os << nValue << ' ' << dValue; // works too
std::stringstream os2{"123 4.5"}; // a stream pre-filled with text
int n{};
double d{};
os2 >> n >> d; // works just like std::cin: n is 123, d is 4.5
os.str(""); // erase the buffer
os.clear(); // reset error flags
os << "what?\n";
std::cout << os.str(); // retrieve the string and print it
Stream states and input validation
There are four stream states in C++:
- goodbit: everything is ok
- badbit: fatal error
- failbit: non-fatal error
- eofbit: end of file
ios also provides these member functions to access the states:
- good(): returns true if no error flag is set
- bad(): returns true if badbit is set
- eof(): returns true if eofbit is set
- fail(): returns true if failbit or badbit is set
- clear(): clears all flags and restores the goodbit state
- clear(state): clears all flags and sets the given state flag
- rdstate(): returns the current state
- setstate(state): sets the given state flag in addition to those already set
In terms of input validation, there is a list of useful functions provided by the <cctype> header:
- std::isalnum(int): returns non-zero if the parameter is a letter or a digit
- std::isalpha(int): returns non-zero if the parameter is a letter
- std::iscntrl(int): returns non-zero if the parameter is a control character
- std::isdigit(int): returns non-zero if the parameter is a digit
- std::isgraph(int): returns non-zero if the parameter is a printable character that is not whitespace
- std::isprint(int): returns non-zero if the parameter is a printable character, including the space character
- std::ispunct(int): returns non-zero if the parameter is a printable character that is neither alphanumeric nor whitespace
- std::isspace(int): returns non-zero if the parameter is whitespace
- std::isxdigit(int): returns non-zero if the parameter is a hexadecimal digit (0-9, a-f, A-F)
For example:
#include <algorithm> // std::ranges::all_of
#include <cctype> // std::isalpha, std::isspace
#include <iostream>
#include <string>
#include <string_view>
bool isValidName(std::string_view name) {
    // take unsigned char: passing a negative char to the <cctype> functions is undefined behavior
    return std::ranges::all_of(name, [](unsigned char ch) {
        return std::isalpha(ch) || std::isspace(ch);
    });
}
int main() {
    std::string name{};
    do {
        std::cout << "Enter your name: ";
        std::getline(std::cin, name); // get the entire line, including spaces
    } while (!isValidName(name));
    std::cout << "Hello " << name << "!\n";
}
Basic file I/O
We need the <fstream> header for file I/O in C++.
To write stuff into a file:
#include <fstream>
#include <iostream>
int main() {
    std::ofstream outf{ "Sample.txt" };
    if (!outf) {
        std::cerr << "Uh oh, Sample.txt could not be opened for writing!\n";
        return 1;
    }
    outf << "This is line 1\n";
    outf << "This is line 2\n";
    return 0;
}
Unlike in Python, where you typically rely on a with block or an explicit close(), here the destructor of ofstream automatically closes the file when the object goes out of scope.
To read stuff from a file:
#include <fstream>
#include <iostream>
#include <string>
int main() {
    std::ifstream inf{ "Sample.txt" };
    if (!inf) {
        std::cerr << "Uh oh, Sample.txt could not be opened for reading!\n";
        return 1;
    }
    std::string strInput{};
    // two ways to read:
    while (inf >> strInput) { // this reads word by word and breaks on whitespace
        std::cout << strInput << '\n';
    }
    inf.clear(); // the first loop stopped at end of file, so reset the error flags
    inf.seekg(0); // and rewind to the start before reading again
    while (std::getline(inf, strInput)) { // alternatively, read by lines
        std::cout << strInput << '\n';
    }
    return 0;
}
The file stream constructors take an optional second parameter that allows you to specify information about how the file should be opened. The flags live in std::ios and can be combined with the bitwise OR operator (|):
- app: opens the file in append mode
- ate: seeks to the end of the file before reading/writing
- binary: opens the file in binary mode (instead of text mode)
- in: opens the file in read mode (default for ifstream)
- out: opens the file in write mode (default for ofstream)
- trunc: erases the file if it already exists
For example:
#include <iostream>
#include <fstream>
int main() {
    std::ofstream outf{ "Sample.txt", std::ios::app }; // append instead of overwrite
    if (!outf) {
        std::cerr << "Uh oh, Sample.txt could not be opened for writing!\n";
        return 1;
    }
    outf << "This is line 3\n";
    outf << "This is line 4\n";
    return 0;
}
Although the file stream automatically closes itself when the object is destroyed, we can still manually open and close a file:
std::ofstream fout{"Sample.txt"};
fout << "This is line 1\n";
fout << "This is line 2\n";
fout.close(); // close first: calling open() on a stream that is already open fails
fout.open("Sample.txt", std::ios::app); // reopen in append mode
fout << "This is line 3\n";
fout.close();
Random file I/O
Instead of reading from the beginning (in and out mode) or from the end (ate and app mode), it is also possible to do random file access, that is, to skip around to various points in the file to read its contents. This can be useful when your file is full of records and you wish to retrieve a specific one. Rather than reading all the records until reaching the desired one, we can skip directly to the record we wish to retrieve.
Random file access is done by manipulating the file pointer using either the seekg() function (“get”, for input) or the seekp() function (“put”, for output). Both functions take two parameters: an offset saying how many bytes to move the file pointer, and an ios flag that specifies where the offset is measured from. The available flags are:
- beg: offset is relative to the beginning of the file (default)
- cur: offset is relative to the current location of the file pointer
- end: offset is relative to the end of the file
For example:
fin.seekg(14, std::ios::cur); // move forward 14 bytes
fin.seekg(-18, std::ios::cur); // move backwards 18 bytes
fin.seekg(22, std::ios::beg); // move to 22nd byte in file
fin.seekg(-28, std::ios::end); // move to 28th byte before EOF
There are two more functions, tellg() and tellp(), which return the absolute position of the file pointer.
It’s worth noting that it is possible to switch between reading and writing on a file stream if you use fstream instead of ifstream or ofstream. However, you can only switch with an explicit seekg() or seekp():
std::fstream file{"Sample.txt", std::ios::in | std::ios::out};
char ch{};
while (file.get(ch)) {
    switch (ch) {
        case 'a':
        case 'e':
        case 'i':
        case 'o':
        case 'u':
        case 'A':
        case 'E':
        case 'I':
        case 'O':
        case 'U': // if we find a vowel
            file.seekg(-1, std::ios::cur); // go back 1 byte
            file << '#'; // overwrite with # (now in write mode!)
            file.seekg(file.tellg(), std::ios::beg); // switch back to read mode
            break;
    }
}
Lastly, we have the std::remove() function (from <cstdio>) to delete a file, and the is_open() member function to tell whether a file stream is currently open.
Miscellaneous Subjects
Static and dynamic libraries
A library is a package of code that is meant to be reused by many programs. Typically, a C++ library comes in two pieces:
- A header file that defines the functionality the library is exposing (offering) to the programs using it
- A precompiled binary that contains the implementation of that functionality pre-compiled into machine language
Some libraries may be split into multiple files and/or multiple header files.
There are two types of libraries, static libraries and dynamic libraries.
A static library (also known as an archive) consists of routines that are compiled and linked directly into your program. When you compile a program that uses a static library, all the functionality of the static library that your program uses becomes part of your executable. On Windows, static libraries typically have a .lib extension, whereas on Linux, static libraries typically have a .a (archive) extension. The benefits of using static libraries are:
- You only have to distribute the executable in order for users to run your program.
- This ensures the right version of the library is always used with your program
- Since static libraries become part of your program, you can use them just like functionality you’ve written for your own program
On the downside:
- Because a copy of the library becomes part of every executable that uses it, this can cause a lot of wasted space
- They tend to be hard to upgrade, as updating the library means replacing the whole executable
A dynamic library (also known as a shared library) consists of routines that are loaded into your application at runtime. When you compile a program that uses a dynamic library, the library does not become part of the executable. Instead, it remains as a separate unit. On Windows, dynamic libraries typically have a .dll (as in dynamic link library) extension, whereas on Linux, they typically have a .so (as in shared object) extension. Most linkers can build an import library for a dynamic library when the dynamic library is created.
C++ FAQ
- Why shouldn’t we use using namespace std: namespace pollution, different versions of STL may cause name collision, and the lack of std:: prefixes makes it harder for readers to understand what is from STL and what is user-defined
- Why can’t I use type T without including the header that defines T: so that you do not rely on implicit (transitive) inclusion of headers, which may or may not break on a different compiler
- My code that produces undefined behavior appears to be working fine, is it ok: no
- Why does my code that produces undefined behavior generate a certain result: it’s not important
- Why am I getting a compile error: feel free to Google it, or check the cppreference website
- Why should I include “foo.h” inside “foo.cpp”: there may be definitions inside “foo.h” that “foo.cpp” needs in order to compile; it is also good practice because it lets the compiler check for inconsistencies between the two files
- Why shouldn’t we include “foo.cpp” inside “main.cpp” just to make the project compile and work: it can result in naming collisions; it can be hard to avoid ODR violations; any change to any .cpp file will cause the whole project to be recompiled, which can take a long time
- What’s causing the error “argument list for class template XXX is missing”: most likely it’s due to the use of a feature called Class Template Argument Deduction (CTAD), which is a C++17 feature; some compilers default to an older standard such as C++14, under which the following example does not compile
#include <utility> // for std::pair
int main() {
    std::pair point{1, 2}; // CTAD to deduce std::pair<int, int>
    return 0;
}
C++ Updates
Introduction to C++11
On August 12, 2011, the International Organization for Standardization (ISO) approved a new version of C++, namely C++11. Bjarne Stroustrup characterized the goals of C++11 as:
- Build on C++’s strengths – rather than trying to extend C++ to new areas where it may be weaker (eg. Windows applications with heavy GUI), focus on making it do what it does well even better.
- Make C++ easier to learn, use, and teach – provide functionality that makes the language more consistent and easier to use.
C++11 isn’t a large departure from C++03 thematically, but it did add a huge amount of new functionality:
- auto
- char16_t and char32_t
- constexpr
- decltype
- default specifier
- delegating constructors
- delete specifier
- enum classes
- extern templates
- lambda expressions
- long long int
- move
- noexcept specifier
- nullptr
- override and final specifiers
- range-based for statements
- r-value references
- static_assert
- std::initializer_list
- trailing return type syntax
- type alias
- typedef for template classes
- uniform initialization
- user-defined literals
- variadic templates
- two >> symbols without a space between them will now properly be interpreted as closing a template object
There are also many new classes in the STL:
- Better support for multi-threading and thread-local storage
- Hash tables
- Random number generation improvements
- Reference wrappers
- Regular expressions
- std::auto_ptr has now been deprecated
- std::tuple
- std::unique_ptr
Introduction to C++14
On August 18, 2014, the ISO approved a new version of C++, namely C++14. Compared with C++11, which added a huge amount of functionality, C++14 is a relatively minor update:
- aggregate member initialization
- binary literals
- [[deprecated]] attribute
- digit separators
- function return type deduction
- generic lambdas
- relaxed constexpr functions
- variable templates
- std::make_unique
Introduction to C++17
In September 2017, the ISO approved C++17, which contains a fair amount of new content:
- __has_include preprocessor identifier to check if optional header files are available
- if statements that resolve at compile time
- initializers in if statements and switch statements
- inline variables
- fold expressions
- mandatory copy elision for some cases
- nested namespaces can now be defined as namespace X::Y
- removal of std::auto_ptr and some other deprecated types
- static_assert no longer requires a diagnostic text message parameter
- std::any
- std::byte
- std::filesystem
- std::optional
- std::shared_ptr can now manage C-style arrays (but std::make_shared cannot create them yet)
- std::size
- std::string_view
- structured binding declarations
- template deduction for constructors
- trigraphs have been removed
- typename can now be used (instead of class) in a template template parameter
- UTF-8 (u8) character literals
Introduction to C++20
In February 2020, the ISO approved C++20, which contains the most changes since C++11:
- abbreviated function templates via auto parameters
- chrono extensions for calendar and time zone support
- concepts, which allow you to put constraints on template parameters
- constexpr virtual functions, unions, try, catch, dynamic_cast and typeid
- constinit keyword, to assert that a variable has static initialization
- coroutines
- designated initializers
- immediate functions using the consteval keyword
- modules, a replacement for #include
- ranges
- std::erase
- std::make_shared for arrays
- std::map::contains()
- std::span
- string formatting library
- string literals as template parameters
- three-way comparison using the spaceship operator <=>
- using scoped enums
- views
Introduction to C++23
In February 2023, the ISO approved C++23, which includes:
- constexpr versions of <cmath> (e.g. std::abs()) and <cstdlib> functions
- constexpr versions of std::bitset and std::unique_ptr
- explicit this parameter
- formatted printing functions std::print and std::println
- literal suffixes for std::size_t and the corresponding signed type
- multidimensional subscript operator[]
- multidimensional span std::mdspan
- preprocessor directives #elifdef and #elifndef
- preprocessor directive #warning
- stacktrace library
- standard library modules std and std.compat
- static operator() and operator[]
- std::expected
- std::ranges algorithms starts_with, ends_with and contains
- std::string::contains and std::string_view::contains
- std::to_underlying to convert an enum to its underlying type
- std::unreachable()
- using unknown pointers and references in constant expressions
The End
Here are some directions you may want to explore next: