allenfrostline

(Significantly More, but Honestly Never Enough) Comprehensive Notes on C++


2024-05-14

The following notes are based on tutorials on LearnCpp.com. I take zero credit for any of the following content beyond summarizing it here and there to fit it into one single post (barely). If you find it useful, please feel free to express your gratitude toward Alex and the other authors of the original tutorial website.

Introduction / Getting Started

OK. This is a tediously long post but let’s start from the very basics.

Introduction to programming languages

A computer program is a set of instructions that the computer can perform. There are different levels of programming languages that a computer can run:

Introduction to C/C++

The C language was developed in 1972 by Dennis Ritchie at Bell Labs. C ended up being so efficient and flexible that in 1973, Ritchie and Ken Thompson rewrote most of the Unix operating system in C. In 1978, Brian Kernighan and Dennis Ritchie published the book The C Programming Language (commonly known as K&R), which served as an informal standard. In 1983, the American National Standards Institute (ANSI) formed a committee to establish a formal standard for C, and it finished in 1989 with the C89 standard. In 1990, the International Organization for Standardization (ISO) adopted C89 and published it as C90, followed by the C99 standard in 1999.

C++ was developed by Bjarne Stroustrup at Bell Labs as an extension to C, starting in 1979. C++ is often considered a superset of C, but this is not strictly true, as C99 introduced a few features that do not exist in C++. C++ was standardized by ISO in 1998 and received a minor update in 2003, called C++03. Five major updates to the language (C++11, C++14, C++17, C++20 and C++23) have been made since then, each adding new functionality. C++11 in particular added a huge number of new capabilities and is widely considered the new baseline version of the language.

The underlying design philosophy of C/C++ can be summed up as “trust the programmer”, which is both wonderful and dangerous.

Compiler, linker and libraries

Workflow: write the C++ program -> compile the program -> link the object files -> test the program, going back to the compile step whenever debugging calls for changes.

The compiler translates *.cpp to corresponding *.o object files. The linker combines all *.o object files, together with C++ Standard Library and other libraries, and then generates an executable file e.g. something.exe on Windows. This whole process is usually referred to as building. The specific executable produced as the result of building is sometimes called a build.

An integrated development environment (IDE) combines the software involved in the steps above (editor, compiler, linker and debugger) into a single package.

C++ Basics

Preprocessor directive

Preprocessor directives are lines that start with a #, such as the #include lines at the top of source files, e.g.

#include <iostream>

which here indicates that we would like to use the contents of the iostream library (which is the part of the C++ standard library that allows us to read and write text from/to the console).

Statements

A computer program is a sequence of instructions that tell the computer what to do. A statement is a type of instruction that causes the program to perform some action. Most (but not all) statements in C++ end in a semicolon. There are many different kinds of statements in C++:

Functions and the main function

A function is a collection of statements that get executed sequentially (in order, from top to bottom). The name of a function is called its identifier.

Every C++ program must have a main function.

Comments

std::cout << "Hello world!";  // this is a single-line comment
/* this is a multi-line
   comment in C++ */

Data, values, objects and variables

Data on a computer is typically stored in a format that is efficient for storage or processing (and is thus not human readable). A single piece of data is called a value. An object in C++ is a region of storage (usually memory) that can store a value and has other associated properties. Although objects in C++ can be unnamed (anonymous), more often we name our objects using an identifier. An object with a name is called a variable.

Variable instantiation

We use a special kind of declaration statement called a definition to create a variable, e.g.

int x;

At compile-time, when the compiler sees this statement, it makes a note to itself that we are defining a variable named x of type int. Moving forward, whenever the compiler sees x, it knows that we’re referencing this variable. In C++, the type of a variable must be known at compile-time and cannot be changed without recompiling the whole program.

At runtime, the variable will be instantiated, which means the object will be created and assigned a memory address.

You can define multiple variables in either of the following ways:

int a;
int b;
// or
int a, b;

Variable Initialization

There are 6 basic ways to initialize a variable in C++:

int a;       // default initialization
int b = 5;   // copy initialization
int c(6);    // direct initialization
int d {7};   // direct list initialization
int e = {8}; // copy list initialization
int f {};    // empty list / value initialization

The modern way to initialize objects in C++ is to use a form of initialization that makes use of curly braces. Informally, this is called list initialization (or uniform initialization, or brace initialization). Besides providing a uniform syntax for initializing both single values and lists of values, list initialization has an added benefit: it disallows narrowing conversions. This means that if you try to brace-initialize a variable using a value that the variable cannot safely hold, the compiler will produce an error, instead of merely a warning or, even worse, silently discarding the part of the value that doesn't fit:

int width = 4.5;   // warning: implicit conversion from 'double' to 'int' changes value from 4.5 to 4
int width { 4.5 }; // error: a number with a fractional value can't fit into an int

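You can initialize multiple variables in a single statement in the following ways. Beware of the last three, though: they only initialize b and leave a uninitialized. This is a common mistake rather than a compile error.

int a = 5, b = 6;
int a(5), b(6);
int a{5}, b{6};
int a = {5}, b = {6};
int a {}, b {};
int a, b(5);   // wrong: only b is initialized
int a, b{5};   // wrong: only b is initialized
int a, b = 5;  // wrong: only b is initialized, a is left uninitialized!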

Unused initialized variables warning

Compilers will typically generate a warning if a variable is initialized but never used. For example:

int main() {
    int x { 5 };
    return 0;
}

When compiling the program above, the following warning is generated

main.cpp:2:9: warning: unused variable 'x' [-Wunused-variable]
    int x { 5 };
        ^
1 warning generated.

There are a few ways to fix this. First, if the variable is really unused, we can simply remove its definition from the program. Second, we can use it somewhere, e.g. std::cout << x. When neither is desirable, C++17 introduced the [[maybe_unused]] attribute, which tells the compiler that we're okay with the variable being unused.

int main() {
    [[maybe_unused]] int x { 5 };
    return 0;
}

Introduction to iostream: cout, cin and endl

We need to include iostream library so that we have access to std::cout etc. We can use std::cout with an insertion operator << to send the string to the console to be printed. Similarly, we can print a newline whenever a line of output is complete by using std::endl.

It's good to know that std::cout is buffered: output is collected in a buffer and periodically written to the output device in batches, which is much faster than writing each piece immediately. The risk is that anything still sitting in the buffer may never be printed if the program crashes or aborts, or may not show up yet while the program is paused (e.g. for debugging). For the same underlying reason, using '\n' instead of std::endl is generally preferred for performance: std::endl moves the cursor to the next line and also flushes the buffer, whereas '\n' only moves the cursor to the next line.

std::cout << "x is good" << std::endl; // this is slower than below
std::cout << "y is better\n";          // no flushing, just newline

Whereas std::cout uses an insertion operator <<, std::cin reads the input using an extraction operator >>. The input must be stored in a variable to be used later.

int x{};
std::cin >> x;

It's (also) good to know that the C++ I/O library does not provide a way to capture keyboard input without the user pressing ENTER. To do that, you need to resort to a third-party library, e.g. pdcurses, FTXUI, cpp-terminal or notcurses.

Let’s take a look at the following program

#include <iostream>

int main() {
    std::cout << "Enter a number: ";
    int x{};
    std::cin >> x;
    std::cout << "You entered " << x << '\n';
    return 0;
}

What happens when we enter

Uninitialized variables

In C++, leaving the variables uninitialized can be dangerous, so we need to know what will happen when e.g. an integer is defined yet uninitialized:

#include <iostream>

int globalVar;  // [GOOD] global variable is zero-initialized

void function() {
    static int staticVar;  // [GOOD] static local variable is zero-initialized
}

int main() {
    int localVar1;  // [BAD] local variable has an indeterminate value when uninitialized
    std::cout << localVar1;  // [BAD] undefined behavior: may print any value or even crash
    int localVar2{};  // [GOOD] local variable is value-initialized (to zero) with empty braces
    int array[5] = {42};  // [GOOD] elements without an explicit initializer are zero-initialized
}

Keywords (aka reserved words)

There are 92 keywords as of C++23, which all have special meanings in the C++ language:

alignas const_cast int static_assert
alignof continue long static_cast
and co_await (C++20) mutable struct
and_eq co_return (C++20) namespace switch
asm co_yield (C++20) new template
auto decltype noexcept this
bitand default not thread_local
bitor delete not_eq throw
bool do nullptr true
break double operator try
case dynamic_cast or typedef
catch else or_eq typeid
char enum private typename
char8_t (C++20) explicit protected union
char16_t export public unsigned
char32_t extern register using
class false reinterpret_cast virtual
compl float requires (C++20) void
concept (C++20) for return volatile
const friend short wchar_t
consteval (C++20) goto signed while
constexpr if sizeof xor
constinit (C++20) inline static xor_eq

C++ also defines special identifiers: override, final, import and module. These have special meanings when used in certain contexts but are not reserved otherwise.

Identifier naming best practices

Whitespace rules

Basic coding style

Literals

Literals are values that are inserted directly into the source code. These values usually appear directly in the executable code (unless they are optimized out). In contrast, objects and variables represent memory locations that hold values and these values can be fetched on demand.

Operators

In mathematics, an operation is a process involving zero or more input values (called operands) that produces a new value (called an output value). The specific operation to be performed is usually denoted by a symbol called an operator. The same holds in C++. Some operators are represented by symbols, e.g. +, -, *, /, =, insertion <<, extraction >> and equality ==. Others are reserved words, e.g. new, delete and throw.

The number of operands an operator takes as input is called the operator’s arity. Operators in C++ come in four different arities:

Note that some operators have different meanings depending on context, e.g. operator - can mean both unary negation (-x) and binary subtraction (x - y).

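There is a famous abbreviation for the order in which arithmetic operations are evaluated: PEMDAS (parentheses > exponents > multiplication/division > addition/subtraction). Note that C++ has no exponent operator; exponentiation is done with a function such as std::pow from <cmath>.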

Functions and Files

Introduction to functions

The syntax/form of a general value-returning function is

return_type function_name() {
    // function body
    return return_value;
}

and for a non-value-returning function

void function_name() {
    // function body
}

The C++ standard only defines the meaning of 3 status codes for programs: 0, EXIT_SUCCESS and EXIT_FAILURE. Both 0 and EXIT_SUCCESS mean the program executed successfully. EXIT_FAILURE means the program did not execute successfully.

#include <cstdlib>  // for EXIT_SUCCESS and EXIT_FAILURE

int main() {
    return EXIT_SUCCESS;
}

For maximum portability, we only use 0 or EXIT_SUCCESS to indicate a successful termination, and EXIT_FAILURE to indicate an unsuccessful termination. Function main will implicitly return 0 if no return statement is provided. That being said, the best practice is still to return a value from main explicitly.

C++ disallows calling the main function directly.

Function parameters and arguments

When a function is called, all of its parameters, as declared in the function header, are created as variables, and the value of each argument is copied into the matching parameter (using copy initialization). This process is called pass by value, and function parameters that use pass by value are called value parameters.

Local variables

Variables defined inside the body of a function are called local variables (as opposed to global variables). They are destroyed, in the reverse order of their creation, at the end of the set of curly braces in which they are defined. A local variable's scope (the region of code in which its identifier can be used) likewise runs from its point of definition to the end of that set of curly braces.

Forward declarations

The following program will throw a compile error

#include <iostream>

int main() {
    std::cout << "The sum of 3 and 4 is " << add(3, 4) << '\n';
    return 0;
}

int add(int x, int y) {
    return x + y;
}

because the compiler compiles each code file sequentially, from top to bottom. When main uses add, the identifier add has not been declared yet, so the compiler reports an error along the lines of "identifier not found". To address this problem, there are two options:

Declaration and definition

In C++, all definitions are declarations, but not always the other way around.

Definition: implements a function or instantiates a variable. Definitions are always declarations. Examples: void foo() {} (function definition), int x; (variable definition)
Declaration: tells the compiler about an identifier and its associated type information. Examples: void foo(); (function declaration), int x; (variable declaration)
Pure declaration: a declaration that is not a definition. Example: void foo();
Initializer: provides an initial value for a defined object. Example: in int x {2};, the 2 is the initializer

The one definition rule (ODR)

The one definition rule (aka ODR) is a well-known rule in C++, which consists of three parts:

Violating the first part will issue a redefinition compile error. Violating part two will cause the linker to issue a redefinition error. Violating part three will cause undefined behavior.

Functions that share the same identifier but have different sets of parameters are considered distinct functions, so they don't violate parts one and two of the ODR above.

Programs with multiple files

To turn the following program (which fails to compile, see above) into a multi-file program

#include <iostream>

int main()
{
    std::cout << "The sum of 3 and 4 is: " << add(3, 4) << '\n';
    return 0;
}

int add(int x, int y)
{
    return x + y;
}

we might just write an app.cpp as

int add(int x, int y) {
    return x + y;
}

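and main.cpp as follows. Note that main.cpp still needs a forward declaration of add: each file is compiled separately, so the compiler cannot see the definition in app.cpp while compiling main.cpp.

#include <iostream>

int add(int x, int y);  // forward declaration; the definition lives in app.cpp

int main() {
    std::cout << "The sum of 3 and 4 is: " << add(3, 4) << '\n';
    return 0;
}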

Naming collisions and namespaces

Most naming collisions occur in two ways:

C++ provides plenty of mechanisms for avoiding naming collisions. Besides local scope, we have namespaces:

There are two ways to use a namespace e.g. std:

Preprocessors

Before compilation happens, the preprocessor makes various changes to the text of each code file. In modern compilers, the preprocessor is usually built right into the compiler itself. Most of what the preprocessor does is fairly uninteresting, e.g. stripping out comments and ensuring each code file ends in a newline, but (most importantly) it also processes the #include directives.

Preprocessor directives

When the preprocessor runs, it scans through the code file (from top to bottom), looking for preprocessor directives. Preprocessor directives (often called directives) are instructions that start with a # symbol and end with a newline (NOT a semicolon). These directives tell the preprocessor to perform certain text manipulation tasks. Note that the preprocessor does not understand C++ syntax; instead, the directives have their own syntax (which in some cases resembles C++ syntax, and in other cases, not so much).

Below are some of the most common preprocessor directives:

The scope of #define

The preprocessor doesn't understand C++, so all preprocessor directives are resolved before compilation, from top to bottom, on a file-by-file basis. As a result, the macro MY_NAME below is defined as soon as the preprocessor reaches it, even though foo is never called.

#include <iostream>

void foo() {
#define MY_NAME "Alex"
}

int main() {
 std::cout << "My name is: " << MY_NAME << '\n';
 return 0;
}

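and the following code will print Not printing!, because a #define is only in effect from its point of definition to the end of the file in which it appears. PRINT is defined in main.cpp, so it is never visible while function.cpp is preprocessed.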

function.cpp:

#include <iostream>

void doSomething() {
#ifdef PRINT
    std::cout << "Printing!\n";
#endif
#ifndef PRINT
    std::cout << "Not printing!\n";
#endif
}

main.cpp:

void doSomething(); // forward declaration for function doSomething()

#define PRINT

int main() {
    doSomething();
    return 0;
}

Header files

The main purpose of a header file (.h or .hpp) is to propagate declarations to code (.cpp) files. Source files (.cpp) should always include their own paired header file (.h or .hpp) if one exists. Source files should never be #included, even though the preprocessor is technically able to do so.

We use angled brackets <> for header files that we did not write ourselves. The preprocessor will then search for the header only in the include directories, which are configured as part of the project/IDE/compiler settings and typically default to the directories containing the header files that come with your compiler and/or OS.

When we use double-quotes "" for the header file, we’re telling the preprocessor that the header file is written by us and should be first searched in the current directory. If it can’t find the file, it will then search the include directories.

When including header files from another directory, you could use relative paths like the following, but this is not recommended:

#include "headers/myHeader.h"
#include "../moreHeaders/myOtherHeader.h"

Instead, it’s advised to change the IDE/compiler setting to include the extra include directories e.g.

g++ -o main -I/folder/other/headers main.cpp

The same works in VS Code: add -I/folder/other/headers to the args array in your project's tasks.json.

The #include order of headers

The best practice is to include headers in the following order:

The headers for each group should be sorted alphabetically unless documentation for a 3rd party library instructs you to do otherwise.

Header file best practices (summary)

Header guards

The following project has header guards to prevent including duplicate definitions

square.h:

#ifndef SQUARE_H
#define SQUARE_H

int getSquareSides() {
    return 4;
}

#endif

wave.h:

#ifndef WAVE_H
#define WAVE_H

#include "square.h"

#endif

main.cpp:

#include "square.h"
#include "wave.h"

int main() {
    return 0;
}

However, header guards only prevent a header from being included twice within the same code file. If we add a source file square.cpp as follows, square.h gets included into both square.cpp and main.cpp, so getSquareSides is defined once in each file, and the linker reports a duplicate definition error:

square.cpp:

#include "square.h"

int getSquarePerimeter(int sideLength) {
    return sideLength * getSquareSides();
}

To solve this issue, we can move all definitions into square.cpp and leave only function declarations in square.h:

square.h:

#ifndef SQUARE_H
#define SQUARE_H

int getSquareSides();
int getSquarePerimeter(int sideLength);

#endif

square.cpp:

#include "square.h"

int getSquareSides() {
    return 4;
}

int getSquarePerimeter(int sideLength) {
    return sideLength * getSquareSides();
}

How about #pragma once

Most modern compilers support a simpler (but non-standard, and not as safe) alternative to header guards:

#pragma once
// define whatever

Note: if a header file is copied so that it exists in multiple places on the file system and both copies somehow get included, header guards will successfully de-duplicate the identical headers, but #pragma once won't, because the compiler won't realize the two files actually have identical content.

Debugging C++ Programs

Debugging tactics

Several tips on debugging a C++ program:

Fundamental Data Types

Bits, bytes and memory addressing

The smallest unit of memory is a binary digit (aka a bit), which can hold a 0 or a 1. Memory is organized into sequential units called bytes, and on modern systems a byte is, by de facto standard, 8 bits.

Fundamental data types

Here is a list of fundamental data types in C++

Category: data types (minimum size in bytes). Meaning. Example.

Floating point: float (4), double (8), long double (8). A number with a fractional part. Example: 3.14159
Integral (Boolean): bool (1). True or false. Example: true
Integral (character): char (1), wchar_t (1), char8_t (1, C++20), char16_t (2, C++11), char32_t (4, C++11). A single character of text. Example: 'c'
Integral (integer): short int (2), int (2), long int (4), long long int (8, C++11). Whole numbers including zero. Example: 64
Null pointer: std::nullptr_t (4, C++11). A null pointer. Example: nullptr
Void: void. No type. Example: n/a

The void type

There are three use cases of the void type:

The sizeof operator

See the table above for the minimum sizes in bytes of the fundamental data types. Note that using sizeof on an incomplete type such as void results in a compilation error. Also, sizeof does not take into account any dynamically allocated memory used by an object, which needs further discussion.

On a side note, how fast a type is in C++ doesn't really depend on how much memory it uses. Rather than "smaller is faster", CPUs are often optimized to process data of a particular size, so on a 32-bit CPU a 32-bit int may well be faster than a 16-bit short or an 8-bit char.

Signed integer ranges

Again, C++ only specifies the minimum sizes of the fundamental data types: int and short are at least 2 bytes, long at least 4 bytes and long long at least 8 bytes. This means the actual size of each integer type can vary between implementations. The ranges corresponding to each size are listed below:

Size / type: range
8 bit / ?: -128 to 127
16 bit / short, int: -32,768 to 32,767
32 bit / long: -2,147,483,648 to 2,147,483,647
64 bit / long long: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Integer overflow, i.e. an arithmetic result that falls outside the range of its signed integer type, results in undefined behavior.

Unsigned integer ranges

The corresponding range table for unsigned integer types is:

Size / type: range
8 bit / ?: 0 to 255
16 bit / unsigned short, unsigned int: 0 to 65,535
32 bit / unsigned long: 0 to 4,294,967,295
64 bit / unsigned long long: 0 to 18,446,744,073,709,551,615

Unsigned integer overflow: if a value is out of range for an unsigned type, it is divided by one greater than the largest number the type can hold, and only the remainder is kept. For example, the number 280 is too big for a 1-byte unsigned integer, so 280 % 256 = 24 is what ends up being stored.

Fixed-width integers, fast and least integers

Unlike the native C/C++ integer types, whose sizes vary (only the minimum sizes are specified in the table above), C99 introduced fixed-width integers in the stdint.h header, which C++11 adopted as the <cstdint> header:

#include <cstdint>
#include <iostream>

int main() {
  std::int16_t i{5};
  std::cout << i << '\n';
  return 0;
}

However, the fixed-width integers are not guaranteed to be defined on all architectures, nor are they necessarily faster than the native C/C++ integer types. This is where the fast and least integer types come in:

#include <cstdint>
#include <iostream>

int main() {
  std::cout << "least 8: " << sizeof(std::int_least8_t) * 8 << " bits\n";
  std::cout << "least 16: " << sizeof(std::int_least16_t) * 8 << " bits\n";
  std::cout << "least 32: " << sizeof(std::int_least32_t) * 8 << " bits\n";
  std::cout << "fast 8: " << sizeof(std::int_fast8_t) * 8 << " bits\n";
  std::cout << "fast 16: " << sizeof(std::int_fast16_t) * 8 << " bits\n";
  std::cout << "fast 32: " << sizeof(std::int_fast32_t) * 8 << " bits\n";
  return 0;
}

where the least integer types are the smallest integer types with a width of at least the given number of bits, and the fast integer types are the fastest integer types (as judged by the implementation) with a width of at least the given number of bits.

The downsides of these fast and least integers are that they are rarely used in practice, and that their actual sizes can vary across architectures, which can make program behavior differ from platform to platform.

Best practice for integral types

Best practice:

Avoid the following if possible:

What is std::size_t

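std::size_t is an unsigned integer type that is large enough to hold the size (in bytes) of any object on the system; it is also the type that sizeof returns. Consequently, its largest value is an upper limit on the size of any object.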

Floating point numbers

Category / data type (minimum size in bytes): typical size in bytes
floating point / float (4): 4
floating point / double (8): 8
floating point / long double (8): 8, 12 or 16

Remember to match the type of literal in initialization to the type of the variable, e.g.

int x{5};
double y{5.0};
float z{5.0F}; // notice the suffix F (f also works), which makes this a float literal

When printing floating point numbers, there are a few interesting behaviors worth noting:

#include <iostream>

int main()
{
 std::cout << 5.0 << '\n';
 std::cout << 6.7F << '\n';
 std::cout << 9876543.21 << '\n';

 return 0;
}

The output of above program is

5
6.7
9.87654e+06

because

We can override the precision that std::cout shows by default using an output manipulator function named std::setprecision():

#include <iomanip> // for output manipulator std::setprecision()
#include <iostream>

int main()
{
    std::cout << std::setprecision(17); // show 17 digits of precision
    std::cout << 3.33333333333333333333333333333333333333f <<'\n'; // f suffix means float
    std::cout << 3.33333333333333333333333333333333333333 << '\n'; // no suffix means double

    return 0;
}

which will now print

3.3333332538604736
3.3333333333333335

However, there’s a concept called rounding error that makes precision handling in floats a headache. See this example

#include <iomanip> // for std::setprecision()
#include <iostream>

int main()
{
    float f { 123456789.0f }; // f has 10 significant digits
    std::cout << std::setprecision(9); // to show 9 digits in f
    std::cout << f << '\n';

    return 0;
}

The output of the above program is 123456792, which is a different number. That's because 123456789.0 has 10 significant digits, while a float typically offers only about 7 digits of precision, so the number cannot be stored exactly and the nearest value a float can represent is stored instead. A corollary of this is to be wary of using floating point numbers for financial or currency data.

The typical ranges and precisions of floating point numbers (assuming IEEE 754 representation) are as below:

Size: range; precision
4 bytes: ±1.18 x 10^-38 to ±3.4 x 10^38 and 0.0; 6-9 significant digits, typically 7
8 bytes: ±2.23 x 10^-308 to ±1.80 x 10^308 and 0.0; 15-18 significant digits, typically 16
80 bits (typically uses 12 or 16 bytes): ±3.36 x 10^-4932 to ±1.18 x 10^4932 and 0.0; 18-21 significant digits
16 bytes: ±3.36 x 10^-4932 to ±1.18 x 10^4932 and 0.0; 33-36 significant digits

NaN and infinity

There are two special categories of floating point numbers, NaN (not a number) and infinity.

#include <iostream>

int main()
{
    double zero {0.0};
    double posinf { 5.0 / zero }; // positive infinity
    std::cout << posinf << '\n';

    double neginf { -5.0 / zero }; // negative infinity
    std::cout << neginf << '\n';

    double nan { zero / zero }; // not a number (mathematically invalid)
    std::cout << nan << '\n';

    return 0;
}

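The output of the above program is typically as follows (the exact spelling is platform-dependent, e.g. some platforms print -nan for the last line):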

inf
-inf
nan

Best practice is to avoid using NaN or infinity at all.

Boolean values

bool b;
bool b1{true};
bool b2{false};
b1 = false;
bool b3{};  // value-initialized to false (0)
bool b4{!true};  // initialized to false
bool b5{!false};  // initialized to true

Notice that by default std::cout prints boolean values as integers, i.e. 1 for true and 0 for false. If you want std::cout to print true or false instead, you need to use another manipulator, std::boolalpha:

#include <iostream>

int main() {
    std::cout << true << '\n';
    std::cout << false << '\n';

    std::cout << std::boolalpha;  // from now on, true/false will be printed

    std::cout << true << '\n';
    std::cout << false << '\n';

    std::cout << std::noboolalpha;  // from now on, 1/0 will be printed

    std::cout << true << '\n';
    std::cout << false << '\n';
}

The conversion between a boolean and an integer variable is possible:

#include <iostream>

int main() {
    bool bFalse{0};  // ok
    bool bTrue{1};  // ok
    bool bNo{2};  // error (narrowing conversion is disallowed)
    bool b1 = 2;  // ok (copy initialization allows implicit conversion)
}

Inputting boolean values can be a bit unintuitive. See example below:

#include <iostream>

int main() {
    bool b{};
    std::cout << "Enter a boolean value: ";
    std::cin >> b;
    std::cout << "You entered " << b << '\n';
    return 0;
}

The output (if you type true) is

Enter a boolean value: true
You entered 0

because by default std::cin only accepts 0 or 1 for boolean variables. The input true therefore makes the extraction fail silently, and b ends up false (since C++11, a failed extraction also writes 0, i.e. false, to the variable). To allow std::cin to accept true and false, we again need to turn on the alphabetical mode for booleans:

#include <iostream>

int main() {
    bool b{};
    std::cout << "Enter a boolean value: ";
    std::cin >> std::boolalpha;
    std::cin >> b;
    std::cout << "You entered " << b << '\n';
    return 0;
}

Notice when std::boolalpha is enabled for std::cin, 0 and 1 will no longer work.

Chars

Some information worth noting:

Implicit type conversion vs explicit static_cast

Implicit type conversion happens in the below program:

#include <iostream>

void print(int x) {
 std::cout << x << '\n';
}

int main() {
 print(5.5); // warning: we're passing in a double value

 return 0;
}

The output is 5 in this case, as an int cannot hold a fractional part, so the value is truncated. Brace initialization, on the other hand, disallows implicit narrowing conversions (while still allowing safe ones):

int main() {
    double d { 5 }; // okay: int to double is safe
    int x { 5.5 }; // error: double to int not safe
    return 0;
}

Meanwhile, explicit type conversion is supported via the static_cast operator:

#include <iostream>

void print(int x) {
 std::cout << x << '\n';
}

int main() {
 print( static_cast<int>(5.5) ); // explicitly convert double value 5.5 to an int
 return 0;
}

and the output is still 5 here, as expected. It's worth noting that by converting a char into an int we can effectively get the ASCII code of a character:

#include <iostream>

int main() {
    char ch{97}; // 97 is ASCII code for 'a'
    std::cout << ch << " has value " << static_cast<int>(ch) << '\n';  // print value of variable ch as an int
    return 0;
}

where the output is

a has value 97

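Meanwhile, be careful when using static_cast to convert an unsigned integer to a signed integer type: if the value doesn't fit in the signed type, the result was implementation-defined before C++20 (since C++20 it wraps modulo 2^N), so the cast silently yields a different value. Doing this without a range check is definitely not recommended.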

Constants and Strings

Two types of constants

C++ supports two different types of constants:

Three types of named constants

There are three ways to define a named constant in C++:

In the first way, namely constant variables, do remember that constant variables must be initialized when they are defined. Function parameters can also be declared const, but this is mostly useful when they are passed by reference or by pointer/address: a parameter passed by value is a copy, so changing it can't affect the caller's argument anyway. A function's return value can also be declared const, but for fundamental types the const qualifier is ignored, and for other types it can get in the way of optimizations, so const is generally not used when returning by value.

In the second way, we can define a constant via an object-like macro, e.g.:

#include <iostream>
#define MY_NAME "Allen"

int main() {
    std::cout << "My name is " << MY_NAME << '\n';
    return 0;
}

There are, however, at least three major reasons why people prefer the first method over using preprocessor macros:

Type qualifiers

We already know that const is a type qualifier. In fact, as of C++23 there are only two type qualifiers in C++: const and volatile. The latter is rarely used; it tells the compiler that an object's value may change at any time, which in exchange disables certain kinds of optimizations.

The as-if rule

In C++, compilers are given a lot of leeway to optimize programs. The as-if rule says that the compiler can modify a program however it likes in order to improve performance, as long as those modifications don't change the program's observable behavior. The only exception is that unnecessary calls to a copy constructor can be elided even if those copy constructors do have observable behavior.

For example, the following program can be optimized step by step:

Version 0 (original program):

#include <iostream>

int main() {
 int x { 3 + 4 };
 std::cout << x << '\n';
 return 0;
}

Version 1:

#include <iostream>

int main() {
 int x { 7 };
 std::cout << x << '\n';
 return 0;
}

Version 2:

#include <iostream>

int main() {
 std::cout << 7 << '\n';
 return 0;
}

Compile-time and runtime constants

Constants that are fixed at compile-time are called compile-time constants (like above). Constant variables that are only initialized at runtime are called runtime constants (see below).

#include <iostream>

int get_number() {
    std::cout << "Enter a number: ";
    int y{};
    std::cin >> y;
    return y;
}

int main() {
    const int x{3};  // x is compile-time constant
    const int y{get_number()};  // y is runtime constant
    const int z{x + y};  // z is runtime constant
    return 0;
}

Compile-time constants enable optimizations, while runtime constants only guarantee that the object's value won't be changed after initialization.

The constexpr keyword

It’s sometimes hard to tell whether a variable will end up being compile-time constant or runtime constant, for example:

int x { 5 };       // not const at all
const int y { x }; // obviously a runtime const (since initializer is non-const)
const int z { 5 }; // obviously a compile-time const (since initializer is a constant expression)
const int w { getValue() }; // not obvious whether this is a runtime or compile-time const

Notice that, depending on how the function getValue is defined, w may actually be a compile-time constant here; we can't tell just by looking. Fortunately, we can enlist the compiler's help to ensure we get a compile-time constant when we want one, simply by using constexpr instead of const. If a variable is declared constexpr but its initializer is not a constant expression, the compiler will produce an error. As a result, the best practice is to use constexpr for any variable that should not be modifiable after initialization and whose initializer is known at compile-time.

There are, however, some types that currently can't be used for constexpr variables, including std::string, std::vector and other types that use dynamic memory allocation. For those objects we need to use const instead. Also, function parameters cannot be declared constexpr: a constexpr object must be initialized with a compile-time constant, while a parameter is initialized with whatever argument the caller passes, possibly at runtime.

Constant folding

Example 1:

#include <iostream>

int main() {
 constexpr int x { 3 + 4 }; // 3 + 4 is a constant expression
 std::cout << x << '\n';    // this is a runtime expression
 return 0;
}

Example 2:

#include <iostream>

int main() {
 std::cout << 3 + 4 << '\n'; // this is a runtime expression
 return 0;
}

In example 1 above, x is a constexpr variable, so its initializer 3 + 4 must be evaluated at compile-time, and the compiler can use the value 7 wherever x appears. In example 2, 3 + 4 appears inside a runtime expression and isn't required to be evaluated at compile-time, but thanks to an optimization called constant folding (permitted by the as-if rule), the compiler will typically still compute it at compile-time.

Literal suffixes

Data type Suffix Meaning
integral U unsigned int
L long
UL unsigned long
LL long long
ULL unsigned long long
Z the signed version of std::size_t (C++23)
UZ std::size_t (C++23)
floating point F float
L long double
string s std::string
sv std::string_view

Lower-case versions of the numeric suffixes are accepted as well, but lower-case l is best avoided because it looks similar to 1. The string suffixes s and sv, on the other hand, exist only in lower case and require using namespace std::literals (or std::string_literals / std::string_view_literals). Another thing to notice is that C-style string literals in C++ are arrays of const char with an implicit null terminator. These C-style string literals are objects that are created at the start of the program and guaranteed to exist for the entirety of the program. In contrast, std::string and std::string_view literals create temporary objects that have to be used immediately, as they're destroyed at the end of the full expression in which they're created.

Magic numbers

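Magic numbers, i.e. literals whose meaning is unclear or that may need to change later, used directly in code, should usually be avoided in favor of named constants.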

Numeral systems: decimal, binary, hexadecimal and octal

By default, integer literals in C++ are written in decimal. Including decimal, there are 4 main numeral systems available in C++: decimal (base 10), binary (base 2), hexadecimal (base 16) and octal (base 8).

Decimal Binary Hexadecimal Octal
0 0 0 0
1 1 1 1
2 10 2 2
3 11 3 3
4 100 4 4
5 101 5 5
6 110 6 6
7 111 7 7
8 1000 8 10
9 1001 9 11
10 1010 A 12
11 1011 B 13
12 1100 C 14
13 1101 D 15
14 1110 E 16
15 1111 F 17
16 10000 10 20

To use an octal literal, prefix the literal number with a 0 (zero):

#include <iostream>

int main() {
    int x{012};  // octal 12
    std::cout << x << '\n';
    return 0;
}

and the output would be 10, as numbers are output in decimal by default.

To use a hexadecimal literal, prefix your literal number with a 0x (zero x):

#include <iostream>

int main() {
    int x{0xF};  // hex F
    std::cout << x << '\n';
    return 0;
}

and the output would be 15. Each hexadecimal digit represents 16 = 2^4 possible values (4 bits), so two hexadecimal digits make up a full byte. As a result, a 32-bit integer can be represented concisely by eight hexadecimal digits.

To use a binary literal, prefix your literal number with a 0b (zero b):

#include <iostream>

int main() {
    int x1{0b1};  // binary 1
    int x2{0b11110001};  // binary 11110001
    int x3{0b1111'0001};  // binary 11110001 with a quotation mark as visual separator
    std::cout << x1 << ' ' << x2 << ' ' << x3 << '\n';
    return 0;
}

and the output is 1 241 241. Notice how we used a single quotation mark as a digit separator (C++14); this also works for the other numeral systems, e.g. a long decimal integer.

To print values in a numeral system other than the current one, we can use the std::dec, std::oct and std::hex manipulators. Note that they stay in effect until changed:

#include <iostream>

int main() {
    int x{12};
    std::cout << x << '\n';  // default is dec
    std::cout << std::hex << x << '\n';  // now hex
    std::cout << x << '\n';  // still hex
    std::cout << std::oct << x << '\n';  // now oct
    std::cout << x << '\n';  // still oct
    std::cout << std::dec << x << '\n';  // now back to dec
    std::cout << x << '\n';  // still dec
    return 0;
}

As you may have guessed, outputting values in binary is a little harder than the other three. We need to use a type called std::bitset from the C++ standard library (specifically, the <bitset> header) to get the job done:

#include <bitset>
#include <iostream>

int main() {
    std::bitset<8> x1{0b1100'0101};  // an 8-digit binary number
    std::bitset<8> x2{0xC5};  // also 8-bit (0xC5 == 0b1100'0101)
    std::cout << x1 << ' ' << x2 << '\n';  // printing the variables
    std::cout << std::bitset<4>{0b1010} << '\n';  // printing a literal
    return 0;
}

There is an even better solution in C++20 and C++23, where we can use the <format> header and the <print> header, respectively:

#include <format>  // c++20
#include <iostream>
#include <print>  // c++23

int main() {
    std::cout << std::format("{:b}\n", 0b1010);  // c++20
    std::cout << std::format("{:#b}\n", 0b1010);  // c++20
    std::print("{:b} {:#b}\n", 0b1010, 0b1010);  // c++23
    return 0;
}

Conditional statement

The conditional operator takes the form condition ? expression1 : expression2. Remember to parenthesize the entire conditional expression to avoid unexpected behavior, since ?: has very low precedence. Also, remember that expression1 and expression2 must have matching (or mutually convertible) types; for instance, the following won't compile, because an int and a C-style string have no common type:

#include <iostream>

int main() {
    constexpr int x{5};
    std::cout << (x != 5 ? x : "x is 5") << '\n';
    return 0;
}

Inline functions and variables

Every time a function is called, there is a certain amount of performance overhead that occurs, specifically:

This overhead is significant for small functions that are called frequently. Fortunately, C++ compilers have a trick that can avoid such overhead: inline expansion. This is a process where a function call is replaced by the code from the called function's definition. Inline expansion, however, doesn't guarantee performance improvements, as there's a tradeoff between removing the function call overhead and the cost of a larger executable.

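Historically, the inline keyword was used to suggest to the compiler that a function should be inline-expanded. Modern compilers largely ignore this hint and make their own inlining decisions; in modern C++, inline instead means that multiple (identical) definitions of a function are allowed, which is what lets a function be defined in a header that several source files include.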

Inline variables (C++17), on the other hand, allow the same variable to be defined in multiple translation units (e.g. via a header included by several .cpp files) without violating the one-definition rule. In particular, the following are implicitly inline:

constexpr and consteval

Remember we can use constexpr variables to allow compile-time evaluation:

#include <iostream>

int main() {
    constexpr int x{5};
    constexpr int y{6};
    std::cout << (x > y ? x : y) << " is greater!\n";
    return 0;
}

However, if we replace the conditional expression with a call to a regular function, that call can't be part of a constant expression, so we lose guaranteed compile-time evaluation and trade performance for the benefits of modularity. This isn't ideal. Instead, we can declare functions as constexpr so that they can be evaluated at compile-time:

#include <iostream>

constexpr int greater(int x, int y) {
    return (x > y ? x : y);
}

int main() {
    constexpr int x{5};
    constexpr int y{6};
    constexpr int g{greater(x, y)};  // evaluated at compile-time
    std::cout << g << " is greater!\n";
    return 0;
}

It is worth noting that, despite the example above, constexpr doesn't guarantee compile-time evaluation; it only makes the function eligible for compile-time evaluation. In the following example, because x and y are not constexpr, the function is still evaluated at runtime:

#include <iostream>

constexpr int greater(int x, int y) {
    return (x > y ? x : y);
}

int main() {
    int x{5};
    int y{6};
    std::cout << greater(x, y) << " is greater!\n";
    return 0;
}

Prior to C++20, there were no standard language tools available to test whether a function call is being evaluated at compile-time or at runtime. In C++20, the std::is_constant_evaluated() function, defined in the <type_traits> header, returns a bool indicating whether the call is being evaluated in a constant context:

#include <iostream>
#include <type_traits>

constexpr int greater(int x, int y) {
    if (std::is_constant_evaluated()) {
        // compile-time path: I/O such as std::cout isn't allowed during
        // constant evaluation, so nothing is printed here
    } else {
        std::cout << "greater(" << x << ", " << y << ") is being"
                  << " evaluated at runtime!\n";
    }
    return (x > y ? x : y);
}

In addition to testing, we can also force a function call to be evaluated at compile-time by ensuring the return value is used where a constant expression is required. This, however, needs to be done on a per-call basis and can thus be tedious. C++20 offers a better way to enforce this: consteval, which indicates that a function must be evaluated at compile-time, otherwise a compile error will result. Such functions are called immediate functions. For example:

#include <iostream>

consteval int greater(int x, int y) {  // consteval! not constexpr
    return (x > y ? x : y);
}

int main() {
    constexpr int g {greater(5, 6)}; // evaluated at compile-time because return goes into constexpr
    std::cout << g << '\n';
    std::cout << greater(5, 6) << '\n';  // still at compile-time, guaranteed!
    int x{5};
    // std::cout << greater(x, 6) << '\n';  // compile error: x is not a constant expression
    return 0;
}

The downside of the consteval keyword is that it forces the function to be evaluated at compile-time, which means the function can no longer be evaluated at runtime even if we want it to be. To solve this problem, we can keep the function constexpr and define a consteval helper function using a C++20 abbreviated function template with an auto parameter (no need to fully understand this at this stage):

#include <iostream>

consteval auto eval_at_compile_time(auto value) {  // consteval
    return value;
}

constexpr int greater(int x, int y) {  // constexpr
    return (x > y ? x : y);
}

int main() {
    std::cout << greater(5, 6) << '\n';  // runtime / compile-time -> we don't have full control
    std::cout << eval_at_compile_time(greater(5, 6)) << '\n'; // guaranteed compile-time
    int x{5};
    std::cout << greater(x, 6) << '\n';  // runtime, no error!
    return 0;
}

Input with std::string

#include <iostream>
#include <string>

int main() {
    std::cout << "Enter your full name: ";
    std::string name{};
    std::cin >> name; // this won't work as expected since std::cin breaks on whitespace
    std::cout << "Enter your favorite color: ";
    std::string color{};
    std::cin >> color;
    std::cout << "Your name is " << name << " and your favorite color is " << color << '\n';
    return 0;
}

The result from a sample run of the above program gives

Enter your full name: John Doe
Enter your favorite color: Your name is John and your favorite color is Doe

This is because std::cin breaks on whitespaces and thus John and Doe are separately passed to name and color strings. To remedy this issue, we need to use std::getline function:

#include <iostream>
#include <string>

int main() {
    std::cout << "Enter your full name: ";
    std::string name{};
    std::getline(std::cin >> std::ws, name); // read a full line
    std::cout << "Enter your favorite color: ";
    std::string color{};
    std::getline(std::cin >> std::ws, color);  // read a full line
    std::cout << "Your name is " << name << " and your favorite color is " << color << '\n';
    return 0;
}

and this time the output is

Enter your full name: John Doe
Enter your favorite color: blue
Your name is John Doe and your favorite color is blue

What is std::ws? It's an input manipulator that tells std::cin to ignore any leading whitespace before extraction. Since whitespace characters include spaces, tabs and newlines, and std::getline doesn't skip them on its own, the following program won't work as expected:

#include <iostream>
#include <string>

int main() {
    std::cout << "Pick 1 or 2: ";
    int choice{};
    std::cin >> choice;

    std::cout << "Now enter your name: ";
    std::string name{};
    std::getline(std::cin, name); // note: no std::ws here

    std::cout << "Hello, " << name << ", you picked " << choice << '\n';

    return 0;
}

A sample run gives:

Pick 1 or 2: 2
Now enter your name: Hello, , you picked 2

This is because we typed 2 followed by Enter, i.e. 2\n. The extraction operator >> put 2 into choice but left the \n in the input buffer, so std::getline immediately reached the end of the line and assigned an empty string to name. The best practice is, therefore, to always use std::ws with std::getline, as in std::getline(std::cin >> std::ws, name).

The length of strings

To get the length of a std::string variable, we can use its length() member function:

#include <iostream>
#include <string>

int main() {
    std::string name{ "Alex" };
    std::cout << name << " has " << name.length() << " characters\n";

    return 0;
}

Notice that the return type is not int but an unsigned integral type (std::string::size_type, which is typically std::size_t). If you want to use this value in subsequent computations, it's best to convert it right away:

int length { static_cast<int>(name.length()) };

In C++20, there's an easier way to get the length of a string as a signed integral type, via std::ssize:

#include <iostream>
#include <string>

int main() {
    std::string name{ "Alex" };
    std::cout << name << " has " << std::ssize(name) << " characters\n";
    return 0;
}

Initializing a std::string is expensive

…and thus we shouldn't needlessly create duplicate std::string variables or pass strings by value into functions. However, it's okay to return a std::string by value when the expression of the return statement resolves to any of the following:

std::string may also be returned by (const) reference, though this comes with scope and lifetime concerns; this will be covered later.

To solve this problem, we have std::string_view (C++17). Instead of copying, e.g., a C-style string into a new std::string only to destroy it right away, we can create read-only access to the original string and do whatever we want with it, as long as it's read-only (like printing):

#include <iostream>
#include <string_view>

void print_string(std::string_view str) {
    std::cout << str << '\n';
}

int main() {
    std::string_view str{"Hello world"};
    print_string(str);
    return 0;
}

Note that implicit conversion from std::string_view to a std::string is not allowed and will give a compile error. However, explicitly initializing a std::string with a std::string_view is possible.

#include <iostream>
#include <string>
#include <string_view>

void print_string(std::string str) {
    std::cout << str << '\n';
}

int main() {
    std::string_view str{"Hello world"};
    // print_string(str);  // compile error: no implicit conversion from std::string_view to std::string
    std::string str2{str};  // okay: explicit initialization
    print_string(str2);  // okay
    print_string(static_cast<std::string>(str));  // okay: explicit conversion
    return 0;
}

Assigning a new string to a std::string_view simply makes it view that new string, regardless of what it was viewing before. The assignment doesn't modify the previously viewed string, and the content being viewed remains read-only.

Also, unlike std::string, std::string_view supports constexpr:

#include <iostream>
#include <string_view>

int main() {
    constexpr std::string_view s{ "Hello, world!" }; // s is a string symbolic constant
    std::cout << s << '\n'; // s will be replaced with "Hello, world!" at compile-time
    return 0;
}

That being said, in most cases it's best to use std::string_view as a read-only function parameter rather than as a general-purpose variable. Specifically, when the underlying object is destroyed, using the view results in undefined behavior (so in most cases, don't return a std::string_view from a function); and when the underlying object is modified, the view may be invalidated, and using it may again result in undefined behavior (each time the underlying object is modified, you need to revalidate the view by assigning it again).

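Additionally, a std::string_view can be narrowed to a substring in place with .remove_prefix(n) and .remove_suffix(n), where n is the number of characters to remove from the front or the back of the view. These functions only change the view; the underlying string is left untouched.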

Operators

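There are two important properties of operators: precedence (which operators are grouped first in an expression) and associativity (the direction, left-to-right or right-to-left, in which operators of the same precedence are grouped).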

The following is an exhaustive list of operators with their precedence and associativity:

Prec/Asso Operator Description Pattern
1 / LR :: Global scope (unary) ::name
:: Namespace scope (binary) class_name::member_name
2 / LR () Parentheses (expression)
() Function call function_name(parameters)
() Initialization type_name(expression)
{} List initialization (C++11) type_name{expression}
type() Functional cast new_type(expression)
type{} Functional cast (C++11) new_type{expression}
[] Array subscript pointer[expression]
. Member access from object object.member_name
-> Member access from object pointer object_pointer->member_name
++ Post-increment lvalue++
-- Post-decrement lvalue--
typeid Run-time type information typeid(type) or typeid(expression)
const_cast Cast away const const_cast<type>(expression)
dynamic_cast Run-time type-checked cast dynamic_cast<type>(expression)
reinterpret_cast Cast one type to another reinterpret_cast<type>(expression)
static_cast Compile-time type-checked cast static_cast<type>(expression)
sizeof... Get parameter pack size sizeof...(expression)
noexcept Compile-time exception check noexcept(expression)
alignof Get type alignment alignof(type)
3 / RL + Unary plus +expression
- Unary minus -expression
++ Pre-increment ++lvalue
-- Pre-decrement --lvalue
! Logical NOT !expression
not Logical NOT not expression
~ Bitwise NOT ~expression
(type) C-style cast (new_type)expression
sizeof Size in bytes sizeof(type) or sizeof(expression)
co_await Await asynchronous call (C++20) co_await expression
& Address of &lvalue
* Dereference *expression
new Dynamic memory allocation new type_name
new[] Dynamic array allocation new type_name[expression]
delete Dynamic memory deletion delete pointer
delete[] Dynamic array deletion delete[] pointer
4 / LR ->* Member pointer selector object_pointer->*pointer_to_member
.* Member object selector object.*pointer_to_member
5 / LR * Multiplication expression * expression
/ Division expression / expression
% Remainder expression % expression
6 / LR + Addition expression + expression
- Subtraction expression - expression
7 / LR << Bitwise shift left / insertion expression << expression
>> Bitwise shift right / extraction expression >> expression
8 / LR <=> Three-way comparison (C++20) expression <=> expression
9 / LR < Comparison less than expression < expression
<= Comparison less than or equals expression <= expression
> Comparison greater than expression > expression
>= Comparison greater than or equals expression >= expression
10 / LR == Equality expression == expression
!= Inequality expression != expression
11 / LR & Bitwise AND expression & expression
12 / LR ^ Bitwise XOR expression ^ expression
13 / LR | Bitwise OR expression | expression
14 / LR && Logical AND expression && expression
and Logical AND expression and expression
15 / LR || Logical OR expression || expression
or Logical OR expression or expression
16 / RL throw Throw expression throw expression
co_yield Yield expression (C++20) co_yield expression
?: Conditional expression ? expression : expression
= Assignment lvalue = expression
*= Multiplication assignment lvalue *= expression
/= Division assignment lvalue /= expression
%= Remainder assignment lvalue %= expression
+= Addition assignment lvalue += expression
-= Subtraction assignment lvalue -= expression
<<= Bitwise shift left assignment lvalue <<= expression
>>= Bitwise shift right assignment lvalue >>= expression
&= Bitwise AND assignment lvalue &= expression
|= Bitwise OR assignment lvalue |= expression
^= Bitwise XOR assignment lvalue ^= expression
17 / LR , Comma operator expression, expression

Remainder and exponent operators

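The remainder operator in C++ is operator%, whose result takes the sign of the first operand. C++ has no exponent operator; exponentiation is done with the std::pow() function from the <cmath> header (note that ^ is bitwise XOR, not exponentiation).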

#include <iostream>
#include <cmath>

int main() {
    std::cout << 21 % 4 << '\n';
    std::cout << -21 % 4 << '\n';  // the remainder's sign follows that of the first operand
    std::cout << std::pow(3, 4) << '\n';
    return 0;
}

To avoid overflow, we can manually check the limits of integral types via std::numeric_limits:

#include <cassert>  // for assert
#include <cstdint>  // for std::int64_t
#include <limits>   // for std::numeric_limits
#include <iostream>

int main() {
    std::int64_t max_int64 = std::numeric_limits<std::int64_t>::max();
    std::int64_t min_int64 = std::numeric_limits<std::int64_t>::min();
    assert((10 > 9) && "assert message");  // just showing how assert() works
    std::cout << max_int64 << ' ' << min_int64 << '\n';
    return 0;
}

Increment/decrement operators

There are 4 increment/decrement operators:

Operator Form Operation
Prefix increment ++x Increment x, then return x
Prefix decrement --x Decrement x, then return x
Postfix increment x++ Copy x, then increment x, then return the copy
Postfix decrement x-- Copy x, then decrement x, then return the copy

Best practice is to use the prefix versions most of the time, as they are more performant (no copy is made) and less likely to cause surprising bugs. In certain cases, it's even recommended to avoid increment/decrement operators altogether:

int x{1};
int value{ x + ++x };  // don't do this: x is read and modified without sequencing

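This is actually undefined behavior: the operands of + are unsequenced, so x is read and modified without a defined order. In practice, the above code evaluates as 2 + 2 in Visual Studio and GCC, but 1 + 2 in Clang. To avoid confusion and potential bugs, we should avoid such code altogether.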

The comma operator

The comma operator works very differently from the comma in Python, so we're paying extra attention to it. The formal definition of the operation x, y is “evaluate x and then y, then return the value of y”. For example, the following code will print 3:

#include <iostream>

int main() {
    int x{1};
    int y{2};

    std::cout << (++x, ++y) << '\n';
    return 0;
}

Even worse, check these two lines:

z = (a, b);
z = a, b;

The first line is “evaluate a first, then evaluate b, and assign b to z”. The second line is “assign a to z, then evaluate b and discard it right away”.

As a result, the best practice is DON’T USE COMMA OPERATORS AT ALL.

Relational operators and floating point comparison

The first and third lines below are redundant and should be replaced by the second and last lines instead:

if (b1 == true) {};
if (b1) {};
if (b1 == false) {};
if (!b1) {};

The following will print d1 > d2 even though d1 and d2 are mathematically equal, because of rounding errors in floating-point representation:

#include <iostream>

int main() {
    double d1{ 100.0 - 99.99 }; // should equal 0.01 mathematically
    double d2{ 10.0 - 9.99 }; // should equal 0.01 mathematically

    if (d1 == d2)
        std::cout << "d1 == d2" << '\n';
    else if (d1 > d2)
        std::cout << "d1 > d2" << '\n';
    else if (d1 < d2)
        std::cout << "d1 < d2" << '\n';

    return 0;
}

As a result, floating point comparison using relational operators can be dangerous sometimes. Instead of using these native operators, we can define our own “approximately equal” function:

// C++23 version -> so that std::abs is constexpr
#include <algorithm> // for std::max
#include <cmath>     // for std::abs (constexpr in C++23)
#include <iostream>  // for std::cout (used in main below)

// Return true if the difference between a and b is within epsilon percent of the larger of a and b
constexpr bool approximatelyEqualRel(double a, double b, double relEpsilon) {
 return (std::abs(a - b) <= (std::max(std::abs(a), std::abs(b)) * relEpsilon));
}

// Return true if the difference between a and b is less than or equal to absEpsilon, or within relEpsilon percent of the larger of a and b
constexpr bool approximatelyEqualAbsRel(double a, double b, double absEpsilon, double relEpsilon) {
    // Check if the numbers are really close -- needed when comparing numbers near zero.
    if (std::abs(a - b) <= absEpsilon)
        return true;

    // Otherwise fall back to Knuth's algorithm
    return approximatelyEqualRel(a, b, relEpsilon);
}

With that, we can compare like this:

int main() {
    constexpr double a{ 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 };  // almost 1
    constexpr double relEps { 1e-8 };
    constexpr double absEps { 1e-12 };
    constexpr bool same { approximatelyEqualAbsRel(a, 1.0, absEps, relEps) };
    std::cout << same << '\n';
    return 0;
}

We're using C++23 so that std::abs is constexpr; otherwise the compiler will produce an error when initializing same, since a constexpr variable requires a constant expression. On earlier standards, we can fix the above program by defining our own constexpr abs function.

Logical operators

There are three logical operators: logical NOT (!), logical OR (||) and logical AND (&&). The latter two share a property called short-circuit evaluation: if the left operand alone already determines the result, the right operand is not evaluated at all. If we overload these operators with our own function definitions, this short-circuit behavior is lost.

Another thing worth noting is that && has higher precedence than ||. As a result, when mixing && and ||, it's best to use parentheses to make the intent explicit.

Bit Manipulations

Bit flags and bit manipulation via std::bitset

When defining bit flags, we can use std::bitset from the <bitset> header:

#include <bitset>

std::bitset<8> mybitset {};  // 8 bits = 8 flags

The class std::bitset provides some key functions that are useful for bit manipulation:

test(pos)   // query whether the bit at pos is a 0 or 1
set(pos)    // turn a bit on (this will do nothing if the bit is already on)
reset(pos)  // turn a bit off (same as above)
flip(pos)   // flip a bit from 0 to 1 (or vice versa)
all()       // check if all bits are 1
any()       // check if any bit is 1
none()      // check if no bit is 1
// set(), reset() and flip() called without an argument apply to every bit

Bitwise operators

C++ provides 6 bit manipulation operators, often called bitwise operators:

Operator Symbol Form Operation
left shift << x << y all bits in x shift left by y bits
right shift >> x >> y all bits in x shift right by y bits
bitwise NOT ~ ~x all bits in x get flipped
bitwise AND & x & y each bit in x AND each bit in y
bitwise OR | x | y each bit in x OR each bit in y
bitwise XOR ^ x ^ y each bit in x XOR each bit in y

Each of the binary operators above (i.e. all except ~) has a corresponding compound assignment version: <<=, >>=, &=, |= and ^=.

Scope, Duration and Linkage

User-defined namespaces

Imagine we have the project structure as below:

foo.cpp:

int do_something(int x, int y) {
    return x + y;
}

goo.cpp:

int do_something(int x, int y) {
    return x - y;
}

Each of these files compiles just fine on its own, since the compiler processes each source file separately. Now, let's say we add another file main.cpp and link everything into one program:

#include <iostream>

int do_something(int, int);

int main() {
    std::cout << do_something(1, 2) << '\n';
    return 0;
}

We would get a linker error right away for defining the same function twice. Notice that this error doesn't depend on main actually calling the do_something function: the two conflicting definitions in foo.cpp and goo.cpp are enough to cause it once the files are linked together. To resolve this issue (without renaming one of the two functions), we can define our own namespaces:

foo.cpp:

namespace Foo {
    int do_something(int x, int y) {
        return x + y;
    }
}

goo.cpp:

namespace Goo {
    int do_something(int x, int y) {
        return x - y;
    }
}

We can, if needed, define nested namespaces in either of the following ways:

namespace Foo {
    namespace Goo {
        ...
    }
}

// or, since C++17:

namespace Foo::Goo {
    ...
}

When using the nested namespaces a lot, we can define namespace aliases like below:

namespace Temp = Foo::Goo;

Local variables and global variables

On the contrary:

Internal linkage and internal variables

An identifier with internal linkage can be seen and used within a single translation unit, but it is not accessible from other translation units (that is, it is not exposed to the linker). This means that if two source files each have an identifier with internal linkage and the same name, those identifiers are treated as independent, so they don't violate the ODR for duplicate definitions.

To make a global variable internal (and thus an internal variable), we can use the keyword static. Notice that static is redundant for const (and constexpr) global variables, as they already have internal linkage by default.

There are typically two reasons to give an identifier internal linkage:

External linkage and external variables

An identifier with external linkage can be seen and used both from the file in which it is defined and from other files. In that sense, external variables are more “global” than global variables. Functions and non-constant global variables are external by default. Constant global variables can be made external using the extern keyword.

Sharing global constants across files

Prior to C++17, the following is the best practice:

Because constant global variables are internal by default, each .cpp file that includes the header gets its own independent copy of all these constants, and header guards won't prevent this. Changing any of these constants therefore forces every file that includes the header to be recompiled, which can mean lengthy rebuild times for large projects. In addition, each copy may take up memory.

Instead of above, we can

Now that we've learned about inline functions and variables, we can define these constants as inline constexpr variables in the header (C++17), so that all files share a single definition.

Static local variables

We have covered what a static global variable means; now let’s talk about static local variables. Using the static keyword on a local variable changes its duration from automatic duration (created at the definition, destroyed at the end of the block) to static duration (created once and destroyed only when the program ends), just like a global variable. A static local variable is initialized only the first time execution reaches its definition, and it keeps its value between calls. One of the most common use cases of static local variables is unique ID generation:

int generate_unique_id() {
    static int s_item_id {0};
    return s_item_id++;  // make copy; inc original; return copy
}
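To make the difference in duration concrete, here is a small sketch contrasting a static local with an ordinary local variable (both function names are hypothetical):

```cpp
// static duration: s_count is initialized once and survives between calls
int count_static() {
    static int s_count{0};
    return ++s_count;
}

// automatic duration: count is created and re-initialized on every call
int count_automatic() {
    int count{0};
    return ++count;
}
```

Calling count_static() three times returns 1, 2 and 3, while count_automatic() returns 1 every time.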

Another use case is caching: a function that needs a constant value that takes a lot of time to compute can compute it once, on the first call, and keep it in a static local variable.
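A sketch of the caching idea. The "expensive" computation (a table of Fibonacci numbers) and the counter g_table_builds are hypothetical; the counter exists only to show that the table is built once:

```cpp
#include <array>
#include <cstddef>

int g_table_builds{0};  // hypothetical counter, only here to show the table is built once

// Stand-in for an expensive computation: the first 40 Fibonacci numbers.
std::array<long long, 40> build_fib_table() {
    ++g_table_builds;
    std::array<long long, 40> table{};  // all zeros, so table[0] == 0
    table[1] = 1;
    for (std::size_t i{2}; i < table.size(); ++i)
        table[i] = table[i - 1] + table[i - 2];
    return table;
}

long long fib(std::size_t n) {
    // Built on the first call only; later calls reuse the cached table.
    static const std::array<long long, 40> s_table{build_fib_table()};
    return s_table.at(n);  // throws std::out_of_range for n >= 40
}
```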

Scope, duration and linkage summary

One last time let’s go over these concepts:

In table format:

Type | Example | Scope | Duration | Linkage | Notes
Local variables | int x; | Block | Automatic | None |
Static local variables | static int s_x; | Block | Static | None |
Dynamic local variables | int* x {new int{}}; | Block | Dynamic | None |
Function parameters | void foo(int x) | Block | Automatic | None |
External non-constant global variables | int g_x; | Global | Static | External | Initialized or uninitialized
Internal non-constant global variables | static int g_x; | Global | Static | Internal | Initialized or uninitialized
Internal constant global variables | constexpr int g_x{1}; | Global | Static | Internal | Must be initialized
External constant global variables | extern const int g_x{1}; | Global | Static | External | Must be initialized
Inline constant global variables (C++17) | inline constexpr int g_x{1}; | Global | Static | External | Must be initialized

Unnamed and inline namespaces

We have a few different namespaces available:

Using inline namespaces allows us to “switch” easily between multiple versions of functions etc. without changing the old code.
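A minimal sketch of versioning with an inline namespace, plus an unnamed namespace for a file-private helper. The namespace names V1/V2 and all functions are hypothetical:

```cpp
#include <string>

namespace {  // unnamed namespace: contents have internal linkage (private to this file)
    std::string tag(int version) {
        return "v" + std::to_string(version);
    }
}

namespace V1 {
    std::string do_something() { return tag(1); }
}

inline namespace V2 {  // inline: V2's members are also reachable without the V2:: prefix
    std::string do_something() { return tag(2); }
}
```

Here V1::do_something() returns "v1", while both V2::do_something() and plain do_something() return "v2". To make a future version the default, one would move the inline keyword onto a new namespace; callers that spell out V1:: keep getting the old behavior.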

Control Flow and Error Handling

Categories of flow control statements

Category | Meaning | Implemented in C++ by
Conditional statements | Execute only if some condition is met | if, switch
Jumps | Start executing the statement at some other location | goto, break, continue
Function calls | Jump to some other location and back | function call, return
Loops | Repeatedly execute some sequence of code until an end condition is met | while, do-while, for, ranged-for
Halts | Quit running the whole program | std::exit, std::abort
Exceptions | Error handling | try, throw, catch

If conditions

The only thing that needs attention is the following example:

#include <iostream>

int main() {
    if (true)
        int x{5};
    else
        int x{6};
    std::cout << x << '\n';  // error: x is not in scope here
    return 0;
}

The program above won’t compile: each definition of x lives in the implicit block of its own branch, so x has already gone out of scope (and been destroyed) by the time std::cout needs it.
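One way to fix it, sketched here as a hypothetical function pick(), is to define x once before the if, so that it outlives both branches:

```cpp
// Define x once, before the if, so it is still in scope afterwards.
int pick(bool condition) {
    int x{};       // lives until the end of the function
    if (condition)
        x = 5;     // assignment, not a new definition
    else
        x = 6;
    return x;      // ok: x is in scope here
}
```

For a simple choice like this, int x{condition ? 5 : 6}; works too.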

Null statements

A null statement is a statement consisting of just a semicolon:

if (x > 10)
    ;

which is equivalent to

if (x > 10);

and thus can be dangerous, for example:

if (nuclear_code_activated());
    blow_up_the_world();

The world will be blown up regardless of whether the nuclear code is activated, because what’s actually going on in the example above is

if (nuclear_code_activated()) {
    ;
}
blow_up_the_world();

constexpr if statements

Check this example:

#include <iostream>

int main(){
    constexpr double g {9.8};
    if (g == 9.8) {
        std::cout << "gravity is normal\n";
    } else {
        std::cout << "gravity is " << g << '\n';
    }
    return 0;
}

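The above example compiles, but the condition is (at least in principle) evaluated at runtime, which is wasteful since g is constexpr and the result is already known at compile time. In C++17 we can use a constexpr if statement instead, which requires the condition to be a constant expression and evaluates it at compile time; the branch that is not taken is discarded: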

#include <iostream>

int main() {
    constexpr double g {9.8};
    if constexpr (g == 9.8) {
        std::cout << "gravity is normal\n";
    } else {
        std::cout << "gravity is " << g << '\n';
    }
    return 0;
}

Modern compilers may or may not warn that such an if statement could be made constexpr, and they may or may not optimize it automatically. So when this optimization matters, it’s better to be explicit than to let the compiler decide.

switch statements

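A switch statement evaluates an expression once and jumps to the case label whose value matches (or to default if none does). It is often used in place of a chain of if-else-if statements: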

switch (expression) {
    case 1:
        // ...
    case 2:
        // ...

    // more cases
    default:
        // ...
}

Notice that once a case label matches, execution continues sequentially through all the statements that follow, including those under later case labels, until it reaches a break or return or the end of the switch block. As a result, the best practice is to end every case with break or return, and to keep a default case at the end. For example, here is what happens without any break:

#include <iostream>

int main() {
    switch (2) {
    case 1: // Does not match
        std::cout << 1 << '\n'; // Skipped
    case 2: // Match!
        std::cout << 2 << '\n'; // Execution begins here
    case 3:
        std::cout << 3 << '\n'; // This is also executed
    case 4:
        std::cout << 4 << '\n'; // This is also executed
    default:
        std::cout << 5 << '\n'; // This is also executed
    }
    return 0;
}

This is called fall-through. Compilers often warn about fall-through because it is usually unintentional. That being said, if it is indeed desired, we can use the [[fallthrough]] attribute (C++17) to tell the compiler that a warning isn’t necessary:

#include <iostream>

int main() {
    switch (2) {
        case 1:
            std::cout << 1 << '\n';
            break;
        case 2:
            std::cout << 2 << '\n';
            [[fallthrough]];  // fall through starting from here
        case 3:
            std::cout << 3 << '\n';
            break;
    }
    return 0;
}
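A small sketch that records which case bodies run, so the effect of break and [[fallthrough]] can be checked directly (trace() is a hypothetical name):

```cpp
#include <string>

// Appends a marker for every case body that executes.
std::string trace(int n) {
    std::string ran{};
    switch (n) {
        case 1:
            ran += '1';
            break;            // stop here
        case 2:
            ran += '2';
            [[fallthrough]];  // intentional: continue into case 3
        case 3:
            ran += '3';
            break;
        default:
            ran += 'd';
            break;
    }
    return ran;
}
```

trace(1) gives "1", trace(2) gives "23" because of the fall-through, trace(3) gives "3", and any other value gives "d".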

Sequential case labels

Because case labels are not statements (they’re labels), we can stack them if desired:

bool is_vowel(char c) {
    return (c == 'a' || c == 'e' || c == 'i' || c =='o' || c == 'u' ||
        c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U');
}

The above can be rewritten using switch statement as

bool is_vowel(char c) {
    switch (c) {
        case 'a':
        case 'e':
        case 'i':
        case 'o':
        case 'u':
        case 'A':
        case 'E':
        case 'I':
        case 'O':
        case 'U':
            return true;
        default:
            return false;
    }
}

This is not fall-through, because there are no statements between the stacked labels: execution jumps straight to the single return statement after them.

switch case scoping

It’s worth noting that, unlike if statements, where each branch has an implicit block, the cases of a switch do not get individual blocks. Instead, all cases share the same switch block:

switch (1) {
    int a;      // okay: definition is allowed before the case labels
    int b{ 5 }; // illegal: initialization is not allowed before the case labels

    case 1:
        int y; // okay but bad practice: definition is allowed within a case
        y = 4; // okay: assignment is allowed
        break;

    case 2:
        int z{ 4 }; // illegal: initialization is not allowed if subsequent cases exist
        y = 5;      // okay: y was declared above, so we can use it here too
        break;

    case 3: { // note addition of explicit block here
            int x{ 4 }; // okay, variables can be initialized inside a block inside a case
            std::cout << x;
            break;
        }
}
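The usual way around these restrictions is to give each case its own explicit block, as case 3 does above. A sketch with a hypothetical function scaled():

```cpp
// Each case gets its own explicit block, so each may define and
// initialize its own local variables, even with the same name.
int scaled(int choice, int value) {
    switch (choice) {
        case 1: {
            int factor{10};   // scoped to this block only
            return value * factor;
        }
        case 2: {
            int factor{100};  // a different variable: no redefinition error
            return value * factor;
        }
        default:
            return value;
    }
}
```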

while and do-while statements

Skipped.

for loops

The general template is

for (init-statement; condition; end-expression) {
    statement;
}

which can look confusing when all three parts are omitted:

for (;;) {
    statement;
}

The above is equivalent to

while (true) {
    statement;
}

A for loop can also have multiple variables in the init-statement:

for (int x{0}, y{10}; x < 10; ++x, --y) {
    std::cout << x << ' ' << y << '\n';
}

break, continue and early return

Skipped.

Halts

In <cstdlib> we have std::exit and std::atexit defined. std::exit terminates the program with the given exit status code. It also performs a number of cleanup functions:

However, std::exit doesn’t clean up local variables (their destructors are not called, neither in the current function nor further up the call stack), so it’s generally advised to avoid std::exit. If we really need std::exit and have such concerns, we can use std::atexit to register our own cleanup functions, which are called automatically when std::exit runs (and when main returns normally).
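A minimal sketch of std::atexit. The functions cleanup(), register_cleanup() and quit() are hypothetical:

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical cleanup routine; std::atexit needs a function
// taking no arguments and returning void.
void cleanup() {
    std::cout << "cleaning up\n";
}

// Registers cleanup(); std::atexit returns 0 on success.
bool register_cleanup() {
    return std::atexit(cleanup) == 0;
}

// Hypothetical wrapper: std::exit runs the registered functions, then terminates.
[[noreturn]] void quit(int status) {
    std::exit(status);
}
```

Call register_cleanup() early in main; a later quit(1), or simply returning from main, then prints cleaning up before the program ends. Registered functions run in reverse order of registration.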

For multi-threaded programs, calling std::exit can crash other threads that are still running, because the static objects they may be using get destroyed. To remedy this problem, C++ provides another pair of halting functions that don’t clean up static objects: std::quick_exit and std::at_quick_exit.

In addition, C++ provides std::abort for abnormal termination, and std::terminate for exception handling. Notice that, by default, std::terminate calls std::abort.

std::cout vs std::cerr

We know that std::cout is buffered. This means the output may or may not have been printed to the console by the time the program crashes (if ever). In contrast, std::cerr is unbuffered, so error messages written to it show up immediately and are not lost if the program crashes.

Clearing buffer in std::cin

We can simply

std::cin.ignore(100, '\n');

which discards up to 100 characters from the buffer, stopping once it has removed the next ‘\n’. Even better, we can ignore up to the maximum possible stream size:

#include <limits>

void ignore_line() {
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}

The failure mode of std::cin

When an input is invalid and extraction fails, the bad input is kept in the buffer and std::cin enters failure mode. In order to continue taking input, we need to clear the error state and discard the bad input:

if (std::cin.fail()) {
    std::cin.clear();
    ignore_line();
}

Because std::cin has an automatic boolean conversion indicating whether the last input succeeded, we can also

if (!std::cin) {
    std::cin.clear();
    ignore_line();
}

One last corner case is when the end-of-file (EOF) character is entered (Ctrl-D on Unix-like systems, Ctrl-Z on Windows): the whole input stream closes. This can’t be fixed by std::cin.clear(), so we need to modify the above resetting logic to

if (!std::cin) {
    if (std::cin.eof()) {
        std::exit(0);  // we can just shut down the program (needs <cstdlib>)
    }
    std::cin.clear();  // back to normal input stream
    ignore_line();     // ignore bad characters remaining in the buffer
}
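Putting the pieces together, here is a sketch of an input loop. As an assumption for testability, it reads from any std::istream& rather than hard-coding std::cin, so it works with std::cin as well as with a std::istringstream; ignore_line() is adapted to take the stream as a parameter, and read_int() is a hypothetical name:

```cpp
#include <istream>
#include <limits>
#include <optional>
#include <sstream>  // std::istringstream, handy for trying read_int without a keyboard

// Discards everything up to and including the next '\n'.
void ignore_line(std::istream& in) {
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}

// Keeps trying to extract an int until it succeeds.
// Returns std::nullopt if the stream reaches end-of-file first.
std::optional<int> read_int(std::istream& in) {
    while (true) {
        int x{};
        in >> x;
        if (in) {             // extraction succeeded
            ignore_line(in);  // drop any extra characters on the line
            return x;
        }
        if (in.eof())         // EOF cannot be cleared away
            return std::nullopt;
        in.clear();           // leave failure mode
        ignore_line(in);      // discard the bad input
    }
}
```

In a real program one would call read_int(std::cin). As a quick check, a std::istringstream holding "abc\n42\n" makes read_int skip the bad line and return 42.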

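It’s worth noting that when the user input overflows the range of the target variable, e.g. passing 40,000 to a std::int16_t (whose range is -32,768 to 32,767), the assignment still happens: the variable receives the closest in-range value (32,767 here), and std::cin also goes into failure mode, so the same cleanup as above is needed.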
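A sketch that shows this overflow behavior without any typing, assuming a hypothetical helper that extracts from a std::istringstream instead of std::cin (extraction follows the same rules for both):

```cpp
#include <cstdint>
#include <sstream>

// Hypothetical helper: extracts a std::int16_t from text and
// reports whether the stream ended up in failure mode.
std::int16_t extract_int16(const char* text, bool& failed) {
    std::istringstream in{text};
    std::int16_t value{};
    in >> value;  // std::int16_t is normally short, so operator>>(short&) is used
    failed = in.fail();
    return value;
}
```

extract_int16("40000", failed) returns 32767 and sets failed to true, while extract_int16("123", failed) returns 123 with failed left false.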

assert and static_assert

In C++, runtime assertions are implemented via the assert preprocessor macro, which lives in the <cassert> header:

#include <cassert>  // for assert
#include <cmath>    // for std::sqrt
#include <iostream>

double f(double x, double y) {
    assert(y > 0);
    if (x <= 0)
        return 0;
    return std::sqrt(2 * x / y);
}

int main() {
    std::cout << "The answer is " << f(100, -9) << '\n';
    return 0;
}

We can make our assertion statement more descriptive by adding a followup string to it

assert(condition && "my assert message");

This works because a string literal converts to a non-null pointer, which is always true, so condition && "my assert message" is true exactly when condition is. When the assertion fails, assert prints the whole expression that failed, which now includes the message.

It’s worth noting that assert comes with a small performance cost each time the condition is checked, so asserts are usually disabled in production code, which should be fully tested already. Defining the NDEBUG macro (before including <cassert>) turns every assert into a no-op; many IDEs and build systems define NDEBUG automatically for release builds.

C++ also has another type of assertion called static_assert. This is basically the compile-time version of the assert which is runtime. Unlike assert which is a macro defined in <cassert>, static_assert is actually a keyword and thus no header is needed. A static_assert takes the following form:

static_assert(condition, message);

where the condition must be a constant expression; if it evaluates to false, compilation fails with an error showing the message. Prior to C++17 the message was required, but it is now optional.
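A small sketch combining both kinds of assertion; divide() is a hypothetical function:

```cpp
#include <cassert>

// Compile-time checks: the build fails if either condition is false.
static_assert(sizeof(long long) >= 8, "long long must be at least 8 bytes");
static_assert(sizeof(int) <= sizeof(long));  // C++17: the message may be omitted

// Runtime check (disabled when NDEBUG is defined); the string literal
// shows up in the failure report as part of the expression.
double divide(double x, double y) {
    assert(y != 0.0 && "divide(): divisor must be non-zero");
    return x / y;
}
```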

Generating random numbers

Using Mersenne Twister:

#include <iostream>
#include <random>   // for std::mt19937

int main() {
    std::mt19937 mt{}; // instantiate a 32-bit Mersenne Twister
    for (int count{1}; count <= 40; ++count) {
        std::cout << mt() << '\t';
        if (count % 5 == 0)
            std::cout << '\n';
    }
    return 0;
}

which prints 40 pseudo-random 32-bit numbers as an 8x5 table (8 rows of 5).

We can also draw numbers from a uniform distribution, e.g. to simulate rolling a six-sided die:

#include <iostream>
#include <random>

int main() {
    std::mt19937 mt{};
    std::uniform_int_distribution die6{1, 6};
    for (int count{1}; count <= 40; ++count) {
        std::cout << die6(mt) << '\t';
        if (count % 10 == 0) std::cout << '\n';
    }
    return 0;
}

We can seed the random number generator with a fixed value, e.g. mt{42}, instead of the default seed (std::mt19937::default_seed, which is 5489), but preferably we use std::random_device to get a different seed on each run:

#include <iostream>
#include <random>

int main() {
    std::mt19937 mt{ std::random_device{}() };
    // other following code
    return 0;
}
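Seeding with a fixed value is handy for debugging, because the same seed always produces the same sequence. A sketch with a hypothetical helper roll_dice():

```cpp
#include <random>
#include <vector>

// Draws `count` rolls of a six-sided die from a generator seeded with `seed`.
std::vector<int> roll_dice(unsigned int seed, int count) {
    std::mt19937 mt{seed};
    std::uniform_int_distribution die6{1, 6};
    std::vector<int> rolls{};
    for (int i{0}; i < count; ++i)
        rolls.push_back(die6(mt));
    return rolls;
}
```

roll_dice(42, 10) returns the same ten rolls every time it is called; seeding with std::random_device{}() instead gives a different sequence on each run.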

Notice that std::random_device only gives a 32-bit (4-byte) integer to seed the Mersenne Twister, but the generator’s internal state consists of 624 32-bit integers (about 2.5 KB), 624 times the size of the seed. This means we’re effectively underseeding the generator, potentially making the results less random. To cope with that, we can use std::seed_seq, which takes a sequence of 32-bit seed values and uses them to generate as many well-distributed seed values as the generator needs:

#include <iostream>
#include <random>

int main() {
    std::random_device rd{}; // just device, not called yet
    std::seed_seq ss { rd(), rd(), rd(), rd(), rd(), rd(), rd() };  // you can go all the way to 624 calls of rd() if you want
    std::mt19937 mt{ss};
    std::uniform_int_distribution die6{1, 6};
    for (int count{1}; count <= 40; ++count) {
        std::cout << die6(mt) << '\t';
        if (count % 10 == 0) std::cout << '\n';
    }
    return 0;
}

Type Conversion and Function Overloading

Implicit type conversion

Implicit type conversion happens in all of the following cases:

When initializing a variable with a value of a different type:

double d {3}; // int converted to double
d = 6;  // int converted to double

When the type of a return value is different from the function’s declared return type:

float something() {
    return 3.0; // double 3.0 converted to float
}

When using certain binary operators with operands of different types:

double division {4.0 / 3};  // int 3 converted to double

When an argument passed to a function is a different type than the function parameter:

void something(long l) {
}

something (3);  // int 3 converted to long

Numeric promotion

Because C++ is designed to be portable and performant across a wide range of architectures, the language designers did not want to assume a given CPU would be able to efficiently manipulate values that were narrower than the natural data size for that CPU.

To help address this challenge, C++ defines a category of type conversions informally called the numeric promotions. A numeric promotion is the type conversion of certain narrower numeric types (such as a char) to certain wider numeric types (typically int or double) that can be processed efficiently and is less likely to have a result that overflows.

All numeric promotions are value-preserving, which means that the converted value will always be equal to the source value (it will just have a different type). Since all values of the source type can be precisely represented in the destination type, value-preserving conversions are said to be “safe conversions”.

Because promotions are safe, the compiler will freely use numeric promotion as needed, and will not issue a warning when doing so.

#include <iostream>

void printInt(int x) {
    std::cout << x << '\n';
}

void printDouble(double d) {
    std::cout << d << '\n';
}

int main() {
    short s{3};
    printInt(s);       // numeric promotion of short to int
    printInt('a');     // numeric promotion of char to int: prints 97
    printDouble(5.0);  // no conversion necessary
    printDouble(4.0f); // numeric promotion of float to double

    return 0;
}

Narrowing conversion

In contrast to widening numeric conversions (note: numeric promotion, which widens a type to the CPU’s natural size, is just a subset of general numeric conversion), narrowing conversions, which may lose information, are in most cases not recommended. If we really need one, we’d better make it explicit:

void some_function(int i) {
}

int main() {
    double d{5.0};
    some_function(d);  // bad: implicit conversion
    some_function(static_cast<int>(d));  // good: explicit conversion
    return 0;
}

It’s worth noting (and praising) that brace initialization doesn’t allow narrowing conversions implicitly:

int main() {
    int i {3.65};  // won't compile at all!
    return 0;
}

…except when the value being converted is constexpr and fits in the destination type:

int main() {
    constexpr int n1 {5};
    unsigned int u1 {n1};  // ok: not considered narrowing due to the constexpr exclusion clause
    constexpr int n2 {-5};
    unsigned int u2 {n2};  // error: -5 doesn't fit in unsigned int, so this is narrowing
    return 0;
}

Arithmetic conversion

What happens when we evaluate e.g. 2 + 5.5? There is actually a list of prioritized types for usual arithmetic conversion:

If the operands are not of the same type, the operand with the lower-priority type is converted to the higher-priority type, which is also the type of the result.
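A few spot checks may help. They use decltype, which reports the type of an expression without evaluating it; mixed_sum() is a hypothetical function:

```cpp
#include <type_traits>

static_assert(std::is_same_v<decltype(2 + 5.5), double>);     // int converted to double
static_assert(std::is_same_v<decltype(2 + 3L), long>);        // int converted to long
static_assert(std::is_same_v<decltype(1.5f + 2.0), double>);  // float converted to double

double mixed_sum() {
    return 2 + 5.5;  // evaluated as 2.0 + 5.5
}
```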

Explicit type conversion (casting)

C++ supports 5 different types of casts: C-style casts, static casts, const casts, dynamic casts and reinterpret casts. The latter four are often referred to as named casts.

C-style casts:

#include <iostream>

int main(){
    int x {10};
    int y {4};
    double d {(double)x / y};  // convert x to double
    std::cout << d << '\n';    // prints: 2.5
    return 0;
}

We can also use a more function-call like syntax:

double d {double(x) / y};  // same as above

Static cast:

#include <iostream>

int main() {
    char c {'a'};
    std::cout << c << ' ' << static_cast<int>(c) << '\n';  // prints: a 97
    return 0;
}

The static_cast is intentionally less powerful than the C-style cast; for example, it cannot cast away const:

int main() {
    const int x {5};
    int& ref{static_cast<int&>(x)};  // error: dropping the const qualifier here
    ref = 6;
    return 0;
}

typedef and type aliases

There are two ways to create type aliases: an old, backward-compatible way using typedef, and a more modern way with using.

typedef long Miles;  // typedef { some existing type } { (as) some name }
using Miles = long;  // using { some name (for) } = { some existing type }

The old typedef can be hard to read sometimes:

typedef int (*FuncType)(double, char);
using FuncType = int(*)(double, char);

Type aliases are very useful when we want to hide platform-specific details about types. For example, on some platforms an int is 2 bytes while on others it is 4. We can define explicit fixed-width types in a header file like this (nowadays the standard <cstdint> header already provides such types, e.g. std::int16_t):

#ifdef INT_2_BYTES

using int8_t = char;
using int16_t = int;
using int32_t = long;

#else

using int8_t = char;
using int16_t = short;
using int32_t = int;

#endif

Using type aliases can make complex types easier to read and use:

using VectPairSI = std::vector<std::pair<std::string, int>>;

Type deduction using auto

Type deduction (also sometimes called type inference) is a feature that allows the compiler to deduce the type of an object from the object’s initializer. To use type deduction, the auto keyword is used in place of the variable’s type:

int add(int x, int y) {
    return x + y;
}

int main() {
    auto d {5.0}; // double
    auto i {1 + 2}; // int
    auto x {i}; // int as well
    auto sum {add(5, 6)};  // compiler knows add() returns an int, so auto works with function return values as well
    auto a;     // invalid: no initializer to deduce the type from
    auto b {};  // invalid: same as above
    return 0;
}

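It’s worth noting that type deduction drops top-level const (and constexpr, which implies const) implicitly:

int main() {
    const int x {5};       // x has type const int (a compile-time constant)
    auto y {x};            // y is just an int
    const auto w {x};      // w is const int: const reapplied explicitly
    constexpr auto z {x};  // z is constexpr (type const int)
    return 0;
}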

For strings, it’s a bit tricky, as auto by default deduces the type of a string literal to be const char*. In order to make auto deduce a string literal as std::string or std::string_view, we need to use literal suffixes:

#include <string>
#include <string_view>

int main() {
    using namespace std::literals;
    auto s1 {"goo"s};   // string
    auto s2 {"foo"sv};  // string_view
    return 0;
}

For function return types (note: different from the returned values of a function with a given type, see above) it’s even trickier. When every return statement in a function returns the same type, we can use the auto keyword in a similar way as for variables:

auto add(int x, int y) {
    return x + y;
}  // this works

auto something(bool b) {
    if (b) {
        return 5;
    } else {
        return 6.5;
    }
} // won't compile

auto foo();  // this compiles

int main() {
    foo();  // but causes a compile error here: foo's return type can't be deduced before its definition is seen
}

auto foo() {
    return 5;
} // only defined here, after main

To solve the forward declaration problem in the above example, we can use the trailing return type syntax:

auto add(int, int) -> int;

int main() {
    add(5, 2);  // won't throw compiler error anymore
}

auto add(int x, int y) -> int {
    return x + y;
}  // even though the definition is here

Note that type deduction doesn’t apply to function parameters. If you use auto like in the example below, it only compiles in C++20 and beyond, and that is because of a different feature called abbreviated function templates, rather than type deduction:

#include <iostream>

void add_and_print(auto x, auto y) {
    std::cout << x + y << '\n';
}

int main() {
    add_and_print(2, 3);  // int
    add_and_print(2.5, 3.5);  // double
    return 0;
}

Function overloading

int add(int x, int y) { // integer version
    return x + y;
}

double add(double x, double y) { // floating point version
    return x + y;
}

int main() {
    add(1, 2);      // calls the int version
    add(1.2, 3.4);  // calls the double version
    return 0;
}

The compiler knows how to differentiate the overloaded functions. Specifically:

Property | Used for differentiation? | Notes
Number of parameters | Yes |
Type of parameters | Yes | Excludes typedefs, type aliases and the const qualifier on value parameters. Includes ellipses.
Return type | No |

Note that for member functions, const/volatile and ref-qualifiers are indeed used for overloading.

Given the above, we define a function’s type signature as the combination of the following

which doesn’t include the return type.

Deleting functions

When we want to avoid undesired usage of functions e.g.

#include <iostream>

void print_int(int x) {
    std::cout << x << '\n';
}

int main() {
    print_int(5);    // prints 5; ok
    print_int('a');  // prints 97; ???
    print_int(true); // prints 1; ???
    return 0;
}

We can define the function as deleted by using the = delete specifier:

void print_int(char) = delete;
void print_int(bool) = delete;

This way, when we evaluate e.g. print_int('a'), there will be a compile error. This is explicit but sometimes too verbose. If we want to delete all other overloads of the function, we can use function templates:

#include <iostream>

void print_int(int x) {
    std::cout << x << '\n';
}

template <typename T>
void print_int(T x) = delete;  // deleting all other types

int main() {
    print_int(5);  // ok
    print_int('a');  // compile error
    print_int(true);  // compile error
    return 0;
}

Default arguments

Functions can have default arguments and that can be really convenient:

#include <iostream>

void print(int x, int y=10, int z=20) {
    std::cout << "x=" << x << ", y=" << y << ", z=" << z << '\n';
}

int main() {
    print(1, 2, 3);  // prints: x=1, y=2, z=3
    print(1, 2);  // prints: x=1, y=2, z=20
    return 0;
}

However, a default argument cannot be redeclared, even with the same value:

void print(int x, int y=10); // forward declaration
void print(int x, int y=20);  // error: redefinition of default argument

Therefore, the best practice is to put default arguments in the forward declaration in a header file, and not repeat them in the definition:

in foo.h:

#ifndef FOO_H
#define FOO_H
void print(int x, int y=10);
#endif

in main.cpp:

#include "foo.h"
#include <iostream>

void print(int x, int y) {  // no default arguments defined
    std::cout << "x=" << x << ", y=" << y << '\n';
}

int main() {
    print(5);  // still prints x=5, y=10 as expected
    return 0;
}

Templated functions

In C++, the template system was designed to simplify the process of creating functions (or classes) that are able to work with different data types. For example, below is a templated max function

template <typename T>
T max(T x, T y) {
    return (x < y) ? y : x;
}

which is actually very similar to the templated/overloaded std::max function defined in the standard library (<algorithm>):

template<class T, class Compare>
const T& max(const T& a, const T& b, Compare comp);

To use a function template, we instantiate it by calling it with template arguments, like below:

#include "max.h"  // assuming our templated max function is defined here
#include <iostream>

int main() {
    std::cout << max<int>(1, 2) << '\n';
    return 0;
}

Thanks to template argument deduction, we can also do one of below instead:

max<>(1, 2);
max(1, 2);

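Deduction doesn’t always give us what we want, e.g.

max("hello", "world");  // compiles, but T is deduced as const char*, so this compares pointer addresses, not the strings

but we can effectively avoid this kind of surprising behavior by deleting certain instantiations of templated functions: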

template <>
const char* max(const char*, const char*) = delete;

Another headache happens in the following example:

template <typename T, typename U>
T max(T x, U y) {
    return (x < y) ? y : x;
}

int main() {
    std::cout << max(2, 2.5) << '\n';
    return 0;
}

What’s gonna happen? In this case, by type deduction, T is int and U is double, and because double takes precedence over int in arithmetic conversion, we’re effectively comparing 2.0 versus 2.5. However, as the return type is T, namely int, the result 2.5 gets converted back to int and we get 2. We cannot simply use U as the return type either, as the user can always just swap the two arguments. The better way is to use auto as the return type and let the compiler deduce it:

template <typename T, typename U>
auto max(T x, U y) {
    return (x < y) ? y : x;
}
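A sketch putting the two versions side by side in hypothetical namespaces with_t and with_auto, so the difference can be checked:

```cpp
namespace with_t {
    template <typename T, typename U>
    T max(T x, U y) {            // return type fixed to the first argument's type
        return (x < y) ? y : x;
    }
}

namespace with_auto {
    template <typename T, typename U>
    auto max(T x, U y) {         // return type deduced from the return expression
        return (x < y) ? y : x;  // ?: yields the common type of T and U
    }
}
```

with_t::max(2, 2.5) returns 2 (the 2.5 is truncated on the way out), while with_auto::max(2, 2.5) returns 2.5.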

Starting from C++20, we can even use auto as argument types to create templated functions, which is called abbreviated function templates:

auto max(auto x, auto y) {
    return (x < y) ? y : x;
}

which the compiler automatically translates into the equivalent two-parameter template above.

Non-type template parameters

In addition to type parameters, in C++ we can also pass non-type parameters to templates. A non-type template parameter is a template parameter with a fixed type that serves as a placeholder for a constexpr value passed in as a template argument. A non-type template parameter can be any of the following types:

We have seen non-type template parameters before, when we introduced std::bitset:

#include <bitset>

int main() {
    std::bitset<8> bits{ 0b0000'0101 }; // The <8> is a non-type template parameter
    return 0;
}

Here is a simple example involving a non-type template parameter:

template <int N>
void print() {
    std::cout << N << '\n';
}

It might seem trivial in this case, but function parameters cannot be constexpr and thus cannot be checked with static_assert at compile time. Passing the value as a non-type template argument instead lets us static_assert on it.
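A sketch of that trick, assuming a hypothetical checked_sqrt() that should only accept non-negative inputs:

```cpp
#include <cmath>

// N is a compile-time constant, so static_assert can check it.
template <int N>
double checked_sqrt() {
    static_assert(N >= 0, "checked_sqrt(): N must be non-negative");
    return std::sqrt(static_cast<double>(N));
}
```

checked_sqrt<16>() returns 4.0, while checked_sqrt<-1>() fails to compile.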

Just like auto in type template parameters, we can also use auto for automatic type deduction in non-type parameters:

#include <iostream>

template <auto N>
void print() {
    std::cout << N << '\n';
}

int main() {
    print<5>();
    print<'c'>();
}

Compound Types: References and Pointers

What are compound data types in C++?

Compound data types (also sometimes called composite data types) are data types that can be constructed from fundamental data types (or other compound data types). Each compound data type has its own unique properties as well. C++ supports the following compound types:

Value categories (lvalue and rvalue)

Prior to C++11, there were only two value categories: lvalue and rvalue. In C++11, three additional value categories were added to support a new feature called move semantics: glvalue, prvalue and xvalue.

An lvalue is an expression that evaluates to an identifiable object or function (or bit-field). Entities with identities can be accessed via an identifier, reference or pointer. Lvalues come in two kinds: modifiable and non-modifiable (e.g. const variables).

An rvalue is an expression that is not an lvalue. Commonly seen rvalues include literals (except C-style string literals, which are lvalues) and the return value of functions and operators that return by value. Rvalues are not identifiable and only exist within the scope of the expression in which they are used.

Lvalue references

In C++, a reference is an alias for an existing object. Once a reference has been defined, any operation on the reference is applied to the object being referenced. An lvalue reference (commonly just called a reference since prior to C++11 there was only one type of reference) acts as an alias for an existing lvalue (such as a variable).

int     // int type
int&    // an lvalue reference on an int object
double& // an lvalue reference on a double object

For those who already know about pointers: the ampersand symbol here does not mean “the address of”, but “lvalue reference to” an object.

A reference, once initialized, cannot be reseated, meaning that we cannot reassign it to reference another object:

#include <iostream>

int main() {
    int x{5};
    int y{6};
    int& ref{x};  // ref now is an alias of x
    ref = y;      // assign 6 to x! not reseating ref
    std::cout << x << '\n';
    return 0;
}

It’s worth noting that, although reference variables follow the same scoping and duration rules as normal variables, references and referents can have independent lifetimes. A reference can be destroyed before the object it references, and the object can be destroyed before the reference as well.

#include <iostream>

int main() {
    int x{5};
    {
        int& ref{x}; // ref is an alias to x
        std::cout << ref << '\n';  // prints 5
    }  // ref destroyed here
    std::cout << x << '\n';  // prints 5
    return 0;
} // x destroyed here

Dangling references, namely references whose referents have already been destroyed, are dangerous and should be avoided. Luckily, they mostly only occur when a function returns a reference (or address) to one of its local variables.

Last bit but quite shockingly perhaps, references are not objects. They may or may not require storage and are not considered objects in C++. As a result, we cannot define a reference to another reference, as an lvalue reference must reference an identifiable object.

int var{};
int& ref1{var};
int& ref2{ref1}; // referencing var, not ref1

Lvalue references to const

A (non-const) lvalue reference cannot bind to a const object:

int main() {
    const int x { 5 }; // x is a non-modifiable (const) lvalue
    int& ref { x }; // error: ref can not bind to non-modifiable lvalue
    return 0;
}

We can use the const keyword on the reference to do that:

int main() {
    const int x{5};
    const int& ref{x};
    return 0;
}

in which case we call ref a const reference.

We can also bind a const reference to a modifiable lvalue, which just means we’re not going to modify the original object.

Furthermore, we can also bind a const lvalue reference to an rvalue, in which case we effectively extend the lifetime of the temporary rvalue object and can safely use the value until the end of the current scope. This does not, however, work with return by (even const) references.
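A sketch of both cases, binding a const reference to a modifiable lvalue and to a temporary; all function names are hypothetical:

```cpp
#include <cstddef>
#include <string>

std::string make_greeting() {
    return "hello";  // returned by value: the caller gets a temporary (an rvalue)
}

int observe() {
    int x{5};
    const int& ref{x};  // const reference to a modifiable lvalue
    x = 6;              // x can still be changed directly...
    return ref;         // ...and ref sees the new value
}

std::size_t greeting_length() {
    const std::string& ref{make_greeting()};  // temporary's lifetime extended to match ref
    return ref.size();                        // safe: the temporary is still alive here
}
```

observe() returns 6 and greeting_length() returns 5.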

Last and least used, we can define constexpr references in certain cases:

int g_x {5};

int main() {
    constexpr int& ref1{g_x}; // ok: constexpr ref to global
    static int s_x {6};
    constexpr int& ref2{s_x}; // ok: constexpr ref to static local
    static const int s_xc{6};
    constexpr const int& ref2c{s_xc};  // ok: constexpr const ref to static const local
    int x{7};
    constexpr int& ref3{x};   // compile error: constexpr ref to non-static local
}

Pass by lvalue references

Some objects are expensive to copy so we prefer to pass them by lvalue references to functions. Also, pass by (non-const) reference allows us to modify the original object within the function.

#include <iostream>

void printValue(int& y) {
    std::cout << y << '\n';
}

int main() {
    int x { 5 };
    printValue(x); // ok: x is a modifiable lvalue

    const int z { 5 };
    printValue(z); // error: z is a non-modifiable lvalue, cannot be passed as regular lvalue ref

    printValue(5); // error: 5 is an rvalue, cannot be passed as lvalue reference

    return 0;
}

We can fix it by passing by const lvalue reference:

#include <iostream>

void printValue(const int& y) {
    std::cout << y << '\n';
}

int main() {
    const int z{5};
    printValue(z);  // ok
    printValue(5);  // ok
}

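We can now answer the question of why we don’t pass everything by reference: binding a reference and accessing an object through it have a small cost of their own, so for types that are cheap to copy, pass by value is usually just as fast or faster, while for types that are expensive to copy we pass by const reference.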

How do we tell if a type is “cheap to copy” or not? Below is a function-like macro that tries to answer this question:

#include <iostream>

// Function-like macro that evaluates to true if the type (or object) is
// equal to or smaller than the size of two memory addresses
#define isSmall(T) (sizeof(T) <= 2 * sizeof(void*))

struct S {
    double a;
    double b;
    double c;
};

int main()
{
    std::cout << std::boolalpha;          // print true or false rather than 1 or 0
    std::cout << isSmall(int) << '\n';    // true
    std::cout << isSmall(double) << '\n'; // true
    std::cout << isSmall(S) << '\n';      // false
    return 0;
}

Last note: for read-only string parameters, prefer std::string_view (available since C++17) over const std::string&. A std::string_view is cheap to copy, and passing a C-style string literal or a std::string_view to it does not require constructing a temporary std::string, which a const std::string& parameter would.
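A small sketch of a std::string_view parameter; count_vowels is a hypothetical function used only for illustration. The same function accepts a std::string, a string literal and a std::string_view without creating a temporary std::string:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Read-only string parameter taken by std::string_view (requires C++17)
std::size_t count_vowels(std::string_view text) {
    std::size_t count{0};
    for (char c : text) {
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
            ++count;
    }
    return count;
}
```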

Introduction to pointers and the address-of operator &

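The & symbol tends to cause confusion because it has different meanings depending on context: following a type name, it denotes an lvalue reference (int& ref); as a unary operator in an expression, it is the address-of operator (&x); and as a binary operator in an expression, it is bitwise AND (x & y).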

Paired with the address-of operator &, we have the dereference operator *:

#include <iostream>

int main() {
    int x{5};
    std::cout << x << '\n';     // prints 5
    std::cout << &x << '\n';    // prints address of x
    std::cout << *(&x) << '\n'; // prints the value at the memory address of x, which is 5
}

In short, the unary & gives the address of its operand, and the unary * gives the object located at the address its operand holds.

With this concept in mind, we define a pointer as an object that holds a memory address as its value:

int x;     // a normal int
int& y{x}; // an lvalue reference to an int (a reference must be initialized)
int* z;    // a pointer to an int

Notice we generally recommend putting the asterisk * of a pointer type next to the type, just like the & of a reference type. But when it comes to defining multiple pointers in the same line, it becomes somewhat awkward:

int* x, * z;  // writing int* x, z; would make z a plain int, not a pointer

but we still would recommend placing * right next to the type and keeping a space between it and the variable - although a better suggestion is to not define multiple variables in the same line.

The size of a pointer does not depend on the type it points to: a pointer just holds a memory address, so its size is determined by the architecture the program is compiled for. On a 32-bit target a pointer is typically 32 bits (4 bytes), and on a 64-bit target typically 64 bits (8 bytes).

Pointer initialization and assignment

Initialization:

int main() {
    int x{5};
    double d{2.2};
    int* p1;      // uninitialized pointer (holds garbage address)
    int* p2{};    // a null pointer
    int* p3{&x};  // a pointer at address of x
    int* p4{&d};  // error: cannot initialize an int pointer with the address of a double
    int* p5{5};   // error: cannot initialize with an int literal
    int* p6{ 0x0012FF7C};  // error: 0x0012FF7C is just an integer literal here, not an address
    return 0;
}

Assignment:

int main() {
    int x{5};
    int* ptr {&x};  // initializing ptr to address of x
    int y{6};
    ptr = &y;  // re-assign ptr to address of y
    *ptr = 7;  // assigning 7 to variable y
}

Differences between pointers and references
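In brief, a reference must be initialized and stays bound to the same object, while a pointer may be left null and can be reseated; a pointer is also an object in its own right, whereas a reference is not. A small sketch of these differences (reseat_demo is a hypothetical name):

```cpp
#include <cassert>

int reseat_demo() {
    int a{1};
    int b{2};

    int& ref{a};   // a reference must be initialized and stays bound to a
    ref = b;       // this does NOT rebind ref; it assigns b's value (2) to a

    int* ptr{&a};  // a pointer can be reseated...
    ptr = &b;      // ...now ptr holds the address of b
    *ptr = 3;      // modifies b through the pointer

    int* null_ptr{};  // a pointer may point at nothing; a reference cannot
    assert(null_ptr == nullptr);

    return a * 10 + b;  // a is 2, b is 3
}
```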

Null pointers

A null pointer (often shortened to just null) is a pointer that points at nothing, and the easiest way to create it is by initializing with empty braces:

int* ptr {};
int* ptr2 { nullptr };  // more explicit, same result

Notice that dereferencing a null pointer leads to undefined behavior. Luckily, we can always check before doing anything:

#include <iostream>

int main() {
    int x { 5 };
    int* ptr { &x };
    if (ptr == nullptr)
        std::cout << "ptr is null\n";
    else
        std::cout << "ptr is " << *ptr << '\n';
    return 0;
}

Even simpler, we can rely on the implicit conversion of a pointer to bool: a null pointer converts to false, and any other pointer converts to true.
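A minimal sketch of that check, using a hypothetical helper describe() that relies on the implicit pointer-to-bool conversion:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: describes what a pointer holds
std::string describe(const int* ptr) {
    if (ptr)  // same as ptr != nullptr
        return "points at " + std::to_string(*ptr);
    return "null";
}
```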

Notice that we cannot use this trick to detect dangling pointers: a dangling pointer (one whose object has been destroyed) still holds the old, now invalid address rather than nullptr, so it converts to true.

Pointers and const

int main() {
    const int x{5};
    int* ptr0{&x};             // compile error: cannot convert const int* to int*
    const int* ptr1{&x};       // okay: pointer to const int
    *ptr1 = 6;                 // compile error: cannot modify a const int through ptr1
    const int y{6};
    ptr1 = &y;                 // okay: ptr1 itself is not const, so it can be reseated

    int z{7};
    const int* ptr2{&z};       // okay: a pointer to const can point at a non-const int
    *ptr2 = 8;                 // compile error: cannot modify z through a pointer to const
    z = 8;                     // okay: z itself is not const

    int w{9};
    int* const ptr3{&z};       // const pointer to (non-const) int
    ptr3 = &w;                 // compile error: cannot reseat a const pointer
    *ptr3 = 1;                 // okay: the pointed-to int is not const

    const int* const ptr4{&z}; // const pointer to const int: no reseating, no modification
    return 0;
}

Pass by value, reference and address

#include <iostream>
#include <string>

void print_by_value(std::string s) {
    std::cout << s << '\n';
}

void print_by_reference(const std::string& s) {
    std::cout << s << '\n';
}

void print_by_address(const std::string* s_ptr) {
    std::cout << *s_ptr << '\n';
}

int main() {
    std::string s{"hello"};
    print_by_value(s);        // making a copy
    print_by_reference(s);    // not making a copy
    print_by_address(&s);     // not making a copy (only the address is copied)
    std::string* s_ptr {&s};
    print_by_address(s_ptr);  // not making a copy
    return 0;
}

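Because a caller might pass a null pointer (and dereferencing it is undefined behavior), pass by address is riskier than pass by reference. So we should prefer passing by (const) reference unless we have a specific reason to use a pointer.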

A little more tricky, what if we want to change a pointer to a null pointer by using a function? One might do this:

void nullify([[maybe_unused]] int* ptr) {
    ptr = nullptr;
}

int main() {
    int x {5};
    int* ptr{&x};
    nullify(ptr);
    return 0;
}

However, ptr will not be nullptr after calling the function. When the function is called, its parameter receives a copy of the address held by the argument, so reassigning the parameter changes only that local copy and has no effect on the pointer in main.

Since pointers are objects, we can instead pass the pointer by reference:

void nullify(int*& refptr) {
    refptr = nullptr;
}

and this works as intended. The type int*& may look odd, but reading it from right to left makes sense: refptr is a reference to a pointer to int. If we accidentally write int&* instead, the compiler reports an error, because that would be a pointer to a reference, and a pointer must hold the address of an object, which a reference is not.
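A side-by-side sketch of the two versions; nullify_copy is a hypothetical name for the pass-by-value variant. Only the pass-by-reference version changes the caller’s pointer:

```cpp
#include <cassert>

// Pointer passed by value: only the local copy is changed
void nullify_copy([[maybe_unused]] int* ptr) {
    ptr = nullptr;
}

// Pointer passed by reference: the caller's pointer is changed
void nullify(int*& refptr) {
    refptr = nullptr;
}
```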

Return by reference and address

Returning by reference avoids making a copy:

#include <iostream>
#include <string>

const std::string& get_program_name() {
    static const std::string s_program_name {"computer"};
    return s_program_name;
}

int main() {
    std::cout << "the name of the program is " << get_program_name() << '\n';
    return 0;
}

It’s generally recommended not to return a non-const static local variable by reference. Note that initializing or assigning a normal (non-reference) variable with a returned reference makes a copy:

#include <iostream>
#include <string>

const int& get_next_id() {
    static int s_x {0};  // non-const
    ++s_x;
    return s_x;  // returns a reference to s_x
}

int main() {
    const int id1 { get_next_id() };  // id1 is initialized with a copy of the value (1)
    const int id2 { get_next_id() };  // id2 gets a copy of the new value (2); id1 and id2 are NOT references to s_x
    std::cout << id1 << id2 << '\n';
    return 0;
}

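The output is 12, as desired. It’s important to realize that id1 and id2 above are not references to the same variable: because they are normal variables, each is initialized with a copy of the value s_x held at the time of the call. Had we declared them as const int&, both would refer to s_x and the program would print 22.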
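A sketch contrasting copies and references of the returned value; copies_vs_references is a hypothetical helper and get_next_id mirrors the function above. The references both alias the static local, so they see its latest value:

```cpp
#include <cassert>

const int& get_next_id() {
    static int s_x{0};
    ++s_x;
    return s_x;  // returns a reference to the static local s_x
}

bool copies_vs_references() {
    const int copy1{get_next_id()};  // copies the value 1
    const int copy2{get_next_id()};  // copies the value 2
    const int& ref1{get_next_id()};  // refers to s_x (now 3)
    const int& ref2{get_next_id()};  // also refers to s_x (now 4)
    // the copies kept their values; both references see s_x's current value
    return copy1 == 1 && copy2 == 2 && ref1 == 4 && ref2 == 4 && &ref1 == &ref2;
}
```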

One other case where we return by reference is to return the original parameters of the function by reference (when we’ve passed them by reference already).

Return by address has almost the same use cases as return by reference, except that we can return nullptr to indicate that “nothing” is returned. A big disadvantage of returning by address, therefore, is that the caller has to check whether the returned pointer is null before dereferencing it.
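A minimal sketch of return by address with nullptr meaning “nothing”; larger_of and larger_or_zero are hypothetical functions. Note how the caller is forced to check the result before dereferencing it:

```cpp
#include <cassert>

// Returns the address of the larger argument, or nullptr when the values are equal
const int* larger_of(const int& a, const int& b) {
    if (a == b)
        return nullptr;
    return (a > b) ? &a : &b;
}

int larger_or_zero(int x, int y) {
    const int* result{larger_of(x, y)};
    return result ? *result : 0;  // check for nullptr before dereferencing
}
```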

In and out parameters

Parameters that are used only for receiving input from the caller are sometimes called in parameters:

void print(int x) {  // x is an in parameter
    std::cout << x << '\n';
}

In parameters are usually passed by value or const reference.

A function parameter that is used only for the purpose of returning information back to the caller is called an out parameter:

#include <cmath>  // for sin and cos

void get_sin_cos(double degrees, double& sin_out, double& cos_out) {
    constexpr double pi { 3.14159265358979323846 };
    double radians = degrees * pi / 180.0;
    sin_out = std::sin(radians);
    cos_out = std::cos(radians);
}  // not returning anything

We should generally avoid using out parameters except in the rare case where no better options exist.
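To see why out parameters are awkward, here is a sketch of a call site for get_sin_cos (sin_squared_plus_cos_squared is a hypothetical caller): the output variables must be created beforehand, and nothing at the call site shows which arguments get modified:

```cpp
#include <cassert>
#include <cmath>

void get_sin_cos(double degrees, double& sin_out, double& cos_out) {
    constexpr double pi { 3.14159265358979323846 };
    double radians = degrees * pi / 180.0;
    sin_out = std::sin(radians);
    cos_out = std::cos(radians);
}

double sin_squared_plus_cos_squared(double degrees) {
    double sin_value{};  // must exist before the call just to receive a result
    double cos_value{};
    get_sin_cos(degrees, sin_value, cos_value);  // modifies the last two arguments
    return sin_value * sin_value + cos_value * cos_value;
}
```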

Type deduction with pointers, references and const

Type deduction, in addition to dropping const qualifiers, also drops the references:

std::string& get_string_ref();

int main() {
    auto ref {get_string_ref()};     // ref's type deduced as string instead of string&
    auto& ref2 {get_string_ref()};   // ref2's type is string&
    return 0;
}

In terms of const dropping, type deduction drops top-level const qualifiers (and leaves low-level const unchanged):

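int main() {
    int i{5};
    const int x{5};            // top-level const
    int* const ptr1{&i};       // top-level const: the pointer itself is const
    const int& ref{x};         // low-level const (once the reference is dropped, it acts as top-level)
    const int* ptr2{&x};       // low-level const: the pointed-to int is const
    const int* const ptr3{&x}; // left const is low-level, right const is top-level

    // top-level const is dropped, low-level const is kept:
    auto a{x};     // int
    auto b{ref};   // int (reference dropped, then the now top-level const dropped)
    auto c{ptr1};  // int* (top-level const dropped)
    auto d{ptr2};  // const int* (low-level const kept)
    auto e{ptr3};  // const int* (only the top-level const dropped)
    return 0;
}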

Specifically, for const references we have example below:

#include <string>

const std::string& getConstRef(); // some function that returns a const reference

int main() {
    auto ref1{ getConstRef() };        // std::string (reference and top-level const dropped)
    const auto ref2{ getConstRef() };  // const std::string (reference dropped, const reapplied)

    auto& ref3{ getConstRef() };       // const std::string& (reference reapplied, low-level const not dropped)
    const auto& ref4{ getConstRef() }; // const std::string& (reference and const reapplied)

    return 0;
}

The best practice is to reapply both reference and const like in ref4 to make it more explicit, although the reapplication of const isn’t necessary.

All above works identically with constexpr.

When it comes to type deduction with pointers, things are different: type deduction does not drop pointers at all:

std::string* get_string_ptr();

int main() {
    auto ptr { get_string_ptr() };     // ptr is string*
    auto* ptr2 { get_string_ptr() };   // also string*, just more explicit
    return 0;
}

In the above example, ptr and ptr2 are effectively the same. However, because auto* must be deduced from a pointer initializer, the two can differ in certain situations, for example:

int main() {
    auto ptr {*get_string_ptr()};     // type is string
    auto* ptr2 {*get_string_ptr()};   // compile error, as initializer is not a pointer
}

Another difference occurs when we introduce const pointers:

int main() {
    auto const ptr2 {get_string_ptr()};      // string* const (const pointer)
    const auto ptr1 {get_string_ptr()};      // string* const (const pointer)
    auto* const ptr4 {get_string_ptr()};     // string* const (const pointer)
    const auto* ptr3 {get_string_ptr()};     // const string* (pointer to const)
}

A maybe better example:

#include <string>

int main() {
    std::string s{};
    const std::string* const ptr {&s};  // const string* const

    auto ptr1{ptr};              // const string* (top-level const dropped)
    auto* ptr2{ptr};             // const string*
    auto const ptr3{ptr};        // const string* const (top-level const reapplied)
    const auto ptr4(ptr);        // const string* const
    auto* const ptr5{ptr};       // const string* const
    const auto* ptr6{ptr};       // const string* (const auto* applies const to the pointed-to type only)
    const auto const ptr7{ptr};  // error: const applied twice to the same (top-level) position
    const auto* const ptr8{ptr}; // const string* const (the right const reapplies top-level const)
}

Compound Types: Enums and Structs

Unscoped enumerations

Let’s say we want to define a bunch of colors. The most basic way is

int main() {
    int red {0};
    int green {1};
    int blue {2};

    int apple_color {red};
    int shirt_color {green};
    return 0;
}

which is not intuitive and involves magic numbers. We can define the colors as constexpr variables instead, to make the code more readable:

constexpr int red {0};
constexpr int green {1};
constexpr int blue {2};

int main() {
    int apple_color {red};
    int shirt_color {green};
    return 0;
}

Or, to avoid spelling out int everywhere, we can use a type alias:

using Color = int;
constexpr Color red {1};
constexpr Color green {2};
constexpr Color blue {3};

int main() {
    Color apple_color {red};
    Color shirt_color {green};
}

However, this doesn’t prevent somebody from initializing a Color variable like

Color something {9};

as Color is nothing but an alias of int.

We can use an enumeration (also known as enum) in this case. C++ supports two kinds of enumerations, unscoped and scoped. Here we talk about unscoped enums:

enum Color {
    red,
    green,
    blue,  // trailing comma optional but recommended
};  // the enum definition ends with a semicolon, it's just a definition statement

enum Pet {
    cat,
    dog,
    pig,
};

int main() {
    Color apple_color {red};
    Color shirt_color {green};
    Color cup_color {blue};
    Color socks_color {white};  // error: white not in Color
    Color hat_color {9};        // error: 9 cannot be implicitly converted to Color
    Color some_color {pig};     // error: pig is not in Color
    return 0;
}

It’s worth noting that by default the enumerators are automatically assigned integer values starting from 0. If we explicitly assign a value to an enumerator, the enumerators that follow keep counting up from that value:

enum Color {
    red = -3,
    blue,       // assigned -2
    green,      // assigned -1
    yellow = 3, // assigned 3
    orange = 3, // also assigned 3
    black,      // assigned 4
};

and the corresponding integral values are used automatically when, e.g., an enumerator is passed to std::cout. If we want to print the name instead of the integer value, we need to explicitly tell the compiler how to do so:

constexpr std::string_view get_color(Color color) {
    switch (color) {
        case black:
            return "black";
        case red:
            return "red";
        // etc
        default:
            return "???";
    }
}

Even better, we can overload operator<< for std::ostream directly:

#include <iostream>

std::ostream& operator<<(std::ostream& out, Color color) {
    switch (color) {
        case black:
            return out << "black";
        case red:
            return out << "red";
        // etc
        default:
            return out << "???";
    }
}

We can also specify the underlying integral type of an enum if we want to:

#include <cstdint>

enum Color : std::int8_t {
    black,
    red,
    white,
};

Another potentially relevant topic is how we convert integer to enumerator (the other way round is easy):

enum Color {
    red,
    black,
    yellow,
    orange,
    blue,
    green,
};

int main() {
    Color color {5};                       // compile error: no implicit conversion from int
    color = 3;                             // compile error: no implicit conversion from int
    Color color2 {static_cast<Color>(3)};  // ok
    color2 = static_cast<Color>(2);        // ok
    return 0;
}

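If we specify an explicit underlying type for the enum, then (since C++17) we can initialize it from an integer using brace (direct list) initialization, while the other forms of implicit conversion still fail: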

enum Color : int {
    red,
    black,
    yellow,
    orange,
    blue,
    green,
};

int main() {
    Color color {5};   // ok for brace initialization
    Color color2(2);   // compile error
    Color color3 = 3;  // compile error
    color = 2;         // compile error
    return 0;
}

For the same reason, we cannot read an enumerator directly with std::cin >>. We can either read an integer and static_cast it to the enumeration type, or overload operator>>:

std::istream& operator>>(std::istream& in, Color& color) {
    int input{};
    in >> input;
    color = static_cast<Color>(input);
    return in;
}
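Putting both overloads together, here is a sketch that reads a Color from text and prints its name; the three-color enum and the round_trip helper are hypothetical, and std::istringstream stands in for std::cin so the example is self-contained. Note that this operator>> does no range checking:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

enum Color {
    red,
    green,
    blue,
};

std::ostream& operator<<(std::ostream& out, Color color) {
    switch (color) {
        case red:   return out << "red";
        case green: return out << "green";
        case blue:  return out << "blue";
        default:    return out << "???";
    }
}

// Reads an integer and converts it explicitly (no range check)
std::istream& operator>>(std::istream& in, Color& color) {
    int input{};
    in >> input;
    color = static_cast<Color>(input);
    return in;
}

// Hypothetical helper: parse a color from text, then print its name
std::string round_trip(const std::string& text) {
    std::istringstream in{text};
    Color color{};
    in >> color;
    std::ostringstream out;
    out << color;
    return out.str();
}
```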

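Enumerated types can be useful in production for error codes:

#include <iostream>

enum FileReadResult {
    readResultSuccess,
    readResultErrorFileOpen,
    readResultErrorFileRead,
    readResultErrorFileParse,
};

// placeholder stand-ins for real file operations, for illustration only
bool openFile() { return true; }
bool readFile() { return true; }
bool parseFile() { return true; }

FileReadResult readFileContents() {
    if (!openFile())
        return readResultErrorFileOpen;
    if (!readFile())
        return readResultErrorFileRead;
    if (!parseFile())
        return readResultErrorFileParse;
    return readResultSuccess;
}

int main() {
    if (readFileContents() == readResultSuccess)
        std::cout << "file read successfully\n";  // do something with the contents
    else
        std::cout << "failed to read file\n";     // print error message
    return 0;
}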

and for limited parameter values:

enum SortOrder {
    alphabetical,
    alphabeticalReverse,
    numerical,
};

void sort_data(SortOrder order) {
    switch (order) {
        case alphabetical:
            // something
            break;
        case alphabeticalReverse:
            // something
            break;
        case numerical:
            // something
            break;
        default:
            // raise error?
            break;
    }
}

Note that the enumerators of an unscoped enum live in the same scope as the enum itself, so we can refer to them either as red or (since C++11) as Color::red. This means the enclosing namespace is, unfortunately, always polluted by such enumerators. We can avoid that by wrapping each enum inside its own namespace:

namespace Color {
    enum Color {
        red,
        green,
        blue,
    };
}

Scoped enumerations (enum class)

Check the following example:

#include <iostream>

int main() {
    enum Color {
        red,
        blue,
    };

    enum Fruit {
        banana,
        apple,
    };

    Color color { red };
    Fruit fruit { banana };
    if (color == fruit) // The compiler will compare color and fruit as integers
        std::cout << "color and fruit are equal\n"; // and find they are equal!
    else
        std::cout << "color and fruit are not equal\n";
    return 0;
}

To solve this problem, we could rely on users to always write Color::red and never compare a Color with a Fruit, or, better, we can use scoped enumerations (also known as enum classes):

#include <iostream>

int main() {
    enum class Color { // "enum class" defines this as a scoped enumeration rather than an unscoped enumeration
        red, // red is considered part of Color's scope region
        blue,
    };

    enum class Fruit {
        banana, // banana is considered part of Fruit's scope region
        apple,
    };

    Color color { Color::red }; // note: red is not directly accessible, we have to use Color::red
    Fruit fruit { Fruit::banana }; // note: banana is not directly accessible, we have to use Fruit::banana
    if (color == fruit) // compile error: the compiler doesn't know how to compare different types Color and Fruit
        std::cout << "color and fruit are equal\n";
    else
        std::cout << "color and fruit are not equal\n";
    return 0;
}

Notice that class here is just an “overly” overloaded keyword and doesn’t mean Color and Fruit are class types. They are scoped enumeration types; the real class types are those defined with struct, class and union.

Another difference from unscoped enums is that scoped enumerators do not implicitly convert to integers.
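A minimal sketch with a hypothetical Animal enum class: converting in either direction takes an explicit static_cast (C++23 also offers std::to_underlying in <utility> for the enum-to-integer direction):

```cpp
#include <cassert>

enum class Animal {
    cat,
    dog,
    pig,
};

// Scoped enumerators need an explicit conversion to be used as integers
int to_int(Animal animal) {
    // return animal;  // compile error: no implicit conversion to int
    return static_cast<int>(animal);
}
```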

Last but not least, if we really have to “import” all the enumerators into a scope, we can un-scope an enum with a using enum declaration (C++20):

constexpr std::string_view get_color(Color color) {
    using enum Color;  // within this function, the Color:: prefix can be omitted
    switch (color) {
        case black: //  instead of Color::black
            return "black";
        case red:
            return "red";
        // etc
        default:
            return "???";
    }
}

Introduction to structs, members and member selection

Below is an example of a struct definition:

struct Employee {
    int id {};
    int age {};
    double wage {};
};

int main() {
    Employee aaron;   // default initialization: each member uses its default member initializer (all zero here)
    Employee joe {};  // value initialization
    joe.id = 0;       // updating member variables
    joe.age = 32;
    joe.wage = 50000.0;
    Employee frank {1, 22, 60000.0};   // aggregate initialization
    Employee bob {2, 50};              // aggregate initialization; missing members use their default member initializers (bob.wage is 0.0)
    const Employee lucy{};             // const struct; must be initialized
    Employee zoe (10, 38, 30000.0);    // direct initialization (C++20, not recommended)
    return 0;
}

The last form of initialization is only valid since C++20 and is not recommended, as it currently doesn’t work with aggregates that rely on brace elision (notably std::array).

In cases where we initialize a number of members in order, things can get messy when we suddenly introduce a new member:

Old:

struct Foo {
    int a {};
    int c {};
};

int main() {
    Foo foo {1, 2}; // foo.a=1, foo.c=2
    return 0;
}

New:

struct Foo {
    int a {};
    int b {};  // just added
    int c {};
};

int main() {
    Foo foo {1, 2};  // note now foo.a=1 foo.b=2, foo.c=0!
    return 0;
}

To avoid this kind of bug-prone design, we can use designated initializers (available since C++20):

int main() {
    Foo f1 {.a{1}, .c{3}};        // ok: f1.b is 0
    Foo f2 {.a=1, .c=3};          // ok: same as above
    Foo f3 {.b{2}, .a{1}};        // error: designators must appear in declaration order
    f1 = {1, 32, 20};             // ok
    f1 = {.a=1, .b=32, .c=100};   // ok
    return 0;
}

Besides designated initialization, we can also initialize a struct using another:

Foo x1 = foo;  // copy initialization
Foo x2(foo);   // direct initialization
Foo x3 {foo};  // list initialization

Default member initialization

When we define a struct (or class) type, we can provide a default initialization value for each member as part of the type definition. This process is called non-static member initialization, and the initialization value is called a default member initializer.

struct Something {
    int x;
    int y {};
    int z {5};
};

int main() {
    Something s1; // s1.x is uninitialized; s1.y=0; s1.z=5
    Something s2 { 5, 6, 7}; // s2.x=5; s2.y=6; s2.z=7
    return 0;
}

Passing and returning structs

We can pass structs by const references to avoid making copies:

#include <iostream>

struct Employee {
    int id {};
    int age {};
    double wage {};
};

void print_employee(const Employee& employee) {
    std::cout << "id=" << employee.id
              << ", age=" << employee.age
              << ", wage=" << employee.wage
              << '\n';
}

We can also return structs, usually by value so that we don’t return a dangling reference. Since the function’s return type is already declared, we don’t need to name the type in the return statement: we can return a braced initializer list, and the compiler uses it to initialize the returned struct:

Employee get_employee() {
    return Employee{1, 23, 10'000};
} // is the same as

Employee get_employee() {
    return {1, 23, 10'000};
}

Nested structs

Once we have a struct defined, we can define another that houses it:

// assuming we have `Employee` defined already

struct Company {
    int number_of_employees {};
    Employee CEO {};
};

int main() {
    Company my_company {7, {1, 23, 10'000}};  // nested list initialization
}

We can even wrap the type definition inside the second one, if we believe it’s not necessary to be exposed to global level:

struct Company {
    struct Employee {
        int id {};
        int age {};
        double wage {};
    };

    int number_of_employees {};
    Employee CEO {};
};

With this structure, we can now only access the Employee struct via Company::Employee.

The size of structs

The size of a struct is NOT necessarily equal to the sum of the sizes of its members, but it is AT LEAST AS LARGE AS that sum:

#include <iostream>

struct Foo1 {
    int a {};
    short b {};
    short c {};
};

struct Foo2 {
    short a {};
    int b {};
    short c {};
};

int main() {
    std::cout << "the size of int is " << sizeof(int) << " bytes\n";
    std::cout << "the size of short is " << sizeof(short) << " bytes\n";
    std::cout << "the size of Foo1 is " << sizeof(Foo1) << " bytes\n";
    std::cout << "the size of Foo2 is " << sizeof(Foo2) << " bytes\n";
    return 0;
}

On a typical platform, the output is

the size of int is 4 bytes
the size of short is 2 bytes
the size of Foo1 is 8 bytes
the size of Foo2 is 12 bytes

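The sizes of Foo1 and Foo2 differ merely because of the order of member declarations: in Foo2, padding is inserted after a so that the int b is suitably aligned, and after c so that the size of the whole struct is a multiple of int’s alignment. This is related to data structure alignment, which is a more advanced topic.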
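We can inspect the padding ourselves with offsetof and alignof. A sketch (padding_of_foo1 and padding_of_foo2 are hypothetical helpers); the exact numbers are platform-dependent, so only guaranteed properties are asserted at compile time:

```cpp
#include <cassert>
#include <cstddef>  // for offsetof and std::size_t

struct Foo1 {
    int a {};
    short b {};
    short c {};
};

struct Foo2 {
    short a {};
    int b {};
    short c {};
};

// Bytes of padding the compiler added: the struct's size minus the sum of its member sizes
constexpr std::size_t padding_of_foo1() { return sizeof(Foo1) - (sizeof(int) + 2 * sizeof(short)); }
constexpr std::size_t padding_of_foo2() { return sizeof(Foo2) - (sizeof(int) + 2 * sizeof(short)); }

// Every member starts at an offset that is a multiple of its alignment,
// which is why Foo2 needs padding between a and b
static_assert(offsetof(Foo2, b) % alignof(int) == 0);
static_assert(offsetof(Foo1, b) % alignof(short) == 0);
```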

Member selection with pointer and references

We can access the members of a struct via itself (object) using ., via its reference using ., and via its pointer using ->:

// assuming we have `Employee` defined

int main() {
    Employee john {1, 23, 10'000};
    std::cout << john.id << '\n';     // object.member
    const Employee& john2 {john};
    std::cout << john2.id << '\n';    // reference.member
    const Employee* ptr {&john};
    std::cout << ptr->id << '\n';     // pointer->member
    return 0;
}

Class templates

template <typename T>
struct Pair {
    T x {};
    T y {};
};

template <> // explicit specialization: a separate definition of Pair used when T is int
struct Pair<int> {
    int x {};
    int y {};
    int n {};
};

We can also define function templates that operate on our class template:

template <typename T>
struct Pair {
    T x {};
    T y {};
};

template <typename T>
constexpr T max(Pair<T> p) {
    return (p.x < p.y ? p.y : p.x);
}

We can use class templates across multiple files, as long as we put their definitions in a header file and include that header in every source file that uses them:

pair.h:

#ifndef PAIR_H
#define PAIR_H

template <typename T>
struct Pair {
    T first {};
    T second {};
};

template <typename T>
constexpr T max(Pair<T> p) {
    return (p.first < p.second ? p.second : p.first);
}

#endif

foo.cpp:

#include "pair.h"
#include <iostream>

void foo() {
    Pair<int> p1{1,2};
    std::cout << max(p1) << " is larger\n";
}

main.cpp:

#include "pair.h"
#include <iostream>

void foo();  // forward declaration

int main() {
    foo();
    Pair<double> p2{1.2, 2.3};
    std::cout << max(p2) << " is larger\n";
    return 0;
}

Starting from C++17, when instantiating an object from a class template, the compiler can deduce the template types from the types of the object’s initializer, which is called class template argument deduction or CTAD:

#include <utility>  // for std::pair

int main() {
    std::pair<int, int> p1 {1, 2};    // p1 has explicit type std::pair<int, int>
    std::pair p2 {1,2};               // p2 is deduced to be of type std::pair<int, int>
    std::pair<> p3 {1,2};             // error: too few template arguments, both arguments not deduced
    std::pair<int> p4 {1,2};          // error: too few template arguments, second argument not deduced
}

To use CTAD, we omit the template arguments entirely. As the example above shows, we cannot omit only some of them, nor can we leave empty angle brackets <>.

Another thing to keep in mind is that, although C++17 introduced CTAD, it doesn’t work for aggregates in C++17. The following program fails to compile in C++17 with an error like “class template argument deduction failed” unless we provide a deduction guide; it compiles fine in C++20, which can generate deduction guides for aggregates automatically:

template <typename T, typename U>
struct Pair {
    T first {};
    U second {};
};

// without the following deduction guide, we will have a compile error:
template <typename T, typename U>
Pair(T, U) -> Pair<T, U>;

int main() {
    Pair p {1,2};
    return 0;
}

In addition to deduction guides, we can provide default types for type template parameters, which the compiler uses when there is nothing to deduce them from:

template <typename T=int, typename U=int>
struct Pair {
    T first {};
    U second {};
};

This is what happens, for example, when we define a variable as just Pair p; without an initializer: p gets the type Pair<int, int>.

Last thing to remember is that CTAD doesn’t work with non-static member initialization. In other words, we cannot use it when declaring a data member of a struct or class:

struct Foo {
    std::pair p {1,2};  // error: CTAD doesn't work here
};

Alias templates

We can create a type alias for a class template where all template arguments are specified:

// assuming we have defined `Pair` struct

int main() {
    using Point = Pair<int>;
    Point p {1,2};
    return 0;
}

Instead of fixing all the template arguments, we can also create an alias template, which keeps the template parameters open:

// assuming we have defined `Pair` struct

template <typename T>
using Coord = Pair<T>;

Introduction to Classes

Classes vs structs

A class is defined as follows:

class Employee {
    int m_id {};
    int m_age {};
    double m_wage {};
};

Notice the trailing semicolon: a class definition is a declaration and must end with one, just like a struct definition. It’s apparent that classes are very similar to structs:

struct Employee {
    int id {};
    int age {};
    double wage {};
};

Both classes and structs can have member functions, including user-defined constructors.

The difference lies in the default accessibility of their members, which we will cover in the following sections.

Const class objects and const member functions

There are some rules concerning the const-ness of the class object and its members.
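The two main rules, as a hedged sketch: a const object can only call member functions marked const, and a const member function cannot modify the object’s data members. The Counter class and read_const function below are hypothetical:

```cpp
#include <cassert>

class Counter {
public:
    void increment() { ++m_count; }      // non-const: modifies the object
    int get() const { return m_count; }  // const: promises not to modify the object
    // void reset() const { m_count = 0; }  // compile error: cannot modify a member in a const function

private:
    int m_count {};
};

int read_const(const Counter& counter) {
    // counter.increment();  // compile error: non-const member function called on a const object
    return counter.get();    // ok: get() is const
}
```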

Public and private members and access specifiers

By default, the members of a struct are public and the members of a class are private. By convention, data members of classes are often given an m_ prefix to mark them as members (in the same spirit as s_ for static and g_ for global variables), which helps distinguish them from parameters and local variables inside member functions.

We can set the access levels via access specifiers:

#include <iostream>

class Date {
public:
    void print() const {
        std::cout << m_year << '/' << m_month << '/' << m_day;
    }

private:
    int m_year {2024};
    int m_month {2};
    int m_day {28};
};

Together with protected members, we have the full summary of access levels within classes:

| Access | Member access | Derived class access | Public access |
|---|---|---|---|
| public | yes | yes | yes |
| protected | yes | yes | no |
| private | yes | no | no |

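where “member access” means access from the class’s own member functions, “derived class access” means access from the member functions of a derived class, and “public access” means access from anywhere else.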

In practice, it's best to avoid using access specifiers in structs altogether, and to keep all data members of a class private (or protected) where possible. Member functions that form the class's intended user interface are normally public.

The benefits of encapsulation

There are quite a few benefits in encapsulating data and functions inside a class:

That being said, when a function can be implemented as a non-member function (using only the class's public interface), it is generally recommended to do so rather than adding it to the class. This keeps the class lighter and more straightforward.

Introduction to constructors

A constructor is a special member function that is automatically called after a non-aggregate class type object is created.

Constructors must have the same name as the class (with the same capitalization). For template classes, this name excludes the template parameters. Constructors have no return type (not even void). Below is a simple example of a class with constructor defined:

#include <iostream>

class Foo {
private:
    int m_x {};
    int m_y {};

public:
    Foo(int x, int y) { // our constructor takes two arguments (but doesn't use them to set m_x and m_y yet, so print() shows Foo(0, 0))
        std::cout << "Foo(" << x << ", " << y << ") constructed\n";
    }

    void print() const {
        std::cout << "Foo(" << m_x << ", " << m_y << ")\n";
    }
};

int main() {
    Foo foo{ 6, 7 }; // calls Foo(int, int) constructor
    foo.print();
    return 0;
}

We can initialize the members through the constructor's member initializer list:

...
    Foo(int x, int y):
        m_x {x}, m_y {y} {
        std::cout << "Foo(" << x << ", " << y << ") constructed\n";
    }

Notice that members are initialized in the order in which they are declared in the class, not in the order they appear in the member initializer list.
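The declaration-order rule can be observed with a small sketch. The Tracer and Widget classes and the g_initLog vector are hypothetical helpers that log each member's construction:

```cpp
#include <string>
#include <vector>

// Hypothetical log of the order in which members are constructed.
std::vector<std::string> g_initLog {};

struct Tracer {
    explicit Tracer(const std::string& name) { g_initLog.push_back(name); }
};

class Widget {
private:
    Tracer m_first;   // declared first
    Tracer m_second;  // declared second

public:
    // The initializer list names m_second first, but members are still
    // initialized in declaration order: m_first, then m_second.
    // (GCC and Clang typically warn about this mismatch via -Wreorder.)
    Widget()
        : m_second {"second"}, m_first {"first"} {
    }
};
```

Constructing a Widget logs "first" before "second", regardless of the initializer list order.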

Default constructors

If a class has no user-declared constructors, the compiler implicitly generates a public default constructor, which behaves roughly like this one:

class Foo {
public:
    Foo() {}
};

We can customize the default constructor by writing our own body. However, a class should have only one default constructor: if we define one with no parameters and another whose parameters all have default arguments, a call such as Foo{} becomes ambiguous and fails to compile.

In practice, when a default constructor doesn't need to do anything special, it is recommended to explicitly default it rather than write one with an empty body:

class Foo {
public:
    Foo() = default;  // instead of empty body
};

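It's more than just an aesthetic difference: when an object is value-initialized (e.g. with {}), a user-provided default constructor with an empty body won't zero-initialize member variables we forgot to initialize, whereas an explicitly defaulted one will: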

#include <iostream>

class UserDefault {
private:
    int m_x;  // not initialized
    int m_y {};
public:
    UserDefault() {}
    void print() {
        std::cout << m_x << ' ' << m_y << '\n';
    }
};

class ExplicitDefault {
private:
    int m_x;  // not initialized 
    int m_y {};
public:
    ExplicitDefault() = default;
    void print() {
        std::cout << m_x << ' ' << m_y << '\n';
    }
};

int main() {
    UserDefault ud{};
    ud.print();
    ExplicitDefault ed{};
    ed.print();
    return 0;
}

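The first print() outputs an indeterminate value for m_x (e.g. 85 0; strictly speaking, reading it is undefined behavior), while the second reliably prints 0 0.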

Delegating constructors

Constructors are allowed to delegate (transfer responsibility for) initialization to another constructor of the same class type. This process is sometimes called constructor chaining, and such constructors are called delegating constructors.

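class Employee {
private:
    std::string m_name {};
    int m_id {};
public:
    Employee(std::string_view name)
    : Employee(name, 0) {  // delegates to Employee(std::string_view, int)
    }

    Employee(std::string_view name, int id)
    : m_name {name}
    , m_id {id} {
        std::cout << "Employee(" << name << ", " << id << ")\n";
    }
};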

In this case, however, we can reduce the two constructors to a single one by using a default argument:

class Employee {
private:
    std::string m_name;
    int m_id {};
public:
    Employee(std::string_view name, int id = 0)
    : m_name {name}
    , m_id {id} {
        std::cout << "Employee(" << name << ", " << id << ")\n";
    }
};

Copy constructors

If you don't provide one, C++ will create a public implicit copy constructor for you. The implicit copy constructor performs memberwise initialization and nothing else.

If you decide to provide a customized copy constructor (not recommended), you should avoid doing anything other than copying, and remember to always take the parameter by const reference.

If you prefer to spell out the copy constructor, you can explicitly default it with = default, just like the default constructor.

If copying a class isn't desired, you can delete the copy constructor with = delete, so that any attempt to copy fails to compile:

class Fraction {
public:
    Fraction(const Fraction& fraction) = delete;  // copying a Fraction is now a compile error
};

There is a concept called “copy elision”, which describes an optimization where the compiler turns

Something s {Something (5)};

into

Something s {5};

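so that the redundant copy is skipped (or rather, elided). Copy elision was optional before C++17; since C++17 it is mandatory in cases like this one (initializing an object from a temporary of the same type), while other forms, such as named return value optimization, remain optional.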
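Since C++17, initializing an object from a temporary (prvalue) of the same type is guaranteed to elide the copy, which the sketch below relies on: the hypothetical Something class has its copy and move constructors deleted, yet returning it by value still compiles (C++17 or later):

```cpp
// Hypothetical type that can be neither copied nor moved.
class Something {
private:
    int m_value {};

public:
    explicit Something(int value) : m_value {value} {}

    Something(const Something&) = delete;  // no copies
    Something(Something&&) = delete;       // no moves either

    int value() const { return m_value; }
};

// Returning a prvalue: since C++17 the object is constructed directly in the
// caller's storage, so the copy and move constructors don't need to be available.
Something makeSomething(int value) {
    return Something {value};
}
```

Before C++17 this would fail to compile, because the (possibly elided) copy still required an accessible copy or move constructor.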

Explicit constructors

C++ supports implicit conversions, but at most one user-defined conversion may be applied in an implicit conversion sequence, so some conversions have to be spelled out (at least partially) explicitly. Conversely, any constructor callable with a single argument also acts as an implicit converting constructor, which can silently turn an argument into an object when we'd rather have the caller be explicit. To prevent such conversions, we can mark a constructor explicit:

#include <iostream>

class Dollars {
private:
    int m_dollars {};
public:
    explicit Dollars(int d)
    : m_dollars{d} {
    }

    int getDollars() const { return m_dollars; }
};

void print(Dollars d) {
    std::cout << '$' << d.getDollars() << '\n';
}

void print(int x) {
    std::cout << x << '\n';
}

int main() {
    print(5);            // prints 5, no confusion/conversion
    print(Dollars{5});   // prints $5
}
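Whether a type is implicitly convertible can be checked at compile time with the standard type traits std::is_convertible_v and std::is_constructible_v (C++17). A sketch reusing Dollars; the toCents function is hypothetical:

```cpp
#include <type_traits>

class Dollars {
private:
    int m_dollars {};

public:
    explicit Dollars(int d) : m_dollars {d} {}
    int getDollars() const { return m_dollars; }
};

// Because the constructor is explicit, int does not implicitly convert to Dollars...
static_assert(!std::is_convertible_v<int, Dollars>);
// ...but explicit construction still works.
static_assert(std::is_constructible_v<Dollars, int>);

// Hypothetical function that only accepts a Dollars object.
int toCents(Dollars d) { return d.getDollars() * 100; }
```

With the explicit constructor, `toCents(5)` would not compile; the caller has to write `toCents(Dollars{5})`.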

More on Classes

The hidden this pointer and member function chaining

Inside every member function, the keyword this is a const pointer that holds the address of the current implicit object.

class Simple {
private:
    int m_id {};
public:
    Simple(int id)
    : m_id {id}{
    }
    
    void setID(int id) {
        m_id = id;
    }

    void print() {
        std::cout << m_id << ' '  // this is the same as
                  << this->m_id   // this with explicit this->
                  << '\n';
    }
};

As a refresher on pointers, this->m_id is equivalent to (*this).m_id.

Conceptually, the compiler rewrites the setID member function into a static function that takes this as its first argument:

// conceptual only: not valid C++ (this is a keyword and can't be a parameter name)
static void setID(Simple* const this, int id) {
    this->m_id = id;
}

For my fellow Python users, static here is conceptually the same as @staticmethod: the rewritten function is essentially a normal function that receives the object explicitly, much like self.

In general, despite being more explicit, using this-> everywhere is not recommended. The m_ prefix is a more concise way to achieve the same clarity.

In addition to using this to access and set member variables, we can also use this to chain up consecutive function calls. For example:

class Calc {
private:
    int m_value {};
public:
    void add(int value) {
        m_value += value;
    }
    void sub(int value) {
        m_value -= value;
    }
    void mul(int value) {
        m_value *= value;
    }
    int getValue() {
        return m_value;
    }
};

The above class requires us to do consecutive calculation in the following manner:

int main() {
    Calc calc{};
    calc.add(5);
    calc.sub(3);
    calc.mul(2);
    std::cout << calc.getValue() << '\n';
    return 0;
}

Instead, by having each function return *this by reference, we can chain the calls much more concisely:

class Calc {
private:
    int m_value {};
public:
    Calc& add(int value) {
        m_value += value;
        return *this;
    }
    Calc& sub(int value) {
        m_value -= value;
        return *this;
    }
    Calc& mul(int value) {
        m_value *= value;
        return *this;
    }
    int getValue() {
        return m_value;
    }
};


int main() {
    Calc calc{};
    calc
        .add(5)
        .sub(3)
        .mul(2);
    std::cout << calc.getValue() << '\n';
    return 0;
}

We can also reset a class object back to its default state with a member function like this one:

void reset() {
    *this = {};
}
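Putting chaining and reset() together, here is a sketch of a hypothetical Calc class; *this = {} assigns a value-initialized Calc to the current object, so every member goes back to its default member initializer:

```cpp
// Hypothetical class combining call chaining with a reset() member function.
class Calc {
private:
    int m_value {};

public:
    Calc& add(int value) { m_value += value; return *this; }
    Calc& mul(int value) { m_value *= value; return *this; }
    int getValue() const { return m_value; }

    // Assigns a value-initialized Calc to *this (here: m_value becomes 0).
    void reset() { *this = {}; }
};
```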

It's worth noting that this was added to C++ before references existed; if it were designed today, it would most likely be a reference to the object rather than a pointer, just like the self reference in many more modern languages.

Classes and header files

As classes grow larger, defining every member function inside the class body becomes unwieldy, so C++ allows us to separate the member function declarations from their definitions. For example,

class Date {
private:
    int m_year;
    int m_month;
    int m_day;
public:  // a bunch of declarations only below
    Date(int year, int month, int day);
    void print() const;
    int getYear() const { return m_year; }
    int getMonth() const { return m_month; }
    int getDay() const { return m_day; }
};

Date::Date(int year, int month, int day) // actual definition
    : m_year{year},
      m_month{month},
      m_day{day} {
}

Even further, we can put declarations and definitions in different files. For example, inside date.h we can have

#pragma once

class Date {
private:
    int m_year {};
    int m_month {};
    int m_day {};
public:
    Date(int year, int month, int day);
    void print() const;
    int getYear() const { return m_year; }
    int getMonth() const { return m_month; }
    int getDay() const { return m_day; }
};

and in date.cpp we have

#include "date.h"

#include <iostream>

Date::Date(int year, int month, int day)
    : m_year {year},
      m_month {month},
      m_day {day} {
}

void Date::print() const {
    std::cout << "Date(" << m_year <<
                 ", " << m_month <<
                 ", " << m_day << ")\n";
}

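If you're concerned about the one-definition rule (ODR), rest easy: types are partially exempt from it, so a class definition may be included into multiple files (as long as every definition is identical). However, a type still can't be defined more than once within the same file (translation unit), so we can't include the same class definition into one file multiple times, hence the header guards or #pragma once. Note that member functions defined inside the class body are implicitly inline, which is why they can live in the header, whereas ones defined outside the class should go into the .cpp file (or be marked inline).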

Nested types

#include <iostream>

enum class FruitType {
    apple,
    banana,
    cherry
};

class Fruit {
private:
    using PercentageType = int;
    FruitType m_type { };
    PercentageType m_percentageEaten { 0 };

public:
    Fruit(FruitType type) : m_type { type } {}
    FruitType getType() { return m_type; }
    PercentageType getPercentageEaten() { return m_percentageEaten; }
    bool isCherry() { return m_type == FruitType::cherry; }
};

int main() {
    Fruit apple { FruitType::apple };
    if (apple.getType() == FruitType::apple)
        std::cout << "I am an apple";
    else
        std::cout << "I am not an apple";
    return 0;
}

Note how the type alias PercentageType is defined in the private section but used as the return type of a public member function.
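The enumeration itself can be nested too. Below is a sketch of a hypothetical variant of Fruit in which the enum lives inside the class, so its enumerators are accessed as Fruit::apple and so on:

```cpp
// Hypothetical variant of Fruit with the enumeration nested inside the class.
class Fruit {
public:
    // Nested unscoped enum: outside the class its enumerators are Fruit::apple, etc.
    enum Type {
        apple,
        banana,
        cherry,
    };

private:
    Type m_type {};
    int m_percentageEaten {0};

public:
    explicit Fruit(Type type) : m_type {type} {}

    Type getType() const { return m_type; }
    int getPercentageEaten() const { return m_percentageEaten; }
    bool isCherry() const { return m_type == cherry; }  // no prefix needed inside the class
};
```

Since the enum has to be declared before the private members that use it, the public section containing it comes first.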

Another interesting feature of nested classes is that they can access the private members of their enclosing (outer) class, since they are defined inside it:

class Employee {
private:
    std::string m_name;
    int m_id;
public:
    class Printer {
    public:
        void print(const Employee& e) {
            std::cout << e.m_name << '\n';
        }
    };
};

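Notice how the Printer class can access the private members of Employee. However, a nested class has no this pointer to an object of the outer class, so it can only access those members through an Employee object (here e).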

Introduction to destructors

There are several rules about destructors:

The following is an example of a destructor:

class Simple {
private:
    int m_id {};

public:
    Simple(int id)
        : m_id {id} {
    }

    ~Simple() {
        std::cout << "Destructing Simple " << m_id << '\n';
    }
};

If a non-aggregate class type has no user-declared destructor, the compiler will generate a destructor with an empty body. This is called an implicit destructor, and it is effectively just a placeholder.

It's important to note that when std::exit() is called, the program terminates without unwinding the stack, so destructors of local variables are not called and any cleanup they would do never happens (which is bad).
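To see when a destructor runs, here is a sketch of a hypothetical Simple variant that counts its destructions in a static member (static members are covered below). The destructor runs automatically whenever an object goes out of scope:

```cpp
// Hypothetical class that counts how many of its objects have been destroyed.
class Simple {
private:
    int m_id {};

public:
    static inline int s_destroyed {0};  // C++17 inline static member

    explicit Simple(int id) : m_id {id} {}

    ~Simple() {
        ++s_destroyed;  // real cleanup work would go here
    }

    int getID() const { return m_id; }
};

// Creates a local Simple; its destructor runs when the function returns.
int useTemporarily(int id) {
    Simple s {id};
    return s.getID();
}
```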

Class template with member functions

Let's combine class templates with member functions, including a member function defined outside the class:

template <typename T>
class Pair {
private:
    T m_first {};
    T m_second {};

public:
    Pair(const T& first, const T& second)
        : m_first {first},
          m_second {second} {
    }

    bool isEqual(const Pair<T>& pair);
};

template <typename T>
bool Pair<T>::isEqual(const Pair<T>& pair) {
    return m_first == pair.m_first && m_second == pair.m_second;
}

int main() {
    Pair p1 { 5, 6 };
    std::cout << std::boolalpha << "isEqual(5,6): "
              << p1.isEqual(Pair{5,6}) << '\n';
    std::cout << std::boolalpha << "isEqual(5,7): "
              << p1.isEqual(Pair{5,7}) << '\n';
    return 0;
}

Static member variables

Static member variables are shared by all objects of the class.

struct Something {
    static int s_value;  // declaration only
};

int Something::s_value {};  // definition (needed for the program to link)

int main() {
    Something first {};
    Something second {};
    first.s_value = 2;
    std::cout << first.s_value << ' ' << second.s_value << '\n';
    return 0;
}

It's worth noting that static member variables are not associated with any object at all, which actually makes sense, since they exist even before any object of the class is created. To access the member, we can just use Something::s_value:

class Something {
public:
    static int s_value;
};

int Something::s_value {1};

int main() {
    Something::s_value = 2;
    std::cout << Something::s_value << '\n';
    return 0;
}

See, no object involved in this example.

A non-const static member variable is only declared (not defined) inside the class, so it can't be initialized there; hence the extra definition line outside the class definition. There are two exceptions, though:
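For reference, here is a sketch of the in-class initialization forms the language permits for static members, alongside the out-of-class definition, using a hypothetical Settings class (inline static members require C++17):

```cpp
class Settings {
public:
    // A const static member of integral (or enum) type may be initialized in-class.
    static const int s_maxUsers {100};

    // C++17: an inline static member may be initialized in-class
    // and needs no separate definition outside the class.
    static inline double s_ratio {0.5};

    // constexpr static members are implicitly inline (C++17).
    static constexpr int s_version {3};

    // A plain non-const static member: only declared here...
    static int s_count;
};

// ...and defined (and initialized) outside the class.
int Settings::s_count {0};
```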

Static member functions

Note that private static member variables can't be accessed from outside the class (e.g. via Something::s_value). To solve this problem, we need static member functions:

class Something {
private:
    static inline int s_value {1};

public:
    static int getValue() {
        return s_value;
    }
};

int main() {
    // std::cout << Something::s_value << '\n';  // compile error: s_value is private
    std::cout << Something::getValue() << '\n';  // ok
    return 0;
}

It is also worth noting (and quite natural) that a static member function has no this pointer, since it isn't called on an object. Also, C++ doesn't support static constructors (which makes sense too).

Friend non-member functions

When we want a non-member function to be able to access the private members of a class, we need friendship. Inside the body of a class, a friend declaration using the friend keyword tells the compiler that some other class or function is now a friend, which in C++ means it has been granted access to the private and protected members of that class.

class Accumulator {
private:
    int m_value {};

public:
    void add(int value) {
        m_value += value;
    }

    friend void print(const Accumulator& accumulator);
};

void print(const Accumulator& accumulator) {
    std::cout << accumulator.m_value;
}

int main() {
    Accumulator acc {};
    acc.add(5);
    print(acc);
    return 0;
}

Instead of declaring a friend non-member function inside the class and defining it outside, we can also define the friend function directly inside the class:

class Accumulator {
private:
    int m_value {};

public:
    void add(int value) {
        m_value += value;
    }

    friend void print(const Accumulator& accumulator) {
        std::cout << accumulator.m_value;
    }
};

A function can be a friend of multiple classes at the same time.
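A sketch of one function befriended by two hypothetical classes, Temperature and Humidity. Because the friend declaration inside Temperature mentions Humidity, Humidity has to be forward-declared first:

```cpp
#include <string>

class Humidity;  // forward declaration: Temperature's friend declaration mentions Humidity

// Hypothetical class holding a temperature reading.
class Temperature {
private:
    int m_temp {};

public:
    explicit Temperature(int temp) : m_temp {temp} {}
    friend std::string report(const Temperature& t, const Humidity& h);
};

// Hypothetical class holding a humidity reading.
class Humidity {
private:
    int m_humidity {};

public:
    explicit Humidity(int humidity) : m_humidity {humidity} {}
    friend std::string report(const Temperature& t, const Humidity& h);
};

// A friend of both classes, so it may read both private members.
std::string report(const Temperature& t, const Humidity& h) {
    return std::to_string(t.m_temp) + "C, " + std::to_string(h.m_humidity) + "%";
}
```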

Friend classes and friend member functions

Just like friend non-member functions, we can define friend classes and member functions.

class Storage {
private:
    int m_value {};

public:
    Storage(int value) : m_value{value} {}
    friend class Display;
};

class Display {
private:
    bool m_display {};

public:
    Display(bool display) : m_display {display} {}
    void displayStorage(const Storage& storage) {
        if (m_display) {
            std::cout << storage.m_value << '\n';
        }
    }
};

which declares the class Display to be a friend of Storage, allowing the former to access the private members of the latter. Instead of befriending the whole class, we can also make only the member function Display::displayStorage a friend of Storage. In that case, the order of definitions matters: Storage needs the full definition of Display in order to befriend one of its member functions, so Display must be defined first; and since Display's member function refers to Storage, Storage must at least be forward-declared before Display by writing this at the top:

class Storage;
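Here is a sketch of the member-function variant, reusing the Storage and Display names. For illustration (an assumption, not the original signature), displayStorage returns the value it read so the result can be checked. Note that its definition has to come after Storage's full definition:

```cpp
#include <iostream>

class Storage;  // forward declaration: Display's member function mentions Storage

class Display {
private:
    bool m_display {};

public:
    explicit Display(bool display) : m_display {display} {}

    // Declared here, defined below once Storage is a complete type.
    int displayStorage(const Storage& storage) const;
};

class Storage {
private:
    int m_value {};

public:
    explicit Storage(int value) : m_value {value} {}

    // Only this member function of Display is a friend (Display is fully defined above).
    friend int Display::displayStorage(const Storage& storage) const;
};

// Defined after Storage's full definition, so storage.m_value can be accessed.
int Display::displayStorage(const Storage& storage) const {
    if (m_display) {
        std::cout << storage.m_value << '\n';
    }
    return storage.m_value;
}
```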

Ref qualifier

Let's say we have the following member function that returns a const reference:

auto& getName() const { return m_name; }

and want to avoid the potential problem of dangling references when it's called on a temporary object. We can ref-qualify this function by adding a & qualifier to an overload that will match only lvalue implicit objects, and a && qualifier to an overload that will match only rvalue implicit objects:

const auto& getName() const & {return m_name;}  // & for lvalue implicit (return by reference)
auto getName() const && {return m_name;}        // && for rvalue implicit (return by value)

With the above overloading, we can safely run the below program without worrying about dangling references:

int main() {
 Employee joe{};
 joe.setName("Joe");
 std::cout << joe.getName() << '\n'; // joe is an lvalue, so getName() returns by reference
 std::cout << createEmployee("Frank").getName() << '\n'; // createEmployee("Frank") is an rvalue, so getName() returns by value
 return 0;
}
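A self-contained sketch of the ref-qualified overloads. The Employee class, setName() and createEmployee() are hypothetical stand-ins for the ones assumed above, and the static_asserts check which overload each kind of call selects:

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical Employee class with ref-qualified getName() overloads.
class Employee {
private:
    std::string m_name {};

public:
    void setName(std::string_view name) { m_name = name; }

    // Matches lvalue implicit objects: return by const reference, no copy.
    const std::string& getName() const & { return m_name; }

    // Matches rvalue implicit objects (temporaries): return by value,
    // so the caller never holds a reference into a dying object.
    std::string getName() const && { return m_name; }
};

// Hypothetical factory that returns a temporary Employee.
Employee createEmployee(std::string_view name) {
    Employee e {};
    e.setName(name);
    return e;
}

// Compile-time check of which overload each kind of call selects.
static_assert(std::is_same_v<decltype(std::declval<Employee&>().getName()), const std::string&>);
static_assert(std::is_same_v<decltype(createEmployee("").getName()), std::string>);
```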

Two things to note about ref-qualifying:

Dynamic Arrays: std::vector

Introduction to containers and arrays

The following types are containers under the general programming definition, but are not considered to be containers by the C++ standard:

To be a container in C++, a type must implement all of the container requirements listed in the C++ standard. That being said, since std::string and std::vector<bool> implement most of the requirements, they behave like containers in most circumstances and are sometimes called “pseudo-containers”.

Of the provided container classes, std::vector and std::array see the most use and will be where we focus the bulk of our attention.

An array is a container data type that stores a sequence of values contiguously (meaning each element is placed in an adjacent memory location with no gaps). Arrays allow fast, direct access to any element and are conceptually simple and easy to use.

There are three primary array types in C++: C-style arrays, the std::vector container class and the std::array container class. C-style arrays were inherited from the C language. For backwards compatibility, these arrays are defined as part of the core C++ language (much like the fundamental data types), and they are sometimes called “naked arrays”, “fixed-sized arrays” or “built-in arrays”. To help make arrays safer and easier to use in C++, the std::vector container class was introduced as part of the original C++ standard library (C++98); it is the most flexible of the three array types and has a bunch of useful capabilities that the other array types don't. Finally, the std::array container class was introduced in C++11 as a direct replacement for C-style arrays. It is more limited than std::vector but can also be more efficient, especially for smaller arrays.

Introduction to std::vector and list constructors

The std::vector container is defined in the <vector> header as a class template with a template type parameter that defines the type of the elements. For example:

#include <vector>

int main() {
    std::vector<int> empty{};
    return 0;
}

You can also initialize a std::vector with a list of values:

std::vector<int> primes {2, 3, 5, 7, 11};
std::vector vowels {'a', 'e', 'i', 'o', 'u'};  // type char deduced by CTAD

The above uses a list constructor, which container classes often provide. The list constructor does three things:

We can access the elements like the following:

int pp = primes[0] + primes[1];

Notice operator[] does not do any bounds checking and thus passing an invalid index (negative / greater than or equal to the length of the array) will result in undefined behavior.

One of the defining characteristics of arrays is that the elements are always allocated contiguously in memory, meaning the elements are all adjacent in memory (with no gaps between them):

#include <iostream>
#include <vector>

int main() {
    std::vector primes {2, 3, 5, 7, 11};
    std::cout << "An int is " << sizeof(int) << " bytes\n";
    std::cout << &(primes[0]) << '\n';
    std::cout << &(primes[1]) << '\n';
    std::cout << &(primes[2]) << '\n';
    return 0;
}

This produces output like the following (the addresses will differ from run to run):

An int is 4 bytes
00DBF720
00DBF724
00DBF728

We can construct a vector of specific length to avoid the following kind of work:

std::vector<int> data = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; // ten zeros

by using the vector's explicit size constructor (roughly, explicit std::vector<T>(std::size_t count)):

std::vector<int> data(10); // vector containing ten int elements, value-initialized to zero

Notice the above initialization uses direct initialization with parentheses instead of braces. With braces there would be an apparent ambiguity, since both the list constructor and the explicit size constructor could accept {10}. C++ resolves it with a special rule that prefers a matching list constructor over other constructors, so std::vector<int> data{10} would create a single element with value 10 (but we should just avoid such confusion).

Lastly, the following is a list of examples of different ways to initialize a vector:

std::vector<int> v1 = 10;   // compile error: copy initialization can't use the explicit size constructor
std::vector<int> v2(10);    // explicit size constructor; ten zeros
std::vector<int> v3{10};    // list constructor; single element with value 10
std::vector<int> v4 = {10}; // list constructor; single element with value 10
std::vector<int> v5({10});  // list constructor; single element with value 10
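To double-check the element counts and values listed above, here are the same declarations as a compilable sketch (v1 is commented out since it doesn't compile):

```cpp
#include <vector>

// std::vector<int> v1 = 10;  // would not compile: the size constructor is explicit
std::vector<int> v2(10);      // explicit size constructor: ten zeros
std::vector<int> v3{10};      // list constructor: one element, 10
std::vector<int> v4 = {10};   // list constructor: one element, 10
std::vector<int> v5({10});    // list constructor: one element, 10
```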

The unsigned length and subscript problem

Before everything, let’s talk about history.

As Bjarne Stroustrup recalls, when the container classes in the C++ standard library were being designed (circa 1997), the designers had to choose whether to make the length (and array subscripts) signed or unsigned. They chose unsigned, since lengths naturally can't be negative, and an unsigned type allows a greater maximum value (important back in the 16-bit days).

In retrospect, this is generally regarded as the wrong choice: a negative signed integer used as a length or index is implicitly converted to a large unsigned integer, producing a garbage result, and the commonly used operator[] doesn't do any range checking to catch it.
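The wraparound is easy to trigger. A sketch with two hypothetical helpers, naiveLastIndex and toIndex:

```cpp
#include <cstddef>
#include <vector>

// Computes "the last index" the naive way. For an empty vector, size() is 0,
// and 0 - 1 in unsigned arithmetic wraps around to the largest std::size_t.
std::size_t naiveLastIndex(const std::vector<int>& v) {
    return v.size() - 1;
}

// Converting a negative int to an unsigned type likewise yields a huge value.
std::size_t toIndex(int i) {
    return static_cast<std::size_t>(i);
}
```

Passing either result to operator[] would read far outside the array, with no error reported.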

That being said, we can ask a container for its length using the size() member function (which returns the length as the unsigned type size_type):

#include <iostream>
#include <vector>

int main() {
    std::vector prime{2, 3, 5, 7, 11};
    std::cout << "length: " << prime.size() << '\n';
    return 0;
}

Unlike std::string and std::string_view which have both length() and size() member functions (that do the same thing), std::vector (together with most other container types) only has size(). Starting from C++17, we can also use std::size non-member function to do the same thing:

std::size(prime)

Then, starting from C++20, we have the std::ssize() non-member function, which returns the length as a large signed integral type (usually std::ptrdiff_t, the type normally used as the signed counterpart to std::size_t). This is the only one of the three that returns the length as a signed type. When saving the returned value to an integral variable, you can either use int (with a static_cast to avoid an implicit narrowing conversion warning/error), or use auto to let the compiler deduce the type.

On the other hand, while operator[] does no bounds checking, we have at() member function that does runtime bounds checking:

prime.at(5);  // throws exception of type std::out_of_range

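Alternatively, when using operator[], we can make the index either a constexpr int (converting a constant expression isn't narrowing, since its value is known to fit) or a std::size_t, to avoid sign-conversion warnings (but alas, still no bounds checking).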
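A sketch combining these points: size() and std::size() for the length, a static_cast when an int is wanted, and at() for checked access. The helper names lengthAsInt and tryGet are hypothetical; std::ssize() is left out since it needs C++20:

```cpp
#include <cstddef>
#include <iterator>   // std::size (C++17)
#include <stdexcept>  // std::out_of_range
#include <type_traits>
#include <vector>

// The length type of std::vector is unsigned.
static_assert(std::is_unsigned_v<std::vector<int>::size_type>);

// Converting the unsigned length to int explicitly avoids sign-conversion warnings.
int lengthAsInt(const std::vector<int>& v) {
    return static_cast<int>(std::size(v));
}

// Reads an element with bounds checking via at().
// Returns true and stores the element if index is valid, false otherwise.
bool tryGet(const std::vector<int>& v, std::size_t index, int& out) {
    try {
        out = v.at(index);  // throws std::out_of_range if index >= v.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```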

Passing std::vector

An object of type std::vector can be passed to a function just like any other object. That means if we pass it by value, an expensive copy will be made, so we should typically pass std::vector by (const) reference instead. Note that CTAD doesn't work for function parameters, so we can't simply omit the template argument in the parameter type; to accept a std::vector of any element type, we can make the function a template instead.
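A sketch of such a function template (the sum function is hypothetical). T is deduced from the argument, and the vector is taken by const reference so nothing is copied:

```cpp
#include <cstddef>
#include <vector>

// Accepts a std::vector of any element type by const reference (no copy is made).
template <typename T>
T sum(const std::vector<T>& arr) {
    T total {};
    for (std::size_t i {0}; i < arr.size(); ++i)
        total += arr[i];
    return total;
}
```

For example, `sum(std::vector{1, 2, 3})` deduces `T` as `int`, while a `std::vector<double>` argument deduces `double`.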

Returning std::vector, and an introduction to move semantics

Check the following example:

#include <iostream>
#include <vector>

int main() {
    std::vector arr1 {1, 2, 3, 4, 5};
    std::vector arr2 { arr1 };  // copy of arr1
    arr1[0] = 6;
    arr2[0] = 7;
    std::cout << arr1[0] << ' ' << arr2[0] << '\n';  // prints 6 7
    return 0;
}

The above prints 6 and 7 because arr2 is nothing but a copy of arr1. The term copy semantics refers to the rules that determine how copies of objects are made. When we say a type supports copy semantics, we mean that objects of that type are copyable, because the rules for making such copies have been defined. When we say copy semantics are being invoked, that means we've done something that will make a copy of an object.

For class types, copy semantics are typically implemented via the copy constructor (and copy assignment operator).

Let’s now check another example:

#include <iostream>
#include <vector>

std::vector<int> generate() {
    std::vector arr1 {1, 2, 3, 4, 5};
    return arr1;
}

int main() {
    std::vector arr2 { generate() };  // the return value of generate() dies after this line
    arr2[0] = 7;  // nothing to do with arr1
    std::cout << arr2[0] << '\n';
    return 0;
}

When arr2 is initialized this time, it is initialized from a temporary object returned by generate(). Unlike the prior case, this rvalue is destroyed right afterwards, so copying it is pointlessly costly. That is why we need move semantics, which refers to the rules that determine how the data from one object is moved to another object. When move semantics are invoked, any data member that can be moved is moved, and any data member that can't be moved is copied. Normally, when an object is initialized with or assigned an object of the same type, copy semantics will be used (assuming the copy isn't elided by the compiler). However, when all of the following conditions are true, move semantics will be invoked instead:

The sad news is that not all types support move semantics; however, std::vector and std::string both do. That means we can return move-capable types like std::vector by value just fine. That being said, we should still pass these objects by const reference, because move semantics won't be invoked when we pass them by value: the arguments are typically lvalues at that point and will be copied. So in short: pass by const reference, return by value.

Arrays and loops

Arrays provide a way to store multiple objects without having to name each element. Loops provide a way to traverse an array without having to explicitly list each element. Templates provide a way to parameterize the element type. Together, they allow us to write code that can operate on a container of elements, regardless of the element type or number of elements in the container.

#include <iostream>
#include <vector>

template <typename T>
T calculateAverages(const std::vector<T>& arr) {
    std::size_t length {arr.size()};
    T average{0};
    for (std::size_t i{0}; i < length; ++i) {
        average += arr[i];
    }
    average /= static_cast<int>(length);
    return average;
}

It's good to remember (yes) how the simple for loop is written: the index starts at zero, and the loop keeps going while i < length.

Arrays, loops and sign challenge solutions

We can use int as the type of the loop index (and it's preferred, in fact). If you're lazy, you can use auto, which will deduce the type for you. In C++23, you can even add the Z suffix to make a literal of the signed integer type corresponding to std::size_t:

for (auto i{0Z}; i < static_cast<std::ptrdiff_t>(arr.size()); ++i)

or utilizing std::ssize() introduced in C++20:

for (auto i{0Z}; i < std::ssize(arr); ++i)

Notice that since these index variables are now signed, the compiler may emit sign-conversion warnings when they're converted to unsigned inside operator[]. There are a number of ways to avoid such warnings (and no, we're not gonna introduce all of them here cuz they're annoyingly ugly imo), but the preferred solution is surprisingly simple: don't index arrays at all. In fact, C++ provides several other methods for traversing arrays that don't use indices, and if we don't have indices, we don't need to worry about these signed/unsigned issues. The two most common methods are range-based for loops and iterators.

Range-based for loops

The range-based for statement has the following syntax:

for (element_declaration : array_object)
    statement;

For example:

#include <iostream>
#include <vector>

int main() {
    std::vector fibonacci { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 };
    for (int num : fibonacci)
        std::cout << num << ' ';
    std::cout << '\n';
    return 0;
}

We can even use auto for type deduction:

for (auto num: fibonacci)
    std::cout << num << ' ';

For classes with expensive copying e.g. strings, we may also want to avoid copying altogether in the range-based loops by using const references:

#include <iostream>
#include <string>
#include <vector>

int main() {
    using namespace std::literals;  // for s suffix as string literals
    std::vector words {"peter"s, "likes"s, "frozen"s, "yogurt"s};
    for (const auto& word : words)
        std::cout << word << ' ';
    std::cout << '\n';
    return 0;
}

It’s important to notice that range-based loop doesn’t give you the index automatically, so you might want to keep your own index counter variable if needed (or just use the classic for loop).

Last but not least, if you want to do range-based loop in reverse:

#include <iostream>
#include <ranges>  // C++20
#include <string_view>
#include <vector>

int main() {
    using namespace std::literals;  // for sv suffix as string_view
    std::vector words {"alex"sv, "bobby"sv, "chad"sv, "dave"sv};
    for (const auto& word : std::views::reverse(words)) // from <ranges>
        std::cout << word << ' ';
    std::cout << '\n';
    return 0;
}

Array indexing and length using enumerators

One of the bigger documentation problems with arrays is that their indices don’t normally provide any real information. For example:

#include <vector>

int main() {
    std::vector testScores {78, 79, 71, 56, 55};
    testScores[2] = 77;  // whose score is it?
    return 0;
}

We can use unscoped enumerators instead:

namespace Students {
    enum Names {
        Sam,
        Allen,
        Billy,
        Wonka,
        Zoe
    };
}

int main() {
    std::vector testScores {78, 79, 71, 56, 55};
    testScores[Students::Allen] = 78;
    return 0;
}

Because unscoped enumerators implicitly convert to integral types such as std::size_t, we can use them directly as indices. Meanwhile, since they're constexpr, converting them to an unsigned index doesn't trigger the signed/unsigned indexing problem either. Cool.

If we index with a non-constexpr variable of the enumeration type, that advantage is lost and we may get a sign conversion warning (the enumeration's underlying type may be signed). If we really need such a variable, we can be more explicit and give the enumeration an unsigned underlying type:

namespace Students {
    enum Names : unsigned int {
        Sam,
        Allen,
        Billy,
        Wonka,
        Zoe
   };
}

To make sure the array isn't shorter than the number of enumerators, we can add an extra enumerator at the end that counts them, and use it to size the array:

namespace Students {
    enum Names {
        Sam,
        Allen,
        Billy,
        Wonka,
        Zoe,
        num_students
    };
}

int main() {
    std::vector<int> testScores(Students::num_students);  // properly initialized
    ...
}

or assert using this extra enumerator:

#include <cassert>

int main() {
    std::vector testScores { 78, 79, 71, 56, 55 };
    assert(std::size(testScores) == Students::num_students);
    return 0;
}

std::vector resize and capacity

You can resize a std::vector at runtime:

#include <iostream>
#include <vector>

int main() {
    std::vector v{0, 1, 2};
    std::cout << "the length of v is " << v.size() << '\n';

    v.resize(5);
    std::cout << "the length of v is " << v.size() << '\n';
    for (auto i : v)
        std::cout << i << ' ';
    std::cout << '\n';
    return 0;
}

This prints:

the length of v is 3
the length of v is 5
0 1 2 0 0

Another important concept is the capacity of a std::vector: the number of elements it has actually allocated memory for. When we resize a vector down to a smaller length, the memory is not reallocated, so the capacity stays larger than the size. Indexing is based on the size, not the capacity. To ask the vector to release the excess memory, we can call the shrink_to_fit() member function, although this is only a non-binding request.

std::vector and stack behavior

In programming, a stack is a container data type where the insertion and removal of elements occurs in a LIFO (last-in-first-out) manner. This is commonly implemented via two operations named push and pop.

In C++, stack-like operations were added (as member functions) to the existing standard library container classes that support efficient insertion and removal of elements at one end (std::vector, std::deque and std::list). This allows any of these containers to be used as stacks in addition to their native capabilities.

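For std::vector, the member functions that support stack-like behavior are push_back() (put an element on top), pop_back() (remove the top element), back() (get the top element) and emplace_back() (construct an element in place on top).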

emplace_back() does mostly the same thing as push_back(), except that instead of taking a ready-made object, it passes its arguments straight to the element’s constructor and builds the element in place. So when we’d otherwise create a temporary object just to push it onto the stack, emplace_back() skips the potentially expensive copy:

#include <string>
#include <string_view>
#include <vector>

class Foo {
private:
    std::string m_a{};
    int m_b{};
public:
    Foo(std::string_view a, int b)
        : m_a{a}, m_b{b} {
    }

    explicit Foo(int b)
        : m_a {}, m_b {b} {
    }
};

int main() {
    std::vector<Foo> stack{};

    Foo f{"a", 2};
    stack.push_back(f);          // preferred for an existing object
    stack.emplace_back(f);       // works too

    stack.push_back({"a", 2});   // creates a temporary Foo, then copies/moves it in
    stack.emplace_back("a", 2);  // constructs Foo in place, no temporary

    // stack.push_back({ 2 });   // compile error: Foo(int) is explicit
    stack.emplace_back(2);       // ok: emplace_back calls the constructor explicitly
    return 0;
}

Unlike element access through operator[] or at(), push_back(), emplace_back() and pop_back() actively change the size of the vector (pop_back() is still lazy about capacity, though: it never shrinks it).

To avoid repeated reallocations as the stack grows, we can call the reserve() member function, which increases the capacity of a std::vector (reallocating if needed) without changing its size:

#include <vector>

int main() {
    std::vector<int> stack{};
    stack.reserve(6);      // size: 0, capacity: 6 (reserve() guarantees at least 6)
    stack.push_back(2);    // size: 1, capacity: 6
    stack.push_back(1);    // size: 2, capacity: 6
    stack.push_back(4);    // size: 3, capacity: 6
    stack.push_back(7);    // size: 4, capacity: 6
    stack.push_back(8);    // size: 5, capacity: 6
    stack.push_back(9);    // size: 6, capacity: 6
    stack.pop_back();      // size: 5, capacity: 6
    return 0;
}

std::vector<bool> vs std::bitset

std::vector<bool> works similarly to std::bitset, possibly with better space efficiency. However, std::vector<bool> itself carries a fixed overhead (sizeof(std::vector<bool>) is 40 bytes on some implementations), so the saved memory won’t mean much unless we’re storing more than about 40 boolean values. Meanwhile, the space optimization is implementation-defined and thus not always trustworthy. Lastly, std::vector<bool> is not really a vector: its elements are packed bits rather than contiguous bool objects, element access returns proxy objects instead of real bool references, and it does not meet the C++ standard’s container requirements. In short, we should probably just avoid std::vector<bool> altogether.

Instead, three alternative containers are recommended:

Fixed-size Arrays: std::array and C-style Arrays

Introduction to std::array

There are two reasons why we still need std::array when we’ve already got std::vector:

The second reason carries most of the weight: whenever we need a constexpr array, we should use std::array rather than std::vector.

In order to declare a std::array:

#include <array>

int main() {
    std::array<int, 5> a{};
}

The length of a std::array must be a constant expression, e.g. an integer literal, a constexpr variable or an unscoped enumerator.

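The initialization of a std::array does not involve a constructor, because std::array is an aggregate (a struct with no user-declared constructors). This means that if a std::array is defined without an initializer, its elements are default-initialized, which for fundamental types such as int means they are left uninitialized.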

std::array<int, 5> a{1,2,3,4,5};       // list initialization
std::array<int, 5> b = {1,2,3,4,5};    // copy initialization
std::array<int, 5> c;                  // uninitialized!
std::array<int, 5> d{};                // zero initialized; ok
std::vector<int> v(5);                 // vector can be value initialized

We can define a std::array as const or constexpr (prefer constexpr whenever possible; if the array can’t be constexpr, ask yourself whether you really need a std::array rather than a std::vector):

const std::array<int, 5> a{1,2,3,4,5};
constexpr std::array<int, 5> b{1,2,3,4,5};

Starting from C++17, we can use CTAD for type deduction (and are recommended to use it as much as possible):

constexpr std::array a {1,2,3,4,5};     // <int, 5> deduced
constexpr std::array<int> b{1,2,3,4,5}; // error: cannot partially omit template arguments!

The most common way to access the elements of a std::array is by index via operator[], which does no bounds checking. There is also an at() member function, but it only does bounds checking at runtime (not at compile time) and is slower, so it isn’t recommended.

std::array length and indexing

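There are three common ways to get the length of a std::array: the size() member function, and the std::size() and (C++20) std::ssize() non-member functions.

All of these return the length as a constant expression, even when the std::array itself is not constexpr.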

In addition to operator[] (no bounds checking) and at() (runtime bounds checking), std::array also supports the std::get() non-member function template, which takes the index as a template argument and therefore does bounds checking at compile time:

#include <array>
#include <iostream>

int main() {
    constexpr std::array prime{2, 3, 5, 7, 11};
    std::cout << std::get<3>(prime) << '\n';  // prints 7
    // std::cout << std::get<9>(prime);       // compile error: index out of bounds
    return 0;
}

Passing and returning std::array

CTAD doesn’t work for function parameters, so when passing std::array to a function, we need to explicitly specify the element type and array length. If we really want the function to be able to accept general array type, we can create a function template that parameterizes both the element type and array length on the function level.

Before indexing the array inside such a function, we should statically assert that its length is large enough for the indices we use. There are two ways to do this:

In terms of returning the std::array, there are three options:

std::array of class types, and brace elision

We can define a std::array of class types and assign its elements as follows:

#include <array>
#include <iostream>

struct House {
    int number {};
    int stories {};
    int roomsPerStory {}; 
};

int main() {
    std::array<House, 3> houses {};
    houses[0] = {12, 1, 7};
    houses[1] = {11, 2, 5};
    houses[2] = {10, 2, 3};
    for (const auto& house : houses) {
        std::cout << "House number " << house.number
                  << " has " << (house.stories * house.roomsPerStory)
                  << " rooms.\n";
    }
    return 0;
}

Alternatively, we can also explicitly initialize a std::array of structs:

constexpr std::array houses {
    House{12, 1, 7},
    House{11, 2, 5},
    House{10, 2, 3}  
};

However, do remember that the following way of initialization does not work:

constexpr std::array<House, 3> houses {
    {12, 1, 7},
    {11, 2, 5},
    {10, 2, 3}  
};

Instead, the compiler reports an error. std::array is typically implemented as a struct whose only member is a C-style array, so the compiler uses the first inner pair of braces to initialize that C-style array member, and the second and third pairs are left over as excess initializers. To remedy this problem, we can add an additional set of braces:

constexpr std::array<House, 3> houses {
    {
        {12, 1, 7},
        {11, 2, 5},
        {10, 2, 3}        
    }
};

In fact, you can always add this extra set of braces when initializing a std::array; it’s totally legal C++. The reason the braces can usually be left out is that aggregates support brace elision: braces around the initializers of a nested aggregate may be omitted, and the initializers are then assigned to the nested members in order. That’s why std::array<int, 5> a{1,2,3,4,5} needs only one pair.

Arrays of references via std::reference_wrapper

Because references are not objects, you cannot make an array of references. The elements of an array must be assignable, and references simply can’t be reseated.

int x{1};
int y{2};
[[maybe_unused]] std::array<int&, 2> refarr{ x, y };  // compile error

int& rx{x};
int& ry{y};
[[maybe_unused]] std::array valarr{rx, ry};  // ok, but this is just <int, 2>

If we do want an array of references, there is a workaround: std::reference_wrapper, which lives in the <functional> header, takes a type template parameter T, and behaves like a modifiable (reseatable) lvalue reference to T. Several things to pay attention to:

Here’s a simple example:

#include <array>
#include <functional>
#include <iostream>

int main() {
    int x{1};
    int y{2};
    int z{3};

    std::array<std::reference_wrapper<int>, 3> arr{x, y, z};
    arr[1].get() = 5;  // you can modify the object being referenced 
    std::cout << arr[1] << y << '\n';  // prints 55 because y has been modified 
    return 0;
}

Prior to C++17, CTAD didn’t exist, so the template argument had to be specified explicitly. To make this easier, C++11 introduced the std::ref() and std::cref() functions as shorthands for creating a std::reference_wrapper:

int x{1};
std::reference_wrapper<int> rx{x};        // C++11 explicit 
auto rx2{std::reference_wrapper<int>{x}}; // C++11 explicit
auto ref{std::ref(x)};                    // C++11 shorthand for std::reference_wrapper<int>
auto cref{std::cref(x)};                  // C++11 shorthand for std::reference_wrapper<const int>
std::reference_wrapper rx3{x};            // C++17 with CTAD
auto rx4{std::reference_wrapper{x}};      // C++17 with CTAD

Introduction to C-style arrays

Being part of the core language, C-style arrays have their own special declaration syntax. In a C-style array declaration, we use square brackets ([]) to tell the compiler that the declared object is a C-style array. Inside the brackets, we can optionally provide the length of the array, an integral value of type std::size_t that tells the compiler how many elements are in the array.

int main() {
    int testScores[30] {};
    return 0;
}

The length of a C-style array must be at least 1, or the compiler will report an error. The length must also be a constant expression. And in contrast to e.g. std::vector, the indices here don’t need to be unsigned: indexing with a signed integer causes no sign conversion issues.

There are several different ways to initialize a C-style array:

int fibonacci[5] = {1, 1, 2, 3, 5};  // copy-list initialization
int prime[5] {2, 3, 5, 7, 11};       // list initialization (preferred)
int prime[5] {2, 3, 5, 7, 11, 13};   // compile error: too many initializers
auto prime[5] {2, 3, 5, 7, 11};      // compile error: type deduction doesn't work for C-style arrays
int prime[] {2, 3, 5, 7, 11};        // length deduced (preferred)
int prime[] {};                      // compile error: zero-length array
int arr[5];                          // default initialized: elements are uninitialized
int arr[5] {};                       // value initialized: elements are zero initialized (preferred)

As for getting the length of a C-style array, we can use the sizeof() operator (which gives the size in bytes) or the std::size() and std::ssize() non-member functions:

const int prime[] { 2, 3, 5, 7, 11 };
sizeof(prime);     // returns 20 (assuming 4 bytes per int): the size in bytes, not the length
std::size(prime);  // C++17, returns 5 as an unsigned std::size_t
std::ssize(prime); // C++20, returns 5 as a signed std::ptrdiff_t

For C++14 or older versions, we can also use the following custom function to get the length of an array:

#include <cstddef>  // for std::size_t
#include <iostream>

template <typename T, std::size_t N>
constexpr std::size_t length(const T(&)[N]) noexcept {
    return N;
}

int main() {
    int prime[5] {2, 3, 5, 7, 11};
    std::cout << "length: " << length(prime) << '\n';
    return 0;
}

C-style array decay

In most cases, when a C-style array is used in an expression, the array will be implicitly converted into a pointer to the element type, initialized with the address of the first element (with index 0). Colloquially, this is called array decay.

#include <iostream>  // also makes std::boolalpha available (it's declared in <ios>)
#include <typeinfo>  // for typeid

int main() {
    int arr[5] {1, 2, 3, 4, 5};
    auto ptr {arr};  // ptr is of type int*
    std::cout << std::boolalpha << (typeid(ptr) == typeid(int*)) << '\n';  // prints true
    std::cout << std::boolalpha << (&arr[0] == ptr) << '\n';               // prints true 
    return 0;
}

There are only a few cases in C++ where a C-style array doesn’t decay:

A decayed array pointer no longer knows the length of the array it points to, hence the term “decay”. This behavior does solve the problem of passing a huge array as an argument: when an array decays, only a pointer holding the address of its first element gets passed, so no copy of the elements is made.

void printElementZero(const int* arr) {
    std::cout << arr[0];
} // same as below 

void printElementZero(const int arr[]) {
    std::cout << arr[0];
}

The problem with array decay is, as the length information is lost during the process, the following function won’t work correctly:

void printArraySize(const int arr[]) {
    std::cout << sizeof(arr) << '\n';  // prints sizeof(const int*), e.g. 8 on 64-bit platforms, no matter what
}

Fortunately, C++17’s better replacement std::size() (and C++20’s std::ssize()) won’t compile in this case:

void printArraySize(const int arr[]) {
    std::cout << std::ssize(arr) << '\n';  // compile error: std::size() and std::ssize() don't work on pointers
}

In general, we can just avoid the good ol’ C-style arrays nowadays.

Pointer arithmetic and subscripting

Pointer arithmetic is a feature that allows us to apply certain integer arithmetic operators (addition, subtraction, increment, or decrement) to a pointer to produce a new memory address.

We can subscript a pointer holding the address of an array’s first element to get other elements of the array:

#include <iostream>

int main() {
    const int arr[] {1,2,3,4,5};
    const int* ptr{arr};
    std::cout << ptr[2] << '\n';      // prints 3
    std::cout << *(ptr + 2) << '\n';  // prints 3
    return 0;
}

Following this trick we can traverse through an array using pointers:

#include <iostream>
#include <iterator>  // for std::size

int main() {
    constexpr int arr[] = {1,2,3,4,5};
    const int* begin{arr};
    const int* end{arr + std::size(arr)};  // one past the last element
    for (const int* i{begin}; i != end; ++i) { // terminate when i == end
        std::cout << *i << ' ';
    }
    std::cout << '\n';
    return 0;
}

Not surprisingly, range-based for loops over C-style arrays are implemented using pointer arithmetic in much the same way:

for (auto i : arr) {
    std::cout << i << ' ';
}
std::cout << '\n';

C-style strings

Although C-style strings have fallen out of favor in modern C++ for being hard to use and dangerous compared with std::string and std::string_view, we’ll go through the basics of them here. To define a classic C-style string variable, we simply declare a C-style array of char (or const char / constexpr char):

char str[8]{};
const char str[]{"this is a string"};
constexpr char str[]{"hello world"};

Remember that the array needs one extra character for the implicit null terminator. For this reason, it’s highly recommended to omit the length in the declaration and let the compiler calculate it for you.

To print a C-style string, we can simply send it to std::cout, because the output streams make some assumptions about your intent: for non-char pointers they print the address, but for char pointers they print the whole string:

#include <iostream>

void print(char ptr[]) {
    std::cout << ptr << '\n';
}

int main() {
    char str[] {"this is fucked up"};
    print(str);
    return 0;
}

Since the output streams always print a char* as a null-terminated string, weird things like the following can happen:

#include <iostream>

int main() {
    char c{'Q'};                                        // just regular char 
    std::cout << &c << '\n';                            // trying to print the address of a char
    std::cout << static_cast<const void*>(&c) << '\n';  // this would work as expected 
    return 0;
}

The output is

Q
0x16d64b08b

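but the first line can be literally anything, like Q╠╠╠╠╜╡4;¿■A, because std::cout treats &c as a C-style string and keeps reading memory past c until it happens to find a null terminator. That’s what undefined behavior means.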

To read a C-style string:

#include <iostream>
#include <iterator>

int main() {
    char str[255]{};  // an array large enough to hold 254 characters + null terminator 
    std::cout << "Enter your string: ";
    std::cin.getline(str, std::size(str));
    std::cout << "You entered: " << str << '\n';
    return 0;
}

To modify a C-style string, you can only assign values to its elements one at a time:

char str[]{"what?"};
str[2] = 'o';  // a char literal, not a string literal; str is now "whot?"

To get the length of a C-style string, the previously mentioned functions don’t work:

char str[255]{"string"};  // 254 available characters + null terminator; using 6+1 only
std::size(str);  // returns 255 

char* ptr{str};  // decayed C-style string 
std::size(ptr);  // compile error 

Luckily, we can use the strlen() function in the <cstring> header:

#include <cstring>
#include <iostream>

int main() {
    char str[255] {"string"};
    std::cout << std::strlen(str) << '\n';  // prints 6
    char* ptr {str};
    std::cout << std::strlen(ptr) << '\n';  // prints 6
    return 0;
}

Some other C-style string manipulating functions:

C-style string symbolic constants

Although seemingly producing the same strings, C++ deals with the memory allocation differently in the following two ways to define a string symbolic constant:

const char name[] {"Allen"};  // one copy in global memory, one for `name`
const char* name{"Allen"};    // only one copy and a pointer; more efficient 

Type deduction for a C-style string is fairly straightforward:

auto s1{"Allen"};   // const char* 
auto* s2{"Allen"};  // const char*
auto& s3{"Allen"};  // const char(&)[6] (5 characters + null terminator)

Multidimensional C-style arrays

We can define a two-dimensional C-style array as:

int a[2][3];

C++ uses row-major order (some languages, e.g. Fortran, use column-major order), meaning that the elements of an array are placed sequentially in memory row by row, ordered from left to right, top to bottom. For example, the elements of the above array are placed in memory in the following order:

[0][0] [0][1] [0][2] [1][0] [1][1] [1][2]

Initializing a two-dimensional array is as easy as

int a[2][3] {
    {1,2,3},
    {4,5,6}
};

The missing initializers will be value-initialized:

int a[2][3] {
    {1,2},
    {4},
};  // result in {{1,2,0},{4,0,0}}

You can also omit a length in the declaration, but only the leftmost one:

int a[][3] {
    {1,2,3},
    {4}
};

Lastly, just like regular one-dimensional arrays, you can value-initialize the whole array to zeros:

int a[2][3]{};

Multidimensional std::array

Is there a standard library class for multidimensional arrays? Sadly, the answer is no. However, we can still nest std::arrays as follows:

std::array<std::array<int, 3>, 2> a {{
    {1,2,3},
    {4,5,6}
}};  // notice the double braces! 

This syntax is verbose and hard to read: it requires double braces (for the brace-elision reasons covered in previous sections), and the column count awkwardly comes before the row count. To make it easier to use, we can create an alias template like the following:

template <typename T, std::size_t Row, std::size_t Col>
using Array2d = std::array<std::array<T, Col>, Row>;

Array2d<int, 2, 3> a {{
    {1,2,3},
    {4,5,6}
}};

In C++23 we have the new std::mdspan (in the <mdspan> header), which provides a simple way to view a contiguous one-dimensional array as a multidimensional one, without copying:

#include <array>
#include <cstddef>
#include <iostream>
#include <mdspan>  // C++23

int main() {
    std::array<int, 6> a{1,2,3,4,5,6};
    std::mdspan mdView { a.data(), 2, 3 };  // first arg is a pointer to the array data, then the extents
    std::size_t row {mdView.extents().extent(0)};
    std::size_t col {mdView.extents().extent(1)};

    // print in 1d
    for (std::size_t i=0; i < mdView.size(); ++i)
        std::cout << mdView.data_handle()[i] << ' ';
    std::cout << '\n';

    // print in 2d
    for (std::size_t r=0; r < row; ++r) {
        for (std::size_t c=0; c < col; ++c)
            std::cout << mdView[r, c] << ' ';  // operator[] accepts multiple indices since C++23
        std::cout << '\n';
    }
    std::cout << '\n';
    return 0;
}

For C++26, a std::mdarray has been proposed, which would combine owning storage with the multidimensional interface of std::mdspan. Hooray!

Iterators and Algorithms

Sorting an array using selection sort

Selection sort performs the following steps to sort an array from smallest to largest:

Here is how this algorithm is implemented in C++:

#include <iostream>
#include <iterator>
#include <utility>

int main() {
    int array[] {3,5,2,1,4};
    constexpr int length{static_cast<int>(std::size(array))};

    for (int i{}; i < length - 1; ++i) {
        int i_min {i};
        for (int j{i + 1}; j < length; ++j) {
            if (array[j] < array[i_min]) i_min = j;
        }
        std::swap(array[i], array[i_min]);
    }
    
    for (int i{}; i < length; ++i) std::cout << array[i] << ' ';
    std::cout << '\n';
    return 0;
}

Provided in the <algorithm> header, std::sort does the tedious work for us:

#include <algorithm>
#include <iostream>
#include <iterator>

int main() {
    int array[] {3,5,2,1,4};
    std::sort(std::begin(array), std::end(array));
    for (auto i : array) std::cout << i << ' ';
    std::cout << '\n';
    return 0;
}

Introduction to iterators

An iterator is an object designed to traverse through a container (e.g. the values in an array, or the characters in a string), providing access to each element along the way. A container may provide different kinds of iterators. For example, an array container might offer a forwards iterator that walks through the array in forward order, and a reverse iterator that walks through the array in reverse order.

We can use a pointer as an iterator:

#include <array>
#include <iostream>

int main() {
    std::array data{0,1,2,3,4,5,6};
    auto begin{&data[0]};
    auto end{begin + std::size(data)};  // this is one after the last element 
    for (auto ptr{begin}; ptr != end; ++ptr)
        std::cout << *ptr << ' ';
    std::cout << '\n';
    return 0;
}

We can also use standard library iterators:

#include <array>
#include <iostream>

int main() {
    std::array data{0,1,2,3,4,5,6};
    auto begin{data.begin()};
    auto end{data.end()};  // also one after the last element 
    for (auto ptr{begin}; ptr != end; ++ptr)
        std::cout << *ptr << ' ';
    std::cout << '\n';
    return 0;
}

The <iterator> header also provides std::begin and std::end similarly:

#include <array>  // this includes <iterator>
#include <iostream>

int main() {
    std::array data{0,1,2,3,4,5,6};
    auto begin{std::begin(data)};
    auto end{std::end(data)};
    for (auto ptr{begin}; ptr != end; ++ptr)
        std::cout << *ptr << ' ';
    std::cout << '\n';
    return 0;
}

Notice we use operator!= rather than operator< in the for loops above: some iterator types are not relationally comparable, but operator!= works with all of them.

All types that have both begin() and end() member functions, or that can be used with std::begin() and std::end(), are usable in range-based for loops:

#include <array>
#include <iostream>

int main() {
    std::array data{1,2,3};
    for (auto i : data)
        std::cout << i << ' ';
    std::cout << '\n';
    return 0;
}

There is a concept called iterator invalidation, which refers to an iterator becoming dangling, much like a dangling pointer. This can happen when we modify the container (e.g. in a way that makes it reallocate or remove elements) while still using an iterator into it. For example:

#include <vector>

int main() {
    std::vector v {0,1,2,3};
    for (auto i : v) {
        if (i % 2 == 0)
            v.push_back(i + 1);  // modifying the container here 
    }
    return 0;
}

The above program has undefined behavior: push_back() may reallocate the vector’s storage, which invalidates the iterators the range-based for loop is using behind the scenes. Another example is

#include <iostream>
#include <vector>

int main() {
    std::vector v{0,1,2,3,4,5};
    auto i {v.begin()};
    ++i;
    std::cout << *i << '\n';  // ok: prints 1
    v.erase(i);  // modifying the container, iterator invalidated 
    ++i;
    std::cout << *i << '\n';  // undefined behavior!
    return 0;
}

To fix the above program, we can use the fact that vector’s erase() member function returns an iterator to the element after the erased one (or end() when the last element is erased):

#include <iostream>
#include <vector>

int main() {
    std::vector v{0,1,2,3,4,5};
    auto i{v.begin()};
    ++i;
    std::cout << *i << '\n';  // ok: prints 1 
    i = v.erase(i);   // i is reassigned to the element after the erased one
    std::cout << *i << '\n';  // ok: prints 2 
    return 0;
}

Introduction to standard library algorithms

The functionality provided in the algorithms library generally falls into one of three categories:

Using std::find to find an element by value:

#include <algorithm>
#include <array>
#include <iostream>

int main() {
    std::array data{1,2,3,4,5};
    std::cout << "Enter a value to search for and replace with: ";
    int search{};
    int replace{};
    std::cin >> search >> replace;
    auto found{std::find(data.begin(), data.end(), search)};
    if (found == data.end())
        std::cout << "Could not find " << search << '\n';
    else
        *found = replace;  // yes it's that easy 
    for (int i : data)
        std::cout << i << ' ';
    std::cout << '\n';
    return 0;
}

Using std::find_if to find an element that matches some condition:

#include <algorithm>
#include <array>
#include <iostream>
#include <string_view>

bool contains_nut(std::string_view str) {
    // std::string_view::find returns std::string_view::npos if not found 
    return (str.find("nut") != std::string_view::npos);
}

int main() {
    std::array<std::string_view, 4> arr{"apple", "banana", "walnut", "lemon"};
    auto found{std::find_if(arr.begin(), arr.end(), contains_nut)};
    if (found == arr.end())
        std::cout << "no nuts\n";
    else
        std::cout << "found " << *found << '\n';
    return 0;
}

Using std::count and std::count_if to count how many occurrences there are in an array:

#include <algorithm>
#include <array>
#include <iostream>
#include <string_view>

bool contains_nut(std::string_view str) {
    // std::string_view::find returns std::string_view::npos if not found 
    return (str.find("nut") != std::string_view::npos);
}

int main() {
    std::array<std::string_view, 4> arr{"apple", "banana", "walnut", "lemon"};
    auto nuts {std::count_if(arr.begin(), arr.end(), contains_nut)};
    std::cout << "there are " << nuts << " nut(s)\n";
}

Using std::sort with a custom comparison function:

#include <algorithm>
#include <array>
#include <iostream>

bool greater(int a, int b) {
    return (a > b);  // return true if a should be ordered before b
}

int main() {
    std::array arr{1,2,5,3,2};
    std::sort(arr.begin(), arr.end(), greater);  // this gives DESCENDING order!
    for (auto i : arr)
        std::cout << i << ' ';
    std::cout << '\n';
    return 0;
}

Notice we can replace our greater() with std::greater from the <functional> header:

std::sort(arr.begin(), arr.end(), std::greater{});  // std::greater is a class type, so we need {} to create an object

Using std::for_each to do something to all elements of a container:

#include <algorithm>
#include <array>
#include <iostream>

void double_number(int& i) {
    i *= 2;
}

int main() {
    std::array arr{1,2,4,2,1};
    std::for_each(arr.begin(), arr.end(), double_number);
    for (auto i : arr)
        std::cout << i << ' ';
    std::cout << '\n';
    return 0;
}

Having to explicitly pass arr.begin() and arr.end() in the above algorithms is a bit annoying, but luckily C++20 added ranges: the std::ranges versions of these algorithms let us simply pass arr. This makes our code even shorter and more readable.

Timing your code

C++11 comes with some functionality in the <chrono> library to do some simple timing. We can encapsulate the timing inside a class to be used easily:

#include <chrono>  // for std::chrono 

class Timer {
private:
    using Clock = std::chrono::steady_clock;
    using Second = std::chrono::duration<double, std::ratio<1>>;
    std::chrono::time_point<Clock> m_beg {Clock::now()};

public:
    void reset() {
        m_beg = Clock::now();
    }

    double elapsed() const {
        return std::chrono::duration_cast<Second>(Clock::now() - m_beg).count();
    }
};

Then we can easily use it as follows:

#include <iostream>

int main() {
    Timer t;
    std::cout << "Time has elapsed " << t.elapsed() << " seconds\n";
    return 0;
}

There are three things that may impact timing:

Dynamic Allocation

Dynamic memory allocation with new and delete

C++ supports three basic types of memory allocation, of which you’ve already seen two:

Both static and automatic allocation have two things in common:

We could use static or automatic allocation with a generous size chosen at compile time to cover the largest input we expect, but this has several drawbacks:

Fortunately, we can address these problems with dynamic memory allocation, which requests memory as needed from a much larger pool of memory called the heap.

We can allocate a single variable dynamically using the new operator:

new int; // returns a pointer 
int* ptr { new int };  // dynamically allocate an integer and store its address in a pointer
int* ptr { new int(5) };  // direct initialization 
int* ptr { new int{5} };  // uniform initialization

Note that accessing heap-allocated objects is generally slower than accessing stack-allocated objects because the compiler knows the address of stack-allocated objects and can go directly to the address. For heap-allocated ones, there are two steps: to get the address of the object (from the pointer) and to get the value at that address.

To delete a single variable from memory:

delete ptr;     // free the memory used by the object that ptr refers to 
ptr = nullptr;  // set ptr back to a null pointer

When operator new fails, a std::bad_alloc exception is thrown, which terminates the program if it isn’t handled. One way to avoid the exception is the nothrow form of new:

int* ptr { new(std::nothrow) int };  // don't throw error if new fails; instead, assign ptr to nullptr 

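On its own this is kinda shady and not recommended: if we later dereference the resulting null pointer, we get undefined behavior. So whenever we use std::nothrow, we should explicitly check the result (we’ll cover actual exception handling later):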

int* ptr { new(std::nothrow) int{} };
if (!ptr) {  // if ptr is nullptr
    std::cerr << "Could not allocate memory\n";
}

Deleting a null pointer is safe (it does nothing), so to free the memory a pointer refers to, we can simply write

delete ptr;

Dynamically allocated memory stays allocated until it is explicitly deallocated or until the program ends, assuming your operating system does regular cleanup. This means we can accidentally cause a memory leak by writing functions like this:

void do_something_stupid() {
    int* ptr { new int };
}

Each call allocates a new int, but ptr is destroyed when it goes out of scope at the end of the function, so the only copy of the address is lost and that memory can never be freed. Calling this function repeatedly keeps piling up unreachable memory until the program ends.

Another way to leak memory is to reassign a pointer that holds the only address of a dynamically allocated object. In both cases, the fix is to delete the memory before the pointer goes out of scope or is reassigned.

Dynamically allocating arrays

We can dynamically allocate a C-style array too (if you want a dynamically sized array, you might as well just use std::vector, which manages the dynamic allocation for you). To allocate an array, we can

#include <cstddef>
#include <iostream>

int main() {
    std::cout << "Enter a positive integer: ";
    std::size_t len{};
    std::cin >> len;

    int* arr{ new int[len]{} };  // notice that length here doesn't need to be const!
    std::cout << "I just allocated an array of integers of length " << len << '\n';
    arr[0] = 5;
    delete[] arr;  // delete the whole array
    return 0;
}

In the above example, since we’re dynamically allocating a C-style array, there are several differences from the arrays we covered previously:

Notice we’re writing int twice in the above example. In practice, we can avoid that with auto:

auto* arr { new int[len]{} };

C++ doesn’t have a built-in way to resize an array that’s already been allocated, so if you need resizing, it’s typically recommended to just go for std::vector.

Destructors

A destructor is another special kind of class member function that is executed when an object of that class is destroyed. Whereas constructors are designed to initialize a class, destructors are designed to help clean up. When your class is holding any resources (e.g. dynamic memory or a file or database handle), or if you need to do any kind of maintenance before the object is destroyed, the destructor is the perfect place to do so.

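Like constructors, destructors have specific naming rules: a destructor must have the same name as the class, preceded by a tilde (~); it cannot take arguments; and it has no return type.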

Below is a simple example:

#include <iostream>
#include <cassert>
#include <cstddef>

class IntArray {
private:
    int* m_array{};
    int m_len{};

public:
    IntArray(int len) {
        assert(len > 0);
        m_array = new int[static_cast<std::size_t>(len)]{};
        m_len = len;
    }

    ~IntArray() {
        delete[] m_array;
    }

    // other public member functions 
};

The above example adopts a concept called RAII (Resource Acquisition Is Initialization), which states that resource acquisition (e.g. memory, file and database handles) should happen in constructors and resource release in destructors, so that a resource lives exactly as long as the object that owns it. This helps prevent resource leaks.

Lastly, take special note of std::exit(): when you use it to terminate the program, the destructors of local variables are not called, so any resources they hold won’t be released by them.

Pointers to pointers and dynamic multidimensional arrays

Since pointers are also objects, it’s natural to think that we can define a pointer that points to another pointer (though a bit tongue-twistery):

int val { 5 };           // just an int
int* ptr { &val };       // ptr is pointing at the address of val
int** ptrptr { &ptr };   // ptrptr is pointing at the address of ptr
std::cout << **ptrptr;   // prints 5

With pointers to pointers, we can define dynamic multidimensional arrays:

int rows { 5 };
int cols { 10 };
int** arr { new int*[rows] };
for (int i{}; i < rows; ++i)  // one allocation per row
    arr[i] = new int[cols];

Note that we can also make the array non-rectangular e.g. triangular, since we’re just iteratively allocating row arrays.

Void pointers

The void pointer, also known as the generic pointer, is a special type of pointer that can be pointed at objects of any data type. A void pointer is declared like any normal pointer with its type being void*:

int x{};
float y{};

struct S {
    int n;
    float f;
};

S s{};

void* ptr{};
ptr = &x;  // ok
ptr = &y;  // ok 
ptr = &s;  // ok!

However, since a void pointer doesn’t know the actual underlying type it’s pointing to, dereferencing a void pointer is illegal. Instead, it must be first cast to a specific pointer type before being dereferenced:

int x{5};
void* ptr{};
ptr = &x;
int* iptr{static_cast<int*>(ptr)};
std::cout << *iptr;

For the same reason, deleting a void pointer results in undefined behavior, so cast it back to the appropriate typed pointer first. Pointer arithmetic is not allowed on void pointers either, since the pointer doesn’t know the size of the object it points to.

There is no such thing as a void reference, as a reference always refers to an actual object and thus must know its type.

Functions

Function pointers

The identifier is a function’s name, but what type is the function? Functions have their own function types: the return type together with the parameter types make up the type. Much like variables, functions live at an assigned address in memory. When we send a function to std::cout via operator<<, the function pointer it decays to is implicitly converted to a bool and always prints 1. Some compilers instead have an extension that prints the function’s actual address. If that doesn’t happen automatically, you can cast the function pointer to a void pointer to let the compiler know it’s the address you want to print:

#include <iostream>

int foo() {
    return 5;
}

int main() {
    std::cout << reinterpret_cast<void*>(foo) << '\n';
    return 0;
}

To define a function pointer:

int (*fcnPtr) ();  // notice the difference between this and below 
int* fcn ();    // this is just a function taking no args and returning int*

Only the first line declares fcnPtr as a function pointer. The second line is just a function declaration.

To define a const function pointer:

int (*const fcnPtr) ();  // correct 
const int (*fcnPtr) ();  // wrong: not a const pointer

The first line defines a const function pointer. The second is a (non-const) pointer to a function returning const int.

To point a function pointer at a function, it’s pretty much what we’ve been doing with pointers all the time:

fcnPtr = &foo;  // no parentheses!! we want foo's address, not the result of calling foo()

Same with calling a function via its pointer:

(*fcnPtr)(5);  // assuming fcnPtr points to a function taking a single int
fcnPtr(5);     // implicit dereference! works like magic

We can also pass a function pointer to other functions as arguments:

void selection_sort(int* array, int size, bool (*compare)(int, int));
void selection_sort(int* array, int size, bool compare(int, int));  // implicit conversion again!

Notice how on the second line above, the function-type parameter compare is implicitly converted to a function pointer, which makes the code more succinct. This only works for function parameters. To make everything neater, we can define type aliases for function pointers:

using CompareFunction = bool(*)(int, int);
void selection_sort(int* array, int size, CompareFunction compare);

Alternatively, we can also use the std::function class provided by the standard library in <functional>:

#include <functional>
void selection_sort(int* array, int size, std::function<bool(int, int)> compare);

The stack and the heap

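The memory that a program uses is typically divided into a few different areas called segments: the code (or text) segment, which holds the compiled program; the bss segment, which holds zero-initialized global and static variables; the data segment, which holds initialized global and static variables; the heap, from which dynamically allocated variables are allocated; and the call stack, which holds function parameters, local variables, and other function-related information.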

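Advantages and disadvantages of the heap: allocating memory on the heap is comparatively slow; allocated memory stays allocated until it is explicitly deallocated (beware of memory leaks) or the program ends; dynamically allocated memory must be accessed through a pointer, and dereferencing a pointer is slower than accessing a variable directly; on the other hand, the heap is a big pool of memory, so large arrays, structures, or classes can be allocated there.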

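Advantages and disadvantages of the stack: allocating memory on the stack is comparatively fast; memory allocated on the stack only stays in scope while it’s on the stack and is destroyed when it’s popped off; all memory allocated on the stack is known at compile time, so it can be accessed directly through a variable; but because the stack is relatively small, it’s generally not a good idea to allocate or copy large arrays or other memory-intensive structures there, or to recurse very deeply.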

Recursion

A recursive function is a function that calls itself. Check the following (poorly written) function:

void count_down(int count) {
    std::cout << "push " << count << '\n';
    count_down(count - 1);
}

The function calls itself indefinitely and thus will likely cause a stack overflow. If the compiler applies tail call optimization (turning a recursive call that is the very last thing a function does into a jump, so no new stack frame is needed), however, the program may instead run forever without overflowing the stack.

To remedy this problem, we can write the termination condition for the recursion:

void count_down(int count) {
    if (count <= 0) return;  // termination condition; <= 0 also stops negative inputs
    std::cout << "push " << count << '\n';
    count_down(count - 1);
}

Command line arguments

When we want to utilize the command line arguments of a program, we can explicitly write main() in the following form:

int main(int argc, char* argv[]);  // or 
int main(int argc, char** argv);

Here argc is the number of command line arguments (including the program name, which is usually argv[0]), and argv is the array of the actual arguments, each a C-style string.

When we want to convert an element of argv to a number (arguments are always passed in as strings), we can

#include <sstream>  // for std::stringstream 
int num {};
std::stringstream convert { argv[1] };
convert >> num;

Ellipsis (and why to avoid them)

There are certain cases where it can be useful to be able to pass a variable number of parameters to a function, and C++ provides a special specifier known as ellipsis that allows us to do precisely that. Functions that use ellipsis take the form as below:

return_type function_name(argument_list, ...)

To use ellipsis, we need the <cstdarg> header: std::va_list holds the values, va_start() initializes it, va_arg() extracts values from it, and va_end() cleans it up when we’re done. Note that va_start, va_arg and va_end are macros, so they can’t be prefixed with std::. For example, the following function calculates the average of a variable number of integers:

#include <iostream>
#include <cstdarg>  // needed for ellipsis 

double calc_average(int count, ...) {
    int sum{0};
    std::va_list list;           // to get values in ellipsis 
    va_start(list, count);       // the first is the target list to initialize, the second is the last non-ellipsis arg
    for (int arg{}; arg < count; ++arg)
        sum += va_arg(list, int);  // the first is the target list, the second is element type 
    va_end(list);                // clean up the va_list
    return static_cast<double>(sum) / count;
}

int main() {
    std::cout << calc_average(5, 1,2,3,4,5) << '\n';
    std::cout << calc_average(3, 5,3,1) << '\n';
    return 0;
}

The ellipsis is dangerous as it doesn’t do type checking at all when you extract the values using va_arg. For example, using the calc_average function above:

std::cout << calc_average(6, 1.0, 2, 3, 4, 5, 6) << '\n';

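The output would be something surprising (this is undefined behavior, so the exact value depends on the compiler and platform), e.g.: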

1.78782e+008

This result epitomizes the phrase “garbage in, garbage out”, which means a computer, unlike humans, will unquestioningly process whatever you give it and potentially produce nonsensical output.

Another reason why ellipsis is dangerous is that it doesn’t know the length of the input. To use it properly we must pass a count argument (like above) or use a sentinel value to mark the end.

As a conclusion, it’s generally suggested to avoid using ellipsis altogether.

Introduction to lambdas (anonymous functions)

A lambda expression (also known as a lambda or closure) allows us to define an anonymous function inside another function. This nesting is important as it allows us to avoid namespace pollution and to define the function as close to where it is used as possible.

The syntax for lambdas in C++ is

[captureClause] (parameters) -> returnType { statements; }

The capture clause can be left empty if not needed. The parameter list can be left empty, or, when there are no parameters, even omitted entirely along with its parentheses. The return type is optional and deduced (as if by auto) if omitted. Therefore, the simplest lambda is just []{}.

Below is another example that passes a lambda to std::find_if:

#include <algorithm>
#include <array>
#include <iostream>
#include <string_view>

int main() {
    constexpr std::array<std::string_view, 4> arr{"apple", "banana", "walnut", "lemon"};
    auto found{ std::find_if(
        arr.begin(),
        arr.end(),
        [](std::string_view str){ return str.find("nut") != std::string_view::npos; }
    ) };
    if (found == arr.end())
        std::cout << "No nuts\n";
    else
        std::cout << "Found " << *found << '\n';
    return 0;
}

For advanced readers: a lambda is not a function, but a special kind of object called a functor: the compiler generates a unique class type for each lambda, with an overloaded operator() that makes its objects callable like a function.

If for some reason we want to “name” a lambda and save it to a variable, there are three ways:

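#include <functional>
#include <iostream>

int main() {
    double (*add_numbers1)(double, double) {
        [](double a, double b) {
            return a + b;
        }
    };  // to a function pointer (only works for lambdas without captures)

    std::function add_numbers2 {  // note we can omit <double(double, double)> here since C++17
        [](double a, double b) {
            return a + b;
        }
    };  // to a std::function

    auto add_numbers3 {
        [](double a, double b) {
            return a + b;
        }
    };  // to auto, which keeps the lambda's actual type

    std::cout << add_numbers1(1, 2) << ' ' << add_numbers2(3, 4) << ' ' << add_numbers3(5, 6) << '\n';  // prints "3 7 11"
    return 0;
}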

We can use auto for the parameters in a lambda to make our code simpler:

#include <algorithm>
#include <array>
#include <iostream>
#include <string_view>

int main() {
    constexpr std::array months {
        "January",
        "February",
        "March",
        "April",
        "May",
        "June",
        "July",
        "August",
        "September",
        "October",
        "November",
        "December"
    };

    const auto start_with_same_letter { std::adjacent_find(
        months.begin(),
        months.end(),
        [](const auto& a, const auto& b) { return a[0] == b[0]; }
    ) };

    if (start_with_same_letter != months.end()) {
        std::cout << *start_with_same_letter << " and "
                  << *std::next(start_with_same_letter)
                  << " start with the same letter\n";
    }
    return 0;
}

However, using auto can be dangerous as well in some cases, say

const auto five_letter_months { std::count_if(
    months.begin(),
    months.end(),
    [](std::string_view str) { return str.length() == 5; }
) };

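If we used auto instead of std::string_view for the parameter, it would be deduced as const char* (the element type of months), which has no length() member, so we’d lose all the std::string_view functionality and this lambda would no longer compile.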

In terms of constexpr lambdas: as of C++17, a lambda is implicitly constexpr if it has no captures (or all captures are constexpr) and calls no other functions (or all functions it calls are constexpr). Since many standard library functions only became constexpr in C++20, these conditions are often not met before C++20.

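One thing to be aware of is that a generic lambda gets a separate instantiation of its operator() for each distinct type that auto resolves to, and each instantiation has its own copy of any static local variables. So a single lambda can keep independent state for different types: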

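#include <iostream>

int main() {
    auto print {
        [](auto value) {
            static int count{};  // one count per instantiation, i.e. per deduced type of value
            std::cout << count++ << ": " << value << '\n';
        }
    };

    print("hello");  // value is const char*
    print("world");
    print(1);        // value is int: a different instantiation with its own count
    print(2);
    print("bye");
    return 0;
}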

and the output for above program is

0: hello
1: world 
0: 1
1: 2
2: bye

In terms of return type deduction: we need to make sure all possible return values are of the same type when we don’t specify the return type for a lambda, otherwise there will be a compile error:

auto divide {
    [](int x, int y, bool integer_division) {
        if (integer_division)
            return x / y;  // return type is int 
        else 
            return static_cast<double>(x) / y;  // error: return type doesn't match 
    }
};

To fix this problem, we need to either explicitly convert and make sure types match, or explicitly specify the return type and let the compiler implicitly convert types for us:

auto divide {
    [](int x, int y, bool integer_division) -> double {
        if (integer_division)
            return x / y;  // compiler would implicitly convert int to double for us 
        else
            return static_cast<double>(x) / y;
    }
};

Lastly, we don’t need to define many simple functions as lambdas, because the standard library already provides a bunch of them as function objects in <functional>. What differentiates them from lambdas is that they need to be instantiated, e.g. std::greater{}, before being used like a function.

Lambda captures

Lambdas can only access global identifiers, entities that are known at compile time, and entities with static storage duration. That means local variables are not accessible by lambdas and thus we cannot “partially” pass values to a lambda. This brings up the concept of captures. The capture clause is used to indirectly give a lambda access to variables available in the surrounding scope that it normally would not have access to. All we need to do is list the entities we want to access from within the lambda as part of the capture clause. For example:

...
std::string search{};
std::cin >> search;  // search is not available at compile time
auto found { std::find_if(
    arr.begin(),
    arr.end(),
    [search](std::string_view str) { return str.find(search) != std::string_view::npos; }  // check capture clause!
) };

Captures are essentially const copies of the original variables (rather than references to them). While we can mark the lambda mutable so that the captures become non-const, they’re still just copies and thus won’t affect the original variables:

#include <iostream>

int main() {
    int ammo{10};
    auto shoot {
        [ammo]() mutable {
            --ammo;
            std::cout << "Pew! " << ammo << " shot(s) left\n";
        }
    };

    shoot();
    shoot();
    std::cout << ammo << " shot(s) left\n";
    return 0;
}

What the above prints is

Pew! 9 shot(s) left 
Pew! 8 shot(s) left 
10 shot(s) left

To actually capture a variable by reference, we can prepend the variable name with an ampersand &:

#include <iostream>

int main() {
    int ammo{10};
    auto shoot {
        [&ammo]() { // no need for mutable any more 
            --ammo;
            std::cout << "Pew! " << ammo << " shot(s) left\n";
        }
    };

    shoot();
    shoot();
    std::cout << ammo << " shot(s) left\n";
    return 0;
}

and the above prints

Pew! 9 shot(s) left
Pew! 8 shot(s) left 
8 shot(s) left

Multiple variables can be captured by using commas to separate them. This can include a mix of captures by value and by reference. In the extreme case, a default capture (also known as a capture-default) captures all variables that are mentioned in the lambda. To capture all used variables by value, use =. To capture all used variables by reference, use &:

int main() {
    ...
    int width{};
    int height{};
    std::cin >> width >> height;  // width and height are not known at compile time 
    auto found { std::find_if(
        arr.begin(),
        arr.end(),
        [=](int area) { return width * height == area; }
    ) };  // width and height are captured by value to lambda
}

We can even define new variables inside capture:

auto found { std::find_if(
    arr.begin(),
    arr.end(),
    [specified_area{width * height}](int area) { return specified_area == area; }  // specified_area is defined right in the capture clause
) };

Operator Overloading

Introduction to operator overloading

In C++, operators are implemented as functions. By using function overloading on the operator functions, we can define our own version of the operators that work with different data types.

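There are some limitations on operator overloading: almost any existing operator can be overloaded, except the conditional operator (?:), sizeof, scope (::), member selector (.), pointer member selector (.*), typeid, and the casting operators; we can only overload existing operators, not create new ones (e.g. ** for exponentiation); at least one operand of an overloaded operator must be a user-defined type; we can’t change the number of operands an operator takes; and all operators keep their default precedence and associativity, regardless of what they’re used for.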

Overloading the arithmetic operators using friend functions

There are three different ways to overload operators: the member function way, the friend function way, and the normal (non-member) function way.

In this lesson we focus on the friend function way because it’s the most intuitive for binary operators.

Say we have a custom class storing how many cents of money we have, and we’d like to overload operator+ to add two Cents objects:

#include <iostream>

class Cents {
private:
    int m_cents{};

public:
    Cents(int cents) : m_cents { cents } {}
    friend Cents operator+(const Cents& c1, const Cents& c2);
    int getCents() const { return m_cents; }
};

Cents operator+(const Cents& c1, const Cents& c2) {
    return c1.m_cents + c2.m_cents;
} // we can also define the friend function inside the class

int main() {
    Cents c1{6};
    Cents c2{4};
    Cents csum{c1 + c2};
    std::cout << "I have " << csum.getCents() << " cents in total\n";
    return 0;
}

We can also overload other arithmetic operators and even for mixed types. Also, we can call overloaded operators when defining another friend function, e.g. overload operator- using operator+.

Overloading operators using normal functions

When we don’t need access to private members, we can just define the overloaded operator as a normal (non-member) function:

Cents operator+(const Cents& c1, const Cents& c2) {
    return Cents{ c1.getCents() + c2.getCents() };
}

The one difference between a friend function and a normal function in this case, besides accessibility, is where the declaration comes from: a friend function is declared inside the class definition (in the header file), and that declaration inherently serves as its prototype. For a normal function, we need to put the prototype/declaration in the header ourselves, to make it explicit to users of the header that this overload exists.

Overloading the I/O operators

Overloading operator<< is similar to overloading operator+ since they’re both binary operators, except that the parameter types are a bit different:

// std::ostream is the type of std::cout 
friend std::ostream& operator<<(std::ostream& out, const Type& obj);

Overloading operator>> is done in a manner analogous to overloading operator<<:

// std::istream is the type of std::cin
friend std::istream& operator>>(std::istream& in, Type& obj);

Overloading operators using member functions

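Overloading operators using a member function is very similar to overloading operators using a friend function. When overloading an operator using a member function: the overloaded operator must be added as a member function of the left operand; the left operand becomes the implicit *this object; and all other operands become function parameters.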

Re-using the previous example where we implemented using friend functions:

class Cents {
private:
    int m_cents{};

public:
    Cents(int cents) : m_cents {cents } {};
    Cents operator+(int value) const;
    int getCents() const {return m_cents;}
};

Cents Cents::operator+(int value) const {
    return Cents{m_cents + value};
}

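Since they’re just so similar, how on earth should we decide whether we’re doing the friend function or member function way? Well, there are a few more things to note: not everything can be overloaded as a friend function, since the assignment (=), subscript ([]), function call (()) and member selection (->) operators must be overloaded as member functions because the language requires it; and not everything can be overloaded as a member function, e.g. operator<< for printing, whose left operand is a std::ostream, a class we can’t add members to.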

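Typically, the following are some rules of thumb to decide which overloading we should define: if we’re overloading =, [], () or ->, do so as a member function; if we’re overloading a unary operator, do so as a member function; if we’re overloading a binary operator that doesn’t modify its left operand (e.g. operator+), do so as a normal function (preferred) or friend function; if we’re overloading a binary operator that modifies its left operand but we can’t add members to the left operand’s class (e.g. operator<<, whose left operand is a std::ostream), do so as a normal function (preferred) or friend function; and if we’re overloading a binary operator that modifies its left operand (e.g. operator+=) and we can modify the left operand’s class, do so as a member function.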

Overloading unary operators +, - and !

A simple example of how we might define operator- on the Cents class:

Cents Cents::operator-() const {
    return Cents{-m_cents};
}

Overloading the comparison operators

Because comparison operators are all binary and not modifying the left operands, we can define them as friend functions (or normal if possible):

class Car {
private:
    std::string m_make;
    std::string m_model;

public:
    Car(std::string_view make, std::string_view model)
        : m_make{make}, m_model{model} {
    }

    friend bool operator==(const Car& c1, const Car& c2);
    friend bool operator!=(const Car& c1, const Car& c2);
};

bool operator==(const Car& c1, const Car& c2) {
    return (c1.m_make == c2.m_make) &&   // & is bitwise/logical; && is logical only
           (c1.m_model == c2.m_model);
}

bool operator!=(const Car& c1, const Car& c2) {
    return !(c1 == c2);
}

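As suggested above, we can minimize the comparative redundancy by implementing most comparison operators in terms of one or two others: operator!= as !(operator==), operator> as operator< with the operands swapped, operator>= as !(operator<), and operator<= as !(operator>). That way only operator== and operator< need to touch the data.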

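C++20 introduces the three-way comparison (spaceship) operator, operator<=>, which allows us to reduce the number of comparison functions to implement to just 2 or even 1: if we define operator== and operator<=>, the compiler rewrites the remaining comparisons (!=, <, >, <=, >=) in terms of them; and if we explicitly default operator<=>, the compiler also implicitly declares a defaulted operator==.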

Overloading the increment and decrement operators

Since increment and decrement are unary and modify their operand, they’re best implemented as member functions:

class Digit {
private:
    int m_digit{};
public:
    Digit(int digit) : m_digit {digit} {}
    Digit& operator++();     // prefix: returns the (updated) object itself
    Digit& operator--();     // prefix
    Digit operator++(int);   // postfix: returns a copy of the old value, by value
    Digit operator--(int);   // postfix
};

Digit& Digit::operator++() {
    if (m_digit == 9)
        m_digit = 0;
    else
        ++m_digit;
    return *this;
}

Digit& Digit::operator--() {
    if (m_digit == 0)
        m_digit = 9;
    else 
        --m_digit;
    return *this;
}

Digit Digit::operator++(int) {
    Digit temp{*this};  // save the current state
    ++(*this);          // reuse the prefix version
    return temp;        // return by value: a reference to temp would dangle
}

Digit Digit::operator--(int) {
    Digit temp{*this};
    --(*this);
    return temp;
}

Notice how we differentiate the prefix and postfix versions with the parameter int: when there’s an int parameter, C++ will deem the function to be a postfix instead of prefix.

Overloading the subscript operator

We can overload the subscript operator [] in C++ to allow intuitive access to elements in the private member list:

#include <iostream>

class IntList {
private:
    int m_list[10]{};
public:
    int& operator[] (int index) {
        return m_list[index];
    }
};

int main() {
    IntList list{};
    list[2] = 3;
    std::cout << list[2] << '\n';  // prints 3
    return 0;
}

Notice operator[] needs to return by reference (const, if needed), cuz otherwise list[2] would just be a temporary copy of the element, and list[2] = 3 would essentially evaluate to something like 0 = 3, which causes a compile error. When we need subscripting on a const list, we can define a const version of the overload.

When we need to implement the logics for both const and non-const versions of the overload function, what we can do to save the amount of duplicate coding is to put majority of the function body into another function that gets called by both versions.

In C++23, we have an even simpler solution utilizing several new features:

#include <iostream>

class IntList {
private:
    int m_list[5]{1,2,3,4,5};
public:
    auto&& operator[](this auto&& self, int index) {
        return self.m_list[index];
    }  // explicit object parameter ("deducing this"): self is deduced as const or non-const
};

int main() {
    IntList list{};
    list[2] = 3;    // ok 
    const IntList clist{};
    // clist[2] = 3;   // compile error: clist[2] is a const int&
    std::cout << clist[2] << '\n';
    return 0;
}

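Note that we shouldn’t mix pointers and overloaded subscripts: when list is a pointer, list[2] uses the built-in pointer subscript (treating list as an array of IntList objects) rather than our overload, so we need to dereference the pointer first:

IntList* list { new IntList{} };
list[2] = 3;     // compile error: list[2] is an IntList, not an int
(*list)[2] = 3;  // ok: calls IntList::operator[]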

Overloading the parenthesis operator

The parenthesis operator must be implemented as a member function. Taking the following as an example:

#include <cassert>  // for assert

class Matrix {
private:
    double m_data[4][4]{};

public:
    double& operator()(int row, int col);        // non-const: allows matrix(1, 2) = 4.5
    double operator()(int row, int col) const;   // const: read-only access
};

double& Matrix::operator()(int row, int col) {
    assert(row >= 0 && row < 4);
    assert(col >= 0 && col < 4);
    return m_data[row][col];
}

double Matrix::operator()(int row, int col) const {
    assert(row >= 0 && row < 4);
    assert(col >= 0 && col < 4);
    return m_data[row][col];
}

int main() {
    Matrix matrix;
    matrix(1, 2) = 4.5;
    return 0;
}

The reason we didn’t implement something like matrix[1][2] is that chaining a second subscript operator is harder to implement than simply overloading the parenthesis operator.

The operator() is often overloaded to implement functors (also known as function objects), which are classes that behave like functions except that they store data in member variables:

#include <iostream>

class Accumulator {
private:
    int m_counter {};
public:
    int operator()(int i) { return (m_counter += i); }
    void reset() { m_counter = 0; }
};

int main() {
    Accumulator acc{};
    std::cout << acc(1) << '\n';    // prints 1
    std::cout << acc(3) << '\n';    // prints 4
    Accumulator acc2{};
    std::cout << acc2(10) << '\n';  // prints 10
    std::cout << acc2(20) << '\n';  // prints 30
    return 0;
}

Overloading typecasts

C++ already knows how to convert an int into a Cents object via the (non-explicit) constructor, as shown above. We can also “explicitly” define the reverse, i.e. the behavior of typecasting a Cents into an int:

class Cents {
private:
    int m_cents {};
public:
    Cents (int cents = 0) : m_cents {cents} {};
    operator int() const {return m_cents;}
    // ...
};

Note we used the word “explicit” in quotes above, because the behavior we defined would guide C++ how to implicitly convert a Cents object into an int. We can also force users to actually “explicitly” perform this typecast:

explicit operator int() const {return m_cents; }

and by specifying explicit like this, users can only convert using static_cast<int>(cents) if needed.

Overloading the assignment operator

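The purpose of the copy constructor and the copy assignment operator is almost the same: both copy one object to another. However, the copy constructor initializes new objects, whereas the assignment operator replaces the contents of existing objects. In sum: if a new object has to be created before the copying can occur, the copy constructor is used (this includes passing or returning objects by value); if a new object does not have to be created before the copying can occur, the assignment operator is used.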

We can overload the assignment operator as follows:

Cents& operator=(const Cents& cents);

Cents& Cents::operator=(const Cents& cents) {
    m_cents = cents.m_cents;
    return *this;
}

For simple cases, this implementation works. For cases where dynamic memory allocation is expected, we might want to handle self-assignment explicitly. Say we have a custom MyString class that does dynamic memory allocation:

MyString& MyString::operator=(const MyString& str) {
    if (this == &str)
        return *this;  // important: avoid self-assignment
    if (m_data)
        delete[] m_data;  // important: delete existing variable and free its memory 
    m_length = str.m_length;
    m_data = nullptr;
    if (m_length) 
        m_data = new char[static_cast<std::size_t>(str.m_length)];
    std::copy_n(str.m_data, m_length, m_data);  // copy m_length of str.m_data into m_data
    return *this;
}

The compiler provides an implicit copy assignment operator by default and you can optionally avoid this behavior by making it private or using the delete keyword:

Cents& operator=(const Cents& cents) = delete;

Shallow vs deep copying

Because C++ doesn’t know much about your class, the default copy constructor and default assignment operator use a copying method known as memberwise copy (also known as shallow copy). This means C++ copies each member of the class individually (using the assignment operator for overloaded operator=; using direct initialization for the copy constructor).

This works until it doesn’t: when designing classes that handle dynamically allocated memory, memberwise/shallow copying can get us into a lot of trouble. This is because a shallow copy of a pointer just copies the address, without allocating any memory or copying the content, so both objects end up pointing at (and later deleting) the same memory.

We can solve this by doing a deep copy, which allocates memory for the copy and then copies the actual value, so that the copy lives in distinct memory from the source.

void MyString::deep_copy(const MyString& str) {
    delete[] m_data;  // delete existing variable and free its memory 
    m_length = str.m_length;
    if (str.m_data) {
        m_data = new char[m_length];
        for (int i{}; i < m_length; ++i)
            m_data[i] = str.m_data[i];
    } else
        m_data = nullptr;
}

MyString::MyString(const MyString& str) {
    deep_copy(str);
}  // copy constructor 

MyString& MyString::operator=(const MyString& str) {
    if (this != &str)  // avoid self-assignment
        deep_copy(str);
    return *this;
}  // copy assignment operator

This is exactly the reason why we have the rule of three, namely whenever you need to define a destructor, a copy constructor or a copy assignment operator, you will eventually realize that you need to define all three of them, since very likely you’re handling dynamic memory allocation.

Move Semantics and Smart Pointers

Introduction to smart pointers and move semantics

There are a myriad of ways in which a pointer gets allocated but never properly deleted, say we return early or throw an exception before reaching the delete. To avoid this kind of situation for good, we have smart pointers: classes that own a pointer and delete it in their destructor. Here’s a minimal hand-rolled one:

#include <iostream>

template <typename T>
class AutoPtr {
private:
    T* m_ptr;

public:
    AutoPtr(T* ptr=nullptr) : m_ptr(ptr) {};
    ~AutoPtr() { delete m_ptr; }
    T& operator*() const { return *m_ptr; }
    T* operator->() const { return m_ptr; }
};

class Resource {
public:
    Resource() { std::cout << "Resource constructed\n"; }
    ~Resource() { std::cout << "Resource destructed\n"; }
};

int main() {
    AutoPtr<Resource> res(new Resource());
    return 0;
}

The above program prints

Resource constructed 
Resource destructed 

As long as the AutoPtr is defined locally, it will properly delete the internal pointer when it goes out of scope. By contrast, built-in pointers are sometimes called “dumb pointers” because they can’t clean up after themselves.

One problem of the above implementation is that it has no properly defined copy constructor or copy assignment operator, so the compiler-generated ones just copy the raw pointer. That means when we copy-construct one AutoPtr from another, both end up owning the same resource:

AutoPtr<Resource> ptr1(new Resource());
AutoPtr<Resource> ptr2(ptr1);

We will see one “constructed” notice and two “destructed” notices on the same Resource object, because both AutoPtrs delete the same pointer (which is undefined behavior and may well crash instead). To fix this problem, we could explicitly delete the copy constructor and copy assignment operator and only ever pass AutoPtr by reference, thereby avoiding copies altogether. But how about returning an AutoPtr from a function?

??? get_auto_ptr() {
    ...
    return AutoPtr(res);
}

We cannot return by copy, as that triggers the same problem as the copy constructor (multiple AutoPtrs owning the same resource and deleting the same memory multiple times). Neither can we return by reference, because the local AutoPtr object is destroyed when the function returns and the caller is left with a dangling reference; the same goes for returning a pointer to the AutoPtr.

This is why we need move semantics, which basically means transferring ownership between objects rather than making copies:

AutoPtr(AutoPtr& ap) {
    m_ptr = ap.m_ptr;
    ap.m_ptr = nullptr;
}

AutoPtr& operator=(AutoPtr& ap) {
    if (&ap != this) { // avoid self assignment 
        delete m_ptr;
        m_ptr = ap.m_ptr;
        ap.m_ptr = nullptr;
    }
    return *this;
}

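C++98 introduced std::auto_ptr, which works just like this. It was deprecated in C++11 and removed in C++17 for a few reasons: since it implements move semantics through the copy constructor and copy assignment operator, passing a std::auto_ptr by value silently transfers ownership to the function parameter, which then deletes the resource when the function returns; it always uses non-array delete, so it can’t properly manage dynamically allocated arrays; and it doesn’t work well with standard library containers and algorithms, which assume that copying an object actually copies it.

We should use std::unique_ptr and std::shared_ptr moving forward, which were introduced in C++11.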

R-value references

Let’s recap the value categories. Prior to C++11, only one type of reference existed in C++, so it was just called a reference. Since C++11 it is called an l-value reference, and (when not const) it can only be initialized with modifiable l-values:

l-value reference initialized with | Allowed | Can modify
modifiable l-values | yes | yes
non-modifiable l-values | no | no
r-values | no | no

L-value references to const can be initialized with modifiable l-values, non-modifiable l-values and r-values alike. However, the referenced values can’t be modified through them:

l-value reference to const initialized with | Allowed | Can modify
modifiable l-values | yes | no
non-modifiable l-values | yes | no
r-values | yes | no

C++11 adds a new type of reference called an r-value reference. An r-value reference is a reference that is designed to be initialized with an r-value (only). While an l-value reference is created using a single ampersand, an r-value reference is created using a double ampersand:

int x{5};       // l-value 
int& lref{x};   // l-value ref to an l-value 
int&& rref{5};  // r-value ref to an r-value 

R-value references cannot be initialized with l-values.

r-value reference initialized with | Allowed | Can modify
modifiable l-values | no | no
non-modifiable l-values | no | no
r-values | yes | yes

And for r-value reference to const:

r-value reference to const initialized with | Allowed | Can modify
modifiable l-values | no | no
non-modifiable l-values | no | no
r-values | yes | no

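R-value references have two properties that are quite useful. First, they extend the lifespan of the object they are initialized with to the lifespan of the r-value reference (l-value references to const can do this too). Second, non-const r-value references allow us to modify the r-value.

R-value references are most often used as function parameters, when we want to differentiate the behavior of a function for l-value vs r-value arguments: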

void fun(const int& lref) {
    // do something
}

void fun(const int&& rref) {
    // do something 
}

int main() {
    int x{5};
    fun(x);      // x passed as lref  
    fun(5);      // 5 passed as rref
    return 0;
}

Move constructors and move assignments

We can define a move constructor and a move assignment operator for AutoPtr like below:

// not const cuz we need to move the ownership from the original ptr to current
AutoPtr(AutoPtr&& a) noexcept : m_ptr(a.m_ptr) { a.m_ptr = nullptr; }
AutoPtr& operator=(AutoPtr&& a) noexcept {
    if (&a != this) {
        delete m_ptr;
        m_ptr = a.m_ptr;
        a.m_ptr = nullptr;
    }
    return *this;
}

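The compiler will create an implicit move constructor and move assignment operator if all of the following are true: there are no user-declared copy constructors or copy assignment operators, there are no user-declared move constructors or move assignment operators, and there is no user-declared destructor. These implicit versions do a memberwise move, which for fundamental types and raw pointers simply copies the value, so a resource-owning class like AutoPtr has to define its own.

For a move-only class like AutoPtr, we’re also better off deleting the copy constructor and copy assignment operator using =delete. It’s worth noting that the compiler will not generate an implicit move constructor once we delete the copy constructor (a deleted function still counts as user-declared), so it’s better to be very explicit about what behavior we want for the move constructor and move assignment operator, using either =delete or =default (or our own definition).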

std::move

There are cases where we want move semantics, but the compiler copies instead because the object is an l-value rather than an r-value, so the overloads taking r-value references (&&) aren’t selected. For example:

#include <iostream>
#include <string>

template<class T>
void my_swap_function(T& a, T& b) {
    T tmp{a};
    a = b;
    b = tmp;
}

int main() {
    std::string x{"abc"};
    std::string y{"de"};
    std::cout << "x: " << x << '\n';
    std::cout << "y: " << y << '\n';
    my_swap_function(x, y);
    std::cout << "x: " << x << '\n';
    std::cout << "y: " << y << '\n';
    return 0;
}

which prints

x: abc 
y: de 
x: de 
y: abc

This program works as expected, but it makes three copies of std::string along the way, which is unnecessary now that we know about move semantics. The problem is that a, b and tmp are all l-values, so the copy constructor and copy assignment operator are chosen instead of their move counterparts. To solve this, we can use std::move from the <utility> header:

#include <iostream>
#include <string>
#include <utility>

template <class T>
void my_swap_function(T& a, T& b) {
    T tmp{ std::move(a) };
    a = std::move(b);
    b = std::move(tmp);
}

// same as previous program 

Instead of copying a, b and tmp, we use std::move to cast these l-values to r-values (std::move itself doesn’t move anything). Since the arguments are now r-values, the move constructor and move assignment operator of std::string are selected, avoiding expensive copies.

Another example is filling the elements of a container:

#include <iostream>
#include <string>
#include <utility> // for std::move
#include <vector>

int main() {
    std::vector<std::string> v;
    // We use std::string because moving it actually transfers its buffer (moving a std::string_view just copies it)
    std::string str { "Knock" };
    std::cout << "Copying str\n";
    v.push_back(str); // calls l-value version of push_back, which copies str into the array element
    std::cout << "str: " << str << '\n';
    std::cout << "vector: " << v[0] << '\n';
    std::cout << "\nMoving str\n";
    v.push_back(std::move(str)); // calls r-value version of push_back, which moves str into the array element
    std::cout << "str: " << str << '\n'; // moved-from str is valid but unspecified (typically empty)
    std::cout << "vector: " << v[0] << ' ' << v[1] << '\n';
    return 0;
}

The above program prints:

Copying str
str: Knock
vector: Knock

Moving str
str:
vector: Knock Knock

std::move is also useful when sorting an array of elements and moving contents managed by one smart pointer to another.

std::unique_ptr

The C++11 standard library ships with 4 smart pointer classes: std::auto_ptr (deprecated in C++11 and removed in C++17 for the reasons mentioned before), std::unique_ptr, std::shared_ptr and std::weak_ptr. Among these, std::unique_ptr is by far the most commonly used, and we’ll cover it in this section.

std::unique_ptr (provided in <memory>) can be seen as a replacement for std::auto_ptr. It should be used to manage any dynamically allocated object that is not shared by multiple objects. That is, it can and should completely own the object it manages and cannot share the ownership with other classes.

#include <iostream>
#include <memory>

class Resource {
public:
    Resource() { std::cout << "Acquired\n"; }
    ~Resource() { std::cout << "Destroyed\n"; }
};

int main() {
    std::unique_ptr<Resource> res{ new Resource() };
    return 0;
}

The above program prints

Acquired 
Destroyed 

Because the std::unique_ptr is a local variable here, it is destroyed when it goes out of scope, and it properly deletes the resource it manages. Unlike std::auto_ptr, it implements move semantics properly: copy construction and copy assignment are disabled, so ownership can only be transferred with an explicit move:

// same definition of Resource 

int main() {
    std::unique_ptr<Resource> res1 { new Resource() };
    std::unique_ptr<Resource> res2 {};  // initialized as nullptr 
    std::cout << "res1 is " << (res1 ? "not null\n" : "null\n");
    std::cout << "res2 is " << (res2 ? "not null\n" : "null\n");
    // res2 = res1;         // Won't compile: copy assignment is disabled
    res2 = std::move(res1); // res2 assumes ownership, res1 is set to null
    std::cout << "Ownership transferred\n";
    std::cout << "res1 is " << (res1 ? "not null\n" : "null\n");
    std::cout << "res2 is " << (res2 ? "not null\n" : "null\n");
    return 0;
}

The above prints

Acquired
res1 is not null
res2 is null
Ownership transferred
res1 is null
res2 is not null
Destroyed

std::unique_ptr overloads * and ->, and provides a conversion to bool that returns true if the pointer is managing a resource:

if (res) {
    std::cout << *res << '\n';  // assumes Resource overloads operator<<
}

Another improvement over std::auto_ptr is that std::unique_ptr knows when to use scalar vs array delete, so it’s OK to use std::unique_ptr with dynamically allocated arrays (std::unique_ptr<T[]>). However, it’s almost always better to just use std::array or std::vector than a std::unique_ptr managing a fixed, dynamic or C-style array.

Additionally, C++14 added a function named std::make_unique, which constructs an object of the template type and initializes it with the arguments passed into the function:

std::unique_ptr<Resource> create_resource() {
    return std::make_unique<Resource>();
}

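std::make_unique supports both single objects (T) and arrays of unknown bound (T[]), rejects arrays of known bound (T[N]), and avoids using new altogether, so it’s almost always preferred over constructing a std::unique_ptr from new explicitly. It’s also more succinct, and before C++17 it was also more exception-safe:

// Before C++17 this could leak: new MyClass may run, then g() may throw
// before the std::unique_ptr takes ownership
f(std::unique_ptr<MyClass>(new MyClass(param)), g());
f(std::make_unique<MyClass>(param), g());  // no leak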

Notice that we return std::unique_ptr by value: in most cases we shouldn’t return it by pointer or reference. Also, don’t let multiple std::unique_ptr objects manage the same resource (obviously), and don’t manually delete the resource a std::unique_ptr is managing.

std::shared_ptr

Unlike std::unique_ptr, which is designed to singly own and manage a resource, std::shared_ptr is meant to solve the case where you need multiple smart pointers co-owning a resource. std::shared_ptr keeps track of how many of them are sharing ownership on the same resource and won’t deallocate the memory until the last std::shared_ptr goes out of scope.

// assuming same definition of Resource 

int main() {
    Resource* res{ new Resource };
    std::shared_ptr<Resource> ptr1{ res };
    {
        std::shared_ptr<Resource> ptr2{ ptr1 };  // share ownership from ptr1 to ptr2 
        std::cout << "Killing one shared pointer\n";
    }
    std::cout << "Killing another shared pointer\n";
    return 0;
}

This prints:

Acquired 
Killing one shared pointer  
Killing another shared pointer 
Destroyed 

Notice that we share ownership by copy-constructing ptr2 from ptr1. Compare that with creating ptr2 independently from the same raw pointer:

int main() {
    Resource* res{ new Resource };
    std::shared_ptr<Resource> ptr1{ res };
    {
        std::shared_ptr<Resource> ptr2{ res };  // set ownership independently 
        std::cout << "Killing one shared pointer\n";
    }
    std::cout << "Killing another shared pointer\n";
    return 0;
}

which prints

Acquired 
Killing one shared pointer  
Destroyed 
Killing another shared pointer 
Destroyed 

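The two std::shared_ptr objects don’t know about each other, so each one deletes the resource when it goes out of scope. Deleting the same resource twice is undefined behavior, and the program will likely crash after the second Destroyed.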

Similar to std::make_unique, we have std::make_shared for more succinct coding:

int main() {
    auto ptr1 { std::make_shared<Resource>() };
    {
        auto ptr2 { ptr1 };
        std::cout << "Killing one pointer\n";
    }
    std::cout << "Killing another pointer\n";
    return 0;
}

This is simpler and, more importantly, safer, as there’s no raw pointer like res lying around from which a second, independent std::shared_ptr could be created.

It’s worth knowing that a std::unique_ptr can be converted to a std::shared_ptr using a special constructor but not vice versa.

std::weak_ptr

Check the following example:

#include <iostream>
#include <memory>
#include <string>

class Person {
private:
    std::string m_name{};
    std::shared_ptr<Person> m_partner{};  // default to nullptr 
public:
    Person(const std::string& name) : m_name(name) { std::cout << name << " is born\n"; }
    ~Person() { std::cout << m_name << " is dead\n"; }
    friend bool marry(std::shared_ptr<Person>& p1, std::shared_ptr<Person>& p2) {
        if (!p1 || !p2) {
            return false;
        }

        p1->m_partner = p2;
        p2->m_partner = p1;
        std::cout << p1->m_name << " marries " << p2->m_name << '\n';
        return true;
    }
};

int main() {
    auto allen { std::make_shared<Person>("Allen") };
    auto christine { std::make_shared<Person>("Christine") };
    marry(allen, christine);
    return 0;
}

The above program will print

Allen is born
Christine is born 
Allen marries Christine 

But that’s it! Neither Person is ever destroyed. At the end of main, christine goes out of scope first, but allen->m_partner still co-owns Christine, so her reference count only drops from 2 to 1 and she isn’t destroyed. Then allen goes out of scope, but christine->m_partner still co-owns Allen, so the same thing happens. As a result, this circular reference leaks both objects despite the use of std::shared_ptr.

std::weak_ptr was designed to solve this cyclical ownership problem: it observes a resource without owning it, so it doesn’t add to the reference count. The only thing we need to change is to declare the m_partner member variable as a std::weak_ptr:

    std::weak_ptr<Person> m_partner{};  // default to empty 

and the same program will now print

Allen is born
Christine is born
Allen marries Christine
Christine is dead
Allen is dead

One drawback of std::weak_ptr is that it’s just an observer: it has no * or -> overload. In order to access its resource, we need to convert the std::weak_ptr into a std::shared_ptr using the lock() member function. For example:

std::shared_ptr<Person> get_partner() const {
    return m_partner.lock();  // returns an empty std::shared_ptr if the partner no longer exists
}

Unlike std::shared_ptr, which keeps the underlying resource alive, a std::weak_ptr doesn’t, so it might end up referring to a resource that has already been destroyed. Luckily, it still has an advantage over a dumb pointer, namely the expired() member function, which returns true if the reference count of the object has dropped to zero:

// assuming a Resource class that prints "Resource acquired" and "Resource destroyed"
std::weak_ptr<Resource> getWeakPtr() {
    auto ptr{ std::make_shared<Resource>() };
    return std::weak_ptr<Resource>{ ptr };
} // ptr goes out of scope, Resource destroyed

Resource* getDumbPtr() {
    auto ptr{ std::make_unique<Resource>() };
    return ptr.get();
} // ptr goes out of scope, Resource destroyed

int main() {
    auto dumb{ getDumbPtr() };  // non-null, but dangling: the Resource is already destroyed
    std::cout << "Our dumb ptr is: " << ((dumb == nullptr) ? "nullptr\n" : "non-null\n");
    auto weak{ getWeakPtr() };
    std::cout << "Our weak ptr is: " << ((weak.expired()) ? "expired\n" : "valid\n");
    return 0;
}

and it prints

Resource acquired
Resource destroyed
Our dumb ptr is: non-null
Resource acquired
Resource destroyed
Our weak ptr is: expired

Object Relationships

Composition and Aggregation

The process of building complex objects from simpler ones is called object composition. There are two basic subtypes of object composition: composition and aggregation.

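To qualify as a composition, an object and a part must have the following relationship: the part (member) is part of the object (class); the part can only belong to one object at a time; the part has its existence managed by the object; and the part does not know about the existence of the object.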

A real-world example of composition would be the relationship between a person’s body and a heart.

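An aggregation must satisfy the following relationships: the part (member) is part of the object (class); the part can belong to more than one object at a time; the part does not have its existence managed by the object; and the part does not know about the existence of the object.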

A good example would be a person versus their address. Since multiple people can share the same address without the address being managed by the homeowners/leasers, it’s an aggregation.

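A further summary:

Compositions: typically use normal member variables; can use pointer members if the class handles object allocation and deallocation itself; and are responsible for the creation and destruction of their parts.

Aggregations: typically use pointer or reference members that point to or reference objects that live outside the scope of the aggregate class; and are not responsible for creating or destroying their parts.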

Notice that we cannot create a std::vector (or an array) of references, because container elements must be assignable objects, while references are not objects and can’t be reseated:

std::vector<const Teacher&> m_teachers{};  // illegal

What we can do is to use std::reference_wrapper provided in the <functional> header which has a get member function to unwrap and get the underlying reference:

#include <functional>
#include <iostream>
#include <vector>
#include <string>

int main() {
    std::string tom{"Tom"};
    std::string berta{"Berta"};
    std::vector<std::reference_wrapper<std::string>> names {tom, berta};  // list of refs 
    std::string jim{"Jim"};
    names.emplace_back(jim);  // convert to ref wrapper and push to the end 
    for (auto name: names) {
        name.get() += " Beam";
    }
    std::cout << jim << '\n';  // prints Jim Beam 
    return 0;
}

Association

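To qualify as an association, an object and another object must have the following relationship: the associated object (member) is otherwise unrelated to the object (class); the associated object can belong to more than one object at a time; the associated object does not have its existence managed by the object; and the associated object may or may not know about the existence of the object.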

The relationship between doctors and patients is a great example of an association. The doctor clearly has a relationship with their patients, but conceptually it’s not a part/whole relationship and thus is not considered object composition. A doctor can see multiple patients and vice versa. There is naturally a circular dependency here, and we need to take care how we implement this kind of relationship:

#include <functional>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

class Patient;  // forward declaration

class Doctor {
private:
    std::string m_name{};
    std::vector<std::reference_wrapper<const Patient>> m_patients{};

public:
    explicit Doctor(std::string_view name) : m_name(name) {
    }

    void add_patient(Patient& patient);
    friend std::ostream& operator<<(std::ostream& os, const Doctor& doctor);

    [[nodiscard]] const std::string& get_name() const {
        return m_name;
    }
};

class Patient {
private:
    std::string m_name{};
    std::vector<std::reference_wrapper<const Doctor>> m_doctors{};

    void add_doctor(const Doctor& doctor) {
        m_doctors.emplace_back(doctor);
    }  // private because callers should always go through Doctor::add_patient

public:
    explicit Patient(std::string_view name) : m_name(name) {
    }

    friend std::ostream& operator<<(std::ostream& os, const Patient& patient);

    [[nodiscard]] const std::string& get_name() const {
        return m_name;
    }

    friend void Doctor::add_patient(Patient& patient);  // so that it can access Patient.add_doctor
};

void Doctor::add_patient(Patient& patient) {
    m_patients.emplace_back(patient);
    patient.add_doctor(*this);
}

std::ostream& operator<<(std::ostream& os, const Doctor& doctor) {
    if (doctor.m_patients.empty()) {
        os << doctor.m_name << " has no patients right now\n";
        return os;
    }

    os << doctor.m_name << " is seeing these patients:\n";
    for (const auto patient : doctor.m_patients) {
        os << "- " << patient.get().get_name() << '\n';
    }
    return os;
}

std::ostream& operator<<(std::ostream& os, const Patient& patient) {
    if (patient.m_doctors.empty()) {
        os << patient.m_name << " has no doctors right now\n";
        return os;
    }

    os << patient.m_name << " is seeing these doctors:\n";
    for (const auto doctor : patient.m_doctors) {
        os << "- " << doctor.get().get_name() << '\n';
    }
    return os;
}

int main() {
    Patient dave{"Dave"};
    Patient frank{"Frank"};
    Patient betsy{"Betsy"};
    Doctor james{"James"};
    Doctor scott{"Scott"};

    james.add_patient(dave);
    james.add_patient(frank);
    scott.add_patient(betsy);

    std::cout << james << '\n';
    std::cout << scott << '\n';
    return 0;
}

Below is a summary table of the three kinds of relationships we’ve talked about so far:

Property | Composition | Aggregation | Association
Relationship type | whole/part | whole/part | otherwise unrelated
Members can belong to multiple classes | no | yes | yes
Members’ existence managed by class | yes | no | no
Directionality | unidirectional | unidirectional | unidirectional or bidirectional
Relationship verb | part-of | has-a | uses-a

Dependencies

A dependency occurs when one object invokes another object’s functionality in order to accomplish some specific task. This is a weaker relationship than an association, but still, any change to the object being depended upon may break functionality in the (dependent) caller. A dependency is always a unidirectional relationship.

In C++, an association is a relationship where one class keeps a direct or indirect “link” to the associated class as a member, so the class knows about its associated objects. Dependencies, on the other hand, are typically not members: the object being depended upon is usually instantiated only as needed (like opening a file to write to) or passed into a function as a parameter.

Container classes

A container class is a class designed to hold and organize multiple instances of another type (either another class, or a fundamental type). There are many different kinds of container classes, each of which has various advantages, disadvantages and restrictions in their use. By far the most commonly used container in programming is the array, which you have already seen in many examples. Although C++ has built-in array functionality, programmers will often use an array container class (std::array or std::vector) instead because of the additional benefits they provide. Unlike built-in arrays, array container classes generally provide dynamic resizing (std::vector, when elements are added or removed), remember their size when they are passed to functions, and offer bounds-checked access (e.g. the at() member function). This not only makes array container classes more convenient than normal arrays, but safer too.

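Most well-defined containers provide these functionalities: creating an empty container (via a constructor), inserting a new object into the container, removing an object from the container, reporting the number of objects currently in the container, emptying the container of all objects, providing access to the stored objects, and (optionally) sorting the elements.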

Sometimes container classes omit some of these functionalities: for example, array containers often omit insert and remove because those operations are slow and their use is discouraged. Below is an example of an IntArray container class:

// IntArray.h
#ifndef INTARRAY_H
#define INTARRAY_H

#include <algorithm>  // for std::copy_n
#include <cassert>    // for assert()

class IntArray {
private:
    int m_length{};
    int* m_data{};

public:
    IntArray() = default;

    IntArray(IntArray&&) = delete;
    IntArray& operator=(IntArray&&) = delete;

    explicit IntArray(int length) : m_length{length} {
        assert(length >= 0);

        if (length > 0) {
            m_data = new int[length]{};
        }
    }

    ~IntArray() {
        delete[] m_data;
        // we don't need to set m_data to null or m_length to 0 here, since the object will be destroyed immediately
        // after this function anyway
    }

    IntArray(const IntArray& a) {
        // Set the size of the new array appropriately
        reallocate(a.get_length());
        std::copy_n(a.m_data, m_length, m_data);  // copy the elements
    }

    IntArray& operator=(const IntArray& a) {
        // Self-assignment check
        if (&a == this) {
            return *this;
        }

        // Set the size of the new array appropriately
        reallocate(a.get_length());
        std::copy_n(a.m_data, m_length, m_data);  // copy the elements

        return *this;
    }

    void erase() {
        delete[] m_data;
        // We need to make sure we set m_data to nullptr here, otherwise it will
        // be left pointing at deallocated memory!
        m_data = nullptr;
        m_length = 0;
    }

    int& operator[](int index) {
        assert(index >= 0 && index < m_length);
        return m_data[index];
    }

    // reallocate resizes the array.  Any existing elements will be destroyed.  This function operates quickly.
    void reallocate(int newLength) {
        // First we delete any existing elements
        erase();

        // If our array is going to be empty now, return here
        if (newLength <= 0) {
            return;
        }

        // Then we have to allocate new elements
        m_data = new int[newLength];
        m_length = newLength;
    }

    // resize resizes the array.  Any existing elements will be kept.  This function operates slowly.
    void resize(int newLength) {
        // if the array is already the right length, we're done
        if (newLength == m_length) {
            return;
        }

        // If we are resizing to an empty array, do that and return
        if (newLength <= 0) {
            erase();
            return;
        }

        // Now we can assume newLength is at least 1 element.  This algorithm
        // works as follows: First we are going to allocate a new array.  Then we
        // are going to copy elements from the existing array to the new array.
        // Once that is done, we can destroy the old array, and make m_data
        // point to the new array.

        // First we have to allocate a new array
        int* data{new int[newLength]};

        // Then we have to figure out how many elements to copy from the existing
        // array to the new array.  We want to copy as many elements as there are
        // in the smaller of the two arrays.
        if (m_length > 0) {
            int elements_to_copy{(newLength > m_length) ? m_length : newLength};
            std::copy_n(m_data, elements_to_copy, data);  // copy the elements
        }

        // Now we can delete the old array because we don't need it any more
        delete[] m_data;

        // And use the new array instead!  Note that this simply makes m_data point
        // to the same address as the new array we dynamically allocated.  Because
        // data was dynamically allocated, it won't be destroyed when it goes out of scope.
        m_data = data;
        m_length = newLength;
    }

    void insert_before(int value, int index) {
        // Sanity check our index value
        assert(index >= 0 && index <= m_length);

        // First create a new array one element larger than the old array
        int* data{new int[m_length + 1]};

        // Copy all of the elements up to the index
        std::copy_n(m_data, index, data);

        // Insert our new element into the new array
        data[index] = value;

        // Copy all of the values after the inserted element
        std::copy_n(m_data + index, m_length - index, data + index + 1);

        // Finally, delete the old array, and use the new array instead
        delete[] m_data;
        m_data = data;
        ++m_length;
    }

    void remove(int index) {
        // Sanity check our index value
        assert(index >= 0 && index < m_length);

        // If this is the last remaining element in the array, set the array to empty and bail out
        if (m_length == 1) {
            erase();
            return;
        }

        // First create a new array one element smaller than the old array
        int* data{new int[m_length - 1]};

        // Copy all of the elements up to the index
        std::copy_n(m_data, index, data);

        // Copy all of the values after the removed element
        std::copy_n(m_data + index + 1, m_length - index - 1, data + index);

        // Finally, delete the old array, and use the new array instead
        delete[] m_data;
        m_data = data;
        --m_length;
    }

    // A couple of additional functions just for convenience
    void insert_at_beginning(int value) {
        insert_before(value, 0);
    }

    void insert_at_end(int value) {
        insert_before(value, m_length);
    }

    [[nodiscard]] int get_length() const {
        return m_length;
    }
};

#endif

std::initializer_list

We have seen the following examples using initializer lists:

int array[] { 1,2,3,4,5 };
auto* array{new int[5]{ 1,2,3,4,5 }};

However, we cannot use an initializer list to instantiate our IntArray out of the box:

IntArray array{1,2,3,4,5};  // won't compile

This is because IntArray needs a constructor that explicitly takes a std::initializer_list:

#include <cassert> // for assert 
#include <initializer_list>  // for std::initializer_list 
// etc 


class IntArray {
private:
    // etc 

public:
    IntArray(std::initializer_list<int> list)  // constructor for initializer list 
        : IntArray(static_cast<int>(list.size())) {  // delegating constructor: allocates the array
        int count {0};
        for (auto element : list) {
            m_data[count] = element;
            ++count;
        }
    }
};

Note that list initialization prefers list constructors over non-list constructors:

IntArray a1(5);  // using IntArray(int) and has length 5
IntArray a2{5};  // using IntArray(std::initializer_list<int>) and has length 1

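For this very reason, remember that adding a list constructor to an existing class that did not have one may break existing programs, so it should be avoided unless you know what you’re doing. Also, in a class that has a list constructor but no list assignment operator, an assignment like x = { 1, 3, 5 }; constructs a temporary object from the list and then assigns it, which leaves a dangling pointer if that assignment only makes a shallow copy. So if we do implement a constructor that takes a std::initializer_list, we need to ensure that we do at least one of the following: provide an overloaded list assignment operator, provide a proper deep-copying copy assignment operator, or delete the copy assignment operator.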

Inheritance

Basic inheritance in C++

Inheritance in C++ takes place between classes. In an inheritance relationship, the class being inherited from is called the parent class, base class or superclass. The class doing the inheriting is called the child class, derived class or subclass. For example:

#include <string>
#include <string_view>

class Person {
public:  // making our members public just in this example 
    std::string m_name{};
    int m_age{};

    Person(std::string_view name = "", int age = 0)
        : m_name(name), m_age(age) {
    }

    const std::string& get_name() const { return m_name; }
    int get_age() const { return m_age; }
};

class BaseballPlayer : public Person {
public:
    double m_batting_average{};
    int m_homeruns{};

    BaseballPlayer(double batting_average = 0, int homeruns = 0)
        : m_batting_average(batting_average), m_homeruns(homeruns) {
    }
};

In the example above, BaseballPlayer publicly inherits from Person. We’ll talk more about what public inheritance means later.

Order of construction of derived classes

C++ constructs derived classes in phases, starting with the most-base class and finishing with the most-child class. As each class is constructed, the appropriate constructor from that class is called to initialize that part of the class.

Constructors and initialization of derived classes

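Here is what happens when a base class is instantiated: memory for the base object is set aside; the appropriate base class constructor is called; the member initializer list initializes the member variables; the body of the constructor executes; and control is returned to the caller.

And here is what happens when a derived class is instantiated: memory for the derived object is set aside (enough for both the base and derived portions); the appropriate derived class constructor is called; the base portion is constructed first, using the appropriate base class constructor (the default one if none is specified); the member initializer list initializes the derived class’s member variables; the body of the constructor executes; and control is returned to the caller.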

However, it’s important to realize that a derived class constructor cannot initialize members defined in the base class in its member initializer list. Consider what would happen if the member variable of the base class were const. Since const variables must be initialized with a value at the time of creation, the base class constructor must set its value when the variable is created, and the derived class constructor can no longer modify it. Even if the member variable were not const, it would end up initialized twice, so C++ only allows a constructor to initialize members of its own class. That means the following won’t compile:

class Derived : public Base {  // assume m_id was declared in Base class
public:
    double m_cost{};

    Derived(double cost=0., int id=0)
        : m_cost{cost}, m_id{id} {
    }
};

The following does compile, but only if m_id is not const and is accessible from Derived (public or protected), and it’s suboptimal because m_id is first initialized by Base and then assigned again:

class Derived : public Base {
public:
    double m_cost{};

    Derived(double cost=0., int id=0)
        : m_cost{cost} {
        m_id = id;
    }
};

The correct way is to delegate the construction to the base class constructor directly:

class Derived : public Base {
public:
    double m_cost{};

    Derived(double cost=0., int id=0)
        : Base{id}, m_cost{cost} {
    }
};

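Here is what happens above: memory for a Derived object is allocated; the Derived(double, int) constructor is called with cost and id; the compiler sees that we asked for the Base(int) constructor, so it constructs the Base portion first, with Base’s member initializer list setting m_id to id and Base’s constructor body executing; then the Derived member initializer list sets m_cost to cost; finally, the Derived constructor body executes and control returns to the caller.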

Inheritance and access specifiers

We’ve seen the public and private access specifiers. There is one more access specifier called protected, which brings a little bit of complexity to the discussion. The protected access specifier allows the class the member belongs to, its friends and its derived classes to access the member. However, protected members are not accessible from outside the class. For example:

class Base {
private:
    int m_private{};       // can only be accessed by Base members and friends 
protected:
    int m_protected{};     // can be accessed by Base members, friends and derived classes 
public:
    int m_public{};        // can be accessed by anybody 
};

Correspondingly, there are three types of inheritance in C++ (private, protected and public), which change the access specifiers of the inherited members as follows:

class PrivateDerived : private Base {
    // m_private is inaccessible (private base members are never accessible in derived classes)
    // m_protected becomes private
    // m_public becomes private
};
class ProtectedDerived : protected Base {
    // m_private is inaccessible
    // m_protected remains protected
    // m_public becomes protected
};
class PublicDerived : public Base {
    // m_private is inaccessible
    // m_protected remains protected
    // m_public remains public
};

Adding new functionality to a derived class

One of the biggest benefits of inheritance is the ability to reuse already written code. We can inherit the base class functionality and then add new functionality, modify existing functionality or hide functionality we don’t want. Assume we have the following setup:

#include <iostream>

class Base {
protected:
    int m_value{};
public:
    Base(int value) : m_value{value} {}
    void identify() const { std::cout << "I am a Base\n"; }
};

class Derived : public Base {
public:
    Derived(int value) : Base{value} {}
};

Now, say we want to “modify” the base class so that m_value is accessible from the public. We can add this new functionality through the derived class:

class Derived : public Base {
public:
    // same as above
    int get_value() const { return m_value; } 
};

Calling inherited functions and overriding behavior

In the example from the previous section, we can observe this strange behavior:

int main() {
    Base base {5};
    base.identify();
    Derived deri{8};
    deri.identify();
    return 0;
}

The output is

I am a Base 
I am a Base 

How can we correct this? We can redefine the inherited function in the derived class:

class Derived : public Base {
public:
    // same as above 
    void identify() const { std::cout << "I am a Derived\n"; }
};

We can also add to existing functionality, instead of overriding the member function altogether:

class Derived : public Base {
public:
    // same as above 
    void identify() const {
        Base::identify();  // calling the inherited function
        std::cout << "...and I am a Derived\n";
    }
};

Notice that the Base:: scope qualifier is necessary: without it, identify() would call Derived::identify() itself, resulting in infinite recursion.

It becomes trickier when we want to call a friend function of the base class: friend functions are not members, so the Base:: scope does not apply. Instead, we can use static_cast to make our derived object temporarily “look like” a Base:

#include <iostream>

class Base {
public:
    friend std::ostream& operator<<(std::ostream& os, const Base&) {
        os << "In Base\n";
        return os;  // the original empty body never returned os
    }
};

class Derived : public Base {
public:
    friend std::ostream& operator<<(std::ostream& os, const Derived& d) {
        os << "Inside Derived we print:\n";
        os << static_cast<const Base&>(d);  // selects Base's operator<<
        return os;
    }
};

Hiding inherited functionality

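We can change the access level of an inherited member with a using-declaration in the derived class. For example, we can expose a protected function as public: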

class Base {
private:
    int m_value{};
protected:
    void print() const { std::cout << "yo!\n"; }
public:
    Base(int value) : m_value{value} {}
};

class Derived : public Base {
public:
    Derived(int value) : Base{value} {}
    using Base::print;  // no parentheses here!
};

We can also hide inherited functionality by redeclaring it as private:

class Derived : public Base {
private:
    using Base::print;
};

However, this only changes access through Derived: if print is public in Base, we can still call it by static_cast-ing the Derived object to a Base&. Also, a using-declaration applies to every overload with that name, so given a set of overloaded functions in the base class, there is no way to change the access specifier of a single overload; you can only change them all.

We can also outright delete a function inherited from base class:

class Derived : public Base {
public:
    void print() const = delete;
};
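A compact sketch of exposing and deleting, assuming a simplified Base whose print() returns a string instead of writing to std::cout so the result is easy to check (the class names Exposed and Removed are hypothetical):

```cpp
#include <string>

class Base {
protected:
    std::string print() const { return "yo!"; }
};

class Exposed : public Base {
public:
    using Base::print;  // protected in Base, public in Exposed
};

class Removed : public Base {
public:
    std::string print() const = delete;  // any call through Removed is rejected
};

std::string call_exposed() {
    Exposed e;
    return e.print();  // OK: the using-declaration made print public
    // Base b; b.print();     // error: print is protected in Base
    // Removed r; r.print();  // error: use of deleted function
}
```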

Multiple inheritance

Multiple inheritance enables a derived class to inherit members from multiple parents. To use multiple inheritance, simply specify each base class separated by a comma:

#include <string>
#include <string_view>

class Person {
private:
    std::string m_name{};
    int m_age{};

public:
    Person(std::string_view name, int age)
        : m_name{ name }, m_age{ age } {
    }

    const std::string& getName() const { return m_name; }
    int getAge() const { return m_age; }
};

class Employee {
private:
    std::string m_employer{};
    double m_wage{};

public:
    Employee(std::string_view employer, double wage)
        : m_employer{ employer }, m_wage{ wage } {
    }

    const std::string& getEmployer() const { return m_employer; }
    double getWage() const { return m_wage; }
};

class Teacher : public Person, public Employee {
private:
    int m_teachesGrade{};

public:
    Teacher(std::string_view name, int age, std::string_view employer, double wage, int teachesGrade)
        : Person{ name, age }, Employee{ employer, wage }, m_teachesGrade{ teachesGrade } {
    }
};

int main() {
    Teacher t{ "Mary", 45, "Boo", 14.3, 8 };
    return 0;
}

A mixin (also spelled as “mix-in”) is a small class that can be inherited from in order to add properties to a class. The name mixin indicates that the class is intended to be mixed into other classes, not instantiated on its own. In the following example, the BoxMixin and LabelMixin classes are mixins that we can inherit from in order to create a Button class:

struct Point2D {
    int x{};
    int y{};
};

class BoxMixin {
private:
    Point2D m_top_left{};
    Point2D m_bottom_right{};
public:
    void set_top_left(Point2D point) { m_top_left = point; }
    void set_bottom_right(Point2D point) { m_bottom_right = point; }
};

class LabelMixin {
private:
    std::string m_text{};
public:
    void set_text(std::string_view text) { m_text = text; }
};

class Button : public BoxMixin, public LabelMixin {};

It’s common to see mixins defined using templates:

template <class T>
class Mixin {};

class Derived : public Mixin<Derived> {};

Such inheritance is called Curiously Recurring Template Pattern (CRTP).
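The empty Mixin<Derived> above only shows the shape of the pattern. As a minimal sketch of what CRTP is typically used for, the base can static_cast itself to the derived type and call derived members without any virtual functions (Greeter, English, French and name() are hypothetical names):

```cpp
#include <string>

// The base is parameterized on the class that derives from it
template <typename Derived>
class Greeter {
public:
    // Written once in the base, but calls each derived class's name()
    std::string greet() const {
        // Safe because Derived really does inherit from Greeter<Derived>
        return "Hello from " + static_cast<const Derived&>(*this).name();
    }
};

class English : public Greeter<English> {
public:
    std::string name() const { return "English"; }
};

class French : public Greeter<French> {
public:
    std::string name() const { return "French"; }
};
```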

There are several problems with multiple inheritance:

Virtual Functions

Pointers and references to the base class of derived objects

Check the following example:

class Base {
protected:
    int m_value {};
public:
    Base(int value) : m_value{value} {}
    std::string_view get_name() const { return "Base"; }
    int get_value() const { return m_value; }
};

class Derived : public Base {
public:
    Derived(int value) : Base { value } {}
    std::string_view get_name() const { return "Derived"; }
    int get_value_doubled() const { return m_value * 2; }
};

int main() {
    Derived derived { 5 };
    std::cout << "derived is a " << derived.get_name() << " and has value " << derived.get_value() << '\n';
    Derived& ref_derived { derived };
    std::cout << "ref_derived is a " << ref_derived.get_name() << " and has value " << ref_derived.get_value() << '\n';
    Derived* ptr_derived { &derived };
    std::cout << "ptr_derived is a " << ptr_derived->get_name() << " and has value " << ptr_derived->get_value() << '\n';
    Base& ref_base { derived };   // Base type ref to derived obj
    std::cout << "ref_base is a " << ref_base.get_name() << " and has value " << ref_base.get_value() << '\n';
    Base* ptr_base { &derived };  // Base type ptr to derived obj
    std::cout << "ptr_base is a " << ptr_base->get_name() << " and has value " << ptr_base->get_value() << '\n';
    return 0;
}

Surprisingly (maybe) we will get the following output:

derived is a Derived and has value 5
ref_derived is a Derived and has value 5
ptr_derived is a Derived and has value 5
ref_base is a Base and has value 5
ptr_base is a Base and has value 5

Notice that through a Base reference or pointer we only see the Base part of the object: get_name() resolves to Base::get_name(), and derived-only members such as get_value_doubled() are not accessible. This matters (and is not a contrived scenario) because we often want to write one function that takes a reference to the base class instead of one overload per derived class.

Virtual functions and polymorphism

A virtual function is a special type of member function that, when called, resolves to the most-derived version of the function for the actual type of the object being referenced or pointed to. A derived function is considered a match if it has the same signature (name, parameter types and whether it’s const) and return type as the base version of the function. Such functions are called overrides. For example:

class Base {
public:
    virtual std::string_view get_name() const { return "Base"; }
};

class Derived : public Base {
public:
    virtual std::string_view get_name() const { return "Derived"; }
};

int main() {
    Derived derived {};
    Base& ref_base { derived };   // Base type ref to derived obj 
    std::cout << "ref_base is a " << ref_base.get_name() << '\n';
    return 0;
}

Since ref_base references only the Base part of the object, by default get_name resolves to the version in the base class. With the virtual specifier, however, we tell the compiler that the call should resolve to the most-derived version, which in this case suits our needs.

Notice that virtual function resolution only works when a virtual function is called through a pointer or reference to a class object. When we call a virtual function on an object directly, the compiler simply calls that object’s own version.
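The following sketch contrasts three ways of calling get_name() (the helper function names are made up). Passing by value copies only the Base part of the object, so that call resolves to Base::get_name() even when we pass a Derived:

```cpp
#include <string_view>

class Base {
public:
    virtual ~Base() = default;
    virtual std::string_view get_name() const { return "Base"; }
};

class Derived : public Base {
public:
    std::string_view get_name() const override { return "Derived"; }
};

// Through a reference or pointer: resolved to the most-derived override
std::string_view via_reference(const Base& b) { return b.get_name(); }
std::string_view via_pointer(const Base* b) { return b->get_name(); }

// By value: b is a genuine Base (a sliced copy), so Base::get_name() runs
std::string_view via_copy(Base b) { return b.get_name(); }
```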

In programming, polymorphism refers to the ability of an entity to have multiple forms. For example:

int add(int, int);
double add(double, double);

There are two types of polymorphism:
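These are compile-time polymorphism, resolved by the compiler (function overloading and templates), and runtime polymorphism, resolved while the program runs (virtual function resolution). A minimal sketch putting both side by side, with hypothetical Shape and Circle classes:

```cpp
#include <string>

// Compile-time polymorphism: the overload is chosen from the argument types
int add(int a, int b) { return a + b; }
double add(double a, double b) { return a + b; }

// Runtime polymorphism: the override is chosen from the object's dynamic type
struct Shape {
    virtual ~Shape() = default;
    virtual std::string kind() const { return "shape"; }
};

struct Circle : Shape {
    std::string kind() const override { return "circle"; }
};

std::string describe(const Shape& s) { return s.kind(); }
```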

Last notes:

The override and final specifiers and covariant return types

To address some common challenges with inheritance, C++ has two inheritance-related identifiers: override and final. Note that these identifiers are not keywords; they are normal words that have special meaning only when used in certain contexts. The C++ standard calls them “identifiers with special meaning”, but they’re often just referred to as “specifiers”.

A derived class virtual function is only considered an override if its signature and return types match exactly. To help address the issue of functions that are meant to be overrides but aren’t, the override specifier can be applied to any virtual function by placing the override specifier after the function signature (the same place a function-level const specifier goes). If the function is not an override to a base class function (or is applied to a non-virtual function), the compiler will flag the function as an error:

#include <string_view>

class A {
public:
    virtual std::string_view getName1(int x) { return "A"; }
    virtual std::string_view getName2(int x) { return "A"; }
    virtual std::string_view getName3(int x) { return "A"; }
};

class B : public A {
public:
    std::string_view getName1(short int x) override { return "B"; } // compile error, function is not an override
    std::string_view getName2(int x) const override { return "B"; } // compile error, function is not an override
    std::string_view getName3(int x) override { return "B"; } // okay, function is an override of A::getName3(int)

};

int main() {
    return 0;
}

There are cases where you don’t want anyone to be able to override a virtual function, or to inherit from a class. The final specifier can be used to enforce that:

#include <string_view>

class A {
public:
    virtual std::string_view getName() const { return "A"; }
};

class B : public A {
public:
    // note use of final specifier on following line -- that makes this function not able to be overridden in derived classes
    std::string_view getName() const override final { return "B"; } // okay, overrides A::getName()
};

class C : public B {
public:
    std::string_view getName() const override { return "C"; } // compile error: overrides B::getName(), which is final
};

In the case where we want to prevent inheriting from a class, the final specifier is applied after the class name:

#include <string_view>

class A {
public:
    virtual std::string_view getName() const { return "A"; }
};

class B final : public A { // note use of final specifier here
public:
    std::string_view getName() const override { return "B"; }
};

class C : public B { // compile error: cannot inherit from final class
public:
    std::string_view getName() const override { return "C"; }
};

There is one special case in which a derived class virtual function override can have a different return type than the base class and still be considered a matching override: if the base function returns a pointer or a reference to some class, the override may return a pointer or reference to a class derived from it. These are called covariant return types:

#include <iostream>
#include <string_view>

class Base {
public:
    // This version of getThis() returns a pointer to a Base class
    virtual Base* getThis() { std::cout << "called Base::getThis()\n"; return this; }
    void printType() { std::cout << "returned a Base\n"; }
};

class Derived : public Base {
public:
    // Normally override functions have to return objects of the same type as the base function
    // However, because Derived is derived from Base, it's okay to return Derived* instead of Base*
    Derived* getThis() override { std::cout << "called Derived::getThis()\n";  return this; }
    void printType() { std::cout << "returned a Derived\n"; }
};

int main() {
    Derived d{};
    Base* b{ &d };
    d.getThis()->printType(); // calls Derived::getThis(), returns a Derived*, calls Derived::printType
    b->getThis()->printType(); // calls Derived::getThis(), returns a Base*, calls Base::printType

    return 0;
}

Virtual destructors, virtual assignment and overriding virtualization

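Whenever we are dealing with inheritance, any explicitly defined destructor should be virtual. Most importantly, the base class destructor must be virtual if we may delete a derived object through a base pointer; otherwise only the base destructor runs (formally, undefined behavior) and the derived part is never cleaned up: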

class Derived : public Base {
private:
    int* m_array {};

public:
    Derived(int length) : m_array{new int[length]} {}
    virtual ~Derived() {
        delete[] m_array;
    }
};

If the base class destructor is virtual, the destructors of derived classes are automatically virtual too, so it is not necessary to write an empty derived class destructor just to mark it virtual.
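A runnable sketch of the rule: a hypothetical global g_log records which destructors run when a Derived is deleted through a Base*. Copying is deleted because Derived owns a raw array; that detail is an addition for safety, not part of the example above.

```cpp
#include <string>
#include <vector>

std::vector<std::string> g_log{};  // hypothetical record of which destructors ran

class Base {
public:
    virtual ~Base() { g_log.push_back("~Base"); }  // virtual, so delete through Base* is safe
};

class Derived : public Base {
private:
    int* m_array{};
public:
    explicit Derived(int length) : m_array{new int[length]{}} {}
    Derived(const Derived&) = delete;             // owns a raw pointer: forbid copies
    Derived& operator=(const Derived&) = delete;
    ~Derived() override {                         // virtual automatically, because ~Base() is
        delete[] m_array;
        g_log.push_back("~Derived");
    }
};

void destroy_through_base() {
    Base* b{new Derived{5}};
    delete b;  // runs ~Derived() first, then ~Base()
}
```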

As for virtual assignment operators: they are possible, but for the sake of simplicity it’s generally best not to make assignment operators virtual.

We can also bypass virtual resolution by naming the function with the scope resolution operator:

int main() {
    Derived derived {};
    const Base& base { derived };
    std::cout << base.Base::get_name() << '\n';
    return 0;
}

Finally we have these rules (just as recommendations / rules of thumb):

Early binding and late binding

When a program is compiled, the compiler converts each statement in your C++ program into one or more lines of machine language. Each line of machine language is given its own unique sequential address. This is no different for functions – when a function is encountered, it is converted into machine language and given the next available address. Thus, each function ends up with a unique address.

Binding refers to the process that is used to convert identifiers (such as variable and function names) into addresses. Although binding is used for both variables and functions, here we’re going to focus on function binding.

Early binding (also known as static binding) means the compiler (or linker) is able to directly associate the identifier name (e.g. a function or variable name) with a machine address. This includes direct calls to functions whose definitions are known ahead of time.

Late binding (also known as dynamic binding, in the case of virtual function resolution) means the function to be called can only be determined at runtime. In C++, one way to get late binding is to use function pointers: calling a function through a pointer is known as an indirect function call, and the compiler cannot tell at compile time which function is being pointed to.
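A minimal sketch of late binding through a function pointer; calculate, add, subtract and multiply are hypothetical. The function that fcn points to depends on a runtime value, so the call cannot be resolved at compile time:

```cpp
int add(int x, int y) { return x + y; }
int subtract(int x, int y) { return x - y; }
int multiply(int x, int y) { return x * y; }

int calculate(char op, int x, int y) {
    int (*fcn)(int, int){nullptr};  // which function this holds is decided at runtime
    switch (op) {
    case '+': fcn = add; break;
    case '-': fcn = subtract; break;
    case '*': fcn = multiply; break;
    default: return 0;  // unknown operator: arbitrary fallback for this sketch
    }
    return fcn(x, y);  // indirect (late-bound) call
}
```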

The virtual table

C++ implementations typically implement virtual functions using a form of late binding known as the virtual table, which is a lookup table of functions used to resolve function calls in a dynamic/late binding manner. The virtual table sometimes goes by “vtable” or “dispatch table”.

Every class that uses virtual functions (or is derived from a class that uses virtual functions) has a corresponding virtual table. The table is simply a static array that the compiler sets up at compile time. It contains one entry for each virtual function that can be called by objects of the class, and each entry is a function pointer to the most-derived version of that function accessible by the class. The compiler also adds a hidden pointer member to the base class, which we call *__vptr. Unlike the this pointer, which is actually a function parameter used by the compiler to resolve self-reference, *__vptr is a real member, so it makes each object of the class bigger by the size of one pointer. It is also inherited by derived classes, which is important: each object’s *__vptr points at the virtual table of its actual class.
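To make the mechanism concrete, here is a hand-rolled imitation of a virtual table using plain structs and function pointers. This is only an analogy under simplifying assumptions: real vtable layouts are compiler-specific and not specified by the C++ standard, and all names here (AnimalVTable, vptr, call_speak and so on) are hypothetical.

```cpp
#include <string>

struct Animal;

// One entry per "virtual function"
struct AnimalVTable {
    std::string (*speak)(const Animal&);
};

struct Animal {
    const AnimalVTable* vptr;  // plays the role of the hidden *__vptr member
};

std::string dog_speak(const Animal&) { return "Woof"; }
std::string cat_speak(const Animal&) { return "Meow"; }

// One static table per "class", set up once
const AnimalVTable dog_vtable{dog_speak};
const AnimalVTable cat_vtable{cat_speak};

// A "virtual call": follow vptr to the table, then call through the entry
std::string call_speak(const Animal& a) { return a.vptr->speak(a); }
```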

Pure virtual functions, abstract base classes and interface classes

C++ allows us to create a special kind of virtual function called a pure virtual function (or abstract function) that has no body at all. A pure virtual function simply acts as a placeholder that is meant to be redefined by derived classes.

#include <string>
#include <string_view>

class Animal {  // this is an ABC
protected:
    std::string m_name {};

public:
    Animal(std::string_view name) : m_name{ name } {}
    const std::string& getName() const { return m_name; }
    virtual std::string_view speak() const = 0; // speak is a pure virtual function
    virtual ~Animal() = default;
};

Any class with one or more pure virtual functions becomes an abstract base class, which means that it cannot be instantiated.
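A sketch of derived classes implementing the pure virtual function (Cat, Dog and introduce are hypothetical; the Animal class is repeated so the block compiles on its own):

```cpp
#include <string>
#include <string_view>

class Animal {  // abstract base class
protected:
    std::string m_name{};
public:
    explicit Animal(std::string_view name) : m_name{name} {}
    const std::string& getName() const { return m_name; }
    virtual std::string_view speak() const = 0;
    virtual ~Animal() = default;
};

class Cat : public Animal {
public:
    explicit Cat(std::string_view name) : Animal{name} {}
    std::string_view speak() const override { return "Meow"; }
};

class Dog : public Animal {
public:
    explicit Dog(std::string_view name) : Animal{name} {}
    std::string_view speak() const override { return "Woof"; }
};

// Works with any concrete Animal; Animal a{"x"}; would not compile
std::string introduce(const Animal& a) {
    return a.getName() + " says " + std::string{a.speak()};
}
```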

An interface class is a class that has no member variables, and where all of the functions are pure virtual. Interfaces are useful when you want to define the functionality that derived classes must implement but leave the details of how each derived class implements that functionality entirely up to the derived class. Interface classes are often named beginning with “I”.

#include <string_view>

class IErrorLog {
public:
    virtual bool openLog(std::string_view filename) = 0;
    virtual bool closeLog() = 0;
    virtual bool writeError(std::string_view errorMessage) = 0;
    virtual ~IErrorLog() {} // make a virtual destructor in case we delete an IErrorLog pointer, so the proper *derived* destructor is called
};

With this, we can write functions that take a parameter of any class that conforms to the IErrorLog interface:

#include <cmath> // for sqrt()

// instead of this, assuming FileErrorLog derives from IErrorLog
double mySqrt(double value, FileErrorLog& log) {
    if (value < 0.0) {
        log.writeError("Tried to take square root of value less than 0");
        return 0.0;
    }
    return std::sqrt(value);
}

// now we have this
double mySqrt(double value, IErrorLog& log) {
    if (value < 0.0) {
        log.writeError("Tried to take square root of value less than 0");
        return 0.0;
    }
    return std::sqrt(value);
}
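To show the payoff, here is a minimal sketch with a hypothetical MemoryErrorLog that stores messages in a vector instead of writing to a file. Any such implementation can be passed to the IErrorLog& version of mySqrt:

```cpp
#include <cmath>
#include <string>
#include <string_view>
#include <vector>

class IErrorLog {
public:
    virtual bool openLog(std::string_view filename) = 0;
    virtual bool closeLog() = 0;
    virtual bool writeError(std::string_view errorMessage) = 0;
    virtual ~IErrorLog() = default;
};

// Hypothetical implementation that keeps messages in memory (handy for tests)
class MemoryErrorLog : public IErrorLog {
public:
    std::vector<std::string> messages{};
    bool openLog(std::string_view) override { return true; }
    bool closeLog() override { return true; }
    bool writeError(std::string_view errorMessage) override {
        messages.emplace_back(errorMessage);
        return true;
    }
};

double mySqrt(double value, IErrorLog& log) {
    if (value < 0.0) {
        log.writeError("Tried to take square root of value less than 0");
        return 0.0;
    }
    return std::sqrt(value);
}
```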

Virtual base classes

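We can solve the diamond problem (discussed earlier), where a Copier would otherwise contain two PoweredDevice subobjects (one via Scanner and one via Printer), by inheriting virtually: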

class PoweredDevice {
};

class Scanner: virtual public PoweredDevice {
};

class Printer: virtual public PoweredDevice {
};

class Copier: public Scanner, public Printer {
};  // neither Scanner nor Printer constructs PoweredDevice; Copier itself does the job
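A small sketch verifying that the shared base is constructed only once. The power parameter and the counter g_powered_device_ctor_calls are hypothetical additions for the demonstration:

```cpp
int g_powered_device_ctor_calls{0};  // hypothetical counter for the demo

class PoweredDevice {
public:
    explicit PoweredDevice(int power) : m_power{power} { ++g_powered_device_ctor_calls; }
    int get_power() const { return m_power; }
private:
    int m_power{};
};

class Scanner : virtual public PoweredDevice {
public:
    explicit Scanner(int power) : PoweredDevice{power} {}  // ignored when Scanner is part of a Copier
};

class Printer : virtual public PoweredDevice {
public:
    explicit Printer(int power) : PoweredDevice{power} {}  // likewise ignored inside a Copier
};

class Copier : public Scanner, public Printer {
public:
    // The most-derived class constructs the single shared PoweredDevice itself
    explicit Copier(int power) : PoweredDevice{power}, Scanner{power}, Printer{power} {}
};
```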

Object slicing

Object slicing happens when a derived object is assigned or copied into an object of base type: only the base part is copied. It also happens when we push derived objects into a vector whose element type is the base class. We can avoid it by using references and pointers, but even a reference doesn’t protect against slicing on assignment:

int main() {
    Derived d1 {5};
    Derived d2 {6};
    Base& b {d2};
    b = d1;
    return 0;
}

In the above example, b references d2. The assignment b = d1 calls Base::operator=, so only the Base portion of d1 is copied into d2, and d2 ends up with the Base part of d1 combined with its own Derived part.
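A sketch of the vector case and the usual fix, std::reference_wrapper (from <functional>), which lets a container store references; the helper function names are made up:

```cpp
#include <functional>
#include <string_view>
#include <vector>

class Base {
public:
    virtual ~Base() = default;
    virtual std::string_view get_name() const { return "Base"; }
};

class Derived : public Base {
public:
    std::string_view get_name() const override { return "Derived"; }
};

// Storing by value: push_back copies only the Base part of the Derived
std::string_view name_in_value_vector(const Derived& d) {
    std::vector<Base> v{};
    v.push_back(d);
    return v[0].get_name();
}

// Storing references keeps the full object, so virtual calls still work
std::string_view name_in_reference_vector(Derived& d) {
    std::vector<std::reference_wrapper<Base>> v{};
    v.push_back(d);
    return v[0].get().get_name();
}
```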

Dynamic casting

We can use dynamic_cast much like static_cast to convert a base pointer (pointing at a derived object) into a derived pointer; this requires the base class to be polymorphic (have at least one virtual function). If the object being pointed to is not actually a derived object, the result is a null pointer. Converting from base to derived is known as downcasting (as opposed to upcasting, from derived to base). There are also several other cases where downcasting won’t work:

Also, it turns out that we can use static_cast for downcasting, which will be faster but more dangerous because there will be no type checking at runtime. Instead of a null pointer, we will observe undefined behavior if we downcast a pointer not pointing at a derived object.

In addition to pointers, dynamic casting also works for references.
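A minimal sketch of both forms (the helper functions are hypothetical). Since there is no null reference, a failed dynamic_cast to a reference type throws std::bad_cast instead:

```cpp
#include <string>
#include <typeinfo>  // std::bad_cast

class Base {
public:
    virtual ~Base() = default;  // dynamic_cast needs a polymorphic base class
};

class Derived : public Base {
public:
    std::string get_name() const { return "Derived"; }
};

// Pointer form: a failed downcast yields nullptr
std::string name_or_unknown(Base* b) {
    if (Derived* d = dynamic_cast<Derived*>(b)) {
        return d->get_name();
    }
    return "unknown";
}

// Reference form: a failed downcast throws std::bad_cast
bool is_derived(Base& b) {
    try {
        [[maybe_unused]] Derived& d{dynamic_cast<Derived&>(b)};
        return true;
    } catch (const std::bad_cast&) {
        return false;
    }
}
```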

In general, virtual functions should be preferred over downcasting. However, there are several cases where downcasting is the better choice:

Printing inherited classes using operator<<

Because we typically implement operator<< as a friend rather than a member function, and only member functions can be virtual, we can’t rely on C++ to pick the most-derived overload when we print an object through a base reference.

Instead of making operator<< virtual, a simple solution is to let it call a function that is virtualized:

#include <iostream>
#include <string>

class Base {
public:
    friend std::ostream& operator<<(std::ostream& os, const Base& b) {
        os << b.identify();
        return os;
    }

    virtual std::string identify() const {
        return "Base";
    }
};

class Derived : public Base {
public:
    std::string identify() const override {
        return "Derived";
    }
};

int main() {
    Base b{};
    std::cout << b << '\n';
    
    Derived d{};
    std::cout << d << '\n';  // this works even without explicit << in Derived

    Base& bref {d};
    std::cout << bref << '\n';

    return 0;
}

The above program will print

Base
Derived 
Derived 

Templates and Classes

Template classes

Let’s say we want to create a template class of array that works with any type of elements. In array.h we have

#ifndef ARRAY_H
#define ARRAY_H

#include <cassert>

template <typename T>
class Array {
private:
    int m_len{};
    T* m_data{};

public:
    Array(int len) {
        assert(len > 0);
        m_data = new T[len]{};
        m_len = len;
    }

    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;
    ~Array() {
        delete[] m_data;
    }

    void erase() {
        delete[] m_data;
        m_data = nullptr;
        m_len = 0;
    }

    T& operator[](int i);
    int get_len() const {
        return m_len;
    }
};

template <typename T>
T& Array<T>::operator[](int i) {
    assert(i >= 0 && i < m_len);
    return m_data[i];
}

#endif

We can then use the template class as follows:

int main() {
    const int len{12};
    Array<int> int_arr{len};
    Array<double> double_arr{len};
    for (int i{0}; i < len; ++i) {
        int_arr[i] = i;
        double_arr[i] = i + 0.5;
    }
    for (int i{len - 1}; i >= 0; --i) {
        std::cout << int_arr[i] << '\t' << double_arr[i] << '\n';
    }
    return 0;
}

which prints:

11  11.5
10  10.5
9   9.5
8   8.5
7   7.5
6   6.5
5   5.5
4   4.5
3   3.5
2   2.5
1   1.5
0   0.5

Notice that we cannot simply move the definition of Array<T>::operator[] into array.cpp. A template is not a class or a function, but a stencil used to create classes or functions, and the compiler needs to see the full definition wherever it instantiates the template (e.g. Array<int> in main.cpp); otherwise each file compiles but the program fails to link. We can leave everything inside the header file, or put the definitions we originally wanted in array.cpp into array.inl (as in “inline”) and then include array.inl at the bottom of array.h. A third solution is to write declarations in array.h, definitions in array.cpp, and then include both into another file, e.g. templates.cpp, that explicitly instantiates the types we need:

#include "array.h"
#include "array.cpp"

template class Array<int>;  // explicitly instantiate template class here!
template class Array<double>;

Template non-type parameters

A template non-type parameter is a template parameter where the type of the parameter is predefined and is substituted for a constexpr value passed in as an argument. A non-type parameter can be any of the following types:

For example:

template <typename T, int size>  // size here is an integral non-type parameter 
class StaticArray {
private:
    T m_array[size] {};

public:
    T* get_array();
    T& operator[](int i) {
        return m_array[i];
    }
};

template<typename T, int size>
T* StaticArray<T, size>::get_array() {
    return m_array;
}
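A usage sketch: because size is a compile-time constant, StaticArray<int, 4> and StaticArray<int, 8> are distinct types. The get_size() member and sum_of_squares() are additions for this example, and get_array() is defined inline for brevity:

```cpp
template <typename T, int size>
class StaticArray {
private:
    T m_array[size]{};
public:
    T* get_array() { return m_array; }
    T& operator[](int i) { return m_array[i]; }
    int get_size() const { return size; }  // added for this sketch
};

int sum_of_squares() {
    StaticArray<int, 4> arr{};
    for (int i{0}; i < arr.get_size(); ++i) {
        arr[i] = i * i;  // 0, 1, 4, 9
    }
    int total{0};
    int* raw{arr.get_array()};  // pointer to the underlying C-style array
    for (int i{0}; i < arr.get_size(); ++i) {
        total += raw[i];
    }
    return total;
}
```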

Function template specialization

Sometimes we want to tweak the behavior of a function template for one specific type. Say we have a function template print that can print integers and doubles, but we now would like doubles to be printed in scientific notation:

template <typename T>
void print(const T& t) {
    std::cout << t << '\n';
}

template<>
void print<double>(const double& d) {
    std::cout << std::scientific << d << '\n';
}

int main() {
    print(5);
    print(6.7);
    return 0;
}

which prints

5
6.700000e+00

This is called function template specialization.

Class template specialization

When we want to “specialize” a class member function, we’re no longer talking about function template specialization. Instead, since the member function resides under the template class, we’re actually trying to implement class template specialization.

While we can always specialize the whole class, C++ also allows us to explicitly specialize a single member function; all the other members are taken from the primary (non-specialized) template:

template <typename T>
class Storage {
private:
    T m_value{};

public:
    Storage(T value) : m_value{value} {};
    void print() {
        std::cout << m_value << '\n';
    }
};

template<>
void Storage<double>::print() {
    std::cout << std::scientific << m_value << '\n';
}  // only specializing this member function here

Partial template specialization

We can partially specialize a class template, but not a function template.
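A minimal sketch of a class template partially specialized for all pointer types, plus the usual workaround for functions, which is overloading rather than specialization (describe, get and kind are hypothetical names):

```cpp
#include <string>

template <typename T>
class Storage {
public:
    explicit Storage(T value) : m_value{value} {}
    T get() const { return m_value; }
    std::string describe() const { return "value"; }
private:
    T m_value{};
};

// Partial specialization: still a template in T, but only for pointer types
template <typename T>
class Storage<T*> {
public:
    explicit Storage(T* ptr) : m_ptr{ptr} {}
    T* get() const { return m_ptr; }
    std::string describe() const { return "pointer"; }
private:
    T* m_ptr{};
};

// Function templates can't be partially specialized; overloads do the job
template <typename T>
std::string kind(T) { return "value"; }

template <typename T>
std::string kind(T*) { return "pointer"; }  // more specialized, preferred for pointers
```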

Exceptions

Basic exception handling

There are three keywords that work in conjunction with each other in exception handling: throw, try and catch.

In C++, a throw statement is used to signal that an exception or error case has occurred. It’s also commonly called raising an exception.

throw -1;                            // throw a literal integer 
throw ENUM_INVALID_INDEX;            // throw an enum value 
throw "can you be quiet";            // throw a literal C-style string 
throw dX;                            // throw a double variable that was previously defined 
throw MyException("deadly dead");    // throw an object of class MyException

We can use the try keyword to define a block of statements (called a try block) that acts as an observer looking for any exceptions that are thrown within:

try {
    throw -1;
}

Notice that the above block doesn’t define how we’re going to handle the exception. The actual handling is done by the catch blocks.

catch (int x) {
    std::cerr << "We caught an int exception with value " << x << '\n';
}

Putting them all together, we have the following example:

#include <iostream>
#include <string>

int main() {
    try {
        throw -1;
    }

    catch (int x) {
        std::cerr << "We caught an int exception with value " << x << '\n';
    }

    std::cout << "Continuing the program now...\n";
    return 0;
}

and the program will print

We caught an int exception with value -1 
Continuing the program now...

There are four common things that a catch block may do when they catch an exception:

Exceptions, functions and stack unwinding

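You can throw an exception from within a function and catch it outside, as long as some caller further up the call stack wraps the call in a try block with a matching handler. As the exception propagates, each function without a matching handler is exited and its local variables are destroyed; this process is called stack unwinding.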
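A sketch that makes stack unwinding visible; Guard and the global g_trace are hypothetical helpers that record what happens as the exception propagates from inner() through middle() to the handler in outer():

```cpp
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> g_trace{};  // hypothetical trace of events

struct Guard {  // records when it is destroyed
    std::string name;
    ~Guard() { g_trace.push_back("destroy " + name); }
};

void inner() {
    Guard g{"inner"};
    throw std::runtime_error{"boom"};  // no handler here: inner() is exited
}

void middle() {
    Guard g{"middle"};
    inner();                           // no handler here either
    g_trace.push_back("unreachable");  // skipped during unwinding
}

void outer() {
    try {
        middle();
    } catch (const std::runtime_error& e) {  // first matching handler up the stack
        g_trace.push_back(std::string{"caught "} + e.what());
    }
}
```

Both Guard objects are destroyed before the handler in outer() runs.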

Uncaught exceptions and catch-all handlers

When a function throws an exception that it does not handle itself, it is making the assumption that a function somewhere down the call stack will handle the exception. When no exception handler for a function can be found, std::terminate() is called and the application is terminated. In such cases, the call stack may or may not be unwound. If the stack is not unwound, local variables will not be destroyed and any cleanup expected upon destruction of said variables will not happen.

Fortunately, C++ provides us with a mechanism to catch all types of exceptions. This is known as a catch-all handler.

int main() {
    try {
        throw 5;
    }

    catch (double x) {
        std::cout << "We caught an exception of type double: " << x << '\n';
    }

    catch (...) {  // catch-all handler
        std::cout << "We caught an exception of undetermined type\n";
    }
}

Exceptions, classes and inheritance

Instead of using assert, we can throw an exception to pass a specific error code back to the caller. But what if we need more information than a single value? In that case we resort to exception classes.

class ArrayException {
private:
    std::string m_error;
public:
    ArrayException(std::string_view error) : m_error{error} {}
    const std::string& get_error() const { return m_error; }
};
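A usage sketch, assuming a hypothetical IntArray whose operator[] throws ArrayException on a bad index instead of asserting; the handler catches by const reference:

```cpp
#include <string>
#include <string_view>

class ArrayException {
private:
    std::string m_error;
public:
    explicit ArrayException(std::string_view error) : m_error{error} {}
    const std::string& get_error() const { return m_error; }
};

// Hypothetical fixed-size container that reports bad indices by throwing
class IntArray {
private:
    int m_data[3]{1, 2, 3};
public:
    int& operator[](int index) {
        if (index < 0 || index >= 3)
            throw ArrayException{"Invalid index"};
        return m_data[index];
    }
};

std::string try_access(int index) {
    IntArray arr{};
    try {
        return std::to_string(arr[index]);
    } catch (const ArrayException& e) {  // by const reference: no copy, no slicing
        return "error: " + e.get_error();
    }
}
```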

Because an exception class is a class, and classes can be inherited, exception classes can form hierarchies too. However, catch blocks are tested in order, and a handler for a base class also matches a derived exception, so if the Base handler comes first it catches Derived exceptions before the Derived handler gets a chance:

class Base {
public:
    Base() {}
};

class Derived : public Base {
public:
    Derived() {}
};

int main() {
    try {
        throw Derived();
    }
    catch (const Base& base) {
        std::cerr << "Caught Base\n";
    }
    catch (const Derived& derived) {
        std::cerr << "Caught Derived\n";
    }
    return 0;
}

The above program will print

Caught Base

and to make it work properly we have to swap the order of the two catch blocks.

The standard library provides std::exception (in <exception>), which is the base class of every exception thrown by the standard library, so we can catch all of them with a single handler:

#include <exception>

int main() {
    try {
        ...
    }
    catch (const std::exception& exception) {
        std::cerr << "Standard exception: " << exception.what() << '\n';
    }
    return 0;
}

and when we want to handle specific STL exceptions differently, we just add those particular catch blocks before this general block.

We can throw STL exceptions or derive our own exception class from std::exception however we like, as long as we keep the custom exception class copyable, as the compiler makes a copy of the exception object to some piece of unspecified memory (outside of the call stack) reserved for handling exceptions:

class Base {
public:
    Base() {}
};

class Derived : public Base {
public:
    Derived() {}
    Derived(const Derived&) = delete;   // explicitly making it not copyable 
};

int main() {
    Derived d{};
    try {
        throw d;  // compile error!!
    }
    ...
}

Rethrowing exceptions

When we want to rethrow an exception, we can’t simply write throw x where x is the caught exception: throw x throws a copy of x’s static type, so a Derived caught as Base& gets sliced (object slicing):

int main() {
    try {
        try {
            throw Derived{};
        }
        catch (Base& b) {
            std::cout << "Caught base b which is actually a " << b << '\n';
            throw b;  // the Derived obj gets SLICED here!!!
        }
    }
    catch (Base& b) {
        std::cout << "Caught base b which is actually a " << b << '\n';
    }
    return 0;
}

The program above prints

Caught base b which is actually a Derived
Caught base b which is actually a Base

To solve this problem, simply use throw when we want to rethrow an exception from within a catch block:

int main() {
    try {
        try {
            throw Derived{};
        }
        catch (Base& b) {
            std::cout << "Caught base b which is actually a " << b << '\n';
            throw;  // the Derived obj is thrown as it is
        }
    }
    catch (Base& b) {
        std::cout << "Caught base b which is actually a " << b << '\n';
    }
    return 0;
}

which prints

Caught base b which is actually a Derived
Caught base b which is actually a Derived

Function try blocks

Consider the following example:

class Base {
private:
    int m_x{};
public:
    Base(int x) : m_x{x} {
        if (x <= 0)
            throw 1;  // thrown inside Base()
    }
};

class Derived : public Base {
public:
    Derived(int x) : Base{x} {
        // what if we want to catch the error here?
    }
};

int main() {
    try {
        Derived derived{0};
    }
    catch (int) {
        // caught here
    }
    return 0;
}

In order to catch the exception inside the constructor of Derived(), we have to use a slightly modified try block called a function try block.

// same goes for Base 

class Derived : public Base {
public:
    Derived(int x) try : Base{x} { // notice the try keyword before the member initializer list!!
        // whatever we want to do inside the constructor 
    }
    catch (...) {
        std::cerr << "Exception caught inside the constructor\n";
        throw;
    }
};

int main() {
    try {
        Derived b{0};
    }
    catch (int) {
        std::cout << "Oops\n";
    }
    return 0;
}

A function try block on a constructor cannot resolve the exception: the catch block must throw or rethrow, and reaching the end of it implicitly rethrows the exception (although we rethrew explicitly here). Other kinds of functions behave differently. In summary:

Function type | Can resolve exceptions via return statement | Behavior at end of catch block
Constructor | No (must throw or rethrow) | Implicit rethrow
Destructor | Yes | Implicit rethrow
Non-value-returning function | Yes | Resolve exception
Value-returning function | Yes | Undefined behavior

Exception dangers and downsides

Here are some problems that might occur when using exceptions:

So when should we use exceptions at all?

Exception specifications and noexcept

In C++, all functions are classified as either non-throwing or potentially throwing. To declare a function as non-throwing, we add the noexcept specifier at the end of its signature:

void some_function() noexcept;

The noexcept specifier is a contractual promise rather than something the compiler enforces: if an exception does escape a noexcept function, std::terminate is called (whether the stack is unwound first is implementation-defined).

Functions that are implicitly non-throwing:

Functions that are non-throwing by default for implicitly-declared or defaulted functions:

Functions that are potentially throwing (if not implicitly-declared or defaulted):

In addition to being a specifier, noexcept is also an operator: noexcept(expr) yields false if the compiler considers expr potentially throwing, and true otherwise:

void foo() {throw -1;}
void boo() {}
void goo() noexcept {}
struct S{};

constexpr bool b1{ noexcept(5 + 3) }; // true; ints are non-throwing
constexpr bool b2{ noexcept(foo()) }; // false; foo() is not declared noexcept, so it is potentially throwing
constexpr bool b3{ noexcept(boo()) }; // false; boo() is implicitly noexcept(false)
constexpr bool b4{ noexcept(goo()) }; // true; goo() is explicitly noexcept(true)
constexpr bool b5{ noexcept(S{}) };   // true; a struct's default constructor is noexcept by default

The noexcept operator is checked statically at compile-time and doesn’t actually evaluate the input expression.

There are four levels of exception safety guarantees (yet another set of contractual guidelines):

Examples of code that should be no throw:

Examples of code that should be no fail:

std::move_if_noexcept

std::move_if_noexcept returns an r-value reference (so the object can be moved from) if the object's type has a noexcept move constructor or cannot be copied at all; otherwise, it returns a const l-value reference, so the object gets copied. We can use the noexcept specifier together with std::move_if_noexcept to use move semantics only when the strong exception guarantee can be kept (and fall back to copy semantics otherwise).

Input and Output (I/O)

Input and output (I/O) streams

Input and output functionality is not defined as part of the core C++ language; it is provided by the standard library.

Including the <iostream> header gives us access to a hierarchy of stream classes. A stream is simply a sequence of bytes that is accessed sequentially. Input streams hold input from a data producer, such as a keyboard, a file, or a network; output streams hold output for a particular data consumer, such as a monitor, a file, or a printer.

std::ios is a typedef for std::basic_ios<char> and defines functionality common to both input and output streams. The istream class is the primary class for input streams; its extraction operator >> removes values from the stream. The ostream class is the primary class for output streams; its insertion operator << puts values into the stream.

A standard stream is a pre-connected stream provided to a program by its environment. C++ comes with four predefined standard stream objects that have already been set up for your use:

Input with istream

When using the extraction operator >>, we can use the std::setw() manipulator (from <iomanip>) to limit the number of characters read from a stream:

#include <iomanip>
char buf[10]{};
std::cin >> std::setw(10) >> buf;

Note that >> skips whitespace (spaces, tabs, and newlines):

char ch{};
while (std::cin >> ch) {
    std::cout << ch;
}

The loop above echoes the input with all whitespace removed. To keep the whitespace, we can use std::cin.get(), which does not skip any characters:

char ch{};
while (std::cin.get(ch)) {
    std::cout << ch;
}

We can also specify the maximum number of characters to read:

char strBuf[11]{};
std::cin.get(strBuf, 11);
std::cout << strBuf << '\n';

Note that at most 10 characters are read here, since one slot must be left for the null terminator. Any remaining characters stay in the istream.

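There is also std::cin.getline(), which works just like std::cin.get() except in how it treats the delimiter ('\n' by default): get() leaves the delimiter in the stream, whereas getline() extracts and discards it: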

char strBuf[11]{};
std::cin.getline(strBuf, 11);
std::cout << strBuf << '\n';

If we need to know how many characters the last call to getline() extracted, we can use gcount():

std::cout << std::cin.gcount() << " characters were read\n";

There is also a non-member version, std::getline(), which is declared in the <string> header and reads into a std::string instead of a char array:

std::string strBuf{};
std::getline(std::cin, strBuf);
std::cout << strBuf << '\n';

Here are a few more useful istream functions:

Output with ostream and ios

The insertion operator << is used to put information into an output stream. C++ has predefined insertion operations for all the built-in data types. When using this operator, there are two ways to change the formatting options: flags and manipulators. You can think of flags as boolean variables that can be turned on and off, and manipulators as objects placed in a stream that affect the way things are input and output.

To switch a flag on, we can use cout.setf() function with the appropriate flag as a parameter, for example:

std::cout.setf(std::ios::showpos);
std::cout << 27 << '\n';

which prints

+27

You can also turn on multiple ios flags at once using the bitwise OR | operator:

std::cout.setf(std::ios::showpos | std::ios::uppercase);
std::cout << 1234567.89f << '\n';

which prints

+1.23457E+06

To turn a flag off, we can use std::cout.unsetf(). One subtlety: turning a flag on does not automatically turn off other flags that are mutually exclusive with it. For example:

std::cout.setf(std::ios::hex);
std::cout << 27 << '\n';

which prints

27

Why? This is because the default std::ios::dec hasn’t been turned off yet. We need to manually turn off the decimal formatter which is mutually exclusive with std::ios::hex:

std::cout.unsetf(std::ios::dec);
std::cout.setf(std::ios::hex);
std::cout << 27 << '\n';

and now it prints

1b

Alternatively, we can use manipulators to do the same thing without worrying about manually turning on and off these flags:

std::cout << std::hex << 27 << '\n'; // print 27 in hex
std::cout << 28 << '\n';             // we're still in hex
std::cout << std::dec << 29 << '\n'; // back to decimal

Here is a list of useful flags, manipulators and member functions:

The following table shows how the precision interacts with each formatting option:

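Option | Precision | 12345.0 | 0.12345
normal | 3 | 1.23e+004 | 0.123
normal | 4 | 1.235e+004 | 0.1235
normal | 5 | 12345 | 0.12345
normal | 6 | 12345 | 0.12345
showpoint | 3 | 1.23e+004 | 0.123
showpoint | 4 | 1.235e+004 | 0.1235
showpoint | 5 | 12345. | 0.12345
showpoint | 6 | 12345.0 | 0.123450
fixed | 3 | 12345.000 | 0.123
fixed | 4 | 12345.0000 | 0.1235
fixed | 5 | 12345.00000 | 0.12345
fixed | 6 | 12345.000000 | 0.123450
scientific | 3 | 1.235e+004 | 1.235e-001
scientific | 4 | 1.2345e+004 | 1.2345e-001
scientific | 5 | 1.23450e+004 | 1.23450e-001
scientific | 6 | 1.234500e+004 | 1.234500e-001

The three-digit exponents (e+004) reflect an older Microsoft runtime; other libraries print two-digit exponents (e+04) and may round exact ties, such as 12345 at four significant digits, differently.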

Stream classes for strings

In addition to the I/O streams, there is a set of stream classes for strings that let you use the familiar insertion (<<) and extraction (>>) operators on strings. There are six of them: istringstream (derived from istream), ostringstream (derived from ostream), and stringstream (derived from iostream) read and write normal-width character strings, while wistringstream, wostringstream, and wstringstream read and write wide character strings. To use them, include the <sstream> header.

#include <iostream>
#include <sstream>

int main() {
    std::stringstream os{};
    os << "well well well\n";  // works just like std::cout
    int nValue{123};
    double dValue{4.5};
    os << nValue << ' ' << dValue;  // works too

    std::stringstream os2{"123 4.5"};
    os2 >> nValue >> dValue;  // works just like std::cin

    os.str("");  // erase the buffer
    os.clear();  // reset error flags
    os << "what?\n";
    std::cout << os.str();  // retrieve the string and print it
    return 0;
}

Stream states and input validation

There are four stream states in C++:

and ios also provides these functions to access these states:

For input validation, the <cctype> header provides a list of useful functions:

For example:

#include <algorithm> // std::all_of
#include <cctype>    // std::isalpha, std::isspace
#include <iostream>
#include <ranges>
#include <string>
#include <string_view>

bool isValidName(std::string_view name) {
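  return std::ranges::all_of(name, [](char ch) {
    // cast to unsigned char: passing a negative char to isalpha/isspace is undefined behavior
    return std::isalpha(static_cast<unsigned char>(ch)) || std::isspace(static_cast<unsigned char>(ch));
  });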
}

int main() {
  std::string name{};

  do {
    std::cout << "Enter your name: ";
    std::getline(std::cin, name); // get the entire line, including spaces
  } while (!isValidName(name));

  std::cout << "Hello " << name << "!\n";
}

Basic file I/O

We need <fstream> for file I/O in C++.

To write stuff into a file:

#include <fstream>
#include <iostream>

int main() {
    std::ofstream outf{ "Sample.txt" };
    if (!outf) {
        std::cerr << "Uh oh, Sample.txt could not be opened for writing!\n";
        return 1;
    }
    outf << "This is line 1\n";
    outf << "This is line 2\n";
    return 0;
}

We don't have to close the file explicitly: when outf goes out of scope, the ofstream destructor closes it automatically (much like leaving a with block does in Python).

To read stuff from a file:

#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream inf{ "Sample.txt" };
    if (!inf) {
        std::cerr << "Uh oh, Sample.txt could not be opened for reading!\n";
        return 1;
    }
    std::string strInput{};

    // two ways to read:

    while (inf >> strInput) { // this reads word by word and breaks on whitespace
        std::cout << strInput << '\n';
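    }

    inf.clear();                  // the loop above stopped at EOF, which set eofbit and failbit
    inf.seekg(0, std::ios::beg);  // rewind to the start before reading the file again

    while (std::getline(inf, strInput)) {  // alternatively, read by lines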
        std::cout << strInput << '\n';
    }

    return 0;
}

The file stream constructors take an optional second parameter that allows you to specify information about how the file should be opened:

For example:

#include <iostream>
#include <fstream>

int main() {
    std::ofstream outf{ "Sample.txt", std::ios::app };  // append instead of overwrite 
    if (!outf) {
        std::cerr << "Uh oh, Sample.txt could not be opened for writing!\n";
        return 1;
    }
    outf << "This is line 3\n";
    outf << "This is line 4\n";

    return 0;
}

Although a file stream closes itself automatically when it is destroyed, we can also open and close a file manually:

std::ofstream fout{"Sample.txt"};
fout << "This is line 1\n";
fout << "This is line 2\n";
fout.close();  // close first: calling open() on a stream that is already open fails

fout.open("Sample.txt", std::ios::app);
fout << "This is line 3\n";
fout.close();

Random file I/O

Instead of reading from the beginning (in and out modes) or from the end (ate and app modes), we can also access a file randomly, that is, skip around to various points in the file to read its contents. This is useful when a file is full of records and you want to retrieve a specific one: rather than reading every record until reaching the desired one, you can skip directly to it.

Random file access is done by moving the file pointer with either seekg() (“get”, for input) or seekp() (“put”, for output). Both functions take two parameters: the number of bytes to move the file pointer, and an ios flag that specifies what the offset is relative to. The available flags are:

For example:

fin.seekg(14, std::ios::cur);    // move forward 14 bytes 
fin.seekg(-18, std::ios::cur);   // move backwards 18 bytes 
fin.seekg(22, std::ios::beg);    // move to byte offset 22 from the start of the file
fin.seekg(-28, std::ios::end);   // move to 28 bytes before the end of the file

There are two more functions, tellg() and tellp(), which return the absolute position of the get and put file pointers, respectively.

It’s worth noting that an fstream (unlike an ifstream or ofstream) can switch between reading and writing. However, the switch must go through an explicit seek with seekg() or seekp():

std::fstream file{"Sample.txt", std::ios::in | std::ios::out};

char ch{};
while (file.get(ch)) {
    switch (ch) {
        case 'a':
        case 'e':
        case 'i':
        case 'o':
        case 'u':
        case 'A':
        case 'E':
        case 'I':
        case 'O':
        case 'U':  // if we find a vowel
            file.seekg(-1, std::ios::cur);  // go back 1 byte
            file << '#';  // overwrite with # (now in write mode!)
            file.seekg(file.tellg(), std::ios::beg);  // switch back to read mode
            break;
    }
}

Lastly, std::remove() (declared in <cstdio>) deletes a file, and the is_open() member function tells whether a file stream currently has a file open.

Miscellaneous Subjects

Static and dynamic libraries

A library is a package of code that is meant to be reused by many programs. Typically, a C++ library comes in two pieces:

Some libraries may be split into multiple files and/or multiple header files.

There are two types of libraries, static libraries and dynamic libraries.

A static library (also known as an archive) consists of routines that are compiled and linked directly into your program. When you compile a program that uses a static library, all the functionality of the static library that your program uses becomes part of your executable. On Windows, static libraries typically have a .lib extension, whereas on Linux, static libraries typically have a .a (archive) extension. The benefits of using static libraries are:

On the downside:

A dynamic library (also known as a shared library) consists of routines that are loaded into your application at runtime. When you compile a program that uses a dynamic library, the library does not become part of the executable. Instead, it remains as a separate unit. On Windows, dynamic libraries typically have a .dll (as in dynamic link library) extension, whereas on Linux, they typically have a .so (as in shared object) extension. Most linkers can build an import library for a dynamic library when the dynamic library is created.

C++ FAQ

#include <utility>  // for std::pair

int main() {
    std::pair point{1, 2};  // CTAD to deduce std::pair<int, int>
    return 0;
}

C++ Updates

Introduction to C++11

On August 12, 2011, the International Organization for Standardization (ISO) approved a new version of C++, namely C++11. Bjarne Stroustrup characterized the goals of C++11 as:

C++11 isn’t a large departure from C++03 thematically, but it did add a huge amount of new functionality:

There are also many new classes in the STL:

Introduction to C++14

On August 18, 2014, the ISO approved a new version of C++, namely C++14. Compared with C++11, which added a huge amount of functionality, C++14 is a relatively minor update:

Introduction to C++17

In September 2017, the ISO approved C++17, which contains a fair amount of new content:

Introduction to C++20

In February 2020, the C++ standards committee completed work on C++20 (ISO formally approved it later that year), which contains the most changes since C++11:

Introduction to C++23

In February 2023, the C++ standards committee completed work on C++23 (published by ISO in 2024), which includes:

The End

Here are some directions you may want to explore next: