Glossary

Absolute row number: the sequential index of a row in a table, regardless of which section of the table is being displayed.

Aggregation: combining many values into one, e.g., by summing a set of numbers or concatenating a set of strings.

Alias: to have two (or more) references to the same physical data.

Anonymous function: a function that has not been assigned a name. Anonymous functions are usually quite short, and are usually defined where they are used, e.g., as callbacks.
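
As a minimal sketch, the call below passes an anonymous function to base R's sapply, which applies it to each element in turn. The input values and the name squares are illustrative only.

```r
# The function passed to sapply has no name; it exists only for this call.
squares <- sapply(c(1, 2, 3), function(x) x^2)
print(squares)
```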

Attribute: a name-value pair associated with an object, used to store metadata about the object such as an array’s dimensions.
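
A short sketch of reading and setting attributes with base R's attr. The name grid and the "units" attribute are hypothetical and chosen only for illustration.

```r
# A matrix is a vector with a "dim" attribute.
grid <- matrix(1:6, nrow = 2)
print(attr(grid, "dim"))

# Attach extra metadata of our own (hypothetical attribute name).
attr(grid, "units") <- "cm"
print(names(attributes(grid)))
```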

Catch (exception): to accept responsibility for handling an error or other unexpected event. R prefers “handling a condition” to “catching an exception”.

Condition: an error or other unexpected event that disrupts the normal flow of control. See also handle.

Constructor (class): a function that creates an object of a particular class. In the S3 object system, constructors are a convention rather than a requirement.

Copy-on-modify: the practice of creating a new copy of aliased data whenever there is an attempt to modify it, so that each reference behaves as if it were the only one.
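
The effect can be seen in a small sketch like the one below, where the names first and second are illustrative. Assigning one name to another creates an alias, and the later modification changes only the name that was modified.

```r
first <- c(1, 2, 3)
second <- first    # alias: both names refer to the same data for now
second[1] <- 100   # modifying through one name leaves the other untouched

print(first)
print(second)
```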

Double square brackets: an index enclosed in [[...]], used to return a single value of the underlying type. See also single square brackets.
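
A minimal sketch of the difference between the two kinds of brackets when indexing a list. The list values and its contents are made up for the example.

```r
values <- list(a = 1, b = "two")

single <- values["a"]     # single brackets: a list of length 1
double <- values[["a"]]   # double brackets: the number stored in that slot

print(class(single))
print(class(double))
```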

Empty vector: a vector that contains no elements. Empty vectors have a type such as logical or character, and are not the same as null.

Environment: a structure that stores a set of variable names and the values they refer to.

Error: the most severe type of built-in condition in R.

Exception: an object containing information about an error, or the condition that led to the error. R prefers “handling a condition” to “catching an exception”.

Filter: to choose a set of records according to the values they contain.

Fully qualified name: an unambiguous name of the form package::thing.
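
For example, the sketch below calls median from the stats package by its fully qualified name, which works whether or not another loaded package defines its own median. The name middle is illustrative.

```r
# package::thing picks out exactly one definition.
middle <- stats::median(c(1, 3, 5))
print(middle)
```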

Functional programming: a style of programming in which functions transform data rather than modifying it. Functional programming relies heavily on higher-order functions.

Generic function: a collection of functions with similar purpose, each operating on a different class of data.

Global environment: the environment that holds top-level definitions in R, e.g., those written directly in the interpreter.

Group: to divide data into subsets according to some criteria while leaving records in a single structure.

Handle (a condition): to accept responsibility for handling an error or other unexpected event. R prefers “handling a condition” to “catching an exception”.
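
One way to handle a condition in base R is tryCatch. The sketch below signals a warning and handles it; the message text and the name result are invented for the example.

```r
result <- tryCatch(
  {
    warning("something odd happened")   # signal a condition
    "not reached"                       # skipped once the handler takes over
  },
  warning = function(w) {
    # The handler receives the condition object and decides what to return.
    paste("handled:", conditionMessage(w))
  }
)
print(result)
```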

Helper (class): in S3, a function that constructs and validates an instance of a class.

Heterogeneous: potentially containing data of different types. Most vectors in R are homogeneous, but lists can be heterogeneous.

Higher-order function: a function that takes one or more other functions as parameters. Higher-order functions such as map are commonly used in functional programming.

Homogeneous: containing data of only a single type. Most vectors in R are homogeneous.

Hubris: excessive pride or self-confidence.

ISO3 country code: a three-letter code defined by ISO 3166-1 (the alpha-3 codes) that identifies a specific country, dependent territory, or other geopolitical entity.

Lazy evaluation: delaying evaluation of an expression until the value is actually needed (or at least until after the point where it is first encountered).
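
R evaluates function arguments lazily, which the sketch below illustrates with a hypothetical function ignore_second. Its second argument is an expression that would raise an error, but because the function never uses it, the expression is never evaluated.

```r
ignore_second <- function(first, second) {
  first * 2   # `second` is never used, so it is never evaluated
}

result <- ignore_second(21, stop("this is never evaluated"))
print(result)
```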

List comprehension: an expression that generates a new list from an existing one via an implicit loop.

Logical indexing: to index a vector or other structure with a vector of Booleans, keeping only the values that correspond to true values.
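
A brief sketch, using made-up values in a vector called scores:

```r
scores <- c(3, 8, 5, 10)

keep <- scores > 4     # a logical vector with one entry per element of scores
high <- scores[keep]   # only the elements where keep is TRUE

print(keep)
print(high)
```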

Message: the least severe type of built-in condition in R.

Method: an implementation of a generic function that handles objects of a specific class.
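
In the S3 system, a generic function and its methods might look like the sketch below. The generic describe and its methods are hypothetical names chosen for illustration; UseMethod is the base R function that performs the dispatch.

```r
# The generic only dispatches; the methods do the work.
describe <- function(x, ...) UseMethod("describe")

describe.numeric   <- function(x, ...) "a number"
describe.character <- function(x, ...) "some text"
describe.default   <- function(x, ...) "something else"   # fallback method

print(describe(1))
print(describe("a"))
print(describe(TRUE))
```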

NA: a special value used to represent data that is Not Available.

Name collision: a situation in which the same name has been used in two different packages which are then used together, leading to ambiguity.

Negative selection: to specify the elements of a vector or other data structure that aren’t desired by negating their indices.
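
A minimal sketch with an illustrative vector called choices:

```r
choices <- c("a", "b", "c", "d")

without_second <- choices[-2]        # everything except the second element
without_ends   <- choices[-c(1, 4)]  # everything except the first and fourth

print(without_second)
print(without_ends)
```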

Null: a special value used to represent a missing object. NULL is not the same as NA, and neither is the same as an empty vector.
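
The three-way distinction can be checked directly, as in this sketch. The name empty is illustrative.

```r
empty <- character(0)    # an empty vector that still has a type

print(length(empty))     # no elements
print(typeof(empty))     # but a definite type
print(is.null(empty))    # and it is not NULL

print(length(NULL))      # NULL has no elements either
print(length(NA))        # NA is a vector of length 1 holding a missing value
```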

Package: a collection of code, data, and documentation that can be distributed and re-used.

Parent environment: the environment “above” the current environment. Parentage is defined lexically (when code is written) rather than dynamically (as code is called).
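
The sketch below, with hypothetical functions show_x and caller, suggests what lexical parentage means in practice: show_x looks for x where it was defined, not in the function that called it.

```r
x <- "outer"

show_x <- function() x    # x is looked up where show_x was defined

caller <- function() {
  x <- "local to caller"  # not visible to show_x
  show_x()
}

result <- caller()
print(result)
```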

Pipe operator: the %>% operator, used to make the output of one function the input of the next.

Raise (exception): a way of indicating that something has gone wrong in a program, or that some other unexpected event has occurred. R prefers “signalling a condition” to “raising an exception”.

Range expression: an expression of the form low:high that is transformed into a sequence of consecutive integers.
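
A short sketch with illustrative names. Note that when the first value is larger than the second, the sequence counts down rather than being empty.

```r
steps <- 3:6
print(steps)
print(typeof(steps))

countdown <- 2:0   # counts down: 2, 1, 0
print(countdown)
```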

Recycle: to re-use values from a shorter vector in order to generate a sequence of the same length as a longer one.
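
For instance, in the sketch below the two-element vector is re-used twice to match the four-element one. The names long, short, and total are illustrative.

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is recycled as 10, 20, 10, 20 before the addition.
total <- long + short
print(total)
```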

Relative row number: the index of a row in a displayed portion of a table, which may or may not be the same as the absolute row number within the table.

Scalar: a single value of a particular type, such as 1 or “a”. Scalars don’t really exist in R; values that appear to be scalars are actually vectors of unit length.

Select: to choose entire columns from a table by name or location.

Setup (testing): code that is automatically run once before each unit test.

Signal (a condition): a way of indicating that something has gone wrong in a program, or that some other unexpected event has occurred. R prefers “signalling a condition” to “raising an exception”.

Single square brackets: an index enclosed in [...], used to select a structure from another structure. See also double square brackets.

Storage allocation: setting aside a block of memory for future use.

Teardown (testing): code that is automatically run once after each unit test.

Test fixture: the data structures, files, or other artefacts on which a unit test operates.

Test runner: a software tool that finds and runs unit tests.

Tibble: a modern replacement for R’s data frame, which stores tabular data in columns and rows, defined and used in the tidyverse.

Tidyverse: a collection of R packages for operating on tabular data in consistent ways.

Unit test: a function that tests one aspect or property of a piece of software.

Validator (class): a function that checks the consistency of an S3 object.
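
A sketch of how a constructor, a validator, and a helper might fit together for an S3 class. The class name temperature and all three function names are hypothetical, and the split of duties shown here is one common convention rather than a requirement.

```r
# Constructor: creates the object with minimal checking.
new_temperature <- function(x) {
  stopifnot(is.double(x))
  structure(x, class = "temperature")
}

# Validator: checks that the values make sense.
validate_temperature <- function(obj) {
  if (any(unclass(obj) < -273.15, na.rm = TRUE)) {
    stop("temperature below absolute zero", call. = FALSE)
  }
  obj
}

# Helper: the user-facing function that constructs and validates.
temperature <- function(x) {
  validate_temperature(new_temperature(as.double(x)))
}

readings <- temperature(c(20, 25))
print(class(readings))
```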

Variable arguments: in a function, the ability to take any number of arguments. R uses ... to capture the “extra” arguments.
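
A minimal sketch, using a hypothetical function count_args that captures its extra arguments in a list:

```r
count_args <- function(...) {
  extras <- list(...)   # collect whatever was passed
  length(extras)
}

n <- count_args(1, "two", TRUE)
print(n)
```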

Vector: a sequence of values, usually of homogeneous type. Vectors are the fundamental data structure in R; scalars are actually vectors of unit length.

Warning: a built-in condition in R of middling severity, between a message and an error.