# F Key Points

## F.1 Values and Vectors

• Use `print(expression)` to print the value of a single expression.
• Variable names may include letters, digits, `.`, and `_`, but `.` should be avoided, as it sometimes has special meaning.
• R’s atomic data types include logical, integer, double (also called numeric), and character.
• R stores collections in homogeneous vectors of atomic types, or in heterogeneous lists.
• ‘Scalars’ in R are actually vectors of length 1.
• Vectors are created using the function `c(...)`, and lists using the function `list(...)`.
• Vector indices from 1 to `length(vector)` select single elements.
• Negative indices exclude the corresponding elements from the result.
• The index 0 on its own selects no elements, creating a vector or list of length 0.
• The expression `low:high` creates the vector of integers from `low` to `high` inclusive.
• Subscripting a vector with a vector of numbers selects the elements at those locations (possibly with repeats).
• Subscripting a vector with a vector of logicals selects elements where the indexing vector is `TRUE`.
• Values from short vectors (such as ‘scalars’) are repeated to match the lengths of longer vectors.
• The special value `NA` represents missing values, and (almost all) operations involving `NA` produce `NA`.
• The special value `NULL` represents a nonexistent vector, which is not the same as a vector of length 0.
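A minimal base R sketch of the vector rules in this list; the variable names `x` and `y` are illustrative only.

```r
# A 'scalar' is just a vector of length 1
length(5)            # 1

# Positive indices select, negative indices exclude, 0 selects nothing
x <- c(10, 20, 30, 40)
x[c(1, 3)]           # 10 30
x[-1]                # 20 30 40
x[0]                 # numeric(0)
x[c(2, 2)]           # repeats are allowed: 20 20

# low:high produces integers
typeof(2:5)          # "integer"

# Logical indices keep the elements where the index is TRUE
x[x > 15]            # 20 30 40

# Shorter vectors are recycled to match longer ones
x + 1                # 11 21 31 41
x * c(1, -1)         # 10 -20 30 -40

# NA propagates through most operations
y <- c(1, NA, 3)
sum(y)               # NA
is.na(y)             # FALSE TRUE FALSE

# NULL is not the same as an empty vector
is.null(NULL)        # TRUE
is.null(x[0])        # FALSE: x[0] is a numeric vector of length 0
```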

## F.2 Indexing

• A list is a heterogeneous vector capable of storing values of any type (including other lists).
• Indexing with `[` returns a structure of the same type as the structure being indexed (e.g., returns a list when applied to a list).
• Indexing with `[[` strips away one level of structure (i.e., returns the indicated element without any wrapping).
• Use `list('name' = value, ...)` to name the elements of a list.
• Use either `L[['name']]` or `L$name` to access elements by name; `L['name']` returns a list containing just that element.
• Use back-quotes around the name with `$` notation if the name is not a legal R variable name.
• Use `matrix(values, nrow = N)` to create a matrix with `N` rows containing the given values (filled column by column unless `byrow = TRUE`).
• Use `m[i, j]` to get the value at the i’th row and j’th column of a matrix.
• Use `m[i,]` to get a vector containing the values in the i’th row of a matrix.
• Use `m[,j]` to get a vector containing the values in the j’th column of a matrix.
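A short base R sketch contrasting `[`, `[[`, and `$` on a list, and indexing a matrix; the list contents are made up for illustration.

```r
# A list can hold values of different types, including other lists
L <- list(name = "probe", readings = c(1.5, 2.5), meta = list(ok = TRUE))

L["readings"]        # `[` returns a list containing one element
L[["readings"]]      # `[[` returns the element itself: 1.5 2.5
L$readings           # same as L[["readings"]]
L$meta$ok            # TRUE

# Names that are not legal R identifiers need back-quotes with `$`
M <- list(`max temp` = 31)
M$`max temp`         # 31

# matrix() fills column by column unless byrow = TRUE
m <- matrix(1:6, nrow = 2)
m[2, 3]              # 6
m[1, ]               # first row: 1 3 5
m[, 2]               # second column: 3 4
```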

## F.3 Control Flow

• Use `for (loop_variable in collection){ ...body... }` to create a loop.
• Use `if (expression) { ...body... } else if (expression) { ...body... } else { ...body... }` to create conditionals.
• The condition in an `if` must have length 1; use `any(...)` and `all(...)` to collapse logical vectors to single values.
• Use `function(...arguments...) { ...body... }` to create a function.
• Use `variable <- function(...arguments...) { ...body... }` to create a function and give it a name.
• The body of a function can be a single expression or a block in curly braces.
• The last expression evaluated in a function is returned as its result.
• Use `return(expression)` to return a result early from a function.
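A small sketch combining loops, conditionals, and functions; `classify`, `first_negative`, and `square` are hypothetical helpers written only for this illustration.

```r
# Classify each value using a loop and a chained conditional
classify <- function(values) {
  result <- character(length(values))   # pre-allocated result
  for (i in seq_along(values)) {
    if (is.na(values[i])) {
      result[i] <- "missing"
    } else if (values[i] < 0) {
      result[i] <- "negative"
    } else {
      result[i] <- "non-negative"
    }
  }
  result                                 # last expression is the return value
}

# any() collapses a logical vector to one value; return() exits early
first_negative <- function(values) {
  if (!any(values < 0, na.rm = TRUE)) {
    return(NA)
  }
  values[which(values < 0)[1]]
}

classify(c(3, -1, NA))        # "non-negative" "negative" "missing"
first_negative(c(2, 5))       # NA
first_negative(c(2, -4, -7))  # -4

# A single-expression body needs no braces
square <- function(x) x^2
```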

## F.4 The Tidyverse

• `install.packages('name')` installs packages.
• `library(name)` (without quoting the name) loads a package.
• `library(tidyverse)` loads the core tidyverse packages (such as dplyr, readr, and tidyr) at once.
• `read_csv(filename)` reads CSV files; by default it treats empty strings and the string `NA` as missing values.
• `read_csv` infers each column’s data type based on the first thousand values it reads.
• A tibble is the tidyverse’s version of a data frame, which represents tabular data.
• `head(tibble)` and `tail(tibble)` inspect the first and last few rows of a tibble.
• `summary(tibble)` displays a summary of a tibble’s structure and values.
• `tibble$column` selects a column from a tibble, returning a vector as a result.
• `tibble['column']` selects a column from a tibble, returning a tibble as a result.
• `tibble[,c]` selects column `c` from a tibble, returning a tibble as a result.
• `tibble[r,]` selects row `r` from a tibble, returning a tibble as a result.
• Use ranges and logical vectors as indices to select several rows or columns of a tibble at once.
• `tibble[[c]]` selects column `c` from a tibble, returning a vector as a result.
• `min(...)`, `mean(...)`, `max(...)`, and `sd(...)` calculate the minimum, mean, maximum, and standard deviation of data.
• These aggregate functions include `NA`s in their calculations, and so will produce `NA` if the input data contains any.
• Use `func(data, na.rm = TRUE)` to remove `NA`s from data before calculations are done (but make sure this is statistically justified).
• `filter(tibble, condition)` selects rows from a tibble that pass a logical test on their values.
• `arrange(tibble, column)` or `arrange(tibble, desc(column))` arranges rows according to the values in a column (the latter in descending order).
• `select(tibble, column, column, ...)` selects columns from a tibble.
• `select(tibble, -column)` drops a column from a tibble.
• `mutate(tibble, name = expression, name = expression, ...)` adds new columns to a tibble using values from existing columns.
• `group_by(tibble, column, column, ...)` groups rows that have the same values in the specified columns.
• `summarize(tibble, name = expression, name = expression)` aggregates tibble values (by groups if the rows have been grouped).
• `tibble %>% function(arguments)` performs the same operation as `function(tibble, arguments)`.
• Use `%>%` to create pipelines in which the value on the left of each `%>%` becomes the first argument of the function call on its right.
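A sketch of the functions above in one pipeline, assuming dplyr is installed; the `measurements` tibble is a small hypothetical stand-in for data that would normally come from `read_csv`.

```r
library(dplyr)

# Hypothetical data; one temperature is missing
measurements <- tibble(
  site  = c("north", "north", "south", "south", "south"),
  depth = c(1, 2, 1, 2, 3),
  temp  = c(12.5, 11.0, NA, 14.0, 13.5)
)

measurements$temp                       # a vector
measurements["temp"]                    # a one-column tibble
select(measurements, -depth)            # every column except depth
mean(measurements$temp)                 # NA, because one value is missing
mean(measurements$temp, na.rm = TRUE)   # 12.75

# Each stage receives the previous result as its first argument
summary_by_site <- measurements %>%
  filter(!is.na(temp)) %>%
  mutate(temp_f = temp * 9 / 5 + 32) %>%
  group_by(site) %>%
  summarize(mean_temp = mean(temp), n = n()) %>%
  arrange(desc(mean_temp))

summary_by_site
```

Dropping the missing row with `filter` before summarizing makes the handling of `NA` explicit, rather than hiding it inside `na.rm = TRUE`.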

## F.5 Cleaning Up Data

• Develop data-cleaning scripts one step at a time, checking intermediate results carefully.
• Use `read_csv` to read CSV-formatted tabular data into a tibble.
• Use the `skip` and `na` parameters of `read_csv` to skip rows and interpret certain values as `NA`.
• Use `str_replace` to replace the first portion of a string that matches a pattern with a new string (or `str_replace_all` to replace every match).
• Use `is.numeric` to test if a value is a number and `as.numeric` to convert it to a number.
• Use `map` to apply a function to every element of a vector in turn.
• Use `map_dfc` and `map_dfr` to apply a function to each element (for a tibble, each column) and bind the results into a tibble by columns or by rows, respectively.
• Pre-allocate storage in a list for each result from a loop and fill it in rather than repeatedly extending the list.
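A sketch of a few cleaning steps, assuming the stringr and purrr packages are installed; the `raw` column is hypothetical. It uses `str_replace_all` because a value such as `"1,200,000"` would need more than one comma removed.

```r
library(stringr)
library(purrr)

# Hypothetical raw column: numbers stored as text with thousands separators
raw <- c("1,200", "850", "2,075", "n/a")

# Remove the commas, then convert; text that is not a number becomes NA
cleaned <- str_replace_all(raw, ",", "")
values <- suppressWarnings(as.numeric(cleaned))
is.numeric(values)   # TRUE: is.numeric tests the vector's type

# map() applies a function to each element and returns a list
lengths_list <- map(raw, nchar)

# Pre-allocate the result list, then fill it in place
results <- vector("list", length(values))
for (i in seq_along(values)) {
  results[[i]] <- if (is.na(values[i])) NA else values[i] / 1000
}
```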

## F.6 Testing and Error Handling

• Operations signal conditions in R when errors occur.
• The three built-in levels of conditions are messages, warnings, and errors.
• Programs can signal these themselves using the functions `message`, `warning`, and `stop`.
• Operations can be placed in a call to the function `try` to suppress errors, but this is a bad idea.
• Operations can be placed in a call to the function `tryCatch` to handle errors.
• Use testthat to write unit tests for R.
• Put unit tests for an R package in the `tests/testthat` directory.
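• Put tests in files whose names start with `test` (e.g., `test_group.R`), and write each test as `test_that('description', { ... })`.
• Use `test_dir` to run the tests in a particular directory, optionally restricted to files whose names match a pattern.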
• Write tests for data transformation steps as well as library functions.
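A sketch of signalling and handling conditions, assuming testthat is installed; `safe_ratio` is a hypothetical function. Inside a package, the `test_that` calls would live in a file such as `tests/testthat/test_ratio.R` rather than alongside the function.

```r
library(testthat)

# Signal an error for invalid input and a warning for suspicious input
safe_ratio <- function(numerator, denominator) {
  if (!is.numeric(numerator) || !is.numeric(denominator)) {
    stop("both arguments must be numeric")
  }
  if (any(denominator == 0)) {
    warning("division by zero produces Inf or NaN")
  }
  numerator / denominator
}

# tryCatch() runs a handler instead of letting the error propagate
outcome <- tryCatch(
  safe_ratio("a", 2),
  error = function(e) paste("caught:", conditionMessage(e))
)
outcome   # "caught: both arguments must be numeric"

# The same behaviour expressed as unit tests
test_that("safe_ratio divides numbers", {
  expect_equal(safe_ratio(6, 3), 2)
})
test_that("safe_ratio rejects text", {
  expect_error(safe_ratio("a", 2), "must be numeric")
})
test_that("safe_ratio warns on zero denominators", {
  expect_warning(safe_ratio(1, 0), "division by zero")
})
```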

## F.7 Non-Standard Evaluation

• R uses lazy evaluation: expressions are evaluated when their values are needed, not before.
• Use `expr` to create an expression without evaluating it.
• Use `eval` to evaluate an expression in the context of some data.
• Use `enquo` to create a quosure containing an unevaluated expression and its environment.
• Use `quo_get_expr` to get the expression out of a quosure.
• Use `!!` to splice the expression in a quosure into a function call.
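A sketch of lazy evaluation and the rlang functions listed above, assuming rlang is installed; `lazy` and `show_arg` are hypothetical helpers.

```r
library(rlang)

# Lazy evaluation: an argument that is never used is never evaluated
lazy <- function(value) "value was never used"
lazy(stop("not evaluated"))        # no error is raised

# expr() captures code without evaluating it
e <- expr(x + y)
e                                  # x + y

# eval() evaluates it, here using a list as the source of variables
eval(e, list(x = 1, y = 2))        # 3

# enquo() captures a caller's argument as a quosure
show_arg <- function(arg) {
  q <- enquo(arg)
  quo_get_expr(q)
}
show_arg(a * b)                    # a * b, even though a and b do not exist

# !! splices a captured expression into a new call
total <- expr(sum(!!e, 10))
total                              # sum(x + y, 10)
eval(total, list(x = 1, y = 2))    # 13
```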

## F.8 Object-Oriented Programming

• S3 is the most commonly used object-oriented programming system in R.
• Every object can store metadata about itself in attributes, which are set and queried with `attr`.
• The `dim` attribute stores the dimensions of a matrix (which is physically stored as a vector).
• The `class` attribute of an object defines its class or classes (it may have several character entries).
• When a generic function `F(X, ...)` is called and `X` has class `C`, R looks for a method called `F.C` (the method is found purely by this name).
• If an object has multiple classes in its `class` attribute, R looks for a corresponding method for each in turn.
• Every user-defined class `C` should have functions `new_C` (to create it), `validate_C` (to validate its integrity), and `C` (to create and validate).

## F.9 Intellectual Debt

• Don’t use `setwd`.
• The formula operator `~` delays evaluation of its operand or operands.
• `~` was created to allow users to pass formulas into functions, but is used more generally to delay evaluation.
• Some tidyverse functions define `.` to be the whole data, `.x` and `.y` to be the first and second arguments, and `..N` to be the N’th argument.
• These convenience parameters are primarily used when the data being passed to a pipelined function needs to go somewhere other than in the first parameter’s slot.
• ‘Copy-on-modify’ means that data is shared between variables until something attempts to modify it, at which point it is duplicated, so the change is never visible through the other variables.
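A sketch of delayed evaluation with `~` and of copy-on-modify; the purrr line, which assumes purrr is installed, shows `~` and `.x` used as a compact function.

```r
library(purrr)

# A formula captures its operand without evaluating it
f <- ~ height * 2
class(f)                          # "formula"
f[[2]]                            # the unevaluated expression: height * 2
eval(f[[2]], list(height = 3))    # 6

# purrr treats a formula as a function whose first argument is .x
doubled <- map_dbl(1:3, ~ .x * 2) # 2 4 6

# Copy-on-modify: b shares a's data until b is changed
a <- c(1, 2, 3)
b <- a
b[1] <- 100
a                                 # 1 2 3: the original is untouched

# The same applies to function arguments
zero_first <- function(v) {
  v[1] <- 0
  v
}
result <- zero_first(a)
a                                 # still 1 2 3
result                            # 0 2 3
```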

## F.10 Projects

• An R package can contain code, data, and documentation.
• When a package is installed, its R code is byte-compiled, so the installed copy does not contain the original source files.
• R packages are almost always distributed through CRAN, the Comprehensive R Archive Network.
• Most of a project’s metadata goes in a file called `DESCRIPTION`.
• Metadata related to imports and exports goes in a file called `NAMESPACE`.
• Add patterns to a file called `.Rbuildignore` to ignore files or directories when building a project.
• All R source code for a package must go in the `R` sub-directory.
• Top-level `library` calls in a package’s source code are not run when users load the installed package, so they cannot be used to attach dependencies.
• Data can be included in a package by putting it in the `data` sub-directory.
• Data must be in `.rda` format in order to be loaded as part of a package.
• Data in other formats can be put in the `inst/extdata` directory, and will be installed when the package is installed.
• Add comments starting with `#'` to an R file to document functions.
• Use roxygen2 to extract these comments to create manual pages in the `man` directory.
• Use `@export` directives in roxygen2 comment blocks to make functions visible outside a package.
• Add required libraries to the `Imports` section of the `DESCRIPTION` file to indicate that your package depends on them.
• Use `package::function` to access externally defined functions inside a package.
• Alternatively, add `@import` directives to roxygen2 comment blocks to make external functions available inside the package.
• Import `.data` from `rlang` and use `.data$column` to refer to columns instead of using bare column names.
• Create a file called `R/package.R` and document `NULL` to document the package as a whole.
• Create a file called `R/dataset.R` and document the string `"dataset"` to document a dataset.
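A sketch of how a package function and a dataset might be documented with roxygen2 comments; `filter_above` and the `measurements` dataset are hypothetical, and the function assumes dplyr is listed under `Imports` in `DESCRIPTION`.

```r
#' Keep rows whose value exceeds a threshold
#'
#' @param data A data frame with a numeric column called `value`.
#' @param threshold Rows with `value` above this are kept.
#' @return The filtered data frame.
#' @importFrom rlang .data
#' @export
filter_above <- function(data, threshold) {
  # dplyr:: instead of library(dplyr), which a package never runs;
  # .data$value names the column explicitly rather than a bare `value`
  dplyr::filter(data, .data$value > threshold)
}

#' Hypothetical example dataset
#'
#' @format A data frame with columns `site` and `value`.
"measurements"
```

Running roxygen2 over these comments would generate manual pages in `man/` and add `filter_above` to `NAMESPACE` as an export.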

## F.11 Web Applications with Shiny

• Every Shiny application has a user interface, a server, and a call to `shinyApp` that connects them.
• Every Shiny application must be in its own directory.
• Images and other static assets must be in that directory’s `www` sub-directory.
• The `inputId` and `outputId` attributes of UI elements are used to refer to them from the server.
• Use `input$name` and `output$name` in the server to refer to UI elements.
• Code placed at the top of the script outside functions is run once when the app launches.
• Code placed inside `server` is run once for each user.
• Code placed inside a handler is run once on each change.
• A reactive variable is a function whose value changes automatically whenever anything it depends on changes.
• Use `reactive({...})` to create a reactive variable explicitly.
• The server can change UI elements via the `session` variable.
• Use `uiOutput` and `renderUI` to (re-)create UI elements as needed in order to break circular dependencies.
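A minimal sketch of an `app.R` file, assuming the shiny package is installed; the input `n` and output `total` are hypothetical. In a real `app.R`, `shinyApp(ui = ui, server = server)` is usually the last line; it is assigned to `app` here only so the object can be inspected without launching it.

```r
library(shiny)

# User interface: elements are identified by inputId and outputId
ui <- fluidPage(
  numericInput(inputId = "n", label = "How many values?", value = 10, min = 1),
  textOutput(outputId = "total")
)

# Server: this function runs once for each user session
server <- function(input, output, session) {
  # Reactive expression: re-runs whenever input$n changes
  values <- reactive({
    req(input$n)          # wait until the input has a value
    seq_len(input$n)
  })
  # Re-renders whenever values() changes
  output$total <- renderText({
    paste("Sum:", sum(values()))
  })
}

# shinyApp() connects the two; printing the object launches the app
app <- shinyApp(ui = ui, server = server)
```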

## F.12 Reticulate

• The `reticulate` library allows R programs to access data in Python programs and vice versa.
• Use `py$whatever` to access a top-level Python variable from R.
• Use `r.whatever` to access a top-level R definition from Python.
• R is always indexed from 1 (even in Python) and Python is always indexed from 0 (even in R).
• Numbers in R are floating point by default, so use a trailing `L` (as in `5L`) to force a value to be an integer.
• A Python script run from an R session believes it is the main script, i.e., `__name__` is `'__main__'` inside the Python script.
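The integer rule matters because reticulate converts R doubles to Python `float` and R integers to Python `int`, so Python code that expects an integer (for example, an index) needs the `L` suffix. A base R sketch of the types involved, with no Python required:

```r
# Numeric literals are doubles unless they carry an L suffix
typeof(3)            # "double"
typeof(3L)           # "integer"
typeof(1:3)          # "integer": ranges produce integers
identical(3, 3L)     # FALSE: same value, different types

# R indexes from 1
x <- c(10, 20, 30)
x[1]                 # 10
```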