# Key Points

## Values and Vectors

• Use print(expression) to print the value of a single expression.
• Variable names may include letters, digits, ., and _, but . should be avoided, as it sometimes has special meaning.
• R’s atomic data types include logical, integer, double (also called numeric), and character.
• R stores collections in homogeneous vectors of atomic types, or in heterogeneous lists.
• ‘Scalars’ in R are actually vectors of length 1.
• Vectors are created using the function c(...), and lists using list(...).
• Vector indices from 1 to length(vector) select single elements.
• Negative indices to vectors deselect elements from the result.
• The index 0 on its own selects no elements, creating a vector or list of length 0.
• The expression low:high creates the vector of integers from low to high inclusive.
• Subscripting a vector with a vector of numbers selects the elements at those locations (possibly with repeats).
• Subscripting a vector with a vector of logicals selects elements where the indexing vector is TRUE.
• Values from short vectors (such as ‘scalars’) are repeated to match the lengths of longer vectors.
• The special value NA represents missing values, and (almost all) operations involving NA produce NA.
• The special value NULL represents a nonexistent vector, which is not the same as a vector of length 0.
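
The indexing and recycling rules above can be tried out in a few lines. This is a minimal sketch in base R; the vector `temperatures` and the other variable names are made up for illustration.

```r
# A hypothetical vector used only for illustration.
temperatures <- c(10, 20, 30, 40, 50)

first <- temperatures[1]                  # indices start at 1
without_first <- temperatures[-1]         # a negative index drops that element
middle <- temperatures[2:4]               # low:high includes both ends
repeated <- temperatures[c(1, 1, 5)]      # numeric subscripts may repeat
warm <- temperatures[temperatures > 25]   # logical subscript keeps the TRUE positions
shifted <- temperatures + 1               # the length-1 vector 1 is recycled
nothing <- temperatures[0]                # index 0 alone gives a vector of length 0

# NA propagates through most operations.
with_missing <- c(1, NA, 3)
total <- sum(with_missing)

# NULL is not the same thing as a vector of length 0.
null_is_null <- is.null(NULL)
empty_is_null <- is.null(nothing)
```

Here `total` is NA because one input is missing, and `empty_is_null` is FALSE even though `nothing` has no elements.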

## Indexing

• A list is a heterogeneous vector capable of storing values of any type (including other lists).
• Indexing with [ returns a structure of the same type as the structure being indexed (e.g., returns a list when applied to a list).
• Indexing with [[ strips away one level of structure (i.e., returns the indicated element without any wrapping).
• Use list('name' = value, ...) to name the elements of a list.
• Use either L[['name']] or L$name to access elements by name.
• Use back-quotes around the name with $ notation if the name is not a legal R variable name.
• Use matrix(values, nrow = N) to create a matrix with N rows containing the given values.
• Use m[i, j] to get the value at the i’th row and j’th column of a matrix.
• Use m[i,] to get a vector containing the values in the i’th row of a matrix.
• Use m[,j] to get a vector containing the values in the j’th column of a matrix.
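
The difference between [ and [[, and the matrix subscripts, can be sketched as follows. The list `person` and the matrix `m` are hypothetical examples, not taken from the lesson.

```r
# A hypothetical named list; back-quotes allow a name that is not a legal variable name.
person <- list('name' = 'Ada', 'age' = 36, `favorite color` = 'blue')

as_list <- person['name']            # [ keeps the wrapper: a list of length 1
as_value <- person[['name']]         # [[ strips one level: a character vector
same_value <- person$name            # $ behaves like [[ for a literal name
quoted <- person$`favorite color`    # back-quotes around a non-standard name

# matrix() fills column by column, so the columns are (1, 2), (3, 4), (5, 6).
m <- matrix(1:6, nrow = 2)
element <- m[2, 3]        # row 2, column 3
first_row <- m[1, ]       # all of row 1 as a vector
second_column <- m[, 2]   # all of column 2 as a vector
```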

## Control Flow

• Use for (loop_variable in collection){ ...body... } to create a loop.
• Use if (expression) { ...body... } else if (expression) { ...body... } else { ...body... } to create conditionals.
• The condition of an if must have length 1; use any(...) and all(...) to collapse logical vectors to single values.
• Use function(...arguments...) { ...body... } to create a function.
• Use variable <- function(...arguments...) { ...body... } to create a function and give it a name.
• The body of a function can be a single expression or a block in curly braces.
• The last expression evaluated in a function is returned as its result.
• Use return(expression) to return a result early from a function.
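
A short sketch that combines these points: the functions `describe_signs` and `double_it` are hypothetical helpers written only to illustrate early return, implicit return of the last expression, and collapsing a logical vector before testing it.

```r
# Hypothetical helper: describe the signs of the numbers in a vector.
describe_signs <- function(values) {
  if (length(values) == 0) {
    return('empty')              # early return
  }
  # all() and any() collapse a logical vector to a single TRUE or FALSE.
  if (all(values > 0)) {
    'all positive'               # the last expression evaluated is the result
  } else if (any(values > 0)) {
    'some positive'
  } else {
    'none positive'
  }
}

# A function body can be a single expression without curly braces.
double_it <- function(x) 2 * x

# A for loop over a collection.
total <- 0
for (value in c(1, 2, 3)) {
  total <- total + double_it(value)
}
```

Calling `describe_signs(c(-1, 2))` gives 'some positive', and `total` ends up as 12.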

## The Tidyverse

• install.packages('name') installs packages.
• library(name) (without quoting the name) loads a package.
• library(tidyverse) loads the entire collection of tidyverse libraries at once.
• read_csv(filename) reads CSV files that use the string ‘NA’ to represent missing values.
• read_csv infers each column’s data type based on the first thousand values it reads.
• A tibble is the tidyverse’s version of a data frame, which represents tabular data.
• head(tibble) and tail(tibble) inspect the first and last few rows of a tibble.
• summary(tibble) displays a summary of a tibble’s structure and values.
• tibble$column selects a column from a tibble, returning a vector as a result.
• tibble['column'] selects a column from a tibble, returning a tibble as a result.
• tibble[,c] selects column c from a tibble, returning a tibble as a result.
• tibble[r,] selects row r from a tibble, returning a tibble as a result.
• Use ranges and logical vectors as indices to select multiple rows/columns or specific rows/columns from a tibble.
• tibble[[c]] selects column c from a tibble, returning a vector as a result.
• min(...), mean(...), max(...), and sd(...) calculate the minimum, mean, maximum, and standard deviation of data.
• These aggregate functions include NAs in their calculations, and so will produce NA if the input data contains any.
• Use func(data, na.rm = TRUE) to remove NAs from data before calculations are done (but make sure this is statistically justified).
• filter(tibble, condition) selects rows from a tibble that pass a logical test on their values.
• arrange(tibble, column) or arrange(tibble, desc(column)) arranges rows according to values in a column (the latter in descending order).
• select(tibble, column, column, ...) selects columns from a tibble.
• select(tibble, -column) selects every column of a tibble except the named one.
• mutate(tibble, name = expression, name = expression, ...) adds new columns to a tibble using values from existing columns.
• group_by(tibble, column, column, ...) groups rows that have the same values in the specified columns.
• summarize(tibble, name = expression, name = expression) aggregates tibble values (by groups if the rows have been grouped).
• tibble %>% function(arguments) performs the same operation as function(tibble, arguments).
• Use %>% to create pipelines in which the left side of each %>% becomes the first argument of the next stage.

## Cleaning Up Data

• Develop data-cleaning scripts one step at a time, checking intermediate results carefully.
• Use read_csv to read CSV-formatted tabular data into a tibble.
• Use the skip and na parameters of read_csv to skip rows and interpret certain values as NA.
• Use str_replace to replace portions of strings that match patterns with new strings.
• Use is.numeric to test if a value is a number and as.numeric to convert it to a number.
• Use map to apply a function to every element of a vector in turn.
• Use map_dfc and map_dfr to map functions across the columns and rows of a tibble.
• Pre-allocate storage in a list for each result from a loop and fill it in rather than repeatedly extending the list.

## Non-Standard Evaluation

• R uses lazy evaluation: expressions are evaluated when their values are needed, not before.
• Use expr to create an expression without evaluating it.
• Use eval to evaluate an expression in the context of some data.
• Use enquo to create a quosure containing an unevaluated expression and its environment.
• Use quo_get_expr to get the expression out of a quosure.
• Use !! to splice the expression in a quosure into a function call.

## Handling Errors

• Operations signal conditions in R when errors occur.
• The three built-in levels of conditions are messages, warnings, and errors.
• Programs can signal these themselves using the functions message, warning, and stop.
• Operations can be placed in a call to the function try to suppress errors, but this is a bad idea.
• Operations can be placed in a call to the function tryCatch to handle errors.

## Object-Oriented Programming

• S3 is the most commonly used object-oriented programming system in R.
• Every object can store metadata about itself in attributes, which are set and queried with attr.
• The dim attribute stores the dimensions of a matrix (which is physically stored as a vector).
• The class attribute of an object defines its class or classes (it may have several character entries).
• When F(X, ...) is called, and X has class C, R looks for a function called F.C (the . is just a naming convention).
• If an object has multiple classes in its class attribute, R looks for a corresponding method for each in turn.
• Every user-defined class C should have functions new_C (to create it), validate_C (to validate its integrity), and C (to create and validate).

## Intellectual Debt

• Don’t use setwd.
• The formula operator ~ delays evaluation of its operand or operands.
• ~ was created to allow users to pass formulas into functions, but is used more generally to delay evaluation.
• Some tidyverse functions define . to be the whole data, .x and .y to be the first and second arguments, and ..N to be the N’th argument.
• These convenience parameters are primarily used when the data being passed to a pipelined function needs to go somewhere other than in the first parameter’s slot.
• ‘Copy-on-modify’ means that data is aliased until something attempts to modify it, at which point it is duplicated, so that data always appears to be unchanged.

## Projects

• An R package can contain code, data, and documentation.
• R code is distributed as compiled bytecode in packages, not as source.
• R packages are almost always distributed through CRAN, the Comprehensive R Archive Network.
• Most of a project’s metadata goes in a file called DESCRIPTION.
• Metadata related to imports and exports goes in a file called NAMESPACE.
• Add patterns to a file called .Rbuildignore to ignore files or directories when building a project.
• All source code for a package must go in the R sub-directory.
• library calls in a package’s source code will not be executed as the package is loaded after distribution.
• Data can be included in a package by putting it in the data sub-directory.
• Data must be in .rda format in order to be loaded as part of a package.
• Data in other formats can be put in the inst/extdata directory, and will be installed when the package is installed.
• Add comments starting with #' to an R file to document functions.
• Use roxygen2 to extract these comments to create manual pages in the man directory.
• Use @export directives in roxygen2 comment blocks to make functions visible outside a package.
• Add required libraries to the Imports section of the DESCRIPTION file to indicate that your package depends on them.
• Use package::function to access externally-defined functions inside a package.
• Alternatively, add @import directives to roxygen2 comment blocks to make external functions available inside the package.
• Import .data from rlang and use .data$column to refer to columns instead of using bare column names.
• Create a file called R/package.R and document NULL to document the package as a whole.
• Create a file called R/dataset.R and document the string 'dataset' to document a dataset.
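
As an illustration of the documentation and import points above, a package source file might look roughly like this. The function `coefficient_of_variation` and the idea of saving it as R/variation.R are assumptions made for the example; the #' comments use standard roxygen2 tags, and `stats::sd` stands in for any externally-defined function accessed with package::function.

```r
#' Compute the coefficient of variation.
#'
#' @param values A numeric vector.
#' @param na.rm Whether to drop missing values before calculating.
#' @return The standard deviation of `values` divided by their mean.
#' @export
coefficient_of_variation <- function(values, na.rm = FALSE) {
  # package::function avoids a library() call inside package code.
  stats::sd(values, na.rm = na.rm) / mean(values, na.rm = na.rm)
}
```

Outside a package the #' lines are ordinary comments, so the function can be sourced and called directly, for example `coefficient_of_variation(c(2, 4, 6))`.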

## Testing

• Use testthat to write unit tests for R.
• Put unit tests for an R package in the tests/testthat directory.
• Put tests in files called test_group.R and call them test_something.
• Use test_dir to run tests from a particular directory that match a pattern.
• Write tests for data transformation steps as well as library functions.
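
A test file following these conventions could look something like the sketch below. It assumes the testthat package is installed; the function `celsius_to_kelvin` is a hypothetical stand-in for code under test, and in a real package it would live in the R directory while the test sat in a file such as tests/testthat/test_temperature.R (a made-up name).

```r
library(testthat)

# Hypothetical function under test, defined here so the sketch is self-contained.
celsius_to_kelvin <- function(celsius) {
  celsius + 273.15
}

# Each test_that() call groups related expectations under one description.
test_that("celsius_to_kelvin converts known values", {
  expect_equal(celsius_to_kelvin(0), 273.15)
  expect_equal(celsius_to_kelvin(-273.15), 0)
})
```

expect_equal compares numbers with a small tolerance, which suits floating-point results better than an exact comparison.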
