Introduction
Terms defined: NoSQL database, database, database management system (DBMS), relational database management system (RDBMS), table
Scope
- intended audience
- Rachel has a degree in cell biology and works in a research hospital doing cell assays.
- She uses Excel to analyze experimental data, but is frustrated each time she has to ask one of her colleagues to pull data from the hospital database.
- Rachel would like to understand how data is stored and managed.
- Her work schedule is unpredictable and highly variable, so she needs to be able to learn a bit at a time.
- prerequisites
- Basic tabular data analysis: filtering rows, aggregating within groups
- learning outcomes
- Explain the difference between a database and a database manager.
- Write SQL to select, filter, sort, group, and aggregate data.
- Define tables and insert, update, and delete records.
- Describe different types of join and write queries that use them to combine data.
- Use windowing functions to operate on adjacent rows.
- Explain what transactions are and write queries that roll back when constraints are violated.
- Explain what triggers are and write SQL to create them.
- Manipulate JSON data using SQL.
- Interact with a database using Python directly, from a Jupyter notebook, and via an ORM.
Setup
- See instructions on the home page.
Background Concepts
- A database is a collection of data that can be searched and retrieved
- A database management system (DBMS) is a program that manages a particular kind of database
- Each DBMS stores data in its own way
- SQLite stores each database in a single file
- PostgreSQL spreads information across many files for higher performance
- DBMS can be a library embedded in other programs (SQLite) or a server (PostgreSQL)
- A relational database management system (RDBMS) stores data in tables
and uses SQL for queries
- Unfortunately, every RDBMS has its own dialect of SQL
- There are also NoSQL databases like MongoDB that don't use tables
Connecting to a Database
-
The
penguinsdatabase contains two tablespenguinsis all the Palmer Penguins datalittle_penguinsis a subset used in our first few queries to keep output readable
-
Using DB Browser for SQLite:
- Open the application
File…Open Database- Select
penguins.dbfrom the download
Command Line
- On the command line:
sqlite3 db/penguins.db
- Not actually a query: starts an interactive session with the database in
db/penguins.db - Alternative: provide a single query on the command line
sqlite3 database "query" - Or put query in file and run
sqlite3 database < filename
To disconnect from a command-line database session, type Control-D or
.quiton a line of its own. You may need to type a semi-colon;to close any unfinished query before SQLite will recognize your attempt to escape.