Database Foundations
Introducing SQLite for persistent, queryable storage
Concepts
- A relational database organizes data into tables
of rows and columns
- SQL is the language for creating tables, inserting rows, and querying data
- Explain the difference between a relational database and a Python list of dataclasses
- A primary key uniquely identifies each row
not nullprevents empty values in required fields- Column types (
integer,real,text) constrain what values may be stored - Write a SQL CREATE TABLE statement for snail measurements with a primary key, not null constraints, and appropriate column types
- SQL injection: building a query by concatenating
user-supplied strings lets an attacker embed SQL commands in their input
- parameterized queries (using
?placeholders) prevent this by treating all supplied values as data, never as SQL - Show me an example of SQL injection and explain how parameterized queries with ? placeholders prevent it
- parameterized queries (using
- A context manager (
with conn:) wraps database operations in a transaction- If the block completes normally the transaction commits
- If it raises an exception the transaction rolls back, leaving the database unchanged
- Explain what a database transaction is and what happens when an exception is raised inside a
with conn:block
- SQLite stores the entire database in a single file on disk
- Easy to share, back up, and version-control
- Appropriate for a research tool with a single writer, not for a high-traffic production service
- What are the limitations of SQLite compared to a full database server, and when is it appropriate for a research project?
- A row factory (
conn.row_factory = sqlite3.Row) makes each row accessible by column name rather than by numeric index- What does setting conn.row_factory = sqlite3.Row do, and why is it more convenient than the default?
Why a database?
- What a relational database gives us over a list of dataclasses:
persistence, querying, and concurrent access
- Why would I use a database instead of storing data as a CSV file or a list of Python objects?
- When SQLite is enough and when it is not
- What are the signs that a project has outgrown SQLite and needs a different database?
Creating a schema
- A
create tablestatement for snail measurements- Write a SQL CREATE TABLE statement for snail measurement records with appropriate column types and constraints
- Primary keys,
not nullconstraints, and column types- Explain primary keys and not null constraints in SQL with a concrete example
- Prompting the LLM to write the schema and checking it for mistakes
- What mistakes should I look for when reviewing a SQL schema generated by an LLM?
Opening a connection with sqlite3
- The standard library
sqlite3module- Show me how to open a SQLite database connection and create a table using Python's sqlite3 module
- Context managers for transactions
- How do I use a context manager to wrap database operations in a transaction with Python's sqlite3 module?
- Row factory for dict-like access by column name
- How do I configure sqlite3 to return rows as dictionaries keyed by column name?
Basic SQL
select,insert,where, andorder by- Show me basic SELECT, INSERT, WHERE, and ORDER BY SQL statements for a table of snail measurements
- Writing queries as strings with
?placeholders to avoid SQL injection- Rewrite this SQL query to use ? placeholders instead of string concatenation
- Reading error messages when queries fail
- What do common sqlite3 error messages mean and how do I fix them?
Inserting and querying records
- A helper function that inserts a dataclass instance
- Write a Python function that inserts a snail measurement dataclass instance into a SQLite table using a parameterized query
- A helper function that queries and returns all rows
- Write a Python function that queries all rows from a SQLite table and returns them as a list of dataclass instances
- Verifying the data round-trips correctly
- How do I write a simple test to verify that data inserted into a SQLite database can be read back correctly?
Check for Understanding
What is SQL injection, and why does using ? placeholders prevent it?
SQL injection happens when user input is concatenated directly into a query string.
For example, if site comes from a form and the query is
f"select * from snails where site = '{site}'",
a user who enters '; drop table snails; -- can destroy the database.
Parameterized queries pass the value separately from the SQL text;
the database driver escapes or quotes it automatically,
so it is always treated as a literal value, never as SQL syntax.
What happens to the database if your Python script raises an exception inside a with conn: block?
The with conn: context manager calls conn.rollback() when the block exits due to an exception,
undoing any changes made during the block.
The database returns to the state it was in before the block started,
so no partial or corrupted data is written.
Why does SQLite use a single file rather than a running server process, and what does that mean for concurrent access?
SQLite is an embedded database: the library runs inside your application process and reads and writes a single file directly. There is no separate server to start, configure, or connect to over a network. The trade-off is concurrency: SQLite allows multiple simultaneous readers but only one writer at a time, and locks the file for the duration of each write transaction. For a research tool with one or a few users this is fine; for a high-traffic site it is not.
What is the difference between a primary key and a unique constraint?
Both prevent duplicate values in a column.
primary key additionally declares that the column is the canonical identifier for each row;
a table can have at most one primary key,
and it implicitly adds not null.
unique can appear on multiple columns in the same table and allows null values
(though the behavior of multiple nulls varies by database).
Exercises
Add a sites table
Prompt the LLM to add a second table for survey sites with columns for name, latitude, and longitude. Then prompt it to add a foreign key from the snail measurements table that references the sites table. Look up what a foreign key constraint does and verify that the LLM's SQL enforces it correctly.
Write a query yourself
Without prompting the LLM, write a SQL query that counts the number of measurements per site. Run it in the Python REPL against a database that has data loaded. Then prompt the LLM to write the same query and compare the two versions. Note any differences in style or correctness.
Inspect the database
Use the sqlite3 command-line tool (sqlite3 path/to/database.db) to open the database.
Run .tables to list tables and .schema snails to display the table definition.
Describe what the schema tells you about the constraints on the data.
Handle a migration
Add a new observer: text column to the existing snail measurements table using alter table.
Before you run it,
prompt the LLM what value existing rows will have for the new column.
Run the migration and verify the answer is correct.
Discuss whether not null would have been a reasonable constraint for this new column.