Logging and Auditing

Add structured logging using structlog
Old-fashioned logging generates human-readable text:

2025-03-13 14:30:21 INFO: File 'example.txt' uploaded successfully

Structured logging writes data objects:

{
  "timestamp": "2025-03-13T14:30:21.123Z",
  "level": "info",
  "event": "file_uploaded",
  "filename": "example.txt",
  "file_size_bytes": 12345,
  "content_type": "text/plain",
  "ip_address": "192.168.1.1",
  "user_agent": "Mozilla/5.0"
}

Benefits:
- Logs can be easily queried and analyzed by code (no fancy regular expressions)
- Every log event has the same fields
- More metadata can be included with each event
- Log entries can be rendered in different formats
Modify server.py
- Logger configuration: set up structlog with appropriate processors
- Context binding: add context data to loggers
- Event-based logging: use specific event names rather than free-form messages
- Exception handling: capture structured exception information
- Standard fields: include common fields like timestamps
Specific enhancements
- Log all file operations (upload, delete, listing)
- Add request metadata (IP address, user agent)
- Include ile metadata (size, content type)
- Provide detailed error information
- Create a log viewer page
Best practices
- Log specific events rather than messages
- Use field names consistently
- Add relevant metadata to each log entry (e.g., timestamps)
- Never log sensitive data like passwords, tokens, or personal information
- Match log levels to the significance of events
Something we shouldn't have done
- Store all log messages in a global variable called LOG to display in the viewer page
In a production application we would:
- Configure structlog to output to a logging service or aggregator rather than just printing to the console
- Store logs in a database or dedicated log storage system rather than in memory
- Implement log rotation and retention policies
We create a new bound logger in each request handler rather than reusing a global bound logger because:
- Each HTTP request has unique context data (IP address, user agent, etc.) that should be captured in logs. Creating a fresh bound logger ensures each request gets its own context.
- Flask handles multiple requests concurrently, and reusing a global bound logger could lead to context data from one request leaking into logs for another request. (We actually run this example in single-threaded mode so that it's safe to append things to the LOGS variable, but as noted above, that's a bad practice.)
We use Flask's before_request handler to automatically log all incoming requests
- This ensures consistent handling and formatting of log events

@app.before_request
def log_request():
    """Log all incoming requests."""
    if not request.path.startswith('/static'):
        logger.info("request", 
                  method=request.method,
                  path=request.path,
                  ip=request.remote_addr,
                  user_agent=request.headers.get('User-Agent'))