Simple Simulations
- Start with a few simple simulations to illustrate SimPy
- The framework relies on generators, which are briefly summarized in this appendix
A Regular Schedule
- Shae works on their hobby project for 50 minutes and then takes a 10-minute break
- Import
Environment from simpy
- Define a generator to simulate work
env.timeout(duration) creates an object that means "wait this long"
yield suspends the worker and gives this object to SimPy
- SimPy advances its clock (does not actually "wait")
T_WORK = 50
T_BREAK = 10
def worker(env):
while True:
print(f"start work at {env.now}")
yield env.timeout(T_WORK)
print(f"start break at {env.now}")
yield env.timeout(T_BREAK)
- To make this work:
- Create an
Environment
- Call
worker to create a generator object
- Does not execute the function (yet)
- Pass in the environment so the generator can call
env.timeout
- Tell the environment how long to run
T_MORNING = 4 * 60
if __name__ == "__main__":
env = Environment()
proc = worker(env)
env.process(proc)
env.run(until=T_MORNING)
print(f"done at {env.now}")
start work at 0
start break at 50
start work at 60
start break at 110
start work at 120
start break at 170
start work at 180
start break at 230
done at 240
- FIXME: diagram of execution
env.timeout() doesn't actually pause the process
- It creates an object telling the framework how long the process wants to pause
yield gives this object to the framework
- Co-operative concurrency
- Completely deterministic unless we deliberately introduce randomness
Introducing Randomness
- Shae starts at the same time but works for variable intervals
- So we have to decide how to model those intervals
- Uniform random distribution is unrealistic, but simple
T_MIN_WORK = 10
T_MAX_WORK = 50
def rand_work():
return random.uniform(T_MIN_WORK, T_MAX_WORK)
def worker(env):
while True:
print(f"start work at {env.now}")
yield env.timeout(rand_work()) # changed
print(f"start break at {env.now}")
yield env.timeout(T_BREAK)
start work at 0
start break at 26.67250326533211
start work at 36.67250326533211
start break at 54.79344793494629
start work at 64.7934479349463
start break at 104.87029408376178
start work at 114.87029408376178
start break at 136.94211951464072
start work at 146.94211951464072
start break at 193.03234415437063
start work at 203.03234415437063
done at 240
- Is this working correctly?
- Hard for us to make sense of the output
Monitoring
- Round the time to
PREC decimal places
- Record start and end events in a list of dictionaries
- Easy to convert to a dataframe
def worker(env, log):
while True:
log.append({"event": "start", "time": env.rnow})
yield env.timeout(rand_work())
log.append({"event": "end", "time": env.rnow})
yield env.timeout(T_BREAK)
- Move main body into a function
- Initialize random number generation for reproducibility
- Testing and debugging are very difficult otherwise
- Output a structured log as JSON
SEED = 12345
def main():
seed = int(sys.argv[1]) if len(sys.argv) > 1 else SEED
random.seed(seed)
env = Environment()
log = []
proc = worker(env, log)
env.process(proc)
env.run(until=T_MORNING)
json.dump(log, sys.stdout, indent=2)
[
{"event": "start", "time": 0},
{"event": "end", "time": 26.665},
…more events…
{"event": "end", "time": 237.149}
]
Visualization
- Could put the visualization in the simulation
- But may want to visualize the data in many different ways
import json
import plotly.express as px
import polars as pl
import sys
df = pl.from_dicts(json.load(sys.stdin)).with_columns(
pl.when(pl.col("event") == "start").then(1).otherwise(-1).alias("delta")
)
events = df.with_columns(
pl.col("delta").cum_sum().alias("state")
)
fig = px.line(events, x="time", y="state", line_shape="hv")
fig.update_layout(margin={"l": 0, "r": 0, "t": 0, "b": 0}).update_yaxes(tickvals=[0, 1])
fig.write_image(sys.argv[1])
- The visualization code is 2/3 the size of the simulation
- Understanding a simulation is as much work as building it
Managers and Programmers
- Shae's manager gives them work
- Need a process to generate tasks
- And a way for the manager to give them to Shae
- Use
simpy.Store to model a job queue
- First-in, first-out
- Infinite capacity (for now)
- FIXME: diagram
- Setup
T_JOB_ARRIVAL = (20, 30)
T_WORK = (10, 50)
PREC = 3
def rt(env):
return round(env.now, PREC)
def rand_job_arrival():
return random.uniform(*T_JOB_ARRIVAL)
def rand_work():
return random.uniform(*T_WORK)
- Manager puts things in queue
- As with
env.timeout,
queue.put creates an object that we yield to the framework
- Then wait a random interval before creating the next job
- Use
itertools.count and next() to generate a sequence of integer job IDs
def manager(env, queue, log):
job_id = count()
while True:
log.append({"time": rt(env), "id": "manager", "event": "create job"})
yield queue.put(next(job_id))
yield env.timeout(rand_job_arrival())
- Programmer takes jobs from the queue in order
- The logging code makes this harder to read
def programmer(env, queue, log):
while True:
log.append({"time": rt(env), "id": "worker", "event": "start wait"})
job = yield queue.get()
log.append({"time": rt(env), "id": "worker", "event": "start work"})
yield env.timeout(rand_work())
log.append({"time": rt(env), "id": "worker", "event": "end work"})
- Main program sets things up, runs the simulation, and displays the log
T_SIM = 100
SEED = 12345
def main():
seed = int(sys.argv[1]) if len(sys.argv) > 1 else SEED
random.seed(seed)
env = Environment()
queue = Store(env)
log = []
env.process(manager(env, queue, log))
env.process(programmer(env, queue, log))
env.run(until=T_SIM)
json.dump(log, sys.stdout, indent=2)
[
{"time": 0, "id": "manager", "event": "create job"},
{"time": 0, "id": "worker", "event": "start wait"},
{"time": 0, "id": "worker", "event": "start work"},
{"time": 10.407, "id": "worker", "event": "end work"},
{"time": 10.407, "id": "worker", "event": "start wait"},
…more events…
{"time": 92.57, "id": "worker", "event": "start wait"}
]
Interesting Questions
- Throughput: how many jobs get done per unit time?
- Delay: how long from job creation to job completion?
- Utilization: how busy is the programmer?
- We need better data collection
- So define a class
Job
- Every job has a unique ID
Job stores the log of job-related events
class Job:
_id = count()
_log = []
def __init__(self, env):
self.env = env
self.id = next(Job._id)
self.log("created")
def log(self, message):
Job._log.append({"time": rt(self.env), "id": self.id, "event": message})
def manager(env, queue):
while True:
yield queue.put(Job())
yield env.timeout(rand_job_arrival())
def programmer(env, queue):
while True:
job = yield queue.get()
job.log("start")
yield env.timeout(rand_work())
job.log("end")
- Save the parameters in the output for reproducibility
def main():
…as before…
params = {
"seed": seed,
"t_sim": t_sim,
"t_job_arrival": T_JOB_ARRIVAL,
"t_work": T_WORK,
}
result = {
"params": params,
"tasks": Job._log,
}
json.dump(result, sys.stdout, indent=2)
| id |
created |
start |
end |
| 0 |
0.0 |
0.0 |
10.407 |
| 1 |
18.332 |
18.332 |
40.278 |
| 2 |
44.837 |
44.837 |
62.583 |
| 3 |
62.205 |
62.583 |
79.05 |
| 4 |
83.525 |
83.525 |
null |
| 5 |
96.01 |
null |
null |
- Throughput: number of completed tasks over simulation time
- Delay: average of end time minus creation time for completed tasks
- Programmer utilization
- Drop tasks that haven't been started
- Fill
null end values with simulation time
- Sum the differences between end and start, and divide by simulation time
| metric |
value |
| throughput |
0.040 |
| delay |
16.736 |
| utilization |
0.830 |
- But this is only for 100 ticks
- How do these figures change as we run the simulation for longer periods?
| duration |
throughput |
delay |
utilization |
| 100 |
0.040 |
16.736 |
0.830 |
| 1000 |
0.038 |
76.904 |
0.961 |
| 10000 |
0.034 |
1537.582 |
0.996 |
| 100000 |
0.033 |
16412.712 |
1.000 |
- Throughput and utilization stabilize, delay steadily increases
- The manager is creating work faster than the programmer can do it
- So the programmer is busy all the time
- Which is the limit on throughput