Exploring Scenarios

Terms defined: transition probability

Smoothing Over Multiple Runs

We have been showing results from individual runs
What do results look like when averaged over many runs?
Plot job ages vs. time averaged over 1, 10, 100, and 1000 simulations

ages from 1 run — Figure 1: Ages From 1 Run

ages smoothed over 10 runs — Figure 2: Ages Smoothed Over 10 Runs

ages smoothed over 100 runs — Figure 3: Ages Smoothed Over 100 Runs

ages smoothed over 1000 runs — Figure 4: Ages Smoothed Over 1000 Runs

Work is piling up
- Toward an asymptote or just slowing down?
We only get to experience one curve in real life

Choosing Jobs

Four policies:
- Oldest job first (same as regular queue)
- Newest job first
- Longest job first
- Shortest job first
Implement by:
- Replacing Store with PriorityStore
- Adding __lt__ to Job for comparison
- Actual comparison depends on policy

class Params:
    # …as before…
    policy: str = "shortest"

class Job(Recorder):
    # …as before…
    def __lt__(self, other):
        match self.sim.params.policy:
            case "oldest":
                return self.t_create < other.t_create
            case "newest":
                return other.t_create < self.t_create
            case "shortest":
                return self.duration < other.duration
            case "longest":
                return other.duration < self.duration
            case _:
                assert False, f"unknown policy {self.sim.params.policy}"

Look at effect on backlog over time

job backlog over time — Figure 5: Job backlog vs. time

job age over time — Figure 6: Age of jobs in queue vs. time

Table 1: Throughput
policy	t_sim	num_jobs	throughput
longest	1000	485	0.48
newest	1000	457	0.46
oldest	1000	497	0.5
shortest	1000	509	0.51

Table 2: Utilization
policy	t_sim	total_work	utilization
oldest	1000	934.48	0.93
shortest	1000	961.69	0.96
newest	1000	888.03	0.89
longest	1000	974.04	0.97

Multiple Workers Redoing Work

Our models so far have assumed that when a job is done, it's done
In real life, testing often reveals bugs that need rework
Start by modeling with a second queue and a group of testers

class Params:
    # …as before…
    n_tester: int = 1
    p_rework: float = 0.5

class Simulation(Environment):
    def __init__(self):
        # …as before…
        self.test_queue = None

    def simulate(self):
        # …as before…
        self.test_queue = Store(self)
        for _ in range(self.params.n_tester):
            self.process(Tester(self).run())

class Tester(Recorder):
    def __init__(self, sim):
        super().__init__(sim)
        self.t_work = 0

    def run(self):
        while True:
            job = yield self.sim.test_queue.get()
            yield self.sim.timeout(job.duration)
            if self.sim.rand_rework():
                yield self.sim.code_queue.put(job)
            else:
                job.t_complete = self.sim.now

Any tester can test any job
All jobs needing rework go back in the same queue as new work
- And are handled in arrival order, i.e., not given priority
But this isn't realistic
Give each Coder its own queue

class Coder(Recorder):
    def __init__(self, sim):
        # …as before…
        self.queue = Store(self.sim)

Have testers give work back to the coder who did the work
- Need to add a coder_id field to jobs to keep track of this

class Tester(Recorder):
    def run(self):
        while True:
            job = yield self.sim.test_queue.get()
            assert job.coder_id is not None
            yield self.sim.timeout(job.duration)
            if self.sim.rand_rework():
                yield self.sim.coders[job.coder_id].queue.put(job)
            else:
                job.t_complete = self.sim.now

Now the hard part: coders selecting jobs
Put it in a method of its own
And use yield from to yield its results back to SimPy

class Coder(Recorder):
    def run(self):
        while True:
            job = yield from self.get()
            yield self.sim.timeout(job.duration)
            yield self.sim.test_queue.put(job)

So how does Coder.get(…) work?
- Create but do not immediately yield a request for an item from each queue
- yield the combination of those items using | (a short form for simpy.AnyOf)
- result is dictionary-like object whose keys are the requests and whose values are the items that we got
- If we got something from our own queue, take that
- If not, take what we got from the shared queue
Important: we have to cancel the other request or we can't make another one from that queue
- Yes, this took a while to figure out
- And yes, it's safe to cancel a request that wasn't satisfied

    def get(self):
        new_req = self.sim.code_queue.get()
        rework_req = self.queue.get()
        result = yield (new_req | rework_req)
        if (len(result.events) == 2) or (rework_req in result):
            new_req.cancel()
            job = result[rework_req]
            assert job.coder_id == self.id
        else:
            rework_req.cancel()
            job = result[new_req]
            assert job.coder_id is None
            job.coder_id = self.id
        return job

We can now build a graph showing the transition probabilities for each state that a job might be in
- Introduce a state incomplete for jobs that aren't finished by the end of the simulation

transition probability graph — Figure 7: Transition Probability Graph