Monday, July 27, 2015

Week 7: data grunginess despite general progress

DSSG 2015 Paratransit group is making progress! We plugging away, making tons of Python scrips and iPython Notebooks for testing our various functions. The workflow is as follows (click for better resolution):



The main issue for the next week or so will be to assemble all of the individual layers into a cohesive, as-bug-free-as-possible main.py script that takes a handful of command-line arguments for parameters such as broken bus run ID, break-down time, and AWS access keys for acquiring the real-time bus scheduling data.

What I've realized through all of this: I'm glad we have a contact at King County Metro to explain the column headers from the rider data to us. The data is really all over the place, from missing latitude/longitude input to an "estimated time of arrival" column listing the time of day as "115000" seconds, i.e. hour ~32 of a day. Previously, we've just been eliminating these rides from the roster, as they're signs of outdated data capturing methods that KCM is in the process of eliminating. Currently, a big issue we're having is working with the 15-minute updated text files that we're receiving from KCM: the columns are stored according to variable space counts, and several columns that were previously combined into one are now space-delimited. We'll need to first organize these columns into a more useful .csv-type format and then QC the real-time data as we have been doing.

To-do: Given the insertion of passenger p on to bus X requiring the minimum additional travel time between any two feasible points (from a time-windows perspective) on bus X's original schedule, bus X will subsequently miss n-t additional time windows. Here, n is the number of missed time windows after p has been serviced by bus X, and t is the number of time windows that the bus is already scheduled to miss according to the quasi-real-time bus schedule stream. We know how much additional time that inserting p onto X will take according to the OSRM routing API, and therefore we can find the additional cost on a per service hour basis. That's great, but we still need to balance this pure costs against n-t, and the scale by which the n-t time windows are missed. For example, suppose inserting passenger p on to X will result in missing only one time window by 1500 seconds (25 minutes), but that inserting passenger p on to Y will result in 4 missed time windows that are only missed by about 200 seconds each; which one is better? That's up to us!


Sunday, July 12, 2015

The algorithm: brute force, intern style

Coming into week 5, I'm pretty psyched.

Tomorrow we have a meeting with Dr. Randy LeVeque, UW Applied Math legend/Professor, and Wednesday he's leading the "Reproducibility Workshop" where we're spending a lot of time on file organization via Git - maybe I'll finally start climbing up this Git learning curve.

Last Friday, July 10th, we had a meeting with Matthew, our primary contact at King County Metro Access operations. Fortunately, our interpretation of the optimization problem was correct. The situation is still this: given an unscheduled request (or set of unscheduled requests), find the most appropriate run on which we can squeeze the new passenger. Compare the additional cost/damage to current schedule to the cost of sending a DEB (dispatched extra boarding?) bus out, or fulfill the ride with a taxi (given the passenger can even ride in a taxi).

We've done a decent amount of literature review (shout-out to Dr. Carl Henrik Hall for his 2013 paper "Improving Paratransit Scheduling Using Ruin and Recreate Methods" being the most relevant paper to our problem thus far). The best method for inserting unscheduled requests is simply called "best insertion," and it's mentioned in Hall's aforementioned paper:


                                             Lol. Deceptive use of the word "simply," eh?


It's just a brute-force algorithm, where you test Y/N "can this new passenger fit on bus K between nodes i and given everyone's time window constraints and the Access bus's capacity constraints (12 ambulatory spots or 3 wheelchairs)?"

On a regular weekday, the peak number of Access bus routes in operation is ~250. If a bus breaks down at a peak ridership time, are we going to test the insertion of b stranded passengers onto 250 routes? Well, hopefully not. My job for tomorrow is to write a reasonable distance cut-off function that will tell us which buses are simply too far away to be able to service the stranded passengers by the end of each passenger's acceptable pick-up or drop-off window.

Over all, we're planning on using the OSRM (Open Street Routing Machine) API to obtain the estimated travel time to leave normally-scheduled node i, arrive at unscheduled request node h, and make it back to normally-scheduled node j by the end of j's time window. If this is the case, then it's feasible to pick up unscheduled request at node h. We'll get a list of these feasible pick-up routes. The next question obviously becomes whether or not we can also feasibly drop off the unscheduled request at their drop-off location without breaking subsequent time windows. One of my top priorities is to minimize the number of API requests our solution will have to make, querying travel times between points. I think I've got a good methodology...

Iterating over all of the unscheduled requests, we'll get a list of wholly feasible bus routes that can accommodate each unscheduled request, and then we'll have to rank each feasible re-route. The ranking might get tricky, because technically all feasible routes will only extend each original bus route schedule by 2(time-window-size)-minutes, assuming that no passenger's time window constraints are broken. But, I anticipate that we might have to change our methodology to let feasible re-routes break time window constraints, in which case the ranking criterion would obviously rank feasible re-route according to who misses the fewest passenger's time windows, and by how much those windows are missed.

Then, there's the whole question of how to put this application into a feasible package... Django anyone?