## Category Archive

The following is a list of all entries from the Performance category.

## Scheduling algorithm in GAMS, AMPL, MFS and lp_solve

In researching scheduling problems, I came across Nathan Brixius’ blog post which utilised Microsoft solver foundation for a critical path project scheduling problem. This was a step in the right direction, but my problem was resource constrained without concern for the order in which tasks finished. After some further research I found Erwin Kalvelagen from the Amsterdam Optimization Modeling Group who had a relevant post on a conference scheduling problem that used GAMS to model a solution. My goal was to port this to MSF, which was made easier with Ewrin’s great tutorial on OML – the modelling language behind Microsoft’s Solver Foundation.

I started by porting the GAMS solution to AMPL – also a modelling language for linear optimization problems, and closer to MSF as far as syntax is concerned. One advantage GAMS has over AMPL (and indeed MSF) is its richer language syntax and support for sparse matrix’s – see this thread for a discussion on more differences. However AMPL is clean and so was a first step.

AMPL requres two files

1) the data file which defines ‘sets’ and parameter values associated with the domain:

set TASKS := T1 T2 T3 T4 T5; set PEOPLE := P1 P2 P3 P4; set SLOTS := S1 S2; param pref (tr): T1 T2 T3 T4 T5 := P1 1 0 2 0 0 P2 0 1 2 0 0 P3 0 0 1 2 0 P4 0 0 0 1 2 ; param: ntasks := T1 1 T2 1 T3 1 T4 1 T5 1 ; param: minp maxp := T1 1 5 T2 1 5 T3 1 5 T4 1 5 T5 1 5 ; param: nrooms := S1 5 S2 5 ;

2) The model file which declares constaints, and the objective function:

set TASKS;

set PEOPLE;

set SLOTS;

param pref {TASKS,PEOPLE} >= 0;

param nrooms {SLOTS} >= 0;

param ntasks {TASKS} >= 0;

param minp {TASKS} >= 0;

param maxp {i in TASKS} >= minp[i];

var x {t in TASKS, p in PEOPLE, s in SLOTS} binary;

var xts {t in TASKS, s in SLOTS} binary;

var xtp {t in TASKS, p in PEOPLE} binary;

subject to rule1 {t in TASKS, s in SLOTS}:

sum {p in PEOPLE} x[t,p,s] >= xts[t,s];

subject to rule2 {t in TASKS, p in PEOPLE, s in SLOTS}:

x[t,p,s] <= xts[t,s];
subject to rule3 {t in TASKS, p in PEOPLE}:
sum {s in SLOTS} x[t,p,s] == xtp[t,p];
subject to rule4 {t in TASKS}:
sum {s in SLOTS} xts[t,s] == ntasks[t];
subject to rule5 {t in TASKS}:
minp[t] <= sum {p in PEOPLE} xtp[t,p] <= maxp[t];
subject to rule6 {s in SLOTS}:
sum {t in TASKS} xts[t,s] <= nrooms[s];
subject to rule7 {p in PEOPLE, s in SLOTS}:
sum {t in TASKS} x[t,p,s] == 1;
maximize z: sum {t in TASKS, p in PEOPLE} pref[t,p] * xtp[t,p];
[/sourcecode]
This can be executed by running following commands inside ampl.exe console:
[sourcecode language='css']
ampl: model sched.mod;
ampl: data sched.dat;
ampl: solve;
MINOS 5.51: ignoring integrality of 70 variables
MINOS 5.51: optimal solution found.
42 iterations, objective 12
ampl: display x;
x [*,*,S1]
: P1 P2 P3 P4 :=
T1 1 0 0 0
T2 8.01124e-18 1 -8.01124e-18 0
T3 1.13258e-16 1.13258e-16 1.13258e-16 0
T4 0 0 1 1
T5 0 0 0 3.08286e-16
[*,*,S2]
: P1 P2 P3 P4 :=
T1 1.40275e-16 -1.40275e-16 1.40275e-16 0
T2 0 2.2428e-16 -8.40105e-26 0
T3 1 1 1 0
T4 -2.21125e-16 0 2.21125e-16 2.21125e-16
T5 2.89604e-16 0 -2.78275e-16 1
;
[/sourcecode]
Although in this case we are solving a binary linear optimization problem, AMPL treats all variables as double values - so you can interpret the small numbers in results as FALSE, and the 1's as TRUE. The 3d matrix x then represents the cross section of person(p1-p4) attending task(t1-t5) during slot (s1,s2). Which can more simply be represented as the following Schedule:

s1 | s2 | |

p1 | t1 | t3 |

p2 | t2 | t3 |

p3 | t4 | t3 |

p4 | t4 | t5 |

Now we are ready to port this same solution to MFS. Using the Excel 2007 foundation solver add-in, all data is defined as region linked to cells on a sheet, and the model is declare in OML. The syntax is similar defining the ‘sets’ and ‘decisions’ (variables) with constraints using Foreach to represent a sum.

Model[

Parameters[Sets,People,Tasks,Slots],

Parameters[Integers,NTasks[Tasks],MinPeople[Tasks],MaxPeople[Tasks],MaxRooms[Slots],Pref[People,Tasks]],

Decisions[Integers[0,1],x[Tasks, People, Slots], xts[Tasks, Slots], xtp[Tasks, People]],

Decisions[Reals,z],

Constraints[

// rule 1: x(t,p,s)=0 for all p => xts(t,s) = 0

Foreach[{t,Tasks}, {s,Slots}, Sum[{p,People},x[t,p,s]] >= xts[t,s]],

// rule 2: any x(t,p,s)=1 => xts(t,s) = 1

Foreach[{t,Tasks}, {p,People}, {s,Slots}, x[t,p,s] <= xts[t,s]],
// rule 3: p visits talk t in any s
Foreach[{t,Tasks}, {p,People}, Sum[{s,Slots},x[t,p,s]] == xtp[t,p]],
// rule 4: each talk happens exactly ntalk times
Foreach[{t,Tasks}, Sum[{s,Slots}, xts[t,s]] == NTasks[t]],
// rule 5: task must have between min&max people
Foreach[{t,Tasks}, Sum[{p,People}, xtp[t,p]] <= MaxPeople[t]],
Foreach[{t,Tasks}, Sum[{p,People}, xtp[t,p]] >= MinPeople[t]],

// rule 6: timeslot can only hold up to 5 talks

Foreach[{s,Slots}, Sum[{t,Tasks}, xts[t,s]] <= MaxRooms[s]],
// rule 7: p can only visit one talk in each time period
Foreach[{p,People}, {s,Slots}, Sum[{t,Tasks}, x[t,p,s]] == 1],
z == Sum[{t,Tasks},{p,People},Pref[p,t]*xtp[t,p]]
],
Goals[Maximize[z]]
]
[/sourcecode]
The following could also be solved directly in C# code, which will be an exercise for another post. I did however take the solution one more step further by creating a lp_solve implementation in the form of an '.lp' file which can be solved through the lp_solve IDE. This format is much more verbose as it doesn’t abstract the data, instead deals with the 70 binary variables for each cell of each matrix (x, xtp, and xts). This simple solution is solved in 20ms – faster then the other higher level libraries.

Download the source for solutions in GAMS, AMPL, MSF /w excel, and lp_solve.

## Cloudera building a business around open source Map Reduce

The heavy hitting ex-executives behind start up CloudEra are banking on a business based around Hadoop, the open source Map Reduce implementation with a distribution capable of running on Amazon’s EC2. Google is credited with popularising (inventing) Map Reduce and has been tuning its own implementation for many years. It gave insights into the origin and future research direction in a round table video last year.

Increasingly companies need to make sense of Terabytes or even Petabytes of data. This information is stored across many machines on many disks, and needs distributed algorithms for sifting through the data in any reasonable time. This is where Map-Reduce comes in.

Interestingly Microsoft has taken a step back from this direction when with deciding that its SDS offering should support standard ‘relational’ features, in effect turning the product into a hosted SQL Server cloud.

It has however been active in this research field. It released its functional programming language F# and it runs its ad serving on Dryad – a distributed execution software engine. DryadLINQ combines the power of this engine, with the simplicity of LINQ by creating a SQL-like execution plan for distributed processing, very cool!

Large scale distributed processing software typically runs on many low grade Linux servers running open source software so that licensing costs are kept low. However with the army of MS developers out there, there are companies springing up to provide software to make the most out of idle cycles on Windows boxes around the network. Manjrasoft a recent graduate from Melbourne University’s GridBus laboratory have released an Alpha of their Aneka software – a .NET Map Reduce implementation.

## i4o (Indexed LINQ) adds support for POCO

Aaron Erickson has released a new version of his i4o (Indexed LINQ). This great open source library dynamically hooks into the statically typed Expression Trees generated when querying with LINQ eg:

var nonIndexedQuery = from item in list where item.DemoStringNonIndexed == nonIndexedString select item; var resultNonIndexed = nonIndexedQuery.ToList();

The library now supports indexing Plain Old C# objects (POCOs) such as *PocoClass* in the following example. Attributes can still be applied to properties, but the new AddIndex method allows indexing objects that have no knowledge of i4o.

var sampleSize = 1000000;

var list = new IndexableCollection

for (int i = 0; i < sampleSize; i++)
list.Add(new PocoClass() {
DemoStringNonIndexed = Guid.NewGuid().ToString(),
DemoStringIndexed = Guid.NewGuid().ToString()
} );
list.AddIndex("DemoStringIndexed");
[/sourcecode]
The biggest hit when using i4o is loading the data into the collection, then applying the index. After that querying is markedly improved, in this example 85x faster then traditional select.

load 3725ms index 3754ms nonIndexedStopwatch 175ms indexedStopwatch 2ms

This makes this an ideal library to be used in caching scenarios. I wonder if it could be plugged into Microsoft’s new velocity distributed cache!