[Pipet Devel] BL->PL design

Sat Sep 9 07:16:42 EDT 2000

Hi All,

Here is the IRC log between Jean-Marc and me. It's about the BL->PL design. Some
info might be missing because a lot was discussed via email. JM offered to write
a document about the design that the two of us are happy with.

<jarl>  ok, I think we agree even on the bandwidth (or rather network delay) issue.
<jm> My point is that the user will know where the network bottlenecks will be,
the BL won't
<jarl> somewhere there is a turning point though: when, say, 200% RAM is requested,
the scheduler tends toward de-localisation
<jm> What do you mean?
<jarl> localisation should be the 1st thing the scheduler should try..
<jarl> de-localisation = distribute node to remote machine
<jarl> but I would like to put more logic into the scheduler than just 'do what
the xml says'
<jm> Well, there are some issues that don't distribute well... if you consider a
network which will be network-bound and not CPU(or RAM)-bound
<jarl> jep..
<jarl> maybe we can weight total network execution time?
<jm> Finding a split that minimizes the network links is a bit like solving the
traveling salesman problem.
<jarl> traveling prob <- that's why I promote this non-localised method
<jm> It's already an NP problem... if you try to distribute it, it just won't
work.
<jarl> but seems we're agreed on most... maybe only details now :)
<jarl> or do you see more issues not ok?
<jm> Well, I think deciding on how the splitting will be done is an important
issue.
<jarl> :)
<jarl> but I think you have what you need, or what are you missing?
<jarl> or see an obstacle?
<jm> what do you mean?
<jarl> the monitoring, the naming, the corba stuff
<jarl> hmm.. what more had been discussed?
<jarl> the scheduling... it's being optional
<jarl> you still see a need for a 'central BL' mechanism?
<jm> By scheduling, you mean decide which node runs where, right (I called that
splitting)?
<jarl> jep, splitting = scheduling
<jm> Yes, I still think we really need a central scheduler.
<jarl> darn :)
<jarl> why?
<jm> It will already be hard enough to have a good scheduler... I don't see have
we can make a good, distributed scheduler
<jm> ...sorry "HOW we can make..."
<jarl> hmm..
<jm> the optimization has to be done globally...
<jm> You cannot just the BL's take what they want.
<jarl> I think it can be done
<jm> oops... "just LET the BL's take..."
<jm> not if you take into account the network. even if we assume all outputs
have the same size
<jm> This is equivalent to "solve the traveling salesman problem with
distributed machines that don't talk to each other"
<jarl> kinda hard to explain why I think very simple rules will do the trick
<jarl> I'll link nodes (their execution binary or xml text) to execution times..
when the local machine is free it will go there..
<jarl> such rules
<jarl> very simple rules that have local impact
<jarl> no 'all-describing' mathematics
<jm> Can you explain me how?
<jm> forget my last line, it was supposed to go somewhere else!
<jarl> it won't be perfect, but it will fluctuate around the optimum once you
introduce enough weighted values into the algorithm
<jarl> execution time
<jarl> ram load
<jarl> cpu load
<jarl> bandwidth
<jarl> all such issues are simply calculated with constant factors and indicate
where the node runs.
<jarl> local execution having the biggest multiplier
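
A minimal sketch of the kind of local, weighted rule jarl describes above, assuming each candidate machine can report a few load figures. The names (`MachineStats`, `placementScore`, `pickMachine`) and every constant factor are illustrative assumptions, not Piper code; here "biggest multiplier" is modelled as the strongest discount on a lower-is-better score.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-machine figures for one node; all fields are assumptions.
struct MachineStats {
    double execSeconds;   // estimated execution time of the node on this machine
    double ramLoad;       // requested RAM / available RAM (2.0 means 200%)
    double cpuLoad;       // 0.0 (idle) .. 1.0 (fully busy)
    double bandwidthCost; // estimated seconds spent moving data to this machine
    bool   isLocal;       // true for the machine that received the network
};

// Lower is better. The constant factors are made up for illustration.
double placementScore(const MachineStats& m) {
    const double kExec = 1.0, kRam = 2.0, kCpu = 2.0, kNet = 1.5;
    const double kLocalFactor = 0.5;  // local execution gets the strongest preference
    const double score = kExec * m.execSeconds + kRam * m.ramLoad
                       + kCpu * m.cpuLoad + kNet * m.bandwidthCost;
    return m.isLocal ? score * kLocalFactor : score;
}

// Returns the index of the machine with the lowest score.
std::size_t pickMachine(const std::vector<MachineStats>& machines) {
    assert(!machines.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < machines.size(); ++i)
        if (placementScore(machines[i]) < placementScore(machines[best]))
            best = i;
    return best;
}
```

With these made-up factors a lightly loaded local machine wins, while a local machine asked for 200% of its RAM loses to a remote one, which is the "turning point" mentioned earlier in the log.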
<jm> I don't understand your objection to running the scheduler locally... Or
could we start with a local scheduler and when (or if) we need a distributed
one, we can write it?
<jarl> sure
<jarl> but I think this is about usage of Piper; there should be some options in
the UI for defining where and how you want to run
<jm> This is how I see the optimal (though not necessarily practical, but still)
scheduler:
<jm> (I'm assuming that all required resources are known)
<jm> You first need to have a function that, given the node
distribution (scheduling), returns how long it would take to run.
<jarl> based on history?
<jm> Then you use an optimization technique (most likely genetic algorithms) to
find out where the global minimum of the function is, i.e. what's the scheduling
that minimizes the total time it takes to run the "program"
<jm> This can only be done if the "evaluation function" knows where each node is
located.
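
A small sketch of the two pieces jm outlines above: an evaluation function that estimates the run time of a given node distribution, and a global search for its minimum. The cost model (machines run in parallel, every link that crosses machines adds a fixed transfer time) and all names are assumptions made for illustration. The search is a plain exhaustive enumeration rather than the genetic algorithm jm mentions, so it is only usable for tiny networks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Edge { std::size_t from, to; double transferSeconds; };

struct Problem {
    std::vector<double> nodeWork;      // work units needed by each node
    std::vector<double> machineSpeed;  // work units per second of each machine
    std::vector<Edge>   edges;         // data links between nodes
};

// Evaluation function: assignment[i] is the machine that runs node i.
// Assumed model: busiest machine's compute time plus all cross-machine transfers.
double estimateTime(const Problem& p, const std::vector<std::size_t>& assignment) {
    std::vector<double> busy(p.machineSpeed.size(), 0.0);
    for (std::size_t n = 0; n < p.nodeWork.size(); ++n)
        busy[assignment[n]] += p.nodeWork[n] / p.machineSpeed[assignment[n]];
    double total = *std::max_element(busy.begin(), busy.end());
    for (const Edge& e : p.edges)
        if (assignment[e.from] != assignment[e.to])
            total += e.transferSeconds;
    return total;
}

// Exhaustive search over every assignment (machines^nodes candidates).
std::vector<std::size_t> bestAssignment(const Problem& p) {
    assert(!p.machineSpeed.empty());
    const std::size_t machines = p.machineSpeed.size();
    std::vector<std::size_t> current(p.nodeWork.size(), 0), best = current;
    double bestTime = std::numeric_limits<double>::infinity();
    while (true) {
        const double t = estimateTime(p, current);
        if (t < bestTime) { bestTime = t; best = current; }
        // Advance "current" like an odometer counting in base `machines`.
        std::size_t i = 0;
        while (i < current.size() && ++current[i] == machines) current[i++] = 0;
        if (i == current.size()) break;  // every digit wrapped: all candidates seen
    }
    return best;
}
```

Because `estimateTime` needs the location of every node at once, it has to run where the whole assignment is known, which is the point jm makes about a central scheduler.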
<jarl> ok. so the BL that gets the xml data uploaded from the DL should do ALL
the scheduling and then distribute?
<jm> Yes. That's what I think.
<jarl> needs much cpu power..
<jarl> needs uploading of resource usage back to central host?
<jm> That's the only way to solve the problem. Not all problems distribute
well... otherwise, all the super-computers would have been replaced by Beowulf
clusters...
<jarl> I think GP could do well, but not really real time.
<jm> not the resource usage, but the resources available on each system (or
maybe we're not talking about the same thing)
<jm> Well, the genetic algorithm part could distribute some crunching to the
other machines if it's that complicated... but the problem itself is resolved
locally.
<jm> Also, I think that if the system is too complex for a local scheduler to
run fast, it will totally confuse a distributed scheduler anyway.
<jarl> I think the question has changed a bit: what will we do 1st. I must admit
this GP distributor is quite nice, but I hope to see this rule-set based
version running too
<jm> I say we start with two modes for now: 1) the user does the scheduling and
2) local scheduling... later we can try a distributed one if you really like.

<jm> rule based scheduling... how?
<jarl> confuse a distributed scheduler? all confusion is taken care of in
distribution of next network, so it will 'stabilize' around some situation
<jm> Confuse in the sense that the "solution" found by a distributed scheduler
will be so non-optimal...
<jm> ...that you might as well distribute at random
<jarl> hehe
<jarl> speak again once it's running :)
<jarl> ok, but at first we won't have a GP scheduler, so maybe we should 1st
work on the 100% defined version
<jm> 100% defined version?
<jarl> and when all the parts communicate we'll work on a more sophisticated one
<jm> I said the GP scheduler would be the optimal scheduler, but of course we
shouldn't start with that.
<jarl> 100% defined <- all nodes have a fixed location\machine
<jm> OK, 100%-defined first...
<jm> I think the best system would be an interactive scheduler. One that talks
to the UI and helps the user distribute.
<jarl> and also ok with this (very simple) PL main Corba thinggy?
<jarl> talks <- yer :)
<jm> The UI could say: "I think you're putting too much stuff on machine XYZ".
<jarl> would be very very sweet
<jarl> is this voice recog ready for such?
<jm> This would give you the real optimal scheduler. (remember that the GP thing
was optimal if the system knew everything about required resources... the user
already has a better idea of what nodes do)
<jarl> ic
<jm> ...talk in the weak sense... the interactive scheduler communicates with
the UI
<jm> Do we agree on most of the system?
<jarl> think so
<jarl> for a while at least :-)
<jarl> it's just about what goes 1st I think
<jm> about the PL, did you agree if we called it Baby-BL and that it would link
to the PL? (in my e-mail)
<jarl> the part I called PL main Corba thinggy?
<jm> Something like that...
<jarl> ok, as long as you agree with using corba between BL and PL\BabyBL
<jm> Sure... the way the BL communicates with its other parts is up to you. I
don't even know how to write an ftp client, so I'm not going to tell you what to
do for this part!
<jarl> L)
<jarl> but I'll need your input for this baby BL.
<jm> ...You may not like the naming... but the local scheduler is actually what
I was calling grandma-BL (after all, it's responsible for the brokering)
<jm> "I'll need your input for this baby BL" what do you mean?
<jarl> pointing me at the entry point of the Overflow library
<jarl> how it can be wrapped best
<jm> Sure, I'll help. but it's quite easy to understand.
<jarl> ok
<jm> Do you have the Overflow (PL) code downloaded?
<jarl> it's something for later.. I'm currently busy working on the C++ port of
BL
<jarl> downloaded <- yes
<jm> It's as simple as calling:
<jm> UIDocument *doc = new UIDocument(argv[1]);
<jm> doc->load();
<jm> doc->run(param);
<jm> This is the actual code in batchflow.cc that starts the processing...
<jarl> this run, is this only called inside the last node?
<jarl> or is a UIDocument a complete network?
<jm> run calls getOutput() on the last node... and everything starts...
<jm> a UIDocument is the equivalent of an .m file in matlab.
<jm> However, when distributing, the big document will be split into several
smaller UIDocuments. Each one will be sent to a Baby-BL
<jarl> yes
<jarl> does each UIDocument need some special nodes?
<jm> Not in Overflow... but for Piper, we will need special nodes for all the
inputs of a (split) UIDocument. This special node will call a Baby-BL
function to get the result from another part of the program.
<jm> as in my A->B->C example, when we decide that B->C goes on a different
machine than A, we need to "rewrite" the part as S->B->C, where S is a special
node that will ask the BL to fetch the result from A.
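
The A->B->C rewrite above can be sketched as follows, assuming the network is a simple linear chain and each node already has a machine assigned. `SubDocument`, `splitChain` and the `"S(A)"` placeholder naming are hypothetical; a real split would operate on UIDocument objects rather than on strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One fragment of the original document, destined for one Baby-BL.
struct SubDocument {
    std::string machine;
    std::vector<std::string> nodes;  // in execution order
};

// chain[i] runs on machineOf[i]. Whenever the machine changes, a new fragment
// starts with a special input node "S(<upstream node>)" standing in for the
// result that has to be fetched from the previous fragment.
std::vector<SubDocument> splitChain(const std::vector<std::string>& chain,
                                    const std::vector<std::string>& machineOf) {
    assert(chain.size() == machineOf.size());
    std::vector<SubDocument> parts;
    for (std::size_t i = 0; i < chain.size(); ++i) {
        const bool newPart = parts.empty() || parts.back().machine != machineOf[i];
        if (newPart) {
            parts.push_back({machineOf[i], {}});
            if (i > 0) parts.back().nodes.push_back("S(" + chain[i - 1] + ")");
        }
        parts.back().nodes.push_back(chain[i]);
    }
    return parts;
}
```

For the chain A, B, C with A on one machine and B, C on another, this yields one fragment containing A and one containing S(A), B, C.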
<jarl> ic
<jarl> routing the loose ends through networks
<jm> ??
<jarl> loose ends <- end of node chain on one machine, a split network that's
located over multiple machines
<jarl> and the up- and downloading of course
<jarl> hmm.. no, no uploading
<jarl> only result uploading
<jarl> darn
<jarl> sorry..
<jm> yes, the only "special node" required will be for result uploading
<jarl>  no downloading extra nodes , but nodes that transport the results back
<jarl> yes
<jm> ...well, I'm all mixed up with the up-down point of view... the special
node is at the input of the network.
<jarl> o, why there?
<jarl> nah, never mind
<jarl> but anyways, the DL will be supplying this extra node also?
<jm> the Baby-BL receives a getOutput CORBA "signal" (don't know how to call
it). it calls getOutput() on its PL network... some computation is done and the
special input node will call a (BBL) callback function, which will call
getOutput() on another BL.
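
A sketch of the pull chain jm describes: a request arrives at a Baby-BL, which calls getOutput() on the last node of its network, and the special input node uses a callback to pull from another Baby-BL. The CORBA call is replaced by a plain `std::function` here, and every class name (`Node`, `Constant`, `AddOne`, `RemoteInput`, `BabyBL`) is an assumption for illustration, not the Overflow or Piper API.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

struct Node {
    virtual ~Node() = default;
    virtual int getOutput() = 0;
};

// Stand-in for node A: produces a value with no inputs.
struct Constant : Node {
    int value;
    explicit Constant(int v) : value(v) {}
    int getOutput() override { return value; }
};

// Stand-in for nodes B and C: pulls from its input, then computes.
struct AddOne : Node {
    std::shared_ptr<Node> input;
    explicit AddOne(std::shared_ptr<Node> in) : input(std::move(in)) {}
    int getOutput() override { return input->getOutput() + 1; }
};

// The special node S: its output comes from a callback supplied by the
// Baby-BL, which would issue the remote request in a real system.
struct RemoteInput : Node {
    std::function<int()> fetch;
    explicit RemoteInput(std::function<int()> f) : fetch(std::move(f)) {}
    int getOutput() override { return fetch(); }
};

// A Baby-BL owns one network fragment and exposes its last node.
struct BabyBL {
    std::shared_ptr<Node> last;
    int getOutput() {
        assert(last != nullptr);
        return last->getOutput();
    }
};
```

Wiring it up: one `BabyBL` holds a `Constant`, the other holds `AddOne(AddOne(RemoteInput))` whose callback calls `getOutput()` on the first. Asking the second for its output then pulls the value through both fragments.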
<jarl> ok
<jarl> get it
<jm> Extra DL nodes... not really. There will be debugging/viewing nodes; I don't
know if we should say they are DL, GUI, or PL nodes...
<jarl> uhhh
<jarl> what I meant to ask was whether the DL adds this extra node into the xml
description?
<jm> However (I don't know whether you were referring to that), there will be a
communication system between the DL and the PL, so that nodes can send
warnings/errors to the user.
<jm> ...sorry, now I understand what you mean!...
<jarl> ok.. will this commenction go directly from DL to PL?
<jarl> conn...
<jarl> arf
<jm> I'm not sure whether it'll be the DL job, the scheduler's, or the BL...
<jarl> it's 3:40 now :(
<jm> I mean the DL/BL/... job to add the node to the document...
<jm> I think it should be the scheduler...
<jarl> ok
<jarl> BL
<jm> but I'll let you sleep now......
<jarl> hehe
<jarl> nah
<jarl> good idear
<jm> Do you mean you're still OK to discuss?
<jm> ...or the "nah" was for something else?
<jarl> sure, I just make many more spelling errors and talk more bollocks at these
times :)
<jarl> "nah" was just thinking out loud..
<jarl> but not yet
<jarl> sleeping that is
<jm> Anyway... who will add the "BL2PL" node is a detail.
<jarl> right
<jm> I was talking to Brad about the DL-PL communication system.
<jarl> I'm just asking some to get the picture of what the babyBL will be like
<jm> We thought we could have "pseudo-packets" that could be transmitted by the
BL.
<jarl> sure
<jm> The Baby-BL will be quite simple.
<jarl> simple <- nice
<jm> The DL address will be simple: "DL" since there's only one.
<jm> A node address will be like "machine:BabyBL ID:node pointer"
<jarl> maybe we could use some sort of number system that's used Piper wide that
defines locations of nodes\dl's\etc?
<jm> the machine could be an IP or a name,
<jm> the BabyBL ID is used to discriminate many BBL's on the same machine.
<jm> Since the BBL lives in the same address space as the PL, node pointers are
OK.
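
A sketch of the "machine:BabyBL ID:node pointer" address jm proposes, assuming the pointer is written as a hexadecimal integer. `NodeAddress`, `formatAddress` and `parseAddress` are hypothetical names, and the exact textual encoding is not specified in the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

struct NodeAddress {
    std::string    machine;  // IP or host name
    unsigned       bblId;    // discriminates several Baby-BLs on one machine
    std::uintptr_t nodePtr;  // node pointer, valid only inside that Baby-BL's process
};

std::string formatAddress(const NodeAddress& a) {
    std::ostringstream out;
    out << a.machine << ':' << a.bblId << ":0x" << std::hex << a.nodePtr;
    return out.str();
}

// Splits on the last two ':' characters. Returns false on malformed input
// (in which case `out` may be partially written).
bool parseAddress(const std::string& text, NodeAddress& out) {
    const std::size_t last = text.rfind(':');
    if (last == std::string::npos || last == 0) return false;
    const std::size_t mid = text.rfind(':', last - 1);
    if (mid == std::string::npos) return false;
    try {
        out.machine = text.substr(0, mid);
        out.bblId = static_cast<unsigned>(
            std::stoul(text.substr(mid + 1, last - mid - 1)));
        out.nodePtr = static_cast<std::uintptr_t>(
            std::stoull(text.substr(last + 1), nullptr, 16));
    } catch (const std::exception&) {
        return false;  // a field was empty or not a number
    }
    return true;
}
```

The single DL keeps its plain "DL" address, which this parser rejects on purpose, so a caller would check for that case first.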
<jarl> right
<jm> What sort of number system?
<jarl> something like this you're just describing..
<jarl> DL->BL we introduced this URI numbering I described in my email...
<jm> which e-mail?
<jarl> I dont really care how, as long as we use the same numbering
<jarl> uhh... mom..I'll copy paste
<jarl> #Define Piper URI id number (Univ. Relocation Identification); every Piper
part can be
<jarl> accessed by this number:
<jarl>     342.0.0    = BL or UI instance, I'm not sure about this one.
<jarl>     342.168.0 = Subnet (or user space) 168 on machine 342
<jarl>     342.168.4281 = Node 4281 inside network 168 on machine 342.
<jarl> just a sketch, never really criticised
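
One way to read jarl's dotted numbering, as a rough sketch: parse the three fields and decide which level of the hierarchy the id names. Treating 0 as "unused" and rejecting a node number without a subnet are assumptions; the pasted scheme does not settle either point. `PiperUri`, `UriKind` and `classify` are made-up names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

enum class UriKind { Invalid, Machine, Subnet, Node };

struct PiperUri { unsigned machine = 0, subnet = 0, node = 0; };

// Parses "machine.subnet.node" and reports which level the id refers to.
UriKind classify(const std::string& text, PiperUri& out) {
    char trailing = 0;
    // Exactly three numbers and nothing after them must be read.
    if (std::sscanf(text.c_str(), "%u.%u.%u%c",
                    &out.machine, &out.subnet, &out.node, &trailing) != 3)
        return UriKind::Invalid;
    if (out.subnet == 0 && out.node == 0) return UriKind::Machine;  // e.g. 342.0.0
    if (out.subnet != 0 && out.node == 0) return UriKind::Subnet;   // e.g. 342.168.0
    if (out.subnet != 0 && out.node != 0) return UriKind::Node;     // e.g. 342.168.4281
    return UriKind::Invalid;  // a node number without a subnet (assumed invalid)
}
```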
<jm> Well, I think it's equivalent to what I was saying, right?
<jm> BTW... I've just thought about something you might not like to hear...
<jm> all this translation to C++ you've been doing is for what I call the
mother-BL (the one that does the authentication), right?
<jarl> yes
<jarl> originally because it was to link to Overflow :(
<jm> ...if this part only communicates to the DL and the Baby-BL (thus the PL)
through CORBA, maybe the translation to C++ wasn't necessary (or am I just
completely lost)?
<jarl> hehe
<jarl> kinda sad :)
<jm> Is the translation complete or are you still having trouble with CORBA?
<jarl> but it's much cleaner after all
<jarl> to have one Orb library used and so
<jm> ...then you at least gain something.
<jarl> Brad helped me out with some examples
<jarl> I just finished the removal of the old corba code, and will do the new
ones starting tomorrow
<jm> ...so you're staying with C++? If so, have you started using the STL? If
it's not that hard, it would be nice to slowly replace glib with it.
<jarl> C++ CORBA is much simpler than the C bindings, luckily
<jarl> if I had to go the other way around it would be much worse
<jarl> nah, I haven't started using any C++ feature.. I just worked hard on
getting the C++ compiler to shut up
<jarl> but I hope to learn
<jarl> while coding
<jm> I wouldn't like to have to translate Overflow to C. I use (and need to)
just about all the C++ features that C doesn't have (templates, virtual
functions, double inheritance, ...)!
<jarl> maybe a lot of old lines can be replaced with much better functions
<jarl> but for now I know C and related libraries... it won't speed things up if I
were to do everything in advanced C++ yet
<jm> Sure... makes sense... when you know more and have some time it might be a
good idea
<jarl> jer, slowly we learn
<jm> When you're done with the C++ CORBA thing, we could start writing the
BabyBL together.
<jarl> ok, maybe you can code the logic \ skeleton, and I can code the corba
portals?
<jm> Sure, as I said, the way I see it so far, the non-CORBA part will probably
be around one hundred lines of code.

bye,
jarl