Hey all; Jean-Marc, Jarl and I have been trying to work out good plans for how to organize the BL and PL and get some good connectivity going. Yesterday there was a big ol' IRC chat about this, which is pasted below. It discusses the organization of the BL and how this will interact with the PL. The discussion introduces the idea of "mother BLs" (MBLs), central controlling BLs that will interact with the DL (though maybe not directly). Each mother BL can then have "Baby BLs" (BBLs), which will directly interact with the PL. That is a rough summary of what is going on, but the actual structure is still under heated discussion, as you can see from reading the log. Part 2 of this discussion will happen today at 4:30pm EST. Anyone who is interested is welcome to join in (or just listen). The chats happen on ChatNet (see www.chatnet.org for a list of servers) on the #piper channel (naturally enough). See you there!

Brad

--> jm (~jm at cn-997.143-201-24.mtl.mc.videotron.net) has joined #piper
<bstard> hi Jean-Marc
<jm> I'm here!
<brad> Hi Jean-Marc!
<jm> Have you been talking for a long time?
<bstard> did you get the email or the ICQ?
<jm> e-mail... got no ICQ!
<bstard> almost an hour already :)
<bstard> ICQ sucks :(
<jm> ...oops, I actually had the ICQ!
--- bstard gives channel operator status to jm
<brad> talking <- mostly about compiling the BL and fun stuff like that :-)
<jm> And how's the C++ compiling going?
<bstard> and I sketched a mechanism for the work the BL should do
<jm> Go ahead...
<bstard> C++ port <- went smoothly until I met the C++ CORBA
<bstard> C++ CORBA is very different from C CORBA.
<bstard> 0: DL logs into BL
<bstard> 1: DL uploads XML network to BL
<bstard> 3: BL checks XML for resources
<bstard> 4: BL distributes nodes to BL instances across the internet
<bstard> 5: BL acknowledges to the DL and PL that it is ready and waiting for a go
<bstard> 6: {undefined yet}
<bstard> step 4 will be the one that undergoes big enhancements once we have the basics operational.
<bstard> stuff like resource management etc. will be done at runtime some day in the future :)
<jm> So wouldn't the BL involved in 3 and 4 be a "master" while the distributed ones would be "slaves"?
<bstard> oops, forgot 2: BL checks XML for authorized nodes
<bstard> master-slave: OK, that's one way of looking at things.
<bstard> the way you view it: master = local instance, slave = remote (on other machines)
<bstard> so a BL is a master if it's distributing, and a slave once it's receiving XMLs
<jm> Sort of. What I meant by "master" (may not be the best term) is the part that has the "global view", while all the other BL instances ("slaves") only see a small part of the problem.
<brad> Yeah, that is the view I had of master/slave as well.
<bstard> But once the network is running, the BL instances are 'same level' instances communicating results etc. to each other. The master-slave relation only exists during the 'distribution phase'
<bstard> jm: yep, that's the way I view the situation too.
<brad> No, but I think that things also need to be broken up so that you can return information back to the DL, like in the "viewer" nodes.
<bstard> slave = partial knowledge of the 'job', master has full knowledge
<jm> Well, I think there should be a master that is the only one allowed to talk to the DL.
<bstard> brad: what do you mean by 'no'?
<brad> no <- I mean, doesn't the master-slave situation also need to exist on a local level?
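For reference, here is a minimal C++ sketch of the numbered start-up sequence Jarl lists above (steps 0-5). Every class, method and string in it is invented purely for illustration; it is not the real Piper code or its CORBA interfaces.

    // Hypothetical sketch of the BL start-up sequence from the log (steps 0-5).
    #include <iostream>
    #include <string>

    struct BrokeringLayer {
        std::string network;
        bool login(const std::string& dl_user) {             // 0: DL logs into the BL
            std::cout << "DL '" << dl_user << "' logged in\n";
            return true;
        }
        void upload_network(const std::string& xml) {        // 1: DL uploads the XML network
            network = xml;
        }
        bool check_authorized_nodes() const { return true; } // 2: BL checks XML for authorized nodes
        bool check_resources() const { return true; }        // 3: BL checks XML for resources
        void distribute_nodes() {                            // 4: hand node chunks to BL instances
            std::cout << "distributing nodes across BL instances\n";
        }
        void notify_ready() {                                // 5: tell the DL and PL we're waiting for a go
            std::cout << "BL ready, waiting for a go\n";
        }
    };

    int main() {
        BrokeringLayer bl;
        if (bl.login("brad")) {
            bl.upload_network("<network>...</network>");
            if (bl.check_authorized_nodes() && bl.check_resources()) {
                bl.distribute_nodes();
                bl.notify_ready();
            }
        }
    }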
<bstard> jm: the DL logs into a BL, so only that DL can receive, only that DL will be offered the results and can upload jobs
<jm> Well, if one part of the processing is done locally, then there should be a slave running on the local machine.
<bstard> nah, why an extra thread/instance?
<bstard> the way I see things, only 1 BL will run on a machine.
<bstard> but try to convince me :)
<jm> What I meant was that when slaves want to communicate with the UI, they should all send the info to the master. Then the master is the only one allowed to talk to the DL.
<brad> convince <- Jean-Marc is probably the best person to do this, since it is his idea :-)
<bstard> hehe
<jm> I see the master as a library that's linked with the DL, so there's one process for the DL and the master.
<bstard> uhh, how do I put this.
<bstard> See the DL as the master BL, that's probably the best way to look at the situation
<jm> Also, if you have an SMP machine and want to run two parts of the processing on it, you should have two BL slaves running on this machine.
<bstard> all the BLs do the work for THAT specific DL
<bstard> they also do work for other DLs
<bstard> jm: SMP will be covered by internal threading
<bstard> which is already about 50% implemented..
<jm> Why not just say one BL slave process for each part of the processing?
<brad> multiple DLs <- Jarl, this is confusing me. There will only be one DL process per machine. A single process can handle multiple user interfaces.
<bstard> but.. I still cannot see why you people advocate a master-slave model. Try me again please..
<bstard> jm: each part of the processing will be done by the PL? So it seems more logical to thread multiple PLs
<bstard> brad: did I say "multiple DLs"??
<bstard> If I did, that was a mistake, sorry
<bstard> Of course we'll only need 1 DL per machine
<brad> multiple dls <- Okay, sorry. I was just getting confused :-)
<jm> I'm thinking 1 PL = 1 BL, so there are as many BLs as there are processing parts.
<bstard> brad: who isn't :)
<bstard> jm: good point..
<bstard> they are gonna be statically linked
<bstard> which kinda sucks regarding my views :-)
<jm> What's going to be statically linked?
<bstard> Overflow as a lib to the BL
<bstard> or will we use .so?
<jm> Well, it "has" to be dynamically linked
<bstard> I see
<bstard> why?
<jm> Overflow is meant to work dynamically, since the toolboxes are dynamic libraries that are loaded on startup.
<bstard> ok, no problem
<bstard> and Overflow can only execute 1 network at a time?
<jm> Basically, when Overflow starts, it looks for all the .so files in a certain path and loads them all. Each .so contains code for a bunch of nodes. That way, you can add nodes without recompiling.
<bstard> I see, that's the way GMS works with plugins too
<jm> Well, Overflow is just a library, so you could build 10 networks and run them at the same time if you have 10 threads.
<bstard> And could you implement threading in the library after the RUN object?
<bstard> so that the BL could upload multiple XMLs into the PL and give it a go?
<jm> Well, it would be equivalent to the BL starting multiple threads and starting a processing in each one.
<bstard> :)
<bstard> hehe
<bstard> so I'm buggered again? :-)
<bstard> don't get me wrong
<jm> The reason I prefer one PL per BL is that sometimes the PL could crash for some reason and you want to be able to track it. Also, you could restart only this part instead of all the PLs running on that machine.
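A rough illustration of the kind of .so loading Jean-Marc describes for Overflow toolboxes: scan a directory for shared objects, load each one, and let it register its nodes. The directory path and the "register_nodes" entry-point name are assumptions made up for this sketch; the real Overflow API is certainly different (link with -ldl).

    // Hypothetical plugin loader: dlopen every .so in a directory and call an
    // init hook so each toolbox can register its nodes.
    #include <dlfcn.h>
    #include <dirent.h>
    #include <iostream>
    #include <string>

    void load_toolboxes(const std::string& path) {
        DIR* dir = opendir(path.c_str());
        if (!dir) return;
        while (dirent* entry = readdir(dir)) {
            std::string name = entry->d_name;
            if (name.size() < 3 || name.substr(name.size() - 3) != ".so")
                continue;                               // skip anything that isn't a .so
            void* handle = dlopen((path + "/" + name).c_str(), RTLD_NOW);
            if (!handle) {
                std::cerr << "failed to load " << name << ": " << dlerror() << "\n";
                continue;
            }
            // "register_nodes" is an invented name for the toolbox entry point.
            if (void (*init)() = (void (*)())dlsym(handle, "register_nodes"))
                init();
        }
        closedir(dir);
    }

    int main() {
        load_toolboxes("/usr/local/lib/overflow/toolboxes");  // assumed path
    }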
<bstard> I can't really assess the situation because I don't really know how Overflow is implemented..
<bstard> If you think the best way is to have the BL do the threading, that's fine with me...
<bstard> crashing <- I see
<brad> crash <- I think this is very important. People will really want to know why things crashed, and during development we will definitely need to know this :-)
<bstard> don't expect the BL to be 100% stable.. it HAS been running for days under stress testing.. but you never know :)
<brad> stable <- No problem, I don't expect anything to be stable yet :-)
<bstard> OK, so the design will be: 1 DL -> 1 BL with internal threading -> 1 PL per BL thread??
<jm> I think in this case, multiple processes are better than multiple threads.
<jm> You're losing me...
<bstard> losing you where?
<bstard> at the threading vs. processes?
<jm> I don't understand what you want to thread. In my mind we wouldn't need any threads at all (just separate processes).
<bstard> I'm still in favor of 1 BL process per machine.. that's why I'm thinking about threading
<bstard> multiple BLs will be very hard to manage
<bstard> darn... if only we were sitting next to each other I could draw a design :)
<bstard> that would clear things up
<bstard> jm: have you seen the flowchart on http://sunsite.auc.dk/gms ?
<jm> The problem is that if the PL crashes, it will crash the BL and all the other PLs on the machine.
<bstard> hmm... I can have the BL run inside a debugger
<bstard> for as long as things are unstable
<bstard> a built-in debugger
<bstard> not gdb or anything like that..
<bstard> but the ONLY thing you have against a single-process design is the instability? Or do you just not feel comfortable with it?
<jm> it's more about PL instabilities and error recovery.
<bstard> hmm... just a sec, I'll give somebody a call about running the BL/PL set inside a debugger.. just to ask if it's possible.
<bstard> arf... just when you need people :(
<bstard> no answer :)
<bstard> otherwise we can have the DL do the resurrecting?
<jm> it's not only about debugging but also real life. If the PL crashes in real life, then you only need to restart one PL.
<bstard> the BL is capable of keeping track of a runtime history, so it can do a full restore after a crash
<bstard> no, not for debugging the errors!
<brad> DL resurrection <- I can detect a BL crash very easily, but I'm not sure how I'll be able to report back to the UI on what caused the crash...
<jm> Not if the crashing PL crashes the BL and the other PLs before they have time to terminate.
<bstard> a debugger receives the crash signals and can respawn the BL/PL process
<jm> but you lose all the computations from the other PLs
<bstard> DL resurrection <- the DL can restart the BL with a special parameter, and the BL will do the rest
<bstard> just like if we had BL slaves restarting a PL process.
<brad> DL resurrection <- What do you mean by a special parameter?
<bstard> jm: losing the other computations <- again, good point.
<bstard> like ./BL -restart :)
<bstard> ok, the PL cannot do history tracking.. that would become too slow. shit!
<bstard> sorry
<jm> I don't see what's so complicated about running two BLs on the same machine.
<bstard> ok. What about: the BL will start a child/slave process for all data streams coming out of the pipeline?
<bstard> out of the BL pipeline that is.
<jm> What do you mean by BL pipeline?
<bstard> all incoming data streams go through the pipeline, which does the authentication etc.
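A toy sketch of the crash-recovery idea discussed above: some supervisor (the built-in "debugger", or the DL itself) notices that the BL has died and starts it again with the -restart flag so it can restore from its runtime history. The "-restart" flag comes from the log; the binary path and everything else is invented for illustration.

    // Hypothetical respawn wrapper: run ./BL, and if it dies abnormally,
    // start it again with -restart so it can recover its runtime history.
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        const char* bl = "./BL";          // assumed location of the BL binary
        bool first_run = true;
        for (;;) {
            pid_t pid = fork();
            if (pid == 0) {
                if (first_run)
                    execl(bl, bl, (char*)0);
                else
                    execl(bl, bl, "-restart", (char*)0);
                _exit(127);               // exec failed
            }
            int status = 0;
            waitpid(pid, &status, 0);
            if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                break;                    // clean shutdown, stop respawning
            std::fprintf(stderr, "BL died, respawning with -restart\n");
            first_run = false;
        }
        return 0;
    }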
<bstard> see: http://sunsite.auc.dk/gms/gms.gif
<bstard> other things like marshalling data and resource management are also located in the pipeline
<jm> Sorry, I don't understand your flowchart...
<bstard> if we are to use multiple BLs on 1 machine, that will become very hard
<bstard> look only at the middle part, with 'outside world' above it
<bstard> the 'sensors' are the data streams coming from external sources..
<jm> What's harder? For me it's just communicating with another BL, regardless of where it is (local or remote)
<bstard> the basic idea of GMS is that there is only 1 pipeline on a machine. That's why I make a fuss about multiple BLs :-)
<bstard> so the plan of stripped-down (no pipeline, no DL communication, no external input) slave BLs sounds OK?
<bstard> or don't you get it?
<jm> Sorry, I think I'm missing some basic information...
<bstard> to you (the PL) there won't be a difference. Also to the DL there's no difference, and the BL design will remain intact
<bstard> jm: maybe we need to discuss this by voice some day?
<bstard> brad: you're still here? :-)
<jm> Well, you're the BL specialist after all... it's just that from my point of view, things would look so much cleaner if "one part of the processing" = 1 BL "slave" = 1 PL, regardless of where everything runs.
<brad> Yup, just listening :-). This is you guys' battle.
<jm> ...Just thinking... is the multiple-BL problem about the authentication and stuff like that?
<bstard> yep. About deadlocks and so on, and about resource management
<jm> In this case, maybe there could be a permanent part of the BL on each machine, which would spawn (as processes) Baby-BLs linked with the PL.
<bstard> that's why I only want to have 1 process doing all that on 1 machine
<bstard> YEP!
<bstard> the BL 'master' will be permanent
<bstard> those 'slaves' will be spawned once there's something to do
<bstard> 1 spawned stripped-down BL per PL
<bstard> and they die when the job is done
<jm> That's what I'm thinking about... it's just that I had forgotten about all that authentication/resource stuff.
<jm> Can you wait a sec...
<bstard> ok, phew
<bstard> we've settled something, it seems :)
<brad> Good stuff! Baby-BLs are the way to go :-)
<bstard> of course I'll wait... /me having a thinking-smoke
<bstard> brad: lol
<bstard> hehe
<bstard> BBLs :)
<brad> :-)
<brad> I'm happy you are coming together on that. The Baby-BLs actually make a lot of sense with what both of you were saying.
<brad> Maybe we can have a parent/baby relationship instead of master/slave. Sounds a little nicer :-)
<bstard> brad: could you build into the DL a respawn check for the Mother BL?
<bstard> the MBL, so to speak
<brad> What do you mean by a respawn check?
<bstard> if the MBL dies, the DL should restart the thing
<bstard> like the BBLs do for the PLs
<brad> Yeah, that is no problem. We should add a little ping() function to the IDL, then when the ping fails, the DL will restart the BL.
<jm> Well, the BBL would link to the PL, so both die at the same time
<jm> it would be the mother BL that would respawn a BBL/PL
<bstard> brad: ping <- yep, I'll put it into the IDL
<bstard> jm: yep!
<brad> So what is our plan for actually implementing all of this?
<bstard> the MBL will do all the resource/AAA and (re)spawning work
<jm> As for what I was calling the "master", it could belong to the DL or BL (I don't care), and that's the one I see as responsible for splitting the processing, sending it to the MBLs, and respawning MBLs
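A small sketch of the ping()/respawn check Brad proposes for the DL. MotherBL here is a plain C++ stand-in, not the real CORBA interface from the IDL, and the restart command is only an assumption borrowed from the "./BL -restart" idea earlier in the log.

    // Hypothetical DL-side watchdog: ping the mother BL periodically and
    // restart it when the ping fails. Illustration only.
    #include <cstdlib>
    #include <unistd.h>

    struct MotherBL {                     // stand-in for the CORBA object
        virtual bool ping() = 0;          // the little ping() from the IDL
        virtual ~MotherBL() {}
    };

    void watch_mother_bl(MotherBL& mbl) {
        while (mbl.ping())                // as long as the MBL answers, do nothing
            sleep(5);
        std::system("./BL -restart &");   // assumed restart command
        // a real watchdog would go back to pinging the restarted MBL here
    }

    struct DeadMBL : MotherBL {
        bool ping() { return false; }     // simulate a crashed mother BL
    };

    int main() {
        DeadMBL mbl;
        watch_mother_bl(mbl);             // triggers a respawn immediately
    }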
<bstard> Actual impl. <- first the C++ CORBA should be done. After that I'll be eager to do some coding of new features :)
<bstard> jm: I see what you mean now. I think you can see my initial opposition to this now too, can't you?
<jm> opposition to what?
<bstard> to the master-slave design
<brad> impl <- Well, how about if we split it like this: I'll work on getting the CORBA C++ working, and you and Jean-Marc can work on implementing this whole mother/baby thing....
<bstard> if slaves start to do authentication etc., the system will become a big mess
<jm> I'm not sure you understand what I mean. What I was calling "master" is more like a library that the DL would use to communicate with all the mother-BLs
<bstard> brad: I cannot develop that without the sources compiling :(
<bstard> that darn CORBA must be running soon
<jm> For me master BL != mother BL.
<bstard> jm: ?? master != mother?
<jm> OK, let me say how I now see the whole thing:
<brad> impl <- Well, let's extract the CORBA out into its own separate module. Can you just write a little C++ code that will run to test things, and then we'll make things be driven by the CORBA after that? I think the CORBA should come out of the messaging_db stuff for cleanliness anyway...
<jm> The DL links (as a library) with the "master"-BL, which takes care of splitting the processing into small XML chunks that are sent to all the mother-BLs on the different machines. Then each mother-BL spawns a Baby-BL/PL for each part it receives.
<bstard> brad: let's not create too much chaos, shall we?
<bstard> shall we first finish the CORBA, and once it's working I'll separate the CORBA and the database, OK?
<bstard> jm: three BLs?
<bstard> hmm
<jm> Assuming the flow A->B->C, where A and C are run on machine X, B is run on machine Y, and the DL is run on machine Z, here are the processes I see:
<jm> 1 process on Z (the DL/master-BL)
<jm> 3 processes on X (1 mother-BL, 2 baby-BLs)
<jm> 2 processes on Y (1 mother-BL, 1 baby-BL)
<jm> Once again, the confusion probably arises from the fact that you probably don't consider my "master-BL" to be part of the BL...
<brad> impl <- I'll put this on hold until you and Jean-Marc are done...
<bstard> brad: OK :)
<bstard> jm: I see what you mean
<bstard> I see this master BL and mother BL as being the same process
<bstard> the mother BL gets a chunk of XML from the DL, splits it, sends it to the other mother BLs, and those mother BLs start producing babies
<jm> I see all the mother-BLs as being equal -> no reason why the splitting would be done by one particular mother-BL instance.
<bstard> and maybe the 'other' mother BLs will also split the XML once more if they see fit
<bstard> hehe
<bstard> jm: do you still see the need for a 3rd BL type, the master BL?
<jm> ...or you can call it a master-DL if you like, since I don't see where it belongs.
<jm> I think it's not the mother-BLs' job to split the processing into chunks. ...if you don't do processing on the local machine (where the DL runs), you don't need a mother-BL there.
<bstard> but the BL will handle the resource management; that cannot be done without doing the XML splitting
<jm> Sure, but since there are many BLs, the splitting decision has to be made somewhere. There should be a part that only does that.
<bstard> there will be only 1 MBL on a machine, so it's capable of deciding the splitting
<bstard> why introduce a new process just for doing the splitting?
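To make Jean-Marc's A->B->C example a bit more concrete, here is a toy sketch of a mother BL spawning one baby-BL/PL process per XML chunk it receives; run on machine X with the chunks for A and C, it would produce the 1 MBL + 2 BBL layout he describes. The "BBL" binary name, the command line and the chunk strings are all made up for illustration (compile with -std=c++11).

    // Hypothetical mother BL: fork one baby-BL/PL process per XML chunk.
    #include <sys/wait.h>
    #include <unistd.h>
    #include <string>
    #include <vector>

    std::vector<pid_t> spawn_babies(const std::vector<std::string>& chunks) {
        std::vector<pid_t> babies;
        for (const std::string& chunk : chunks) {
            pid_t pid = fork();
            if (pid == 0) {
                // Child: becomes a baby BL linked with the PL; here we just
                // pretend to exec it with its chunk on the command line.
                execl("./BBL", "./BBL", chunk.c_str(), (char*)0);
                _exit(127);
            }
            babies.push_back(pid);    // the MBL keeps the pids so it can respawn them
        }
        return babies;
    }

    int main() {
        // Machine X in the example receives the chunks for nodes A and C.
        std::vector<pid_t> babies =
            spawn_babies({"<node name=\"A\"/>", "<node name=\"C\"/>"});
        for (pid_t pid : babies)
            waitpid(pid, 0, 0);       // a real MBL would respawn on crash instead
    }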
<bstard> the MBL will do 1: authentication, 2: resource management/splitting, 3: spawning BBLs
<jm> Yes, but how do you decide how to split between different machines? What I'm saying is that there should be a part that sits in the middle, between the DL and all the mother-BLs
<jm> there are many MBLs, right? one on each machine
<bstard> I think an MBL should "take what it wants" and "pass on what it doesn't want to other MBLs"
<bstard> no, just ONE MBL per machine
<bstard> many BBL/PL sets
<jm> ...OK, let's go through it from the start.
<bstard> ok :-)
<jm> If we have 10 machines, we have 10 MBLs (one on each machine). Let's assume the DL runs on an 11th machine.
<bstard> ok..
<jm> The DL gets a complete (unsplit) XML network from the GUI.
<jm> (remember, there's no MBL running where the DL is)
<jm> There's no reason the DL would ask one particular BL (of the 10) to do the splitting.
<bstard> there is
<bstard> the DL doesn't have login access to every MBL
<bstard> but the DL COULD have it..
<bstard> very strange situation though
<jm> I think there should be "something" on the local machine that (maybe after communicating with all the MBLs) decides how the program should be split and then sends each part to the right MBL.
<jm> Why not have a special part of the BL that would do the interface between the DL and all the MBLs (on all the machines)?
<bstard> what about giving this "something" this logic: "take what you want" and "pass on what you don't want to other MBLs"
<bstard> that way the splitting can, in theory, start on any system on the net..
<jm> I think we disagree on how the processing will be split...
<jm> From what I understand, you say that only the BL will decide about the splitting. Am I right?
<bstard> you say splitting should be done by a splitting master; I think we can rely on smart algorithms that will do the job without any 'mastering'
<jm> I'm more thinking that the user will say "This runs here, that runs there, ..."
<bstard> the user CAN say that
<jm> Only, sometimes the user wants something to run in a specific place... and nobody should decide otherwise.
<bstard> that way the splitting will be very easy
<bstard> but what if the user says 'just run it wherever it will be done fastest'?
<jm> In this case the local "splitter" would ask all the MBLs about their resources, but I think most of the time the decision will not be made by the MBLs.
<jm> Let's say I have a sound acquisition node; I want it to record from a certain microphone that's on a certain machine... not from whatever microphone the BL happens to like.
<bstard> I think that way we'll build a very static system. You should have more faith in having the MBLs fight over getting the optimal resource load
<bstard> microphone: ok, in that case you specify that THAT specific mic must be used.
<bstard> Splitting highly specified XML will be easy; we'll just pass the XML on to wherever the user wants it to be
<bstard> I'm more interested in XML that does NOT contain any location constraints
<jm> I think the user needs to have a say in the splitting.
<bstard> sure he/she does!
<bstard> the user works with 'that file' and 'this soundcard'
<bstard> but also with 'some computation power, wherever the heck it might be'
<bstard> or 'information about this'
<jm> Why not have all the MBLs send their resource info to the same place, then combine that with what the user wants and take the decision *in one place*?
<bstard> how will this be done once Piper runs on 103430943 machines some day in the future?
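A toy sketch of Jarl's "take what you want, pass on the rest" splitting idea: each MBL keeps whatever nodes fit in its free capacity and forwards the remainder to the next MBL. The Node type, the cost numbers and the capacity units are entirely made up for illustration (compile with -std=c++11).

    // Hypothetical greedy splitter: keep the nodes that fit, pass the rest on.
    #include <iostream>
    #include <string>
    #include <vector>

    struct Node { std::string name; int cost; };   // cost in arbitrary units

    std::vector<Node> take_what_fits(const std::vector<Node>& offered,
                                     int capacity, std::vector<Node>& kept) {
        std::vector<Node> passed_on;
        for (const Node& n : offered) {
            if (n.cost <= capacity) {
                kept.push_back(n);                  // this MBL runs the node itself
                capacity -= n.cost;
            } else {
                passed_on.push_back(n);             // forward it to the next MBL
            }
        }
        return passed_on;
    }

    int main() {
        std::vector<Node> network = {{"A", 4}, {"B", 10}, {"C", 3}};
        std::vector<Node> mine;
        std::vector<Node> rest = take_what_fits(network, 8, mine);
        for (const Node& n : mine) std::cout << "keeping " << n.name << "\n";
        for (const Node& n : rest) std::cout << "passing on " << n.name << "\n";
    }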
<jm> There are always things the user will know that the BL won't know... like "if I run this part that launches this process on this machine, then this user will be mad at me because of the side effect". There's no way the BL can handle that!
<bstard> ok, that's why the MBL will get strict resource limits
<bstard> maxcpu = xx%
<bstard> maxmem = xx MB
<bstard> etc
<bstard> but ok, you're a tiny bit right :)
<jm> sometimes it's just that a certain node needs more I/O, or does something else that disturbs...
<bstard> the BL won't know everything.. and neither will the 'splitter BL'
<jm> Yes, but the user can talk to the splitter BL and help it do the splitting.
<bstard> that's why I want to implement a 'prediction' neural net once we have Piper running.. to predict available resources
<bstard> hmm.. so you think the user should guide the splitting? I think the user can specify anything he wants, after which it's up to the BL to fill in the gaps
<jm> Anyway, I don't see how distributing the splitting would be better than making a centralized decision based on the data (resources) from all the BLs.
<jm> Note that I don't see how the BL will know what resources (CPU time) are going to be required for each node, since it will not know what the nodes do.
<bstard> firstly because a centralized mechanism won't be possible once there are a lot of Pipers running, and secondly because I know prediction algorithms are much more effective than centralized algorithms
<jm> When I write new nodes, I don't want to have to tell the BL how long they will take to run, depending on all the variables (size of the problem, ...).
<bstard> not knowing how much it will use in advance <- ok, not the first time the node runs, but this is where the history logs and prediction algorithms come into play
<bstard> but I hope you'll reuse those nodes the next time you build a network
<bstard> so the MBLs will know from history what those nodes consumed the previous time
<jm> not exactly... there are some things you cannot predict... like how long it will take an algorithm to converge, depending on the characteristics of the data.
<bstard> ok, such nodes will exist. Those nodes will get the label 'unpredictable'.
<bstard> :)
<jm> Let's say my neural network takes twice as long to converge if I train it with speech than if I train it with music... how will the BL know what files I'm sending to it... also, even the idea of how long it will take to converge is only intuitive.
<jm> The problem is that these "unpredictable" nodes will often be the most resource-consuming... in my case at least.
<bstard> ok, but nodes will consume resources of more or less the same order of magnitude each time
<bstard> maybe this is something that needs experimenting..
<bstard> To me, the design of having 1 central boss BL for each network will be limiting.
<jm> Also, even with many Pipers running, I don't see the problem with centralizing decisions... Each MBL tells the splitter how many resources it has left; the splitter receives all that info and makes the decision.
<bstard> hmm... shall we continue this discussion tomorrow? It's 23:50 here and I'm a bit tired.
<bstard> maybe I'll see things much more clearly tomorrow :)
<bstard> I'll give your point of view some brainpower tomorrow... and let you know more next time, ok?
<bstard> brad: bye, you're around tomorrow too?
<brad> Sure, yeah, I can be really quiet tomorrow as well :-)
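Finally, a toy sketch of the history-based prediction Jarl mentions: record how much CPU time each node type used on past runs and predict the average for the next run; node types with no history would be the "unpredictable" ones. All names and numbers are invented for illustration.

    // Hypothetical resource predictor based on per-node-type run history.
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    struct History {
        std::map<std::string, std::vector<double> > runs;  // node type -> past CPU seconds

        void record(const std::string& node, double cpu_seconds) {
            runs[node].push_back(cpu_seconds);
        }
        // Mean of past runs, or -1 when the node type has no history ("unpredictable").
        double predict(const std::string& node) const {
            std::map<std::string, std::vector<double> >::const_iterator it = runs.find(node);
            if (it == runs.end() || it->second.empty()) return -1.0;
            double total = 0.0;
            for (size_t i = 0; i < it->second.size(); ++i) total += it->second[i];
            return total / it->second.size();
        }
    };

    int main() {
        History h;
        h.record("FFT", 2.0);
        h.record("FFT", 3.0);
        std::cout << "predicted FFT cost: " << h.predict("FFT") << " s\n";           // 2.5
        std::cout << "predicted NeuralNet cost: " << h.predict("NeuralNet") << "\n"; // -1
    }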