|
#1
|
|
|
|
|
SHOULD IT BE TURTLES ALL THE WAY UP?
In the famous anecdote, the little old lady replies to the noted philosopher, "It's turtles all the way down." When it comes to writing software many writers on software design and many programming language creators seem to believe that it is turtles all the way up. What do I mean by "turtles all the way up"? By this I mean the thesis that the techniques and programming language concepts that are used in the small can be extended indefinitely to programming in the large. In other words if we use language X for 100 line programs and 1000 line programs we can also use it for 1000000 line programs. We may have to add some extensions and new features along the way, but we can increase the size of the programs we write indefinitely using the same ideas. <p> The antithesis is that it isn't turtles all the way up, or at least it shouldn't be turtles all the way up. That is, the kinds of languages and programming technologies that we use in large scale programming should be quite different from the kind of languages used for programming in the small. <p> There are many propositions as to what those large scale technologies should be, and many such technologies in use. Here I am going to look at data flow languages and data flow structuring of programs. WHAT ARE DATA FLOW LANGUAGES? There are two wikipedia articles that give useful answers: http://en.wikipedia.org/wiki/Dataflow_programming http://en.wikipedia.org/wiki/Flow-based_programming The distinction wikipedia makes between data flow programming and flow based programming is obscure. The following definition is an edited version of definitions used in the two articles. Data flow languages structure applications as networks of "black box" elements that exchange data across predefined connections by message passing. Elements execute when they get messages; they send messages asynchronously. Data flow based applications are inherently parallel. There are a wide variety of data flow languages, varying from spread sheets, to Labview, to Erlang. Many are graphical; programming is done by altering flow diagrams. One thing they all have in common is that they have a run time system. Traditional imperative programs are composed of routines that call each other; i.e., when a call is made the caller constructs a data packet (calling sequence) and transfers control and the data packer to the called routine. When the called routine is done it constructs a data packet to pass back to the caller and transfers control back to the caller. In data flow programs the "routines" do not call each other. Instead they are activated by the run time system when there is input for them; when they create outputs the run time system takes care of moving the output to the destination input areas. When the "routines" are done they transfer control back to the run time system. One difference between traditional programs and data flow programs is that traditional programs use LIFO semantics whereas data flow programs use FIFO semantics. That is, a traditional program puts data on a stack and gets data back on the same stack. In data flow programs each element gets data from a queue and puts data to other queues. Another difference is that the connectivity of traditional programs is deeply embedded in the code. To pass data from A to B, A calls B. That is, the caller has to specify where the data goes. In data flow programs the connectivity can be separate from the code. A doesn't pass data directly to B; instead it passes data to the run time system which in turn passes the data to B. The caller does not have to specify where the data goes. As a result data flow programs can use different languages for the internal implementation of the computational elements and for the connectivity. In fact, it is common for data flow languages to be graphical. ADVANTAGES AND DISADVANTAGES What are the advantages and disadvantages of data flow programming? Some significant advantages: * Concurrency and parallelism are natural. Code can be distributed between cores and even across networks. Many of the problems associated with threads disappear. * Data flow networks are natural for representing process. This is particularly true for transaction processing. * Message passing gets rid of the problems associated with shared memory and locks. * Data flow programs are more extensible than traditional programs. Elements can be readily composed into composite elements from the bottom up rather than top down. Some significant disadvantages: * The mindset of data flow programming is unfamiliar to most professional programmers. Most dataflow programming languages are niche languages used by non-professional programer users. * The intervention of the run time system can be expensive. The great advantage of LIFO semantics is that the implementing code is immediate and cheap. * Not using shared memory has costs. Either messages must be copied or they must be immutable. * Using data flow programming in the large pretty much requires that it be used from the start. That is, converting traditional programs into data flow programs is difficult because the structuring is so different. Richard Harter, cri http://home.tiac.net/~cri, http://www.varinoma.com Kafka wasn't an author; Kafka was a prophet! |
|
|
|
#2
|
|
|
|
|
On Oct 26, 3:22 pm, c...@tiac.net (Richard Harter) wrote:
> SHOULD IT BE TURTLES ALL THE WAY UP? > > In the famous anecdote, the little old lady replies to the noted > philosopher, "It's turtles all the way down." When it comes to > writing software many writers on software design and many > programming language creators seem to believe that it is turtles > all the way up. > > What do I mean by "turtles all the way up"? By this I mean the > thesis that the techniques and programming language concepts that > > are used in the small can be extended indefinitely to programming > > in the large. In other words if we use language X for 100 line > programs and 1000 line programs we can also use it for 1000000 > line programs. We may have to add some extensions and new > features along the way, but we can increase the size of the > programs we write indefinitely using the same ideas. I've never seen a convincing argument to show that this is wrong. We can use a 26 letter alphabet to make little words. We can use a 26 letter alphabet to make bigger words. We can use a 26 letter alphabet to make little paragraphs. We can use a 26 letter alphabet to make bigger paragraphs. We can use a 26 letter alphabet to make little books. We can use a 26 letter alphabet to make bigger books. We can use a 26 letter alphabet to make entire libraries. Why isn't the same thing true of programming languages? Now, I admit that if our tiny building blocks are poorly made, the bigger building blocks become more and more fragile. But that is true no matter which programming language we use to build the smallest blocks. And it is always a tragic mistake to try to make one big giant block that does it all (Forth metaphor is not an exception because the mega word comes from its babies). I also don't think it matters which direction we build the turtles, as long as they make it from top to bottom. [..] |
|
#3
|
|
|
|
|
Richard Harter wrote:
[snip] > Some significant advantages: > > * Concurrency and parallelism are natural. Code can be > distributed between cores and even across networks. Many of the > problems associated with threads disappear. [snip] To be able to naturally deal with concurrency and parallelism is excellent. But wouldn't it be, on the other hand, somewhat inconvenient to program processes that requires sequentiality? Should or could there be an adequate interface between data flow and conventional programming languages? M. K. Shen |
|
#4
|
|
|
|
|
On Mon, 26 Oct 2009 22:22:14 GMT, Richard Harter wrote:
> Some significant disadvantages: > > * The mindset of data flow programming is unfamiliar to most > professional programmers. Most dataflow programming languages > are niche languages used by non-professional programer users. > > * The intervention of the run time system can be expensive. The > great advantage of LIFO semantics is that the implementing code > is immediate and cheap. > > * Not using shared memory has costs. Either messages must be > copied or they must be immutable. > > * Using data flow programming in the large pretty much requires > that it be used from the start. That is, converting traditional > programs into data flow programs is difficult because the > structuring is so different. * Low abstraction level, basically lacking any abstraction. Data flow works with primitive built-in atomic types forced into value semantics. * Already mentioned inefficiency because of value semantics (i.e. either permanent marshaling inputs, or else blocking the producers or the consumers when data are shared) Normally it is the programmer to choose how to protect states. With the dataflow it must be the engine to decide. Basically it is lack of abstraction again. Working with *data* instead of *states*, that summaries it. * The networks of wired blocks cannot be encapsulated into reusable opaque primitives. You can group blocks, that's it. Usual methods of decomposition do not work, because there is a firewall between "block" and "data". You cannot merge them (like in OO) into a reusable entity (type = values + behavior). * Unmaintainable code. Look at large data flow programs (e.g. in DiaDem, Simulink, LabView). It is impossible to use reasonable software processes on them. How to compare two diagrams? When are they semantically equivalent? How do I validate a diagram? How to search for anything in a diagram? Another example of this problem is represented by GUI libraries, which are mostly event controlled. Practically none of them can be easily used in multitasking environment. It is a horror to debug hangups or generators in messages routed from one messages loop to another. And the proposal is to build *everything* this way. Shudder. * Semantic problems, the barriers of a block that fire the computation of the block's output is a very primitive model that does not scale in reality. It almost instantly goes into a mess of deadlocking vs. race condition choices. And see above, it practicably cannot be debugged except for trivial cases (without feedbacks, with all inputs synchronous etc) it cannot be decomposed into simpler tasks... |
|
#5
|
|
|
|
|
Richard Harter wrote:
> Some significant advantages: > > * Concurrency and parallelism are natural. I guess that such a language could profitably be used to simulate the neuronal network of the brain. Could that be right? M. K. Shen |
|
#6
|
|
|
|
|
On 10/26/2009 06:22 PM, Richard Harter wrote:
> The antithesis is that it isn't turtles all the way up, or at > least it shouldn't be turtles all the way up. That is, the kinds > of languages and programming technologies that we use in > large scale programming should be quite different from the kind > of languages used for programming in the small. I think most people would agree that methodologies for small-scale programs are different from those for large-scale programs, although there might be considerable disagreement over the dividing line. I certainly am not going to be used a full-blown OOP paradigm for writing even something like an automatic code generator; on the other hand, I shudder at the thought of writing something so complex as an operating system in a functional paradigm. An interesting project would be to take a large corpus of various mature open-source programs of varying size and see if there is a correlation between size and the use of certain language features or patterns. > One difference between traditional programs and data flow > programs is that traditional programs use LIFO semantics whereas > data flow programs use FIFO semantics. That is, a traditional > program puts data on a stack and gets data back on the same > stack. In data flow programs each element gets data from a queue > and puts data to other queues. One nit to point out: at this stage, many programs don't follow a procedural model of programming. OOP is the dominant paradigm, and I don't see the sequence of data flow as being LIFO. Yet you continually refer to procedural programming as those that make up `traditional programs.' > Another difference is that the connectivity of traditional > programs is deeply embedded in the code. To pass data from A to > B, A calls B. That is, the caller has to specify where the data > goes. In data flow programs the connectivity can be separate > from the code. A doesn't pass data directly to B; instead it > passes data to the run time system which in turn passes the data > to B. The caller does not have to specify where the data goes. Two words: function pointers. You can also go with virtual or dynamic method dispatch, but function pointers is shorter and sounds better. > * Concurrency and parallelism are natural. Code can be > distributed between cores and even across networks. Many of the > problems associated with threads disappear. They don't disappear, they're pushed into the runtime system. From practical experience, that means they probably bubble up and annoy you in the programming language. > * Data flow networks are natural for representing process. This > is particularly true for transaction processing. Maybe I'm just being a stupid idiot here, but I don't see how data flow is natural for representing some common processes. For example, the HTML5 tokenization process. I suppose you can flow current-state output back around and into itself as next-state input, but that's not exactly natural. This also brings up a question of how the language deals with loops in the dependency graph. The solutions I see either bring back the problems with threading or could create unreliability. > * Message passing gets rid of the problems associated with shared > memory and locks. Just as credit default swaps and other financial machinations got rid of risk :-). From what I can tell, similar problems arise, but they're in a different form (data flow dependency graphs, particularly the fact that they're typically not acyclic). > * Data flow programs are more extensible than traditional > programs. Elements can be readily composed into composite > elements from the bottom up rather than top down. Even if you consider good old procedural programming, I would say that the exact same thing is easily doable in traditional programs. I can take code that does asynchronous socket reads and turn it into a MIME decoder by creating a module that calls the asynchronous socket read module appropriately. It's also an example of your "bottom-up composition." > Some significant disadvantages: > > * Using data flow programming in the large pretty much requires > that it be used from the start. That is, converting traditional > programs into data flow programs is difficult because the > structuring is so different. I don't see it that way. Ultimately, most libraries are merely "pass X in as input, get Y out as output"--it shouldn't be too hard to make that an atomic data-flow program block. |
|
#7
|
|
|
|
|
On 26 Oct, 22:22, c...@tiac.net (Richard Harter) wrote:
[..] > takes care of moving the output to the destination input areas. > When the "routines" are done they transfer control back to the > run time system. > > One difference between traditional programs and data flow > programs is that traditional programs use LIFO semantics whereas > data flow programs use FIFO semantics. That is, a traditional > program puts data on a stack and gets data back on the same > stack. In data flow programs each element gets data from a queue > and puts data to other queues. Hmm. Trad languages aren't stack-based in the sense FORTH is, with explicit PUSH and POP instructions. Parameters and return values sit on the stack for implementational convenience. You could re-implement a trad language with mini-queues that are only one message long. (Thus making the FIFO/LIFO distinction irrelevant). Indeed, OO languages oftens see method calls as message passing. (http://c2.com/cgi/wiki?AlanKaysDefin...ObjectOriented) The stacking seems to come in with a rule that replies/returns are always to the sender/caller. Such a rule would, I think, lead to emergent stack-like behaviour even without a low-level stack in the implementation. It is interesting to consider how recursion could be implemented in a dataflow language. > * Using data flow programming in the large pretty much requires > that it be used from the start. That is, converting traditional > programs into data flow programs is difficult because the > structuring is so different. Counterexample: ls | sort |
|
#8
|
|
|
|
|
Richard Harter wrote:
> There are many propositions as to what those large scale > technologies should be, and many such technologies in use. Here > I am going to look at data flow languages and data flow > structuring of programs. I think dataflow programming deserves some more exploration than it's had. Right now, piping things between shell commands is all the dataflow programming most of us do, and that's pretty trivial. Also, the languages involved are pretty heavily constrained by what they do and the very limited uses to which they are put. The dataflow parts of shell programming, for example, are effectively monotyped, with the datatype they work on being the "line of text." We haven't really seen examples of "modern" dataflow languages (with, eg, user-defined types, good tools, proper debugging interfaces, an asynchronous execution model, queue-length and queued-data age monitoring, function types, encapsulation, etc). It's worth a research project or two, certainly. The semantic primitives we use to describe type systems, for example, take on different meanings, whose theorems and corollaries have not yet been fully examined, in the absence of function calls as we understand them in OO, functional, and imperative languages. But I think I need to take exception to some of your claims. > * Data flow networks are natural for representing process. Â This > is particularly true for transaction processing. And particularly untrue for naturally-recursive processes such as formal language parsing, etc. Also? The reason they're very natural for transaction processing is because they are BUILT ON TOP OF transaction processing systems. All those operations on queues are data commits that the runtime has to manage. A typical dataflow language is mostly a way of customizing the transaction processing system on which it's built to the needs of a particular application. > * Message passing gets rid of the problems associated with shared > memory and locks. Sort of. In practice the runtime has to manage the queues, and it has to mediate contention between processes that want to write to them and between processes that want to read them. This may allow you to not notice the locks etc, but it doesn't remove them. On the other hand, good dataflow design can often limit queues to a single producer and a single consumer. > * Data flow programs are more extensible than traditional > programs. Â Elements can be readily composed into composite > elements from the bottom up rather than top down. Got any experimental results? Frankly I don't see one as being superior to the other, and I don't see how "bottom-up" or "top-down" designs are more or less favored by either paradigm. If you're going to make a claim like this, you should do the study to try and prove it. "Top-down" vs. "Bottom-up" IME are usually more influenced by programming tools (like, eg, a REPL or otherwise interactive environment as opposed to a static compiler). Bear |
|
#9
|
|
|
|
|
On 27 Oct, 00:04, user923005 <dcor> wrote:
[..] > > We can use a 26 letter alphabet to make little words. > We can use a 26 letter alphabet to make bigger words. > We can use a 26 letter alphabet to make little paragraphs. > We can use a 26 letter alphabet to make bigger paragraphs. > We can use a 26 letter alphabet to make little books. > We can use a 26 letter alphabet to make bigger books. > We can use a 26 letter alphabet to make entire libraries. > > Why isn't the same thing true of programming languages? what do you make libraries into? At each stage in your list some extra organising structure is introduced. A library has shelves and indexes, it is not a single gigantic book or trillion- word sentence. So the answer to "why not" is "we have reached the highest organising- pricniple allowed by the language, and we still have more to organise". (what do you make libraries into? ) Currently we are struggling to cope with applications spread accross multiple nodes and processores, a situation which was not forseen in most traditonal HLLs. |
|
#10
|
|
|
|
|
On 27 Oct, 08:05, Mok-Kong Shen <mok-kongs> wrote:
> To be able to naturally deal with concurrency and parallelism > is excellent. But wouldn't it be, on the other hand, somewhat > inconvenient to program processes that requires sequentiality? > Should or could there be an adequate interface between data flow > and conventional programming languages? The transforms that operate on flows are black boxes and could be written in PP, OO, or functional languages. The DF system just has to know what they can handle. |
|
#11
|
|
|
|
|
On Oct 26, 5:22 pm, c...@tiac.net (Richard Harter) wrote:
> What do I mean by "turtles all the way up"? By this I mean the > thesis that the techniques and programming language concepts that > > are used in the small can be extended indefinitely to programming > > in the large. In other words if we use language X for 100 line > programs and 1000 line programs we can also use it for 1000000 > line programs. We may have to add some extensions and new > features along the way, but we can increase the size of the > programs we write indefinitely using the same ideas. > <p> > The antithesis is that it isn't turtles all the way up, or at > least it shouldn't be turtles all the way up. That is, the kinds > > of languages and programming technologies that we use in > large scale programming should be quite different from the kind > of languages used for programming in the small. While this is perhaps true, it's completely unclear what we need to successfully manage large projects. Sure a few things do help (like strong type, memory safety, good support for interfaces), but those also help on all but the smallest projects as well. At the level of several tens of thousands of lines of code, it's all irrelevant. Basically any language and methodology, including winging it, will work. At the million line level, there simply isn't any demonstrated reliable* methodology, unless you can break up the system into many small and highly independent pieces (consider the library of tens of thousands of device drivers on some modern OSs). No magic bullets, and all that… *where reliable is defined as producing something approximating a working version of the desired system in sometime approximating the planned time and budget. |
|
#12
|
|
|
|
|
"robertwessel2" <robertwessel2> writes:
> [...] At the million line level, there simply isn't any > demonstrated reliable* methodology, unless you can break up the system > into many small and highly independent pieces (consider the library of > tens of thousands of device drivers on some modern OSs). Well, there's one proven methodology: the compiler. That is, metaprogramming. I can write working programs of one million lines of assembler any day. Only it's not me who will write this million of assembler lines each day, it's the compiler. I will only write 10,000 lines of source code. Now if you need to write a million of source line, then just don't do it. Use metaprogramming to generate this million of source lines from a smaller source. And so on, you can add layers of metaprogramming all you need to compact your sources and always have something of manageable size. |
|
#13
|
|
|
|
|
On 2009-10-28, Pascal J. Bourguignon <pjb> wrote:
> Now if you need to write a million of source line, then just don't > do it. Use metaprogramming to generate this million of source lines > from a smaller source. And so on, you can add layers of > metaprogramming all you need to compact your sources and always have > something of manageable size. So, you're saying that *every* programming problem can be solved in at most a few tens of thousands of lines of code? Certainly some problems can, but most can't. Metaprogramming is just a form of compression, and there is no compression system that can reduce every source below a given size. Some problems really are irreducibly complex, and demand complex solutions. - Tim |
|
#14
|
|
|
|
|
On Oct 27, 3:47 am, Mok-Kong Shen <mok-kongs> wrote:
> Richard Harter wrote: > > Some significant advantages: > > > * Concurrency and parallelism are natural. > > I guess that such a language could profitably be used to simulate > the neuronal network of the brain. Could that be right? > > M. K. Shen In one domain - evaluating large mathematical expressions - dataflow languages express their programs well. One complete step in program design - memory map design - does not occur at all. That's where the early work on Sisal was directed if I'm not mistaken. Research on these mathematical programs - and the search for execution speed - still continues. Part of the charm ascribed to "dataflow machines" was their parallelism. Parallelism was to be as inherent in the execution as in the programming. That was to be a fundamental goal of the computer hardware for these machines. I think that an associative processor to be used for matching up tokens in the (executable) dataflow graph does not exist in large-scale hardware. Instead it is simulated by the "run time" that was mentioned earlier. That's just not the same thing at all. John |
|
#15
|
|
|
|
|
Tim Little <tim> writes:
> On 2009-10-28, Pascal J. Bourguignon <pjb> wrote: >> Now if you need to write a million of source line, then just don't >> do it. Use metaprogramming to generate this million of source lines >> from a smaller source. And so on, you can add layers of >> metaprogramming all you need to compact your sources and always have >> something of manageable size. > > So, you're saying that *every* programming problem can be solved in at > most a few tens of thousands of lines of code? Can you not specify all programming problem in less that a few thousands of lines of specification? Well, you can always write more detailed specifications, but I can assure you that sales peoples will always be able to put the whole specifications of your software on a 2-page booklet. > Certainly some problems can, but most can't. Metaprogramming is just > a form of compression, and there is no compression system that can > reduce every source below a given size. Some problems really are > irreducibly complex, and demand complex solutions. Yes indeed. However, assuming a big ontology (eg. take wikipedia, or even the whole web), wouldn't it be possible to express the needs for any software in less than ten thousands lines, and let the sufficiently smart system develop it, filling in the blanks in the specifications with all the knowledge it can extract from the web? Or take the problem actually in the other dirrection. Would you trust any implementation of a system that has orders of magnitude more than ten thousand lines of specifications? How can you ensure these specifications are consistent? How can you ensure that they're effectively implemented? Wouldn't you be more able to understand and check the specifications if they were shorter, that is indeed, given the ultimate limits to compression, if what they specified was less complex or of a more limited scope? If you accept that big systems must be decomposed into small programs, you should probably also accept that big specifications are as bad as big program(*), and that they should be short too, to be understandable and effectively implementable. Then the degree of automatization in the process of translating the specifications into executable code is only a matter of advancement of the techniques, while the size of the executable code is only (roughly) a function of the number of metaprogramming levels used. (*) Specifications = programs, as in: high level, declarative programs. |
|
|
|
|
| Similar Threads | |
| control flow vs data flow tasks I want to use SendMail task to send an email - only if data exists in table. easy enough, however to check if data exists I have to go to data flow, how do i return to... |
|
| SSIS: Is it possible to start execution of series of data flow tasks part way through series? I am new to SSIS and am creating my first package. Basically there are about 10 data flow tasks and each one takes quite a long time to run. I would like to be able to start... |
|
| regarding programming languages hi i know only programming language 'C' & with my civil engineering background i got a job in a software company which develops civil specific softwares here i require the... |
|
| Differences in data description in programming languages In an exchange between Richard, and RW, Richard points out that declaration of data depends upon the specs of the source language. I understand that. What I would like to... |
|
| languages in programming languages How do people who don't use the roman alphabet program? Do they need to learn English or are there translated versions of Fortran, C++ and Basic in other human... |
|
|
All times are GMT. The time now is 10:50 AM. | Privacy Policy
|