Monday, 13 October 2014

Philosophical aspects of the software architecture

This article is aimed at top-level system architects and scientifically minded researchers; however, I hope that even junior coders will find many interesting and useful things in it.

The Matrix is watching

Even the simplest digital system is an interactive simulation system; in fact, it is an extension of our world. Not quite as in the movie The Matrix, but the principle is the same - it is an interactive simulation.
The remote control in your hand that turns your air conditioner on is an interactive simulation system.
You do not believe it? It is primitive; however, it is based on the Turing machine, which just crunches numbers. By the way, what is a number? A number is an abstraction made up by humans to simulate reality. There are no numbers in our universe. The world around us is 100% analogue. We use numbers and booleans to describe the universe surrounding us. A description is always a model of something, and a model is always a simulation, static or dynamic. What about, for instance, Conan Doyle's stories about Sherlock Holmes? Was Sherlock real?
No, the character was purely fictional, in other words simulated. Concepts such as "Yes", "No", "Bigger" and "Smaller" are just abstractions. They do not exist in reality. If for any reason humankind disappears, such things as "yes" and "no" will disappear with us, because they exist only inside our brains. We make them up.

The world without objects

Objects do not exist in the universe either. We make up objects in order to abstract one part of the universe from another. How about the stars? Do they exist? Of course they do; however, there is no physical boundary between a single star and its star system, the star system is a part of the galaxy, and so on. It all depends on the way we look at it. If we deal with a star, we focus on the star object and ignore its surroundings when that is required.
In software development we use objects because it is the only way to overcome the complexity of the real world. An object is always a model of something real that we deal with.
Of course, we can make objects that are not models of real objects, but they are still models of models, which were derived from reality for a simple reason - reality (the Universe) is the primary source of everything. That is why object-oriented (OO) programming is so important and ubiquitous. Encapsulation and polymorphism are just formal methods of dealing with objects.
Today we can see other paradigms in the software development world, such as the emerging functional programming; before that there was procedural programming; and we hear voices saying that OO programming will soon be gone.

This one, for instance

Personally, I think that it will never happen, because the object is a basic concept of our world. Remove objects from our set of concepts and everything will disappear. The object is the building block of any virtual (simulated) reality.

The system

A system is a collection of objects that interact with each other. How? Interaction in the IT world always means sending a message to the target and receiving a response (if any).

Sync vs async

Communication patterns

The complete communication pattern is request-response. Request-response is always synchronous; we have to wait for the response.
When building a system, we have to choose the communication pattern carefully, or rather the patterns, because a complex system usually requires more than one channel of control.
Usually it is a carefully chosen combination of sync and async methods.

Let us look into both patterns

The synchronous approach implies that the system stops and waits for the end of the execution.
The asynchronous approach, on the contrary, sends the command and continues execution without waiting for the result.
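The difference between the two approaches can be shown in a few lines. This is only a minimal sketch in Python; the function `remote_call` is hypothetical, and its `sleep` merely stands in for network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_call(x):
    """Hypothetical slow operation; the sleep stands in for network latency."""
    time.sleep(0.1)
    return x * 2

# Synchronous: the caller stops here until the result is back.
sync_result = remote_call(21)

# Asynchronous: the caller sends the command and carries on with other work.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_call, 21)   # returns immediately
    other_work = sum(range(1000))           # runs while the call is in flight
    async_result = future.result()          # collect the reply when we need it
```

Both calls produce the same value; the only difference is what the caller is allowed to do in the meantime.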
Historically, communications became mostly synchronous. Examples: .NET Remoting, WCF, CORBA and others. They are all built mainly around the sync call. There were apparently two reasons for that - the popularity of the HTTP protocol and the rise of RMI (remote method invocation).
HTTP is a stateless protocol. Maintaining the connection is not required. Open the connection, send the request, immediately receive the reply and close the connection. That was the idea in the early days of our global web. Perhaps at that time it was justified: the systems were very primitive and the "request - response" pattern covered all the needs. Not any more, though.
RMI has also contributed to the sync pattern. The idea was to execute commands remotely in the same manner as they are executed locally. Wow! How convenient, you do not even have to care where the target is: here, in Japan or on the Moon.

Life is more complex

Imagine you are writing a letter to your fiancée asking her to become your wife. You drop the letter into the post box and wait for the reply. Do you stop eating, going to work or brushing your teeth? Unlikely, otherwise your bride risks becoming a widow instead of a wife.
So, it appears that sending the letter is asynchronous. You send and forget?
Not quite. The response, if it comes, will change your life. In other words, it will change the state of the system (you). Well, now it looks synchronous again, but what about brushing your teeth?
So, we can clearly see that the behaviour of the system (the interaction between you and your bride) cannot be covered by the existing common patterns, and if you are designing a real-time system (I would rather call them real-life systems), you have to stop relying on the sync-async patterns. They are simply not sufficient. They cover only a very limited number of cases, yet we keep pushing them instead of thoroughly reviewing them.
The pattern Async_WithConfirmation_and_Timeout covers all the cases, including pure async and pure sync. Just set Confirmation=false and Timeout=0 and the pattern becomes purely async; set Timeout=infinity and Confirmation=true and we have the pure sync pattern.
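One possible shape of such a pattern is sketched below in Python. It is an illustration under stated assumptions, not a reference implementation: the function name `send` and its parameters are made up for this example, `timeout=None` plays the role of the infinite timeout, and with `confirmation=False` the timeout is simply ignored.

```python
import queue
import threading

def send(handler, message, confirmation=True, timeout=None):
    """Deliver `message` to `handler` on a worker thread.

    confirmation=False                  -> pure async: return at once
    confirmation=True, timeout=None     -> pure sync: block until the reply
    confirmation=True, timeout=seconds  -> wait that long, then give up
    """
    replies = queue.Queue(maxsize=1)
    worker = threading.Thread(
        target=lambda: replies.put(handler(message)), daemon=True
    )
    worker.start()
    if not confirmation:
        return None                      # fire and forget
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no confirmation within {timeout} s") from None
```

A caller that gets the `TimeoutError` is free to decide what to do next: retry, switch to another scenario or give up. The sketch does not handle a handler that raises; in that case no reply ever arrives and only the timeout protects the caller.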

Was Frankenstein synchronous?

Imagine we have built a Frankenstein, a kind of android, and everything in it runs synchronously: his every step corresponds to two heartbeats, and so on.
During the construction we also created the program that controls our Frankenstein. The Frankenstein is successfully built and released into the nearby town. His real life begins. The first problem this guy will experience is the inability to cross the road, because crossing the road requires changing the ratio between the number of heartbeats and the number of steps. Even if the controlling program is perfect (let us imagine the unimaginable), the physics of the universe we live in will not allow him to follow the program strictly. The macro world is still built of subatomic particles, and they are governed by quantum physics, with its Heisenberg Uncertainty Principle. Even a perfect program will fail eventually, and our system must adapt to the changing world around it.

The connection

Connectionless protocols are becoming less popular because they cannot assess the state of the system they are dealing with, and the state of an object is a fundamental property of reality, not just a software factor. Do not forget that without state (memory) the Turing machine cannot exist.

Client-Server is not good enough?

A typical distributed system today is based on the client-server architecture, where the client communicates with the server synchronously. Intuitively, developers feel that this pattern is not sufficient. Look at this article
It is an attempt, in fact a relatively successful one, to compensate for the inherent deficiency of the client-server pattern. I say successful, but that is only partly right. Nothing can really compensate for the inherent deficiency of the sync pattern. You cannot turn a steam engine into a space shuttle, or a shuttle into a steam engine. They were simply designed for different purposes.


The timeout is the most important point in component design. Why is that?
Because we assume that time flows at the same pace on the other side of the network, or even in the whole universe. It is the only parameter that is available without sending or receiving anything, and it is also invariable. That is why it is so universal and so valuable.

The timeout and the probabilities

We wait for a bus at the bus stop. The bus is not coming yet.
What is the probability that the bus will come? Well, it all depends on the period we allow for this probability (or rather a mathematical expectation) to materialize. In other words, it is a function of time. At first the probability grows monotonically, and then it starts to drop sharply. What do we do? We wait for the bus, and within the first 5 minutes we do not even think about catching a taxicab.
However, the situation changes, we become desperate and eventually we are ready to take a taxi.
What do we see? What pattern describes the situation? Actually, the expected probability changes the scenario we are following. So we see that it is not a simple timeout; at every individual moment we have a different scenario. Our software usually is not that smart; however, several different levels of timeout should be implemented.
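Several levels of timeout can be as simple as a list of stages, each with its own waiting period and its own change of scenario. The Python sketch below is only an illustration; the function `wait_with_escalation` and the stage texts are invented for this example, and the periods are shortened from minutes to fractions of a second.

```python
import threading

def wait_with_escalation(arrived, stages):
    """Wait on `arrived` (a threading.Event) through successive timeout stages.

    `stages` is a list of (seconds, on_timeout) pairs. Every time a stage
    expires without the event, its callback runs and the next stage begins.
    Returns True if the event arrived, False once every stage has expired.
    """
    for seconds, on_timeout in stages:
        if arrived.wait(timeout=seconds):
            return True
        on_timeout()
    return False

bus = threading.Event()        # nobody ever sets it: the bus never comes
decisions = []
caught_bus = wait_with_escalation(bus, [
    (0.05, lambda: decisions.append("start looking for a taxi")),
    (0.05, lambda: decisions.append("take the taxi")),
])
```

Here the first timeout does not abort the wait; it only changes what we are prepared to do, and the last one ends the scenario.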

Francisco Scaramanga and the software development

Rule number one of engineering is - do not re-invent the wheel. Take something that exists and improve it (in software terms it is inheritance). Well, sounds good. What is the best system in the world? So far, in the universe known to us, we humans are the most sophisticated systems. Copying ourselves in C# or C++ code? What nonsense! Not quite. I suggest taking a closer look at ourselves.
If you remember the James Bond movie "The Man with the Golden Gun", you can possibly recollect the character Francisco Scaramanga, the villain and the man with 3 nipples. An error of nature occurred and the person had 3 nipples instead of two.
Our genome (DNA), which is the instruction on how to build our organism, was broken or somehow misinterpreted during the construction. The most important lesson of this error is that the instruction on how to build our body is not an instruction at all. It is just a recommendation; otherwise, the third nipple would not fit in. Imagine an airplane construction plant. You have the drawing of how to build the plane. Is it possible by some mistake to build the plane with one extra wing? Even if this extra wing is built, there is no way it can be fitted onto the plane; you would have to redesign all the other bits and pieces. However, unlike our poor three-wing plane, Francisco Scaramanga was fully functional and almost killed our perfect James Bond. How come? The reason is that when Scaramanga was constructed (let us stick to this generic term), the building blocks of his body tried to adjust to each other. It is a mutual adjustment; it is not construction according to a plan.
The conclusion from that is - the more complex the system is, the less coupling there should be between its blocks. Real-life complex systems are always multithreaded, because without multithreading it is physically impossible to decouple the components of the system, and without decoupling a large system is not functional. Decoupling also means that the synchronization between the different blocks is external in relation to the blocks themselves. There should be a system manager that synchronizes all the subsystems of the whole system. Systems must be multithreaded not because of performance issues. The major reason is that they must be built from self-adjusting and self-adapting components.
A system built with one thread is always sequential; if your heart waits for a piece of meat to be digested in your stomach, you are doomed to die.
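A rough Python sketch of this idea follows: every component runs in its own thread at its own pace, and the only synchronization (start and stop) lives in an external manager. The class names `Component` and `SystemManager` and the periods are assumptions made for the illustration.

```python
import threading
import time

class Component(threading.Thread):
    """A self-contained block that works at its own pace."""
    def __init__(self, name, period, stop_signal):
        super().__init__(name=name, daemon=True)
        self.period = period
        self.stop_signal = stop_signal
        self.cycles = 0

    def run(self):
        # wait() returns True only once the manager has set the stop signal
        while not self.stop_signal.wait(self.period):
            self.cycles += 1

class SystemManager:
    """External synchronization: components never wait on each other."""
    def __init__(self):
        self.stop_signal = threading.Event()
        self.components = []

    def add(self, name, period):
        component = Component(name, period, self.stop_signal)
        self.components.append(component)
        return component

    def run_for(self, seconds):
        for component in self.components:
            component.start()
        time.sleep(seconds)
        self.stop_signal.set()
        for component in self.components:
            component.join()

body = SystemManager()
heart = body.add("heart", period=0.01)
stomach = body.add("stomach", period=0.2)
body.run_for(0.3)
```

After the run, `heart.cycles` is much larger than `stomach.cycles`: the heart kept beating while the stomach was busy with its slow cycle.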

Choosing the wife and the software design

What a strange question. What is the connection between software design and choosing a partner? Well, there is one, and it is very fundamental.
The reason why biological objects (humans, for instance) have two genders is simple - two is the minimum and yet sufficient number for spreading genes into the wider population.
There could be not two genders but 3 or even 4. Simply increasing the number of sexes adds nothing functional to the gene exchange mechanism, so two is optimal. Why do we exchange genetic material at all? Would it not be easier to produce children by recombining the genes internally and then giving birth to the new organism, which later enters natural selection as we all do? What is wrong with that? The major problem with this approach is that 99.9999% of the descendants would consist of total genetic garbage and would not be functional.
Instead, with the sexual (or rather binary) approach, organisms exchange bits and pieces that are already functional. Don't we have our father's eyes and our mother's lips? So, we inherit functional blocks, and the blocks get recombined at the moment the child is conceived.
This is a simplified version of genetics; in reality it is far more complex, but the basic idea is - only functional blocks are used for building the whole organism, and a microscopic bit is left to mutations.
In the software world we have the same pattern - we use only the blocks that were built a long time ago and have had the time and the opportunity to pass the real-life test. When we build everything from scratch, we simply leave 100% of the design to mutations. A typical mutation kills the organism; only a tiny fraction of mutations are useful, but without mutation new species will never appear. So, the practical outcome is that the developer has to reuse existing frameworks and reliable patterns as much as possible. Relying only on your home-made software will kill your product, but you have to leave some room for design from scratch; that is how a new breed of software gets created.

The music of the system development

There are thousands if not millions of articles and tips on how to write software.
CodeProject has at least a hundred of them. Take a look at this one:

It is the most popular article on software development. In my humble opinion, this article is not about software development at all. Just a simple analogy - there is a piano performer and there is a music composer. The piano performer plays only what was written by the composer, just that, and what all these articles focus on is how to write the notes, what ink to use, what paper, what handwriting style, but absolutely nothing about the music itself. Everybody forgets that it is the music that is played, not the sheets. We all remember Mozart and Bach not because they wrote heaps of sheet music, but because they created the Music.

In fact, all these articles are not about building software; they are all about writing code, and the purpose of this article is to show that writing code and building systems that work belong to parallel, though different, universes. Let us begin our journey to a parallel universe.
Firstly, software, as was shown above, is merely a reflection of the real world we all live in. This fundamental fact is often overlooked, and when software becomes too artificial, it stops working.

Default settings

Everything in our world is defined by probabilities. Even crossing the road can sometimes be fatal. There is always a chance of a catastrophic outcome of anything; on the other hand, the opposite is also true - we can win 50 million in the lotto.

When we build a software component, we have to rely on the probabilities of its usage.
Typically, the component has a set of parameters. Naturally, all of them are set to some defaults.

How do we choose these defaults?

The rule is very simple and straightforward - the default must follow the likely frequency of usage. If 99% of developers set param A to, say, 5 and the remaining 1% set it to 10, then the component must be released with the default set to 5. So, if the parameter is not set explicitly at all, the system will still be functional. That is obvious; however, the major component vendors for some reason keep forgetting this simple rule.
Imagine you are sending a letter to your beloved girlfriend, and in order for this letter to be delivered you have to specify the color of the envelope, the number plate of the post truck that will carry the letter, the religion of the driver and so on. Perhaps you will change your mind about sending the letter at all. Clearly all of that is irrelevant; you just want the letter to be delivered in the default manner, and if you need extra options, like confirmation of delivery, you specify them separately.
However, we have exactly the same situation with WCF and with various other components and frameworks.
The configuration even for the simplest operation is enormous.
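In code the rule comes down to two things: take the default from what most users actually set, and make every option other than the essential ones optional. A small Python sketch, where `most_common_setting`, `LetterOptions` and `send_letter` are hypothetical names invented for this example:

```python
from collections import Counter
from dataclasses import dataclass

def most_common_setting(observed_values):
    """Pick the value that most users choose explicitly."""
    return Counter(observed_values).most_common(1)[0][0]

@dataclass
class LetterOptions:
    # Hypothetical options; every default is what most senders would pick.
    envelope_color: str = "white"
    confirm_delivery: bool = False

def send_letter(address, text, options=None):
    """Only the address and the text are required; the rest has defaults."""
    options = options or LetterOptions()
    return {"to": address, "text": text, "confirmed": options.confirm_delivery}

# 99 users out of 100 set the parameter to 5, so 5 should ship as the default.
default_a = most_common_setting([5] * 99 + [10])

receipt = send_letter("12 Rose Lane", "Will you marry me?")
tracked = send_letter("12 Rose Lane", "Will you marry me?",
                      LetterOptions(confirm_delivery=True))
```

The ordinary sender never sees the options at all; only the one who wants the delivery confirmed has to mention it.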

What is the difference between the server and the client?

The actual difference is only in who initiates the connection; after the connection is made, there is no difference between the server and the client. The relation between them becomes peer-to-peer, and the canonical software architecture bluntly ignores this fact.
They are no longer the client and the server. They interact with each other.
Let us take an example from real life. You come to a restaurant for dinner. The waiter is a typical server, and you are a client. You ask the waiter to approach, and when he comes, you order the meal. OK, up to this point the relation is client - server, but after the first words the waiter has to clarify which kind of vodka martini you prefer. Shaken, or maybe stirred? In fact, you start a conversation; it is not as if you keep issuing orders one after another until the very end.
Software (which is the reflection of our world) built from standard components simply cannot do that. The software most programmers use is inadequate. We twist it one way or another, but it is not designed to serve us properly, because the people who designed it in the first place never thought about anything real.
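To make the restaurant conversation concrete, here is a minimal Python sketch of a two-way channel in which, once connected, either side may speak first. The names `Peer`, `connect` and `waiter` are assumptions for the illustration, and two in-process queues stand in for a real network connection.

```python
import queue
import threading

class Peer:
    """One end of a two-way channel: it can both send and receive."""
    def __init__(self, inbox, outbox):
        self.inbox, self.outbox = inbox, outbox

    def send(self, message):
        self.outbox.put(message)

    def receive(self, timeout=1.0):
        return self.inbox.get(timeout=timeout)

def connect():
    """Whoever calls connect() is the 'client'; after that both ends are equal."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    return Peer(inbox=b_to_a, outbox=a_to_b), Peer(inbox=a_to_b, outbox=b_to_a)

def waiter(peer):
    order = peer.receive()
    peer.send(f"{order}: shaken or stirred?")   # the 'server' asks a question back
    answer = peer.receive()
    peer.send(f"one {order}, {answer}")

guest, staff = connect()
threading.Thread(target=waiter, args=(staff,), daemon=True).start()
guest.send("vodka martini")
question = guest.receive()
guest.send("shaken")
confirmation = guest.receive()
```

The guest opened the conversation, but the second message in it is a question from the waiter, which a strict request-response channel has no place for.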

Brain surgery and coding

Software that works copies the real world, because the world around us, as we know it, simply works. Let us assume for a moment that you are a brain surgeon, right in the middle of an operation. At this moment your wife calls you and starts talking about the cute kitten that is playing in the backyard. What would you do? Most likely, you would hang up and apologize later for not being nice. What would the average software do? I suspect that in 99% of cases it would drop the brain surgery, talk to the wife and, when the business with the cute kitten is finished, get back to the (apparently dead) patient.
So, what was wrong? The priority. We do not think about it much, but our life is a set of priorities, and robust software must prioritize its actions, otherwise it ends up like our unlucky patient. The priority can be static or dynamic, depending on the real task.
Software is firstly a system, and only secondly a sequence of commands.
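A static priority is easy to sketch with a heap. In the Python example below the class `PriorityInbox` and the message texts are made up for the illustration; a lower number means a more urgent message, and messages of equal priority keep their arrival order.

```python
import heapq
import itertools

class PriorityInbox:
    """Messages come out by priority (lower number = more urgent)."""
    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # tie-breaker keeps arrival order

    def post(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def next_message(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

inbox = PriorityInbox()
inbox.post(5, "kitten in the backyard")
inbox.post(0, "patient is bleeding")
inbox.post(5, "what is for dinner?")
handled = [inbox.next_message() for _ in range(len(inbox))]
```

The kitten arrived first, yet the patient is handled first; a dynamic priority would differ only in how the number passed to `post` is computed.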

If we have just a couple of components, it is easy for them to interact. Ordinary event handling will do the trick:
Writer.MessEvent += HereWeReceive;   // subscribe the handler to the writer's event
void HereWeReceive(string mess) { /* process the received message here */ }
What if we have thousands of subsystems and they all have to interact?
If we use just simple event handling, the system stalls as soon as one of the components develops a fault. Oops! So, the system must be built in a way that allows it to ignore less significant signals. That is how it happens in real life. The chirping of a bird up in the tree should not stop our heart from beating.
A really robust system always has more than one level of signal priority, and typically this is implemented by having more than one message delivery system.
In practical terms, there should always be a subsystem that runs in its own thread. Without multithreading it is physically impossible to ignore a useless or wrong signal, because of the sequential nature of our CPUs. Our body also has multiple signal delivery systems - the central nervous system, the peripheral nervous system, the endocrine system and so on - because they have different speeds and priorities.
The rule of thumb is - the less important the signal is, the lower the probability of delivering it to the core of the system should be; the peripherals should deal with the garbage. The least important signals have to be processed locally, without even being delivered to the core.
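The rule of thumb can be sketched in Python as a peripheral that filters signals before they ever reach the core, with the core running in its own thread. The names `Peripheral` and `core_loop`, the importance scale and the threshold value are all assumptions made for this illustration.

```python
import queue
import threading

class Peripheral:
    """Deals with minor signals locally; forwards only important ones."""
    def __init__(self, core_inbox, threshold):
        self.core_inbox = core_inbox
        self.threshold = threshold
        self.handled_locally = []

    def signal(self, importance, text):
        if importance >= self.threshold:
            self.core_inbox.put((importance, text))
        else:
            self.handled_locally.append(text)   # never reaches the core

def core_loop(inbox, seen):
    """Runs in its own thread and processes whatever the peripherals forward."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut down
            break
        seen.append(item[1])

core_inbox, seen = queue.Queue(), []
core = threading.Thread(target=core_loop, args=(core_inbox, seen))
core.start()

ear = Peripheral(core_inbox, threshold=5)
ear.signal(1, "bird chirping")
ear.signal(9, "car horn, very close")

core_inbox.put(None)
core.join()
```

The core sees only the car horn; the chirping bird is noted by the peripheral and goes no further.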


How to handle exceptions?
So much has been written about it. Is everything that has been written wrong? No, it is not wrong; it is simply good sometimes to look at things from a different angle.
The way an exception is used should depend, first of all, on what we are going to do with this exception. In some organizations there are very strict rules on how exceptions should be handled. Usually they require an error code, a message and something else.
The error codes may be kept in a list with thousands of numbers (typically unsigned integers). Therefore, when an exception occurs, we know the error code. How nice! However, the point is, why would we need the error code in the first place?
We need the error code only for recovery from the fault, so that the system is able to undertake some action to recover. However, in 99.99% of cases no such intelligent recovery system is ever implemented, and in fact that might be right. The design of such a recovery system is a challenge in itself and usually a waste of resources.
So, why do we need to maintain tables with thousands of error codes?
As we can see, the designers of such a system did not consider that the exception handling system is not only a recovery system; it is also a signal delivery system, and a signal, once delivered, must be interpreted, otherwise the delivery makes no sense whatsoever. A signal that was delivered and not interpreted is garbage by definition. The smart designer has to take this into consideration - what to deliver and, most importantly, why. Getting back to practical code, the rule is - the error message is usually the most important information, because it is interpreted by humans when other systems fail, whereas the error code is rather optional, depending on what is implemented in terms of fault recovery. Usually that is nothing.
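In Python this rule might look like the sketch below: the message is mandatory, the code is optional, and a code is looked at only where a recovery action really exists. `ComponentError` and `recover` are hypothetical names used for the illustration.

```python
class ComponentError(Exception):
    """Error whose human-readable message is required and whose code is optional."""
    def __init__(self, message, code=None):
        super().__init__(message)
        self.message = message
        self.code = code

    def __str__(self):
        if self.code is None:
            return self.message
        return f"[{self.code}] {self.message}"

def recover(error, handlers):
    """Run a recovery action only if one is actually registered for the code."""
    action = handlers.get(error.code)
    return action() if action else None

try:
    raise ComponentError("Cannot save the report: the disk is full", code=28)
except ComponentError as err:
    report = str(err)                                   # what a human will read
    outcome = recover(err, {28: lambda: "old logs deleted"})
```

If nobody registers a handler for a code, the code has no reader and can be left out; the message still does its job.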


"That which does not kill us makes us stronger."
Friedrich Nietzsche
What is redundancy? Redundancy is the excess resources that can be used in case of emergency.

Racing car example

What about redundancy in a racing car? Well, it must be zero. The ideal racing car should fall apart right after it crosses the finish line.
Have you seen a healthy old person? One day he falls ill, nothing serious, probably the flu, and a few days later he dies from kidney failure. Why? He looked healthy.
In fact, he not only looked healthy, he was healthy. Why did he die? He died because all his redundancies were exhausted, and any external cause (the flu in our case) could kill him. What happened? The flu simply triggered a chain reaction: it stressed the immune system, then the failure of the immune system led to a kidney infection, and the person died. What does it have to do with software? The same thing: software modules that have some degree of freedom must have redundancy, otherwise any stress on an individual component will provoke a chain reaction and eventually cause a catastrophic failure.
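The simplest form of redundancy in software is a spare that takes over when the primary component fails. A minimal Python sketch, in which `call_with_redundancy`, `primary` and `spare` are invented names:

```python
def call_with_redundancy(providers, *args):
    """Try each provider in turn; fail only when every spare is exhausted."""
    failures = []
    for provider in providers:
        try:
            return provider(*args)
        except Exception as exc:   # broad on purpose: any fault moves us to the spare
            failures.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {failures}")

def primary(x):
    raise ConnectionError("primary is down")

def spare(x):
    return x + 1

value = call_with_redundancy([primary, spare], 41)
```

With the spare in the list the fault of the primary goes unnoticed by the caller; with the primary alone the same fault becomes a failure of the whole call.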
