Chapter 2
In this chapter we seek to understand why current technologies
for composing software components into new applications often
require so much additional design effort as to make the development
of applications from scratch preferable to the reuse of existing
components. We analyze the process of component composition and
argue that the difficulties of performing its steps are closely
related to the failure of current programming languages and design
tools to separate the implementation of component interconnection
protocols from that of the "core" function of each component.
We then argue that the composition effort can be reduced if we
separate the design problem of component interconnection from
that of implementing a component's core function, and begin to
comprehend, represent and systematize its design dimensions and
design alternatives in a coherent framework. This argument forms
the principal thesis of the work. We outline a set of notations,
theories and tools needed to support this separation. We propose
a new, improved component composition process based on them. This
sets the stage for the rest of the thesis, which presents one
concrete implementation of those requirements and demonstrates
how they can form the basis of a useful methodology for component-based
application development and maintenance.
2.1 Software Applications are Interdependent
Sets of Activities
When a software engineer begins the design of a new software application,
he/she often draws a diagram labeled the "software architecture".
The diagram usually consists of boxes, depicting the main functional
pieces of the application, and lines, depicting interconnection
relationships among functional pieces. For example, Figure 2-1
shows one way of representing the architecture of a document processing
application based on the TeX system.
Figure 2-1: A software architecture for a document
processing application. Boxes correspond to activities while labeled
arcs represent dependencies.
At this level of description, software applications are systems
of interdependent software activities. In this thesis, we will
use the term activities to describe the major functional
elements of an application. We will use the term dependencies
to describe relationships and constraints among activities, such
as data and control flows, timing constraints, or resource sharing
relationships. Table 2-1 lists some dependency types commonly
encountered in software systems.
Software architecture diagrams are specification-level
descriptions of software systems. In order to obtain the desired
behavior of an application, activities must be implemented by
executable activities or software components and
dependencies must be managed by appropriate coordination processes.
Software components are software entities of various forms, including
source code modules, executable programs, or remote network services.
Coordination processes implement component interconnection protocols,
such as pipe channels (manage one-to-one flow dependencies), client/server
organizations (manage many-to-one flow dependencies), semaphore
protocols (manage mutual exclusion dependencies), etc.
Some coordination processes correspond to atomic language or operating
system protocols (such as a simple procedure call). Others describe
composite protocols which introduce additional activities and
dependencies to be integrated into the rest of the system. It
is important to observe that, whereas software activities relate
to concepts specific to the application domain, interdependency
patterns relate to concepts orthogonal to the problem domain
of most applications, such as transportation and sharing of resources,
or synchronization constraints among components.
Dependency Type | Design Dimensions | Coordination Processes |
Prerequisite | number of precedents; number of consequents; relationship among precedents (And/Or) | Explicit Notification; Event Polling |
Mutual Exclusion | number of participants | Semaphore protocol; Token Ring protocol |
Data Flow | number of producers; number of consumers; data type(s) produced; data type(s) consumed | Shared Memory protocol; Pipe protocol; Client/Server protocol |
Shared Resource | number of users; modes of usage; resource sharing properties | Time-Sharing; Paging; Replication |
Table 2-1: Some common types of dependencies encountered
in software systems and representative coordination processes
for managing them.
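Table 2-1 can be read as a mapping from dependency types to candidate coordination processes. The following Python sketch encodes that reading as a simple lookup. The names `DEPENDENCY_CATALOG` and `candidate_processes` are illustrative assumptions and are not part of any notation defined in this thesis.

```python
# Illustrative encoding of Table 2-1; all identifiers are hypothetical.
DEPENDENCY_CATALOG = {
    "Prerequisite": {
        "design_dimensions": [
            "number of precedents",
            "number of consequents",
            "relationship among precedents (And/Or)",
        ],
        "coordination_processes": ["Explicit Notification", "Event Polling"],
    },
    "Mutual Exclusion": {
        "design_dimensions": ["number of participants"],
        "coordination_processes": ["Semaphore protocol", "Token Ring protocol"],
    },
    "Data Flow": {
        "design_dimensions": [
            "number of producers",
            "number of consumers",
            "data type(s) produced",
            "data type(s) consumed",
        ],
        "coordination_processes": [
            "Shared Memory protocol",
            "Pipe protocol",
            "Client/Server protocol",
        ],
    },
    "Shared Resource": {
        "design_dimensions": [
            "number of users",
            "modes of usage",
            "resource sharing properties",
        ],
        "coordination_processes": ["Time-Sharing", "Paging", "Replication"],
    },
}


def candidate_processes(dependency_type):
    """Return the coordination processes Table 2-1 lists for a dependency type."""
    return DEPENDENCY_CATALOG[dependency_type]["coordination_processes"]


print(candidate_processes("Mutual Exclusion"))
```

Selecting among the candidates still requires the design dimensions listed for each dependency type; the lookup only enumerates the alternatives.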
________________________________________________________________________
Example 2-1: Dependencies and Coordination Processes
Figure 2-2 depicts a simple architecture where a one-to-one flow
dependency connects two activities. One way of managing this dependency
is by setting up a pipe protocol between the two activities. The
pipe protocol introduces three new activities and two new dependencies
into the system.
Figure 2-2: A simple application architecture
and a coordination process for managing a data flow dependency
between its activities.
________________________________________________________________________
Example 2-2: Dependency patterns define system behavior
Figure 2-3: Interconnecting the same activities
in two different ways results in two different applications.
The purpose of this example is to demonstrate the equal importance
of activities and dependencies in determining the behavior of
an application. Figure 2-3 shows the architecture of two applications
where an identical set of activities has been interconnected in
two different ways. The behavior of the two applications is quite
different. Application (a) outputs the squares of all members
of its input sequence that are below 100 (since 100^2
= 10000). Application (b) outputs the squares of all members of
its input sequence that are below 10000.
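The effect can be reproduced with a short Python sketch. Figure 2-3 is not reproduced here, so the two activities (a squaring step and a filter that passes values below 10000) are inferred from the description above, and the function names are illustrative. The sketch also assumes non-negative inputs.

```python
def square(values):
    """Activity: square every member of the input sequence."""
    return [v * v for v in values]


def keep_below_10000(values):
    """Activity: pass through only the values below 10000."""
    return [v for v in values if v < 10000]


def application_a(inputs):
    # Square first, then filter: only squares of inputs below 100 survive.
    return keep_below_10000(square(inputs))


def application_b(inputs):
    # Filter first, then square: squares of all inputs below 10000 survive.
    return square(keep_below_10000(inputs))


sample = [5, 99, 100, 500, 9999, 10000]
print(application_a(sample))  # [25, 9801]
print(application_b(sample))  # [25, 9801, 10000, 250000, 99980001]
```

The two functions `square` and `keep_below_10000` are identical in both applications; only the order in which the flow dependencies connect them differs.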
________________________________________________________________________
In conclusion, at the architectural level activities and dependencies
are two distinct elements of complex software applications. Both
are explicitly represented and both are equally important to the
definition of a system.
2.2 Implementation Languages Focus
on Components
Despite the equal status of activities and interdependencies in
informal descriptions of a software architecture, as design moves
closer to implementation, current design and programming tools
increasingly focus on components, leaving the description of interconnections
among components implicit, distributed, and often difficult to
identify. At the implementation level, software systems are sets
of source and executable modules in one or more programming languages.
Although modules come under a variety of names and flavors (procedures,
packages, objects, clusters etc.), they are all essentially abstractions
for components.
Traditional language mechanisms for specifying module interconnections
are essentially variations of the procedure call interface. They
specify input and output parameters, and in some cases, lists
of imported and exported procedures. They are not sufficient for
describing more complex interconnections that are commonplace
in today's software systems (for example, a distributed mutual
exclusion protocol). As a consequence, support for complex module
interconnections is either left implicit, relying on the semantics
of programming languages and operating systems, or is broken down
into fragments, and embedded within the modules themselves.
________________________________________________________________________
Example 2-3: Distribution of Component Interconnection Protocols
This example demonstrates how protocols for implementing even
the simplest interconnection relationships become fragmented
and distributed among several code modules when the system is
implemented in a conventional programming language.
Figure 2-4: Implementation languages distribute
support for component interconnection protocols among several
code modules.
Figure 2-4 shows the architecture and one implementation of a
simple application that generates, and then displays, a body of
text. Management of the single flow dependency requires a mechanism
for transporting the text from activity Generate Text to activity
Display Text. It also requires a mechanism for ensuring that activity
Display Text reads the text only after activity Generate Text
has written it. In this particular implementation we have chosen
to manage the flow dependency between the two application activities
using a sequential file. Component generate opens, writes and
closes the file, while component display opens, reads, and closes
the file. The main program passes the shared filename to both
components.
In this program, the management of a single data flow dependency
has been distributed to three different places. Procedures generate
and display each contain one part of the file transfer protocol
embedded in their code body. Less obviously, the read-after-write
prerequisite is managed implicitly, by ordering the invocation of
generate before that of display in the (sequential) main program.
The distribution of even such a simple protocol in three different modules has the consequence that each of the resulting modules now contains certain implicit interconnection assumptions about other modules with which it will work together. For example, module display assumes that, by the time it begins execution, some other module will have written the data it needs to a file. When attempting to reuse display in a different context, these assumptions will quite often not hold. In those cases, the code of the module must be manually modified to match the interconnection assumptions that are valid in the new context.
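A minimal Python sketch of such a program follows. It is not the code shown in Figure 2-4; the file name, the text being generated, and the use of a temporary directory are assumptions made so that the sketch is self-contained.

```python
import os
import tempfile


def generate(filename):
    # Fragment 1: writer half of the file transfer protocol,
    # embedded in the body of the component.
    with open(filename, "w") as f:
        f.write("Hello, world\n")


def display(filename):
    # Fragment 2: reader half of the protocol. The component assumes
    # that some other module has already written the file.
    with open(filename) as f:
        text = f.read()
    print(text, end="")
    return text


def main():
    # Fragment 3: the shared file name and the call order. Calling
    # generate before display is what enforces read-after-write.
    filename = os.path.join(tempfile.mkdtemp(), "text.dat")
    generate(filename)
    return display(filename)


shown = main()
```

Swapping the two calls in `main`, or reusing `display` where the text arrives over a pipe instead of a file, breaks the program even though the "core" function of `display` is unchanged.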
________________________________________________________________________
A useful visual metaphor for understanding
the loss in expressive power as we move from architectural diagrams
to system implementations is the separation of a 3-dimensional
image into its projections on a set of 2-dimensional planes. Let
us try to picture a software application as a set of black boxes,
each corresponding to a functional component. In 3-dimensional
space, black boxes are interconnected by 3-dimensional connectors,
which manage their interdependencies. This corresponds to the
architectural view of an application. Using this metaphor, the
lack of explicit support for representation of interconnections
at the implementation level corresponds to an inability to represent
the third dimension. Programs are thus like sets of 2-dimensional
planes. In this metaphor, planes correspond to code-level modules.
In order to map a 3-dimensional image into a set of planes, we
take its projection into each of the planes. At the end of the
process, each plane encodes a "core" functional component
but also contains the trace of an interconnection protocol embedded
in it (Figure 2-5).
Trying to understand a system by reading its code modules is equivalent
to trying to mentally reconstruct a 3-d image from its 2-d projections.
The separation of interconnections into a set of 2-dimensional
traces blurs the ostensible function of each module and makes
global understanding of the interdependency patterns a nontrivial
mental exercise. Furthermore, as we shall see, it has negative
implications for the reusability of modules thus constructed.
Figure 2-5: Moving from architectural diagrams
to system implementation is similar to projecting a 3-d image
onto a set of 2-d planes.
Several researchers [Shaw94a, Allen94] have identified this expressive
shortcoming of programming languages and systems and have discussed
its negative effects in developing and maintaining software systems.
This thesis makes the additional argument that the lack of explicit
support for representing component interdependencies and implementing
component interconnections separately from the components themselves
is one of the principal causes of the difficulty of reusing software
components in new applications. The following sections will attempt
to make this argument more explicit.
2.3 Implicit Interconnection Assumptions
Pose Obstacles to Component Reuse
Implementation-level software modules (source or executable) are
the usual candidate units for software reuse. This section will
discuss why the current practice of ignoring the significance
of architectural interdependencies and embedding component interconnections
within components increases the difficulty of reusing these components
in new applications.
As described in the previous section, components built with current
technologies make strong, and usually undocumented, assumptions
about their interdependencies and interconnections with other
components in the same application. More precisely, they contain
fragments of protocols that implement their interconnections with
the rest of the system in their original development environment.
Such assumptions are either hardwired into their interfaces, embedded
in their code bodies, or left implicit, relying on specific properties
of the environment for which they were originally designed.
Since interconnection relationships are one of the defining elements
of each individual application, different applications are expected
to contain different interconnection relationships. Therefore,
the interaction assumptions embedded inside a component will most
likely not match its interconnection patterns in the target application.
In order to ensure interoperability, the original assumptions
have to be identified, and subsequently replaced or bridged with
the valid assumptions for the target application. In many cases
this requires extensive code modifications or the writing of additional
coordination software (or "glue" code).
2.3.1 A Taxonomy of Interconnection
Assumptions
Since the problem of component composition is closely related
to that of managing component interconnection assumptions, it
is useful at this point to introduce a framework that characterizes
the various categories of interconnection assumptions encountered
in reusable software components. Our framework classifies assumptions
according to the following two dimensions:
a. The design level (specification, implementation) of the assumption
b. The location of the assumption in the component
With respect to their design level, interconnection assumptions
are classified into:
Protocol assumptions. These are implementation-level assumptions
which result from the fact that components encode parts of their
interconnection protocols into their bodies. For example, a producer
module which was originally designed as a UNIX filter contains
the writer part of the pipe protocol and assumes it will be interacting
with modules which encode the corresponding reader part of the
protocol.
Architectural assumptions. These are specification-level
equivalents of protocol assumptions. They can be expressed as
constraints on the patterns of allowed interconnection relationships
between a component and the rest of the system. Take, for example,
the previously mentioned producer module, implemented as a UNIX
filter. Since pipe protocols manage one-to-one flow dependencies
(each value written to a pipe can only be read once), that module
contains the architectural assumption that each of its values
will flow to a single consumer component. Architectural assumptions
are more difficult to identify than protocol assumptions, because
there exist no systematic frameworks for mapping interconnection
relationships to interconnection protocols, and vice-versa.
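The difference between the two kinds of assumption can be illustrated with a small Python sketch. A `queue.Queue` stands in for a UNIX pipe here (an assumption made for portability), and the names `producer` and `pipe` are illustrative.

```python
import queue

pipe = queue.Queue()  # stand-in for a UNIX pipe (assumption)


def producer(values):
    # Protocol assumption: the writer part of the pipe protocol is
    # embedded in the body of the component.
    for v in values:
        pipe.put(v)


producer([1, 2, 3, 4])

# Architectural assumption: each value can be read only once, so two
# consumers attached to the same pipe split the values between them
# and neither sees the whole sequence.
consumer_a = [pipe.get(), pipe.get()]
consumer_b = [pipe.get(), pipe.get()]
print(consumer_a, consumer_b)
```

The protocol assumption is visible in the code of `producer`. The architectural assumption (a single consumer) appears nowhere in the code and only surfaces when a second consumer is attached.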
With respect to their location inside a component, interconnection
assumptions are classified into:
Interface assumptions. These are protocol and architectural
assumptions encoded into a module's parameter and import/export
interface. Let's consider a simple procedure call as an example.
Protocol assumptions are the most obvious: they include the names,
data types, and ordering of parameters it expects. However, procedure
interfaces also encode a set of implicit, architectural assumptions:
These include the fact that all input parameters are expected
to flow from a single place (the point of call), that all parameters
will be available when control flows into the procedure body,
and that output parameters flow to the same place that inputs
came from.
Assumptions embedded in the code. These assumptions relate to interconnection protocol fragments found in the code bodies of components and their associated dependency patterns. For example, a Dynamic Data Exchange (DDE) server module written for the Microsoft Windows environment contains the server part of a DDE protocol embedded in its code. It assumes it will be interacting with DDE client components containing the corresponding client part of the protocol. From an architectural perspective, it assumes it will be connected to other components through many-to-one flow dependencies.
Implicit assumptions. These assumptions are manifested by the absence of any particular mention in code and rely instead on properties of the environment for which components were originally designed. An example is a user interface event loop, originally designed for an environment with preemptive scheduling. It implicitly assumes it will be sharing the processor with all other applications running in the system. Therefore, it does not contain statements that periodically yield the processor. The explicit addition of such statements would be required in environments without preemptive scheduling. Another example is a module which assumes it will be the only writer of a file in the system and thus does not contain any statements for locking the file. In order to use the module in an environment with multiple concurrent file writers, a file locking protocol must be introduced.
Figure 2-6: Architecture of a simple application and descriptions of candidate components for implementing its activities.
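The single-writer case described above can be sketched in Python as follows. An in-memory dictionary stands in for the file, and a `threading.Lock` stands in for a file locking protocol; both are simplifying assumptions, and all names are illustrative.

```python
import threading

ledger = {"lines": 0}  # in-memory stand-in for the shared file (assumption)


def append_line():
    # Original module: a read-modify-write with no locking, because it
    # implicitly assumes it is the only writer in the system.
    current = ledger["lines"]
    ledger["lines"] = current + 1


file_lock = threading.Lock()


def locked_append_line():
    # Coordination added for the new environment: a locking protocol
    # wrapped around the unchanged module.
    with file_lock:
        append_line()


def writer(count):
    for _ in range(count):
        locked_append_line()


threads = [threading.Thread(target=writer, args=(100,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(ledger["lines"])  # 300
```

Nothing in the code of `append_line` states the single-writer assumption; it has to be discovered and then mediated from outside the module.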
________________________________________________________________________
Example 2-4: Construction of a simple application from existing
components
The following example demonstrates how the difficulties of reusing
software components in new applications are a consequence of mismatches
between the original component interconnection assumptions and
the actual interdependency patterns of the target application.
Figure 2-6 shows the architecture of a simple market simulator
application. We wish to build the application using existing components,
whose interfaces and requirements are also summarized in the same
diagram.
The first important mismatch occurs at the inputs of activity
Estimate Profits. Procedure calcprof expects to receive both its
input parameters, via a procedure call (protocol assumption),
from a single place (architectural assumption). However, in this
application, it is connected to two different activities, each
of which passes its respective parameter to a different procedure.
Module gencost repeatedly calls procedure nextcost for each new
cost data item it generates. Likewise, module gendemand independently
calls procedure nextdemand for each new demand data item it generates.
Somehow the two values must be collected together and transmitted,
via a single procedure call, to calcprof. One way to achieve this
is to write one of the values to a global variable, and then read
that variable and call calcprof from inside the procedure to which
the other value has been passed. Such a scheme requires additional
coordination to ensure that (a) the global variable is not read
by the second procedure before it has been written by the first
procedure, and (b) that the first procedure does not overwrite
the variable before the second procedure has read and transmitted
each value.
The other mismatch occurs at the output of activity Estimate Profits.
Since calcprof writes its output values to a pipe, it implicitly
assumes (a) that there will be a pipe reader at the other end
(protocol assumption), and (b) that each value will be used by
a single consumer activity (architectural assumption). In this
example, however, two activities need to read each value. In order
for this to happen, additional software must be provided that
reads the pipe and makes each value available to both consumer
activities. In the case of activity Display results this involves
setting up a second pipe. In the case of activity Log results
a simple procedure call is sufficient.
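The glue code described in this example can be sketched in Python. The component names (`gencost`, `gendemand`, `nextcost`, `nextdemand`, `calcprof`) come from the text; their signatures, the profit formula, and the use of `queue.Queue` objects as stand-ins for the global variable and the pipes are assumptions made for illustration.

```python
import queue
import threading

# One-slot buffer standing in for the "global variable" of the text.
# put() blocks while the slot is full (condition b) and get() blocks
# while it is empty (condition a).
cost_slot = queue.Queue(maxsize=1)
profit_pipe = queue.Queue()   # stand-in for the pipe calcprof writes to
display_pipe = queue.Queue()  # second pipe feeding Display results
log = []                      # Log results is reached by a plain call


def calcprof(cost, demand):
    # Placeholder computation; the real formula is not given in the text.
    profit_pipe.put(demand - cost)


def nextcost(cost):
    cost_slot.put(cost)


def nextdemand(demand):
    # Collect both values and transmit them in a single procedure call.
    calcprof(cost_slot.get(), demand)


def gencost(costs):
    for c in costs:
        nextcost(c)


def gendemand(demands):
    for d in demands:
        nextdemand(d)


def log_result(value):
    log.append(value)


def fan_out(count):
    # Glue: read each value once from the pipe, then hand it to both
    # consumer activities.
    for _ in range(count):
        value = profit_pipe.get()
        display_pipe.put(value)
        log_result(value)


costs, demands = [1, 2, 3], [10, 20, 30]
threads = [
    threading.Thread(target=gencost, args=(costs,)),
    threading.Thread(target=gendemand, args=(demands,)),
    threading.Thread(target=fan_out, args=(len(costs),)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

displayed = [display_pipe.get() for _ in costs]
print(displayed, log)
```

Everything except the bodies of `gencost`, `gendemand`, and `calcprof` is coordination code that exists only to reconcile the components' original assumptions with this particular architecture.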
________________________________________________________________________
We observe that, even in a simple system, the process of component
composition is not trivial. In more complex systems, carrying
out the above process might involve significant design effort,
to the extent that many designers prefer to build new applications
from scratch rather than reuse existing components they might
have at hand. Progress has been made in automating the handling
of some of the most "obvious" categories of protocol
assumption mismatches, such as interface assumptions. However,
to date, there has been little success in handling more
complex protocol assumptions, as well as in treating all different
sources of assumption mismatches under a unifying framework. Researchers
have only recently begun to recognize the importance of architectural
assumption mismatches [Garlan95].
Step 1: | Identify original component assumptions
Identify the interconnection assumptions of each component. |
Step 2: | Determine new patterns of interaction
Determine the patterns of interconnection of each component with the rest of the system in the target application. |
Step 3: | Mediate component assumption mismatches
Replace or mediate the original component assumptions in order to make them compatible with the patterns of interconnection in the target application. |
Figure 2-7: Conventional view of the process for
building software applications from existing components.
We claim that an important source of difficulties in this area is our failure to recognize component interconnection as a separate design problem, to provide tools for visualizing component interdependencies, and to separate support for component interactions from the implementation of a component's core function. Most designers ignore (or do not consciously take into account) the dimension of component interdependencies and view the process of component composition as an ad-hoc effort to bridge mismatches among components (see Figure 2-7). This view of the process results in a number of practical difficulties:
Difficulties in identifying component assumptions
Difficulties in determining new patterns of interaction
Difficulties in replacing or mediating component assumptions
The following sections elaborate on each of these difficulties.
2.3.2 Difficulties in Identifying Component
Assumptions
By not recognizing component interconnections as an explicit element
of software systems, current programming languages leave them
implicit and often not documented. Designers have to understand
or deduce them by reverse-engineering the code. As explained in
Section 2.3.1, relevant assumptions might be hidden in a variety
of places, including:
Module interfaces. Such assumptions include the names,
order and data type of input and output parameters, as well as
names and signatures of imported and exported modules. They are
the most obvious and easy to identify assumptions. Researchers
have developed special notations, called Module Interconnection
Languages (MILs), to facilitate this task (see Section 7.3.1).
Unfortunately, they don't tell the whole story.
Code blocks. Apart from procedure argument lists, there
is a large variety of other means of component interaction, ranging
from global variables, to complex interprocess communication protocols
requiring elaborate setup and maintenance. Each participating
component will contain fragments of those protocols embedded in
its code blocks. Although good program design is able to help
localize such fragments, nothing in today's languages actually
forces programmers to do so. In many cases, designers have to
read the entire code in order to identify them.
Implicit environmental properties. The most problematic
assumptions are those which are manifested by the absence of any
particular mention in code and rely instead on properties of the
environment for which components were originally designed. There
is no tangible documentation of such assumptions, except in program
comments. In the worst case, designers must rely on their experience,
intuition, and knowledge of the development environment of the
component.
2.3.3 Difficulties in Determining New
Patterns of Interaction
This is the step where the lack of support for explicit representation
and management of component interdependencies creates the most
problems. The interactions of components in the target application
result from the management of their interdependencies in the new
system. Designers need to understand those new patterns of interdependencies,
map them to appropriate coordination processes that manage them
and, finally, determine the pieces of those processes that are
associated with each component in the system. We see, therefore,
that this step should, in fact, decompose into three distinct
steps, and be supported by explicit representations of the target
application architecture and frameworks for mapping interdependency
patterns to coordination processes.
An additional complexity comes from the fact that interconnection
protocols typically deal with issues orthogonal to those of the
problem domain of an application, such as machine architectures,
operating system mechanisms and communication protocols. This
increases the expertise required of the designer
and complicates the mental effort needed to mix those orthogonal
concerns in the same code unit.
2.3.4 Difficulties in Replacing or
Mediating Component Assumptions
Since original component assumptions might be encoded in a variety
of places, replacing or mediating them might require several modifications
within or around a component. Furthermore, since target coordination
processes typically define a set of "roles" to be integrated
with several interdependent components, this, again, usually results
in the need to perform several modifications distributed across
the system.
The previous difficulties can, again, be illustrated by the 3-d-to-2-d-transformation
metaphor of the component composition process (see Section 2.2).
The input to the process (a set of modules to reuse) corresponds
to a set of 2-d planes, each of which encodes a specific component,
but also contains the trace of interconnections of this component
in its original environment. The output of the process corresponds
to another set of 2-d planes (the input set with possibly some
new modules added). The process consists of modifying the input
planes, erasing or modifying the projections of the original interconnection
protocols, so that at the end of the process they have become equal
to the corresponding projections of the target interconnection protocols.
The visual process is greatly facilitated by the existence of
an intermediate 3-d representation, picturing interconnection
relationships among components in the target application. Managing
application interdependencies with coordination processes and
decomposing processes into pieces associated with each component
is equivalent to projecting the new 3-d model onto the set of
2-d planes that represent the original modules. It would be very
hard to imagine the form of the new projections without recourse
to this 3-dimensional view.
The problem with current technologies is that they do not provide
support for the equivalent of a 3-d view of software systems,
nor a systematic way of projecting "3-dimensional" interconnection
protocols onto existing "2-dimensional" software modules.
Even if the inputs and outputs of the process are still modules
expressed in our current "2-dimensional" idioms, the
existence of an intermediate "3-dimensional" architectural
view and algorithms for performing the equivalent of a projection
of that view to "2-dimensional" modules would greatly
facilitate the task of component composition.
2.4 Component Interdependencies Deserve
First-Class Status
The preceding discussion provides the motivation for the main
thesis of this work. We have shown that the complexity of the
process of constructing software applications from existing components
is a consequence of our failure to separate context-specific support
for component interconnections from the implementation of a component's
"core" functionality.
We define software interconnection to be the design problem
of:
(a) specifying the patterns of dependencies among activities in a software application
(b) selecting coordination processes to manage dependencies and
integrating the resulting interconnection protocols with the rest
of the system
This thesis argues that the problem of component interconnection
should be treated as an orthogonal design problem from
that of the specification and implementation of a component's
core function. We should begin to comprehend, represent and systematize
its design dimensions and design alternatives in a coherent framework.
Such a separation will provide benefits, not only for the initial
development, but also for the maintenance and portability of component-based
applications.
2.4.1 Requirements for Separating Interconnection
from Implementation
The proposed separation translates to a number of concrete requirements for new notations, tools and theories to support the process of component composition. These requirements relate directly to the difficulties discussed in Section 2.3, and can be summarized as follows:
2.4.1.1 Make component assumptions
explicit
A full separation of the problem of component interconnection from that of component implementation would require us to design components with minimal interdependency assumptions. In actual practice such a requirement would limit the reuse of components developed with current technologies. It might also impose stringent constraints on the development of new components. A more relaxed requirement is to develop component description notations that help make interconnection assumptions explicit.
2.4.1.2 Separate and localize representations
of component interdependencies
We need to be able to represent component interdependencies separately from components and to localize interconnection protocols that manage them. This requires the development of architectural languages with separate abstractions for software activities, dependencies, and coordination processes. It also requires the development of a vocabulary for specifying common component interdependencies.
2.4.1.3 Develop systematic design
guidelines for component interconnection
We need systematic frameworks for selecting coordination processes
that manage application interdependencies. Such frameworks should
enumerate the design dimensions of the problem of component interconnection.
They should provide mappings from patterns of dependencies to
sets of alternative coordination processes for managing them.
Finally, they should provide compatibility criteria and design
rules for selecting among alternative design choices.
Step 1: | Specify target application architecture
Express the architecture of the target application as a set of activities interconnected through dependencies. |
Step 2: | Associate activities to code-level components
Locate and select existing code-level components that implement the core functionality of application activities. |
Step 3: | Manage target application interdependencies
Select coordination processes that manage target application interdependencies. |
Step 4: | Integrate components and coordination processes
Integrate code-level components and coordination processes into sets of executable modules. |
Figure 2-8: An improved approach for building
software applications from existing components.
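The four steps of Figure 2-8 can be sketched in Python. This is only an illustration of the shape of the process under simplifying assumptions, not the implementation presented later in the thesis; every name in it (`Dependency`, `direct_transfer`, `coordination_catalog`, `integrate`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    kind: str    # dependency type, e.g. "flow"
    source: str  # producing activity
    target: str  # consuming activity


# Step 1: specify the target architecture as activities and dependencies.
activities = ["Generate Text", "Display Text"]
dependencies = [Dependency("flow", "Generate Text", "Display Text")]


# Step 2: associate activities with components that implement only the
# core function and contain no interconnection code.
def generate_text():
    return "some text"


def display_text(text):
    return f"DISPLAY: {text}"


components = {"Generate Text": generate_text, "Display Text": display_text}


# Step 3: select a coordination process for each dependency. Here a
# one-to-one flow is managed by passing the value directly in memory.
def direct_transfer(producer, consumer):
    def run():
        return consumer(producer())
    return run


coordination_catalog = {"flow": direct_transfer}


# Step 4: integrate components and coordination processes.
def integrate(deps, comps, catalog):
    return [catalog[d.kind](comps[d.source], comps[d.target]) for d in deps]


application = integrate(dependencies, components, coordination_catalog)
results = [run() for run in application]
print(results)  # ['DISPLAY: some text']
```

Replacing `direct_transfer` with a file-based or pipe-based process would change how the flow is managed without touching `generate_text` or `display_text`.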
2.4.2 Benefits from Separating Interconnection
from Implementation
If we are able to satisfy the above requirements, we can develop
improved processes for developing and maintaining software applications
out of existing software components. Such processes will center
around explicit, abstract descriptions of software applications
at the architectural level. Figure 2-8 gives a generic description
of such an approach and Figure 2-9 a schematic high-level diagram
of the transformations involved.
Successful implementations of such processes should be able to
offer the following practical benefits:
2.4.2.1 Benefits to initial application
development
Independent selection of components. Designers should be
able to select components to implement activities independently
of one another. Candidate software components need not conform
to any particular set of standards or assumptions. Any mismatches
will be handled by coordination processes.
Routine management of dependencies. Dependencies should
be routinely managed by coordination processes based on systematic
design frameworks. Depending on how successfully frameworks are
developed, the process of dependency management should be assisted,
or even be automated, by design tools.
Minimal or no need for user-written coordination code.
Technologies for mediating original component assumptions and
integrating components and coordination processes into sets of
executable modules should be able to generate new applications
with little or no additional user-written coordination code.
2.4.2.2 Benefits to application
maintenance
Easy replacement of components with alternative implementations.
Designers often need to change the code-level components that
implement the functionality of specific activities, in order to
reflect changes in functional requirements, or to take advantage
of new, improved software products. Applications should be easily
reconstructed after such changes by reusing the same architectural
diagram and managing again only the dependencies between the affected
activities and the rest of the system.
Easy porting of applications to new configurations. When
applications are ported to a new environment, their abstract architecture
(activities and dependencies) remains unaffected. However, the
use of different coordination processes might be required. If
dependency management is a routine step, it should be easy to
manage the dependencies again for the new environment and to construct
a new application from the original architectural description
and functional components.
Figure 2-9: A schematic view of the proposed process
for building component-based applications.
Our objective in the rest of this thesis is to propose one concrete
implementation of the requirements presented in Section 2.4.1
and to demonstrate how they can be combined into a useful methodology
for component-based application development. The methodology is
based on the process sketched in Figure 2-9 and offers the practical
benefits outlined in Section 2.4.2. The end products of the work
include:
An architectural language (SYNOPSIS) for describing software applications,
with explicit support for activities, dependencies, and coordination
processes. Chapter 3 is devoted to a detailed description of SYNOPSIS.
A vocabulary of commonly encountered activity interdependencies
and a design space of associated coordination processes. Chapter
4 presents the vocabulary and design space.
An application development tool (SYNTHESIS) that allows designers
to enter architectural descriptions of applications and is able
to assist the process of generating executable systems by successive
semi-automatic specializations of their architecture.
Chapter 5 proposes one concrete implementation of the process
outlined in Figures 2-8 and 2-9. The design assistant component
of the SYNTHESIS system is based on that process. Finally, Chapter
6 presents a number of experiments that test and validate the
feasibility and usefulness of the entire approach.