Chapter 1

Introduction



1.1 Motivation

Many would argue that future breakthroughs in software productivity will depend on our ability to combine existing pieces of software to produce new applications. Less than 15% of the software developed is innovative (i.e. "unique, novel, and specific to individual applications") [Jones84]. The large bulk of the code in our software systems implements solutions to routine design problems, such as sorting, searching, and manipulating well-known data structures.

In mature engineering disciplines, such as chemical or civil engineering, routine design problems are rapidly solved by reusing existing designs [Kogut94a]. The knowledge for routine design is captured in standardized descriptions, organized in design handbooks, and shared within the engineering community. Thus, designers are able to rapidly construct large, high-quality systems, out of well-tested, existing parts.

The potential for reuse certainly exists in the software development domain. In fact, over the past decade, there has been a very significant amount of work in the area of software reuse (see [Biggerstaff89, Krueger92]). However, despite all this effort, software production is still dominated by build-from-scratch techniques.

This work is broadly motivated by the question: Why is it so hard to build software applications out of existing components?

One difficulty lies in locating the desired components. In contrast to more mature disciplines, software engineering still lacks comprehensive component libraries and design handbooks. However, the convergence of the industry on a small number of dominant designs, together with the easy access to software repositories offered by technologies such as the Internet [Obraczka93], is already making the process of finding appropriate components relatively easy.

But even when the components are at hand, composing them into new applications often requires so much additional effort that in many cases designers still decide to build their applications from scratch.

Problems arise because the chosen parts "do not fit together well". Designers typically have to either modify the code of existing components, or write additional coordination software that bridges mismatches among components. Examples of such mismatches include:

low-level interoperability mismatches, such as differences in expected and provided procedure names, parameter orderings, data types, and calling conventions.

more fundamental architectural mismatches, that is, different assumptions about the structure of the application in which components are to appear. Such mismatches include differences in expected and provided communication protocols, different assumptions about resource ownership and sharing, different assumptions about the presence or absence of particular operating system and hardware capabilities, etc.
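As an illustration of the first category, the following minimal Python sketch shows a low-level interoperability mismatch and the coordination code needed to bridge it. The components `find_record` and `report`, and the adapter `lookup_adapter`, are hypothetical names introduced only for this example; they are not taken from the experiments described later.

```python
# Hypothetical components, introduced only to illustrate a low-level mismatch.

def find_record(table, key):
    """Provided component: takes (table, key) and returns the value or None."""
    return table.get(key)


def report(lookup):
    """Client component: expects lookup(key, table) -> str, raising KeyError on a miss."""
    inventory = {"bolt": 12}
    return "bolt count: " + lookup("bolt", inventory)


def lookup_adapter(key, table):
    """Coordination code bridging four mismatches: procedure name,
    parameter ordering, return data type, and error convention."""
    value = find_record(table, key)  # reorder the arguments
    if value is None:                # translate the error convention
        raise KeyError(key)
    return str(value)                # convert the data type


result = report(lookup_adapter)
print(result)
```

Neither component is modified; all of the mismatch handling lives in the adapter, which is exactly the kind of additional coordination software that designers typically have to write by hand.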

Most research efforts to facilitate the process of bridging component mismatches have focused on limited classes of interoperability mismatches, such as interface mismatches [Wileden91, Beach92, Purtilo94] or data type mismatches [Lamb87]. The importance of architectural mismatches has only recently been highlighted [Garlan95]. To this date there has been no unified framework for describing the various kinds of component mismatches, nor a systematic set of rules for dealing with them. Designers still have to rely on their intuition and experience, and the problem of component composition is still being confronted in a largely ad-hoc fashion.

1.2 Research Overview

1.2.1 Objectives

The practical objectives of this work are:

to better understand why it is currently difficult to build software applications by composing existing components, and

to propose a novel software system representation and a design framework that aim to reduce the effort of integrating software components into new applications.

1.2.2 Theoretical Argument

This work adopts an architectural perspective on software applications, viewing them as interdependent collections of software components. At the architectural level, components and their interdependencies are represented as two distinct, equally important entities. Software components represent the core functional pieces of an application and deal with concepts specific to the application domain. Interdependencies relate to concepts orthogonal to the problem domain of most applications, such as transportation and sharing of resources, and synchronization constraints among components.

However, as design moves closer to implementation, current programming tools increasingly focus on representing components. At the implementation level, software systems are sets of source and executable modules in one or more programming languages. Although modules come under a variety of names and flavors (procedures, packages, objects, clusters, etc.) they are all essentially abstractions for components.

By failing to provide separate abstractions for specifying and implementing interconnection protocols among software components, current programming languages force programmers to distribute such protocols among the interdependent components. As a consequence, code-level components encode (apart from their ostensible function) fragments of interconnection protocols from their original development environments. These fragments translate into a set of undocumented assumptions about their dependencies with the rest of the system. When attempting to reuse components in new applications, such assumptions have to be manually identified and modified, in order to match the new interdependency patterns at the target environment. This often requires extensive modifications of existing code, or the development of additional coordination software.

This thesis argues that many of the practical difficulties associated with the above process are due to our current failure to recognize the problem of component interconnection as a separate design problem, orthogonal to the problem of implementing a component's core functionality. We need to develop programming languages and tools with support for abstractions that localize and separate the definition of interconnection relationships from that of the interdependent components. Furthermore, we need to understand the patterns of component interdependencies encountered in software systems, and develop frameworks that guide designers in selecting appropriate coordination processes for managing them.

If we can satisfy these requirements, we can develop improved processes for component-based application development. Such processes will center around architectural descriptions of applications, where software activities and their interdependency patterns will be explicitly represented by distinct entities. They will offer the following set of practical benefits:

Reuse of code-level components. Designers will be able to generate new applications simply by selecting existing components to implement activities, and coordination processes to manage dependencies, independently of one another. The development of successful frameworks for mapping dependencies to coordination processes will reduce the step of managing dependencies to a routine one, and enable it to be assisted, or even automated, by design tools. Overall, the objective is to be able to generate new applications with minimal, or no need for user-written coordination software.

Reuse of software architectures. Designers will be able to reuse the same architectural description, in order to reconstruct applications after one or more activities have been replaced by alternative implementations. They will simply have to semi-automatically re-manage the dependencies of the affected activities with the rest of the system. Furthermore, they will be able to reuse the same set of components in different execution environments by managing the dependencies of the same architectural description using different coordination processes, appropriate for each environment.

Insight into software organization alternatives. The description of software applications as sets of components interconnected through abstract dependencies will help designers structure their thinking about how to best integrate the components together. A design space of coordination processes for managing interdependency patterns will assist them to explore alternative ways of organizing the same set of components, in order to select the one which exhibits optimal design properties.

1.2.3 Deliverables

The main body of the thesis proposes a concrete implementation of the previous requirements and demonstrates how they can be integrated into a methodology for component-based application development. The following are the principal deliverables of this research:

SYNOPSIS: A software architecture description language

SYNOPSIS supports graphical descriptions of software application architectures at both the specification and the implementation level. It provides separate language entities for representing software activities and dependencies. Activities represent the main functional pieces of an application, while dependencies describe their interconnection relationships. An important attribute of dependencies is their coordination process. Coordination processes represent interconnection protocols that manage the relationships and constraints specified by their associated dependency. SYNOPSIS language elements are connected together through ports. Ports provide a general mechanism for representing abstract component interfaces. All elements of the language can contain an arbitrary number of attributes. Attributes encode additional properties of the element, as well as compatibility criteria that constrain its connection to other elements.
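SYNOPSIS itself is a graphical language, so the following Python sketch is not SYNOPSIS notation. It is only a rough data-model approximation of the entities just described, under the assumption that each entity can be reduced to a name, a set of ports, and a dictionary of attributes. The class names, field names, and the `data_type` compatibility criterion are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Port:
    """Abstract connection point of an element."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Activity:
    """A core functional piece of the application."""
    name: str
    ports: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


@dataclass
class CoordinationProcess:
    """An interconnection protocol that manages a dependency."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Dependency:
    """An interconnection relationship among the ports of activities."""
    name: str
    ports: list
    coordination_process: Optional[CoordinationProcess] = None
    attributes: dict = field(default_factory=dict)


def compatible(a: Port, b: Port) -> bool:
    """Toy compatibility criterion (an assumption): ports must agree
    on their 'data_type' attribute before they may be connected."""
    return a.attributes.get("data_type") == b.attributes.get("data_type")


out_port = Port("out", {"data_type": "text"})
in_port = Port("in", {"data_type": "text"})
reader = Activity("read file", [out_port])
viewer = Activity("display text", [in_port])
flow = Dependency("flow", [out_port, in_port])  # specification level: unmanaged
flow.coordination_process = CoordinationProcess("pipe")  # implementation level
print(compatible(out_port, in_port), flow.coordination_process.name)
```

The dependency is first created without a coordination process and is only later associated with one, mirroring the distinction between the specification and the implementation level.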

SYNOPSIS provides two mechanisms for abstraction: Decomposition allows new entities to be defined as patterns of simpler ones. It enables the naming, storage, and reuse of designs at the architectural level. Specialization allows new entities to be defined as variations of other existing entities. Specialized entities inherit the decomposition and attributes of their parents and can differentiate themselves by modifying any of those elements. Specialization enables the incremental generation of new designs from existing ones, as well as the organization of related designs in concise hierarchies. Finally, it enables the representation of reusable software architectures at various levels of abstraction (from very generic to very specific).
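The interplay of the two mechanisms can be approximated with a short Python sketch. It assumes, purely for illustration, that an entity is a dictionary holding a name, a parent, a set of attributes, and a decomposition given as a list of sub-entity names. The function `specialize` and the example entities are hypothetical and do not reproduce the actual SYNOPSIS semantics in detail.

```python
def specialize(parent, name, attributes=None, decomposition=None):
    """Define a new entity as a variation of `parent`.

    The child inherits copies of the parent's attributes and decomposition
    and may override either of them; the parent is left unchanged.
    """
    child_attributes = dict(parent["attributes"])
    child_attributes.update(attributes or {})
    if decomposition is None:
        child_decomposition = list(parent["decomposition"])
    else:
        child_decomposition = list(decomposition)
    return {
        "name": name,
        "parent": parent["name"],
        "attributes": child_attributes,
        "decomposition": child_decomposition,
    }


# A generic entity defined by decomposition into simpler (hypothetical) parts.
generic_flow = {
    "name": "flow",
    "parent": None,
    "attributes": {"resource": "any"},
    "decomposition": ["transfer resource", "notify consumer"],
}

# A more specific variation: overrides one attribute, inherits the rest.
stream_flow = specialize(generic_flow, "stream flow",
                         attributes={"resource": "byte stream"})
print(stream_flow["parent"], stream_flow["attributes"], stream_flow["decomposition"])
```

Repeated application of `specialize` yields a hierarchy in which entities near the root are very generic and entities near the leaves are very specific.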

A design handbook of software component interconnection

To assist the design task of specifying application interdependencies, as well as the design of corresponding coordination processes, this work proposes a standardized, but extensible, vocabulary of dependency types and a design space of associated coordination processes. The vocabulary is based on the observation that most software interconnection relationships can be specified using a relatively narrow set of concepts orthogonal to the problem domain of most applications, such as resource flows, resource sharing, and timing dependencies. The implementation of interconnection protocols involves a similarly orthogonal set of coordination concepts, such as shared events, invocation mechanisms, and communication protocols. An application-independent framework that captures the most useful patterns of interdependencies and the ways of managing them can form the basis for a design handbook for integrating software components. Such a handbook aims to reduce the specification and implementation of software component interdependencies to a routine design problem, capable of being assisted, or even automated, by computer tools.
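The idea of a handbook that maps dependency types to a design space of coordination processes can be sketched as a simple lookup table. The three dependency types below are the ones named in the text; the specific coordination processes, the environment properties (`same_process`, `same_host`), and the function `candidates` are assumptions made for this example and do not reproduce the contents of the actual handbook.

```python
# Hypothetical handbook: dependency type -> candidate coordination processes,
# each annotated with the environment properties it requires.
HANDBOOK = {
    "resource flow": [
        {"process": "local procedure call", "requires": {"same_process"}},
        {"process": "pipe", "requires": {"same_host"}},
        {"process": "socket protocol", "requires": set()},
    ],
    "resource sharing": [
        {"process": "lock", "requires": {"same_host"}},
        {"process": "replication", "requires": set()},
    ],
    "timing": [
        {"process": "shared event", "requires": {"same_host"}},
        {"process": "message notification", "requires": set()},
    ],
}


def candidates(dependency_type, environment):
    """Return the coordination processes that can manage the given
    dependency type in an environment with the given properties."""
    return [
        entry["process"]
        for entry in HANDBOOK.get(dependency_type, [])
        if entry["requires"] <= environment  # all requirements are satisfied
    ]


options = candidates("resource flow", {"same_host"})
print(options)
```

Because the table is application-independent, selecting a coordination process becomes a routine filtering step that a design tool can perform or assist, and the list of surviving candidates exposes the alternative ways of organizing the same components.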

SYNTHESIS: A component-based application development environment

SYNTHESIS provides an integrated environment for developing software applications by combining existing components. The system provides support for:

Creating and editing software architectural diagrams written in the SYNOPSIS language.

Maintaining repositories of SYNOPSIS entities (activities, dependencies, coordination processes, ports), organized as specialization hierarchies.

Assisting, and in some cases automating, the process of generating executable applications by successive transformations of SYNOPSIS architectural diagrams.

The prototype implementation of SYNTHESIS that has been developed for this thesis contains a version of our component interconnection handbook encoded as a SYNOPSIS specialization hierarchy of dependency types. Using this repository, the SYNTHESIS design assistant is able to semi-automate the process of generating applications from their SYNOPSIS diagrams, by managing dependencies with coordination processes stored in the repository. This results in minimal, and often no need for user-written coordination software to "glue" the components together.

1.2.4 Validation

In order to demonstrate the thesis brought forth in Section 1.2.2, we need to provide positive evidence about:

the feasibility of describing applications as collections of orthogonal subcomponents, representing core activities and interdependencies

the usefulness of such a representation in facilitating both code-level and architectural software reuse, as well as in providing insight into software component organization alternatives.

The feasibility and usefulness of our claims have been demonstrated by building a prototype implementation of SYNTHESIS and using it to perform a set of four experiments. Each experiment consisted of:

describing a test application as a SYNOPSIS diagram

selecting a set of components exhibiting various mismatches to implement activities

using SYNTHESIS and its repository of dependencies in order to integrate the selected components into an executable system

exploring alternative executable implementations based on the same set of components

The test applications include:

a File Viewer system, which integrates heterogeneous components written in C and Visual Basic.

a Key Word In Context index system, built by assembling components with mismatching architectural assumptions (UNIX filters and servers).

an Interactive TEX system, which integrates the components of the TEX document typesetting system in a WYSIWYG (what-you-see-is-what-you-get) ensemble.

a Collaborative Editor architecture, which extends the functionality of existing single user editors with group editing capabilities.

This experience has demonstrated that the proposed architectural ontology and vocabulary of dependencies were capable of accurately and completely expressing the architecture of all four test applications. Furthermore, it has provided positive evidence for the principal practical claims of the approach. The evidence can be summarized as follows:

Support for code-level software reuse: SYNTHESIS was able to resolve a wide range of interoperability and architectural mismatches and successfully integrate independently developed components into all four test applications, with minimal or no need for user-written coordination software.

Support for reuse of software architectures: SYNTHESIS was able to reuse a configuration-independent SYNOPSIS description of a collaborative editor and the source code of an existing single user editor, in order to generate collaborative editor executables for two different execution environments (UNIX and Windows).

Insight into alternative software architectures: SYNTHESIS was able to suggest a variety of alternative overall architectures for integrating each test set of code-level components into its corresponding application, thus helping designers explore alternative designs.

1.3 Related Work

Both the problem of representing software application architectures, and that of developing applications from existing components have received attention in a variety of different research areas. This section briefly discusses the three research areas that are most closely related to this work. Chapter 7 is devoted to a detailed discussion of other related research.

Coordination Theory

Coordination theory [Malone94] focuses on the interdisciplinary study of coordination. Research in this area uses and extends ideas about coordination from disciplines such as computer science, organization theory, operations research, economics, linguistics, and psychology. It defines coordination as the process of managing dependencies among activities. Its research agenda includes characterizing different kinds of dependencies and identifying the coordination processes that can be used to manage them.

This work is directly related to coordination theory, in that it views the process of developing applications as one of specifying architectures in which patterns of dependencies among software activities are eventually managed by coordination processes.

It makes two principal contributions to coordination theory:

The development of SYNOPSIS, an architectural language that is able to support coordination theory research, with distinct abstractions for activities and dependencies. Although developed for describing software applications, SYNOPSIS can be used to describe other complex systems as well, such as business processes.

The definition of a vocabulary of dependency types and associated coordination processes for the domain of software systems.

This project grew out of the Process Handbook project [Malone93, Dellarocas94] which applies the ideas of coordination theory to the representation and design of business processes. The goal of the Process Handbook project is to provide a firmer theoretical and empirical foundation for such tasks as enterprise modeling, enterprise integration, and process re-engineering. The project includes (1) collecting examples of how different organizations perform similar processes, and (2) representing these examples in an on-line "Process Handbook" which includes the relative advantages of the alternatives.

The Process Handbook relies on a representation of business processes that distinguishes between activities and dependencies and supports entity specialization. It builds repositories of alternative ways of performing specific business functions, represented at various levels of abstraction. SYNOPSIS has borrowed the ideas of separating activities from dependencies and the notion of entity specialization from the Process Handbook. It is especially concerned with (1) refining the process representation so that it can describe software applications at a level precise enough for code generation to take place, and (2) populating repositories of dependencies and coordination processes for the specialized domain of software component integration.

Architecture Description Languages

Architecture Description Languages (ADLs) provide support for representing software systems in terms of their components and their interconnections [Kogut94b, Shaw94b]. They are used to model domain-specific software architectures so that new application systems can be built from existing solutions. They typically provide separate abstractions for representing components and their interconnections.

SYNOPSIS shares many of the goals and principles of ADLs. However, whereas previously proposed architectural languages only provide support for implementation-level connector abstractions (such as a pipe, or a client/server protocol), SYNOPSIS is the first language which also supports specification-level abstractions for encoding interconnection relationships (dependencies). It is also the first system that proposes a design framework for describing the most common interconnection relationships encountered in software systems, and for associating each relationship to appropriate implementations (coordination processes).

Operating, Concurrent, and Distributed System Design

Operating system research is concerned with developing algorithms for the allocation of system resources and the management of interdependencies among user and system processes. The study of concurrent and distributed systems is concerned with essentially the same problems in the special domains of systems with multiple processors or physically distributed components [Andrews91, Bacon93].

This work provides a unifying framework for organizing the techniques and algorithms developed in those areas, relates them to dependency patterns, and uses them to populate the design space of coordination processes for software systems.

1.4 Organization of the Thesis

The rest of the thesis is organized as follows:

Chapter 2 contains a detailed exposition of the underlying thesis of this work. It explains why current technologies for reusing software components in new applications often require significant additional design effort to resolve component mismatches. It argues for treating component interconnection as a separate design problem, orthogonal to the problem of implementing a component's core function. It outlines a set of practical requirements to achieve this separation and a set of practical benefits that are expected to result from it.

Chapter 3 is devoted to a description of SYNOPSIS, a software architecture description language. Its main distinguishing feature is a clear separation of an application's core functional pieces from their interconnection relationships.

Chapter 4 introduces a vocabulary of dependencies for describing software component interconnection needs. It also describes a design space of associated coordination processes. The vocabulary and design space can form the basis of a design handbook of software component interconnection, that facilitates the representation and solution of component interconnection problems.

Chapter 5 describes an algorithm for generating executable applications by successive transformations of their SYNOPSIS architectures. The algorithm forms the basis of the SYNTHESIS design assistant.

Chapter 6 describes the prototype implementation of SYNTHESIS. It also contains a detailed discussion of four experiments which demonstrate the feasibility and usefulness of the ideas proposed in this thesis.

Chapter 7 contains a detailed discussion of related research.

Finally, Chapter 8 summarizes the results of this work, distills some lessons learned, and discusses some ideas for future research.

