软件工程
Software Development
Software Engineering
Software Engineering is concerned with constructing software systems that are:
- Correct, meaning they do exactly what it should;
- Reliable, meaning the don't crash;
- Robust, meaning they can handle unforeseen or error conditions by logging an alert, cleaning up, and proceeding if possible;
- Predictable, meaning their behavior is never shocking (without good reason);
- Efficient, meaning they do not take too long to run, nor consume more memory or system resources than is reasonable;
- Understandable, meaning that you can look at source code and tell what they do;
- Reusable, meaning the components from which they are built can function unchanged in other applications;
- Scalable, meaning that new features can be added without excessive restructuring of existing code, and without impacting its run-time performance;
- Maintainable, meaning that bugs can easily be isolated and fixed;
- Appropriate, meaning they do what their users want them to do; and
- Economical, meaning they are produced on-time and under-budget, and are fairly priced.
Software engineering is fundamentally different from other engineering disciplines, argues Jack Reeves in thesethree excellent essays.
Why Study Software Engineering?
According to Booch, "industrial-strength" software is inherently complex. Steve McConnell calls it a wicked problem. No single person can understand all the subtleties of the design of a large software system. Why?
- Requirements are under- or over-specified, contradictory, too unintelligible, always changing.
- Development process involves too many people, different machines, excessive documentation.
- Software has unlimited flexibility. (A carpenter wouldn't truck in a freshly cut tree and set up a lumber mill on your front lawn to make you a new front door, but many programmers are guilty of writing their own linked-list classes.)
- Discrete systems are hard to characterize (combinatorial explosion in number of states).
To construct complex software we should try to understand the nature of complex systems in general, and see how we deal with them.
Complex Systems
An empirical study of various complex systems (e.g., matter, personal computers, plants and animals, social institutions) reveals:
- Hierarchy: different levels of abstraction are built upon each other.
- Strict separation of concerns between levels.
- Little or no centralization: the high level functionality appears due to a cooperation of agents (so concurrency is fundamental).
- Strong intracomponent linkages (strong cohesion) and weak intercomponent linkages (loose coupling).
- Economy of Expression: similar building blocks in different levels (e.g. cells, transistors, quarks/leptons, people).
- As systems evolve, objects that were once considered complex become the primitive components of the next generation system.
How Humans Deal With Complexity
We manage complexity using abstraction, classification, and hierarchy.
ABSTRACTION | Recognition of fundamental concepts, structures and behaviors, without concern for implementation details |
CLASSIFICATION | Recognition that every object is an instance of some class |
HIERARCHY | Distillation of essential similarities and differences
|
Abstraction
Abstraction is probably the single most important principle in Computer Science. It is the primary way humans deal with complexity, and software systems are humankind's most complex creations. You have to be able to view software components in an abstract way, that is, you have to be able to describe what they do, without relying on describing how they do it.
Examples
- Driving a car — you don't have to know how internal combustion, fuel cells, or batteries work in order to drive.
- Using a microwave — you don't have to know the physics to cook;
- Talking on a phone — you don't have to know how your voice is encoded or how calls are routed to communicate.
- Setting a thermostat — you don't have to know what gets the AC or heater to fire up in order to set a temperature.
- Ripping a video or music disc — you don't need to know how the content is read off the disc and stored in memory.
Primary types of abstraction in programming
- Procedural Abstraction: You call a function with a known interface but you do not know or care how it runs (what code implements it).
- Data Abstraction: You declare objects of a known type, but you do not know or care how those objects are laid out in memory, nor how the operations that manipulate them work.
Examples of Data Abstraction
- Integers can be represented in 1's-complement, 2's-complement, BCD, ...
- There are many different formats for floating point values
- There are different file systems, e.g. NTFS, FAT, xfs, e2fs
- Lists can use array or linked structures
Classification
Classification is identifying that a number of objects have similar structure and behavior and giving a name to that class of objects. For example "Dog" is a class and your particular dog is an object of that class.
Hierarchy
Hierarchy is how levels of abstraction are organized.
Is-a (Kind-of) Hierarchy
Shows classes and subclasses. Moving up is called generalization (recognizing that different classes share some similarities), moving down is called specialization (factoring a class into subclasses which are different from each other in some ways).
Has-a (Part-of) Hierarchy
Shows classes in a containment hierarchy. Moving up is called composition (combining parts to form larger objects), moving down is called decomposition (breaking a larger structure down into components). In a composition relationship, contained objects are completely owned by the container: if the containing object goes away, so does the containee.
Member-of (Aggregation) Hierarchy
Shows classes related by groups and subgroups. This is very similar to composition except that the members of a group continue to exist even if the group goes away.
Software Development Methodologies
Many software development methodologies, or processes, have been created. Usually they fall into a spectrum from adaptive to plan-driven. Many of the more adaptive are known as agile methods. The most plan-driven method is probably the waterfall model. This is pretty much despised as a way to build software, because it doesn't work. It does work in heavy manufacturing and similar industries, though.
Some methodologies:
:
- Rational Unified Process (RUP)
- Extreme Programming (XP)
- Scrum
- Agile Unified Process (AUP)
- Open Unified Process (OpenUP)
- DSDM
Boehm and Turner give a great characterization of "home grounds" for adaptive and plan-driven methods (loosely summarized here):
Adaptive | Plan-driven |
---|---|
Goal is to respond quickly to change | Goals include predictability and stability |
Smaller teams, more senior developers | Larger teams, more juniors |
Tacit interpersonal knowledge | Explicit documentation |
User stories | Explicit, formal, detailed, requirements |
Simple designs | Detailed and extensive designs |
Culture "thriving on chaos" | Culture "thriving on order" |
Phases, Iterations, and Workflows
Software Development is an incremental and iterative process (waterfall doesn't work for software). You iterate because coding might show part of the design was infeasible, maintenance requires recoding, the customer will change requirements just when the product is about to be shipped, etc.
The major elements of a development cycle are phases, iterations and workflows. This diagram shows the four major phases (inception, elaboration, construction and transition). Within each phase you do a number of iterations. An iteration results in the development of a complete, executable subset of the system. The diagram (from the RUP) shows how much effort within a given workflow you put into an iteration.
Current Research
- Physical Distribution
- Concurrency
- Replication
- Security
- Load Balancing
- Fault Tolerance
- Grid Computing
- Data Mining
- Robotics
Technologies that Help
- Componentware
- Visual Programming
- Patterns
- Frameworks
- Application Containers
- Aspect Orientation
Producing Efficient Software
An efficient algorithm minimizes cost which is one or more of:
- Time, which is affected by algorithmic issues like the could be number of operations, or physical issues like communication delays between network nodes, between processors, between the CPU and a graphics chip, between the CPU and its cache or memory, etc. (In more technical terms we want to minimize network hops, page faults, context switches, and so on).
- Space, which refers to the amount of memory or disk space required to "hold intermediate results".
- Coding time, which goes way up if you're not experienced or are unaware of existing libraries and solutions that you can adapt.
- Verification and debugging time, which goes up when the code is hard to read or understand)
- System integration effort, which is too high when there are artificially too many little pieces of code written by too many different people.
Caskey's Law of Software Development
"A good system must first and foremost be easy to modify and extend." —Caskey Dickson
http://larch.cs.iastate.edu/ComS541Fall97/hw-pages/paradigms/major.html
Major Programming Paradigms
There are several kinds of major programming paradigms:
It can be shown that anything solvable using one of these paradigms can be solved using the others; however, certain types of problems lend themselves more naturally to specific paradigms.
Imperative
- Introduction:
The imperative programming paradigm assumes that the computer can maintain through environments of variables any changes in a computation process. Computations are performed through a guided sequence of steps, in which these variables are referred to or changed. The order of the steps is crucial, because a given step will have different consequences depending on the current values of variables when the step is executed. - Imperative Languages:
Popular programming languages are imperative more often than they are any other paradigm studies in this course. There are two reasons for such popularity:- the imperative paradigm most closely resembles the actual machine itself, so the programmer is much closer to the machine;
- because of such closeness, the imperative paradigm was the only one efficient enough for widespread use until recently.
- Advantages
- efficient;
- close to the machine;
- popular;
- familiar.
- Disadvantages
- The semantics of a program can be complex to understand or prove, because of referential transparency does not hold(due to side effects)
- Side effects also make debugging harder;
- Abstration is more limitted than with some paradigms;
- Order is crucial, which doesn't always suit itself to problems.
Logical
- Introduction:
The Logical Paradigm takes a declarative approach to problem-solving. Various logical assertions about a situation are made, establishing all known facts. Then queries are made. The role of the computer becomes maintaining data and logical deduction. - Logical Paradigm Programming:
A logical program is divided into three sections:- a series of definitions/declarations that define the problem domain
- statements of relevant facts
- statement of goals in the form of a query
Any deducible solution to a query is returned. The definitions and declarations are constructed entirely from relations. i.e. X is a member of Y or X is in the internal between a and b etc. - Advantages:
The advantages of logic oriented programming are bifold:- The system solves the problem, so the programming steps themselves are kept to a minimum;
- Proving the validity of a given program is simple.
Functional
- Introduction
The Functional Programming paradigm views all subprograms as functions in the mathematical sense-informally, they take in arguments and return a single solution. The solution returned is based entirely on the input, and the time at which a function is called has no relevance. The computational model is therefore one of function application and reduction. - Languages
Functional languages are created based on the functional paradigm. Such languages permit functional solutions to problems by permitting a programmer to treat functions as first-class objects(they can be treated as data, assumed to have the value of what they return; therefore, they can be passed to other functions as arguments or returned from functions). - Advantages
The following are desirable properties of a functional language:- The high level of abstraction, especially when functions are used, supresses many of the details of programming and thus removes the possibility of commiting many classes of errors;
- The lack of dependence on assignment operations, allowing programs to be evaluated in many different orders. This evaluation order independence makes function-oriented languages good candidates for programming massively parallel computers;
- The absence of assignment operations makes the function-oriented programs much more amenable to mathematical proof and analysis than are imperative programs, because functional programs possess referential transparency.
- Disadvantages
- Perhaps less efficiencey
- Problems involving many variables or a lot of sequential activity are sometimes easier to handle imperatively or with object-oriented programming.
Object-Oriented
- Introduction
Object Oriented Programming (OOP) is a paradigm in which real-world objects are each viewed as seperate entities having their own state which is modified only by built in procedures, called methods.
Because objects operate independently, they are encapsulated into modules which contain both local environments and methods. Communication with an object is done by message passing.
Objects are organized into classes, from which they inherit methods and equivalent variables. The object-oriented paradigm provides key benefits of reusable code and code extensibility. - Features & Benefits
A new class (called a derived class or subclass) may be derived from another class (called a base class or superclass) by a mechanism called inheritance. The derived class inherits all the features of the base class: its structure and behavior(response to messages). In addition, the derived class may contain additional state (instance variables), and may exhibit additional behavior (new methods to resond to new messages). Significantly, the derived class can also override behavior corresponding to some of the methods of the base class: there would be a differentmethod to respond to the same message. Also, the inheritance mechanism is allowed even without access to the source code of the base class.
The ability to use inheritance is the single most distinguishing feature of the OOP paradigm. Inheritance gives OOP its chief benefit over other programming paradigms - relatively easy code reuse and extension without the need to change existing source code.
The mechanism of modeling a program as a collection of objects of various classes, and furthermore describing many classes as extensions or modifications of other classes, provides a high degree of modularity.
Ideally, the state of an object is manipulated and accessed only by that object's methods. (Most O-O languages allow direct manipulation of the state, but such access is stylistically discouraged). In this way, a class' interface (how objects of that class are accessed) is separate from the class' implementation (the actual code of the class' methods). Thus encapsulation and information hiding are inherent benefits of OOP.