SOFTWARE TESTING AND QUALITY ASSURANCE
Theory and Practice

KSHIRASAGAR NAIK
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo

PRIYADARSHI TRIPATHY
NEC Laboratories America, Inc.

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
Naik, Kshirasagar, 1959–
Software testing and quality assurance / Kshirasagar Naik and Priyadarshi Tripathy.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-471-78911-6 (cloth)
1. Computer software—Testing. 2. Computer software—Quality control. I. Tripathy, Piyu, 1958– II. Title.
QA76.76.T48N35 2008 005.14 — dc22 2008008331 Printed in the United States of America 10 9 8 7 6 5 4 3 2 1 To our parents Sukru and Teva Naik Kunjabihari and Surekha Tripathy CONTENTS Preface xvii List of Figures xxi List of Tables xxvii CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES 1 1.1 Quality Revolution 1 1.2 Software Quality 5 1.3 Role of Testing 7 1.4 Verification and Validation 7 1.5 Failure, Error, Fault, and Defect 9 1.6 Notion of Software Reliability 10 1.7 Objectives of Testing 10 1.8 What Is a Test Case? 11 1.9 Expected Outcome 12 1.10 Concept of Complete Testing 13 1.11 Central Issue in Testing 13 1.12 Testing Activities 14 1.13 Test Levels 16 1.14 Sources of Information for Test Case Selection 18 1.15 White-Box and Black-Box Testing 20 1.16 Test Planning and Design 21 1.17 Monitoring and Measuring Test Execution 22 1.18 Test Tools and Automation 24 1.19 Test Team Organization and Management 26 1.20 Outline of Book 27 References 28 Exercises 30 CHAPTER 2 THEORY OF PROGRAM TESTING 31 2.1 Basic Concepts in Testing Theory 31 2.2 Theory of Goodenough and Gerhart 32 2.2.1 Fundamental Concepts 32 2.2.2 Theory of Testing 34 2.2.3 Program Errors 34 2.2.4 Conditions for Reliability 36 2.2.5 Drawbacks of Theory 37 2.3 Theory of Weyuker and Ostrand 37 vii viii CONTENTS 2.4 Theory of Gourlay 39 2.4.1 Few Definitions 40 2.4.2 Power of Test Methods 42 2.5 Adequacy of Testing 42 2.6 Limitations of Testing 45 2.7 Summary 46 Literature Review 47 References 48 Exercises 49 CHAPTER 3 UNIT TESTING 51 3.1 Concept of Unit Testing 51 3.2 Static Unit Testing 53 3.3 Defect Prevention 60 3.4 Dynamic Unit Testing 62 3.5 Mutation Testing 65 3.6 Debugging 68 3.7 Unit Testing in eXtreme Programming 71 3.8 JUnit: Framework for Unit Testing 73 3.9 Tools for Unit Testing 76 3.10 Summary 81 Literature Review 82 References 84 Exercises 86 CHAPTER 4 CONTROL FLOW TESTING 88 4.1 Basic Idea 88 4.2 Outline of Control Flow Testing 89 4.3 Control Flow Graph 90 4.4 Paths in a Control Flow Graph 93 4.5 Path Selection Criteria 94 4.5.1 All-Path Coverage Criterion 96 4.5.2 Statement Coverage Criterion 97 4.5.3 Branch Coverage Criterion 98 4.5.4 Predicate Coverage Criterion 100 4.6 Generating Test Input 101 4.7 Examples of Test Data Selection 106 4.8 Containing Infeasible Paths 107 4.9 Summary 108 Literature Review 109 References 110 Exercises 111 CHAPTER 5 DATA FLOW TESTING 112 5.1 General Idea 112 5.2 Data Flow Anomaly 113 5.3 Overview of Dynamic Data Flow Testing 115 5.4 Data Flow Graph 116 5.5 Data Flow Terms 119 5.6 Data Flow Testing Criteria 121 5.7 Comparison of Data Flow Test Selection Criteria 124 5.8 Feasible Paths and Test Selection Criteria 125 5.9 Comparison of Testing Techniques 126 5.10 Summary 128 Literature Review 129 References 131 Exercises 132 CHAPTER 6 DOMAIN TESTING 6.1 Domain Error 135 6.2 Testing for Domain Errors 137 6.3 Sources of Domains 138 6.4 Types of Domain Errors 141 6.5 ON and OFF Points 144 6.6 Test Selection Criterion 146 6.7 Summary 154 Literature Review 155 References 156 Exercises 156 CHAPTER 7 SYSTEM INTEGRATION TESTING 7.1 Concept of Integration Testing 158 7.2 Different Types of Interfaces and Interface Errors 159 7.3 Granularity of System Integration Testing 163 7.4 System Integration Techniques 164 7.4.1 Incremental 164 7.4.2 Top Down 167 7.4.3 Bottom Up 171 7.4.4 Sandwich and Big Bang 173 7.5 Software and Hardware Integration 174 7.5.1 Hardware Design Verification Tests 174 7.5.2 Hardware and Software Compatibility Matrix 177 7.6 Test Plan for System Integration 180 7.7 Off-the-Shelf 
Component Integration 184 7.7.1 Off-the-Shelf Component Testing 185 7.7.2 Built-in Testing 186 7.8 Summary 187 Literature Review 188 References 189 Exercises 190 CHAPTER 8 SYSTEM TEST CATEGORIES 8.1 Taxonomy of System Tests 192 8.2 Basic Tests 194 8.2.1 Boot Tests 194 8.2.2 Upgrade/Downgrade Tests 195 CONTENTS ix 135 158 192 x CONTENTS 8.2.3 Light Emitting Diode Tests 195 8.2.4 Diagnostic Tests 195 8.2.5 Command Line Interface Tests 196 8.3 Functionality Tests 196 8.3.1 Communication Systems Tests 196 8.3.2 Module Tests 197 8.3.3 Logging and Tracing Tests 198 8.3.4 Element Management Systems Tests 198 8.3.5 Management Information Base Tests 202 8.3.6 Graphical User Interface Tests 202 8.3.7 Security Tests 203 8.3.8 Feature Tests 204 8.4 Robustness Tests 204 8.4.1 Boundary Value Tests 205 8.4.2 Power Cycling Tests 206 8.4.3 On-Line Insertion and Removal Tests 206 8.4.4 High-Availability Tests 206 8.4.5 Degraded Node Tests 207 8.5 Interoperability Tests 208 8.6 Performance Tests 209 8.7 Scalability Tests 210 8.8 Stress Tests 211 8.9 Load and Stability Tests 213 8.10 Reliability Tests 214 8.11 Regression Tests 214 8.12 Documentation Tests 215 8.13 Regulatory Tests 216 8.14 Summary 218 Literature Review 219 References 220 Exercises 221 CHAPTER 9 FUNCTIONAL TESTING 222 9.1 Functional Testing Concepts of Howden 222 9.1.1 Different Types of Variables 224 9.1.2 Test Vector 230 9.1.3 Testing a Function in Context 231 9.2 Complexity of Applying Functional Testing 232 9.3 Pairwise Testing 235 9.3.1 Orthogonal Array 236 9.3.2 In Parameter Order 240 9.4 Equivalence Class Partitioning 244 9.5 Boundary Value Analysis 246 9.6 Decision Tables 248 9.7 Random Testing 252 9.8 Error Guessing 255 9.9 Category Partition 256 9.10 Summary 258 Literature Review 260 References 261 Exercises 262 CHAPTER 10 TEST GENERATION FROM FSM MODELS 10.1 State-Oriented Model 265 10.2 Points of Control and Observation 269 10.3 Finite-State Machine 270 10.4 Test Generation from an FSM 273 10.5 Transition Tour Method 273 10.6 Testing with State Verification 277 10.7 Unique Input–Output Sequence 279 10.8 Distinguishing Sequence 284 10.9 Characterizing Sequence 287 10.10 Test Architectures 291 10.10.1 Local Architecture 292 10.10.2 Distributed Architecture 293 10.10.3 Coordinated Architecture 294 10.10.4 Remote Architecture 295 10.11 Testing and Test Control Notation Version 3 (TTCN-3) 295 10.11.1 Module 296 10.11.2 Data Declarations 296 10.11.3 Ports and Components 298 10.11.4 Test Case Verdicts 299 10.11.5 Test Case 300 10.12 Extended FSMs 302 10.13 Test Generation from EFSM Models 307 10.14 Additional Coverage Criteria for System Testing 313 10.15 Summary 315 Literature Review 316 References 317 Exercises 318 CHAPTER 11 SYSTEM TEST DESIGN 11.1 Test Design Factors 321 11.2 Requirement Identification 322 11.3 Characteristics of Testable Requirements 331 11.4 Test Objective Identification 334 11.5 Example 335 11.6 Modeling a Test Design Process 345 11.7 Modeling Test Results 347 11.8 Test Design Preparedness Metrics 349 11.9 Test Case Design Effectiveness 350 11.10 Summary 351 Literature Review 351 References 353 Exercises 353 CONTENTS xi 265 321 xii CONTENTS CHAPTER 12 SYSTEM TEST PLANNING AND AUTOMATION 355 12.1 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9 12.10 12.11 12.12 12.13 12.14 12.15 12.16 Structure of a System Test Plan 355 Introduction and Feature Description 356 Assumptions 357 Test Approach 357 Test Suite Structure 358 Test Environment 358 Test Execution Strategy 361 12.7.1 Multicycle System Test Strategy 362 12.7.2 
Characterization of Test Cycles 362 12.7.3 Preparing for First Test Cycle 366 12.7.4 Selecting Test Cases for Final Test Cycle 369 12.7.5 Prioritization of Test Cases 371 12.7.6 Details of Three Test Cycles 372 Test Effort Estimation 377 12.8.1 Number of Test Cases 378 12.8.2 Test Case Creation Effort 384 12.8.3 Test Case Execution Effort 385 Scheduling and Test Milestones 387 System Test Automation 391 Evaluation and Selection of Test Automation Tools 392 Test Selection Guidelines for Automation 395 Characteristics of Automated Test Cases 397 Structure of an Automated Test Case 399 Test Automation Infrastructure 400 Summary 402 Literature Review 403 References 405 Exercises 406 CHAPTER 13 SYSTEM TEST EXECUTION 408 13.1 Basic Ideas 408 13.2 Modeling Defects 409 13.3 Preparedness to Start System Testing 415 13.4 Metrics for Tracking System Test 419 13.4.1 Metrics for Monitoring Test Execution 420 13.4.2 Test Execution Metric Examples 420 13.4.3 Metrics for Monitoring Defect Reports 423 13.4.4 Defect Report Metric Examples 425 13.5 Orthogonal Defect Classification 428 13.6 Defect Causal Analysis 431 13.7 Beta Testing 435 13.8 First Customer Shipment 437 13.9 System Test Report 438 13.10 Product Sustaining 439 13.11 Measuring Test Effectiveness 441 13.12 Summary 445 Literature Review 446 CONTENTS xiii References 447 Exercises 448 CHAPTER 14 ACCEPTANCE TESTING 450 14.1 Types of Acceptance Testing 450 14.2 Acceptance Criteria 451 14.3 Selection of Acceptance Criteria 461 14.4 Acceptance Test Plan 461 14.5 Acceptance Test Execution 463 14.6 Acceptance Test Report 464 14.7 Acceptance Testing in eXtreme Programming 466 14.8 Summary 467 Literature Review 468 References 468 Exercises 469 CHAPTER 15 SOFTWARE RELIABILITY 471 15.1 What Is Reliability? 471 15.1.1 Fault and Failure 472 15.1.2 Time 473 15.1.3 Time Interval between Failures 474 15.1.4 Counting Failures in Periodic Intervals 475 15.1.5 Failure Intensity 476 15.2 Definitions of Software Reliability 477 15.2.1 First Definition of Software Reliability 477 15.2.2 Second Definition of Software Reliability 478 15.2.3 Comparing the Definitions of Software Reliability 479 15.3 Factors Influencing Software Reliability 479 15.4 Applications of Software Reliability 481 15.4.1 Comparison of Software Engineering Technologies 481 15.4.2 Measuring the Progress of System Testing 481 15.4.3 Controlling the System in Operation 482 15.4.4 Better Insight into Software Development Process 482 15.5 Operational Profiles 482 15.5.1 Operation 483 15.5.2 Representation of Operational Profile 483 15.6 Reliability Models 486 15.7 Summary 491 Literature Review 492 References 494 Exercises 494 CHAPTER 16 TEST TEAM ORGANIZATION 496 16.1 Test Groups 496 16.1.1 Integration Test Group 496 16.1.2 System Test Group 497 16.2 Software Quality Assurance Group 499 16.3 System Test Team Hierarchy 500 xiv CONTENTS 16.4 Effective Staffing of Test Engineers 501 16.5 Recruiting Test Engineers 504 16.5.1 Job Requisition 504 16.5.2 Job Profiling 505 16.5.3 Screening Resumes 505 16.5.4 Coordinating an Interview Team 506 16.5.5 Interviewing 507 16.5.6 Making a Decision 511 16.6 Retaining Test Engineers 511 16.6.1 Career Path 511 16.6.2 Training 512 16.6.3 Reward System 513 16.7 Team Building 513 16.7.1 Expectations 513 16.7.2 Consistency 514 16.7.3 Information Sharing 514 16.7.4 Standardization 514 16.7.5 Test Environments 514 16.7.6 Recognitions 515 16.8 Summary 515 Literature Review 516 References 516 Exercises 517 CHAPTER 17 SOFTWARE QUALITY 519 17.1 Five Views of Software Quality 519 
17.2 McCall's Quality Factors and Criteria 523 17.2.1 Quality Factors 523 17.2.2 Quality Criteria 527 17.2.3 Relationship between Quality Factors and Criteria 527 17.2.4 Quality Metrics 530 17.3 ISO 9126 Quality Characteristics 530 17.4 ISO 9000:2000 Software Quality Standard 534 17.4.1 ISO 9000:2000 Fundamentals 535 17.4.2 ISO 9001:2000 Requirements 537 17.5 Summary 542 Literature Review 544 References 544 Exercises 545 CHAPTER 18 MATURITY MODELS 546 18.1 Basic Idea in Software Process 546 18.2 Capability Maturity Model 548 18.2.1 CMM Architecture 549 18.2.2 Five Levels of Maturity and Key Process Areas 550 18.2.3 Common Features of Key Practices 553 18.2.4 Application of CMM 553 18.2.5 Capability Maturity Model Integration (CMMI) 554 18.3 Test Process Improvement 555 18.4 Testing Maturity Model 568 18.5 Summary 578 Literature Review 578 References 579 Exercises 579 GLOSSARY 581 INDEX 600

PREFACE

karmany eva dhikaras te; ma phalesu kadachana; ma karmaphalahetur bhur; ma te sango stv akarmani.
Your right is to work only; but never to the fruits thereof; may you not be motivated by the fruits of actions; nor let your attachment to be towards inaction.
— Bhagavad Gita

We have been witnessing tremendous growth in the software industry over the past 25 years. Software applications have proliferated from the original data processing and scientific computing domains into our daily lives in such a way that we do not realize that some kind of software executes when we do even something ordinary, such as making a phone call, starting a car, turning on a microwave oven, and making a debit card payment. The processes for producing software must meet two broad challenges. First, the processes must produce low-cost software in a short time so that corporations can stay competitive. Second, the processes must produce usable, dependable, and safe software; these attributes are commonly known as quality attributes. Software quality impacts a number of important factors in our daily lives, such as economy, personal and national security, health, and safety.

Twenty-five years ago, testing accounted for about 50% of the total time and more than 50% of the total money expended in a software development project—and the same is still true today. In those days the software industry was much smaller, and academia offered a single, comprehensive course entitled Software Engineering to educate undergraduate students in the nuts and bolts of software development. Although software testing has been a part of the classical software engineering literature for decades, the subject is seldom incorporated into the mainstream undergraduate curriculum. A few universities have started offering an option in software engineering comprising three specialized courses, namely, Requirements Specification, Software Design, and Testing and Quality Assurance. In addition, some universities have introduced full undergraduate and graduate degree programs in software engineering.

Considering the impact of software quality, or the lack thereof, we observe that software testing education has not received its due place. Ideally, research should lead to the development of tools and methodologies to produce low-cost, high-quality software, and students should be educated in the testing fundamentals. In other words, software testing research should not be solely academic in nature but must strive to be practical for industry consumers.
However, in practice, there is a large gap between the testing skills needed in the industry and what is taught and researched in the universities. Our goal is to provide the students and the teachers with a set of well-rounded educational materials covering the fundamental developments in testing theory and common testing practices in the industry. We intend to provide the students with the "big picture" of testing and quality assurance, because software quality concepts are quite broad. There are different kinds of software systems with their own intricate characteristics. We have not tried to specifically address their testing challenges. Instead, we have presented testing theory and practice as broad stepping stones, which will enable the students to understand and develop testing practices for more complex systems.

We decided to write this book based on our teaching and industrial experiences in software testing and quality assurance. For the past 15 years, Sagar has been teaching software engineering and software testing on a regular basis, whereas Piyu has been performing hands-on testing and managing test groups for testing routers, switches, wireless data networks, storage networks, and intrusion prevention appliances. Our experiences have helped us in selecting and structuring the contents of this book to make it suitable as a textbook.

Who Should Read This Book?

We have written this book to introduce students and software professionals to the fundamental ideas in testing theory, testing techniques, testing practices, and quality assurance. Undergraduate students in software engineering, computer science, and computer engineering with no prior experience in the software industry will be introduced to the subject matter in a step-by-step manner. Practitioners too will benefit from the structured presentation and comprehensive nature of the materials. Graduate students can use the book as a reference resource. After reading the whole book, the reader will have a thorough understanding of the following topics:

• Fundamentals of testing theory and concepts
• Practices that support the production of quality software
• Software testing techniques
• Life-cycle models of requirements, defects, test cases, and test results
• Process models for unit, integration, system, and acceptance testing
• Building test teams, including recruiting and retaining test engineers
• Quality models, capability maturity model, testing maturity model, and test process improvement model

How Should This Book Be Read?

The purpose of this book is to teach how to do software testing. We present some essential background material in Chapter 1 and save the enunciation of software quality questions to a later part of the book. It is difficult for beginners to discuss software quality intelligently until they have a firm sense of what software testing does. However, practitioners with much testing experience can jump to Chapter 17, entitled "Software Quality," immediately after Chapter 1.

There are three different ways to read this book depending upon the reader's interest. First, those who are exclusively interested in software testing concepts and want to apply the ideas should read Chapter 1 ("Basic Concepts and Preliminaries"), Chapter 3 ("Unit Testing"), Chapter 7 ("System Integration Testing"), and Chapters 8–14, related to system-level testing.
Second, test managers interested in improving the test effectiveness of their teams can read Chapters 1, 3, 7, 8–14, 16 ("Test Team Organization"), 17 ("Software Quality"), and 18 ("Maturity Models"). Third, beginners should read the book from cover to cover.

Notes for Instructors

The book can be used as a text in an introductory course in software testing and quality assurance. One of the authors used the contents of this book in an undergraduate course entitled Software Testing and Quality Assurance for several years at the University of Waterloo. An introductory course in software testing can cover selected sections from most of the chapters except Chapter 16. For a course with more emphasis on testing techniques than on processes, we recommend choosing Chapters 1 ("Basic Concepts and Preliminaries") to 15 ("Software Reliability"). When used as a supplementary text in a software engineering course, selected portions from the following chapters can help students imbibe the essential concepts in software testing:

• Chapter 1: Basic Concepts and Preliminaries
• Chapter 3: Unit Testing
• Chapter 7: System Integration Testing
• Chapter 8: System Test Categories
• Chapter 14: Acceptance Testing

Supplementary materials for instructors are available at the following Wiley website: http://www.wiley.com/sagar.

Acknowledgments

In preparing this book, we received much support from many people, including the publisher, our family members, and our friends and colleagues. The support has been in many different forms. First, we would like to thank our editors, namely, Anastasia Wasko, Val Moliere, Whitney A. Lesch, Paul Petralia, and Danielle Lacourciere, who gave us much professional guidance and patiently answered our various queries. Our friend Dr. Alok Patnaik read the whole draft and made numerous suggestions to improve the presentation quality of the book; we thank him for all his effort and encouragement. The second author, Piyu Tripathy, would like to thank his former colleagues at Nortel Networks, Cisco Systems, and Airvana Inc., and present colleagues at NEC Laboratories America.

Finally, the support of our parents, parents-in-law, and partners deserves a special mention. I, Piyu Tripathy, would like to thank my dear wife Leena, who has taken many household and family duties off my hands to give me the time that I needed to write this book. And I, Sagar Naik, would like to thank my loving wife Alaka for her invaluable support and for always being there for me. I would also like to thank my charming daughters, Monisha and Sameeksha, and exciting son, Siddharth, for their understanding while I was writing this book. I am grateful to my elder brother, Gajapati Naik, for all his support. We are very pleased that now we have more time for our families and friends.

Kshirasagar Naik
University of Waterloo, Waterloo

Priyadarshi Tripathy
NEC Laboratories America, Inc., Princeton

LIST OF FIGURES

1.1 Shewhart cycle 2 1.2 Ishikawa diagram 4 1.3 Examples of basic test cases 11 1.4 Example of a test case with a sequence of < input, expected outcome > 12 1.5 Subset of the input domain exercising a subset of the program behavior 14 1.6 Different activities in program testing 14 1.7 Development and testing phases in the V model 16 1.8 Regression testing at different software testing levels. (From ref. 41. © 2005 John Wiley & Sons.)
17 2.1 Executing a program with a subset of the input domain 32 2.2 Example of inappropriate path selection 35 2.3 Different ways of comparing power of test methods: (a) produces all test cases produced by another method; (b) test sets have common elements. 43 2.4 Context of applying test adequacy 44 3.1 Steps in the code review process 55 3.2 Dynamic unit test environment 63 3.3 Test-first process in XP. (From ref. 24. © 2005 IEEE.) 72 3.4 Sample pseudocode for performing unit testing 73 3.5 The assertTrue() assertion throws an exception 75 3.6 Example test suite 76 4.1 Process of generating test input data for control flow testing 90 4.2 Symbols in a CFG 91 4.3 Function to open three files 91 4.4 High-level CFG representation of openfiles(). The three nodes are numbered 1, 2, and 3. 92 4.5 Detailed CFG representation of openfiles(). The numbers 1–21 are the nodes 93 4.6 Function to compute average of selected integers in an array. This program is an adaptation of “Figure 2. A sample program” in ref. 10. (With permission from the Australian Computer Society.) 94 4.7 A CFG representation of ReturnAverage(). Numbers 1–13 are the nodes. 95 4.8 Dashed arrows represent the branches not covered by statement covering in Table 4.4 99 4.9 Partial CFG with (a) OR operation and (b) AND operations 100 4.10 Example of a path from Figure 4.7 102 4.11 Path predicate for path in Figure 4.10 102 4.12 Method in Java to explain symbolic substitution [11] 103 4.13 Path predicate expression for path in Figure 4.10 105 4.14 Another example of path from Figure 4.7 105 4.15 Path predicate expression for path shown in Figure 4.14 106 4.16 Input data satisfying constraints of Figure 4.13 106 xxi xxii LIST OF FIGURES 4.17 Binary search routine 111 5.1 Sequence of computations showing data flow anomaly 113 5.2 State transition diagram of a program variable. (From ref. 2. © 1979 IEEE.) 115 5.3 Definition and uses of variables 117 5.4 Data flow graph of ReturnAverage() example 118 5.5 Relationship among DF (data flow) testing criteria. (From ref. 4. © 1988 IEEE.) 125 5.6 Relationship among FDF (feasible data flow) testing criteria. (From ref. 4. © 1988 IEEE.) 
127 5.7 Limitation of different fault detection techniques 128 5.8 Binary search routine 133 5.9 Modified binary search routine 133 6.1 Illustration of the concept of program domains 137 6.2 A function to explain program domains 139 6.3 Control flow graph representation of the function in Figure 6.2 139 6.4 Domains obtained from interpreted predicates in Figure 6.3 140 6.5 Predicates defining the TT domain in Figure 6.4 141 6.6 ON and OFF points 146 6.7 Boundary shift resulting in reduced domain (closed inequality) 147 6.8 Boundary shift resulting in enlarged domain (closed inequality) 149 6.9 Tilted boundary (closed inequality) 149 6.10 Closure error (closed inequality) 150 6.11 Boundary shift resulting in reduced domain (open inequality) 151 6.12 Boundary shift resulting in enlarged domain (open inequality) 152 6.13 Tilted boundary (open inequality) 153 6.14 Closure error (open inequality) 153 6.15 Equality border 154 6.16 Domains D1, D2 and D3 157 7.1 Module hierarchy with three levels and seven modules 168 7.2 Top-down integration of modules A and B 169 7.3 Top-down integration of modules A, B, and D 169 7.4 Top-down integration of modules A, B, D, and C 169 7.5 Top-down integration of modules A, B, C, D, and E 170 7.6 Top-down integration of modules A, B, C, D, E, and F 170 7.7 Top-down integration of modules A, B, C, D, E, F and G 170 7.8 Bottom-up integration of modules E, F, and G 171 7.9 Bottom-up integration of modules B, C, and D with E, F, and G 172 7.10 Bottom-up integration of module A with all others 172 7.11 Hardware ECO process 179 7.12 Software ECO process 180 7.13 Module hierarchy of software system 190 8.1 Types of system tests 193 8.2 Types of basic tests 194 8.3 Types of functionality tests 197 8.4 Types of robustness tests 205 8.5 Typical 1xEV-DO radio access network. (Courtesy of Airvana, Inc.) 206 9.1 Frequency selection box of Bluetooth specification 224 9.2 Part of form ON479 of T1 general—2001, published by the CCRA 227 LIST OF FIGURES xxiii 9.3 Functionally related variables 231 9.4 Function in context 232 9.5 (a) Obtaining output values from an input vector and (b) obtaining an input vector from an output value in functional testing 233 9.6 Functional testing in general 234 9.7 System S with three input variables 235 9.8 (a) Too many test inputs; (b) one input selected from each subdomain 244 9.9 Gold standard oracle 253 9.10 Parametric oracle 253 9.11 Statistical oracle 254 10.1 Spectrum of software systems 266 10.2 Data-dominated systems 266 10.3 Control-dominated systems 267 10.4 FSM model of dual-boot laptop computer 267 10.5 Interactions between system and its environment modeled as FSM 268 10.6 PCOs on a telephone 269 10.7 FSM model of a PBX 270 10.8 FSM model of PBX 271 10.9 Interaction of test sequence with SUT 274 10.10 Derived test case from transition tour 275 10.11 Conceptual model of test case with state verification 278 10.12 Finite-state machine G1 (From ref. 5. © 1997 IEEE.) 281 10.13 UIO tree for G1 in Figure 10.12. (From ref. 5. © 1997 IEEE.) 282 10.14 Identification of UIO sequences on UIO tree of Figure 10.13 283 10.15 Finite-state machine G2 286 10.16 Distinguishing sequence tree for G2 in Figure 10.15 286 10.17 FSM that does not possess distinguishing sequence. (From ref. 11. © 1994 IEEE.) 
287 10.18 DS tree for FSM (Figure 10.17) 288 10.19 Abstraction of N-entity in OSI reference architecture 291 10.20 Abstract local test architecture 292 10.21 Abstract external test architecture 292 10.22 Local architecture 293 10.23 Distributed architecture 293 10.24 Coordinated architecture 294 10.25 Remote architecture 295 10.26 Structure of module in TTCN-3 297 10.27 Definitions of two subtypes 297 10.28 Parameterized template for constructing message to be sent 298 10.29 Parameterized template for constructing message to be received 298 10.30 Testing (a) square-root function (SRF) calculator and (b) port between tester and SRF calculator 299 10.31 Defining port type 300 10.32 Associating port with component 300 10.33 Test case for testing SRF calculator 301 10.34 Executing test case 302 10.35 Comparison of state transitions of FSM and EFSM 303 10.36 Controlled access to a door 304 10.37 SDL/GR door control system 305 xxiv LIST OF FIGURES 10.38 Door control behavior specification 306 10.39 Door control behavior specification 307 10.40 Transition tour from door control system of Figures 10.38 and 10.39 309 10.41 Testing door control system 309 10.42 Output and input behavior obtained from transition tour of Figure 10.40 310 10.43 Test behavior obtained by refining if part in Figure 10.42 310 10.44 Test behavior that can receive unexpected events (derived from Figure 10.43) 311 10.45 Core behavior of test case for testing door control system (derived from Figure 10.44) 312 10.46 User interface of ATM 314 10.47 Binding of buttons with user options 314 10.48 Binding of buttons with cash amount 315 10.49 FSM G 318 10.50 FSM H 318 10.51 FSM K 319 10.52 Nondeterministic FSM 319 11.1 State transition diagram of requirement 323 11.2 Test suite structure 336 11.3 Service interworking between FR and ATM services 337 11.4 Transformation of FR to ATM cell 338 11.5 FrAtm test suite structure 342 11.6 State transition diagram of a test case 345 11.7 State transition diagram of test case result 349 12.1 Concept of cycle-based test execution strategy 363 12.2 Gantt chart for FR–ATM service interworking test project 390 12.3 Broad criteria of test automation tool evaluation 393 12.4 Test selection guideline for automation 396 12.5 Characteristics of automated test cases 397 12.6 Six major steps in automated test case 399 12.7 Components of a automation infrastructure 401 13.1 State transition diagram representation of life cycle of defect 409 13.2 Projected execution of test cases on weekly basis in cumulative chart form 417 13.3 PAE metric of Bazooka (PE: projected execution; AE: actually executed) project 421 13.4 Pareto diagram for defect distribution shown in Table 13.12 431 13.5 Cause–effect diagram for DCA 434 15.1 Relationship between MTTR, MTTF, and MTBF 475 15.2 Graphical representation of operational profile of library information system 484 15.3 Failure intensity λ as function of cumulative failure μ (λ0 = 9 failures per unit time, ν0 = 500 failures, θ = 0.0075) 488 15.4 Failure intensity λ as function of execution time τ (λ0 = 9 failures per unit time, ν0 = 500 failures, θ = 0.0075) 490 15.5 Cumulative failure μ as function of execution time τ (λ0 = 9 failures per unit time, ν0 = 500 failures, θ = 0.0075) 490 16.1 Structure of test groups 498 16.2 Structure of software quality assurance group 499 16.3 System test team hierarchy 500 16.4 Six phases of effective recruiting process 505 LIST OF FIGURES xxv 16.5 System test organization as part of development 518 17.1 Relation between quality 
factors and quality criteria [6] 528 17.2 ISO 9126 sample quality model refines standard’s features into subcharacteristics. (From ref. 4. © 1996 IEEE.) 532 18.1 CMM structure. (From ref. 3. © 2005 John Wiley & Sons.) 549 18.2 SW-CMM maturity levels. (From ref. 3 © 2005 John Wiley & Sons.) 550 18.3 Five-level structure of TMM. (From ref. 5. © 2003 Springer.) 568 LIST OF TABLES 3.1 Hierarchy of System Documents 56 3.2 Code Review Checklist 58 3.3 McCabe Complexity Measure 79 4.1 Examples of Path in CFG of Figure 4.7 95 4.2 Input Domain of openfiles() 97 4.3 Inputs and Paths in openfiles() 97 4.4 Paths for Statement Coverage of CFG of Figure 4.7 98 4.5 Paths for Branch Coverage of CFG of Figure 4.7 99 4.6 Two Cases for Complete Statement and Branch Coverage of CFG of Figure 4.9a 101 4.7 Interpretation of Path Predicate of Path in Figure 4.10 104 4.8 Interpretation of Path Predicate of Path in Figure 4.14 105 4.9 Test Data for Statement and Branch Coverage 106 5.1 Def() and c-use() Sets of Nodes in Figure 5.4 120 5.2 Predicates and p-use() Set of Edges in Figure 5.4 121 6.1 Two Interpretations of Second if() Statement in Figure 6.2 140 6.2 Detection of Boundary Shift Resulting in Reduced Domain (Closed Inequality) 148 6.3 Detection of Boundary Shift Resulting in Enlarged Domain (Closed Inequality) 149 6.4 Detection of Boundary Tilt (Closed Inequality) 150 6.5 Detection of Closure Error (Closed Inequality) 151 6.6 Detection of Boundary Shift Resulting in Reduced Domain (Open Inequality) 151 6.7 Detection of Boundary Shift Resulting in Enlarged Domain (Open Inequality) 152 6.8 Detection of Boundary Tilt (Open Inequality) 153 6.9 Detection of Closure Error (Open Inequality) 154 7.1 Check-in Request Form 166 7.2 Example Software/Hardware Compatibility Matrix 178 7.3 Framework for SIT Plan 181 7.4 Framework for Entry Criteria to Start System Integration 182 7.5 Framework for System Integration Exit Criteria 182 8.1 EMS Functionalities 199 8.2 Regulatory Approval Bodies of Different Countries 217 9.1 Number of Special Values of Inputs to FBS Module of Figure 9.1 230 9.2 Input and Output Domains of Functions of P in Figure 9.6 234 9.3 Pairwise Test Cases for System S 236 9.4 L4(23) Orthogonal Array 236 9.5 Commonly Used Orthogonal Arrays 237 9.6 Various Values That Need to Be Tested in Combinations 238 xxvii xxviii LIST OF TABLES 9.7 L9(34) Orthogonal Array 239 9.8 L9(34) Orthogonal Array after Mapping Factors 239 9.9 Generated Test Cases after Mapping Left-Over Levels 240 9.10 Generated Test Cases to Cover Each Equivalence Class 246 9.11 Decision Table Comprising Set of Conditions and Effects 248 9.12 Pay Calculation Decision Table with Values for Each Rule 250 9.13 Pay Calculation Decision Table after Column Reduction 251 9.14 Decision Table for Payment Calculation 252 10.1 PCOs for Testing Telephone PBX 270 10.2 Set of States in FSM of Figure 10.8 272 10.3 Input and Output Sets in FSM of Figure 10.8 272 10.4 Transition Tours Covering All States in Figure 10.8 276 10.5 State Transitions Not Covered by Transition Tours of Table 10.4 277 10.6 Transition Tours Covering All State Transitions in Figure 10.8 277 10.7 UIO Sequences of Minimal Lengths Obtained from Figure 10.14 284 10.8 Examples of State Blocks 284 10.9 Outputs of FSM G2 in Response to Input Sequence 11 in Different States 287 10.10 Output Sequences Generated by FSM of Figure 10.17 as Response to W1 289 10.11 Output Sequences Generated by FSM of Figure 10.17 as Response to W2 289 10.12 Test Sequences for State Transition (D, A, a/x) of 
FSM in Figure 10.17 290 11.1 Coverage Matrix [Aij] 322 11.2 Requirement Schema Field Summary 324 11.3 Engineering Change Document Information 329 11.4 Characteristics of Testable Functional Specifications 333 11.5 Mapping of FR QoS Parameters to ATM QoS Parameters 340 11.6 Test Case Schema Summary 346 11.7 Test Suite Schema Summary 348 11.8 Test Result Schema Summary 348 12.1 Outline of System Test Plan 356 12.2 Equipment Needed to be Procured 360 12.3 Entry Criteria for First System Test Cycle 368 12.4 Test Case Failure Counts to Initiate RCA in Test Cycle 1 374 12.5 Test Case Failure Counts to Initiate RCA in Test Cycle 2 375 12.6 Test Effort Estimation for FR–ATM PVC Service Interworking 379 12.7 Form for Computing Unadjusted Function Point 382 12.8 Factors Affecting Development Effort 382 12.9 Empirical Relationship between Function Points and LOC 383 12.10 Guidelines for Manual Test Case Creation Effort 384 12.11 Guidelines for Manual Test Case Execution Effort 386 12.12 Guidelines for Estimation of Effort to Manually Execute Regression Test Cases 386 12.13 Benefits of Automated Testing 391 13.1 States of Defect Modeled in Figure 13.1 410 13.2 Defect Schema Summary Fields 412 13.3 State Transitions to Five Possible Next States from Open State 413 13.4 Outline of Test Execution Working Document 416 13.5 EST Metric in Week 4 of Bazooka Project 422 13.6 EST Metric in Bazooka Monitored on Weekly Basis 423 13.7 DAR Metric for Stinger Project 425 13.8 Weekly DRR Status for Stinger Test Project 426 13.9 Weekly OD on Priority Basis for Stinger Test Project 427 13.10 Weekly CD Observed by Different Groups for Stinger Test Project 427 13.11 ARD Metric for Bayonet 428 13.12 Sample Test Data of Chainsaw Test Project 430 13.13 Framework for Beta Release Criteria 436 13.14 Structure of Final System Test Report 438 13.15 Scale for Defect Age 443 13.16 Defect Injection versus Discovery on Project Boomerang 443 13.17 Number of Defects Weighted by Defect Age on Project Boomerang 444 13.18 ARD Metric for Test Project 448 13.19 Scale for PhAge 449 14.1 Outline of ATP 462 14.2 ACC Document Information 464 14.3 Structure of Acceptance Test Status Report 465 14.4 Structure of Acceptance Test Summary Report 466 15.1 Example of Operational Profile of Library Information System 484 17.1 McCall's Quality Factors 524 17.2 Categorization of McCall's Quality Factors 527 17.3 McCall's Quality Criteria 529 18.1 Requirements for Different Maturity Levels 564 18.2 Test Maturity Matrix 566

CHAPTER 1
Basic Concepts and Preliminaries

Software is like entropy. It is difficult to grasp, weighs nothing, and obeys the second law of thermodynamics, i.e., it always increases.
— Norman Ralph Augustine

1.1 QUALITY REVOLUTION

People seek quality in every man-made artifact. Certainly, the concept of quality did not originate with software systems. Rather, the quality concept is likely to be as old as the human endeavor to mass produce artifacts and objects of large size. In the past couple of decades, a quality revolution has been spreading fast throughout the world with the explosion of the Internet. Global competition, outsourcing, off-shoring, and increasing customer expectations have brought the concept of quality to the forefront. Developing quality products on tighter schedules is critical for a company to be successful in the new global economy. Traditionally, efforts to improve quality have centered around the end of the product development cycle by emphasizing the detection and correction of defects.
On the contrary, the new approach to enhancing quality encompasses all phases of a product development process—from a requirements analysis to the final delivery of the product to the customer. Every step in the development process must be performed to the highest possible standard. An effective quality process must focus on [1]:

• Paying much attention to customer's requirements
• Making efforts to continuously improve quality
• Integrating measurement processes with product design and development
• Pushing the quality concept down to the lowest level of the organization
• Developing a system-level perspective with an emphasis on methodology and process
• Eliminating waste through continuous improvement

A quality movement was started in Japan during the 1940s and the 1950s by William Edwards Deming, Joseph M. Juran, and Kaoru Ishikawa. In circa 1947, W. Edwards Deming "visited India as well, then continued on to Japan, where he had been asked to join a statistical mission responsible for planning the 1951 Japanese census" [2], p. 8. During this visit to Japan, Deming invited statisticians to a dinner meeting and told them how important they were and what they could do for Japan [3]. In March 1950, he returned to Japan at the invitation of Managing Director Kenichi Koyanagi of the Union of Japanese Scientists and Engineers (JUSE) to teach a course to Japanese researchers, workers, executives, and engineers on statistical quality control (SQC) methods. Statistical quality control is a discipline based on measurements and statistics. Decisions are made and plans are developed based on the collection and evaluation of actual data in the form of metrics, rather than on intuition and experience. The SQC methods use seven basic quality management tools: Pareto analysis, cause-and-effect diagram, flow chart, trend chart, histogram, scatter diagram, and control chart [2].
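The idea of deciding from measured data rather than intuition can be made concrete with the last of the seven tools listed above, the control chart. The following sketch is ours, not the book's, and the weekly defect counts in it are invented for illustration; it computes Shewhart-style control limits for a count metric and flags any week whose count falls outside those limits as a signal worth investigating.

// Illustrative sketch (not from the book): Shewhart-style control limits for a
// count metric, here hypothetical weekly defect counts. For a c-chart the center
// line is the mean count and the control limits are mean +/- 3 * sqrt(mean);
// a point outside the limits suggests the process has changed and should be
// investigated rather than explained away by intuition.
public class ControlChartSketch {
    public static void main(String[] args) {
        int[] weeklyDefects = {12, 9, 15, 11, 8, 14, 30, 10};   // made-up data

        double sum = 0.0;
        for (int d : weeklyDefects) {
            sum += d;
        }
        double mean = sum / weeklyDefects.length;                // center line
        double ucl = mean + 3 * Math.sqrt(mean);                 // upper control limit
        double lcl = Math.max(0.0, mean - 3 * Math.sqrt(mean));  // lower control limit

        System.out.printf("center=%.2f  UCL=%.2f  LCL=%.2f%n", mean, ucl, lcl);
        for (int week = 0; week < weeklyDefects.length; week++) {
            if (weeklyDefects[week] > ucl || weeklyDefects[week] < lcl) {
                System.out.println("Week " + (week + 1) + " is out of control: "
                        + weeklyDefects[week] + " defects");
            }
        }
    }
}

With the invented data above, only the seventh week (30 defects) exceeds the upper limit and would be singled out for root-cause analysis.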
In July 1950, Deming gave an eight-day seminar based on the Shewhart methods of statistical quality control [4, 5] for Japanese engineers and executives. He introduced the plan–do–check–act (PDCA) cycle in the seminar, which he called the Shewhart cycle (Figure 1.1). The Shewhart cycle illustrates the following activity sequence: setting goals, assigning them to measurable milestones, and assessing the progress against those milestones.

Figure 1.1 Shewhart cycle. Plan—Establish the objective and process to deliver the results. Do—Implement the plan and measure its performance. Check—Assess the measurements and report the results to decision makers. Act—Decide on changes needed to improve the process.

Deming's 1950 lecture notes formed the basis for a series of seminars on SQC methods sponsored by the JUSE and provided the criteria for Japan's famed Deming Prize. Deming's work has stimulated several different kinds of industries, such as those for radios, transistors, cameras, binoculars, sewing machines, and automobiles.

Between circa 1950 and circa 1970, automobile industries in Japan, in particular Toyota Motor Corporation, came up with an innovative principle to compress the time period from customer order to banking payment, known as the "lean principle." The objective was to minimize the consumption of resources that added no value to a product. The lean principle has been defined by the National Institute of Standards and Technology (NIST) Manufacturing Extension Partnership program [61] as "a systematic approach to identifying and eliminating waste through continuous improvement, flowing the product at the pull of the customer in pursuit of perfection," p.1. It is commonly believed that lean principles were started in Japan by Taiichi Ohno of Toyota [7], but Henry Ford had been using parts of lean as early as circa 1920, as evidenced by the following quote (Henry Ford, 1926) [61], p.1:

One of the noteworthy accomplishments in keeping the price of Ford products low is the gradual shortening of the production cycle. The longer an article is in the process of manufacture and the more it is moved about, the greater is its ultimate cost.

This concept was popularized in the United States by a Massachusetts Institute of Technology (MIT) study of the movement from mass production toward lean production, as described in The Machine That Changed the World, by James P. Womack, Daniel T. Jones, and Daniel Roos, New York: Rawson and Associates, 1990. Lean thinking continues to spread to every country in the world, and leaders are adapting the principles beyond automobile manufacturing, to logistics and distribution, services, retail, health care, construction, maintenance, and software development [8].

Remark: Walter Andrew Shewhart was an American physicist, engineer, and statistician and is known as the father of statistical quality control. Shewhart worked at Bell Telephone Laboratories from its foundation in 1925 until his retirement in 1956 [9]. His work was summarized in his book Economic Control of Quality of Manufactured Product, published by McGraw-Hill in 1931. In 1938, his work came to the attention of physicist W. Edwards Deming, who developed some of Shewhart's methodological proposals in Japan from 1950 onward and named his synthesis the Shewhart cycle.

In 1954, Joseph M. Juran of the United States proposed raising the level of quality management from the manufacturing units to the entire organization. He stressed the importance of systems thinking that begins with product requirements, design, prototype testing, proper equipment operations, and accurate process feedback. Juran's seminar also became a part of the JUSE's educational programs [10]. Juran spurred the move from SQC to TQC (total quality control) in Japan. This included companywide activities and education in quality control (QC), audits, quality circles, and promotion of quality management principles. The term TQC was coined by an American, Armand V. Feigenbaum, in his 1951 book Quality Control: Principles, Practice and Administration. It was republished in 2004 [11]. By 1968, Kaoru Ishikawa, one of the fathers of TQC in Japan, had outlined, as shown in the following, the key elements of TQC management [12]:

• Quality comes first, not short-term profits.
• The customer comes first, not the producer.
• Decisions are based on facts and data.
• Management is participatory and respectful of all employees.
• Management is driven by cross-functional committees covering product planning, product design, purchasing, manufacturing, sales, marketing, and distribution.

Remark: A quality circle is a volunteer group of workers, usually members of the same department, who meet regularly to discuss problems and make presentations to management with their ideas to overcome them.
Quality circles were started in Japan in 1962 by Kaoru Ishikawa as another method of improving quality. The movement in Japan was coordinated by the JUSE.

One of the innovative TQC methodologies developed in Japan is referred to as the Ishikawa or cause-and-effect diagram. Kaoru Ishikawa found from statistical data that dispersion in product quality came from four common causes, namely materials, machines, methods, and measurements, known as the 4 Ms (Figure 1.2). The bold horizontal arrow points to quality, whereas the diagonal arrows in Figure 1.2 are probable causes having an effect on the quality. Materials often differ when sources of supply or size requirements vary. Machines, or equipment, also function differently depending on variations in their parts, and they operate optimally for only part of the time. Methods, or processes, cause even greater variations due to lack of training and poor handwritten instructions. Finally, measurements also vary due to outdated equipment and improper calibration. Variations in the 4 Ms parameters have an effect on the quality of a product. The Ishikawa diagram has influenced Japanese firms to focus their quality control attention on the improvement of materials, machines, methods, and measurements.

Figure 1.2 Ishikawa diagram. The four causes (materials, machines, methods, measurements) point to the quality effect.

The total-quality movement in Japan has led to pervasive top-management involvement. Many companies in Japan have extensive documentation of their quality activities. Senior executives in the United States either did not believe quality mattered or did not know where to begin until the National Broadcasting Corporation (NBC), an American television network, broadcast the documentary "If Japan Can . . . Why Can't We?" at 9:30 P.M. on June 24, 1980 [2]. The documentary was produced by Clare Crawford-Mason and was narrated by Lloyd Dobyns. Fifteen minutes of the broadcast was devoted to Dr. Deming and his work. After the broadcast, many executives and government leaders realized that a renewed emphasis on quality was no longer an option for American companies but a necessity for doing business in an ever-expanding and more demanding competitive world market. Ford Motor Company and General Motors immediately adopted Deming's SQC methodology into their manufacturing process. Other companies such as Dow Chemical and Hughes Aircraft followed suit. Ishikawa's TQC management philosophy gained popularity in the United States. Further, the spurred emphasis on quality in American manufacturing companies led the U.S. Congress to establish the Malcolm Baldrige National Quality Award—similar to the Deming Prize in Japan—in 1987 to recognize organizations for their achievements in quality and to raise awareness about the importance of quality excellence as a competitive edge [6]. In the Baldrige National Award, quality is viewed as something defined by the customer and thus the focus is on customer-driven quality. On the other hand, in the Deming Prize, quality is viewed as something defined by the producers by conforming to specifications, and thus the focus is on conformance to specifications.

Remark: Malcolm Baldrige was U.S. Secretary of Commerce from 1981 until his death in a rodeo accident in July 1987. Baldrige was a proponent of quality management as a key to his country's prosperity and long-term strength.
He took a personal interest in the quality improvement act, which was eventually named after him, and helped draft one of its early versions. In recognition of his contributions, Congress named the award in his honor.

Traditionally, the TQC and lean concepts are applied in the manufacturing process. The software development process uses these concepts as another tool to guide the production of quality software [13]. These concepts provide a framework to discuss software production issues. The software capability maturity model (CMM) [14] architecture developed at the Software Engineering Institute is based on the principles of product quality that have been developed by W. Edwards Deming [15], Joseph M. Juran [16], Kaoru Ishikawa [12], and Philip Crosby [17].

1.2 SOFTWARE QUALITY

The question "What is software quality?" evokes many different answers. Quality is a complex concept—it means different things to different people, and it is highly context dependent. Garvin [18] has analyzed how software quality is perceived in different ways in different domains, such as philosophy, economics, marketing, and management. Kitchenham and Pfleeger's article [60] on software quality gives a succinct exposition of software quality. They discuss five views of quality in a comprehensive manner as follows:

1. Transcendental View: It envisages quality as something that can be recognized but is difficult to define. The transcendental view is not specific to software quality alone but has been applied in other complex areas of everyday life. For example, in 1964, Justice Potter Stewart of the U.S. Supreme Court, while ruling on the case Jacobellis v. Ohio, 378 U.S. 184 (1964), which involved the state of Ohio banning the French film Les Amants ("The Lovers") on the ground of pornography, wrote "I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that" (emphasis added).

2. User View: It perceives quality as fitness for purpose. According to this view, while evaluating the quality of a product, one must ask the key question: "Does the product satisfy user needs and expectations?"

3. Manufacturing View: Here quality is understood as conformance to the specification. The quality level of a product is determined by the extent to which the product meets its specifications.

4. Product View: In this case, quality is viewed as tied to the inherent characteristics of the product. A product's inherent characteristics, that is, internal qualities, determine its external qualities.

5. Value-Based View: Quality, in this perspective, depends on the amount a customer is willing to pay for it.

The concept of software quality and the efforts to understand it in terms of measurable quantities date back to the mid-1970s. McCall, Richards, and Walters [19] were the first to study the concept of software quality in terms of quality factors and quality criteria. A quality factor represents a behavioral characteristic of a system. Some examples of high-level quality factors are correctness, reliability, efficiency, testability, maintainability, and reusability. A quality criterion is an attribute of a quality factor that is related to software development. For example, modularity is an attribute of the architecture of a software system.
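To make the factor/criterion distinction concrete, the small sketch below (ours, not McCall's published tables, which are covered in Chapter 17) records an abbreviated, illustrative mapping from quality factors to some of the development-oriented criteria that influence them, and looks up which factors benefit when one criterion, modularity, is improved.

import java.util.List;
import java.util.Map;

// Illustrative sketch: quality factors (behavioral characteristics of a system)
// associated with quality criteria (development-oriented attributes). The
// associations listed here are an abbreviated example for illustration only;
// the complete factor/criteria relationships appear in Chapter 17.
public class QualityModelSketch {
    static final Map<String, List<String>> FACTOR_TO_CRITERIA = Map.of(
            "maintainability", List.of("modularity", "simplicity", "self-descriptiveness"),
            "testability",     List.of("modularity", "simplicity", "instrumentation"),
            "reliability",     List.of("consistency", "accuracy", "error tolerance"));

    public static void main(String[] args) {
        // A single criterion can support several factors; for example, improving
        // modularity helps both maintainability and testability.
        String criterion = "modularity";
        FACTOR_TO_CRITERIA.forEach((factor, criteria) -> {
            if (criteria.contains(criterion)) {
                System.out.println(criterion + " contributes to " + factor);
            }
        });
    }
}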
Highly modular software allows designers to put cohesive components in one module, thereby improving the maintainability of the system.

Various software quality models have been proposed to define quality and its related attributes. The most influential ones are ISO 9126 [20–22] and the CMM [14]. The ISO 9126 quality model was developed by an expert group under the aegis of the International Organization for Standardization (ISO). The document ISO 9126 defines six broad, independent categories of quality characteristics: functionality, reliability, usability, efficiency, maintainability, and portability. The CMM was developed by the Software Engineering Institute (SEI) at Carnegie Mellon University. In the CMM framework, a development process is evaluated on a scale of 1–5, commonly known as level 1 through level 5. For example, level 1 is called the initial level, whereas level 5—optimized—is the highest level of process maturity. In the field of software testing, there are two well-known process models, namely, the test process improvement (TPI) model [23] and the test maturity model (TMM) [24]. These two models allow an organization to assess the current state of its software testing processes, identify the next logical area for improvement, and recommend an action plan for test process improvement.

1.3 ROLE OF TESTING

Testing plays an important role in achieving and assessing the quality of a software product [25]. On the one hand, we improve the quality of the products as we repeat a test–find defects–fix cycle during development. On the other hand, we assess how good our system is when we perform system-level tests before releasing a product. Thus, as Friedman and Voas [26] have succinctly described, software testing is a verification process for software quality assessment and improvement. Generally speaking, the activities for software quality assessment can be divided into two broad categories, namely, static analysis and dynamic analysis.

• Static Analysis: As the term "static" suggests, it is based on the examination of a number of documents, namely requirements documents, software models, design documents, and source code. Traditional static analysis includes code review, inspection, walk-through, algorithm analysis, and proof of correctness. It does not involve actual execution of the code under development. Instead, it examines code and reasons over all possible behaviors that might arise during run time. Compiler optimizations are standard static analyses.

• Dynamic Analysis: Dynamic analysis of a software system involves actual program execution in order to expose possible program failures. The behavioral and performance properties of the program are also observed. Programs are executed with both typical and carefully chosen input values. Often, the input set of a program can be impractically large, so for practical purposes a finite subset of the input set is selected. Therefore, in testing, we observe some representative program behaviors and reach a conclusion about the quality of the system. Careful selection of a finite test set is crucial to reaching a reliable conclusion.

By performing static and dynamic analyses, practitioners want to identify as many faults as possible so that those faults are fixed at an early stage of the software development. Static analysis and dynamic analysis are complementary in nature, and for better effectiveness, both must be performed repeatedly and alternated.
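As a small, hypothetical illustration of the two kinds of analysis, consider the routine below. A static check (a reviewer, a compiler warning, or a lint rule) can flag the suspicious integer division without ever running the code, while a dynamic check actually executes the routine on carefully chosen inputs and compares the observed outcome with the expected outcome.

// Hypothetical example used only to contrast static and dynamic analysis.
public class AnalysisDemo {
    // Intended behavior: return the average of the array elements.
    static double average(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        // A code review or static analyzer can flag this line without running
        // the program: integer division silently discards the fraction.
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Dynamic analysis: execute the code on carefully chosen inputs and
        // compare the observed outcome with the expected outcome.
        double observed = average(new int[] {1, 2});
        double expected = 1.5;
        if (observed != expected) {
            System.out.println("FAIL: expected " + expected + " but observed " + observed);
        } else {
            System.out.println("PASS");
        }
    }
}

A fault-free version would accumulate the sum in a double or cast before dividing; the point here is only that static and dynamic analysis expose the same defect in different ways.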
Practitioners and researchers need to remove the boundaries between static and dynamic analysis and create a hybrid analysis that combines the strengths of both approaches [27]. 1.4 VERIFICATION AND VALIDATION Two similar concepts related to software testing frequently used by practitioners are verification and validation. Both concepts are abstract in nature, and each can be 8 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES realized by a set of concrete, executable activities. The two concepts are explained as follows: • Verification: This kind of activity helps us in evaluating a software system by determining whether the product of a given development phase satisfies the requirements established before the start of that phase. One may note that a product can be an intermediate product, such as requirement specification, design specification, code, user manual, or even the final product. Activities that check the correctness of a development phase are called verification activities. • Validation: Activities of this kind help us in confirming that a product meets its intended use. Validation activities aim at confirming that a product meets its customer’s expectations. In other words, validation activities focus on the final product, which is extensively tested from the customer point of view. Validation establishes whether the product meets overall expectations of the users. Late execution of validation activities is often risky by leading to higher development cost. Validation activities may be executed at early stages of the software development cycle [28]. An example of early execution of validation activities can be found in the eXtreme Programming (XP) software development methodology. In the XP methodology, the customer closely interacts with the software development group and conducts acceptance tests during each development iteration [29]. The verification process establishes the correspondence of an implementation phase of the software development process with its specification, whereas validation establishes the correspondence between a system and users’ expectations. One can compare verification and validation as follows: • Verification activities aim at confirming that one is building the product correctly, whereas validation activities aim at confirming that one is building the correct product [30]. • Verification activities review interim work products, such as requirements specification, design, code, and user manual, during a project life cycle to ensure their quality. The quality attributes sought by verification activities are consistency, completeness, and correctness at each major stage of system development. On the other hand, validation is performed toward the end of system development to determine if the entire system meets the customer’s needs and expectations. • Verification activities are performed on interim products by applying mostly static analysis techniques, such as inspection, walkthrough, and reviews, and using standards and checklists. Verification can also include dynamic analysis, such as actual program execution. On the other hand, validation is performed on the entire system by actually running the system in its real environment and using a variety of tests. 1.5 FAILURE, ERROR, FAULT, AND DEFECT 9 1.5 FAILURE, ERROR, FAULT, AND DEFECT In the literature on software testing, one can find references to the terms failure, error, fault, and defect. Although their meanings are related, there are important distinctions between these four concepts. 
In the following, we present first three terms as they are understood in the fault-tolerant computing community: • Failure: A failure is said to occur whenever the external behavior of a system does not conform to that prescribed in the system specification. • Error: An error is a state of the system. In the absence of any corrective action by the system, an error state could lead to a failure which would not be attributed to any event subsequent to the error. • Fault: A fault is the adjudged cause of an error. A fault may remain undetected for a long time, until some event activates it. When an event activates a fault, it first brings the program into an intermediate error state. If computation is allowed to proceed from an error state without any corrective action, the program eventually causes a failure. As an aside, in fault-tolerant computing, corrective actions can be taken to take a program out of an error state into a desirable state such that subsequent computation does not eventually lead to a failure. The process of failure manifestation can therefore be succinctly represented as a behavior chain [31] as follows: fault → error → failure. The behavior chain can iterate for a while, that is, failure of one component can lead to a failure of another interacting component. The above definition of failure assumes that the given specification is acceptable to the customer. However, if the specification does not meet the expectations of the customer, then, of course, even a fault-free implementation fails to satisfy the customer. It is a difficult task to give a precise definition of fault, error, or failure of software, because of the “human factor” involved in the overall acceptance of a system. In an article titled “What Is Software Failure” [32], Ram Chillarege commented that in modern software business software failure means “the customer’s expectation has not been met and/or the customer is unable to do useful work with product,” p. 354. Roderick Rees [33] extended Chillarege’s comments of software failure by pointing out that “failure is a matter of function only [and is thus] related to purpose, not to whether an item is physically intact or not” (p. 163). To substantiate this, Behrooz Parhami [34] provided three interesting examples to show the relevance of such a view point in wider context. One of the examples is quoted here (p. 451): Consider a small organization. Defects in the organization’s staff promotion policies can cause improper promotions, viewed as faults. The resulting ineptitudes & dissatisfactions are errors in the organization’s state. The organization’s personnel or departments probably begin to malfunction as result of the errors, in turn causing an overall degradation of performance. The end result can be the organization’s failure to achieve its goal. There is a fine difference between defects and faults in the above example, that is, execution of a defective policy may lead to a faulty promotion. In a software 10 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES context, a software system may be defective due to design issues; certain system states will expose a defect, resulting in the development of faults defined as incorrect signal values or decisions within the system. In industry, the term defect is widely used, whereas among researchers the term fault is more prevalent. For all practical purpose, the two terms are synonymous. In this book, we use the two terms interchangeably as required. 
1.6 NOTION OF SOFTWARE RELIABILITY No matter how many times we run the test–find faults–fix cycle during software development, some faults are likely to escape our attention, and these will eventually surface at the customer site. Therefore, a quantitative measure that is useful in assessing the quality of a software is its reliability [35]. Software reliability is defined as the probability of failure-free operation of a software system for a specified time in a specified environment. The level of reliability of a system depends on those inputs that cause failures to be observed by the end users. Software reliability can be estimated via random testing, as suggested by Hamlet [36]. Since the notion of reliability is specific to a “specified environment,” test data must be drawn from the input distribution to closely resemble the future usage of the system. Capturing the future usage pattern of a system in a general sense is described in a form called the operational profile. The concept of operational profile of a system was pioneered by John D. Musa at AT&T Bell Laboratories between the 1970s and the 1990s [37, 38]. 1.7 OBJECTIVES OF TESTING The stakeholders in a test process are the programmers, the test engineers, the project managers, and the customers. A stakeholder is a person or an organization who influences a system’s behaviors or who is impacted by that system [39]. Different stakeholders view a test process from different perspectives as explained below: • It does work: While implementing a program unit, the programmer may want to test whether or not the unit works in normal circumstances. The programmer gets much confidence if the unit works to his or her satisfaction. The same idea applies to an entire system as well—once a system has been integrated, the developers may want to test whether or not the system performs the basic functions. Here, for the psychological reason, the objective of testing is to show that the system works, rather than it does not work. • It does not work: Once the programmer (or the development team) is satisfied that a unit (or the system) works to a certain degree, more tests are conducted with the objective of finding faults in the unit (or the system). Here, the idea is to try to make the unit (or the system) fail. 1.8 WHAT IS A TEST CASE? 11 • Reduce the risk of failure: Most of the complex software systems contain faults, which cause the system to fail from time to time. This concept of “failing from time to time” gives rise to the notion of failure rate. As faults are discovered and fixed while performing more and more tests, the failure rate of a system generally decreases. Thus, a higher level objective of performing tests is to bring down the risk of failing to an acceptable level. • Reduce the cost of testing: The different kinds of costs associated with a test process include the cost of designing, maintaining, and executing test cases, the cost of analyzing the result of executing each test case, the cost of documenting the test cases, and the cost of actually executing the system and documenting it. Therefore, the less the number of test cases designed, the less will be the associated cost of testing. However, producing a small number of arbitrary test cases is not a good way of saving cost. The highest level of objective of performing tests is to produce low-risk software with fewer number of test cases. This idea leads us to the concept of effectiveness of test cases. 
Test engineers must therefore judiciously select fewer, effective test cases. 1.8 WHAT IS A TEST CASE? In its most basic form, a test case is a simple pair of < input, expected outcome >. If a program under test is expected to compute the square root of nonnegative numbers, then four examples of test cases are as shown in Figure 1.3. In stateless systems, where the outcome depends solely on the current input, test cases are very simple in structure, as shown in Figure 1.3. A program to compute the square root of nonnegative numbers is an example of a stateless system. A compiler for the C programming language is another example of a stateless system. A compiler is a stateless system because to compile a program it does not need to know about the programs it compiled previously. In state-oriented systems, where the program outcome depends both on the current state of the system and the current input, a test case may consist of a TB1: < 0, 0 >, TB2: < 25, 5 >, TB3: < 40, 6.3245553 >, TB4: < 100.5, 10.024968 >. Figure 1.3 Examples of basic test cases. 12 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES TS1: < check balance, $500.00 >, < withdraw, ‘‘amount?’’ >, < $200.00, ‘‘$200.00’’ >, < check balance, $300.00 > . Figure 1.4 Example of a test case with a sequence of < input, expected outcome >. sequence of < input, expected outcome > pairs. A telephone switching system and an automated teller machine (ATM) are examples of state-oriented systems. For an ATM machine, a test case for testing the withdraw function is shown in Figure 1.4. Here, we assume that the user has already entered validated inputs, such as the cash card and the personal identification number (PIN). In the test case TS1, “check balance” and “withdraw” in the first, second, and fourth tuples represent the pressing of the appropriate keys on the ATM keypad. It is assumed that the user account has $500.00 on it, and the user wants to withdraw an amount of $200.00. The expected outcome “$200.00” in the third tuple represents the cash dispensed by the ATM. After the withdrawal operation, the user makes sure that the remaining balance is $300.00. For state-oriented systems, most of the test cases include some form of decision and timing in providing input to the system. A test case may include loops and timers, which we do not show at this moment. 1.9 EXPECTED OUTCOME An outcome of program execution is a complex entity that may include the following: • Values produced by the program: Outputs for local observation (integer, text, audio, image) Outputs (messages) for remote storage, manipulation, or observation • State change: State change of the program State change of the database (due to add, delete, and update operations) • A sequence or set of values which must be interpreted together for the outcome to be valid An important concept in test design is the concept of an oracle. An oracle is any entity—program, process, human expert, or body of data—that tells us the expected outcome of a particular test or set of tests [40]. A test case is meaningful only if it is possible to decide on the acceptability of the result produced by the program under test. Ideally, the expected outcome of a test should be computed while designing the test case. In other words, the test outcome is computed before the program is 1.11 CENTRAL ISSUE IN TESTING 13 executed with the selected test input. The idea here is that one should be able to compute the expected outcome from an understanding of the program’s requirements. 
Precomputation of the expected outcome will eliminate any implementation bias in case the test case is designed by the developer. In exceptional cases, where it is extremely difficult, impossible, or even undesirable to compute a single expected outcome, one should identify expected outcomes by examining the actual test outcomes, as explained in the following: 1. Execute the program with the selected input. 2. Observe the actual outcome of program execution. 3. Verify that the actual outcome is the expected outcome. 4. Use the verified actual outcome as the expected outcome in subsequent runs of the test case. 1.10 CONCEPT OF COMPLETE TESTING It is not unusual to find people making claims such as “I have exhaustively tested the program.” Complete, or exhaustive, testing means there are no undiscovered faults at the end of the test phase. All problems must be known at the end of complete testing. For most of the systems, complete testing is near impossible because of the following reasons: • The domain of possible inputs of a program is too large to be completely used in testing a system. There are both valid inputs and invalid inputs. The program may have a large number of states. There may be timing constraints on the inputs, that is, an input may be valid at a certain time and invalid at other times. An input value which is valid but is not properly timed is called an inopportune input. The input domain of a system can be very large to be completely used in testing a program. • The design issues may be too complex to completely test. The design may have included implicit design decisions and assumptions. For example, a programmer may use a global variable or a static variable to control program execution. • It may not be possible to create all possible execution environments of the system. This becomes more significant when the behavior of the software system depends on the real, outside world, such as weather, temperature, altitude, pressure, and so on. 1.11 CENTRAL ISSUE IN TESTING We must realize that though the outcome of complete testing, that is, discovering all faults, is highly desirable, it is a near-impossible task, and it may not be attempted. The next best thing is to select a subset of the input domain to test a program. 14 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES Input domain D Program P D1 D2 Apply inputs P1 Observe outcome P2 Figure 1.5 Subset of the input domain exercising a subset of the program behavior. Referring to Figure 1.5, let D be the input domain of a program P . Suppose that we select a subset D1 of D, that is, D1 ⊂ D, to test program P . It is possible that D1 exercises only a part P 1, that is, P 1 ⊂ P , of the execution behavior of P , in which case faults with the other part, P2, will go undetected. By selecting a subset of the input domain D1, the test engineer attempts to deduce properties of an entire program P by observing the behavior of a part P 1 of the entire behavior of P on selected inputs D1. Therefore, selection of the subset of the input domain must be done in a systematic and careful manner so that the deduction is as accurate and complete as possible. For example, the idea of coverage is considered while selecting test cases. 1.12 TESTING ACTIVITIES In order to test a program, a test engineer must perform a sequence of testing activities. Most of these activities have been shown in Figure 1.6 and are explained in the following. These explanations focus on a single test case. 
• Identify an objective to be tested: The first activity is to identify an objective to be tested. The objective defines the intention, or purpose, of designing one or more test cases to ensure that the program supports the objective. A clear purpose must be associated with every test case. Compute expected outcome for the selected input Selected input Program (P) Observe actual outcome Result analysis Environment Assign a test verdict Figure 1.6 Different activities in program testing. 1.12 TESTING ACTIVITIES 15 • Select inputs: The second activity is to select test inputs. Selection of test inputs can be based on the requirements specification, the source code, or our expectations. Test inputs are selected by keeping the test objective in mind. • Compute the expected outcome: The third activity is to compute the expected outcome of the program with the selected inputs. In most cases, this can be done from an overall, high-level understanding of the test objective and the specification of the program under test. • Set up the execution environment of the program: The fourth step is to prepare the right execution environment of the program. In this step all the assumptions external to the program must be satisfied. A few examples of assumptions external to a program are as follows: Initialize the local system, external to the program. This may include making a network connection available, making the right database system available, and so on. Initialize any remote, external system (e.g., remote partner process in a distributed application.) For example, to test the client code, we may need to start the server at a remote site. • Execute the program: In the fifth step, the test engineer executes the program with the selected inputs and observes the actual outcome of the program. To execute a test case, inputs may be provided to the program at different physical locations at different times. The concept of test coordination is used in synchronizing different components of a test case. • Analyze the test result: The final test activity is to analyze the result of test execution. Here, the main task is to compare the actual outcome of program execution with the expected outcome. The complexity of comparison depends on the complexity of the data to be observed. The observed data type can be as simple as an integer or a string of characters or as complex as an image, a video, or an audio clip. At the end of the analysis step, a test verdict is assigned to the program. There are three major kinds of test verdicts, namely, pass, fail , and inconclusive, as explained below. If the program produces the expected outcome and the purpose of the test case is satisfied, then a pass verdict is assigned. If the program does not produce the expected outcome, then a fail verdict is assigned. However, in some cases it may not be possible to assign a clear pass or fail verdict. For example, if a timeout occurs while executing a test case on a distributed application, we may not be in a position to assign a clear pass or fail verdict. In those cases, an inconclusive test verdict is assigned. An inconclusive test verdict means that further tests are needed to be done to refine the inconclusive verdict into a clear pass or fail verdict. 16 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES A test report must be written after analyzing the test result. The motivation for writing a test report is to get the fault fixed if the test revealed a fault. 
A test report contains the following items to be informative: Explain how to reproduce the failure. Analyze the failure to be able to describe it. A pointer to the actual outcome and the test case, complete with the input, the expected outcome, and the execution environment. 1.13 TEST LEVELS Testing is performed at different levels involving the complete system or parts of it throughout the life cycle of a software product. A software system goes through four stages of testing before it is actually deployed. These four stages are known as unit, integration, system, and acceptance level testing. The first three levels of testing are performed by a number of different stakeholders in the development organization, where as acceptance testing is performed by the customers. The four stages of testing have been illustrated in the form of what is called the classical V model in Figure 1.7. In unit testing, programmers test individual program units, such as a procedures, functions, methods, or classes, in isolation. After ensuring that individual units work to a satisfactory extent, modules are assembled to construct larger subsystems by following integration testing techniques. Integration testing is jointly performed by software developers and integration test engineers. The objective of Development Requirements Testing Acceptance High-level design System Detailed design Integration Coding Unit Legend Validation Verification Figure 1.7 Development and testing phases in the V model. 1.13 TEST LEVELS 17 integration testing is to construct a reasonably stable system that can withstand the rigor of system-level testing. System-level testing includes a wide spectrum of testing, such as functionality testing, security testing, robustness testing, load testing, stability testing, stress testing, performance testing, and reliability testing. System testing is a critical phase in a software development process because of the need to meet a tight schedule close to delivery date, to discover most of the faults, and to verify that fixes are working and have not resulted in new faults. System testing comprises a number of distinct activities: creating a test plan, designing a test suite, preparing test environments, executing the tests by following a clear strategy, and monitoring the process of test execution. Regression testing is another level of testing that is performed throughout the life cycle of a system. Regression testing is performed whenever a component of the system is modified. The key idea in regression testing is to ascertain that the modification has not introduced any new faults in the portion that was not subject to modification. To be precise, regression testing is not a distinct level of testing. Rather, it is considered as a subphase of unit, integration, and system-level testing, as illustrated in Figure 1.8 [41]. In regression testing, new tests are not designed. Instead, tests are selected, prioritized, and executed from the existing pool of test cases to ensure that nothing is broken in the new version of the software. Regression testing is an expensive process and accounts for a predominant portion of testing effort in the industry. It is desirable to select a subset of the test cases from the existing pool to reduce the cost. A key question is how many and which test cases should be selected so that the selected test cases are more likely to uncover new faults [42–44]. After the completion of system-level testing, the product is delivered to the customer. 
The customer performs their own series of tests, commonly known as acceptance testing. The objective of acceptance testing is to measure the quality of the product, rather than searching for the defects, which is objective of system testing. A key notion in acceptance testing is the customer’s expectations from the system. By the time of acceptance testing, the customer should have developed their acceptance criteria based on their own expectations from the system. There are two kinds of acceptance testing as explained in the following: • User acceptance testing (UAT) • Business acceptance testing (BAT) Regression testing Unit testing Integration testing System testing Acceptance testing Figure 1.8 Regression testing at different software testing levels. (From ref. 41. © 2005 John Wiley & Sons.) 18 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES User acceptance testing is conducted by the customer to ensure that the system satisfies the contractual acceptance criteria before being signed off as meeting user needs. On the other hand, BAT is undertaken within the supplier’s development organization. The idea in having a BAT is to ensure that the system will eventually pass the user acceptance test. It is a rehearsal of UAT at the supplier’s premises. 1.14 SOURCES OF INFORMATION FOR TEST CASE SELECTION Designing test cases has continued to stay in the foci of the research community and the practitioners. A software development process generates a large body of information, such as requirements specification, design document, and source code. In order to generate effective tests at a lower cost, test designers analyze the following sources of information: • Requirements and functional specifications • Source code • Input and output domains • Operational profile • Fault model Requirements and Functional Specifications The process of software development begins by capturing user needs. The nature and amount of user needs identified at the beginning of system development will vary depending on the specific life-cycle model to be followed. Let us consider a few examples. In the Waterfall model [45] of software development, a requirements engineer tries to capture most of the requirements. On the other hand, in an agile software development model, such as XP [29] or the Scrum [46–48], only a few requirements are identified in the beginning. A test engineer considers all the requirements the program is expected to meet whichever life-cycle model is chosen to test a program. The requirements might have been specified in an informal manner, such as a combination of plaintext, equations, figures, and flowcharts. Though this form of requirements specification may be ambiguous, it is easily understood by customers. For example, the Bluetooth specification consists of about 1100 pages of descriptions explaining how various subsystems of a Bluetooth interface is expected to work. The specification is written in plaintext form supplemented with mathematical equations, state diagrams, tables, and figures. For some systems, requirements may have been captured in the form of use cases, entity–relationship diagrams, and class diagrams. Sometimes the requirements of a system may have been specified in a formal language or notation, such as Z, SDL, Estelle, or finite-state machine. Both the informal and formal specifications are prime sources of test cases [49]. 
1.14 SOURCES OF INFORMATION FOR TEST CASE SELECTION 19 Source Code Whereas a requirements specification describes the intended behavior of a system, the source code describes the actual behavior of the system. High-level assumptions and constraints take concrete form in an implementation. Though a software designer may produce a detailed design, programmers may introduce additional details into the system. For example, a step in the detailed design can be “sort array A.” To sort an array, there are many sorting algorithms with different characteristics, such as iteration, recursion, and temporarily using another array. Therefore, test cases must be designed based on the program [50]. Input and Output Domains Some values in the input domain of a program have special meanings, and hence must be treated separately [5]. To illustrate this point, let us consider the factorial function. The factorial of a nonnegative integer n is computed as follows: factorial(0) = 1; factorial(1) = 1; factorial(n) = n * factorial(n-1); A programmer may wrongly implement the factorial function as factorial(n) = 1 * 2 * ... * n; without considering the special case of n = 0. The above wrong implementation will produce the correct result for all positive values of n, but will fail for n = 0. Sometimes even some output values have special meanings, and a program must be tested to ensure that it produces the special values for all possible causes. In the above example, the output value 1 has special significance: (i) it is the minimum value computed by the factorial function and (ii) it is the only value produced for two different inputs. In the integer domain, the values 0 and 1 exhibit special characteristics if arithmetic operations are performed. These characteristics are 0 × x = 0 and 1 × x = x for all values of x . Therefore, all the special values in the input and output domains of a program must be considered while testing the program. Operational Profile As the term suggests, an operational profile is a quantitative characterization of how a system will be used. It was created to guide test engineers in selecting test cases (inputs) using samples of system usage. The notion of operational profiles, or usage profiles, was developed by Mills et al. [52] at IBM in the context of Cleanroom Software Engineering and by Musa [37] at AT&T Bell Laboratories to help develop software systems with better reliability. The idea is to infer, from the observed test results, the future reliability of the software when it is in actual use. To do this, test inputs are assigned a probability distribution, or profile, according to their occurrences in actual operation. The ways test engineers assign probability and select test cases to operate a system may significantly differ from the ways actual users operate a system. However, for accurate estimation of the reliability of a system it is important to test a system by considering the ways it will actually be used in the field. This concept is being used to test web 20 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES applications, where the user session data are collected from the web servers to select test cases [53, 54]. Fault Model Previously encountered faults are an excellent source of information in designing new test cases. The known faults are classified into different classes, such as initialization faults, logic faults, and interface faults, and stored in a repository [55, 56]. 
Test engineers can use these data in designing tests to ensure that a particular class of faults is not resident in the program. There are three types of fault-based testing: error guessing, fault seeding, and mutation analysis. In error guessing, a test engineer applies his experience to (i) assess the situation and guess where and what kinds of faults might exist, and (ii) design tests to specifically expose those kinds of faults. In fault seeding, known faults are injected into a program, and the test suite is executed to assess the effectiveness of the test suite. Fault seeding makes an assumption that a test suite that finds seeded faults is also likely to find other faults. Mutation analysis is similar to fault seeding, except that mutations to program statements are made in order to determine the fault detection capability of the test suite. If the test cases are not capable of revealing such faults, the test engineer may specify additional test cases to reveal the faults. Mutation testing is based on the idea of fault simulation, whereas fault seeding is based on the idea of fault injection. In the fault injection approach, a fault is inserted into a program, and an oracle is available to assert that the inserted fault indeed made the program incorrect. On the other hand, in fault simulation, a program modification is not guaranteed to lead to a faulty program. In fault simulation, one may modify an incorrect program and turn it into a correct program. 1.15 WHITE-BOX AND BLACK-BOX TESTING A key idea in Section 1.14 was that test cases need to be designed by considering information from several sources, such as the specification, source code, and special properties of the program’s input and output domains. This is because all those sources provide complementary information to test designers. Two broad concepts in testing, based on the sources of information for test design, are white-box and black-box testing. White-box testing techniques are also called structural testing techniques, whereas black-box testing techniques are called functional testing techniques. In structural testing, one primarily examines source code with a focus on control flow and data flow. Control flow refers to flow of control from one instruction to another. Control passes from one instruction to another instruction in a number of ways, such as one instruction appearing after another, function call, message passing, and interrupts. Conditional statements alter the normal, sequential flow of control in a program. Data flow refers to the propagation of values from one variable or constant to another variable. Definitions and uses of variables determine the data flow aspect in a program. 1.16 TEST PLANNING AND DESIGN 21 In functional testing, one does not have access to the internal details of a program and the program is treated as a black box. A test engineer is concerned only with the part that is accessible outside the program, that is, just the input and the externally visible outcome. A test engineer applies input to a program, observes the externally visible outcome of the program, and determines whether or not the program outcome is the expected outcome. Inputs are selected from the program’s requirements specification and properties of the program’s input and output domains. A test engineer is concerned only with the functionality and the features found in the program’s specification. At this point it is useful to identify a distinction between the scopes of structural testing and functional testing. 
One applies structural testing techniques to individual units of a program, whereas functional testing techniques can be applied to both an entire system and the individual program units. Since individual programmers know the details of the source code they write, they themselves perform structural testing on the individual program units they write. On the other hand, functional testing is performed at the external interface level of a system, and it is conducted by a separate software quality assurance group. Let us consider a program unit U which is a part of a larger program P . A program unit is just a piece of source code with a well-defined objective and well-defined input and output domains. Now, if a programmer derives test cases for testing U from a knowledge of the internal details of U , then the programmer is said to be performing structural testing. On the other hand, if the programmer designs test cases from the stated objective of the unit U and from his or her knowledge of the special properties of the input and output domains of U , then he or she is said to be performing functional testing on the same unit U . The ideas of structural testing and functional testing do not give programmers and test engineers a choice of whether to design test cases from the source code or from the requirements specification of a program. However, these strategies are used by different groups of people at different times during a software’s life cycle. For example, individual programmers use both the structural and functional testing techniques to test their own code, whereas quality assurance engineers apply the idea of functional testing. Neither structural testing nor functional testing is by itself good enough to detect most of the faults. Even if one selects all possible inputs, a structural testing technique cannot detect all faults if there are missing paths in a program. Intuitively, a path is said to be missing if there is no code to handle a possible condition. Similarly, without knowledge of the structural details of a program, many faults will go undetected. Therefore, a combination of both structural and functional testing techniques must be used in program testing. 1.16 TEST PLANNING AND DESIGN The purpose of system test planning, or simply test planning, is to get ready and organized for test execution. A test plan provides a framework, scope, details of resource needed, effort required, schedule of activities, and a budget. A framework 22 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES is a set of ideas, facts, or circumstances within which the tests will be conducted. The stated scope outlines the domain, or extent, of the test activities. The scope covers the managerial aspects of testing, rather than the detailed techniques and specific test cases. Test design is a critical phase of software testing. During the test design phase, the system requirements are critically studied, system features to be tested are thoroughly identified, and the objectives of test cases and the detailed behavior of test cases are defined. Test objectives are identified from different sources, namely, the requirement specification and the functional specification, and one or more test cases are designed for each test objective. Each test case is designed as a combination of modular test components called test steps. These test steps can be combined together to create more complex, multistep tests. A test case is clearly specified so that others can easily borrow, understand, and reuse it. 
It is interesting to note that a new test-centric approach to system development is gradually emerging. This approach is called test-driven development (TDD) [57]. In test-driven development, programmers design and implement test cases before the production code is written. This approach is a key practice in modern agile software development processes such as XP. The main characteristics of agile software development processes are (i) incremental development, (ii) coding of unit and acceptance tests conducted by the programmers along with customers, (iii) frequent regression testing, and (iv) writing test code, one test case at a time, before the production code. 1.17 MONITORING AND MEASURING TEST EXECUTION Monitoring and measurement are two key principles followed in every scientific and engineering endeavor. The same principles are also applicable to the testing phases of software development. It is important to monitor certain metrics which truly represent the progress of testing and reveal the quality level of the system. Based on those metrics, the management can trigger corrective and preventive actions. By putting a small but critical set of metrics in place the executive management will be able to know whether they are on the right track [58]. Test execution metrics can be broadly categorized into two classes as follows: • Metrics for monitoring test execution • Metrics for monitoring defects The first class of metrics concerns the process of executing test cases, whereas the second class concerns the defects found as a result of test execution. These metrics need to be tracked and analyzed on a periodic basis, say, daily or weekly. In order to effectively control a test project, it is important to gather valid and accurate information about the project. One such example is to precisely know when to trigger revert criteria for a test cycle and initiate root cause analysis of 1.17 MONITORING AND MEASURING TEST EXECUTION 23 the problems before more tests can be performed. By triggering such a revert criteria, a test manager can effectively utilize the time of test engineers, and possibly money, by suspending a test cycle on a product with too many defects to carry out a meaningful system test. A management team must identify and monitor metrics while testing is in progress so that important decisions can be made [59]. It is important to analyze and understand the test metrics, rather than just collect data and make decisions based on those raw data. Metrics are meaningful only if they enable the management to make decisions which result in lower cost of production, reduced delay in delivery, and improved quality of software systems. Quantitative evaluation is important in every scientific and engineering field. Quantitative evaluation is carried out through measurement. Measurement lets one evaluate parameters of interest in a quantitative manner as follows: • Evaluate the effectiveness of a technique used in performing a task. One can evaluate the effectiveness of a test generation technique by counting the number of defects detected by test cases generated by following the technique and those detected by test cases generated by other means. • Evaluate the productivity of the development activities. One can keep track of productivity by counting the number of test cases designed per day, the number of test cases executed per day, and so on. • Evaluate the quality of the product. 
By monitoring the number of defects detected per week of testing, one can observe the quality level of the system. • Evaluate the product testing. For evaluating a product testing process, the following two measurements are critical: Test case effectiveness metric: The objective of this metric is twofold as explained in what follows: (1) measure the “defect revealing ability” of the test suite and (2) use the metric to improve the test design process. During the unit, integration, and system testing phases, faults are revealed by executing the planned test cases. In addition to these faults, new faults are also found during a testing phase for which no test cases had been designed. For these new faults, new test cases are added to the test suite. Those new test cases are called test case escaped (TCE). Test escapes occur because of deficiencies in test design. The need for more testing occurs as test engineers get new ideas while executing the planned test cases. Test effort effectiveness metric: It is important to evaluate the effectiveness of the testing effort in the development of a product. After a product is deployed at the customer’s site, one is interested to know the effectiveness of testing that was performed. A common measure of test effectiveness is the number of defects found by the customers that were not found by the test engineers prior to the release of the product. These defects had escaped our test effort. 24 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES 1.18 TEST TOOLS AND AUTOMATION In general, software testing is a highly labor intensive task. This is because test cases are to a great extent manually generated and often manually executed. Moreover, the results of test executions are manually analyzed. The durations of those tasks can be shortened by using appropriate tools. A test engineer can use a variety of tools, such as a static code analyzer, a test data generator, and a network analyzer, if a network-based application or protocol is under test. Those tools are useful in increasing the efficiency and effectiveness of testing. Test automation is essential for any testing and quality assurance division of an organization to move forward to become more efficient. The benefits of test automation are as follows: • Increased productivity of the testers • Better coverage of regression testing • Reduced durations of the testing phases • Reduced cost of software maintenance • Increased effectiveness of test cases Test automation provides an opportunity to improve the skills of the test engineers by writing programs, and hence their morale. They will be more focused on developing automated test cases to avoid being a bottleneck in product delivery to the market. Consequently, software testing becomes less of a tedious job. Test automation improves the coverage of regression testing because of accumulation of automated test cases over time. Automation allows an organization to create a rich library of reusable test cases and facilitates the execution of a consistent set of test cases. Here consistency means our ability to produce repeated results for the same set of tests. It may be very difficult to reproduce test results in manual testing, because exact conditions at the time and point of failure may not be precisely known. In automated testing it is easier to set up the initial conditions of a system, thereby making it easier to reproduce test results. 
Test automation simplifies the debugging work by providing a detailed, unambiguous log of activities and intermediate test steps. This leads to a more organized, structured, and reproducible testing approach. Automated execution of test cases reduces the elapsed time for testing, and, thus, it leads to a shorter time to market. The same automated test cases can be executed in an unsupervised manner at night, thereby efficiently utilizing the different platforms, such as hardware and configuration. In short, automation increases test execution efficiency. However, at the end of test execution, it is important to analyze the test results to determine the number of test cases that passed or failed. And, if a test case failed, one analyzes the reasons for its failure. In the long run, test automation is cost-effective. It drastically reduces the software maintenance cost. In the sustaining phase of a software system, the regression tests required after each change to the system are too many. As a result, regression testing becomes too time and labor intensive without automation. 1.18 TEST TOOLS AND AUTOMATION 25 A repetitive type of testing is very cumbersome and expensive to perform manually, but it can be automated easily using software tools. A simple repetitive type of application can reveal memory leaks in a software. However, the application has to be run for a significantly long duration, say, for weeks, to reveal memory leaks. Therefore, manual testing may not be justified, whereas with automation it is easy to reveal memory leaks. For example, stress testing is a prime candidate for automation. Stress testing requires a worst-case load for an extended period of time, which is very difficult to realize by manual means. Scalability testing is another area that can be automated. Instead of creating a large test bed with hundreds of equipment, one can develop a simulator to verify the scalability of the system. Test automation is very attractive, but it comes with a price tag. Sufficient time and resources need to be allocated for the development of an automated test suite. Development of automated test cases need to be managed like a programming project. That is, it should be done in an organized manner; otherwise it is highly likely to fail. An automated test suite may take longer to develop because the test suite needs to be debugged before it can be used for testing. Sufficient time and resources need to be allocated for maintaining an automated test suite and setting up a test environment. Moreover, every time the system is modified, the modification must be reflected in the automated test suite. Therefore, an automated test suite should be designed as a modular system, coordinated into reusable libraries, and cross-referenced and traceable back to the feature being tested. It is important to remember that test automation cannot replace manual testing. Human creativity, variability, and observability cannot be mimicked through automation. Automation cannot detect some problems that can be easily observed by a human being. Automated testing does not introduce minor variations the way a human can. Certain categories of tests, such as usability, interoperability, robustness, and compatibility, are often not suited for automation. It is too difficult to automate all the test cases; usually 50% of all the system-level test cases can be automated. There will always be a need for some manual testing, even if all the system-level test cases are automated. 
The objective of test automation is not to reduce the head counts in the testing department of an organization, but to improve the productivity, quality, and efficiency of test execution. In fact, test automation requires a larger head count in the testing department in the first year, because the department needs to automate the test cases and simultaneously continue the execution of manual tests. Even after the completion of the development of a test automation framework and test case libraries, the head count in the testing department does not drop below its original level. The test organization needs to retain the original team members in order to improve the quality by adding more test cases to the automated test case repository. Before a test automation project can proceed, the organization must assess and address a number of considerations. The following list of prerequisites must be considered for an assessment of whether the organization is ready for test automation: • The test cases to be automated are well defined. • Test tools and an infrastructure are in place. 26 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES • The test automation professionals have prior successful experience in automation. • Adequate budget should have been allocated for the procurement of software tools. 1.19 TEST TEAM ORGANIZATION AND MANAGEMENT Testing is a distributed activity conducted at different levels throughout the life cycle of a software. These different levels are unit testing, integration testing, system testing, and acceptance testing. It is logical to have different testing groups in an organization for each level of testing. However, it is more logical—and is the case in reality—that unit-level tests be developed and executed by the programmers themselves rather than an independent group of unit test engineers. The programmer who develops a software unit should take the ownership and responsibility of producing good-quality software to his or her satisfaction. System integration testing is performed by the system integration test engineers. The integration test engineers involved need to know the software modules very well. This means that all development engineers who collectively built all the units being integrated need to be involved in integration testing. Also, the integration test engineers should thoroughly know the build mechanism, which is key to integrating large systems. A team for performing system-level testing is truly separated from the development team, and it usually has a separate head count and a separate budget. The mandate of this group is to ensure that the system requirements have been met and the system is acceptable. Members of the system test group conduct different categories of tests, such as functionality, robustness, stress, load, scalability, reliability, and performance. They also execute business acceptance tests identified in the user acceptance test plan to ensure that the system will eventually pass user acceptance testing at the customer site. However, the real user acceptance testing is executed by the client’s special user group. The user group consists of people from different backgrounds, such as software quality assurance engineers, business associates, and customer support engineers. It is a common practice to create a temporary user acceptance test group consisting of people with different backgrounds, such as integration test engineers, system test engineers, customer support engineers, and marketing engineers. 
Once the user acceptance is completed, the group is dismantled. It is recommended to have at least two test groups in an organization: integration test group and system test group. Hiring and retaining test engineers are challenging tasks. Interview is the primary mechanism for evaluating applicants. Interviewing is a skill that improves with practice. It is necessary to have a recruiting process in place in order to be effective in hiring excellent test engineers. In order to retain test engineers, the management must recognize the importance of testing efforts at par with development efforts. The management should treat the test engineers as professionals and as a part of the overall team that delivers quality products. 1.20 OUTLINE OF BOOK 27 1.20 OUTLINE OF BOOK With the above high-level introduction to quality and software testing, we are now in a position to outline the remaining chapters. Each chapter in the book covers technical, process, and/or managerial topics related to software testing. The topics have been designed and organized to facilitate the reader to become a software test specialist. In Chapter 2 we provide a self-contained introduction to the theory and limitations of software testing. Chapters 3–6 treat unit testing techniques one by one, as quantitatively as possible. These chapters describe both static and dynamic unit testing. Static unit testing has been presented within a general framework called code review , rather than individual techniques called inspection and walkthrough. Dynamic unit testing, or execution-based unit testing, focuses on control flow, data flow, and domain testing. The JUnit framework, which is used to create and execute dynamic unit tests, is introduced. We discuss some tools for effectively performing unit testing. Chapter 7 discusses the concept of integration testing. Specifically, five kinds of integration techniques, namely, top down, bottom up, sandwich, big bang, and incremental, are explained. Next, we discuss the integration of hardware and software components to form a complete system. We introduce a framework to develop a plan for system integration testing. The chapter is completed with a brief discussion of integration testing of off-the-shelf components. Chapters 8–13 discuss various aspects of system-level testing. These six chapters introduce the reader to the technical details of system testing that is the practice in industry. These chapters promote both qualitative and quantitative evaluation of a system testing process. The chapters emphasize the need for having an independent system testing group. A process for monitoring and controlling system testing is clearly explained. Chapter 14 is devoted to acceptance testing, which includes acceptance testing criteria, planning for acceptance testing, and acceptance test execution. Chapter 15 contains the fundamental concepts of software reliability and their application to software testing. We discuss the notion of operation profile and its application in system testing. We conclude the chapter with the description of an example and the time of releasing a system by determining the additional length of system testing. The additional testing time is calculated by using the idea of software reliability. In Chapter 16, we present the structure of test groups and how these groups can be organized in a software company. 
Next, we discuss how to hire and retain test engineers by providing training, instituting a reward system, and establishing an attractive career path for them within the testing organization. We conclude this chapter with the description of how to build and manage a test team with a focus on teamwork rather than individual gain. Chapters 17 and 18 explain the concepts of software quality and different maturity models. Chapter 17 focuses on quality factors and criteria and describes the ISO 9126 and ISO 9000:2000 standards. Chapter 18 covers the CMM, which 28 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES was developed by the SEI at Carnegie Mellon University. Two test-related models, namely the TPI model and the TMM, are explained at the end of Chapter 18. We define the key words used in the book in a glossary at the end of the book. The reader will find about 10 practice exercises at the end of each chapter. A list of references is included at the end of each chapter for a reader who would like to find more detailed discussions of some of the topics. Finally, each chapter, except this one, contains a literature review section that, essentially, provides pointers to more advanced material related to the topics. The more advanced materials are based on current research and alternate viewpoints. REFERENCES 1. B. Davis, C. Skube, L. Hellervik, S. Gebelein, and J. Sheard. Successful Manager’s Handbook . Personnel Decisions International, Minneapolis, 1996. 2. M. Walton. The Deming Management Method . The Berkley Publishing Group, New York, 1986. 3. W. E. Deming. Transcript of Speech to GAO Roundtable on Product Quality—Japan vs. the United States. Quality Progress, March 1994, pp. 39–44. 4. W. A. Shewhart. Economic Control of Quality of Manufactured Product. Van Nostrand, New York, 1931. 5. W. A. Shewhart. The Application of Statistics as an Aid in Maintaining Quality of a Manufactured Product. Journal of American Statistical Association, December 1925, pp. 546– 548. 6. National Institute of Standards and Technology, Baldridge National Quality Program, 2008. Avail- able: http://www.quality.nist.gov/. 7. J. Liker and D. Meier. The Toyota Way Fieldbook . McGraw-Hill, New York, 2005. 8. M. Poppendieck and T. Poppendieck. Implementing Lean Software Development: From Concept to Cash. Addison-Wesley, Reading, MA, 2006. 9. A. B. Godfrey and A. I. C. Endres. The Evolution of Quality Management Within Telecommuni- cations. IEEE Communications Magazine, October 1994, pp. 26– 34. 10. M. Pecht and W. R. Boulton. Quality Assurance and Reliability in the Japanese Electronics Industry. Japanses Technology Evaluation Center (JTEC), Report on Electronic Manufacturing and Packaging in Japan, W. R. Boulton, Ed. International Technology Research Institute at Loyola College, February 1995, pp. 115–126. 11. A. V. Feigenbaum. Total Quality Control , 4th ed. McGraw-Hill, New York, 2004. 12. K. Ishikawa. What Is Total Quality Control . Prentice-Hall, Englewood Cliffs, NJ, 1985. 13. A. Cockburn. What Engineering Has in Common With Manufacturing and Why It Matters. Crosstalk, the Journal of Defense Software Engineering, April 2007, pp. 4–7. 14. S. Land. Jumpstart CMM/CMMI Software Process Improvement. Wiley, Hoboken, NJ, 2005. 15. W. E. Deming. Out of the Crisis. MIT, Cambridge, MA, 1986. 16. J. M. Juran and A. B. Godfrey. Juran’s Quality Handbook , 5th ed. McGraw-Hill, New York, 1998. 17. P. Crosby. Quality Is Free. New American Library, New York, 1979. 18. D. A. Garvin. What Does “Product Quality” Really Mean? 
Sloan Management Review , Fall 1984, pp. 25–43. 19. J. A. McCall, P. K. Richards, and G. F. Walters. Factors in Software Quality, Technical Report RADC-TR-77-369. U.S. Department of Commerce, Washington, DC, 1977. 20. International Organization for Standardization (ISO). Quality Management Systems—Fundamentals and Vocabulary, ISO 9000:2000. ISO, Geneva, December 2000. 21. International Organization for Standardization (ISO). Quality Management Systems—Guidelines for Performance Improvements, ISO 9004:2000. ISO, Geneva, December 2000. 22. International Organization for Standardization (ISO). Quality Management Systems—Requirements, ISO 9001:2000. ISO, Geneva, December 2000. 23. T. Koomen and M. Pol. Test Process Improvement. Addison-Wesley, Reading, MA, 1999. REFERENCES 29 24. I. Burnstein. Practical Software Testing. Springer, New York, 2003. 25. L. Osterweil et al. Strategic Directions in Software Quality. ACM Computing Surveys, December 1996, pp. 738– 750. 26. M. A. Friedman and J. M. Voas. Software Assessment: Reliability, Safety, Testability. Wiley, New York, 1995. 27. Michael D. Ernst. Static and Dynamic Analysis: Synergy and Duality. Paper presented at ICSE Workshop on Dynamic Analysis, Portland, OR, May 2003, pp. 24–27. 28. L. Baresi and M. Pezze`. An Introduction to Software Testing, Electronic Notes in Theoretical Com- puter Science. Elsevier, Vol. 148, Feb. 2006, pp. 89–111. 29. K. Beck and C. Andres. Extreme Programming Explained: Embrace Change, 2nd ed. Addison-Wesley, Reading, MA, 2004. 30. B. W. Boehm. Software Engineering Economics. Prentice-Hall, Englewood Cliffs, NJ, 1981. 31. J. C. Laprie. Dependability— Its Attributes, Impairments and Means. In Predictably Dependable Computing Systems, B. Randall, J. C. Laprie, H. Kopetz, and B. Littlewood, Eds. Springer-Verlag, New York, 1995. 32. R. Chillarege. What Is Software Failure. IEEE Transactions on Reliability, September 1996, pp. 354–355. 33. R. Rees. What Is a Failure. IEEE Transactions on Reliability, June 1997, p. 163. 34. B. Parhami. Defect, Fault, Error, . . . , or Failure. IEEE Transactions on Reliability, December 1997, pp. 450–451. 35. M. R. Lyu. Handbook of Software Reliability Engineering. McGraw-Hill, New York, 1995. 36. R. Hamlet. Random Testing. In Encyclopedia of Software Engineering, J. Marciniak, Ed. Wiley, New York, 1994, pp. 970– 978. 37. J. D. Musa. Software Reliability Engineering. IEEE Software, March 1993, pp. 14– 32. 38. J. D. Musa. A Theory of Software Reliability and Its Application. IEEE Transactions on Software Engineering, September 1975, pp. 312– 327. 39. M. Glinz and R. J. Wieringa. Stakeholders in Requirements Engineering. IEEE Software, March–April 2007, pp. 18–20. 40. A. Bertolino and L. Strigini. On the Use of Testability Measures for Dependability Assessment. IEEE Transactions on Software Engineering, February 1996, pp. 97–108. 41. A. Bertolino and E Marchelli. A Brief Essay on Software Testing. In Software Engineering, Vol. 1, The Development Process, 3rd ed., R. H. Thayer and M. J. Christensen, Eds. Wiley–IEEE Computer Society Press, Hoboken, NJ, 2005. 42. D. Jeffrey and N. Gupta. Improving Fault Detection Capability by Selectively Retaining Test Cases during Test Suite Reduction. IEEE Transactions on Software Engineering, February 2007, pp. 108– 123. 43. Z. Li, M. Harman, and R. M. Hierons. Search Algorithms for Regression Test Case Prioritization. IEEE Transactions on Software Engineering, April 2007, pp. 225–237. 44. W. Masri, A. Podgurski, and D. Leon. 
An Empirical Study of Test Case Filtering Techniques Based on Exercising Information Flows. IEEE Transactions on Software Engineering, July 2007, pp. 454–477. 45. W. W. Royce. Managing the Development of Large Software Systems: Concepts and Techniques. In Proceedings of IEEE WESCON , August 1970, pp. 1–9. Republished in ICSE, Monterey, 1987, pp. 328–338. 46. L. Rising and N. S. Janoff. The Scrum Software Development Process for Small Teams. IEEE Software, July/August 2000, pp. 2–8. 47. K. Schwaber. Agile Project Management with Scrum. Microsoft Press, Redmond, WA, 2004. 48. H. Takeuchi and I. Nonaka. The New Product Development Game. Harvard Business Review , Boston, January-February 1986, pp. 1–11. 49. A. P. Mathur. Foundation of Software Testing. Pearson Education, New Delhi, 2007. 50. P. Ammann and J. Offutt. Introduction to Software Testing. Cambridge University Press, 2008. 51. M. Pezze` and M. Young. Software Testing and Analysis: Process, Principles, and Techniques. Wiley, Hoboken, NJ, 2007. 52. H. D. Mills, M. Dyer, and R. C. Linger. Cleanroom Software Engineering. IEEE Software, September 1987, pp. 19–24. 30 CHAPTER 1 BASIC CONCEPTS AND PRELIMINARIES 53. S. Elbaum, G. Rothermel, S. Karre, and M. Fisher II. Leveraging User Session Data to Support Web Application Testing. IEEE Transactions on Software Engineering, March 2005, pp. 187– 202. 54. S. Sampath, S. Sprenkle, E. Gibson, L. Pollock, and A. S. Greenwald. Applying Concept Analysis to User-Session-Based Testing of Web Applications. IEEE Transactions on Software Engineering, October 2007, pp. 643– 657. 55. A. Endress. An Analysis of Errors and Their Causes in System Programs. IEEE Transactions on Software Engineering, June 1975, pp. 140– 149. 56. T. J. Ostrand and E. J. Weyuker. Collecting and Categorizing Software Error Data in an Industrial Environment. Journal of Systems and Software, November 1984, pp. 289– 300. 57. K. Beck. Test-Driven Development. Addison-Wesley, Reading, MA, 2003. 58. D. Lemont. CEO Discussion—From Start-up to Market Leader—Breakthrough Milestones. Ernst and Young Milestones, Boston, May 2004, pp. 9–11. 59. G. Stark, R. C. Durst, and C. W. Vowell. Using Metrics in Management Decision Making. IEEE Computer, September 1994, pp. 42–48. 60. B. Kitchenham and S. L. Pfleeger. Software Quality: The Elusive Target. IEEE Software, January 1996, pp. 12–21. 61. J. Kilpatrick. Lean Principles. http://www.mep.org/textfiles/LeanPrinciples.pdf, 2003, pp. 1–5. Exercises 1. Explain the principles of statistical quality control. What are the tools used for this purpose? Explain the principle of a control chart. 2. Explain the concept of lean principles. 3. What is an “Ishikawa” diagram? When should the Ishikawa diagram be used? Provide a procedure to construct an Ishikawa diagram. 4. What is total quality management (TQM)? What is the difference between TQM and TQC? 5. Explain the differences between validation and verification. 6. Explain the differences between failure, error, and fault. 7. What is a test case? What are the objectives of testing? 8. Explain the concepts of unit, integration, system, acceptance, and regression testing. 9. What are the different sources from which test cases can be selected? 10. What is the difference between fault injection and fault simulation? 11. Explain the differences between structural and functional testing. 12. What are the strengths and weaknesses of automated testing and manual testing? 
2 CHAPTER Theory of Program Testing He who loves practice without theory is like the sailor who boards [a] ship without a rudder and compass and never knows where he may cast. — Leonardo da Vinci 2.1 BASIC CONCEPTS IN TESTING THEORY The idea of program testing is as old as computer programming. As computer programs got larger and larger since their early days in the 1960s, the need for eliminating defects from them in a systematic manner received more attention. Both the research community and the practitioners became more deeply involved in software testing. Thus, in the 1970s, a new field of research called testing theory emerged. Testing theory puts emphasis on the following: • Detecting defects through execution-based testing • Designing test cases from different sources, namely, requirement specifi- cation, source code, and the input and output domains of programs • Selecting a subset of test cases from the set of all possible test cases [1, 2] • Effectiveness of the test case selection strategy [3–5] • Test oracles used during testing [6, 7] • Prioritizing the execution of the selected test cases [8] • Adequacy analysis of test cases [9–15] A theoretical foundation of testing gives testers and developers valuable insight into software systems and the development processes. As a consequence, testers design more effective test cases at a lower cost. While considering testing theory, there may be a heightened expectation that it lets us detect all the defects in a computer program. Any testing theory must inherit the fundamental limitation of testing. The limitation of testing has been best articulated by Dijkstra: Testing can only reveal the presence of errors, never their absence [16]. In spite of the Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy Copyright © 2008 John Wiley & Sons, Inc. 31 32 CHAPTER 2 THEORY OF PROGRAM TESTING said limitation, testing remains as the most practical and reliable method for defect detection and quality improvement. In this chapter, three well-known testing theories are discussed. These are Goodenough and Gerhart’s theory [17], Weyuker and Ostrand’s theory [18], and Gourlay’s theory [19]. Goodenough and Gerhart introduced some key concepts such as an ideal test, reliability and validity of a test, test selection criteria, thorough test, and five categories of program errors. Weyuker and Ostrand refined some of the above ideas in the form of uniformly reliable criterion, uniformly valid criterion, and uniformly ideal test. Gourlay introduced the concept of a test system and a general method for comparing different test methods. 2.2 THEORY OF GOODENOUGH AND GERHART Goodenough and Gerhart published a seminal paper [17] in 1975 on test data selection. This paper gave a fundamental testing concept, identified a few types of program errors, and gave a theory for selecting test data from the input domain of a program. Though this theory is not without critiques, it is widely quoted and appreciated in the research community of software testing. 2.2.1 Fundamental Concepts Let D be the input domain of a program P. Let T ⊆ D. The result of executing P with input d ∈ D is denoted by P(d ) (Figure 2.1): OK(d): Define a predicate OK(d ) which expresses the acceptability of result P (d ). Thus, OK(d ) = true if and only if P (d ) is an acceptable outcome. SUCCESSFUL(T): For a given T ⊆ D, T is a successful test, denoted by SUCCESSFUL(T ), if and only if, ∀t ∈ T, OK(t). 
Thus, SUCCESSFUL(T ) = true if and only if, ∀t ∈ T, OK(t). Ideal Test: T constitutes an ideal test if OK(t) ∀t ∈ T ⇒ OK(d) ∀d ∈ D An ideal test is interpreted as follows. If from the successful execution of a sample of the input domain we can conclude that the program contains no errors, then the sample constitutes an ideal test. Practitioners may Input domain D Program P(d) T P Figure 2.1 Executing a program with a subset of the input domain. 2.2 THEORY OF GOODENOUGH AND GERHART 33 loosely interpret “no error” as “not many errors of severe consequences.” The validity of the above definition of an ideal test depends on how “thoroughly” T exercises P . Some people equate thorough test with exhaustive or complete test, in which case T = D. COMPLETE(T, C): A thorough test T is defined to be one satisfying COMPLETE(T ,C ), where COMPLETE is a predicate that defines how some test selection criteria C is used in selecting a particular set of test data T from D. COMPLETE(T , C ) will be defined in a later part of this section. Essentially, C defines the properties of a program that must be exercised to constitute a thorough test. Reliable Criterion: A selection criterion C is reliable if and only if either every test selected by C is successful or no test selected is successful. Thus, reliability refers to consistency. Valid Criterion: A selection criterion C is valid if and only if whenever P is incorrect C selects at least one test set T which is not successful for P . Thus, validity refers to the ability to produce meaningful results. Fundamental Theorem. (∃T ⊆ D)(COMPLETE(T , C) ∧ RELIABLE(C) ∧ VALID(C) ∧ SUCCESSFUL(T )) ⇒ (∀d ∈ D)OK(d) Proof. Let P be a program and D be the set of inputs for P . Let d be a member of D. We assume that P fails on input d . In other words, the actual outcome of executing P with input d is not the same as the expected outcome. In the form of our notation, ¬OK(d ) is true. VALID(C ) implies that there exists a complete set of test data T such that ¬SUCCESSFUL(T ). RELIABLE(C ) implies that if one complete test fails, all tests fail. However, this leads to a contradiction that there exists a complete test that is successfully executed. One may be tempted to find a reliable and valid criterion, if it exists, so that all faults can be detected with a small set of test cases. However, there are several difficulties in applying the above theory, as explained in the following: • Since faults in a program are unknown, it is impossible to prove the reliability and validity of a criterion. A criterion is guaranteed to be both reliable and valid if it selects the entire input domain D. However, this is undesirable and impractical. • Neither reliability nor validity is preserved during the debugging process, where faults keep disappearing. • If the program P is correct, then any test will be successful and every selection criterion is reliable and valid. • If P is not correct, there is in general no way of knowing whether a criterion is ideal without knowing the errors in P . 34 CHAPTER 2 THEORY OF PROGRAM TESTING 2.2.2 Theory of Testing Let D be the input domain of a program P. Let C denote a set of test predicates. If d ∈ D satisfies test predicate c ∈ C , then c(d ) is said to be true. Selecting data to satisfy a test predicate means selecting data to exercise the condition combination in the course of executing P. 
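To make the notion of a test predicate concrete, the following small sketch may help; the unit max2 and the predicate names c1 and c2 are hypothetical and chosen only for this illustration, not taken from the theory itself.

int max2(int a, int b)      /* illustrative unit under test             */
{
    if (a > b)              /* test predicate c1: a > b                 */
        return a;
    else                    /* test predicate c2: a <= b                */
        return b;
}

/* Test data selected to satisfy the predicates:
   t1 = (5, 2) satisfies c1, expected outcome 5;
   t2 = (1, 4) satisfies c2, expected outcome 4.
   Every predicate in C = {c1, c2} is satisfied by some test in
   T = {t1, t2}, and every test in T satisfies some predicate in C.     */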
With the above idea in mind, COMPLETE(T , C ), where T ⊆ D, is defined as follows: COMPLETE(T , C) ≡ (∀c ∈ C)(∃t ∈ T )c(t) ∧ (∀t ∈ T )(∃c ∈ C)c(t) The above theory means that, for every test predicate, we select a test such that the test predicate is satisfied. Also, for every test selected, there exists a test predicate which is satisfied by the selected test. The definitions of an ideal test and thoroughness of a test do not reveal any relationship between them. However, we can establish a relationship between the two in the following way. Let B be the set of faults (or bugs) in a program P revealed by an ideal test T I . Let a test engineer identify a set of test predicates C 1 and design a set of test cases T 1, such that COMPLETE(T 1, C 1) is satisfied. Let B 1 represent the set of faults revealed by T 1. There is no guarantee that T 1 reveals all the faults. Later, the test engineer identifies a larger set of test predicates C 2 such that C 2 ⊃ C 1 and designs a new set of test cases T 2 such that T 2 ⊃ T 1 and COMPLETE(T 2, C 2) is satisfied. Let B 2 be the set of faults revealed by T 2. Assuming that the additional test cases selected reveal more faults, we have B 2 ⊃ B 1. If the test engineer repeats this process, he may ultimately identify a set of test predicates C I and design a set of test cases T I such that COMPLETE(T I , C I ) is satisfied and T I reveals the entire set of faults B . In this case, T I is a thorough test satisfying COMPLETE(T I , C I ) and represents an ideal test set. 2.2.3 Program Errors Any approach to testing is based on assumptions about the way program faults occur. Faults are due to two main reasons: • Faults occur due to our inadequate understanding of all conditions with which a program must deal. • Faults occur due to our failure to realize that certain combinations of conditions require special treatments. Goodenough and Gerhart classify program faults as follows: • Logic Fault: This class of faults means a program produces incorrect results independent of resources required. That is, the program fails because of the faults present in the program and not because of a lack of resources. Logic faults can be further split into three categories: Requirements fault: This means our failure to capture the real requirements of the customer. 2.2 THEORY OF GOODENOUGH AND GERHART 35 Design fault: This represents our failure to satisfy an understood requirement. Construction fault: This represents our failure to satisfy a design. Suppose that a design step says “Sort array A.” To sort the array with N elements, one may choose one of several sorting algorithms. Let for (i = 0; i < N; i++) { : } be the desired for loop construct to sort the array. If a programmer writes the for loop in the form for (i = 0; i <= N; i++){ : } then there is a construction error in the implementation. • Performance Fault: This class of faults leads to a failure of the program to produce expected results within specified or desired resource limitations. A thorough test must be able to detect faults arising from any of the above reasons. Test data selection criteria must reflect information derived from each stage of software development. Since each type of fault is manifested as an improper effect produced by an implementation, it is useful to categorize the sources of faults in implementation terms as follows: Missing Control Flow Paths: Intuitively, a control flow path, or simply a path, is a feasible sequence of instructions in a program. 
A path may be missing from a program if we fail to identify a condition and specify a path to handle that condition. An example of a missing path is our failure to test for a zero divisor before executing a division. If we fail to recognize that a divisor can take a zero value, then we will not include a piece of code to handle the special case. Thus, a certain desirable computation will be missing from the program. Inappropriate Path Selection: A program executes an inappropriate path if a condition is expressed incorrectly. In Figure 2.2, we show a desired behavior and an implemented behavior. Both the behaviors are identical except in the condition part of the if statement. The if part of the implemented behavior contains an additional condition B . It is easy to see that Desired behavior if (A) proc1(); else proc2(); Implemented behavior if (A && B) proc1(); else proc2(); Figure 2.2 Example of inappropriate path selection. 36 CHAPTER 2 THEORY OF PROGRAM TESTING both the desired part and the implemented part behave in the same way for all combinations of values of A and B except when A = 1 and B = 0. Inappropriate or Missing Action: There are three instances of this class of fault: • One may calculate a value using a method that does not necessarily give the correct result. For example, a desired expression is x = x × w, whereas it is wrongly written as x = x + w. These two expressions produce identical results for several combinations of x and w, such as x = 1.5 and w = 3, for example. • Failing to assign a value to a variable is an example of a missing action. • Calling a function with the wrong argument list is a kind of inappropriate action. The main danger due to an inappropriate or missing action is that the action is incorrect only under certain combinations of conditions. Therefore, one must do the following to find test data that reliably reveal errors: • Identify all the conditions relevant to the correct operation of a program. • Select test data to exercise all possible combinations of these conditions. The above idea of selecting test data leads us to define the following terms: Test Data: Test data are actual values from the input domain of a program that collectively satisfy some test selection criteria. Test Predicate: A test predicate is a description of conditions and combinations of conditions relevant to correct operation of the program: • Test predicates describe the aspects of a program that are to be tested. Test data cause these aspects to be tested. • Test predicates are the motivating force for test data selection. • Components of test predicates arise first and primarily from the specifications for a program. • Further conditions and predicates may be added as implementations are considered. 2.2.4 Conditions for Reliability A set of test predicates must at least satisfy the following conditions to have any chance of being reliable. These conditions are key to meaningful testing: • Every individual branching condition in a program must be represented by an equivalent condition in C . • Every potential termination condition in the program, for example, an overflow, must be represented by a condition in C . • Every condition relevant to the correct operation of the program that is implied by the specification and knowledge of the data structure of the program must be represented as a condition in C . 
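As a hedged sketch of how these conditions translate into a predicate set, consider the unit below; the function sum_first_n and the predicates c1 to c3 are hypothetical names introduced only for this example.

int sum_first_n(const int a[], int n)   /* illustrative unit              */
{
    int sum = 0;
    for (int i = 0; i < n; i++)         /* branching condition: i < n     */
        sum += a[i];                    /* potential termination condition:
                                           signed integer overflow of sum  */
    return sum;
}

/* A candidate set of test predicates C for this unit:
   c1: n == 0, so the loop body never executes;
   c2: n > 0, so the loop body executes at least once;
   c3: the running sum approaches INT_MAX, representing the potential
       overflow (termination) condition mentioned above.
   Test data are then chosen so that each predicate is satisfied by at
   least one test, e.g., n = 0; n = 3 with small values; and n = 2 with
   a[0] = a[1] = INT_MAX / 2 + 1.                                          */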
2.3 THEORY OF WEYUKER AND OSTRAND 37 2.2.5 Drawbacks of Theory Several difficulties prevent us from applying Goodenough and Gerhart’s theory of an ideal test as follows [18]: • The concepts of reliability and validity have been defined with respect to the entire input domain of a program. A criterion is guaranteed to be both reliable and valid if and only if it selects the entire domain as a single test. Since such exhaustive testing is impractical, one will have much difficulty in assessing the reliability and validity of a criterion. • The concepts of reliability and validity have been defined with respect to a program. A test selection criterion that is reliable and valid for one program may not be so for another program. The goodness of a test set should be independent of individual programs and the faults therein. • Neither validity nor reliability is preserved throughout the debugging process. In practice, as program failures are observed, the program is debugged to locate the faults, and the faults are generally fixed as soon as they are found. During this debugging phase, as the program changes, so does the idealness of a test set. This is because a fault that was revealed before debugging is no more revealed after debugging and fault fixing. Thus, properties of test selection criteria are not even “monotonic” in the sense of being either always gained or preserved or always lost or preserved. 2.3 THEORY OF WEYUKER AND OSTRAND A key problem in the theory of Goodenough and Gerhart is that the reliability and validity of a criterion depend upon the presence of faults in a program and their types. Weyuker and Ostrand [18] provide a modified theory in which the validity and reliability of test selection criteria are dependent only on the program specification, rather than a program. They propose the concept of a uniformly ideal test selection criterion for a given output specification. In the theory of Goodenough and Gerhart, implicit in the definitions of the predicates OK(d ) and SUCCESSFUL(T ) is a program P . By abbreviating SUCCESSFUL() as SUCC(), the two predicates are rewritten as follows: OK(P, d): Define a predicate OK(P, d ) which expresses the acceptability of result P (d ). Thus, OK(P , d ) = true if and only if P (d ) is an acceptable outcome of program P . SUCC(P, T): For a given T ⊆ D, T is a successful test for a program P , denoted by SUCC(P , T ), if and only if, ∀t ∈ T , OK(P , t). Thus, SUCC(T ) = true if and only if, ∀t ∈ T , OK(P , t). With the above definitions of OK(P, d ) and SUCC(P , T ), the concepts of uniformly valid criterion, uniformly reliable criterion, and uniformly ideal test selection are defined as follows. 38 CHAPTER 2 THEORY OF PROGRAM TESTING Uniformly Valid Criterion C : Criterion C is uniformly valid iff (∀P )[(∃d ∈ D)(¬OK(P , d)) ⇒ (∃T ⊆ D)(C(T ) & ¬SUCC(P , T ))] Uniformly Reliable Criterion C : Criterion C is uniformly reliable iff (∀P )(∀T1, ∀T2 ⊆ D)[(C(T1) & C(T2)) ⇒ (SUCC(P , T1) ⇔ SUCC(P , T2))] Uniformly Ideal Test Selection: A uniformly ideal test selection criterion for a given specification is both uniformly valid and uniformly reliable. The external quantifier (∀P) binding the free variable P in the definition of uniformly valid criterion C essentially means that the rest of the predicate holds for all programs P for a given output specification. 
Similarly, the external quantifier (∀P) binding the free variable P in the definition of uniformly reliable criterion C means that the rest of the predicate holds for all programs P for a given output specification. Since a uniformly ideal test selection criterion is defined over all programs for a given specification, it was intended to solve all the program-dependent difficulties in the definitions given by Goodenough and Gerhart. However, the concept of uniformly ideal test selection also has several flaws. For example, for any significant program there can be no uniformly ideal criterion that is not trivial in the sense of selecting the entire input domain D. A criterion C is said to be trivially valid if the union of all tests selected by C is D. Hence, the following theorems. Theorem. A criterion C is uniformly valid if and only if C is trivially valid. Proof. Obviously a trivially valid criterion is valid. Now we need to show that a criterion C which is not trivially valid cannot be uniformly valid for a given output specification. For any element d not included in any test of C , one can write a program which is incorrect for d and correct for D − {d }. Theorem. A criterion C is uniformly reliable if and only if C selects a single test set. Proof. If C selects only one test, it is obviously reliable for any program. Now, assume that C selects different tests T 1 and T 2 and that t ∈ T 1 but t ∈/ T 2. A program P exists which is correct with respect to test inputs in T 2 but incorrect on t. Thus, the two tests yield different results for P , and C is not reliable. Now, we can combine the above two theorems to have the following corollary. Corollary. A criterion C is uniformly valid and uniformly reliable if and only if C selects only the single test set T = D. 2.4 THEORY OF GOURLAY 39 An important implication of the above corollary is that uniform validity and uniform reliability lead to exhaustive testing —and exhaustive testing is considered to be impractical. Next, the above corollary is reformulated to state that irrespective of test selection criterion used and irrespective of tests selected, except the entire D, one can always write a program which can defeat the tests. A program P is said to defeat a test T if P passes T but fails on some other valid input. This is paraphrasing the well-known statement of Dijkstra that testing can only reveal the presence of errors, never their absence [16]. Reliability and validity of test selection criterion are ideal goals, and ideal goals are rarely achieved. It is useful to seek less ideal but usable goals. By settling for less ideal goals, we essentially accept the reality that correctness of large programs is not something that we strive to achieve. Weyuker and Ostrand [18] have introduced the concept of a revealing criterion with respect to a subdomain, where a subdomain S is a subset of the input domain D. A test selection criterion C is revealing for a subdomain S if whenever S contains an input which is processed incorrectly then every test set which satisfies C is unsuccessful. In other words, if any test selected by C is successfully executed, then every test in S produces correct output. A predicate called REVEALING(C , S ) captures the above idea in the following definition: REVEALING(C, S) iff (∃d ∈ S)(¬OK(d)) ⇒ (∀T ⊆ S)(C(T ) ⇒ ¬SUCC(T )) The key advantage in a revealing criterion is that it concerns only a subset of the input domain, rather than the entire input domain. 
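A small, hypothetical example may clarify the idea of a revealing criterion; the function safe_div and the subdomain S0 below are our own illustration and are not part of the original theory.

int safe_div(int x, int y)    /* intended: return x / y, and 0 when y == 0 */
{
    return x / y;             /* the zero check was omitted by mistake,
                                 so every input with y == 0 fails          */
}

/* Partition the input domain D into S0 = {(x, y) : y == 0} and
   S1 = {(x, y) : y != 0}. Every input in S0 is processed incorrectly.
   Hence any criterion C that requires at least one test from S0 is
   revealing for S0: every test set satisfying C contains a failing
   test and is therefore unsuccessful, exposing the local error.           */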
By considering a subset of the input domain, programmers can concentrate on local errors. An important task in applying the idea of a revealing criterion is to partition the input domain into smaller subdomains, which is akin to partitioning a problem into a set of subproblems. However, partitioning a problem into subproblems has been recognized to be a difficult task. 2.4 THEORY OF GOURLAY An ideal goal in software development is to find out whether or not a program is correct, where a correct program is void of faults. Much research results have been reported in the field of program correctness. However, due to the highly constrained nature of program verification techniques, no developer makes any effort to prove the correctness of even small programs of, say, a few thousand lines, let alone large programs with millions of lines of code. Instead, testing is accepted in the industry as a practical way of finding faults in programs. The flip side of testing is that it cannot be used to settle the question of program correctness, which is the ideal goal. Even though testing cannot settle the program correctness issue, there is a need for a testing theory to enable us to compare the power of different test methods. To motivate a theoretical discussion of testing, we begin with an ideal process for software development, which consists of the following steps: 40 CHAPTER 2 THEORY OF PROGRAM TESTING • A customer and a development team specify the needs. • The development team takes the specification and attempts to write a pro- gram to meet the specification. • A test engineer takes both the specification and the program and selects a set of test cases. The test cases are based on the specification and the program. • The program is executed with the selected test data, and the test outcome is compared with the expected outcome. • The program is said to have faults if some tests fail. • One can say the program to be ready for use if it passes all the test cases. We focus on the selection of test cases and the interpretation of their results. We assume that the specification is correct, and the specification is the sole arbiter of the correctness of the program. The program is said to be correct if and only if it satisfies the specification. Gourlay’s testing theory [19] establishes a relationship between three sets of entities, namely, specifications, programs, and tests, and provides a basis for comparing different methods for selecting tests. 2.4.1 Few Definitions The set of all programs are denoted by P , the set of all specifications by S, and the set of all tests by T . Members of P will be denoted by p and q, members of S will be denoted by r and s, and members of T will be denoted by t and u. Uppercase letters will denote subsets of P , S, and T . For examples, p ∈ P ⊆ P and t ∈ T ⊆ T , where t denotes a single test case. The correctness of a program p with respect to a specification s will be denoted by p corr s. Given s, p, and t, the predicate p ok(t) s means that the result of testing p under t is judged successful by specification s. The reader may recall that T denotes a set of test cases, and p ok(T ) s is true if and only if p ok(t) s ∀t ∈ T . We must realize that if a program is correct, then it will never produce any unexpected outcome with respect to the specification. Thus, p corr s ⇒ p ok(t) s ∀t . Definition. A testing system is a collection < P, S, T , corr, ok >, where P , S, and T are arbitrary sets, corr ⊆ P × S, sets, ok ⊆ T × P × S, and ∀p∀s∀t (p corr s ⇒ p ok(t)s). 
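The definition can be instantiated with a small, hypothetical example; the names p_max and ok_max below are chosen only for this sketch.

/* A member p of P: a candidate implementation of the specification
   s = "return the larger of the two integer inputs."                  */
int p_max(int a, int b)
{
    return (a > b) ? a : b;
}

/* The relation ok, specialized to this s: a test t = (a, b) is judged
   successful if the observed result satisfies the specification.      */
int ok_max(int a, int b, int result)
{
    return (result >= a) && (result >= b)
        && (result == a || result == b);
}

/* Here p corr s means ok_max(a, b, p_max(a, b)) holds for every pair
   (a, b); a correct p therefore cannot fail any single test t, which
   is exactly the condition p corr s => p ok(t) s in the definition.   */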
Definition. Given a testing system < P, S, T , corr, ok > a new system < P, S, T , corr, ok > is called a set construction, where T is the set of all subsets of T , and where p ok (T )s ⇔ ∀t (t ∈ T ⇒ p ok(t)s). (The reader may recall that T is a member of T because T ⊆ T .) Theorem. < P, S, T , corr, ok >, a set construction on a testing system < P, S, T , corr, ok >, is itself a testing system. 2.4 THEORY OF GOURLAY 41 Proof. We need to show that p corr s ⇒ p ok (T ) s. Assume that p corr s holds. By assumption, the original system is a testing system. Thus, ∀t, p ok(t) s. If we choose a test set T , we know that, ∀t ∈ T , p ok(t) s. Therefore, p ok (T ) s holds. The set construction is interpreted as follows. A test consists of a number of trials of some sort, and success of the test as a whole depends on success of all the trials. In fact, this is the rule in testing practice, where a test engineer must run a program again and again on a variety of test data. Failure of any one run is enough to invalidate the program. Definition. Given a testing system < P , S, T , corr, ok > a new system < P , S, T , corr, ok > is called a choice construction, where T is the set of subsets of T , and where p ok (T ) s ⇔ ∃t(t ∈ T ∧p ok(t) s). (The reader may recall that T is a member of T because T ⊆ T .) Theorem. < P , S, T , corr, ok > , a choice construction on a testing system < P , S, T , corr, ok > , is itself a testing system. Proof. Similar to the previous theorem, we need to show that p corr s ⇒ p ok (T ) s. Assume that p corr s. Thus, ∀t, p ok(t) s. If we pick a nonempty test set T , we know that ∃t ∈ T such that p ok(t) s. Thus, we can write ∀T (T = φ ⇒ ∃t(t ∈ T ∧p ok(t) s)), and ∀T (T = φ ⇒ p ok (T ) s). The empty test set φ must be excluded from (T ’) because a testing system must include at least one test. The choice construction models the situation in which a test engineer is given a number of alternative ways of testing the program, all of which are assumed to be equivalent. Definition. A test method is a function M :P × S → T . That is, in the general case, a test method takes the specification S and an implementation program P and produces test cases. In practice, test methods are predominantly program dependent, specification dependent, or totally dependent on the expectations of customers, as explained below: • Program Dependent: In this case, T = M (P ), that is, test cases are derived solely based on the source code of a system. This is called white-box testing. Here, a test method has complete knowledge of the internal details of a program. However, from the viewpoint of practical testing, a white-box method is not generally applied to an entire program. One applies such a method to small units of a given large system. A unit refers to a function, procedure, method, and so on. A white-box method allows a test engineer to use the details of a program unit. Effective use of a program unit requires a thorough understanding of the unit. Therefore, white-box test methods are used by programmers to test their own code. 42 CHAPTER 2 THEORY OF PROGRAM TESTING • Specification Dependent: In this case, T = M (S ), that is, test cases are derived solely based on the specification of a system. This is called black-box testing. Here, a test method does not have access to the internal details of a program. Such a method uses information provided in the specification of a system. 
It is not unusual to use an entire specification in the generation of test cases because specifications are much smaller in size than their corresponding implementations. Black-box methods are generally used by the development team and an independent system test group. • Expectation Dependent: In practice, customers may generate test cases based on their expectations from the product at the time of taking delivery of the system. These test cases may include continuous-operation tests, usability tests, and so on. 2.4.2 Power of Test Methods A tester is concerned with the methods to produce test cases and with comparing test methods so that an appropriate test method can be identified. Let M and N be two test methods. For M to be at least as good as N, we must have the situation that whenever N finds an error, so does M. In other words, whenever a program fails under a test case produced by method N, it will also fail under a test case produced by method M, with respect to the same specification. Therefore, FN ⊆ FM, where FN and FM are the sets of faults discovered by test sets produced by methods N and M, respectively. Let TM and TN be the sets of test cases produced by methods M and N, respectively. There are then two cases to consider when comparing their fault detection power. Case 1: TM ⊇ TN. In this case, it is clear that method M is at least as good as method N. This is because method M produces test cases which reveal all the faults revealed by test cases produced by method N. This case is depicted in Figure 2.3a. Case 2: TM and TN overlap, but TM ⊉ TN. This case suggests that TM does not totally contain TN. To be able to compare their fault detection ability, we execute the program P under both sets of test cases, namely TM and TN. Let FM and FN be the sets of faults detected by test sets TM and TN, respectively. If FM ⊇ FN, then we say that method M is at least as good as method N. This situation is explained in Figure 2.3b. Figure 2.3 Different ways of comparing the power of test methods: (a) one method produces all test cases produced by another method; (b) the test sets have common elements. 2.5 ADEQUACY OF TESTING Testing gives designers and programmers much confidence in a software component or a complete product if it passes their test cases. Assume that a set of test cases T has been designed to test a program P. We execute P with the test set T. If T reveals faults in P, then we modify the program in an attempt to fix those faults. At this stage, there may be a need to design some new test cases, because, for example, we may include a new procedure in the code. After modifying the code, we execute the program with the new test set. Thus, we execute the test-and-fix loop until no more faults are revealed by the updated test set. Now we face a dilemma: Is P really fault free, or is T not good enough to reveal the remaining faults in P? From testing we cannot conclude that P is fault free, since, as Dijkstra observed, testing can reveal the presence of faults, but not their absence. Therefore, if P passes T, we need to know that T is "good enough" or, in other words, that T is an adequate set of tests. It is important to evaluate the adequacy of T because if T is found to be not adequate, then more test cases need to be designed, as illustrated in Figure 2.4. Adequacy of T means whether or not T thoroughly tests P.
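A small, hypothetical sketch shows how a program can pass a test set that is nevertheless not adequate; the unit abs_val and the fault in it are invented purely for illustration.

int abs_val(int x)      /* intended: return the absolute value of x    */
{
    if (x >= 0)
        return x;
    else
        return x;       /* fault: should be -x                          */
}

/* The test set T = {5, 0} exercises only the first branch; the unit
   passes both tests, yet T is not adequate because the computation
   performed on negative inputs is never exercised. Augmenting T with
   x = -3 executes the second branch and reveals the fault.             */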
Ideally, testing should be performed with an adequate test set T. Intuitively, the idea behind specifying a criterion for evaluating test adequacy is to know whether or not sufficient testing has been done. We will soon return to the idea of test adequacy. In the absence of test adequacy, developers will be forced to use ad hoc measures to decide when to stop testing. Some examples of ad hoc measures for stopping testing are as follows [13]: • Stop when the allocated time for testing expires. • Stop when it is time to release the product. • Stop when all the test cases execute without revealing faults. Figure 2.4 depicts two important notions concerning test design and evaluating test adequacy as follows: Figure 2.4 Context of applying test adequacy. (The figure shows the test-and-fix loop: design a set of test cases T for a program P and execute P with T; while T reveals faults, fix them and, if needed, augment T with new test cases; once T reveals no more faults, check whether T is an adequate test set, augmenting T with new test cases until it is, and then stop.) • Adequacy of a test set T is evaluated after it is found that T reveals no more faults. One may argue: Why not design test cases to meet an adequacy criterion? However, it is important to design test cases independent of an adequacy criterion because the primary goal of testing is to locate errors, and, thus, test design should not be constrained by an adequacy criterion. An example of a test design criterion is as follows: Select test cases to execute all statements in a program at least once. However, the difficulty with such a test design criterion is that we may not be able to know whether every program statement can be executed. Thus, it is difficult to judge the adequacy of the test set selected thereby. Finally, since the goal of testing is to reveal faults, there is no point in evaluating the adequacy of the test set as long as faults are being revealed. • An adequate test set T does not say anything about the correctness of a program. A common understanding of correctness is that we have found and fixed all faults in a program to make it "correct." However, in practice, it is not realistic, though very much desirable, to find and fix all faults in a program. Thus, on the one hand, an adequacy criterion may not aim for program correctness. On the other hand, a fault-free program should not turn any arbitrary test set T into an adequate test. The above two points convey an important notion: the adequacy of a test set should be evaluated independently of the test design process used for the program under test. Intuitively, a test set T is said to be adequate if it covers all aspects of the actual computation performed by a program and all computations intended by its specification. Two practical methods for evaluating test adequacy are as follows: • Fault Seeding: This method refers to implanting a certain number of faults in a program P and executing P with test set T. If T reveals k percent of the implanted faults, we assume that T has revealed only k percent of the original faults. If 100% of the implanted faults have been revealed by T, we feel more confident about the adequacy of T. A thorough discussion of fault seeding can be found in Chapter 13. • Program Mutation: Given a program P, a mutation is a program obtained by making a small change to P. In the program mutation method, a series of mutations are obtained from P. Some of the mutations may contain faults and the rest are equivalent to P.
A test set T is said to be adequate if it causes every faulty mutation to produce an unexpected outcome. A more thorough discussion of program mutation can be found in Chapter 3. 2.6 LIMITATIONS OF TESTING Ideally, all programs should be correct, that is, there is no fault in a program. Due to the impractical nature of proving even small programs to be correct, customers and software developers rely on the efficacy of testing. In this section, we introduce two main limitations of testing: • Testing means executing a program with a generally small, proper subset of the input domain of the program. A small, proper subset of the input domain is chosen because cost may not allow a much larger subset to be chosen, let alone the full input set. Testing with the full input set is known as exhaustive testing. Thus, the inherent need to test a program with a small subset of the input domain poses a fundamental limit on the efficacy of testing. The limit is in the form of our inability to extrapolate the correctness of results for a proper subset of the input domain to program correctness. In other words, even if a program passes a test set T , we cannot conclude that the program is correct. • Once we have selected a subset of the input domain, we are faced with the problem of verifying the correctness of the program outputs for individual test input. That is, a program output is examined to determine if the program performed correctly on the test input. The mechanism which verifies the correctness of a program output is known as an oracle. The concept of an oracle is discussed in detail in Chapter 9. Determining the correctness 46 CHAPTER 2 THEORY OF PROGRAM TESTING of a program output is not a trivial task. If either of the following two conditions hold, a program is considered nontestable [20]: There does not exist an oracle. It is too difficult to determine the correct output. If there is no mechanism to verify the correctness of a program output or it takes an extraordinary amount of time to verify an output, there is not much to be gained by running the test. 2.7 SUMMARY The ideal, abstract goal of testing is to reveal all faults in a software system without exhaustively testing the software. This idea is the basis of the concept of an ideal test developed by Goodenough and Gerhart [17]. An ideal test is supposed to be a small, proper subset of the entire input domain, and we should be able to extrapolate the results of an ideal test to program correctness. In other words, in an abstract sense, if a program passes all the tests in a carefully chosen test set, called an ideal test, we are in a position to claim that the program is correct. Coupled with the concept of an ideal test is a test selection criterion which allows us to pick members of an ideal test. A test selection criterion is characterized in terms of reliability and validity. A reliable criterion is one which selects test cases such that a program either passes all tests or fails all tests. On the other hand, a valid criterion is one which selects at least one test set which fails in case the program contains a fault. If a criterion is both valid and reliable, then any test selected by the criterion is an ideal test. The theory has a few drawback. First, the concepts of reliability and validity have been defined with respect to one program and its entire input domain. Second, neither reliability nor validity is preserved throughout the debugging phase of software development. 
Faults occur due to our inadequate understanding of all conditions that a program must deal with and our failure to realize that certain combinations of conditions require special treatments. Goodenough and Gerhart categorize faults into five categories: logic faults, requirement faults, design faults, construction faults, and performance faults. Weyuker and Ostrand [18] tried to eliminate the drawbacks of the theory of Goodenough and Gerhart by proposing the concept of a uniformly ideal test. The concept is defined with respect to all programs designed to satisfy a specification, rather than just one program—hence the concept of “uniformity” over all program instances for a given specification. Further, the idea of uniformity was extended to test selection criteria in the form of a uniformly reliable and uniformly valid criterion. However, their theory too is impractical because a uniformly valid and uniformly reliable criterion selects the entire input domain of a program, thereby causing exhaustive testing. Next, the idea of an ideal test was extended to a proper subset of the input domain called a subdomain, and the concept of a revealing criterion was defined. LITERATURE REVIEW 47 Though testing cannot settle the question of program correctness, different testing methods continue to be developed. For example, there are specificationbased testing methods and code-based testing methods. It is important to develop a theory to compare the power of different testing methods. Gourlay [19] put forward a theory to compare the power of testing methods based on their fault detection abilities. A software system undergoes multiple test–fix–retest cycles until, ideally, no more faults are revealed. Faults are fixed by modifying the code or adding new code to the system. At this stage there may be a need to design new test cases. When no more faults are revealed, we can conclude this way: either there is no fault in the program or the tests could not reveal the faults. Since we have no way to know the exact situation, it is useful to evaluate the adequacy of the test set. There is no need to evaluate the adequacy of tests so long as they reveal faults. Two practical ways of evaluating test adequacy are fault seeding and program mutation. Finally, we discussed two limitations of testing. The first limitation of testing is that it cannot settle the question of program correctness. In other words, by testing a program with a proper subset of the input domain and observing no fault, we cannot conclude that there are no remaining faults in the program. The second limitation of testing is that in several instances we do not know the expected output of a program. If for some inputs the expected output of a program is not known or it cannot be determined within a reasonable amount of time, then the program is called nontestable [20]. LITERATURE REVIEW Weyuker and Ostrand [18] have shown by examples how to construct revealing subdomains from source code. Their main example is the well-known triangle classification problem. The triangle classification problem is as follows. Let us consider three positive integers A, B , and C . The problem is to find whether the given integers represent the sides of an equilateral triangle, the sides of a scalene right triangle, and so on. Weyuker [13] has introduced the notion of program inference to capture the notion of test data adequacy. Essentially, program inference refers to deriving a program from its specification and a sample of its input–output behavior. 
On the other hand, the testing process begins with a specification S and a program P and selects input–output pairs that characterize every aspect of the actual computations performed by the program and the intended computations performed by the specification. Thus, program testing and program inference are thought of as inverse processes. A test set T is said to be adequate if T contains sufficient data to infer the computations defined by both S and P. However, Weyuker [13] explains that such an adequacy criterion is not pragmatically usable. Rather, the criterion can at best be used as a guide. By considering the difficulty in using the criterion, Weyuker defines two weaker adequacy criterion, namely program adequate and specification adequate. A test set T is said to be program adequate if it contains sufficient data to infer the computations defined by P . Similarly, the test set T is 48 CHAPTER 2 THEORY OF PROGRAM TESTING said to be specification adequate if it contains sufficient data to infer the computations defined by S . It is suggested that depending upon how test data are selected, one of the two criteria can be eased out. For example, if T is derived from S , then it is useful to evaluate if T is program adequate. Since T is selected from S , T is expected to contain sufficient data to infer the computations defined by S , and there is no need to evaluate T ’s specification adequacy. Similarly, if T is derived from P, it is useful to evaluate if T is specification adequate. The students are encouraged to read the article by Stuart H. Zweben and John S. Gourlay entitled “On the Adequacy of Weyuker’s Test Data Adequacy Axioms” [15] The authors raise the issue of what makes an axiomatic system as well as what constitutes a proper axiom. Weyuker responds to the criticism at the end of the article. Those students have never seen such a professional interchange; this is worth reading for this aspect alone. This article must be read along with the article by Elaine Weyuker entitled “Axiomatizing Software Test Data Adequacy” [12]. Martin David and Elaine Weyuker [9] present an interesting notion of distance between programs to study the concept of test data adequacy. Specifically, they equate adequacy with the capability of a test set to be able to successfully distinguish a program being tested from all programs that are sufficiently close to it and differ in input–output behavior from the given program. Weyuker [12, 21] proposed a set of properties to evaluate test data adequacy criteria. Some examples of adequacy criteria are to (i) ensure coverage of all branches in the program being tested and (ii) ensure that boundary values of all input data have been selected for the program under test. Parrish and Zweben [11] formalized those properties and identified dependencies within the set. They formalized the adequacy properties with respect to criteria that do not make use of the specification of the program under test. Frankl and Weyuker [10] compared the relative fault-detecting ability of a number of structural testing techniques, namely, data flow testing, mutation testing, and a condition coverage technique, to branch testing. They showed that the former three techniques are better than branch testing according to two probabilistic measures. A good survey on test adequacy is presented in an article by Hong Zhu, Patrick A. V. Hall, and John H. R. May entitled “Software Unit Test Coverage and Adequacy” [14]. 
In this article, various types of software test adequacy criteria proposed in the literature are surveyed followed by a summary of methods for comparison and assessment of adequacy criteria. REFERENCES 1. R. Gupta, M. J. Harrold, and M. L. Soffa. An Approach to Regression Testing Using Slicing. Paper presented at the IEEE-CS International Conference on Software Maintenance, Orlando, FL, November 1992, pp. 299–308. 2. G. Rothermel and M. Harrold. Analyzing Regression Test Selection Techniques. IEEE Transactions on Software Engineering, August 1996, pp. 529–551. REFERENCES 49 3. V. R. Basili and R. W. Selby. Comparing the Effectiveness of Software Testing. IEEE Transactions on Software Engineering, December 1987, pp. 1278– 1296. 4. W. E. Howden. Weak Mutation Testing and Completeness of Test Sets. IEEE Transactions on Software Engineering, July 1982, pp. 371– 379. 5. D. S. Rosenblum and E. J. Weyuker. Using Coverage Information to Predict the Cost-effectiveness of Regression Testing Strategies. IEEE Transactions on Software Engineering, March 1997, pp. 146– 156. 6. L. Baresi and M. Young. Test Oracles, Technical Report CIS-TR-01–02. University of Oregon, Department of Computer and Information Science, Eugene, OR, August 2002, pp. 1–55. 7. Q. Xie and A. M. Memon. Designing and Comparing Automated Test Oracles for Gui Based Software Applications. ACM Transactions on Software Engineering amd Methodology, February 2007, pp. 1–36. 8. G. Rothermel, R. Untch, C. Chu, and M. Harrold. Prioritizing Test Cases for Regression Testing. IEEE Transactions on Software Engineering, October 2001, pp. 929– 948. 9. M. Davis and E. J. Weyuker. Metric Space-Based Test-Data Adequacy Criteria. Computer Journal , January 1988, pp. 17–24. 10. P. G. Frankl and E. J. Weyuker. Provable Improvements on Branch Testing. IEEE Transactions on Software Engineering, October 1993, pp. 962–975. 11. A. Parrish and S. H. Zweben. Analysis and Refinement of Software Test Data Adequacy Properties. IEEE Transactions on Software Engineering, June 1991, pp. 565–581. 12. E. J. Weyuker. Axiomatizing Software Test Data Adequacy. IEEE Transactions on Software Engineering, December 1986, pp. 1128– 1138. 13. E. J. Weyuker. Assessing Test Data Adequacy through Program Inference. ACM Transactions on Programming Languages and Systems, October 1983, pp. 641–655. 14. H. Zhu, P. A. V. Hall, and J. H. R. May. Software Unit Test Coverage and Adequacy. ACM Computing Surveys, December 1997, pp. 366– 427. 15. S. H. Zweben and J. S. Gourlay. On the Adequacy of Weyuker’s Test Data Adequacy Axioms. IEEE Transactions on Software Engineering, April 1989, pp. 496–500. 16. E. W. Dijkstra. Notes on Structured Programming. In Structured Programming, O.-J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, Eds. Academic, New York, 1972, pp. 1–81. 17. J. B. Goodenough and S. L. Gerhart. Toward a Theory of Test Data Selection. IEEE Transactions on Software Engineering, June 1975, pp. 26–37. 18. E. J. Weyuker and T. J. Ostrand. Theories of Program Testing and the Application of Revealing Subdomains. IEEE Transactions on Software Engineering, May 1980, pp. 236– 246. 19. J. S. Gourlay. A Mathematical Framework for the Investigation of Testing. IEEE Transactions on Software Engineering, November 1983, pp. 686–709. 20. E. J. Weyuker. On Testing Non-Testable Programs. Computer Journal , Vol. 25, No. 4, 1982, pp. 465– 470. 21. E. J. Weyuker. The Evaluation of Program-Based Software Test Data Adequacy Criteria. Communications of the ACM , June 1988, pp. 668– 675. Exercises 1. 
Explain the concept of an ideal test. 2. Explain the concept of a selection criterion in test design. 3. Explain the concepts of a valid and reliable criterion. 4. Explain five kinds of program faults. 5. What are the drawbacks of Goodenough and Gerhart’s theory of program testing? 50 CHAPTER 2 THEORY OF PROGRAM TESTING 6. Explain the concepts of a uniformly ideal test as well as the concepts of uniformly valid and uniformly reliable criteria. 7. Explain how two test methods can be compared. 8. Explain the need for evaluating test adequacy. 9. Explain two practical methods for assessing test data adequacy. 10. Explain the concept of a nontestable program. 3 CHAPTER Unit Testing Knowledge is of no value unless you put it into practice. — Anton Chekhov 3.1 CONCEPT OF UNIT TESTING In this chapter we consider the first level of testing, that is, unit testing. Unit testing refers to testing program units in isolation. However, there is no consensus on the definition of a unit. Some examples of commonly understood units are functions, procedures, or methods. Even a class in an object-oriented programming language can be considered as a program unit. Syntactically, a program unit is a piece of code, such as a function or method of class, that is invoked from outside the unit and that can invoke other program units. Moreover, a program unit is assumed to implement a well-defined function providing a certain level of abstraction to the implementation of higher level functions. The function performed by a program unit may not have a direct association with a system-level function. Thus, a program unit may be viewed as a piece of code implementing a “low”-level function. In this chapter, we use the terms unit and module interchangeably. Now, given that a program unit implements a function, it is only natural to test the unit before it is integrated with other units. Thus, a program unit is tested in isolation, that is, in a stand-alone manner. There are two reasons for testing a unit in a stand-alone manner. First, errors found during testing can be attributed to a specific unit so that it can be easily fixed. Moreover, unit testing removes dependencies on other program units. Second, during unit testing it is desirable to verify that each distinct execution of a program unit produces the expected result. In terms of code details, a distinct execution refers to a distinct path in the unit. Ideally, all possible—or as much as possible—distinct executions are to be considered during unit testing. This requires careful selection of input data for each distinct execution. A programmer has direct access to the input vector of the unit by executing a program unit in isolation. This direct access makes it easier to execute as many distinct paths as desirable or possible. If multiple units are put together for Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy Copyright © 2008 John Wiley & Sons, Inc. 51 52 CHAPTER 3 UNIT TESTING testing, then a programmer needs to generate test input with indirect relationship with the input vectors of several units under test. The said indirect relationship makes it difficult to control the execution of distinct paths in a chosen unit. Unit testing has a limited scope. A programmer will need to verify whether or not a code works correctly by performing unit-level testing. Intuitively, a programmer needs to test a unit as follows: • Execute every line of code. 
• Execute every line of code. This is desirable because the programmer needs to know what happens when a line of code is executed. In the absence of such basic observations, surprises at a later stage can be expensive.
• Execute every predicate in the unit so that it evaluates to true and to false separately.
• Observe that the unit performs its intended function and ensure that it contains no known errors.

In spite of the above tests, there is no guarantee that a satisfactorily tested unit is functionally correct from a systemwide perspective; not everything pertinent to a unit can be tested in isolation. This means that some errors in a program unit can only be found later, when the unit is integrated with other units in the integration testing and system testing phases. Even though it is not possible to find all errors in a program unit in isolation, it is still necessary to ensure that a unit performs satisfactorily before it is used by other program units. It serves no purpose to integrate an erroneous unit with other units for the following reasons: (i) many of the subsequent tests will be a waste of resources and (ii) finding the root causes of failures in an integrated system is more resource consuming.

Unit testing is performed by the programmer who writes the program unit, because the programmer is intimately familiar with the internal details of the unit. The objective for the programmer is to be satisfied that the unit works as expected. Since a programmer is expected to construct a unit with no errors in it, a unit test is performed initially to the programmer's own satisfaction and later to the satisfaction of other programmers, when the unit is integrated with other units. This means that all programmers are accountable for the quality of their own work, which may include both new code and modifications to existing code. The idea here is to push the quality concept down to the lowest level of the organization and empower each programmer to be responsible for his or her own quality. Therefore, it is in the best interest of the programmer to take preventive actions to minimize the number of defects in the code. The defects found during unit testing are internal to the software development group and are not reported up the personnel hierarchy to be counted in quality measurement metrics. The source code of a unit is not used for interfacing by other group members until the programmer completes unit testing and checks the unit in to the version control system.

Unit testing is conducted in two complementary phases:
• Static unit testing
• Dynamic unit testing

In static unit testing, a programmer does not execute the unit; instead, the code is examined over all possible behaviors that might arise during run time. Static unit testing is also known as non-execution-based unit testing, whereas dynamic unit testing is execution based. In static unit testing, the code of each unit is validated against the requirements of the unit by reviewing the code. During the review process, potential issues are identified and resolved. For example, in the C programming language the two program-halting calls are abort() and exit(). While the two are closely related, they have different effects, as explained below:
• abort(): This means abnormal program termination. By default, a call to abort() results in a run-time diagnostic and program self-destruction. The program destruction may or may not flush and close open files or remove temporary files, depending on the implementation.
• exit(): This means graceful program termination. That is, the exit() call closes any open files and returns a status code to the execution environment.

Whether to use abort() or exit() depends on the context; such an issue can be easily detected and resolved during static unit testing (a short sketch contrasting the two calls appears below). The more issues are caught early, the fewer errors remain to be identified in the dynamic test phase and the fewer defects end up in shipped products. Moreover, performing static tests is less expensive than performing dynamic tests. Code review is one component of the defect minimization process and can help detect problems that are common in software development.

After a round of code review, dynamic unit testing is conducted. In dynamic unit testing, a program unit is actually executed and its outcomes are observed; in other words, dynamic unit testing means testing the code by actually running it. It may be noted that static unit testing is not an alternative to dynamic unit testing; a programmer performs both kinds of tests. In practice, partial dynamic unit testing is performed concurrently with static unit testing. If the entire dynamic unit testing has already been performed and static unit testing then identifies significant problems, the dynamic unit testing must be repeated. As a result of this repetition, the development schedule may be affected. To minimize the probability of such an event, static unit testing should be performed prior to the final dynamic unit testing.

3.2 STATIC UNIT TESTING

Static unit testing is conducted as a part of a larger philosophical belief that a software product should undergo a phase of inspection and correction at each milestone in its life cycle. At a certain milestone, the product need not be in its final form. For example, completion of coding is a milestone, even though the coded units by themselves may not yet make up the desired product. After coding, the next milestone is testing all, or a substantial number of, the units forming the major components of the product. Thus, before units are individually tested by actually executing them, they are subjected to the usual review and correction. The idea behind review is to find defects as close to their points of origin as possible, so that those defects are eliminated with less effort and the interim product contains fewer defects before the next task is undertaken.

In static unit testing, code is reviewed by applying techniques commonly known as inspection and walkthrough. The original definition of inspection was coined by Michael Fagan [1] and that of walkthrough by Edward Yourdon [2]:

• Inspection: A step-by-step peer group review of a work product, with each step checked against predetermined criteria.
• Walkthrough: A review in which the author leads the team through a manual or simulated execution of the product using predefined scenarios.

Regardless of whether a review is called an inspection or a walkthrough, it is a systematic approach to examining source code in detail. The goal of such an exercise is to assess the quality of the software in question, not the quality of the process used to develop the product [3]. Reviews of this type are characterized by significant preparation by groups of designers and programmers with varying degrees of interest in the software development project. Code examination can be time-consuming.
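As a small example of the kind of issue a reviewer looks for, consider the abort()/exit() choice discussed in Section 3.1. The following minimal C sketch contrasts the two calls; the temporary file name and the invariant check are hypothetical placeholders introduced only for illustration, not taken from the text:

    /* Sketch contrasting graceful and abnormal termination in C.
       The file name "results.tmp" and the invariant check are hypothetical. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int invariant_violated = 0;            /* placeholder for a real consistency check */
        FILE *fp = fopen("results.tmp", "w");  /* hypothetical temporary file */

        if (fp == NULL) {
            /* exit(): graceful termination. Open stdio streams are flushed and
               closed, and a status code is returned to the execution environment. */
            fprintf(stderr, "cannot create results.tmp\n");
            exit(EXIT_FAILURE);
        }

        fprintf(fp, "partial result\n");

        if (invariant_violated) {
            /* abort(): abnormal termination. The process self-destructs (typically
               by raising SIGABRT); whether the buffered line above ever reaches
               results.tmp is implementation defined. */
            abort();
        }

        fclose(fp);
        return 0;                              /* from main(), equivalent to exit(0) */
    }

During a review, the choice between the two calls on each termination path is exactly the kind of context-dependent issue that is cheaper to settle at this stage than to diagnose later during dynamic unit testing.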
Moreover, no examination process is perfect. Examiners may take shortcuts, may not have an adequate understanding of the product, and may accept a product that should not be accepted. Nonetheless, a well-designed code review process can find faults that may be missed by execution-based testing. The key to the success of code review is to divide and conquer, that is, to have an examiner inspect small parts of the unit in isolation, while making sure of the following: (i) nothing is overlooked and (ii) the correctness of all examined parts of the module implies the correctness of the whole module. The decomposition of the review into discrete steps must ensure that each step is simple enough that it can be carried out without detailed knowledge of the others.

The objective of code review is to review the code, not to evaluate the author of the code. A clash may occur between the author of the code and the reviewers, and this may make the meetings unproductive. Therefore, code review must be planned and managed in a professional manner. There is a need for mutual respect, openness, trust, and sharing of expertise in the group.

The general guidelines for performing code review consist of six steps, as outlined in Figure 3.1: readiness, preparation, examination, rework, validation, and exit. The input to the readiness step is the set of criteria that must be satisfied before the start of the code review process, and the process produces two types of documents, a change request (CR) and a report. These steps and documents are explained in the following.

Figure 3.1 Steps in the code review process (readiness, preparation, examination, rework, validation, and exit; inputs: readiness criteria; outputs: change requests and a report).

Step 1: Readiness
The author of the unit ensures that the unit under test is ready for review. A unit is said to be ready if it satisfies the following criteria:

• Completeness: All the code relating to the unit to be reviewed must be available. This is because the reviewers are going to read the code and try to understand it. It is unproductive to review partially written code or code that is going to be significantly modified by the programmer.
• Minimal Functionality: The code must compile and link. Moreover, the code must have been tested to some extent to make sure that it performs its basic functionalities.
• Readability: Since code review involves actual reading of code by other programmers, it is essential that the code be highly readable. Some code characteristics that enhance readability are proper formatting, meaningful identifier names, straightforward use of programming language constructs, and an appropriate level of abstraction through function calls. In the absence of readability, the reviewers are likely to be discouraged from performing the task effectively.
• Complexity: There is no need to schedule a group meeting to review straightforward code that can be easily reviewed by the programmer alone. The code to be reviewed must be of sufficient complexity to warrant group review. Here, complexity is a composite term referring to the number of conditional statements in the code, the number of input data elements of the unit, the number of output data elements produced by the unit, the real-time processing performed by the code, and the number of other units with which the code communicates.
• Requirements and Design Documents: The latest approved version of the low-level design specification or other appropriate descriptions of the program requirements (see Table 3.1) should be available. These documents help the reviewers in verifying whether or not the code under review implements the expected functionalities. If the low-level design document is available, it helps the reviewers in assessing whether or not the code appropriately implements the design.

TABLE 3.1 Hierarchy of System Documents
Requirement: High-level marketing or product proposal.
Functional specification: Software engineering response to the marketing proposal.
High-level design: Overall system architecture.
Low-level design: Detailed specification of the modules within the architecture.
Programming: Coding of the modules.

All the people involved in the review process are informed of the group review meeting schedule two or three days before the meeting. They are also given a copy of the work package for their perusal. Reviews are conducted in bursts of 1–2 hours; longer meetings are progressively less productive because of the limited attention span of human beings. The rate of code review is restricted to about 125 lines of code (in a high-level language) per hour. Reviewing complex code at a higher rate results in just glossing over the code, thereby defeating the fundamental purpose of code review.

The review group comprises a number of people with different roles. These roles are explained as follows:

• Moderator: A review meeting is chaired by the moderator. The moderator is a trained individual who guides the pace of the review process, selects the reviewers, and schedules the review meetings. Myers suggests that the moderator be a member of a group from an unrelated project to preserve objectivity [4].
• Author: This is the person who has written the code to be reviewed.