SOFTWARE TESTING AND QUALITY ASSURANCE
Theory and Practice

KSHIRASAGAR NAIK
Department of Electrical and Computer Engineering
University of Waterloo, Waterloo

PRIYADARSHI TRIPATHY
NEC Laboratories America, Inc.

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials.
The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993, or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Naik, Kshirasagar, 1959–
Software testing and quality assurance / Kshirasagar Naik and Priyadarshi Tripathy.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-471-78911-6 (cloth)
1. Computer software—Testing. 2. Computer software—Quality control. I. Tripathy, Piyu, 1958– II. Title.
QA76.76.T48N35 2008
005.14—dc22
2008008331

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

To our parents
Sukru and Teva Naik
Kunjabihari and Surekha Tripathy

CONTENTS

Preface
List of Figures
List of Tables

CHAPTER 1  BASIC CONCEPTS AND PRELIMINARIES
1.1 Quality Revolution
1.2 Software Quality
1.3 Role of Testing
1.4 Verification and Validation
1.5 Failure, Error, Fault, and Defect
1.6 Notion of Software Reliability
1.7 Objectives of Testing
1.8 What Is a Test Case?
1.9 Expected Outcome
1.10 Concept of Complete Testing
1.11 Central Issue in Testing
1.12 Testing Activities
1.13 Test Levels
1.14 Sources of Information for Test Case Selection
1.15 White-Box and Black-Box Testing
1.16 Test Planning and Design
1.17 Monitoring and Measuring Test Execution
1.18 Test Tools and Automation
1.19 Test Team Organization and Management
1.20 Outline of Book
References
Exercises

CHAPTER 2  THEORY OF PROGRAM TESTING
2.1 Basic Concepts in Testing Theory
2.2 Theory of Goodenough and Gerhart
2.2.1 Fundamental Concepts
2.2.2 Theory of Testing
2.2.3 Program Errors
2.2.4 Conditions for Reliability
2.2.5 Drawbacks of Theory
2.3 Theory of Weyuker and Ostrand
2.4 Theory of Gourlay
2.4.1 Few Definitions
2.4.2 Power of Test Methods
2.5 Adequacy of Testing
2.6 Limitations of Testing
2.7 Summary
Literature Review
References
Exercises

CHAPTER 3  UNIT TESTING
3.1 Concept of Unit Testing
3.2 Static Unit Testing
3.3 Defect Prevention
3.4 Dynamic Unit Testing
3.5 Mutation Testing
3.6 Debugging
3.7 Unit Testing in eXtreme Programming
3.8 JUnit: Framework for Unit Testing
3.9 Tools for Unit Testing
3.10 Summary
Literature Review
References
Exercises

CHAPTER 4  CONTROL FLOW TESTING
4.1 Basic Idea
4.2 Outline of Control Flow Testing
4.3 Control Flow Graph
4.4 Paths in a Control Flow Graph
4.5 Path Selection Criteria
4.5.1 All-Path Coverage Criterion
4.5.2 Statement Coverage Criterion
4.5.3 Branch Coverage Criterion
4.5.4 Predicate Coverage Criterion
4.6 Generating Test Input
4.7 Examples of Test Data Selection
4.8 Containing Infeasible Paths
4.9 Summary
Literature Review
References
Exercises

CHAPTER 5  DATA FLOW TESTING
5.1 General Idea
5.2 Data Flow Anomaly
5.3 Overview of Dynamic Data Flow Testing
5.4 Data Flow Graph
5.5 Data Flow Terms
5.6 Data Flow Testing Criteria
5.7 Comparison of Data Flow Test Selection Criteria
5.8 Feasible Paths and Test Selection Criteria
5.9 Comparison of Testing Techniques
5.10 Summary
Literature Review
References
Exercises

CHAPTER 6  DOMAIN TESTING
6.1 Domain Error
6.2 Testing for Domain Errors
6.3 Sources of Domains
6.4 Types of Domain Errors
6.5 ON and OFF Points
6.6 Test Selection Criterion
6.7 Summary
Literature Review
References
Exercises

CHAPTER 7  SYSTEM INTEGRATION TESTING
7.1 Concept of Integration Testing
7.2 Different Types of Interfaces and Interface Errors
7.3 Granularity of System Integration Testing
7.4 System Integration Techniques
7.4.1 Incremental
7.4.2 Top Down
7.4.3 Bottom Up
7.4.4 Sandwich and Big Bang
7.5 Software and Hardware Integration
7.5.1 Hardware Design Verification Tests
7.5.2 Hardware and Software Compatibility Matrix
7.6 Test Plan for System Integration
7.7 Off-the-Shelf Component Integration
7.7.1 Off-the-Shelf Component Testing
7.7.2 Built-in Testing
7.8 Summary
Literature Review
References
Exercises

CHAPTER 8  SYSTEM TEST CATEGORIES
8.1 Taxonomy of System Tests
8.2 Basic Tests
8.2.1 Boot Tests
8.2.2 Upgrade/Downgrade Tests
8.2.3 Light Emitting Diode Tests
8.2.4 Diagnostic Tests
8.2.5 Command Line Interface Tests
8.3 Functionality Tests
8.3.1 Communication Systems Tests
8.3.2 Module Tests
8.3.3 Logging and Tracing Tests
8.3.4 Element Management Systems Tests
8.3.5 Management Information Base Tests
8.3.6 Graphical User Interface Tests
8.3.7 Security Tests
8.3.8 Feature Tests
8.4 Robustness Tests
8.4.1 Boundary Value Tests
8.4.2 Power Cycling Tests
8.4.3 On-Line Insertion and Removal Tests
8.4.4 High-Availability Tests
8.4.5 Degraded Node Tests
8.5 Interoperability Tests
8.6 Performance Tests
8.7 Scalability Tests
8.8 Stress Tests
8.9 Load and Stability Tests
8.10 Reliability Tests
8.11 Regression Tests
8.12 Documentation Tests
8.13 Regulatory Tests
8.14 Summary
Literature Review
References
Exercises

CHAPTER 9  FUNCTIONAL TESTING
9.1 Functional Testing Concepts of Howden
9.1.1 Different Types of Variables
9.1.2 Test Vector
9.1.3 Testing a Function in Context
9.2 Complexity of Applying Functional Testing
9.3 Pairwise Testing
9.3.1 Orthogonal Array
9.3.2 In Parameter Order
9.4 Equivalence Class Partitioning
9.5 Boundary Value Analysis
9.6 Decision Tables
9.7 Random Testing
9.8 Error Guessing
9.9 Category Partition
9.10 Summary
Literature Review
References
Exercises

CHAPTER 10  TEST GENERATION FROM FSM MODELS
10.1 State-Oriented Model
10.2 Points of Control and Observation
10.3 Finite-State Machine
10.4 Test Generation from an FSM
10.5 Transition Tour Method
10.6 Testing with State Verification
10.7 Unique Input–Output Sequence
10.8 Distinguishing Sequence
10.9 Characterizing Sequence
10.10 Test Architectures
10.10.1 Local Architecture
10.10.2 Distributed Architecture
10.10.3 Coordinated Architecture
10.10.4 Remote Architecture
10.11 Testing and Test Control Notation Version 3 (TTCN-3)
10.11.1 Module
10.11.2 Data Declarations
10.11.3 Ports and Components
10.11.4 Test Case Verdicts
10.11.5 Test Case
10.12 Extended FSMs
10.13 Test Generation from EFSM Models
10.14 Additional Coverage Criteria for System Testing
10.15 Summary
Literature Review
References
Exercises

CHAPTER 11  SYSTEM TEST DESIGN
11.1 Test Design Factors
11.2 Requirement Identification
11.3 Characteristics of Testable Requirements
11.4 Test Objective Identification
11.5 Example
11.6 Modeling a Test Design Process
11.7 Modeling Test Results
11.8 Test Design Preparedness Metrics
11.9 Test Case Design Effectiveness
11.10 Summary
Literature Review
References
Exercises

CHAPTER 12  SYSTEM TEST PLANNING AND AUTOMATION
12.1 Structure of a System Test Plan
12.2 Introduction and Feature Description
12.3 Assumptions
12.4 Test Approach
12.5 Test Suite Structure
12.6 Test Environment
12.7 Test Execution Strategy
12.7.1 Multicycle System Test Strategy
12.7.2 Characterization of Test Cycles
12.7.3 Preparing for First Test Cycle
12.7.4 Selecting Test Cases for Final Test Cycle
12.7.5 Prioritization of Test Cases
12.7.6 Details of Three Test Cycles
12.8 Test Effort Estimation
12.8.1 Number of Test Cases
12.8.2 Test Case Creation Effort
12.8.3 Test Case Execution Effort
12.9 Scheduling and Test Milestones
12.10 System Test Automation
12.11 Evaluation and Selection of Test Automation Tools
12.12 Test Selection Guidelines for Automation
12.13 Characteristics of Automated Test Cases
12.14 Structure of an Automated Test Case
12.15 Test Automation Infrastructure
12.16 Summary
Literature Review
References
Exercises

CHAPTER 13  SYSTEM TEST EXECUTION
13.1 Basic Ideas
13.2 Modeling Defects
13.3 Preparedness to Start System Testing
13.4 Metrics for Tracking System Test
13.4.1 Metrics for Monitoring Test Execution
13.4.2 Test Execution Metric Examples
13.4.3 Metrics for Monitoring Defect Reports
13.4.4 Defect Report Metric Examples
13.5 Orthogonal Defect Classification
13.6 Defect Causal Analysis
13.7 Beta Testing
13.8 First Customer Shipment
13.9 System Test Report
13.10 Product Sustaining
13.11 Measuring Test Effectiveness
13.12 Summary
Literature Review
References
Exercises

CHAPTER 14  ACCEPTANCE TESTING
14.1 Types of Acceptance Testing
14.2 Acceptance Criteria
14.3 Selection of Acceptance Criteria
14.4 Acceptance Test Plan
14.5 Acceptance Test Execution
14.6 Acceptance Test Report
14.7 Acceptance Testing in eXtreme Programming
14.8 Summary
Literature Review
References
Exercises

CHAPTER 15  SOFTWARE RELIABILITY
15.1 What Is Reliability?
15.1.1 Fault and Failure
15.1.2 Time
15.1.3 Time Interval between Failures
15.1.4 Counting Failures in Periodic Intervals
15.1.5 Failure Intensity
15.2 Definitions of Software Reliability
15.2.1 First Definition of Software Reliability
15.2.2 Second Definition of Software Reliability
15.2.3 Comparing the Definitions of Software Reliability
15.3 Factors Influencing Software Reliability
15.4 Applications of Software Reliability
15.4.1 Comparison of Software Engineering Technologies
15.4.2 Measuring the Progress of System Testing
15.4.3 Controlling the System in Operation
15.4.4 Better Insight into Software Development Process
15.5 Operational Profiles
15.5.1 Operation
15.5.2 Representation of Operational Profile
15.6 Reliability Models
15.7 Summary
Literature Review
References
Exercises

CHAPTER 16  TEST TEAM ORGANIZATION
16.1 Test Groups
16.1.1 Integration Test Group
16.1.2 System Test Group
16.2 Software Quality Assurance Group
16.3 System Test Team Hierarchy
16.4 Effective Staffing of Test Engineers
16.5 Recruiting Test Engineers
16.5.1 Job Requisition
16.5.2 Job Profiling
16.5.3 Screening Resumes
16.5.4 Coordinating an Interview Team
16.5.5 Interviewing
16.5.6 Making a Decision
16.6 Retaining Test Engineers
16.6.1 Career Path
16.6.2 Training
16.6.3 Reward System
16.7 Team Building
16.7.1 Expectations
16.7.2 Consistency
16.7.3 Information Sharing
16.7.4 Standardization
16.7.5 Test Environments
16.7.6 Recognitions
16.8 Summary
Literature Review
References
Exercises

CHAPTER 17  SOFTWARE QUALITY
17.1 Five Views of Software Quality
17.2 McCall's Quality Factors and Criteria
17.2.1 Quality Factors
17.2.2 Quality Criteria
17.2.3 Relationship between Quality Factors and Criteria
17.2.4 Quality Metrics
17.3 ISO 9126 Quality Characteristics
17.4 ISO 9000:2000 Software Quality Standard
17.4.1 ISO 9000:2000 Fundamentals
17.4.2 ISO 9001:2000 Requirements
17.5 Summary
Literature Review
References
Exercises

CHAPTER 18  MATURITY MODELS
18.1 Basic Idea in Software Process
18.2 Capability Maturity Model
18.2.1 CMM Architecture
18.2.2 Five Levels of Maturity and Key Process Areas
18.2.3 Common Features of Key Practices
18.2.4 Application of CMM
18.2.5 Capability Maturity Model Integration (CMMI)
18.3 Test Process Improvement
18.4 Testing Maturity Model
18.5 Summary
Literature Review
References
Exercises

GLOSSARY
INDEX

PREFACE

karmany eva dhikaras te; ma phalesu kadachana;
ma karmaphalahetur bhur; ma te sango stv akarmani.

Your right is to work only; but never to the fruits thereof; may you not be motivated by the fruits of actions; nor let your attachment to be towards inaction.
— Bhagavad Gita

We have been witnessing tremendous growth in the software industry over the past 25 years. Software applications have proliferated from the original data processing and scientific computing domains into our daily lives in such a way that we do not realize that some kind of software executes when we do something as ordinary as making a phone call, starting a car, turning on a microwave oven, or making a debit card payment. The processes for producing software must meet two broad challenges. First, the processes must produce low-cost software in a short time so that corporations can stay competitive.
Second, the processes must produce usable, dependable, and safe software; these attributes are commonly known as quality attributes. Software quality affects many important aspects of our daily lives, such as the economy, personal and national security, health, and safety.

Twenty-five years ago, testing accounted for about 50% of the total time and more than 50% of the total money expended in a software development project, and the same is still true today. In those days the software industry was much smaller, and academia offered a single, comprehensive course entitled Software Engineering to educate undergraduate students in the nuts and bolts of software development. Although software testing has been a part of the classical software engineering literature for decades, the subject is seldom incorporated into the mainstream undergraduate curriculum. A few universities have started offering an option in software engineering comprising three specialized courses, namely, Requirements Specification, Software Design, and Testing and Quality Assurance. In addition, some universities have introduced full undergraduate and graduate degree programs in software engineering.

Considering the impact of software quality, or the lack thereof, we observe that software testing education has not received its due place. Ideally, research should lead to the development of tools and methodologies to produce low-cost, high-quality software, and students should be educated in the testing fundamentals. In other words, software testing research should not be solely academic in nature but must strive to be practical for industry consumers. However, in practice, there is a large gap between the testing skills needed in the industry and what is taught and researched in universities.
Our goal is to provide students and teachers with a set of well-rounded educational materials covering the fundamental developments in testing theory and common testing practices in the industry. We intend to give students the "big picture" of testing and quality assurance, because software quality concepts are quite broad. There are different kinds of software systems, each with its own intricate characteristics, and we have not tried to address their specific testing challenges. Instead, we have presented testing theory and practice as broad stepping stones that will enable students to understand and develop testing practices for more complex systems.

We decided to write this book based on our teaching and industrial experiences in software testing and quality assurance. For the past 15 years, Sagar has been teaching software engineering and software testing on a regular basis, whereas Piyu has been performing hands-on testing and managing test groups for testing routers, switches, wireless data networks, storage networks, and intrusion prevention appliances. Our experiences have helped us select and structure the contents of this book to make it suitable as a textbook.

Who Should Read This Book?

We have written this book to introduce students and software professionals to the fundamental ideas in testing theory, testing techniques, testing practices, and quality assurance. Undergraduate students in software engineering, computer science, and computer engineering with no prior experience in the software industry will be introduced to the subject matter in a step-by-step manner. Practitioners, too, will benefit from the structured presentation and comprehensive nature of the materials. Graduate students can use the book as a reference resource.
After reading the whole book, the reader will have a thorough understanding of the following topics:

• Fundamentals of testing theory and concepts
• Practices that support the production of quality software
• Software testing techniques
• Life-cycle models of requirements, defects, test cases, and test results
• Process models for unit, integration, system, and acceptance testing
• Building test teams, including recruiting and retaining test engineers
• Quality models, capability maturity model, testing maturity model, and test process improvement model

How Should This Book Be Read?

The purpose of this book is to teach how to do software testing. We present some essential background material in Chapter 1 and save the discussion of software quality questions for a later part of the book. It is difficult for beginners to discuss software quality intelligently until they have a firm sense of what software testing does. However, practitioners with much testing experience can jump to Chapter 17, entitled "Software Quality," immediately after Chapter 1.

There are three different ways to read this book, depending on the reader's interest. First, those who are exclusively interested in software testing concepts and want to apply the ideas should read Chapter 1 ("Basic Concepts and Preliminaries"), Chapter 3 ("Unit Testing"), Chapter 7 ("System Integration Testing"), and Chapters 8–14, which cover system-level testing. Second, test managers interested in improving the test effectiveness of their teams can read Chapters 1, 3, 7, 8–14, 16 ("Test Team Organization"), 17 ("Software Quality"), and 18 ("Maturity Models"). Third, beginners should read the book from cover to cover.

Notes for Instructors

The book can be used as a text in an introductory course in software testing and quality assurance.
One of the authors used the contents of this book in an undergraduate course entitled Software Testing and Quality Assurance for several years at the University of Waterloo. An introductory course in software testing can cover selected sections from most of the chapters except Chapter 16. For a course with more emphasis on testing techniques than on processes, we recommend choosing Chapters 1 ("Basic Concepts and Preliminaries") through 15 ("Software Reliability"). When used as a supplementary text in a software engineering course, selected portions from the following chapters can help students imbibe the essential concepts in software testing:

• Chapter 1: Basic Concepts and Preliminaries
• Chapter 3: Unit Testing
• Chapter 7: System Integration Testing
• Chapter 8: System Test Categories
• Chapter 14: Acceptance Testing

Supplementary materials for instructors are available at the following Wiley website: http://www.wiley.com/sagar.

Acknowledgments

In preparing this book, we received much support from many people, including the publisher, our family members, and our friends and colleagues. The support came in many different forms. First, we would like to thank our editors, namely, Anastasia Wasko, Val Moliere, Whitney A. Lesch, Paul Petralia, and Danielle Lacourciere, who gave us much professional guidance and patiently answered our various queries. Our friend Dr. Alok Patnaik read the whole draft and made numerous suggestions to improve the presentation quality of the book; we thank him for all his effort and encouragement. The second author, Piyu Tripathy, would like to thank his former colleagues at Nortel Networks, Cisco Systems, and Airvana Inc., and present colleagues at NEC Laboratories America. Finally, the support of our parents, parents-in-law, and partners deserves a special mention.
I, Piyu Tripathy, would like to thank my dear wife Leena, who has taken many household and family duties off my hands to give me the time I needed to write this book. And I, Sagar Naik, would like to thank my loving wife Alaka for her invaluable support and for always being there for me. I would also like to thank my charming daughters, Monisha and Sameeksha, and exciting son, Siddharth, for their understanding while I was writing this book. I am grateful to my elder brother, Gajapati Naik, for all his support. We are very pleased that now we have more time for our families and friends.

Kshirasagar Naik
University of Waterloo, Waterloo

Priyadarshi Tripathy
NEC Laboratories America, Inc., Princeton

LIST OF FIGURES

1.1 Shewhart cycle
1.2 Ishikawa diagram
1.3 Examples of basic test cases
1.4 Example of a test case with a sequence of <input, expected outcome>
1.5 Subset of the input domain exercising a subset of the program behavior
1.6 Different activities in program testing
1.7 Development and testing phases in the V model
1.8 Regression testing at different software testing levels. (From ref. 41. © 2005 John Wiley & Sons.)
2.1 Executing a program with a subset of the input domain
2.2 Example of inappropriate path selection
2.3 Different ways of comparing power of test methods: (a) produces all test cases produced by another method; (b) test sets have common elements
2.4 Context of applying test adequacy
3.1 Steps in the code review process
3.2 Dynamic unit test environment
3.3 Test-first process in XP. (From ref. 24. © 2005 IEEE.)
3.4 Sample pseudocode for performing unit testing
3.5 The assertTrue() assertion throws an exception
3.6 Example test suite
4.1 Process of generating test input data for control flow testing
4.2 Symbols in a CFG
4.3 Function to open three files
4.4 High-level CFG representation of openfiles(). The three nodes are numbered 1, 2, and 3.
4.5 Detailed CFG representation of openfiles(). The numbers 1–21 are the nodes.
4.6 Function to compute average of selected integers in an array. This program is an adaptation of "Figure 2. A sample program" in ref. 10. (With permission from the Australian Computer Society.)
4.7 A CFG representation of ReturnAverage(). Numbers 1–13 are the nodes.
4.8 Dashed arrows represent the branches not covered by statement covering in Table 4.4
4.9 Partial CFG with (a) OR operation and (b) AND operations
4.10 Example of a path from Figure 4.7
4.11 Path predicate for path in Figure 4.10
4.12 Method in Java to explain symbolic substitution [11]
4.13 Path predicate expression for path in Figure 4.10
4.14 Another example of path from Figure 4.7
4.15 Path predicate expression for path shown in Figure 4.14
4.16 Input data satisfying constraints of Figure 4.13
4.17 Binary search routine
5.1 Sequence of computations showing data flow anomaly
5.2 State transition diagram of a program variable. (From ref. 2. © 1979 IEEE.)
5.3 Definition and uses of variables
5.4 Data flow graph of ReturnAverage() example
5.5 Relationship among DF (data flow) testing criteria. (From ref. 4. © 1988 IEEE.)
5.6 Relationship among FDF (feasible data flow) testing criteria. (From ref. 4. © 1988 IEEE.)
5.7 Limitation of different fault detection techniques
5.8 Binary search routine
5.9 Modified binary search routine
6.1 Illustration of the concept of program domains
6.2 A function to explain program domains
6.3 Control flow graph representation of the function in Figure 6.2
6.4 Domains obtained from interpreted predicates in Figure 6.3
6.5 Predicates defining the TT domain in Figure 6.4
6.6 ON and OFF points
6.7 Boundary shift resulting in reduced domain (closed inequality)
6.8 Boundary shift resulting in enlarged domain (closed inequality)
6.9 Tilted boundary (closed inequality)
6.10 Closure error (closed inequality)
6.11 Boundary shift resulting in reduced domain (open inequality)
6.12 Boundary shift resulting in enlarged domain (open inequality)
6.13 Tilted boundary (open inequality)
6.14 Closure error (open inequality)
6.15 Equality border
6.16 Domains D1, D2, and D3
7.1 Module hierarchy with three levels and seven modules
7.2 Top-down integration of modules A and B
7.3 Top-down integration of modules A, B, and D
7.4 Top-down integration of modules A, B, D, and C
7.5 Top-down integration of modules A, B, C, D, and E
7.6 Top-down integration of modules A, B, C, D, E, and F
7.7 Top-down integration of modules A, B, C, D, E, F, and G
7.8 Bottom-up integration of modules E, F, and G
7.9 Bottom-up integration of modules B, C, and D with E, F, and G
7.10 Bottom-up integration of module A with all others
7.11 Hardware ECO process
7.12 Software ECO process
7.13 Module hierarchy of software system
8.1 Types of system tests
8.2 Types of basic tests
8.3 Types of functionality tests
8.4 Types of robustness tests
8.5 Typical 1xEV-DO radio access network. (Courtesy of Airvana, Inc.)
9.1 Frequency selection box of Bluetooth specification
9.2 Part of form ON479 of T1 general—2001, published by the CCRA
9.3 Functionally related variables
9.4 Function in context
9.5 (a) Obtaining output values from an input vector and (b) obtaining an input vector from an output value in functional testing
9.6 Functional testing in general
9.7 System S with three input variables
9.8 (a) Too many test inputs; (b) one input selected from each subdomain
9.9 Gold standard oracle
9.10 Parametric oracle
9.11 Statistical oracle
10.1 Spectrum of software systems
10.2 Data-dominated systems
10.3 Control-dominated systems
10.4 FSM model of dual-boot laptop computer
10.5 Interactions between system and its environment modeled as FSM
10.6 PCOs on a telephone
10.7 FSM model of a PBX
10.8 FSM model of PBX
10.9 Interaction of test sequence with SUT
10.10 Derived test case from transition tour
10.11 Conceptual model of test case with state verification
10.12 Finite-state machine G1. (From ref. 5. © 1997 IEEE.)
10.13 UIO tree for G1 in Figure 10.12. (From ref. 5. © 1997 IEEE.)
10.14 Identification of UIO sequences on UIO tree of Figure 10.13
10.15 Finite-state machine G2
10.16 Distinguishing sequence tree for G2 in Figure 10.15
10.17 FSM that does not possess distinguishing sequence. (From ref. 11. © 1994 IEEE.)
10.18 DS tree for FSM (Figure 10.17)
10.19 Abstraction of N-entity in OSI reference architecture
10.20 Abstract local test architecture
10.21 Abstract external test architecture
10.22 Local architecture
10.23 Distributed architecture
10.24 Coordinated architecture
10.25 Remote architecture
10.26 Structure of module in TTCN-3
10.27 Definitions of two subtypes
10.28 Parameterized template for constructing message to be sent
10.29 Parameterized template for constructing message to be received
10.30 Testing (a) square-root function (SRF) calculator and (b) port between tester and SRF calculator
10.31 Defining port type
10.32 Associating port with component
10.33 Test case for testing SRF calculator
10.34 Executing test case
10.35 Comparison of state transitions of FSM and EFSM
10.36 Controlled access to a door
10.37 SDL/GR door control system
10.38 Door control behavior specification
10.39 Door control behavior specification
10.40 Transition tour from door control system of Figures 10.38 and 10.39
10.41 Testing door control system
10.42 Output and input behavior obtained from transition tour of Figure 10.40
10.43 Test behavior obtained by refining if part in Figure 10.42
10.44 Test behavior that can receive unexpected events (derived from Figure 10.43)
10.45 Core behavior of test case for testing door control system (derived from Figure 10.44)
10.46 User interface of ATM
10.47 Binding of buttons with user options
10.48 Binding of buttons with cash amount
10.49 FSM G
10.50 FSM H
10.51 FSM K
10.52 Nondeterministic FSM
11.1 State transition diagram of requirement
11.2 Test suite structure
11.3 Service interworking between FR and ATM services
11.4 Transformation of FR to ATM cell
11.5 FrAtm test suite structure
11.6 State transition diagram of a test case
11.7 State transition diagram of test case result
12.1 Concept of cycle-based test execution strategy
12.2 Gantt chart for FR–ATM service interworking test project
12.3 Broad criteria of test automation tool evaluation
12.4 Test selection guideline for automation
12.5 Characteristics of automated test cases
12.6 Six major steps in automated test case
12.7 Components of an automation infrastructure
13.1 State transition diagram representation of life cycle of defect
13.2 Projected execution of test cases on weekly basis in cumulative chart form
13.3 PAE metric of Bazooka (PE: projected execution; AE: actually executed) project
13.4 Pareto diagram for defect distribution shown in Table 13.12
13.5 Cause–effect diagram for DCA
15.1 Relationship between MTTR, MTTF, and MTBF
15.2 Graphical representation of operational profile of library information system
15.3 Failure intensity λ as function of cumulative failure μ (λ0 = 9 failures per unit time, ν0 = 500 failures, θ = 0.0075)
15.4 Failure intensity λ as function of execution time τ (λ0 = 9 failures per unit time, ν0 = 500 failures, θ = 0.0075)
15.5 Cumulative failure μ as function of execution time τ (λ0 = 9 failures per unit time, ν0 = 500 failures, θ = 0.0075)
16.1 Structure of test groups
16.2 Structure of software quality assurance group
16.3 System test team hierarchy
16.4 Six phases of effective recruiting process
16.5 System test organization as part of development
17.1 Relation between quality factors and quality criteria [6]
17.2 ISO 9126 sample quality model refines standard's features into subcharacteristics. (From ref. 4. © 1996 IEEE.)
18.1 CMM structure. (From ref. 3. © 2005 John Wiley & Sons.)
18.2 SW-CMM maturity levels. (From ref. 3. © 2005 John Wiley & Sons.)
18.3 Five-level structure of TMM. (From ref. 5. © 2003 Springer.)

LIST OF TABLES

3.1 Hierarchy of System Documents
3.2 Code Review Checklist
3.3 McCabe Complexity Measure
4.1 Examples of Path in CFG of Figure 4.7
4.2 Input Domain of openfiles()
4.3 Inputs and Paths in openfiles()
4.4 Paths for Statement Coverage of CFG of Figure 4.7
4.5 Paths for Branch Coverage of CFG of Figure 4.7
4.6 Two Cases for Complete Statement and Branch Coverage of CFG of Figure 4.9a
4.7 Interpretation of Path Predicate of Path in Figure 4.10
4.8 Interpretation of Path Predicate of Path in Figure 4.14
4.9 Test Data for Statement and Branch Coverage
5.1 Def() and c-use() Sets of Nodes in Figure 5.4
5.2 Predicates and p-use() Set of Edges in Figure 5.4
6.1 Two Interpretations of Second if() Statement in Figure 6.2
6.2 Detection of Boundary Shift Resulting in Reduced Domain (Closed Inequality)
6.3 Detection of Boundary Shift Resulting in Enlarged Domain (Closed Inequality)
6.4 Detection of Boundary Tilt (Closed Inequality)
6.5 Detection of Closure Error (Closed Inequality)
6.6 Detection of Boundary Shift Resulting in Reduced Domain (Open Inequality)
6.7 Detection of Boundary Shift Resulting in Enlarged Domain (Open Inequality)
6.8 Detection of Boundary Tilt (Open Inequality)
6.9 Detection of Closure Error (Open Inequality)
7.1 Check-in Request Form
7.2 Example Software/Hardware Compatibility Matrix
7.3 Framework for SIT Plan
7.4 Framework for Entry Criteria to Start System Integration
7.5 Framework for System Integration Exit Criteria
8.1 EMS Functionalities
8.2 Regulatory Approval Bodies of Different Countries
9.1 Number of Special Values of Inputs to FBS Module of Figure 9.1
9.2 Input and Output Domains of Functions of P in Figure 9.6
9.3 Pairwise Test Cases for System S
9.4 L4(2^3) Orthogonal Array
9.5 Commonly Used Orthogonal Arrays
9.6 Various Values That Need to Be Tested in Combinations
9.7 L9(3^4) Orthogonal Array
9.8 L9(3^4) Orthogonal Array after Mapping Factors
9.9 Generated Test Cases after Mapping Left-Over Levels
9.10 Generated Test Cases to Cover Each Equivalence Class
9.11 Decision Table Comprising Set of Conditions and Effects
9.12 Pay Calculation Decision Table with Values for Each Rule
9.13 Pay Calculation Decision Table after Column Reduction
9.14 Decision Table for Payment Calculation
10.1 PCOs for Testing Telephone PBX
10.2 Set of States in FSM of Figure 10.8
10.3 Input and Output Sets in FSM of Figure 10.8
10.4 Transition Tours Covering All States in Figure 10.8
10.5 State Transitions Not Covered by Transition Tours of Table 10.4
10.6 Transition Tours Covering All State Transitions in Figure 10.8
10.7 UIO Sequences of Minimal Lengths Obtained from Figure 10.14
10.8 Examples of State Blocks
10.9 Outputs of FSM G2 in Response to Input Sequence 11 in Different States
10.10 Output Sequences Generated by FSM of Figure 10.17 as Response to W1
10.11 Output Sequences Generated by FSM of Figure 10.17 as Response to W2
10.12 Test Sequences for State Transition (D, A, a/x) of FSM in Figure 10.17
11.1 Coverage Matrix [Aij]
11.2 Requirement Schema Field Summary
11.3 Engineering Change Document Information
11.4 Characteristics of Testable Functional Specifications
11.5 Mapping of FR QoS Parameters to ATM QoS Parameters
11.6 Test Case Schema Summary
11.7 Test Suite Schema Summary
11.8 Test Result Schema Summary
12.1 Outline of System Test Plan
12.2 Equipment Needed to be Procured
12.3 Entry Criteria for First System Test Cycle
12.4 Test Case Failure Counts to Initiate RCA in Test Cycle 1
12.5 Test Case Failure Counts to Initiate RCA in Test Cycle 2
12.6 Test Effort Estimation for FR–ATM PVC Service Interworking
12.7 Form for Computing Unadjusted Function Point
12.8 Factors Affecting Development Effort
382 12.9 Empirical Relationship between Function Points and LOC 383 12.10 Guidelines for Manual Test Case Creation Effort 384 12.11 Guidelines for Manual Test Case Execution Effort 386 12.12 Guidelines for Estimation of Effort to Manually Execute Regression Test Cases 386 12.13 Benefits of Automated Testing 391 13.1 States of Defect Modeled in Figure 13.1 410 13.2 Defect Schema Summary Fields 412 13.3 State Transitions to Five Possible Next States from Open State 413 13.4 Outline of Test Execution Working Document 416 13.5 EST Metric in Week 4 of Bazooka Project 422 13.6 EST Metric in Bazooka Monitored on Weekly Basis 423 LIST OF TABLES xxix 13.7 DAR Metric for Stinger Project 425 13.8 Weekly DRR Status for Stinger Test Project 426 13.9 Weekly OD on Priority Basis for Stinger Test Project 427 13.10 Weekly CD Observed by Different Groups for Stinger Test Project 427 13.11 ARD Metric for Bayonet 428 13.12 Sample Test Data of Chainsaw Test Project 430 13.13 Framework for Beta Release Criteria 436 13.14 Structure of Final System Test Report 438 13.15 Scale for Defect Age 443 13.16 Defect Injection versus Discovery on Project Boomerang 443 13.17 Number of Defects Weighted by Defect Age on Project Boomerang 444 13.18 ARD Metric for Test Project 448 13.19 Scale for PhAge 449 14.1 Outline of ATP 462 14.2 ACC Document Information 464 14.3 Structure of Acceptance Test Status Report 465 14.4 Structure of Acceptance Test Summary Report 466 15.1 Example of Operational Profile of Library Information System 484 17.1 McCall’s Quality Factors 524 17.2 Categorization of McCall’s Quality Factors 527 17.3 McCall’s Quality Criteria 529 18.1 Requirements for Different Maturity Levels 564 18.2 Test Maturity Matrix 566 1 CHAPTER Basic Concepts and Preliminaries Software is like entropy. It is difficult to grasp, weighs nothing, and obeys the second law of thermodynamics, i.e., it always increases. 
— Norman Ralph Augustine

1.1 QUALITY REVOLUTION

People seek quality in every man-made artifact. Certainly, the concept of quality did not originate with software systems. Rather, the quality concept is likely to be as old as the human endeavor to mass produce artifacts and objects of large size. In the past couple of decades a quality revolution has been spreading fast throughout the world with the explosion of the Internet. Global competition, outsourcing, off-shoring, and increasing customer expectations have brought the concept of quality to the forefront. Developing quality products on tighter schedules is critical for a company to be successful in the new global economy.

Traditionally, efforts to improve quality have centered around the end of the product development cycle by emphasizing the detection and correction of defects. On the contrary, the new approach to enhancing quality encompasses all phases of a product development process—from a requirements analysis to the final delivery of the product to the customer. Every step in the development process must be performed to the highest possible standard. An effective quality process must focus on [1]:

• Paying much attention to customer's requirements
• Making efforts to continuously improve quality
• Integrating measurement processes with product design and development
• Pushing the quality concept down to the lowest level of the organization
• Developing a system-level perspective with an emphasis on methodology and process
• Eliminating waste through continuous improvement

Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy. Copyright © 2008 John Wiley & Sons, Inc.

A quality movement was started in Japan during the 1940s and the 1950s by William Edwards Deming, Joseph M. Juran, and Kaoru Ishikawa. In circa 1947, W.
Edwards Deming "visited India as well, then continued on to Japan, where he had been asked to join a statistical mission responsible for planning the 1951 Japanese census" [2], p. 8. During this visit to Japan, Deming invited statisticians for a dinner meeting and told them how important they were and what they could do for Japan [3]. In March 1950, he returned to Japan at the invitation of Managing Director Kenichi Koyanagi of the Union of Japanese Scientists and Engineers (JUSE) to teach a course to Japanese researchers, workers, executives, and engineers on statistical quality control (SQC) methods. Statistical quality control is a discipline based on measurements and statistics. Decisions are made and plans are developed based on the collection and evaluation of actual data in the form of metrics, rather than intuition and experience. The SQC methods use seven basic quality management tools: Pareto analysis, cause-and-effect diagram, flow chart, trend chart, histogram, scatter diagram, and control chart [2].

In July 1950, Deming gave an eight-day seminar based on the Shewhart methods of statistical quality control [4, 5] for Japanese engineers and executives. He introduced the plan–do–check–act (PDCA) cycle in the seminar, which he called the Shewhart cycle (Figure 1.1). The Shewhart cycle illustrates the following activity sequence: setting goals, assigning them to measurable milestones, and assessing the progress against those milestones. Deming's 1950 lecture notes formed the basis for a series of seminars on SQC methods sponsored by the JUSE and provided the criteria for Japan's famed Deming Prize. Deming's work has stimulated several different kinds of industries, such as those for radios, transistors, cameras, binoculars, sewing machines, and automobiles.
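Among the seven SQC tools listed above, the control chart most directly embodies the idea of making decisions from measured data rather than intuition. The sketch below is an illustrative example only (the sample data are invented for this sketch): it computes Shewhart-style three-sigma control limits for a series of process measurements and flags any point outside them.

```python
import statistics

# Hypothetical process measurements (e.g., a dimension sampled per batch).
measurements = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

mean = statistics.mean(measurements)
sigma = statistics.pstdev(measurements)

# Shewhart three-sigma control limits: points falling outside them
# signal a process that is out of statistical control.
ucl = mean + 3 * sigma  # upper control limit
lcl = mean - 3 * sigma  # lower control limit

out_of_control = [x for x in measurements if not (lcl <= x <= ucl)]
print(f"UCL={ucl:.2f}, LCL={lcl:.2f}, out-of-control points: {out_of_control}")
```

For this data set every point falls within the limits; in practice, an out-of-control point would trigger the check and act steps of the PDCA cycle.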
Plan—Establish the objective and process to deliver the results.
Do—Implement the plan and measure its performance.
Check—Assess the measurements and report the results to decision makers.
Act—Decide on changes needed to improve the process.

Figure 1.1 Shewhart cycle.

Between circa 1950 and circa 1970, automobile industries in Japan, in particular Toyota Motor Corporation, came up with an innovative principle to compress the time period from customer order to banking payment, known as the "lean principle." The objective was to minimize the consumption of resources that added no value to a product. The lean principle has been defined by the National Institute of Standards and Technology (NIST) Manufacturing Extension Partnership program [61] as "a systematic approach to identifying and eliminating waste through continuous improvement, flowing the product at the pull of the customer in pursuit of perfection," p. 1.

It is commonly believed that lean principles were started in Japan by Taiichi Ohno of Toyota [7], but Henry Ford had been using parts of lean as early as circa 1920, as evidenced by the following quote (Henry Ford, 1926) [61], p. 1:

One of the noteworthy accomplishments in keeping the price of Ford products low is the gradual shortening of the production cycle. The longer an article is in the process of manufacture and the more it is moved about, the greater is its ultimate cost.

This concept was popularized in the United States by a Massachusetts Institute of Technology (MIT) study of the movement from mass production toward lean production, as described in The Machine That Changed the World, by James P. Womack, Daniel T. Jones, and Daniel Roos, New York: Rawson and Associates, 1990.
Lean thinking continues to spread to every country in the world, and leaders are adapting the principles beyond automobile manufacturing, to logistics and distribution, services, retail, health care, construction, maintenance, and software development [8].

Remark: Walter Andrew Shewhart was an American physicist, engineer, and statistician and is known as the father of statistical quality control. Shewhart worked at Bell Telephone Laboratories from its foundation in 1925 until his retirement in 1956 [9]. His work was summarized in his book Economic Control of Quality of Manufactured Product, published by McGraw-Hill in 1931. In 1938, his work came to the attention of physicist W. Edwards Deming, who developed some of Shewhart's methodological proposals in Japan from 1950 onward and named his synthesis the Shewhart cycle.

In 1954, Joseph M. Juran of the United States proposed raising the level of quality management from the manufacturing units to the entire organization. He stressed the importance of systems thinking that begins with product requirement, design, prototype testing, proper equipment operations, and accurate process feedback. Juran's seminar also became a part of the JUSE's educational programs [10]. Juran spurred the move from SQC to TQC (total quality control) in Japan. This included companywide activities and education in quality control (QC), audits, quality circles, and promotion of quality management principles. The term TQC was coined by an American, Armand V. Feigenbaum, in his 1951 book Quality Control: Principles, Practice and Administration. It was republished in 2004 [11].

By 1968, Kaoru Ishikawa, one of the fathers of TQC in Japan, had outlined, as shown in the following, the key elements of TQC management [12]:

• Quality comes first, not short-term profits.
• The customer comes first, not the producer.
• Decisions are based on facts and data.
• Management is participatory and respectful of all employees.
• Management is driven by cross-functional committees covering product planning, product design, purchasing, manufacturing, sales, marketing, and distribution.

Remark: A quality circle is a volunteer group of workers, usually members of the same department, who meet regularly to discuss the problems and make presentations to management with their ideas to overcome them. Quality circles were started in Japan in 1962 by Kaoru Ishikawa as another method of improving quality. The movement in Japan was coordinated by the JUSE.

One of the innovative TQC methodologies developed in Japan is referred to as the Ishikawa or cause-and-effect diagram. Kaoru Ishikawa found from statistical data that dispersion in product quality came from four common causes, namely materials, machines, methods, and measurements, known as the 4 Ms (Figure 1.2). The bold horizontal arrow points to quality, whereas the diagonal arrows in Figure 1.2 are probable causes having an effect on the quality. Materials often differ when sources of supply or size requirements vary. Machines, or equipment, also function differently depending on variations in their parts, and they operate optimally for only part of the time. Methods, or processes, cause even greater variations due to lack of training and poor handwritten instructions. Finally, measurements also vary due to outdated equipment and improper calibration. Variations in the 4 Ms parameters have an effect on the quality of a product. The Ishikawa diagram has influenced Japanese firms to focus their quality control attention on the improvement of materials, machines, methods, and measurements.

The total-quality movement in Japan has led to pervasive top-management involvement. Many companies in Japan have extensive documentation of their quality activities.
Senior executives in the United States either did not believe quality mattered or did not know where to begin until the National Broadcasting Corporation (NBC), an American television network, broadcast the documentary "If Japan Can . . . Why Can't We?" at 9:30 P.M. on June 24, 1980 [2]. The documentary was produced by Clare Crawford-Mason and was narrated by Lloyd Dobyns. Fifteen minutes of the broadcast was devoted to Dr. Deming and his work.

Figure 1.2 Ishikawa diagram (causes: materials, machines, methods, and measurements; effect: quality).

After the broadcast, many executives and government leaders realized that a renewed emphasis on quality was no longer an option for American companies but a necessity for doing business in an ever-expanding and more demanding competitive world market. Ford Motor Company and General Motors immediately adopted Deming's SQC methodology into their manufacturing process. Other companies such as Dow Chemical and the Hughes Aircraft followed suit. Ishikawa's TQC management philosophy gained popularity in the United States. Further, this spurred emphasis on quality in American manufacturing companies led the U.S. Congress to establish the Malcolm Baldrige National Quality Award—similar to the Deming Prize in Japan—in 1987 to recognize organizations for their achievements in quality and to raise awareness about the importance of quality excellence as a competitive edge [6]. In the Baldrige National Award, quality is viewed as something defined by the customer and thus the focus is on customer-driven quality. On the other hand, in the Deming Prize, quality is viewed as something defined by the producers by conforming to specifications and thus the focus is on conformance to specifications.

Remark: Malcolm Baldrige was U.S. Secretary of Commerce from 1981 until his death in a rodeo accident in July 1987. Baldrige was a proponent of quality management as a key to his country's prosperity and long-term strength.
He took a personal interest in the quality improvement act, which was eventually named after him, and helped draft one of its early versions. In recognition of his contributions, Congress named the award in his honor.

Traditionally, the TQC and lean concepts are applied in the manufacturing process. The software development process uses these concepts as another tool to guide the production of quality software [13]. These concepts provide a framework to discuss software production issues. The software capability maturity model (CMM) [14] architecture developed at the Software Engineering Institute is based on the principles of product quality that have been developed by W. Edwards Deming [15], Joseph M. Juran [16], Kaoru Ishikawa [12], and Philip Crosby [17].

1.2 SOFTWARE QUALITY

The question "What is software quality?" evokes many different answers. Quality is a complex concept—it means different things to different people, and it is highly context dependent. Garvin [18] has analyzed how software quality is perceived in different ways in different domains, such as philosophy, economics, marketing, and management. Kitchenham and Pfleeger's article [60] on software quality gives a succinct exposition of software quality. They discuss five views of quality in a comprehensive manner as follows:

1. Transcendental View: It envisages quality as something that can be recognized but is difficult to define. The transcendental view is not specific to software quality alone but has been applied in other complex areas of everyday life. For example, in 1964, Justice Potter Stewart of the U.S. Supreme Court, while ruling on the case Jacobellis v. Ohio, 378 U.S.
184 (1964), which involved the state of Ohio banning the French film Les Amants ("The Lovers") on the ground of pornography, wrote "I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that" (emphasis added).

2. User View: It perceives quality as fitness for purpose. According to this view, while evaluating the quality of a product, one must ask the key question: "Does the product satisfy user needs and expectations?"

3. Manufacturing View: Here quality is understood as conformance to the specification. The quality level of a product is determined by the extent to which the product meets its specifications.

4. Product View: In this case, quality is viewed as tied to the inherent characteristics of the product. A product's inherent characteristics, that is, internal qualities, determine its external qualities.

5. Value-Based View: Quality, in this perspective, depends on the amount a customer is willing to pay for it.

The concept of software quality and the efforts to understand it in terms of measurable quantities date back to the mid-1970s. McCall, Richards, and Walters [19] were the first to study the concept of software quality in terms of quality factors and quality criteria. A quality factor represents a behavioral characteristic of a system. Some examples of high-level quality factors are correctness, reliability, efficiency, testability, maintainability, and reusability. A quality criterion is an attribute of a quality factor that is related to software development. For example, modularity is an attribute of the architecture of a software system. A highly modular software allows designers to put cohesive components in one module, thereby improving the maintainability of the system.
Various software quality models have been proposed to define quality and its related attributes. The most influential ones are the ISO 9126 [20–22] and the CMM [14]. The ISO 9126 quality model was developed by an expert group under the aegis of the International Organization for Standardization (ISO). The document ISO 9126 defines six broad, independent categories of quality characteristics: functionality, reliability, usability, efficiency, maintainability, and portability. The CMM was developed by the Software Engineering Institute (SEI) at Carnegie Mellon University. In the CMM framework, a development process is evaluated on a scale of 1–5, commonly known as level 1 through level 5. For example, level 1 is called the initial level, whereas level 5—optimized—is the highest level of process maturity.

In the field of software testing, there are two well-known process models, namely, the test process improvement (TPI) model [23] and the test maturity model (TMM) [24]. These two models allow an organization to assess the current state of their software testing processes, identify the next logical area for improvement, and recommend an action plan for test process improvement.

1.3 ROLE OF TESTING

Testing plays an important role in achieving and assessing the quality of a software product [25]. On the one hand, we improve the quality of the products as we repeat a test–find defects–fix cycle during development. On the other hand, we assess how good our system is when we perform system-level tests before releasing a product. Thus, as Friedman and Voas [26] have succinctly described, software testing is a verification process for software quality assessment and improvement. Generally speaking, the activities for software quality assessment can be divided into two broad categories, namely, static analysis and dynamic analysis.
• Static Analysis: As the term "static" suggests, it is based on the examination of a number of documents, namely requirements documents, software models, design documents, and source code. Traditional static analysis includes code review, inspection, walk-through, algorithm analysis, and proof of correctness. It does not involve actual execution of the code under development. Instead, it examines code and reasons over all possible behaviors that might arise during run time. Compiler optimizations are standard static analysis.

• Dynamic Analysis: Dynamic analysis of a software system involves actual program execution in order to expose possible program failures. The behavioral and performance properties of the program are also observed. Programs are executed with both typical and carefully chosen input values. Often, the input set of a program can be impractically large. However, for practical considerations, a finite subset of the input set can be selected. Therefore, in testing, we observe some representative program behaviors and reach a conclusion about the quality of the system. Careful selection of a finite test set is crucial to reaching a reliable conclusion.

By performing static and dynamic analyses, practitioners want to identify as many faults as possible so that those faults are fixed at an early stage of the software development. Static analysis and dynamic analysis are complementary in nature, and for better effectiveness, both must be performed repeatedly and alternated. Practitioners and researchers need to remove the boundaries between static and dynamic analysis and create a hybrid analysis that combines the strengths of both approaches [27].

1.4 VERIFICATION AND VALIDATION

Two similar concepts related to software testing frequently used by practitioners are verification and validation. Both concepts are abstract in nature, and each can be realized by a set of concrete, executable activities.
The two concepts are explained as follows:

• Verification: This kind of activity helps us in evaluating a software system by determining whether the product of a given development phase satisfies the requirements established before the start of that phase. One may note that a product can be an intermediate product, such as requirement specification, design specification, code, user manual, or even the final product. Activities that check the correctness of a development phase are called verification activities.

• Validation: Activities of this kind help us in confirming that a product meets its intended use. Validation activities aim at confirming that a product meets its customer's expectations. In other words, validation activities focus on the final product, which is extensively tested from the customer point of view. Validation establishes whether the product meets overall expectations of the users. Late execution of validation activities is often risky, as it can lead to higher development cost. Validation activities may be executed at early stages of the software development cycle [28]. An example of early execution of validation activities can be found in the eXtreme Programming (XP) software development methodology. In the XP methodology, the customer closely interacts with the software development group and conducts acceptance tests during each development iteration [29].

The verification process establishes the correspondence of an implementation phase of the software development process with its specification, whereas validation establishes the correspondence between a system and users' expectations. One can compare verification and validation as follows:

• Verification activities aim at confirming that one is building the product correctly, whereas validation activities aim at confirming that one is building the correct product [30].
• Verification activities review interim work products, such as requirements specification, design, code, and user manual, during a project life cycle to ensure their quality. The quality attributes sought by verification activities are consistency, completeness, and correctness at each major stage of system development. On the other hand, validation is performed toward the end of system development to determine if the entire system meets the customer's needs and expectations.

• Verification activities are performed on interim products by applying mostly static analysis techniques, such as inspection, walkthrough, and reviews, and using standards and checklists. Verification can also include dynamic analysis, such as actual program execution. On the other hand, validation is performed on the entire system by actually running the system in its real environment and using a variety of tests.

1.5 FAILURE, ERROR, FAULT, AND DEFECT

In the literature on software testing, one can find references to the terms failure, error, fault, and defect. Although their meanings are related, there are important distinctions between these four concepts. In the following, we present the first three terms as they are understood in the fault-tolerant computing community:

• Failure: A failure is said to occur whenever the external behavior of a system does not conform to that prescribed in the system specification.

• Error: An error is a state of the system. In the absence of any corrective action by the system, an error state could lead to a failure which would not be attributed to any event subsequent to the error.

• Fault: A fault is the adjudged cause of an error. A fault may remain undetected for a long time, until some event activates it. When an event activates a fault, it first brings the program into an intermediate error state.
If computation is allowed to proceed from an error state without any corrective action, the program eventually causes a failure. As an aside, in fault-tolerant computing, corrective actions can be taken to take a program out of an error state into a desirable state such that subsequent computation does not eventually lead to a failure. The process of failure manifestation can therefore be succinctly represented as a behavior chain [31] as follows: fault → error → failure. The behavior chain can iterate for a while, that is, failure of one component can lead to a failure of another interacting component.

The above definition of failure assumes that the given specification is acceptable to the customer. However, if the specification does not meet the expectations of the customer, then, of course, even a fault-free implementation fails to satisfy the customer. It is a difficult task to give a precise definition of fault, error, or failure of software, because of the "human factor" involved in the overall acceptance of a system. In an article titled "What Is Software Failure" [32], Ram Chillarege commented that in the modern software business, software failure means "the customer's expectation has not been met and/or the customer is unable to do useful work with product," p. 354. Roderick Rees [33] extended Chillarege's comments on software failure by pointing out that "failure is a matter of function only [and is thus] related to purpose, not to whether an item is physically intact or not" (p. 163). To substantiate this, Behrooz Parhami [34] provided three interesting examples to show the relevance of such a viewpoint in a wider context. One of the examples is quoted here (p. 451):

Consider a small organization. Defects in the organization's staff promotion policies can cause improper promotions, viewed as faults. The resulting ineptitudes & dissatisfactions are errors in the organization's state.
The organization's personnel or departments probably begin to malfunction as a result of the errors, in turn causing an overall degradation of performance. The end result can be the organization's failure to achieve its goal.

There is a fine difference between defects and faults in the above example, that is, execution of a defective policy may lead to a faulty promotion. In a software context, a software system may be defective due to design issues; certain system states will expose a defect, resulting in the development of faults defined as incorrect signal values or decisions within the system. In industry, the term defect is widely used, whereas among researchers the term fault is more prevalent. For all practical purposes, the two terms are synonymous. In this book, we use the two terms interchangeably as required.

1.6 NOTION OF SOFTWARE RELIABILITY

No matter how many times we run the test–find faults–fix cycle during software development, some faults are likely to escape our attention, and these will eventually surface at the customer site. Therefore, a quantitative measure that is useful in assessing the quality of a software is its reliability [35]. Software reliability is defined as the probability of failure-free operation of a software system for a specified time in a specified environment. The level of reliability of a system depends on those inputs that cause failures to be observed by the end users. Software reliability can be estimated via random testing, as suggested by Hamlet [36]. Since the notion of reliability is specific to a "specified environment," test data must be drawn from the input distribution to closely resemble the future usage of the system. Capturing the future usage pattern of a system in a general sense is described in a form called the operational profile. The concept of operational profile of a system was pioneered by John D.
Musa at AT&T Bell Laboratories between the 1970s and the 1990s [37, 38]. 1.7 OBJECTIVES OF TESTING The stakeholders in a test process are the programmers, the test engineers, the project managers, and the customers. A stakeholder is a person or an organization who influences a system’s behaviors or who is impacted by that system [39]. Different stakeholders view a test process from different perspectives as explained below: • It does work: While implementing a program unit, the programmer may want to test whether or not the unit works in normal circumstances. The programmer gains much confidence if the unit works to his or her satisfaction. The same idea applies to an entire system as well—once a system has been integrated, the developers may want to test whether or not the system performs the basic functions. Here, for psychological reasons, the objective of testing is to show that the system works, rather than that it does not work. • It does not work: Once the programmer (or the development team) is satisfied that a unit (or the system) works to a certain degree, more tests are conducted with the objective of finding faults in the unit (or the system). Here, the idea is to try to make the unit (or the system) fail. • Reduce the risk of failure: Most complex software systems contain faults, which cause the system to fail from time to time. This concept of “failing from time to time” gives rise to the notion of failure rate. As faults are discovered and fixed while performing more and more tests, the failure rate of a system generally decreases. Thus, a higher level objective of performing tests is to bring down the risk of failing to an acceptable level.
• Reduce the cost of testing: The different kinds of costs associated with a test process include the cost of designing, maintaining, and executing test cases, the cost of analyzing the result of executing each test case, the cost of documenting the test cases, and the cost of actually executing the system and documenting it. Therefore, the smaller the number of test cases designed, the lower the associated cost of testing. However, producing a small number of arbitrary test cases is not a good way of saving cost. The highest level objective of performing tests is to produce low-risk software with a small number of test cases. This idea leads us to the concept of effectiveness of test cases. Test engineers must therefore judiciously select fewer, but effective, test cases. 1.8 WHAT IS A TEST CASE? In its most basic form, a test case is a simple pair of < input, expected outcome >. If a program under test is expected to compute the square root of nonnegative numbers, then four examples of test cases are as shown in Figure 1.3. In stateless systems, where the outcome depends solely on the current input, test cases are very simple in structure, as shown in Figure 1.3. A program to compute the square root of nonnegative numbers is an example of a stateless system. A compiler for the C programming language is another example of a stateless system. A compiler is a stateless system because to compile a program it does not need to know about the programs it compiled previously. In state-oriented systems, where the program outcome depends both on the current state of the system and the current input, a test case may consist of a TB1: < 0, 0 >, TB2: < 25, 5 >, TB3: < 40, 6.3245553 >, TB4: < 100.5, 10.024968 >. Figure 1.3 Examples of basic test cases. TS1: < check balance, $500.00 >, < withdraw, ‘‘amount?’’ >, < $200.00, ‘‘$200.00’’ >, < check balance, $300.00 > .
Figure 1.4 Example of a test case with a sequence of < input, expected outcome >. sequence of < input, expected outcome > pairs. A telephone switching system and an automated teller machine (ATM) are examples of state-oriented systems. For an ATM, a test case for testing the withdraw function is shown in Figure 1.4. Here, we assume that the user has already entered validated inputs, such as the cash card and the personal identification number (PIN). In the test case TS1, “check balance” and “withdraw” in the first, second, and fourth tuples represent the pressing of the appropriate keys on the ATM keypad. It is assumed that the user account has $500.00 in it, and the user wants to withdraw an amount of $200.00. The expected outcome “$200.00” in the third tuple represents the cash dispensed by the ATM. After the withdrawal operation, the user makes sure that the remaining balance is $300.00. For state-oriented systems, most of the test cases include some form of decision and timing in providing input to the system. A test case may include loops and timers, which we do not show at this moment. 1.9 EXPECTED OUTCOME An outcome of program execution is a complex entity that may include the following: • Values produced by the program: Outputs for local observation (integer, text, audio, image) Outputs (messages) for remote storage, manipulation, or observation • State change: State change of the program State change of the database (due to add, delete, and update operations) • A sequence or set of values which must be interpreted together for the outcome to be valid An important concept in test design is the concept of an oracle. An oracle is any entity—program, process, human expert, or body of data—that tells us the expected outcome of a particular test or set of tests [40]. A test case is meaningful only if it is possible to decide on the acceptability of the result produced by the program under test.
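The two kinds of test cases, together with a simple table-driven oracle, can be sketched in code. This is an illustrative sketch only: the names `run_stateless_cases`, `AtmSession`, and `run_ts1` are invented here, and the toy ATM class merely stands in for the real stateful system. It encodes the square-root pairs of Figure 1.3 and the TS1 sequence of Figure 1.4.

```python
import math

# Stateless test cases: simple <input, expected outcome> pairs (Figure 1.3).
STATELESS_CASES = [(0, 0.0), (25, 5.0), (40, 6.3245553), (100.5, 10.024968)]

def run_stateless_cases(program, cases, tolerance=1e-5):
    """Oracle as a body of data: compare each actual outcome with the
    precomputed expected outcome, within a rounding tolerance."""
    return [abs(program(x) - expected) <= tolerance for x, expected in cases]

# A state-oriented test case is a *sequence* of <input, expected outcome>
# pairs run against one session, since the outcome depends on system state.
class AtmSession:
    """Toy ATM model standing in for the real, stateful system."""
    def __init__(self, balance):
        self.balance = balance
    def check_balance(self):
        return self.balance
    def withdraw(self, amount):
        self.balance -= amount      # input validation omitted in this sketch
        return amount

def run_ts1(session):
    """TS1 from Figure 1.4, encoded as (actual, expected) steps."""
    steps = [
        (session.check_balance(), 500.00),
        (session.withdraw(200.00), 200.00),
        (session.check_balance(), 300.00),
    ]
    return all(actual == expected for actual, expected in steps)
```

Running `run_stateless_cases(math.sqrt, STATELESS_CASES)` yields one verdict per pair, whereas TS1 passes only if every step of the sequence matches in order.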
Ideally, the expected outcome of a test should be computed while designing the test case. In other words, the test outcome is computed before the program is executed with the selected test input. The idea here is that one should be able to compute the expected outcome from an understanding of the program’s requirements. Precomputation of the expected outcome will eliminate any implementation bias in case the test case is designed by the developer. In exceptional cases, where it is extremely difficult, impossible, or even undesirable to compute a single expected outcome, one should identify expected outcomes by examining the actual test outcomes, as explained in the following: 1. Execute the program with the selected input. 2. Observe the actual outcome of program execution. 3. Verify that the actual outcome is the expected outcome. 4. Use the verified actual outcome as the expected outcome in subsequent runs of the test case. 1.10 CONCEPT OF COMPLETE TESTING It is not unusual to find people making claims such as “I have exhaustively tested the program.” Complete, or exhaustive, testing means there are no undiscovered faults at the end of the test phase. All problems must be known at the end of complete testing. For most systems, complete testing is near impossible because of the following reasons: • The domain of possible inputs of a program is too large to be completely used in testing a system. There are both valid inputs and invalid inputs. The program may have a large number of states. There may be timing constraints on the inputs, that is, an input may be valid at a certain time and invalid at other times. An input value which is valid but is not properly timed is called an inopportune input. The input domain of a system can be too large to be completely used in testing a program. • The design issues may be too complex to completely test. The design may have included implicit design decisions and assumptions.
For example, a programmer may use a global variable or a static variable to control program execution. • It may not be possible to create all possible execution environments of the system. This becomes more significant when the behavior of the software system depends on the real, outside world, such as weather, temperature, altitude, pressure, and so on. 1.11 CENTRAL ISSUE IN TESTING We must realize that though the outcome of complete testing, that is, discovering all faults, is highly desirable, it is a near-impossible task, and it may not be attempted. The next best thing is to select a subset of the input domain to test a program. Figure 1.5 Subset of the input domain exercising a subset of the program behavior. Referring to Figure 1.5, let D be the input domain of a program P. Suppose that we select a subset D1 of D, that is, D1 ⊂ D, to test program P. It is possible that D1 exercises only a part P1, that is, P1 ⊂ P, of the execution behavior of P, in which case faults in the other part, P2, will go undetected. By selecting a subset of the input domain D1, the test engineer attempts to deduce properties of an entire program P by observing the behavior of a part P1 of the entire behavior of P on selected inputs D1. Therefore, selection of the subset of the input domain must be done in a systematic and careful manner so that the deduction is as accurate and complete as possible. For example, the idea of coverage is considered while selecting test cases. 1.12 TESTING ACTIVITIES In order to test a program, a test engineer must perform a sequence of testing activities. Most of these activities have been shown in Figure 1.6 and are explained in the following. These explanations focus on a single test case. • Identify an objective to be tested: The first activity is to identify an objective to be tested.
The objective defines the intention, or purpose, of designing one or more test cases to ensure that the program supports the objective. A clear purpose must be associated with every test case. Figure 1.6 Different activities in program testing. • Select inputs: The second activity is to select test inputs. Selection of test inputs can be based on the requirements specification, the source code, or our expectations. Test inputs are selected by keeping the test objective in mind. • Compute the expected outcome: The third activity is to compute the expected outcome of the program with the selected inputs. In most cases, this can be done from an overall, high-level understanding of the test objective and the specification of the program under test. • Set up the execution environment of the program: The fourth step is to prepare the right execution environment of the program. In this step all the assumptions external to the program must be satisfied. A few examples of assumptions external to a program are as follows: Initialize the local system, external to the program. This may include making a network connection available, making the right database system available, and so on. Initialize any remote, external system (e.g., a remote partner process in a distributed application). For example, to test the client code, we may need to start the server at a remote site. • Execute the program: In the fifth step, the test engineer executes the program with the selected inputs and observes the actual outcome of the program. To execute a test case, inputs may be provided to the program at different physical locations at different times. The concept of test coordination is used in synchronizing different components of a test case.
• Analyze the test result: The final test activity is to analyze the result of test execution. Here, the main task is to compare the actual outcome of program execution with the expected outcome. The complexity of comparison depends on the complexity of the data to be observed. The observed data type can be as simple as an integer or a string of characters or as complex as an image, a video, or an audio clip. At the end of the analysis step, a test verdict is assigned to the program. There are three major kinds of test verdicts, namely, pass, fail, and inconclusive, as explained below. If the program produces the expected outcome and the purpose of the test case is satisfied, then a pass verdict is assigned. If the program does not produce the expected outcome, then a fail verdict is assigned. However, in some cases it may not be possible to assign a clear pass or fail verdict. For example, if a timeout occurs while executing a test case on a distributed application, we may not be in a position to assign a clear pass or fail verdict. In those cases, an inconclusive test verdict is assigned. An inconclusive test verdict means that further tests need to be done to refine the inconclusive verdict into a clear pass or fail verdict. A test report must be written after analyzing the test result. The motivation for writing a test report is to get the fault fixed if the test revealed a fault. To be informative, a test report contains the following items: an explanation of how to reproduce the failure, an analysis of the failure so that it can be described, and a pointer to the actual outcome and the test case, complete with the input, the expected outcome, and the execution environment. 1.13 TEST LEVELS Testing is performed at different levels involving the complete system or parts of it throughout the life cycle of a software product. A software system goes through four stages of testing before it is actually deployed.
These four stages are known as unit, integration, system, and acceptance level testing. The first three levels of testing are performed by a number of different stakeholders in the development organization, whereas acceptance testing is performed by the customers. The four stages of testing have been illustrated in the form of what is called the classical V model in Figure 1.7. In unit testing, programmers test individual program units, such as procedures, functions, methods, or classes, in isolation. After ensuring that individual units work to a satisfactory extent, modules are assembled to construct larger subsystems by following integration testing techniques. Integration testing is jointly performed by software developers and integration test engineers. Figure 1.7 Development and testing phases in the V model. The objective of integration testing is to construct a reasonably stable system that can withstand the rigor of system-level testing. System-level testing includes a wide spectrum of testing, such as functionality testing, security testing, robustness testing, load testing, stability testing, stress testing, performance testing, and reliability testing. System testing is a critical phase in a software development process because of the need to meet a tight schedule close to the delivery date, to discover most of the faults, and to verify that fixes are working and have not resulted in new faults. System testing comprises a number of distinct activities: creating a test plan, designing a test suite, preparing test environments, executing the tests by following a clear strategy, and monitoring the process of test execution. Regression testing is another level of testing that is performed throughout the life cycle of a system.
Regression testing is performed whenever a component of the system is modified. The key idea in regression testing is to ascertain that the modification has not introduced any new faults in the portion that was not subject to modification. To be precise, regression testing is not a distinct level of testing. Rather, it is considered as a subphase of unit, integration, and system-level testing, as illustrated in Figure 1.8 [41]. In regression testing, new tests are not designed. Instead, tests are selected, prioritized, and executed from the existing pool of test cases to ensure that nothing is broken in the new version of the software. Regression testing is an expensive process and accounts for a predominant portion of the testing effort in the industry. It is desirable to select a subset of the test cases from the existing pool to reduce the cost. A key question is how many and which test cases should be selected so that the selected test cases are more likely to uncover new faults [42–44]. After the completion of system-level testing, the product is delivered to the customer. The customer performs their own series of tests, commonly known as acceptance testing. The objective of acceptance testing is to measure the quality of the product, rather than searching for defects, which is the objective of system testing. A key notion in acceptance testing is the customer’s expectations from the system. By the time of acceptance testing, the customer should have developed their acceptance criteria based on their own expectations from the system. There are two kinds of acceptance testing as explained in the following: • User acceptance testing (UAT) • Business acceptance testing (BAT) Figure 1.8 Regression testing at different software testing levels. (From ref. 41. © 2005 John Wiley & Sons.)
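One simple way to select regression tests from the existing pool is to tag each test case with the components it exercises and re-run only those that touch a modified component. The sketch below uses invented component and test names; real selection and prioritization techniques [42–44] are considerably more sophisticated.

```python
# Each test case in the existing pool is tagged with the components it exercises.
TEST_POOL = {
    "T1": {"login", "session"},
    "T2": {"billing"},
    "T3": {"billing", "report"},
    "T4": {"report"},
}

def select_regression_tests(modified_components):
    """Select (not design) tests whose exercised components overlap the
    set of components changed in the new version of the software."""
    return sorted(test_id for test_id, components in TEST_POOL.items()
                  if components & set(modified_components))
```

If only the billing component is modified, the selection yields T2 and T3 and skips the rest, reducing the cost of the regression cycle.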
User acceptance testing is conducted by the customer to ensure that the system satisfies the contractual acceptance criteria before being signed off as meeting user needs. On the other hand, BAT is undertaken within the supplier’s development organization. The idea in having a BAT is to ensure that the system will eventually pass the user acceptance test. It is a rehearsal of UAT at the supplier’s premises. 1.14 SOURCES OF INFORMATION FOR TEST CASE SELECTION Designing test cases has continued to be a focus of the research community and practitioners. A software development process generates a large body of information, such as requirements specification, design document, and source code. In order to generate effective tests at a lower cost, test designers analyze the following sources of information: • Requirements and functional specifications • Source code • Input and output domains • Operational profile • Fault model Requirements and Functional Specifications The process of software development begins by capturing user needs. The nature and amount of user needs identified at the beginning of system development will vary depending on the specific life-cycle model to be followed. Let us consider a few examples. In the Waterfall model [45] of software development, a requirements engineer tries to capture most of the requirements. On the other hand, in an agile software development model, such as XP [29] or Scrum [46–48], only a few requirements are identified in the beginning. Whichever life-cycle model is chosen, a test engineer considers all the requirements the program is expected to meet while testing it. The requirements might have been specified in an informal manner, such as a combination of plaintext, equations, figures, and flowcharts. Though this form of requirements specification may be ambiguous, it is easily understood by customers.
For example, the Bluetooth specification consists of about 1100 pages of descriptions explaining how various subsystems of a Bluetooth interface are expected to work. The specification is written in plaintext form supplemented with mathematical equations, state diagrams, tables, and figures. For some systems, requirements may have been captured in the form of use cases, entity–relationship diagrams, and class diagrams. Sometimes the requirements of a system may have been specified in a formal language or notation, such as Z, SDL, Estelle, or finite-state machines. Both the informal and formal specifications are prime sources of test cases [49]. Source Code Whereas a requirements specification describes the intended behavior of a system, the source code describes the actual behavior of the system. High-level assumptions and constraints take concrete form in an implementation. Though a software designer may produce a detailed design, programmers may introduce additional details into the system. For example, a step in the detailed design can be “sort array A.” To sort an array, there are many sorting algorithms with different characteristics, such as iteration, recursion, and temporarily using another array. Therefore, test cases must be designed based on the program [50]. Input and Output Domains Some values in the input domain of a program have special meanings, and hence must be treated separately [5]. To illustrate this point, let us consider the factorial function. The factorial of a nonnegative integer n is computed as follows: factorial(0) = 1; factorial(1) = 1; factorial(n) = n * factorial(n-1); A programmer may wrongly implement the factorial function as factorial(n) = 1 * 2 * ... * n; without considering the special case of n = 0. The above wrong implementation will produce the correct result for all positive values of n, but will fail for n = 0.
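The defect can be demonstrated with a boundary-value test. Below, `factorial_spec` follows the recursive definition above, while `factorial_wrong` is one plausible faulty reading of “1 * 2 * ... * n” (written here for illustration, seeding the product with n itself) that agrees with the specification for all positive n yet fails for the special value n = 0.

```python
def factorial_spec(n):
    """Recursive definition from the specification above."""
    return 1 if n == 0 else n * factorial_spec(n - 1)

def factorial_wrong(n):
    """Faulty reading of 1 * 2 * ... * n: the product is seeded with n,
    so the special case n = 0 yields 0 instead of 1."""
    result = n
    for i in range(1, n):
        result *= i
    return result
```

A test suite that omits the special value 0 (say, inputs 1 through 5) passes both versions; the single test case < 0, 1 > exposes the fault.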
Sometimes even some output values have special meanings, and a program must be tested to ensure that it produces the special values in all such cases. In the above example, the output value 1 has special significance: (i) it is the minimum value computed by the factorial function and (ii) it is the only value produced for two different inputs. In the integer domain, the values 0 and 1 exhibit special characteristics if arithmetic operations are performed. These characteristics are 0 × x = 0 and 1 × x = x for all values of x. Therefore, all the special values in the input and output domains of a program must be considered while testing the program. Operational Profile As the term suggests, an operational profile is a quantitative characterization of how a system will be used. It was created to guide test engineers in selecting test cases (inputs) using samples of system usage. The notion of operational profiles, or usage profiles, was developed by Mills et al. [52] at IBM in the context of Cleanroom Software Engineering and by Musa [37] at AT&T Bell Laboratories to help develop software systems with better reliability. The idea is to infer, from the observed test results, the future reliability of the software when it is in actual use. To do this, test inputs are assigned a probability distribution, or profile, according to their occurrences in actual operation. The ways test engineers assign probability and select test cases to operate a system may significantly differ from the ways actual users operate a system. However, for accurate estimation of the reliability of a system it is important to test a system by considering the ways it will actually be used in the field. This concept is being used to test web applications, where the user session data are collected from the web servers to select test cases [53, 54].
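Profile-driven test selection can be sketched as weighted random sampling: operations are drawn with probabilities matching their expected field usage. The operations and probabilities below are invented for illustration, not taken from any measured profile.

```python
import random

# Hypothetical operational profile for an ATM-like system:
# operation -> probability of occurrence in actual field usage.
OPERATIONAL_PROFILE = {
    "check_balance": 0.50,
    "withdraw": 0.35,
    "transfer": 0.10,
    "change_pin": 0.05,
}

def sample_operations(n, profile, seed=7):
    """Draw n test operations according to the profile's probabilities."""
    rng = random.Random(seed)                 # seeded for reproducibility
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=n)
```

In a long run the frequency of each sampled operation approaches its profile probability, so reliability inferred from the test results reflects the way the system will actually be used in the field.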
Fault Model Previously encountered faults are an excellent source of information in designing new test cases. The known faults are classified into different classes, such as initialization faults, logic faults, and interface faults, and stored in a repository [55, 56]. Test engineers can use these data in designing tests to ensure that a particular class of faults is not resident in the program. There are three types of fault-based testing: error guessing, fault seeding, and mutation analysis. In error guessing, a test engineer applies his experience to (i) assess the situation and guess where and what kinds of faults might exist, and (ii) design tests to specifically expose those kinds of faults. In fault seeding, known faults are injected into a program, and the test suite is executed to assess the effectiveness of the test suite. Fault seeding makes an assumption that a test suite that finds seeded faults is also likely to find other faults. Mutation analysis is similar to fault seeding, except that mutations to program statements are made in order to determine the fault detection capability of the test suite. If the test cases are not capable of revealing such faults, the test engineer may specify additional test cases to reveal the faults. Mutation testing is based on the idea of fault simulation, whereas fault seeding is based on the idea of fault injection. In the fault injection approach, a fault is inserted into a program, and an oracle is available to assert that the inserted fault indeed made the program incorrect. On the other hand, in fault simulation, a program modification is not guaranteed to lead to a faulty program. In fault simulation, one may modify an incorrect program and turn it into a correct program. 
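A toy mutation analysis illustrates the idea: each mutant changes one operator in the unit under test, and the fault detection capability of the test suite is judged by which mutants it “kills,” that is, on which mutants at least one test fails. The unit, mutants, and test suite below are invented for illustration.

```python
def max_of_two(a, b):
    """Unit under test."""
    return a if a > b else b

def mutant_ge(a, b):
    """Mutant 1: '>' mutated to '>='."""
    return a if a >= b else b

def mutant_lt(a, b):
    """Mutant 2: '>' mutated to '<'."""
    return a if a < b else b

# Existing test suite: <input, expected outcome> pairs for max_of_two.
TEST_SUITE = [((3, 5), 5), ((9, 2), 9), ((4, 4), 4)]

def kills(mutant, suite):
    """A mutant is killed if at least one test case detects the change."""
    return any(mutant(*inputs) != expected for inputs, expected in suite)
```

Here `mutant_lt` is killed but `mutant_ge` survives every test, because it returns the same values as the original; a surviving mutant either reveals a weakness in the suite, prompting additional test cases, or, as in this case, is equivalent to the original program — the simulation did not actually produce a faulty program.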
1.15 WHITE-BOX AND BLACK-BOX TESTING A key idea in Section 1.14 was that test cases need to be designed by considering information from several sources, such as the specification, source code, and special properties of the program’s input and output domains. This is because all those sources provide complementary information to test designers. Two broad concepts in testing, based on the sources of information for test design, are white-box and black-box testing. White-box testing techniques are also called structural testing techniques, whereas black-box testing techniques are called functional testing techniques. In structural testing, one primarily examines source code with a focus on control flow and data flow. Control flow refers to the flow of control from one instruction to another. Control passes from one instruction to another instruction in a number of ways, such as one instruction appearing after another, function call, message passing, and interrupts. Conditional statements alter the normal, sequential flow of control in a program. Data flow refers to the propagation of values from one variable or constant to another variable. Definitions and uses of variables determine the data flow aspect in a program. In functional testing, one does not have access to the internal details of a program and the program is treated as a black box. A test engineer is concerned only with the part that is accessible outside the program, that is, just the input and the externally visible outcome. A test engineer applies input to a program, observes the externally visible outcome of the program, and determines whether or not the program outcome is the expected outcome. Inputs are selected from the program’s requirements specification and properties of the program’s input and output domains. A test engineer is concerned only with the functionality and the features found in the program’s specification.
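The structural view can be made concrete with a small unit (invented here for illustration): the conditional creates two control flow paths, and the variable `label` is defined on each branch and used at the return, which is the data flow aspect a white-box tester examines.

```python
def classify(temperature):
    """Unit with two control flow paths and a simple data flow."""
    if temperature > 30:        # conditional alters the sequential flow
        label = "hot"           # definition of `label` on the true branch
    else:
        label = "mild"          # definition of `label` on the false branch
    return label                # use of whichever definition reached here

# Structural (white-box) test inputs: one per branch, so that both
# control flow paths and both definition-use pairs are exercised.
BRANCH_COVERING_INPUTS = [35, 20]
```

A black-box tester would instead select these inputs from the stated specification of `classify` (the threshold of 30), without looking at the branch structure.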
At this point it is useful to identify a distinction between the scopes of structural testing and functional testing. One applies structural testing techniques to individual units of a program, whereas functional testing techniques can be applied to both an entire system and the individual program units. Since individual programmers know the details of the source code they write, they themselves perform structural testing on their own program units. On the other hand, functional testing is performed at the external interface level of a system, and it is conducted by a separate software quality assurance group. Let us consider a program unit U which is a part of a larger program P. A program unit is just a piece of source code with a well-defined objective and well-defined input and output domains. Now, if a programmer derives test cases for testing U from a knowledge of the internal details of U, then the programmer is said to be performing structural testing. On the other hand, if the programmer designs test cases from the stated objective of the unit U and from his or her knowledge of the special properties of the input and output domains of U, then he or she is said to be performing functional testing on the same unit U. The ideas of structural testing and functional testing do not give programmers and test engineers a choice of whether to design test cases from the source code or from the requirements specification of a program. Rather, these strategies are used by different groups of people at different times during a software’s life cycle. For example, individual programmers use both the structural and functional testing techniques to test their own code, whereas quality assurance engineers apply the idea of functional testing. Neither structural testing nor functional testing is by itself good enough to detect most of the faults.
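A sketch of a missing-path fault makes this claim concrete. Suppose a (hypothetical) specification says that orders over 100 units ship free; the implementation below simply has no code for that condition, so structural testing can reach full statement and branch coverage without ever revealing the fault, while a functional test derived from the specification does.

```python
def shipping_cost(units):
    """Implementation with a missing path: the specification's rule
    'orders over 100 units ship free' has no corresponding code at all."""
    return 5.0 + 0.1 * units

def expected_shipping_cost(units):
    """Oracle derived from the specification, including the missing rule."""
    return 0.0 if units > 100 else 5.0 + 0.1 * units
```

A single input such as 50 covers every statement of `shipping_cost`, yet only a specification-based input such as 200 exposes the missing path.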
Even if one selects all possible inputs, a structural testing technique cannot detect all faults if there are missing paths in a program. Intuitively, a path is said to be missing if there is no code to handle a possible condition. Similarly, without knowledge of the structural details of a program, many faults will go undetected. Therefore, a combination of both structural and functional testing techniques must be used in program testing. 1.16 TEST PLANNING AND DESIGN The purpose of system test planning, or simply test planning, is to get ready and organized for test execution. A test plan provides a framework, scope, details of the resources needed, the effort required, a schedule of activities, and a budget. A framework is a set of ideas, facts, or circumstances within which the tests will be conducted. The stated scope outlines the domain, or extent, of the test activities. The scope covers the managerial aspects of testing, rather than the detailed techniques and specific test cases. Test design is a critical phase of software testing. During the test design phase, the system requirements are critically studied, system features to be tested are thoroughly identified, and the objectives of test cases and the detailed behavior of test cases are defined. Test objectives are identified from different sources, namely, the requirements specification and the functional specification, and one or more test cases are designed for each test objective. Each test case is designed as a combination of modular test components called test steps. These test steps can be combined together to create more complex, multistep tests. A test case is clearly specified so that others can easily borrow, understand, and reuse it. It is interesting to note that a new test-centric approach to system development is gradually emerging. This approach is called test-driven development (TDD) [57].
In test-driven development, programmers design and implement test cases before the production code is written. This approach is a key practice in modern agile software development processes such as XP. The main characteristics of agile software development processes are (i) incremental development, (ii) coding of unit and acceptance tests conducted by the programmers along with customers, (iii) frequent regression testing, and (iv) writing test code, one test case at a time, before the production code. 1.17 MONITORING AND MEASURING TEST EXECUTION Monitoring and measurement are two key principles followed in every scientific and engineering endeavor. The same principles are also applicable to the testing phases of software development. It is important to monitor certain metrics which truly represent the progress of testing and reveal the quality level of the system. Based on those metrics, the management can trigger corrective and preventive actions. By putting a small but critical set of metrics in place, the executive management will be able to know whether they are on the right track [58]. Test execution metrics can be broadly categorized into two classes as follows: • Metrics for monitoring test execution • Metrics for monitoring defects The first class of metrics concerns the process of executing test cases, whereas the second class concerns the defects found as a result of test execution. These metrics need to be tracked and analyzed on a periodic basis, say, daily or weekly. In order to effectively control a test project, it is important to gather valid and accurate information about the project. One such example is to precisely know when to trigger the revert criteria for a test cycle and initiate root cause analysis of the problems before more tests can be performed.
By triggering such revert criteria, a test manager can effectively utilize the time of test engineers, and possibly save money, by suspending a test cycle on a product with too many defects to carry out a meaningful system test. A management team must identify and monitor metrics while testing is in progress so that important decisions can be made [59]. It is important to analyze and understand the test metrics, rather than just collect data and make decisions based on those raw data. Metrics are meaningful only if they enable the management to make decisions which result in lower cost of production, reduced delay in delivery, and improved quality of software systems. Quantitative evaluation is important in every scientific and engineering field. Quantitative evaluation is carried out through measurement. Measurement lets one evaluate parameters of interest in a quantitative manner as follows:

• Evaluate the effectiveness of a technique used in performing a task. One can evaluate the effectiveness of a test generation technique by counting the number of defects detected by test cases generated by following the technique and those detected by test cases generated by other means.
• Evaluate the productivity of the development activities. One can keep track of productivity by counting the number of test cases designed per day, the number of test cases executed per day, and so on.
• Evaluate the quality of the product. By monitoring the number of defects detected per week of testing, one can observe the quality level of the system.
• Evaluate the product testing. For evaluating a product testing process, the following two measurements are critical:

Test case effectiveness metric: The objective of this metric is twofold: (1) measure the “defect revealing ability” of the test suite and (2) use the metric to improve the test design process.
During the unit, integration, and system testing phases, faults are revealed by executing the planned test cases. In addition to these faults, new faults are also found during a testing phase for which no test cases had been designed. For these new faults, new test cases are added to the test suite. Those new test cases are called test case escapes (TCEs). Test escapes occur because of deficiencies in test design. The need for more testing occurs as test engineers get new ideas while executing the planned test cases.

Test effort effectiveness metric: It is important to evaluate the effectiveness of the testing effort in the development of a product. After a product is deployed at the customer’s site, one is interested in knowing the effectiveness of the testing that was performed. A common measure of test effectiveness is the number of defects found by the customers that were not found by the test engineers prior to the release of the product. These defects had escaped our test effort.

1.18 TEST TOOLS AND AUTOMATION

In general, software testing is a highly labor-intensive task. This is because test cases are to a great extent manually generated and often manually executed. Moreover, the results of test executions are manually analyzed. The durations of those tasks can be shortened by using appropriate tools. A test engineer can use a variety of tools, such as a static code analyzer, a test data generator, and a network analyzer, if a network-based application or protocol is under test. Those tools are useful in increasing the efficiency and effectiveness of testing. Test automation is essential for any testing and quality assurance division of an organization that wants to become more efficient.
The benefits of test automation are as follows:

• Increased productivity of the testers
• Better coverage of regression testing
• Reduced durations of the testing phases
• Reduced cost of software maintenance
• Increased effectiveness of test cases

Test automation provides an opportunity for test engineers to improve their skills by writing programs, and hence it improves their morale. They will be more focused on developing automated test cases to avoid being a bottleneck in product delivery to the market. Consequently, software testing becomes less of a tedious job. Test automation improves the coverage of regression testing because of the accumulation of automated test cases over time. Automation allows an organization to create a rich library of reusable test cases and facilitates the execution of a consistent set of test cases. Here consistency means our ability to produce repeated results for the same set of tests. It may be very difficult to reproduce test results in manual testing, because the exact conditions at the time and point of failure may not be precisely known. In automated testing it is easier to set up the initial conditions of a system, thereby making it easier to reproduce test results. Test automation simplifies the debugging work by providing a detailed, unambiguous log of activities and intermediate test steps. This leads to a more organized, structured, and reproducible testing approach. Automated execution of test cases reduces the elapsed time for testing, and, thus, it leads to a shorter time to market. The same automated test cases can be executed in an unsupervised manner at night, thereby efficiently utilizing different platforms, hardware, and configurations. In short, automation increases test execution efficiency. However, at the end of test execution, it is important to analyze the test results to determine the number of test cases that passed or failed. And, if a test case failed, one analyzes the reasons for its failure.
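The reproducibility and logging benefits described above can be pictured with a minimal harness. This is only a sketch; the harness API and the discount operation under test are invented for illustration:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("harness")

def run_test(name, setup, action, expected):
    """Run one automated test from a known initial state and log every
    step, so that a failure can be reproduced from the log alone."""
    log.info("TEST %s: setting up initial state %r", name, setup)
    state = dict(setup)                # fresh, known initial conditions
    log.info("TEST %s: executing action", name)
    actual = action(state)
    verdict = "PASS" if actual == expected else "FAIL"
    log.info("TEST %s: expected=%r actual=%r -> %s",
             name, expected, actual, verdict)
    return verdict == "PASS"

# Hypothetical operation under test: apply a 10% discount to an order.
def apply_discount(state):
    return round(state["price"] * 0.9, 2)

assert run_test("discount-basic", {"price": 200.0}, apply_discount, 180.0)
```

Because the initial state is set up explicitly and every step is logged, the same run can be repeated unsupervised, and a failing run leaves behind exactly the information needed to reproduce it.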
In the long run, test automation is cost-effective. It drastically reduces the software maintenance cost. In the sustaining phase of a software system, the number of regression tests required after each change to the system is large. As a result, regression testing becomes too time consuming and labor intensive without automation.

A repetitive type of testing is very cumbersome and expensive to perform manually, but it can be automated easily using software tools. A simple repetitive type of application can reveal memory leaks in a software system. However, the application has to be run for a significantly long duration, say, for weeks, to reveal memory leaks. Therefore, manual testing may not be justified, whereas with automation it is easy to reveal memory leaks. For example, stress testing is a prime candidate for automation. Stress testing requires a worst-case load for an extended period of time, which is very difficult to realize by manual means. Scalability testing is another area that can be automated. Instead of creating a large test bed with hundreds of pieces of equipment, one can develop a simulator to verify the scalability of the system.

Test automation is very attractive, but it comes with a price tag. Sufficient time and resources need to be allocated for the development of an automated test suite. Development of automated test cases needs to be managed like a programming project. That is, it should be done in an organized manner; otherwise it is highly likely to fail. An automated test suite may take longer to develop because the test suite needs to be debugged before it can be used for testing. Sufficient time and resources need to be allocated for maintaining an automated test suite and setting up a test environment. Moreover, every time the system is modified, the modification must be reflected in the automated test suite.
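The memory leak scenario mentioned above is a good example of what such repetitive automation looks like. The sketch below uses Python's tracemalloc module to watch the memory footprint of a repeatedly executed operation; the leaky operation and the small number of rounds are invented for illustration (a real run would last hours or days rather than a few rounds):

```python
import tracemalloc

_leak = []  # simulates state that grows on every call

def leaky_operation():
    _leak.append(bytearray(10_000))  # hypothetical operation under test

def memory_grows(operation, rounds=5, calls_per_round=100):
    """Run the operation repeatedly and report whether the traced
    memory footprint grows monotonically across rounds."""
    tracemalloc.start()
    sizes = []
    for _ in range(rounds):
        for _ in range(calls_per_round):
            operation()
        current, _peak = tracemalloc.get_traced_memory()
        sizes.append(current)
    tracemalloc.stop()
    return all(b > a for a, b in zip(sizes, sizes[1:]))

assert memory_grows(leaky_operation) is True
```

An automated nightly job can run such a check unsupervised and flag the leak long before a human tester would notice it.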
Therefore, an automated test suite should be designed as a modular system, coordinated into reusable libraries, and cross-referenced and traceable back to the feature being tested. It is important to remember that test automation cannot replace manual testing. Human creativity, variability, and observability cannot be mimicked through automation. Automation cannot detect some problems that can be easily observed by a human being. Automated testing does not introduce minor variations the way a human can. Certain categories of tests, such as usability, interoperability, robustness, and compatibility, are often not suited for automation. It is too difficult to automate all the test cases; usually 50% of all the system-level test cases can be automated. There will always be a need for some manual testing, even if all the system-level test cases are automated.

The objective of test automation is not to reduce the head counts in the testing department of an organization, but to improve the productivity, quality, and efficiency of test execution. In fact, test automation requires a larger head count in the testing department in the first year, because the department needs to automate the test cases and simultaneously continue the execution of manual tests. Even after the completion of the development of a test automation framework and test case libraries, the head count in the testing department does not drop below its original level. The test organization needs to retain the original team members in order to improve the quality by adding more test cases to the automated test case repository.

Before a test automation project can proceed, the organization must assess and address a number of considerations. The following list of prerequisites must be considered for an assessment of whether the organization is ready for test automation:

• The test cases to be automated are well defined.
• Test tools and an infrastructure are in place.
• The test automation professionals have prior successful experience in automation.
• Adequate budget has been allocated for the procurement of software tools.

1.19 TEST TEAM ORGANIZATION AND MANAGEMENT

Testing is a distributed activity conducted at different levels throughout the life cycle of a software system. These different levels are unit testing, integration testing, system testing, and acceptance testing. It is logical to have different testing groups in an organization for each level of testing. However, it is more logical—and is the case in reality—that unit-level tests be developed and executed by the programmers themselves rather than by an independent group of unit test engineers. The programmer who develops a software unit should take the ownership and responsibility of producing good-quality software to his or her satisfaction. System integration testing is performed by the system integration test engineers. The integration test engineers involved need to know the software modules very well. This means that all development engineers who collectively built all the units being integrated need to be involved in integration testing. Also, the integration test engineers should thoroughly know the build mechanism, which is key to integrating large systems. A team for performing system-level testing is truly separated from the development team, and it usually has a separate head count and a separate budget. The mandate of this group is to ensure that the system requirements have been met and the system is acceptable. Members of the system test group conduct different categories of tests, such as functionality, robustness, stress, load, scalability, reliability, and performance. They also execute business acceptance tests identified in the user acceptance test plan to ensure that the system will eventually pass user acceptance testing at the customer site.
However, the real user acceptance testing is executed by the client’s special user group. The user group consists of people from different backgrounds, such as software quality assurance engineers, business associates, and customer support engineers. It is a common practice to create a temporary user acceptance test group consisting of people with different backgrounds, such as integration test engineers, system test engineers, customer support engineers, and marketing engineers. Once the user acceptance testing is completed, the group is dismantled. It is recommended to have at least two test groups in an organization: an integration test group and a system test group. Hiring and retaining test engineers are challenging tasks. The interview is the primary mechanism for evaluating applicants. Interviewing is a skill that improves with practice. It is necessary to have a recruiting process in place in order to be effective in hiring excellent test engineers. In order to retain test engineers, the management must recognize the importance of testing efforts on par with development efforts. The management should treat the test engineers as professionals and as a part of the overall team that delivers quality products.

1.20 OUTLINE OF BOOK

With the above high-level introduction to quality and software testing, we are now in a position to outline the remaining chapters. Each chapter in the book covers technical, process, and/or managerial topics related to software testing. The topics have been designed and organized to facilitate the reader to become a software test specialist. In Chapter 2 we provide a self-contained introduction to the theory and limitations of software testing. Chapters 3–6 treat unit testing techniques one by one, as quantitatively as possible. These chapters describe both static and dynamic unit testing.
Static unit testing has been presented within a general framework called code review, rather than as the individual techniques called inspection and walkthrough. Dynamic unit testing, or execution-based unit testing, focuses on control flow, data flow, and domain testing. The JUnit framework, which is used to create and execute dynamic unit tests, is introduced. We discuss some tools for effectively performing unit testing. Chapter 7 discusses the concept of integration testing. Specifically, five kinds of integration techniques, namely, top down, bottom up, sandwich, big bang, and incremental, are explained. Next, we discuss the integration of hardware and software components to form a complete system. We introduce a framework to develop a plan for system integration testing. The chapter is completed with a brief discussion of integration testing of off-the-shelf components. Chapters 8–13 discuss various aspects of system-level testing. These six chapters introduce the reader to the technical details of system testing that is the practice in industry. These chapters promote both qualitative and quantitative evaluation of a system testing process. The chapters emphasize the need for having an independent system testing group. A process for monitoring and controlling system testing is clearly explained. Chapter 14 is devoted to acceptance testing, which includes acceptance testing criteria, planning for acceptance testing, and acceptance test execution. Chapter 15 contains the fundamental concepts of software reliability and their application to software testing. We discuss the notion of an operational profile and its application in system testing. We conclude the chapter with an example showing how to determine the time to release a system by estimating the additional length of system testing needed. The additional testing time is calculated by using the idea of software reliability.
In Chapter 16, we present the structure of test groups and how these groups can be organized in a software company. Next, we discuss how to hire and retain test engineers by providing training, instituting a reward system, and establishing an attractive career path for them within the testing organization. We conclude this chapter with a description of how to build and manage a test team with a focus on teamwork rather than individual gain. Chapters 17 and 18 explain the concepts of software quality and different maturity models. Chapter 17 focuses on quality factors and criteria and describes the ISO 9126 and ISO 9000:2000 standards. Chapter 18 covers the CMM, which was developed by the SEI at Carnegie Mellon University. Two test-related models, namely the TPI model and the TMM, are explained at the end of Chapter 18. We define the key words used in the book in a glossary at the end of the book. The reader will find about 10 practice exercises at the end of each chapter. A list of references is included at the end of each chapter for a reader who would like to find more detailed discussions of some of the topics. Finally, each chapter, except this one, contains a literature review section that, essentially, provides pointers to more advanced material related to the topics. The more advanced materials are based on current research and alternate viewpoints.

REFERENCES

1. B. Davis, C. Skube, L. Hellervik, S. Gebelein, and J. Sheard. Successful Manager’s Handbook. Personnel Decisions International, Minneapolis, 1996.
2. M. Walton. The Deming Management Method. The Berkley Publishing Group, New York, 1986.
3. W. E. Deming. Transcript of Speech to GAO Roundtable on Product Quality—Japan vs. the United States. Quality Progress, March 1994, pp. 39–44.
4. W. A. Shewhart. Economic Control of Quality of Manufactured Product. Van Nostrand, New York, 1931.
5. W. A. Shewhart.
The Application of Statistics as an Aid in Maintaining Quality of a Manufactured Product. Journal of the American Statistical Association, December 1925, pp. 546–548.
6. National Institute of Standards and Technology, Baldrige National Quality Program, 2008. Available: http://www.quality.nist.gov/.
7. J. Liker and D. Meier. The Toyota Way Fieldbook. McGraw-Hill, New York, 2005.
8. M. Poppendieck and T. Poppendieck. Implementing Lean Software Development: From Concept to Cash. Addison-Wesley, Reading, MA, 2006.
9. A. B. Godfrey and A. I. C. Endres. The Evolution of Quality Management Within Telecommunications. IEEE Communications Magazine, October 1994, pp. 26–34.
10. M. Pecht and W. R. Boulton. Quality Assurance and Reliability in the Japanese Electronics Industry. Japanese Technology Evaluation Center (JTEC), Report on Electronic Manufacturing and Packaging in Japan, W. R. Boulton, Ed. International Technology Research Institute at Loyola College, February 1995, pp. 115–126.
11. A. V. Feigenbaum. Total Quality Control, 4th ed. McGraw-Hill, New York, 2004.
12. K. Ishikawa. What Is Total Quality Control. Prentice-Hall, Englewood Cliffs, NJ, 1985.
13. A. Cockburn. What Engineering Has in Common With Manufacturing and Why It Matters. Crosstalk, the Journal of Defense Software Engineering, April 2007, pp. 4–7.
14. S. Land. Jumpstart CMM/CMMI Software Process Improvement. Wiley, Hoboken, NJ, 2005.
15. W. E. Deming. Out of the Crisis. MIT, Cambridge, MA, 1986.
16. J. M. Juran and A. B. Godfrey. Juran’s Quality Handbook, 5th ed. McGraw-Hill, New York, 1998.
17. P. Crosby. Quality Is Free. New American Library, New York, 1979.
18. D. A. Garvin. What Does “Product Quality” Really Mean? Sloan Management Review, Fall 1984, pp. 25–43.
19. J. A. McCall, P. K. Richards, and G. F. Walters. Factors in Software Quality, Technical Report RADC-TR-77-369. U.S. Department of Commerce, Washington, DC, 1977.
20. International Organization for Standardization (ISO).
Quality Management Systems—Fundamentals and Vocabulary, ISO 9000:2000. ISO, Geneva, December 2000.
21. International Organization for Standardization (ISO). Quality Management Systems—Guidelines for Performance Improvements, ISO 9004:2000. ISO, Geneva, December 2000.
22. International Organization for Standardization (ISO). Quality Management Systems—Requirements, ISO 9001:2000. ISO, Geneva, December 2000.
23. T. Koomen and M. Pol. Test Process Improvement. Addison-Wesley, Reading, MA, 1999.
24. I. Burnstein. Practical Software Testing. Springer, New York, 2003.
25. L. Osterweil et al. Strategic Directions in Software Quality. ACM Computing Surveys, December 1996, pp. 738–750.
26. M. A. Friedman and J. M. Voas. Software Assessment: Reliability, Safety, Testability. Wiley, New York, 1995.
27. M. D. Ernst. Static and Dynamic Analysis: Synergy and Duality. Paper presented at the ICSE Workshop on Dynamic Analysis, Portland, OR, May 2003, pp. 24–27.
28. L. Baresi and M. Pezzè. An Introduction to Software Testing, Electronic Notes in Theoretical Computer Science. Elsevier, Vol. 148, February 2006, pp. 89–111.
29. K. Beck and C. Andres. Extreme Programming Explained: Embrace Change, 2nd ed. Addison-Wesley, Reading, MA, 2004.
30. B. W. Boehm. Software Engineering Economics. Prentice-Hall, Englewood Cliffs, NJ, 1981.
31. J. C. Laprie. Dependability—Its Attributes, Impairments and Means. In Predictably Dependable Computing Systems, B. Randall, J. C. Laprie, H. Kopetz, and B. Littlewood, Eds. Springer-Verlag, New York, 1995.
32. R. Chillarege. What Is Software Failure. IEEE Transactions on Reliability, September 1996, pp. 354–355.
33. R. Rees. What Is a Failure. IEEE Transactions on Reliability, June 1997, p. 163.
34. B. Parhami. Defect, Fault, Error, . . . , or Failure. IEEE Transactions on Reliability, December 1997, pp. 450–451.
35. M. R. Lyu. Handbook of Software Reliability Engineering. McGraw-Hill, New York, 1995.
36. R. Hamlet. Random Testing.
In Encyclopedia of Software Engineering, J. Marciniak, Ed. Wiley, New York, 1994, pp. 970–978.
37. J. D. Musa. Software Reliability Engineering. IEEE Software, March 1993, pp. 14–32.
38. J. D. Musa. A Theory of Software Reliability and Its Application. IEEE Transactions on Software Engineering, September 1975, pp. 312–327.
39. M. Glinz and R. J. Wieringa. Stakeholders in Requirements Engineering. IEEE Software, March–April 2007, pp. 18–20.
40. A. Bertolino and L. Strigini. On the Use of Testability Measures for Dependability Assessment. IEEE Transactions on Software Engineering, February 1996, pp. 97–108.
41. A. Bertolino and E. Marchetti. A Brief Essay on Software Testing. In Software Engineering, Vol. 1, The Development Process, 3rd ed., R. H. Thayer and M. J. Christensen, Eds. Wiley–IEEE Computer Society Press, Hoboken, NJ, 2005.
42. D. Jeffrey and N. Gupta. Improving Fault Detection Capability by Selectively Retaining Test Cases during Test Suite Reduction. IEEE Transactions on Software Engineering, February 2007, pp. 108–123.
43. Z. Li, M. Harman, and R. M. Hierons. Search Algorithms for Regression Test Case Prioritization. IEEE Transactions on Software Engineering, April 2007, pp. 225–237.
44. W. Masri, A. Podgurski, and D. Leon. An Empirical Study of Test Case Filtering Techniques Based on Exercising Information Flows. IEEE Transactions on Software Engineering, July 2007, pp. 454–477.
45. W. W. Royce. Managing the Development of Large Software Systems: Concepts and Techniques. In Proceedings of IEEE WESCON, August 1970, pp. 1–9. Republished in ICSE, Monterey, 1987, pp. 328–338.
46. L. Rising and N. S. Janoff. The Scrum Software Development Process for Small Teams. IEEE Software, July/August 2000, pp. 2–8.
47. K. Schwaber. Agile Project Management with Scrum. Microsoft Press, Redmond, WA, 2004.
48. H. Takeuchi and I. Nonaka. The New Product Development Game. Harvard Business Review, Boston, January–February 1986, pp. 1–11.
49. A. P. Mathur.
Foundation of Software Testing. Pearson Education, New Delhi, 2007.
50. P. Ammann and J. Offutt. Introduction to Software Testing. Cambridge University Press, Cambridge, 2008.
51. M. Pezzè and M. Young. Software Testing and Analysis: Process, Principles, and Techniques. Wiley, Hoboken, NJ, 2007.
52. H. D. Mills, M. Dyer, and R. C. Linger. Cleanroom Software Engineering. IEEE Software, September 1987, pp. 19–24.
53. S. Elbaum, G. Rothermel, S. Karre, and M. Fisher II. Leveraging User Session Data to Support Web Application Testing. IEEE Transactions on Software Engineering, March 2005, pp. 187–202.
54. S. Sampath, S. Sprenkle, E. Gibson, L. Pollock, and A. S. Greenwald. Applying Concept Analysis to User-Session-Based Testing of Web Applications. IEEE Transactions on Software Engineering, October 2007, pp. 643–657.
55. A. Endres. An Analysis of Errors and Their Causes in System Programs. IEEE Transactions on Software Engineering, June 1975, pp. 140–149.
56. T. J. Ostrand and E. J. Weyuker. Collecting and Categorizing Software Error Data in an Industrial Environment. Journal of Systems and Software, November 1984, pp. 289–300.
57. K. Beck. Test-Driven Development. Addison-Wesley, Reading, MA, 2003.
58. D. Lemont. CEO Discussion—From Start-up to Market Leader—Breakthrough Milestones. Ernst and Young Milestones, Boston, May 2004, pp. 9–11.
59. G. Stark, R. C. Durst, and C. W. Vowell. Using Metrics in Management Decision Making. IEEE Computer, September 1994, pp. 42–48.
60. B. Kitchenham and S. L. Pfleeger. Software Quality: The Elusive Target. IEEE Software, January 1996, pp. 12–21.
61. J. Kilpatrick. Lean Principles. http://www.mep.org/textfiles/LeanPrinciples.pdf, 2003, pp. 1–5.

Exercises

1. Explain the principles of statistical quality control. What are the tools used for this purpose? Explain the principle of a control chart.
2. Explain the concept of lean principles.
3. What is an “Ishikawa” diagram?
When should the Ishikawa diagram be used? Provide a procedure to construct an Ishikawa diagram.
4. What is total quality management (TQM)? What is the difference between TQM and TQC?
5. Explain the differences between validation and verification.
6. Explain the differences between failure, error, and fault.
7. What is a test case? What are the objectives of testing?
8. Explain the concepts of unit, integration, system, acceptance, and regression testing.
9. What are the different sources from which test cases can be selected?
10. What is the difference between fault injection and fault simulation?
11. Explain the differences between structural and functional testing.
12. What are the strengths and weaknesses of automated testing and manual testing?

CHAPTER 2 Theory of Program Testing

He who loves practice without theory is like the sailor who boards [a] ship without a rudder and compass and never knows where he may cast.
— Leonardo da Vinci

2.1 BASIC CONCEPTS IN TESTING THEORY

The idea of program testing is as old as computer programming. As computer programs got larger and larger since their early days in the 1960s, the need for eliminating defects from them in a systematic manner received more attention. Both the research community and the practitioners became more deeply involved in software testing. Thus, in the 1970s, a new field of research called testing theory emerged.
Testing theory puts emphasis on the following:

• Detecting defects through execution-based testing
• Designing test cases from different sources, namely, requirement specification, source code, and the input and output domains of programs
• Selecting a subset of test cases from the set of all possible test cases [1, 2]
• Effectiveness of the test case selection strategy [3–5]
• Test oracles used during testing [6, 7]
• Prioritizing the execution of the selected test cases [8]
• Adequacy analysis of test cases [9–15]

A theoretical foundation of testing gives testers and developers valuable insight into software systems and the development processes. As a consequence, testers design more effective test cases at a lower cost. While considering testing theory, there may be a heightened expectation that it lets us detect all the defects in a computer program. Any testing theory must inherit the fundamental limitation of testing. The limitation of testing has been best articulated by Dijkstra: Testing can only reveal the presence of errors, never their absence [16]. In spite of the said limitation, testing remains the most practical and reliable method for defect detection and quality improvement.

Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy. Copyright © 2008 John Wiley & Sons, Inc.

In this chapter, three well-known testing theories are discussed. These are Goodenough and Gerhart’s theory [17], Weyuker and Ostrand’s theory [18], and Gourlay’s theory [19]. Goodenough and Gerhart introduced some key concepts such as an ideal test, reliability and validity of a test, test selection criteria, thorough test, and five categories of program errors. Weyuker and Ostrand refined some of the above ideas in the form of uniformly reliable criterion, uniformly valid criterion, and uniformly ideal test.
Gourlay introduced the concept of a test system and a general method for comparing different test methods.

2.2 THEORY OF GOODENOUGH AND GERHART

Goodenough and Gerhart published a seminal paper [17] in 1975 on test data selection. This paper gave a fundamental testing concept, identified a few types of program errors, and gave a theory for selecting test data from the input domain of a program. Though this theory is not without critiques, it is widely quoted and appreciated in the research community of software testing.

2.2.1 Fundamental Concepts

Let D be the input domain of a program P. Let T ⊆ D. The result of executing P with input d ∈ D is denoted by P(d).

Figure 2.1 Executing a program with a subset of the input domain.

OK(d): Define a predicate OK(d) which expresses the acceptability of result P(d). Thus, OK(d) = true if and only if P(d) is an acceptable outcome.

SUCCESSFUL(T): For a given T ⊆ D, T is a successful test, denoted by SUCCESSFUL(T), if and only if ∀t ∈ T, OK(t). Thus, SUCCESSFUL(T) = true if and only if ∀t ∈ T, OK(t).

Ideal Test: T constitutes an ideal test if

OK(t) ∀t ∈ T ⇒ OK(d) ∀d ∈ D

An ideal test is interpreted as follows. If from the successful execution of a sample of the input domain we can conclude that the program contains no errors, then the sample constitutes an ideal test. Practitioners may loosely interpret “no error” as “not many errors of severe consequences.” The validity of the above definition of an ideal test depends on how “thoroughly” T exercises P. Some people equate a thorough test with an exhaustive or complete test, in which case T = D.

COMPLETE(T, C): A thorough test T is defined to be one satisfying COMPLETE(T, C), where COMPLETE is a predicate that defines how a test selection criterion C is used in selecting a particular set of test data T from D. COMPLETE(T, C) will be defined in a later part of this section.
Essentially, C defines the properties of a program that must be exercised to constitute a thorough test.

Reliable Criterion: A selection criterion C is reliable if and only if either every test selected by C is successful or no test selected is successful. Thus, reliability refers to consistency.

Valid Criterion: A selection criterion C is valid if and only if whenever P is incorrect, C selects at least one test set T which is not successful for P. Thus, validity refers to the ability to produce meaningful results.

Fundamental Theorem.

(∃T ⊆ D)(COMPLETE(T, C) ∧ RELIABLE(C) ∧ VALID(C) ∧ SUCCESSFUL(T)) ⇒ (∀d ∈ D) OK(d)

Proof. Let P be a program and D be the set of inputs for P. Let d be a member of D. We assume that P fails on input d. In other words, the actual outcome of executing P with input d is not the same as the expected outcome. In the form of our notation, ¬OK(d) is true. VALID(C) implies that there exists a complete set of test data T such that ¬SUCCESSFUL(T). RELIABLE(C) implies that if one complete test fails, all complete tests fail. However, this contradicts the assumption that there exists a complete test that is successfully executed.

One may be tempted to find a reliable and valid criterion, if it exists, so that all faults can be detected with a small set of test cases. However, there are several difficulties in applying the above theory, as explained in the following:

• Since faults in a program are unknown, it is impossible to prove the reliability and validity of a criterion. A criterion is guaranteed to be both reliable and valid if it selects the entire input domain D. However, this is undesirable and impractical.
• Neither reliability nor validity is preserved during the debugging process, where faults keep disappearing.
• If the program P is correct, then any test will be successful and every selection criterion is reliable and valid.
• If P is not correct, there is in general no way of knowing whether a criterion is ideal without knowing the errors in P.

2.2.2 Theory of Testing

Let D be the input domain of a program P. Let C denote a set of test predicates. If d ∈ D satisfies test predicate c ∈ C, then c(d) is said to be true. Selecting data to satisfy a test predicate means selecting data to exercise the corresponding combination of conditions in the course of executing P. With the above idea in mind, COMPLETE(T, C), where T ⊆ D, is defined as follows:

COMPLETE(T, C) ≡ (∀c ∈ C)(∃t ∈ T) c(t) ∧ (∀t ∈ T)(∃c ∈ C) c(t)

The above definition means that, for every test predicate, we select a test such that the test predicate is satisfied, and, for every test selected, there exists a test predicate which is satisfied by the selected test.

The definitions of an ideal test and thoroughness of a test do not reveal any relationship between them. However, we can establish a relationship between the two in the following way. Let B be the set of faults (or bugs) in a program P revealed by an ideal test TI. Let a test engineer identify a set of test predicates C1 and design a set of test cases T1 such that COMPLETE(T1, C1) is satisfied. Let B1 represent the set of faults revealed by T1. There is no guarantee that T1 reveals all the faults. Later, the test engineer identifies a larger set of test predicates C2 such that C2 ⊃ C1 and designs a new set of test cases T2 such that T2 ⊃ T1 and COMPLETE(T2, C2) is satisfied. Let B2 be the set of faults revealed by T2. Assuming that the additional test cases reveal more faults, we have B2 ⊃ B1. If the test engineer repeats this process, he may ultimately identify a set of test predicates CI and design a set of test cases TI such that COMPLETE(TI, CI) is satisfied and TI reveals the entire set of faults B. In this case, TI is a thorough test satisfying COMPLETE(TI, CI) and represents an ideal test set.
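The two-sided COMPLETE condition can likewise be checked mechanically. The C sketch below is illustrative only; the predicates is_negative and is_zero are assumptions standing in for a real set C of test predicates.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*predicate)(int);   /* a test predicate c; c(t) nonzero if t satisfies c */

/* COMPLETE(T, C): every predicate in C is satisfied by some t in T,
   and every t in T satisfies some predicate in C. */
static int COMPLETE(const int T[], size_t nT, const predicate C[], size_t nC) {
    for (size_t j = 0; j < nC; j++) {        /* (for all c in C)(exists t in T) c(t) */
        size_t i = 0;
        while (i < nT && !C[j](T[i])) i++;
        if (i == nT) return 0;               /* predicate j satisfied by no test */
    }
    for (size_t i = 0; i < nT; i++) {        /* (for all t in T)(exists c in C) c(t) */
        size_t j = 0;
        while (j < nC && !C[j](T[i])) j++;
        if (j == nC) return 0;               /* test i satisfies no predicate */
    }
    return 1;
}

/* Two illustrative test predicates. */
static int is_negative(int t) { return t < 0; }
static int is_zero(int t)     { return t == 0; }
```

For C = {is_negative, is_zero}, the test set {−5, 0} is complete, while {−5, 7} is not: is_zero is satisfied by no selected test, and 7 satisfies no predicate.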
2.2.3 Program Errors

Any approach to testing is based on assumptions about the way program faults occur. Faults are due to two main reasons:

• Faults occur due to our inadequate understanding of all conditions with which a program must deal.

• Faults occur due to our failure to realize that certain combinations of conditions require special treatment.

Goodenough and Gerhart classify program faults as follows:

• Logic Fault: This class of faults means a program produces incorrect results independent of resources required. That is, the program fails because of the faults present in the program and not because of a lack of resources. Logic faults can be further split into three categories:

Requirements fault: This represents our failure to capture the real requirements of the customer.

Design fault: This represents our failure to satisfy an understood requirement.

Construction fault: This represents our failure to satisfy a design. Suppose that a design step says "Sort array A." To sort an array with N elements, one may choose one of several sorting algorithms. Let

for (i = 0; i < N; i++) {
    ...
}

be the desired for loop construct to sort the array. If a programmer writes the for loop in the form

for (i = 0; i <= N; i++) {
    ...
}

then there is a construction fault in the implementation.

• Performance Fault: This class of faults leads to a failure of the program to produce expected results within specified or desired resource limitations.

A thorough test must be able to detect faults arising from any of the above reasons. Test data selection criteria must reflect information derived from each stage of software development. Since each type of fault is manifested as an improper effect produced by an implementation, it is useful to categorize the sources of faults in implementation terms as follows:

Missing Control Flow Paths: Intuitively, a control flow path, or simply a path, is a feasible sequence of instructions in a program.
A path may be missing from a program if we fail to identify a condition and specify a path to handle that condition. An example of a missing path is our failure to test for a zero divisor before executing a division. If we fail to recognize that a divisor can take a zero value, then we will not include a piece of code to handle the special case. Thus, a certain desirable computation will be missing from the program.

Inappropriate Path Selection: A program executes an inappropriate path if a condition is expressed incorrectly. In Figure 2.2, we show a desired behavior and an implemented behavior. Both behaviors are identical except in the condition part of the if statement; the if part of the implemented behavior contains an additional condition B.

Desired behavior:
if (A) proc1();
else proc2();

Implemented behavior:
if (A && B) proc1();
else proc2();

Figure 2.2 Example of inappropriate path selection.

It is easy to see that the desired part and the implemented part behave in the same way for all combinations of values of A and B except when A = 1 and B = 0.

Inappropriate or Missing Action: There are three instances of this class of fault:

• One may calculate a value using a method that does not necessarily give the correct result. For example, a desired expression is x = x × w, whereas it is wrongly written as x = x + w. These two expressions produce identical results for several combinations of x and w, such as x = 1.5 and w = 3.

• Failing to assign a value to a variable is an example of a missing action.

• Calling a function with the wrong argument list is a kind of inappropriate action.

The main danger due to an inappropriate or missing action is that the action is incorrect only under certain combinations of conditions. Therefore, one must do the following to find test data that reliably reveal errors:

• Identify all the conditions relevant to the correct operation of a program.
• Select test data to exercise all possible combinations of these conditions.

The above idea of selecting test data leads us to define the following terms:

Test Data: Test data are actual values from the input domain of a program that collectively satisfy some test selection criterion.

Test Predicate: A test predicate is a description of conditions and combinations of conditions relevant to the correct operation of the program:

• Test predicates describe the aspects of a program that are to be tested. Test data cause these aspects to be tested.

• Test predicates are the motivating force for test data selection.

• Components of test predicates arise first and primarily from the specification for a program.

• Further conditions and predicates may be added as implementations are considered.

2.2.4 Conditions for Reliability

A set of test predicates must at least satisfy the following conditions to have any chance of being reliable. These conditions are key to meaningful testing:

• Every individual branching condition in a program must be represented by an equivalent condition in C.

• Every potential termination condition in the program, for example, an overflow, must be represented by a condition in C.

• Every condition relevant to the correct operation of the program that is implied by the specification and knowledge of the data structures of the program must be represented as a condition in C.

2.2.5 Drawbacks of Theory

Several difficulties prevent us from applying Goodenough and Gerhart's theory of an ideal test [18]:

• The concepts of reliability and validity have been defined with respect to the entire input domain of a program. A criterion is guaranteed to be both reliable and valid if and only if it selects the entire domain as a single test. Since such exhaustive testing is impractical, one will have much difficulty in assessing the reliability and validity of a criterion.
• The concepts of reliability and validity have been defined with respect to a program. A test selection criterion that is reliable and valid for one program may not be so for another program. The goodness of a test set should be independent of individual programs and the faults therein.

• Neither validity nor reliability is preserved throughout the debugging process. In practice, as program failures are observed, the program is debugged to locate the faults, and the faults are generally fixed as soon as they are found. During this debugging phase, as the program changes, so does the idealness of a test set. This is because a fault that was revealed before debugging is no longer revealed after debugging and fault fixing. Thus, properties of test selection criteria are not even "monotonic" in the sense of being either always gained or preserved or always lost or preserved.

2.3 THEORY OF WEYUKER AND OSTRAND

A key problem in the theory of Goodenough and Gerhart is that the reliability and validity of a criterion depend upon the presence of faults in a program and their types. Weyuker and Ostrand [18] provide a modified theory in which the validity and reliability of test selection criteria depend only on the program specification, rather than on a program. They propose the concept of a uniformly ideal test selection criterion for a given output specification. In the theory of Goodenough and Gerhart, implicit in the definitions of the predicates OK(d) and SUCCESSFUL(T) is a program P. By abbreviating SUCCESSFUL() as SUCC(), the two predicates are rewritten as follows:

OK(P, d): Define a predicate OK(P, d) which expresses the acceptability of result P(d). Thus, OK(P, d) = true if and only if P(d) is an acceptable outcome of program P.

SUCC(P, T): For a given T ⊆ D, T is a successful test for a program P, denoted by SUCC(P, T), if and only if ∀t ∈ T, OK(P, t). Thus, SUCC(P, T) = true if and only if ∀t ∈ T, OK(P, t).
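The reason for carrying P as an explicit parameter can be seen in a small C sketch. The absolute-value specification and the two toy programs below are assumptions for illustration: the same test set is successful for one program and unsuccessful for another under the same specification, which is exactly the program dependence that the uniform definitions set out to remove.

```c
#include <assert.h>

typedef int (*program)(int);

/* Specification: the acceptable outcome for input d is |d|. */
static int spec(int d) { return d < 0 ? -d : d; }

/* OK(P, d): P's outcome on d is acceptable under the specification. */
static int OK(program P, int d) { return P(d) == spec(d); }

/* SUCC(P, T): OK(P, t) holds for every t in the test set T of size n. */
static int SUCC(program P, const int T[], int n) {
    for (int i = 0; i < n; i++)
        if (!OK(P, T[i]))
            return 0;
    return 1;
}

static int p(int x) { return x < 0 ? -x : x; }  /* correct: p corr s */
static int q(int x) { return x; }               /* faulty for negative x */
```

A test set containing only nonnegative inputs, such as {0, 5}, is successful for both p and q, whereas {−2, 0, 5} is successful for p alone; a criterion that selects only the former behaves ideally for p but not for q.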
With the above definitions of OK(P, d) and SUCC(P, T), the concepts of a uniformly valid criterion, a uniformly reliable criterion, and uniformly ideal test selection are defined as follows.

Uniformly Valid Criterion C: Criterion C is uniformly valid iff

(∀P)[(∃d ∈ D)(¬OK(P, d)) ⇒ (∃T ⊆ D)(C(T) & ¬SUCC(P, T))]

Uniformly Reliable Criterion C: Criterion C is uniformly reliable iff

(∀P)(∀T1, T2 ⊆ D)[(C(T1) & C(T2)) ⇒ (SUCC(P, T1) ⇔ SUCC(P, T2))]

Uniformly Ideal Test Selection: A uniformly ideal test selection criterion for a given specification is both uniformly valid and uniformly reliable.

In both definitions, the outermost quantifier (∀P) binding the free variable P means that the rest of the predicate holds for all programs P for a given output specification. Since a uniformly ideal test selection criterion is defined over all programs for a given specification, it was intended to solve all the program-dependent difficulties in the definitions given by Goodenough and Gerhart. However, the concept of uniformly ideal test selection also has several flaws. For example, for any significant program there can be no uniformly ideal criterion that is not trivial in the sense of selecting the entire input domain D. A criterion C is said to be trivially valid if the union of all tests selected by C is D. Hence, the following theorems.

Theorem. A criterion C is uniformly valid if and only if C is trivially valid.

Proof. Obviously, a trivially valid criterion is uniformly valid. Now we need to show that a criterion C which is not trivially valid cannot be uniformly valid for a given output specification.
For any element d not included in any test selected by C, one can write a program which is incorrect for d and correct for D − {d}. Every test selected by C is successful for this program even though the program is incorrect; hence C is not uniformly valid.

Theorem. A criterion C is uniformly reliable if and only if C selects a single test set.

Proof. If C selects only one test set, it is obviously reliable for any program. Now, assume that C selects two different test sets T1 and T2 and that t ∈ T1 but t ∉ T2. A program P exists which is correct with respect to the test inputs in T2 but incorrect on t. Thus, the two test sets yield different results for P, and C is not reliable.

Now, we can combine the above two theorems to obtain the following corollary.

Corollary. A criterion C is uniformly valid and uniformly reliable if and only if C selects only the single test set T = D.

An important implication of the above corollary is that uniform validity and uniform reliability lead to exhaustive testing, and exhaustive testing is considered to be impractical. The corollary can be reformulated to state that, irrespective of the test selection criterion used and irrespective of the tests selected, short of the entire D, one can always write a program which defeats the tests. A program P is said to defeat a test T if P passes T but fails on some other valid input. This paraphrases the well-known statement of Dijkstra that testing can only reveal the presence of errors, never their absence [16].

Reliability and validity of a test selection criterion are ideal goals, and ideal goals are rarely achieved. It is useful to seek less ideal but usable goals. By settling for less ideal goals, we essentially accept the reality that correctness of large programs is not something that we strive to achieve. Weyuker and Ostrand [18] have introduced the concept of a revealing criterion with respect to a subdomain, where a subdomain S is a subset of the input domain D.
A test selection criterion C is revealing for a subdomain S if, whenever S contains an input which is processed incorrectly, every test set which satisfies C is unsuccessful. In other words, if any test set selected by C is successfully executed, then every input in S produces correct output. A predicate called REVEALING(C, S) captures the above idea in the following definition:

REVEALING(C, S) iff (∃d ∈ S)(¬OK(d)) ⇒ (∀T ⊆ S)(C(T) ⇒ ¬SUCC(T))

The key advantage of a revealing criterion is that it concerns only a subset of the input domain, rather than the entire input domain. By considering a subset of the input domain, programmers can concentrate on local errors. An important task in applying the idea of a revealing criterion is to partition the input domain into smaller subdomains, which is akin to partitioning a problem into a set of subproblems. However, partitioning a problem into subproblems has long been recognized to be a difficult task.

2.4 THEORY OF GOURLAY

An ideal goal in software development is to find out whether or not a program is correct, where a correct program is free of faults. Many research results have been reported in the field of program correctness. However, due to the highly constrained nature of program verification techniques, no developer makes any effort to prove the correctness of even small programs of, say, a few thousand lines, let alone large programs with millions of lines of code. Instead, testing is accepted in the industry as a practical way of finding faults in programs. The flip side of testing is that it cannot be used to settle the question of program correctness, which is the ideal goal. Even though testing cannot settle the program correctness issue, there is a need for a testing theory that enables us to compare the power of different test methods.
To motivate a theoretical discussion of testing, we begin with an ideal process for software development, which consists of the following steps:

• A customer and a development team specify the needs.

• The development team takes the specification and attempts to write a program to meet the specification.

• A test engineer takes both the specification and the program and selects a set of test cases. The test cases are based on the specification and the program.

• The program is executed with the selected test data, and the test outcome is compared with the expected outcome.

• The program is said to have faults if some tests fail.

• One can say the program is ready for use if it passes all the test cases.

We focus on the selection of test cases and the interpretation of their results. We assume that the specification is correct and that the specification is the sole arbiter of the correctness of the program: the program is said to be correct if and only if it satisfies the specification. Gourlay's testing theory [19] establishes a relationship between three sets of entities, namely, specifications, programs, and tests, and provides a basis for comparing different methods for selecting tests.

2.4.1 Few Definitions

The set of all programs is denoted by 𝒫, the set of all specifications by 𝒮, and the set of all tests by 𝒯. Members of 𝒫 will be denoted by p and q, members of 𝒮 by r and s, and members of 𝒯 by t and u. Uppercase letters will denote subsets of 𝒫, 𝒮, and 𝒯. For example, p ∈ P ⊆ 𝒫 and t ∈ T ⊆ 𝒯, where t denotes a single test case. The correctness of a program p with respect to a specification s will be denoted by p corr s. Given s, p, and t, the predicate p ok(t) s means that the result of testing p under t is judged successful by specification s. The reader may recall that T denotes a set of test cases, and p ok(T) s is true if and only if p ok(t) s for all t ∈ T.
We must realize that if a program is correct, then it will never produce any unexpected outcome with respect to the specification. Thus, p corr s ⇒ p ok(t) s for all t.

Definition. A testing system is a collection <𝒫, 𝒮, 𝒯, corr, ok>, where 𝒫, 𝒮, and 𝒯 are arbitrary sets, corr ⊆ 𝒫 × 𝒮, ok ⊆ 𝒯 × 𝒫 × 𝒮, and ∀p∀s∀t (p corr s ⇒ p ok(t) s).

Definition. Given a testing system <𝒫, 𝒮, 𝒯, corr, ok>, a new system <𝒫, 𝒮, 𝒯′, corr, ok′> is called a set construction, where 𝒯′ is the set of all subsets of 𝒯, and where p ok′(T) s ⇔ ∀t (t ∈ T ⇒ p ok(t) s). (The reader may note that T is a member of 𝒯′ because T ⊆ 𝒯.)

Theorem. <𝒫, 𝒮, 𝒯′, corr, ok′>, a set construction on a testing system <𝒫, 𝒮, 𝒯, corr, ok>, is itself a testing system.

Proof. We need to show that p corr s ⇒ p ok′(T) s. Assume that p corr s holds. By assumption, the original system is a testing system. Thus, ∀t, p ok(t) s. If we choose a test set T, we know that ∀t ∈ T, p ok(t) s. Therefore, p ok′(T) s holds.

The set construction is interpreted as follows. A test consists of a number of trials of some sort, and success of the test as a whole depends on success of all the trials. In fact, this is the rule in testing practice, where a test engineer must run a program again and again on a variety of test data. Failure of any one run is enough to invalidate the program.

Definition. Given a testing system <𝒫, 𝒮, 𝒯, corr, ok>, a new system <𝒫, 𝒮, 𝒯′, corr, ok′> is called a choice construction, where 𝒯′ is the set of subsets of 𝒯, and where p ok′(T) s ⇔ ∃t (t ∈ T ∧ p ok(t) s). (Again, T is a member of 𝒯′ because T ⊆ 𝒯.)

Theorem. <𝒫, 𝒮, 𝒯′, corr, ok′>, a choice construction on a testing system <𝒫, 𝒮, 𝒯, corr, ok>, is itself a testing system.

Proof. As in the previous theorem, we need to show that p corr s ⇒ p ok′(T) s. Assume that p corr s. Thus, ∀t, p ok(t) s. If we pick a nonempty test set T, we know that ∃t ∈ T such that p ok(t) s.
Thus, we can write ∀T (T ≠ ∅ ⇒ ∃t (t ∈ T ∧ p ok(t) s)), and therefore ∀T (T ≠ ∅ ⇒ p ok′(T) s). The empty test set ∅ must be excluded from the construction because a testing system must include at least one test. The choice construction models the situation in which a test engineer is given a number of alternative ways of testing the program, all of which are assumed to be equivalent.

Definition. A test method is a function M: 𝒫 × 𝒮 → 𝒯. That is, in the general case, a test method takes a specification S and an implementation program P and produces test cases. In practice, test methods are predominantly program dependent, specification dependent, or totally dependent on the expectations of customers, as explained below:

• Program Dependent: In this case, T = M(P), that is, test cases are derived solely from the source code of a system. This is called white-box testing. Here, a test method has complete knowledge of the internal details of a program. However, from the viewpoint of practical testing, a white-box method is not generally applied to an entire program; one applies such a method to small units of a given large system, where a unit refers to a function, procedure, method, and so on. A white-box method allows a test engineer to use the details of a program unit, and effective use of those details requires a thorough understanding of the unit. Therefore, white-box test methods are used by programmers to test their own code.

• Specification Dependent: In this case, T = M(S), that is, test cases are derived solely from the specification of a system. This is called black-box testing. Here, a test method does not have access to the internal details of a program. Such a method uses information provided in the specification of a system. It is not unusual to use an entire specification in the generation of test cases because specifications are much smaller in size than their corresponding implementations.
Black-box methods are generally used by the development team and an independent system test group.

• Expectation Dependent: In practice, customers may generate test cases based on their expectations from the product at the time of taking delivery of the system. These test cases may include continuous-operation tests, usability tests, and so on.

2.4.2 Power of Test Methods

A tester is concerned with methods to produce test cases and with comparing test methods so that an appropriate method can be identified. Let M and N be two test methods. For M to be at least as good as N, we must have the situation that whenever N finds an error, so does M. In other words, whenever a program fails under a test case produced by method N, it will also fail under a test case produced by method M, with respect to the same specification. Therefore, FN ⊆ FM, where FN and FM are the sets of faults discovered by test sets produced by methods N and M, respectively.

Let TM and TN be the sets of test cases produced by methods M and N, respectively. There are two cases to consider in comparing their fault detection power.

Case 1: TM ⊇ TN. In this case, it is clear that method M is at least as good as method N. This is because method M produces test cases which reveal all the faults revealed by test cases produced by method N. This case is depicted in Figure 2.3a.

Case 2: TM and TN overlap, but TM ⊉ TN. In this case, TM does not totally contain TN. To be able to compare their fault detection ability, we execute the program P under both sets of test cases, namely TM and TN. Let FM and FN be the sets of faults detected by test sets TM and TN, respectively. If FM ⊇ FN, then we say that method M is at least as good as method N. This situation is explained in Figure 2.3b.

2.5 ADEQUACY OF TESTING

Testing gives designers and programmers much confidence in a software component or a complete product if it passes their test cases.
Figure 2.3 Different ways of comparing the power of test methods: (a) one method produces all test cases produced by another method; (b) test sets have common elements.

Assume that a set of test cases T has been designed to test a program P. We execute P with the test set T. If T reveals faults in P, then we modify the program in an attempt to fix those faults. At this stage, there may be a need to design some new test cases because, for example, we may include a new procedure in the code. After modifying the code, we execute the program with the new test set. Thus, we execute the test-and-fix loop until no more faults are revealed by the updated test set. Now we face a dilemma: Is P really fault free, or is T not good enough to reveal the remaining faults in P? From testing we cannot conclude that P is fault free since, as Dijkstra observed, testing can reveal the presence of faults, but not their absence. Therefore, if P passes T, we need to know that T is "good enough" or, in other words, that T is an adequate set of tests. It is important to evaluate the adequacy of T because, if T is found to be inadequate, more test cases need to be designed, as illustrated in Figure 2.4. Adequacy of T means whether or not T thoroughly tests P. Ideally, testing should be performed with an adequate test set T. Intuitively, the idea behind specifying a criterion for evaluating test adequacy is to know whether or not sufficient testing has been done. We will soon return to the idea of test adequacy. In the absence of a test adequacy criterion, developers are forced to use ad hoc measures to decide when to stop testing. Some examples of ad hoc measures for stopping testing are as follows [13]:

• Stop when the allocated time for testing expires.

• Stop when it is time to release the product.

• Stop when all the test cases execute without revealing faults.
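The test-and-fix loop described above can be modeled concretely. In the following C sketch, which is entirely illustrative, the "program" is a lookup table with two seeded wrong entries standing in for faults: the test set reveals and fixes one fault, then reveals nothing more, yet the second fault survives. This is precisely the dilemma of deciding whether T is adequate.

```c
#include <assert.h>

#define N 6
/* Specification: entry i should be i * i.  The "program" is a table
   with seeded faults at indices 3 and 5. */
static int expected[N] = {0, 1, 4, 9, 16, 25};
static int actual[N]   = {0, 1, 4, 8, 16, 24};

/* ok(t): the program's outcome on test input t is acceptable. */
static int ok(int t) { return actual[t] == expected[t]; }

/* One test-and-fix cycle over a test set T of size n: fix every fault
   that T reveals; return the number of faults fixed (0 = nothing revealed). */
static int test_and_fix(const int T[], int n) {
    int fixed = 0;
    for (int i = 0; i < n; i++)
        if (!ok(T[i])) {
            actual[T[i]] = expected[T[i]];   /* "debug and fix" */
            fixed++;
        }
    return fixed;
}
```

With T = {0, 1, 2, 3}, the first cycle fixes the fault at index 3 and the second cycle reveals nothing, so testing stops, although the fault at index 5 remains: T passes but is not adequate.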
Figure 2.4 depicts two important notions concerning test design and evaluating test adequacy as follows:

Figure 2.4 Context of applying test adequacy. (Flowchart: design a set of test cases T to test a program P; execute P with T; if T reveals faults, fix the faults in P and, if needed, augment T with new test cases, then re-execute; once T reveals no more faults, ask whether T is an adequate test set; if not, augment T with new test cases and continue; if yes, stop.)

• Adequacy of a test set T is evaluated after it is found that T reveals no more faults. One may ask: Why not design test cases to meet an adequacy criterion in the first place? However, it is important to design test cases independent of an adequacy criterion, because the primary goal of testing is to locate errors, and thus test design should not be constrained by an adequacy criterion. An example of a test design criterion is: Select test cases to execute all statements in a program at least once. The difficulty with such a criterion is that we may not be able to know whether every program statement can be executed, and thus it is difficult to judge the adequacy of a test set selected thereby. Finally, since the goal of testing is to reveal faults, there is no point in evaluating the adequacy of a test set as long as faults are being revealed.

• An adequate test set T does not say anything about the correctness of a program. A common understanding of correctness is that we have found and fixed all faults in a program to make it "correct." However, in practice, it is not realistic (though very much desirable) to find and fix all faults in a program. Thus, on the one hand, an adequacy criterion need not aim for program correctness. On the other hand, a fault-free program should not turn any arbitrary test set T into an adequate test.
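The statement coverage criterion mentioned above ("execute all statements at least once") can be measured by instrumenting the program. The sketch below is a hand-instrumented toy; real coverage tools insert such probes automatically, and the function max2 is an assumption introduced for illustration.

```c
#include <assert.h>

#define NUM_STMTS 3
static int covered[NUM_STMTS];   /* covered[i] becomes 1 once statement i runs */

/* Toy program under test, hand-instrumented with coverage probes. */
static int max2(int a, int b) {
    covered[0] = 1;              /* statement 0: function entry */
    if (a > b) {
        covered[1] = 1;          /* statement 1: then-branch */
        return a;
    }
    covered[2] = 1;              /* statement 2: else-branch */
    return b;
}

/* Adequacy check under the statement coverage criterion. */
static int all_statements_covered(void) {
    for (int i = 0; i < NUM_STMTS; i++)
        if (!covered[i])
            return 0;
    return 1;
}
```

A single test such as max2(5, 2) leaves the else-branch unexecuted, so that test set is inadequate under this criterion; adding max2(1, 9) covers all three statements.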
The above two points tell us an important principle: the adequacy of a test set should be evaluated independently of the test design process for the program under test. Intuitively, a test set T is said to be adequate if it covers all aspects of the actual computation performed by a program and all computations intended by its specification. Two practical methods for evaluating test adequacy are as follows:

• Fault Seeding: This method refers to implanting a certain number of faults in a program P and executing P with a test set T. If T reveals k percent of the implanted faults, we assume that T has revealed only k percent of the original faults. If 100% of the implanted faults have been revealed by T, we feel more confident about the adequacy of T. A thorough discussion of fault seeding can be found in Chapter 13.

• Program Mutation: Given a program P, a mutation is a program obtained by making a small change to P. In the program mutation method, a series of mutations are obtained from P. Some of the mutations contain faults, and the rest are equivalent to P. A test set T is said to be adequate if it causes every faulty mutation to produce an unexpected outcome. A more thorough discussion of program mutation can be found in Chapter 3.

2.6 LIMITATIONS OF TESTING

Ideally, all programs should be correct, that is, free of faults. Since proving even small programs to be correct is impractical, customers and software developers rely on the efficacy of testing. In this section, we introduce two main limitations of testing:

• Testing means executing a program with a generally small, proper subset of the input domain of the program. A small, proper subset of the input domain is chosen because cost may not allow a much larger subset to be chosen, let alone the full input set. Testing with the full input set is known as exhaustive testing.
Thus, the inherent need to test a program with a small subset of the input domain poses a fundamental limit on the efficacy of testing. The limit takes the form of our inability to extrapolate from the correctness of results on a proper subset of the input domain to the correctness of the program. In other words, even if a program passes a test set T, we cannot conclude that the program is correct.

• Once we have selected a subset of the input domain, we are faced with the problem of verifying the correctness of the program outputs for the individual test inputs. That is, a program output is examined to determine if the program performed correctly on the test input. The mechanism which verifies the correctness of a program output is known as an oracle. The concept of an oracle is discussed in detail in Chapter 9. Determining the correctness of a program output is not a trivial task. If either of the following two conditions holds, a program is considered nontestable [20]:

There does not exist an oracle.

It is too difficult to determine the correct output.

If there is no mechanism to verify the correctness of a program output, or if it takes an extraordinary amount of time to verify an output, there is not much to be gained by running the test.

2.7 SUMMARY

The ideal, abstract goal of testing is to reveal all faults in a software system without exhaustively testing the software. This idea is the basis of the concept of an ideal test developed by Goodenough and Gerhart [17]. An ideal test is supposed to be a small, proper subset of the entire input domain, and we should be able to extrapolate the results of an ideal test to program correctness. In other words, in an abstract sense, if a program passes all the tests in a carefully chosen test set, called an ideal test, we are in a position to claim that the program is correct. Coupled with the concept of an ideal test is a test selection criterion, which allows us to pick the members of an ideal test.
A test selection criterion is characterized in terms of reliability and validity. A reliable criterion is one which selects test cases such that a program either passes all tests or fails all tests. A valid criterion is one which selects at least one test set which fails whenever the program contains a fault. If a criterion is both valid and reliable, then any test selected by the criterion is an ideal test. The theory has a few drawbacks. First, the concepts of reliability and validity have been defined with respect to one program and its entire input domain. Second, neither reliability nor validity is preserved throughout the debugging phase of software development.

Faults occur due to our inadequate understanding of all conditions that a program must deal with and our failure to realize that certain combinations of conditions require special treatment. Goodenough and Gerhart categorize faults into logic faults (further divided into requirements, design, and construction faults) and performance faults.

Weyuker and Ostrand [18] tried to eliminate the drawbacks of the theory of Goodenough and Gerhart by proposing the concept of a uniformly ideal test. The concept is defined with respect to all programs designed to satisfy a specification, rather than just one program; hence the notion of "uniformity" over all program instances for a given specification. Further, the idea of uniformity was extended to test selection criteria in the form of uniformly reliable and uniformly valid criteria. However, their theory too is impractical, because a uniformly valid and uniformly reliable criterion selects the entire input domain of a program, thereby causing exhaustive testing. Next, the idea of an ideal test was extended to a proper subset of the input domain called a subdomain, and the concept of a revealing criterion was defined.
Though testing cannot settle the question of program correctness, different testing methods continue to be developed. For example, there are specification-based testing methods and code-based testing methods. It is important to develop a theory to compare the power of different testing methods. Gourlay [19] put forward a theory to compare the power of testing methods based on their fault detection abilities. A software system undergoes multiple test–fix–retest cycles until, ideally, no more faults are revealed. Faults are fixed by modifying the code or adding new code to the system. At this stage there may be a need to design new test cases. When no more faults are revealed, we can conclude in one of two ways: either there is no fault in the program or the tests could not reveal the faults. Since we have no way of knowing the exact situation, it is useful to evaluate the adequacy of the test set. There is no need to evaluate the adequacy of tests so long as they reveal faults. Two practical ways of evaluating test adequacy are fault seeding and program mutation. Finally, we discussed two limitations of testing. The first limitation of testing is that it cannot settle the question of program correctness. In other words, by testing a program with a proper subset of the input domain and observing no fault, we cannot conclude that there are no remaining faults in the program. The second limitation of testing is that in several instances we do not know the expected output of a program. If for some inputs the expected output of a program is not known or cannot be determined within a reasonable amount of time, then the program is called nontestable [20].

LITERATURE REVIEW

Weyuker and Ostrand [18] have shown by examples how to construct revealing subdomains from source code. Their main example is the well-known triangle classification problem, which is as follows. Let us consider three positive integers A, B, and C.
The problem is to find whether the given integers represent the sides of an equilateral triangle, the sides of a scalene right triangle, and so on. Weyuker [13] has introduced the notion of program inference to capture the notion of test data adequacy. Essentially, program inference refers to deriving a program from its specification and a sample of its input–output behavior. On the other hand, the testing process begins with a specification S and a program P and selects input–output pairs that characterize every aspect of the actual computations performed by the program and the intended computations defined by the specification. Thus, program testing and program inference are thought of as inverse processes. A test set T is said to be adequate if T contains sufficient data to infer the computations defined by both S and P. However, Weyuker [13] explains that such an adequacy criterion is not pragmatically usable. Rather, the criterion can at best be used as a guide. Considering the difficulty of using the criterion, Weyuker defines two weaker adequacy criteria, namely, program adequacy and specification adequacy. A test set T is said to be program adequate if it contains sufficient data to infer the computations defined by P. Similarly, the test set T is said to be specification adequate if it contains sufficient data to infer the computations defined by S. It is suggested that, depending upon how test data are selected, one of the two criteria can be dropped. For example, if T is derived from S, then it is useful to evaluate whether T is program adequate. Since T is selected from S, T is expected to contain sufficient data to infer the computations defined by S, and there is no need to evaluate T’s specification adequacy. Similarly, if T is derived from P, it is useful to evaluate whether T is specification adequate. The students are encouraged to read the article by Stuart H. Zweben and John S.
Gourlay entitled “On the Adequacy of Weyuker’s Test Data Adequacy Axioms” [15]. The authors raise the issue of what makes an axiomatic system as well as what constitutes a proper axiom. Weyuker responds to the criticism at the end of the article. For those students who have never seen such a professional interchange, the article is worth reading for this aspect alone. This article must be read along with the article by Elaine Weyuker entitled “Axiomatizing Software Test Data Adequacy” [12]. Martin Davis and Elaine Weyuker [9] present an interesting notion of distance between programs to study the concept of test data adequacy. Specifically, they equate adequacy with the capability of a test set to successfully distinguish a program being tested from all programs that are sufficiently close to it and differ in input–output behavior from the given program. Weyuker [12, 21] proposed a set of properties to evaluate test data adequacy criteria. Some examples of adequacy criteria are to (i) ensure coverage of all branches in the program being tested and (ii) ensure that boundary values of all input data have been selected for the program under test. Parrish and Zweben [11] formalized those properties and identified dependencies within the set. They formalized the adequacy properties with respect to criteria that do not make use of the specification of the program under test. Frankl and Weyuker [10] compared the relative fault-detecting ability of a number of structural testing techniques, namely, data flow testing, mutation testing, and a condition coverage technique, to branch testing. They showed that the former three techniques are better than branch testing according to two probabilistic measures. A good survey on test adequacy is presented in an article by Hong Zhu, Patrick A. V. Hall, and John H. R. May entitled “Software Unit Test Coverage and Adequacy” [14].
In this article, various types of software test adequacy criteria proposed in the literature are surveyed, followed by a summary of methods for comparison and assessment of adequacy criteria.

REFERENCES

1. R. Gupta, M. J. Harrold, and M. L. Soffa. An Approach to Regression Testing Using Slicing. Paper presented at the IEEE-CS International Conference on Software Maintenance, Orlando, FL, November 1992, pp. 299–308.
2. G. Rothermel and M. Harrold. Analyzing Regression Test Selection Techniques. IEEE Transactions on Software Engineering, August 1996, pp. 529–551.
3. V. R. Basili and R. W. Selby. Comparing the Effectiveness of Software Testing. IEEE Transactions on Software Engineering, December 1987, pp. 1278–1296.
4. W. E. Howden. Weak Mutation Testing and Completeness of Test Sets. IEEE Transactions on Software Engineering, July 1982, pp. 371–379.
5. D. S. Rosenblum and E. J. Weyuker. Using Coverage Information to Predict the Cost-Effectiveness of Regression Testing Strategies. IEEE Transactions on Software Engineering, March 1997, pp. 146–156.
6. L. Baresi and M. Young. Test Oracles, Technical Report CIS-TR-01-02. University of Oregon, Department of Computer and Information Science, Eugene, OR, August 2002, pp. 1–55.
7. Q. Xie and A. M. Memon. Designing and Comparing Automated Test Oracles for GUI-Based Software Applications. ACM Transactions on Software Engineering and Methodology, February 2007, pp. 1–36.
8. G. Rothermel, R. Untch, C. Chu, and M. Harrold. Prioritizing Test Cases for Regression Testing. IEEE Transactions on Software Engineering, October 2001, pp. 929–948.
9. M. Davis and E. J. Weyuker. Metric Space-Based Test-Data Adequacy Criteria. Computer Journal, January 1988, pp. 17–24.
10. P. G. Frankl and E. J. Weyuker. Provable Improvements on Branch Testing. IEEE Transactions on Software Engineering, October 1993, pp. 962–975.
11. A. Parrish and S. H. Zweben. Analysis and Refinement of Software Test Data Adequacy Properties.
IEEE Transactions on Software Engineering, June 1991, pp. 565–581.
12. E. J. Weyuker. Axiomatizing Software Test Data Adequacy. IEEE Transactions on Software Engineering, December 1986, pp. 1128–1138.
13. E. J. Weyuker. Assessing Test Data Adequacy through Program Inference. ACM Transactions on Programming Languages and Systems, October 1983, pp. 641–655.
14. H. Zhu, P. A. V. Hall, and J. H. R. May. Software Unit Test Coverage and Adequacy. ACM Computing Surveys, December 1997, pp. 366–427.
15. S. H. Zweben and J. S. Gourlay. On the Adequacy of Weyuker’s Test Data Adequacy Axioms. IEEE Transactions on Software Engineering, April 1989, pp. 496–500.
16. E. W. Dijkstra. Notes on Structured Programming. In Structured Programming, O.-J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, Eds. Academic, New York, 1972, pp. 1–81.
17. J. B. Goodenough and S. L. Gerhart. Toward a Theory of Test Data Selection. IEEE Transactions on Software Engineering, June 1975, pp. 26–37.
18. E. J. Weyuker and T. J. Ostrand. Theories of Program Testing and the Application of Revealing Subdomains. IEEE Transactions on Software Engineering, May 1980, pp. 236–246.
19. J. S. Gourlay. A Mathematical Framework for the Investigation of Testing. IEEE Transactions on Software Engineering, November 1983, pp. 686–709.
20. E. J. Weyuker. On Testing Non-Testable Programs. Computer Journal, Vol. 25, No. 4, 1982, pp. 465–470.
21. E. J. Weyuker. The Evaluation of Program-Based Software Test Data Adequacy Criteria. Communications of the ACM, June 1988, pp. 668–675.

Exercises

1. Explain the concept of an ideal test.
2. Explain the concept of a selection criterion in test design.
3. Explain the concepts of a valid and reliable criterion.
4. Explain five kinds of program faults.
5. What are the drawbacks of Goodenough and Gerhart’s theory of program testing?
6.
Explain the concepts of a uniformly ideal test as well as the concepts of uniformly valid and uniformly reliable criteria.
7. Explain how two test methods can be compared.
8. Explain the need for evaluating test adequacy.
9. Explain two practical methods for assessing test data adequacy.
10. Explain the concept of a nontestable program.

CHAPTER 3
Unit Testing

Knowledge is of no value unless you put it into practice. — Anton Chekhov

3.1 CONCEPT OF UNIT TESTING

In this chapter we consider the first level of testing, that is, unit testing. Unit testing refers to testing program units in isolation. However, there is no consensus on the definition of a unit. Some examples of commonly understood units are functions, procedures, or methods. Even a class in an object-oriented programming language can be considered as a program unit. Syntactically, a program unit is a piece of code, such as a function or a method of a class, that is invoked from outside the unit and that can invoke other program units. Moreover, a program unit is assumed to implement a well-defined function providing a certain level of abstraction to the implementation of higher level functions. The function performed by a program unit may not have a direct association with a system-level function. Thus, a program unit may be viewed as a piece of code implementing a “low”-level function. In this chapter, we use the terms unit and module interchangeably. Now, given that a program unit implements a function, it is only natural to test the unit before it is integrated with other units. Thus, a program unit is tested in isolation, that is, in a stand-alone manner. There are two reasons for testing a unit in a stand-alone manner. First, errors found during testing can be attributed to a specific unit so that they can be easily fixed. Moreover, unit testing removes dependencies on other program units.
Second, during unit testing it is desirable to verify that each distinct execution of a program unit produces the expected result. In terms of code details, a distinct execution refers to a distinct path in the unit. Ideally, all possible—or as many as possible—distinct executions are to be considered during unit testing. This requires careful selection of input data for each distinct execution. By executing a program unit in isolation, a programmer has direct access to the input vector of the unit. This direct access makes it easier to execute as many distinct paths as desirable or possible. If multiple units are put together for testing, then a programmer needs to generate test inputs having an indirect relationship with the input vectors of the several units under test. The said indirect relationship makes it difficult to control the execution of distinct paths in a chosen unit. Unit testing has a limited scope. A programmer needs to verify whether or not the code works correctly by performing unit-level testing. Intuitively, a programmer needs to test a unit as follows:
• Execute every line of code. This is desirable because the programmer needs to know what happens when a line of code is executed. In the absence of such basic observations, surprises at a later stage can be expensive.
• Execute every predicate in the unit so that it evaluates to true and false separately.
• Observe that the unit performs its intended function and ensure that it contains no known errors.
In spite of the above tests, there is no guarantee that a satisfactorily tested unit is functionally correct from a systemwide perspective. Not everything pertinent to a unit can be tested in isolation because of the limitations of testing in isolation.
This means that some errors in a program unit can only be found later, when the unit is integrated with other units in the integration testing and system testing phases. Even though it is not possible to find all errors in a program unit in isolation, it is still necessary to ensure that a unit performs satisfactorily before it is used by other program units. It serves no purpose to integrate an erroneous unit with other units for the following reasons: (i) many of the subsequent tests will be a waste of resources and (ii) finding the root causes of failures in an integrated system is more resource consuming. Unit testing is performed by the programmer who writes the program unit, because the programmer is intimately familiar with the internal details of the unit. The objective for the programmer is to be satisfied that the unit works as expected. Since a programmer is supposed to construct a unit with no errors in it, a unit test is performed by him or her to his or her own satisfaction in the beginning and to the satisfaction of other programmers when the unit is integrated with other units. This means that all programmers are accountable for the quality of their own work, which may include both new code and modifications to existing code. The idea here is to push the quality concept down to the lowest level of the organization and empower each programmer to be responsible for his or her own quality. Therefore, it is in the best interest of the programmer to take preventive actions to minimize the number of defects in the code. The defects found during unit testing are internal to the software development group and are not reported up the personnel hierarchy to be counted in quality measurement metrics. The source code of a unit is not used for interfacing by other group members until the programmer completes unit testing and checks the unit in to the version control system.
Unit testing is conducted in two complementary phases:
• Static unit testing
• Dynamic unit testing
In static unit testing, a programmer does not execute the unit; instead, the code is examined over all possible behaviors that might arise during run time. Static unit testing is also known as non-execution-based unit testing, whereas dynamic unit testing is execution based. In static unit testing, the code of each unit is validated against the requirements of the unit by reviewing the code. During the review process, potential issues are identified and resolved. For example, in the C programming language the two program-halting functions are abort() and exit(). While the two are closely related, they have different effects, as explained below:
• abort(): This means abnormal program termination. By default, a call to abort() results in a run-time diagnostic and program self-destruction. The program destruction may or may not flush and close opened files or remove temporary files, depending on the implementation.
• exit(): This means graceful program termination. That is, the exit() call closes the opened files and returns a status code to the execution environment.
Whether to use abort() or exit() depends on the context, and an inappropriate choice can be easily detected and resolved during static unit testing. The more issues that are caught early, the fewer errors are identified in the dynamic test phase, and the fewer defects end up in shipped products. Moreover, performing static tests is less expensive than performing dynamic tests. Code review is one component of the defect minimization process and can help detect problems that are common to software development. After a round of code review, dynamic unit testing is conducted. In dynamic unit testing, a program unit is actually executed and its outcomes are observed. In other words, dynamic unit testing means testing the code by actually running it.
It may be noted that static unit testing is not an alternative to dynamic unit testing. A programmer performs both kinds of tests. In practice, partial dynamic unit testing is performed concurrently with static unit testing. If static unit testing identifies significant problems after the entire dynamic unit testing has been performed, the dynamic unit testing must be repeated. As a result of this repetition, the development schedule may be affected. To minimize the probability of such an event, it is required that static unit testing be performed prior to the final dynamic unit testing.

3.2 STATIC UNIT TESTING

Static unit testing is conducted as a part of a larger philosophical belief that a software product should undergo a phase of inspection and correction at each milestone in its life cycle. At a certain milestone, the product need not be in its final form. For example, completion of coding is a milestone, even though the coding of all the units may not make the desired product. After coding, the next milestone is testing all or a substantial number of units forming the major components of the product. Thus, before units are individually tested by actually executing them, they are subjected to the usual review and correction as commonly understood. The idea behind review is to find defects as close to their points of origin as possible, so that those defects are eliminated with less effort and the interim product contains fewer defects before the next task is undertaken. In static unit testing, code is reviewed by applying techniques commonly known as inspection and walkthrough. The original definition of inspection was coined by Michael Fagan [1] and that of walkthrough by Edward Yourdon [2]:
• Inspection: It is a step-by-step peer group review of a work product, with each step checked against predetermined criteria.
• Walkthrough: It is a review where the author leads the team through a manual or simulated execution of the product using predefined scenarios.
Regardless of whether a review is called an inspection or a walkthrough, it is a systematic approach to examining source code in detail. The goal of such an exercise is to assess the quality of the software in question, not the quality of the process used to develop the product [3]. Reviews of this type are characterized by significant preparation by groups of designers and programmers with varying degrees of interest in the software development project. Code examination can be time consuming. Moreover, no examination process is perfect. Examiners may take shortcuts, may not have an adequate understanding of the product, and may accept a product which should not be accepted. Nonetheless, a well-designed code review process can find faults that may be missed by execution-based testing. The key to the success of code review is to divide and conquer, that is, to have an examiner inspect small parts of the unit in isolation, while making sure of the following: (i) nothing is overlooked and (ii) the correctness of all examined parts of the module implies the correctness of the whole module. The decomposition of the review into discrete steps must ensure that each step is simple enough that it can be carried out without detailed knowledge of the others. The objective of code review is to review the code, not to evaluate the author of the code. A clash may occur between the author of the code and the reviewers, and this may make the meetings unproductive. Therefore, code review must be planned and managed in a professional manner. There is a need for mutual respect, openness, trust, and sharing of expertise in the group. The general guidelines for performing code review consist of six steps, as outlined in Figure 3.1: readiness, preparation, examination, rework, validation, and exit.
The input to the readiness step is the criteria that must be satisfied before the start of the code review process, and the process produces two types of documents, a change request (CR) and a report. These steps and documents are explained in the following.

Figure 3.1 Steps in the code review process (readiness, preparation, examination, rework, validation, and exit), with the entry criteria as input and change requests and a report as outputs.

Step 1: Readiness
The author of the unit ensures that the unit under test is ready for review. A unit is said to be ready if it satisfies the following criteria:
• Completeness: All the code relating to the unit to be reviewed must be available. This is because the reviewers are going to read the code and try to understand it. It is unproductive to review partially written code or code that is going to be significantly modified by the programmer.
• Minimal Functionality: The code must compile and link. Moreover, the code must have been tested to some extent to make sure that it performs its basic functionalities.
• Readability: Since code review involves actual reading of code by other programmers, it is essential that the code be highly readable. Some code characteristics that enhance readability are proper formatting, meaningful identifier names, straightforward use of programming language constructs, and an appropriate level of abstraction using function calls. In the absence of readability, the reviewers are likely to be discouraged from performing the task effectively.
• Complexity: There is no need to schedule a group meeting to review straightforward code which can be easily reviewed by the programmer. The code to be reviewed must be of sufficient complexity to warrant group review.
Here, complexity is a composite term referring to the number of conditional statements in the code, the number of input data elements of the unit, the number of output data elements produced by the unit, any real-time processing performed by the code, and the number of other units with which the code communicates.
• Requirements and Design Documents: The latest approved version of the low-level design specification or other appropriate descriptions of program requirements (see Table 3.1) should be available. These documents help the reviewers in verifying whether or not the code under review implements the expected functionalities. If the low-level design document is available, it helps the reviewers in assessing whether or not the code appropriately implements the design.

TABLE 3.1 Hierarchy of System Documents
Requirement: High-level marketing or product proposal.
Functional specification: Software engineering response to the marketing proposal.
High-level design: Overall system architecture.
Low-level design: Detailed specification of the modules within the architecture.
Programming: Coding of the modules.

All the people involved in the review process are informed of the group review meeting schedule two or three days before the meeting. They are also given a copy of the work package for their perusal. Reviews are conducted in bursts of 1–2 hours. Longer meetings are less and less productive because of the limited attention span of human beings. The rate of code review is restricted to about 125 lines of code (in a high-level language) per hour. Reviewing complex code at a higher rate will result in just glossing over the code, thereby defeating the fundamental purpose of code review. The composition of the review group involves a number of people with different roles. These roles are explained as follows:
• Moderator: A review meeting is chaired by the moderator. The moderator is a trained individual who guides the pace of the review process.
The moderator selects the reviewers and schedules the review meetings. Myers suggests that the moderator be a member of a group from an unrelated project to preserve objectivity [4].
• Author: This is the person who has written the code to be reviewed.
• Presenter: A presenter is someone other than the author of the code. The presenter reads the code beforehand to understand it. It is the presenter who presents the author’s code in the review meeting, for the following reasons: (i) an additional software developer will understand the work within the software organization; (ii) if the original programmer leaves the company on short notice, at least one other programmer in the company knows what is being done; and (iii) the original programmer will have a good feeling about his or her work if someone else appreciates it.
• Recordkeeper: The recordkeeper documents the problems found during the review process and the follow-up actions suggested. This person should be different from the author and the moderator.
• Reviewers: These are experts in the subject area of the code under review. The group size depends on the content of the material under review. As a rule of thumb, the group size is between 3 and 7. Usually this group does not include a manager to whom the author reports. This is because it is the author’s ongoing work that is under review, and neither a completed work nor the author himself or herself is being reviewed.
• Observers: These are people who want to learn about the code under review. These people do not participate in the review process but are simply passive observers.

Step 2: Preparation
Before the meeting, each reviewer carefully reviews the work package. It is expected that the reviewers read the code and understand its organization and operation before the review meeting.
Each reviewer develops the following:
• List of Questions: A reviewer prepares a list of questions to be asked, if needed, of the author to clarify issues arising from his or her reading. A general guideline of what to examine while reading the code is outlined in Table 3.2.
• Potential CRs: A reviewer may make a formal request to make a change. Such requests are called change requests rather than defect reports. At this stage, since the programmer has not yet made the code public, it is more appropriate to make suggestions to the author to make changes than to report a defect. Though CRs focus on defects in the code, these reports are not included in defect statistics related to the product.
• Suggested Improvement Opportunities: The reviewers may suggest how to fix the problems, if there are any, in the code under review. Since the reviewers are experts in the subject area of the code, it is not unusual for them to make suggestions for improvements.

Step 3: Examination
The examination process consists of the following activities:
• The author makes a presentation of the procedural logic used in the code, the paths denoting major computations, and the dependency of the unit under review on other units.
• The presenter reads the code line by line. The reviewers may raise questions if the code is seen to have defects. However, problems are not resolved in the meeting. The reviewers may make general suggestions on how to fix the defects, but it is up to the author of the code to take corrective measures after the meeting ends.
• The recordkeeper documents the change requests and the suggestions for fixing the problems, if there are any. A CR includes the following details:
1. Give a brief description of the issue or action item.
2. Assign a priority level (major or minor) to the CR.
3. Assign a person to follow up on the issue. Since a CR documents a potential problem, there is a need for interaction between the author

TABLE 3.2 Code Review Checklist
1.
Does the code do what has been specified in the design specification?
2. Does the procedure used in the module solve the problem correctly?
3. Does a software module duplicate another existing module which could be reused?
4. If library modules are being used, are the right libraries and the right versions of the libraries being used?
5. Does each module have a single entry point and a single exit point? Multiple exit and entry point programs are harder to test.
6. Is the cyclomatic complexity of the module more than 10? If yes, then it is extremely difficult to adequately test the module.
7. Can each atomic function be reviewed and understood in 10–15 minutes? If not, it is considered to be too complex.
8. Have naming conventions been followed for all identifiers, such as pointers, indices, variables, arrays, and constants? It is important to adhere to coding standards to ease the introduction of a new contributor (programmer) to the development of a system.
9. Has the code been adequately commented upon?
10. Have all the variables and constants been correctly initialized? Have correct types and scopes been checked?
11. Are the global or shared variables, if there are any, carefully controlled?
12. Are there data values hard coded in the program? Rather, these should be declared as variables.
13. Are the pointers being used correctly?
14. Are the dynamically acquired memory blocks deallocated after use?
15. Does the module terminate abnormally? Will the module eventually terminate?
16. Is there a possibility of an infinite loop, a loop that never executes, or a loop with a premature exit?
17. Have all the files been opened for use and closed at termination?
18. Are there computations using variables with inconsistent data types? Is overflow or underflow a possibility?
19. Are error codes and condition messages produced by accessing a common table of messages?
Each error code should have a meaning, and all of the meanings should be available at one place in a table rather than scattered all over the program code.
20. Is the code portable? The source code is likely to execute on multiple processor architectures and on different operating systems over its lifetime. It must be implemented in a manner that does not preclude this kind of variety of execution environments.
21. Is the code efficient? In general, clarity, readability, or correctness should not be sacrificed for efficiency. Code review is intended to detect implementation choices that have adverse effects on system performance.

of the code and one of the reviewers, possibly the reviewer who made the CR.
4. Set a deadline for addressing the CR.
• The moderator ensures that the meeting remains focused on the review process. The moderator makes sure that the meeting makes progress at a certain rate so that the objective of the meeting is achieved.
• At the end of the meeting, a decision is taken regarding whether or not to call another meeting to further review the code. If the review process leads to extensive rework of the code or critical issues are identified in the process, then another meeting is generally convened. Otherwise, a second meeting is not scheduled, and the author is given the responsibility of fixing the CRs.

Step 4: Rework
At the end of the meeting, the recordkeeper produces a summary of the meeting that includes the following information:
• A list of all the CRs, the dates by which they will be fixed, and the names of the persons responsible for validating the CRs
• A list of improvement opportunities
• The minutes of the meeting (optional)
A copy of the report is distributed to all the members of the review group. After the meeting, the author works on the CRs to fix the problems. The author documents the improvements made to the code in the CRs.
The author makes an attempt to address the issues within the agreed-upon time frame using the prevailing coding conventions [5]. Validation The CRs are independently validated by the moderator or another person designated for this purpose. The validation process involves checking the modified code as documented in the CRs and ensuring that the suggested improvements have been implemented correctly. The revised and final version of the outcome of the review meeting is distributed to all the group members. Exit Summarizing the review process, it is said to be complete if all of the following actions have been taken: • Every line of code in the unit has been inspected. • If too many defects are found in a module, the module is once again reviewed after corrections are applied by the author. As a rule of thumb, if more than 5% of the total lines of code are thought to be contentious, then a second review is scheduled. • The author and the reviewers reach a consensus that when corrections have been applied the code will be potentially free of defects. • All the CRs are documented and validated by the moderator or someone else. The author’s follow-up actions are documented. • A summary report of the meeting including the CRs is distributed to all the members of the review group. The effectiveness of static testing is limited by the ability of a reviewer to find defects in code by visual means. However, if occurrences of defects depend on some actual values of variables, then it is a difficult task to identify those defects by visual means. Therefore, a unit must be executed to observe its behaviors in response to a variety of inputs. Finally, whatever may be the effectiveness of static tests, one cannot feel confident without actually running the code. 
Code Review Metrics. It is important to collect measurement data pertinent to a review process, so that the review process can be evaluated, made visible to the upper management as a testing strategy, and improved to be more effective. Moreover, collecting metrics during code review facilitates estimation of review time and resources for future projects. Thus, code review is a viable testing strategy that can be effectively used to improve the quality of products at an early stage. The following metrics can be collected from a code review:
• Number of lines of code (LOC) reviewed per hour
• Number of CRs generated per thousand lines of code (KLOC)
• Number of CRs generated per hour
• Total number of CRs generated per project
• Total number of hours spent on code review per project

3.3 DEFECT PREVENTION

It is in the best interest of the programmers in particular and the company in general to reduce the number of CRs generated during code review. This is because CRs are indications of potential problems in the code, and those problems must be resolved before different program units are integrated. Addressing CRs means spending more resources and potentially delaying the project. Therefore, it is essential to adopt the concept of defect prevention during code development. In practice, defects are inadvertently introduced by programmers. Those accidents can be reduced by taking preventive measures. It is useful to develop a set of guidelines to construct code for defect minimization, as explained in the following. These guidelines focus on incorporating suitable mechanisms into the code:
• Build internal diagnostic tools, also known as instrumentation code, into the units. Instrumentation code is useful in providing information about the internal states of the units. Such code allows programmers to realize built-in tracking and tracing mechanisms. Instrumentation plays a passive role in dynamic unit testing.
The role is passive in the sense of observing and recording the internal behavior without actively testing a unit.
• Use standard controls to detect possible occurrences of error conditions. Some examples of error detection in the code are divide by zero and array index out of bounds.
• Ensure that code exists for all return values, some of which may be invalid. Appropriate follow-up actions need to be taken to handle invalid return values.
• Ensure that counter data fields and buffer overflow and underflow are appropriately handled.
• Provide error messages and help texts from a common source so that changes in the text do not cause inconsistency. Good error messages identify the root causes of the problems and help users in resolving the problems [7].
• Validate input data, such as the arguments passed to a function.
• Use assertions to detect impossible conditions, undefined uses of data, and undesirable program behavior. An assertion is a Boolean statement which should never be false or can be false only if an error has occurred. In other words, an assertion is a check on a condition which is assumed to be true, but which can cause a problem if it is not true. Assertions should be routinely used to perform the following kinds of checks:
1. Ensure that preconditions are satisfied before beginning to execute a unit. A precondition is a Boolean function on the states of a unit specifying our expectation of the state prior to initiating an activity in the code.
2. Ensure that the expected postconditions are true while exiting from the unit. A postcondition is a Boolean function on the state of a unit specifying our expectation of the state after an activity has been completed. The postconditions may include an invariant.
3. Ensure that the invariants hold. That is, check invariant states, that is, conditions which are expected not to change during the execution of a piece of code.
• Leave assertions in the code.
You may deactivate them in the released version of the code in order to improve the operational performance of the system.
• Fully document the assertions that appear to be unclear.
• After every major computation, reverse-compute the input(s) from the results in the code itself. Then compare the outcome with the actual inputs for correctness. For example, suppose that a piece of code computes the square root of a positive number. Then square the output value and compare the result with the input. It may be necessary to tolerate a margin of error in the comparison process.
• In systems involving message passing, buffer management is an important internal activity. Incoming messages are stored in an already allocated buffer. It is useful to generate an event indicating low buffer availability before the system runs out of buffers. Develop a routine to continually monitor the availability of buffer space after every use, calculate the remaining space available in the buffer, and call an error handling routine if the amount of available buffer space is too low.
• Develop a timer routine which counts down from a preset time until it either hits zero or is reset. If the software is caught in an infinite loop, the timer will expire and an exception handler routine can be invoked.
• Include a loop counter within each loop. If the loop is ever executed less than the minimum possible number of times or more than the maximum possible number of times, then invoke an exception handler routine.
• Define a variable to indicate the branch of decision logic that will be taken. Check this value after the decision has been made and the right branch has supposedly been taken. If the value of the variable has not been preset, there is probably a fall-through condition in the logic.

3.4 DYNAMIC UNIT TESTING

Execution-based unit testing is referred to as dynamic unit testing. In this testing, a program unit is actually executed in isolation, as we commonly understand it.
However, this execution differs from ordinary execution in the following ways:
1. A unit under test is taken out of its actual execution environment.
2. The actual execution environment is emulated by writing more code (explained later in this section) so that the unit and the emulated environment can be compiled together.
3. The above compiled aggregate is executed with selected inputs. The outcome of such an execution is collected in a variety of ways, such as straightforward observation on a screen, logging to files, and software instrumentation of the code to reveal run-time behavior. The result is compared with the expected outcome. Any difference between the actual and expected outcome implies a failure, and the fault is in the code.

An environment for dynamic unit testing is created by emulating the context of the unit under test, as shown in Figure 3.2. The context of a unit test consists of two parts: (i) a caller of the unit and (ii) all the units called by the unit. The environment of a unit is emulated because the unit is to be tested in isolation, and the emulating environment must be a simple one so that any fault found as a result of running the unit can be solely attributed to the unit under test. The caller unit is known as a test driver, and all the emulations of the units called by the unit under test are called stubs. The test driver and the stubs are together called scaffolding. The functions of a test driver and a stub are explained as follows:
• Test Driver: A test driver is a program that invokes the unit under test. The unit under test executes with input values received from the driver and, upon termination, returns a value to the driver. The driver compares the actual outcome, that is, the actual value returned by the unit under test, with the expected outcome from the unit and reports the ensuing test result. The test driver functions as the main unit in the execution process.
The driver not only facilitates compilation, but also provides input data to the unit under test in the expected format.
• Stubs: A stub is a "dummy subprogram" that replaces a unit that is called by the unit under test. Stubs replace the units called by the unit under test. A stub performs two tasks. First, it shows evidence that the stub was, in fact, called. Such evidence can be shown by merely printing a message. Second, the stub returns a precomputed value to the caller so that the unit under test can continue its execution.

Figure 3.2 (Dynamic unit test environment) depicts the test driver calling the unit under test with input parameters and receiving output parameters and results, while the outgoing calls of the unit under test are handled by stubs that acknowledge the calls and print messages.

The driver and the stubs are never discarded after the unit test is completed. Instead, they are reused in the future in regression testing of the unit if there is such a need. For each unit, there should be one dedicated test driver and several stubs as required. If just one test driver is developed to test multiple units, the driver will be a complicated one. Any modification to the driver to accommodate changes in one of the units under test may have side effects in testing the other units. Similarly, the test driver should not depend on external input data files but, instead, should have its own segregated set of input data. The separate input data file approach becomes a very compelling choice for large amounts of test input data. For example, if hundreds of input test data elements are required to test more than one unit, then it is better to create a separate input test data file rather than to include the same set of input test data in each test driver designed to test the unit. The test driver should have the capability to automatically determine the success or failure of the unit under test for each input test data.
If appropriate, the driver should also check for memory leaks and problems in the allocation and deallocation of memory. If the module opens and closes files, the test driver should check that these files are left in the expected open or closed state after each test. The test driver can be designed to check the data values of the internal variables that normally would not be available for checking at the integration, system, or acceptance testing levels. The test driver and stubs are tightly coupled with the unit under test and should accompany the unit throughout its life cycle. A test driver and the stubs for a unit should be reusable and maintainable. Every time a unit is modified, the programmer should check whether or not to modify the corresponding driver and stubs. Whenever a new fault is detected in the unit, the corresponding test driver should be updated with a new set of input data to detect that fault and similar faults in the future. If the unit is expected to run on different platforms, the test driver and stubs should also be built to test the unit on the new platforms. Finally, the test driver and stubs should be reviewed, cross-referenced with the unit for which they are written, and checked in to the version control system as a product along with the unit. The low-level design document provides guidance for the selection of input test data that are likely to uncover faults.
Selection of test data is broadly based on the following techniques: • Control Flow Testing: The following is an outline of control flow testing: (i) draw a control flow graph from a program unit; (ii) select a few control flow testing criteria; (iii) identify paths in the control flow graph to satisfy the selection criteria; (iv) derive path predicate expressions from the selected paths; and (v) by solving the path predicate expression for a path, generate values of the inputs to the program unit that are considered as a test case to exercise the corresponding path. A thorough discussion of control flow testing is given in Chapter 4. • Data Flow Testing: The following is an outline of data flow testing: (i) draw a data flow graph from a program unit; (ii) select a few data flow testing criteria; (iii) identify paths in the data flow graph to satisfy the selection criteria; (iv) derive path predicate expressions from the selected paths; and (v) by solving the path predicate expression, generate values of the inputs to the program unit that are considered as a test case to exercise the corresponding path. Chapter 5 discusses data flow testing in greater detail. • Domain Testing: In control flow and data flow testing, no specific types of faults are explicitly considered for detection. However, domain testing takes a new approach to fault detection. In this approach, a category of faults called domain errors are defined and then test data are selected to catch those faults. It is discussed in detail in Chapter 6. • Functional Program Testing: In functional program testing one performs the following steps: (i) identify the input and output domains of a program; (ii) for a given input domain, select some special values and compute the expected outcome; (iii) for a given output domain, select some special values and compute the input values that will cause the unit to produce those output values; and (iv) consider various combinations of the input values chosen above. 
Chapter 9 discusses functional testing.

3.5 MUTATION TESTING

Mutation testing has a rich and long history. It can be traced back to the late 1970s [8–10]. Mutation testing was originally proposed by Dick Lipton, and the article by DeMillo, Lipton, and Sayward [9] is generally cited as the seminal reference. Mutation testing is a technique that focuses on measuring the adequacy of test data (or test cases). The original intention behind mutation testing was to expose and locate weaknesses in test cases. Thus, mutation testing is a way to measure the quality of test cases, and the actual testing of program units is an added benefit. Mutation testing is not a testing strategy like control flow or data flow testing. It should be used to supplement traditional unit testing techniques.

A mutation of a program is a modification of the program created by introducing a single, small, legal syntactic change in the code. A modified program so obtained is called a mutant. The term mutant has been borrowed from biology. Some of these mutants are equivalent to the original program, whereas others are faulty. A mutant is said to be killed when the execution of a test case causes it to fail; the mutant is then considered to be dead. Some mutants are equivalent to the given program; that is, such mutants always produce the same output as the original program. In the real world, large programs are generally faulty, and test cases too contain faults. The result of executing a mutant may be different from the expected result, but the test suite does not detect the failure because it does not have the right test case. In this scenario the mutant is called killable or stubborn; that is, the existing set of test cases is insufficient to kill it. A mutation score for a set of test cases is the percentage of nonequivalent mutants killed by the test suite. The test suite is said to be mutation adequate if its mutation score is 100%.
Mutation analysis is a two-step process:
1. The adequacy of an existing test suite is determined to distinguish the given program from its mutants. A given test suite may not be adequate to distinguish all the nonequivalent mutants. As explained above, those nonequivalent mutants that could not be identified by the given test suite are called stubborn mutants.
2. New test cases are added to the existing test suite to kill the stubborn mutants. The test suite enhancement process iterates until the test suite has reached a desired level of mutation score.

Let us consider the following program P that finds the rank corresponding to the first time the maximum value appears in the array. For simplicity, we assume that the program P reads three input arguments and prints the message accordingly:

1. main(argc, argv)
2. int argc, r, i;
3. char *argv[];
4. { r = 1;
5.   for (i = 2; i <= 3; i++)
6.     if (atoi(argv[i]) > atoi(argv[r])) r = i;
7.   printf("Value of the rank is %d \n", r);
8.   exit(0); }

Now let us assume that we have the following test suite that tests the program P:

Test case 1: Input: 1 2 3; Output: Value of the rank is 3
Test case 2: Input: 1 2 1; Output: Value of the rank is 2
Test case 3: Input: 3 1 2; Output: Value of the rank is 1

Now, let us mutate the program P. We can start with the following changes:

Mutant 1: Change line 5 to for (i = 1; i <= 3; i++)
Mutant 2: Change line 6 to if (i > atoi(argv[r])) r = i;
Mutant 3: Change line 6 to if (atoi(argv[i]) >= atoi(argv[r])) r = i;
Mutant 4: Change line 6 to if (atoi(argv[r]) > atoi(argv[r])) r = i;

If we run the modified programs against the test suite, we will get the following results:

Mutants 1 and 3: The programs will completely pass the test suite. In other words, mutants 1 and 3 are not killed.
Mutant 2: The program will fail test case 2.
Mutant 4: The program will fail test case 1 and test case 2.

If we calculate the mutation score, we see that we created four mutants, and two of them were killed.
This tells us that the mutation score is 50%, assuming that mutants 1 and 3 are nonequivalent. The score is low because we assumed that mutants 1 and 3 are nonequivalent to the original program. We have to show that either mutants 1 and 3 are equivalent mutants or they are killable. If they are killable, we need to add new test cases to kill these two mutants. First, let us analyze mutant 1 in order to derive a "killer" test. The difference between P and mutant 1 is the starting point. Mutant 1 starts with i = 1, whereas P starts with i = 2. This has no impact on the result r. Therefore, we conclude that mutant 1 is an equivalent mutant. Second, we add a fourth test case as follows:

Test case 4: Input: 2 2 1

Then program P will produce the output "Value of the rank is 1" and mutant 3 will produce the output "Value of the rank is 2." Thus, this test data kills mutant 3, which gives us a mutation score of 100%.

In order to use the mutation testing technique to build a robust test suite, the test engineer needs to follow the steps outlined below:

Step 1: Begin with a program P and a set of test cases T known to be correct.
Step 2: Run each test case in T against the program P. If a test case fails, that is, the output is incorrect, program P must be modified and retested. If there are no failures, then continue with step 3.
Step 3: Create a set of mutants {Pi}, each differing from P by a simple, syntactically correct modification of P.
Step 4: Execute each test case in T against each mutant Pi. If the output of the mutant Pi differs from the output of the original program P, the mutant Pi is considered incorrect and is said to be killed by the test case. If Pi produces exactly the same results as the original program P for the tests in T, then one of the following is true:
• P and Pi are equivalent. That is, their behaviors cannot be distinguished by any set of test cases.
Note that the general problem of deciding whether or not a mutant is equivalent to the original program is undecidable.
• Pi is killable. That is, the test cases are insufficient to kill the mutant Pi. In this case, new test cases must be created.
Step 5: Calculate the mutation score for the set of test cases T. The mutation score is the percentage of nonequivalent mutants killed by the test data, that is, mutation score = 100 × D/(N − E), where D is the number of dead (killed) mutants, N the total number of mutants, and E the number of equivalent mutants.
Step 6: If the estimated mutation adequacy of T in step 5 is not sufficiently high, then design a new test case that distinguishes Pi from P, add the new test case to T, and go to step 2. If the computed adequacy of T is more than an appropriate threshold, then accept T as a good measure of the correctness of P with respect to the set of mutant programs Pi, and stop designing new test cases.

Mutation testing makes two major assumptions:
1. Competent Programmer Hypothesis: This assumption states that programmers are generally competent, and they do not create "random" programs. Therefore, we can assume that for a given problem a programmer will create a correct program except for simple errors. In other words, the mutants to be considered are the ones falling within a small deviation from the original program. In practice, such mutants are obtained by systematically and mechanically applying a set of transformations, called mutation operators, to the program under test. These mutation operators are expected to model programming errors made by programmers. In practice, this may be only partly true.
2. Coupling Effect: This assumption was first hypothesized in 1978 by DeMillo et al. [9]. The assumption can be restated as follows: complex faults are coupled to simple faults in such a way that a test suite detecting all simple faults in a program will detect most of the complex faults.
This assumption has been empirically supported by Offutt [11] and theoretically demonstrated by Wah [12]. The fundamental premise of mutation testing as coined by Geist et al. [13] is: if the software contains a fault, there will usually be a set of mutants that can only be killed by a test case that also detects that fault.

Mutation testing helps the tester to inject, by hypothesis, different types of faults in the code and develop test cases to reveal them. In addition, comprehensive testing can be performed by a proper choice of mutant operations. However, a relatively large number of mutant programs need to be tested against many of the test cases before these mutants can be distinguished from the original program. Running the test cases, analyzing the results, identifying equivalent mutants [14], and developing additional test cases to kill the stubborn mutants are all time consuming. Robust automated testing tools such as Mothra [15] can be used to expedite the mutation testing process. Recently, with the availability of massive computing power, there has been a resurgence of mutation testing within the industrial community as a white-box methodology for unit testing [16, 17]. Researchers have shown that with an appropriate choice of mutant programs mutation testing is as powerful as path testing, domain testing [18], and data flow testing [19].

3.6 DEBUGGING

The programmer, after a program failure, identifies the corresponding fault and fixes it. The process of determining the cause of a failure is known as debugging. Debugging occurs as a consequence of a test revealing a failure. Myers proposed three approaches to debugging in his book The Art of Software Testing [20]:
• Brute Force: The brute-force approach to debugging is preferred by many programmers. Here, a "let the computer find the error" philosophy is used. Print statements are scattered throughout the source code.
These print statements provide a crude trace of the way the source code has executed. The availability of a good debugging tool makes these print statements redundant. A dynamic debugger allows the software engineer to navigate by stepping through the code, observe which paths have executed, and observe how the values of variables change during the controlled execution. A good tool allows the programmer to assign values to several variables and navigate step by step through the code. Instrumentation code can be built into the source code to detect problems and to log intermediate values of variables for problem diagnosis. One may use a memory dump after a failure has occurred to understand the final state of the code being debugged. The log and memory dump are reviewed to understand what happened and how the failure occurred.
• Cause Elimination: The cause elimination approach can best be described as a process involving induction and deduction [21]. In the induction part, first, all pertinent data related to the failure are collected, such as what happened and what the symptoms are. Next, the collected data are organized in terms of behavior and symptoms, and their relationship is studied to find a pattern to isolate the causes. A cause hypothesis is devised, and the above data are used to prove or disprove the hypothesis. In the deduction part, a list of all possible causes is developed in order of their likelihoods, and tests are conducted to eliminate or substantiate each cause in decreasing order of their likelihoods. If the initial tests indicate that a particular hypothesis shows promise, the test data are refined in an attempt to isolate the problem as needed.
• Backtracking: In this approach, the programmer starts at a point in the code where a failure was observed and traces back the execution to the point where it occurred. This technique is frequently used by programmers, and it is useful in small programs.
However, the probability of tracing back to the fault decreases as the program size increases, because the number of potential backward paths may become too large.

Often, software engineers notice other previously undetected problems while debugging and applying a fix. These newly discovered faults should not be fixed along with the fix in focus, because the software engineer may not have a full understanding of the part of the code responsible for the new fault. The best way to deal with such a situation is to file a CR. A new CR gives the programmer an opportunity to discuss the matter with other team members and software architects and to get their approval on a suggestion made by the programmer. Once the CR is approved, the software engineer must file a defect in the defect tracking database and may proceed with the fix. This process is cumbersome, and it interrupts the debugging process, but it is useful for very critical projects. However, programmers often do not follow it because of the lack of a procedure to enforce it.

A Debugging Heuristic. The objective of debugging is to precisely identify the cause of a failure. Once the cause is identified, corrective measures are taken to fix the fault. Debugging is conducted by programmers, preferably by those who wrote the code, because the programmer is the best person to know the source code well enough to analyze the code efficiently and effectively. Debugging is usually a time-consuming and error-prone process, which is generally performed under stress. Debugging involves a combination of systematic evaluation, intuition, and, sometimes, a little bit of luck. Given a symptom of a problem, the purpose is to isolate and determine its specific cause. The following heuristic may be followed to isolate and correct it:

Step 1: Reproduce the symptom(s).
• Read the troubleshooting guide of the product. This guide may include conditions and logs, produced by normal code or by diagnostics code specifically written for troubleshooting purposes, that can be turned on.
• Try to reproduce the symptoms with the diagnostics code turned on.
• Gather all the information and conduct causal analysis. The goal of causal analysis is to identify the root cause of the problem and initiate actions so that the source of defects is eliminated.
Step 2: Formulate some likely hypotheses for the cause of the problem based on the causal analysis.
Step 3: Develop a test scenario for each hypothesis to be proved or disproved. This is done by designing test cases to provide unambiguous results related to a hypothesis. The test cases may be static (reviewing code and documentation) and/or dynamic in nature. Preferably, the test cases are nondestructive, have low cost, and need minimal additional hardware. A test case is said to be destructive if it destroys the hardware setup. For example, cutting a cable during testing is called destructive testing.
Step 4: Prioritize the execution of test cases. Test cases corresponding to the highly probable hypotheses are executed first. Also, the cost factor cannot be overlooked. Therefore, it is desirable to execute the low-cost test cases first followed by the more expensive ones. The programmer needs to consider both factors.
Step 5: Execute the test cases in order to find the cause of a symptom. After executing a test case, examine the result for new evidence. If the test result shows that a particular hypothesis is promising, test data are refined in an attempt to isolate the defect. If necessary, go back to earlier steps or eliminate a particular hypothesis.
Step 6: Fix the problem.
• Fixing the problem may be a simple task, such as adding a line of code or changing a variable in a line of code, or a more complex task such as modifying several lines of code or designing a new unit. In the complex case, defect fixing is a rigorous activity.
• If code review has already been completed for the module which received a fix, then code review must be done once again to avoid any side effects (collateral damage) due to the changes effected in the module.
• After a possible code review, apply the fix.
• Retest the unit to confirm that the actual cause of failure has been found. The unit is properly debugged and fixed if tests show that the observed failure does not occur any more.
• If there are no dynamic unit test cases that reveal the problem, then add a new test case to the dynamic unit tests to detect possible recurrences or other similar problems.
• For the unit under consideration, identify all the test cases that have passed. Now, perform a regression test on the unit with those test cases to ensure that new errors have not been introduced. That is why it is so important to have archived all the test cases that have been designed for a unit. Thus, even unit-level test cases must be managed in a systematic manner to reduce the cost of software development.
Step 7: Document the changes which have been made. Once a defect is fixed, the following changes are required:
• Document the change in the source code itself.
• Update the overall system documentation.
• Update the dynamic unit test cases.
• File a defect in the defect tracking database if the problem was found after the code was checked in to the version control system.

3.7 UNIT TESTING IN EXTREME PROGRAMMING

A test-driven development (TDD) approach to code development is used in the extreme programming (XP) methodology [22, 23]. The key aspect of the TDD approach is that a programmer writes low-level tests before writing production code. This is referred to as test first [24] in software development. Writing test-driven units is an important concept in the XP methodology. In XP, a few unit tests are coded first, then a simple, partial system is implemented to pass the tests.
Then, one more new unit test is created, and additional code is written to pass the new test, but not more, until a new unit test is created. The process is continued until nothing is left to test. The process is illustrated in Figure 3.3 and outlined below:

Step 1: Pick a requirement, that is, a story.
Step 2: Write a test case that will verify a small part of the story and assign a fail verdict to it.
Step 3: Write the code that implements a particular part of the story to pass the test.
Step 4: Execute all tests.
Step 5: Rework the code, and test the code until all tests pass.
Step 6: Repeat steps 2–5 until the story is fully implemented.

Figure 3.3 Test-first process in XP. (From ref. 24. © 2005 IEEE.)

The simple cycle in Figure 3.3 shows that, at the beginning of each cycle, the intention is for all tests to pass except the newly added test case. The new test case is introduced to drive the new code development. At the end of the cycle, the programmer executes all the unit tests, ensuring that each one passes and, hence, the planned task of the code still works. A TDD developer must follow the three laws proposed by Robert C. Martin [25]: • One may not write production code unless the first failing unit test is written. • One may not write more of a unit test than is sufficient to fail. • One may not write more production code than is sufficient to make the failing unit test pass. These three laws ensure that one must write a portion of a unit test that fails and then write just enough production code to make that unit test pass. The goal of these three laws is not to follow them strictly—it is to decrease the interval between writing unit tests and production code. Creating unit tests helps a developer focus on what needs to be done.
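One iteration of the test-first cycle can be sketched in plain Java. The Counter class and its test below are hypothetical, and an exception throw stands in for a failing framework assertion:

```java
// Step 2: the test is written first; it fails until Counter exists
// and behaves as expected.
class CounterTest {
    static void testIncrementOnce() {
        Counter c = new Counter();
        c.increment();
        if (c.value() != 1)
            throw new AssertionError("expected 1 after one increment");
    }
}

// Step 3: just enough production code to make the test pass, no more.
class Counter {
    private int count = 0;
    public void increment() { count++; }
    public int value() { return count; }
}

public class TddCycle {
    public static void main(String[] args) {
        // Step 4: execute all tests written so far.
        CounterTest.testIncrementOnce();
        System.out.println("All tests pass");
        // Steps 5-6: on failure, rework the code; then repeat with the
        // next small test until the story is fully implemented.
    }
}
```

The next cycle would begin by adding one more small test (say, for a reset operation) and writing only the code needed to pass it.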
Requirements, that is, user stories, are nailed down firmly by unit tests. Unit tests are released into the code repository along with the code they test. Code without unit tests may not be released. If a unit test is discovered to be missing, it must be created immediately. Creating unit tests independently before coding sets up checks and balances and improves the chances of getting the system right the first time. Unit tests provide a safety net of regression tests and validation tests so that XP programmers can refactor and integrate effectively. In XP, the code is developed by two programmers working together side by side. The concept is called pair programming. The two programmers sit side by side in front of the monitor. One person develops the code tactically and the other one inspects it methodically by keeping in mind the story they are implementing. It is similar to the two-person inspection strategy proposed by Bisant and Lyle [26]. Code inspection is carried out by an author–examiner pair in discrete steps, examining a small part of the implementation of the story in isolation, which is key to the success of the code review process. 3.8 JUNIT: FRAMEWORK FOR UNIT TESTING JUnit is a unit testing framework for the Java programming language designed by Kent Beck and Erich Gamma. Experience gained with JUnit has motivated the development of the TDD [22] methodology. The idea in the JUnit framework has been ported to other languages, including C# (NUnit), Python (PyUnit), Fortran (fUnit), and C++ (CppUnit). This family of unit testing frameworks is collectively referred to as xUnit. This section will introduce the fundamental concepts of JUnit to the reader. Suppose that we want to test the individual methods of a class called PlanetClass. Let Move() be a method in PlanetClass such that Move() accepts only one input parameter of type integer and returns a value of type integer.
One can follow the steps below, illustrated using pseudocode in Figure 3.4, to test Move(): • Create an object instance of PlanetClass. Let us call the instance Mars. Now we are interested in testing the method Move() by invoking it on object Mars. • Select a value for all the input parameters of Move()—this function has just one input parameter. Let us represent the input value to Move() by x. • Know the expected value to be returned by Move(). Let the expected returned value be y. • Invoke method Move() on object Mars with input value x. Let z denote the value returned by Move(). • Now compare y with z. If the two values are identical, then the method Move() in object Mars passes the test. Otherwise, the test is said to have failed.

    Planet Mars = new Planet(); // Instantiate class Planet to create an object Mars.
    x = ... ;                   // Select a value for x.
    y = ... ;                   // The expected value to be returned by the call Move(x).
    z = Mars.Move(x);           // Invoke method Move() on object Mars.
    if (z == y)
        print("Test passed");
    else
        print("Test failed.");

Figure 3.4 Sample pseudocode for performing unit testing.

In a nutshell, the five steps of unit testing are as follows: • Create an object and select a method to execute. • Select values for the input parameters of the method. • Compute the expected values to be returned by the method. • Execute the selected method on the created object using the selected input values. • Verify the result of executing the method. Performing unit testing leads to a programmer consuming some resources, especially time. Therefore, it is useful to employ a general programming framework to code individual test cases, organize a set of test cases as a test suite, initialize a test environment, execute the test suite, clean up the test environment, and record the result of execution of individual test cases. In the example shown in Figure 3.4, creating the object Mars is a part of the initialization process.
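The five steps can be made runnable with a concrete stand-in for the Planet class. Here Move() is assumed, purely for illustration, to add the given distance to an internal position and return the new position:

```java
// A runnable rendering of the Figure 3.4 steps; Planet and the
// behavior of Move() are hypothetical.
class Planet {
    private int position = 0;

    public int Move(int distance) {   // method under test
        position = position + distance;
        return position;
    }
}

public class PlanetTest {
    public static void main(String[] args) {
        Planet mars = new Planet(); // instantiate class Planet to create object mars
        int x = 10;                 // select a value for the input parameter
        int y = 10;                 // expected value to be returned by Move(x)
        int z = mars.Move(x);       // invoke method Move() on object mars
        if (z == y)
            System.out.println("Test passed");
        else
            System.out.println("Test failed.");
    }
}
```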
The two print() statements are examples of recording the result of test execution. Alternatively, one can write the result of test execution to a file. The JUnit framework has been developed to make test writing simple. The framework provides a basic class, called TestCase, to write test cases. Programmers need to extend the TestCase class to write a set of individual test cases. It may be noted that to write, for example, 10 test cases, one need not write 10 subclasses of the class TestCase. Rather, one subclass, say MyTestCase, of TestCase can contain 10 methods—one for each test case. Programmers need to make assertions about the state of objects while extending the TestCase class to write test cases. For example, in each test case it is required to compare the actual outcome of a computation with the expected outcome. Though an if() statement can be used to compare the equality of two values or two objects, it is more elegant to write an assert statement to achieve the same. The class TestCase extends a utility class called Assert in the JUnit framework. Essentially, the Assert class provides methods, as explained in the following, to make assertions about the state of objects created and manipulated while testing. assertTrue(Boolean condition): This assertion passes if the condition is true; otherwise, it fails. assertEquals(Object expected, Object actual): This assertion passes if the expected and the actual objects are equal according to the equals() method; otherwise, the assertion fails. assertEquals(int expected, int actual): This assertion passes if expected and actual are equal according to the == operator; otherwise, the assertion fails. For each primitive type int, float, double, char, byte, long, short, and boolean, the assertion has an overloaded version.
assertEquals(double expected, double actual, double tolerance): This assertion passes if the absolute value of the difference between expected and actual is less than or equal to the tolerance value; otherwise, the assertion fails. The assertion has an overloaded version for float inputs. assertSame(Object expected, Object actual): This assertion passes if the expected and actual values refer to the same object in memory; otherwise, the assertion fails. assertNull(Object testobject): This assertion passes if testobject is null; otherwise, the assertion fails. assertFalse(Boolean condition): This is the logical opposite of assertTrue(). The reader may note that the above list of assertions is not exhaustive. In fact, one can build other assertions while extending the TestCase class. When an assertion fails, a programmer may want to know immediately the nature of the failure. This can be done by displaying a message when the assertion fails. Each assertion method listed above accepts an optional first parameter of type String—if the assertion fails, then the String value is displayed. This allows the programmer to display a desired message when the assertion fails. As an aside, upon failure, the assertEquals() method displays a customized message showing the expected value and the actual value. For example, an assertEquals() method can display the following: junit.framework.AssertionFailedError: expected: <2006> but was:<2060>. At this point it is interesting to note that only failed tests are reported. Failed tests can be reported by various means, such as displaying a message, displaying an identifier for the test case, and counting the total number of failed test cases. Essentially, an assertion method throws an exception, called AssertionFailedError, when the assertion fails, and JUnit catches the exception.
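The assertion methods above can be pictured as thin wrappers around a comparison and an exception throw. The following is a simplified sketch, not the actual JUnit source; in particular, the real junit.framework.AssertionFailedError extends Error rather than RuntimeException, and the real methods have many more overloads:

```java
// Simplified sketch of JUnit-style assertion methods.
class AssertionFailedError extends RuntimeException {
    AssertionFailedError(String message) { super(message); }
}

class SimpleAssert {
    static void assertTrue(boolean condition) {
        if (!condition)
            throw new AssertionFailedError("condition was false");
    }

    static void assertEquals(int expected, int actual) {
        if (expected != actual)
            throw new AssertionFailedError(
                "expected: <" + expected + "> but was: <" + actual + ">");
    }

    // Tolerance form: passes when |expected - actual| <= tolerance.
    static void assertEquals(double expected, double actual, double tolerance) {
        if (Math.abs(expected - actual) > tolerance)
            throw new AssertionFailedError(
                "expected: <" + expected + "> but was: <" + actual + ">");
    }
}
```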
The code shown in Figure 3.5 illustrates how the assertTrue() assertion works. When the JUnit framework catches an exception, it records the fact that the assertion failed and proceeds to the next test case. Having executed all the test cases, JUnit produces a list of all those tests that have failed.

    static public void assertTrue(Boolean condition) {
        if (!condition)
            throw new AssertionFailedError();
    }

Figure 3.5 The assertTrue() assertion throws an exception.

In Figure 3.6, we show an example of a test suite containing two test cases.

    import TestMe;            // TestMe is the class whose methods are going to be tested.
    import junit.framework.*; // This contains the TestCase class.

    public class MyTestSuite extends TestCase { // Create a subclass of TestCase

        public void MyTest1() { // This method is the first test case
            TestMe object1 = new TestMe( ... ); // Create an instance of TestMe
                                                // with desired parameters
            int x = object1.Method1(...);       // Invoke Method1 on object1
            assertEquals(365, x);               // 365 and x are expected and
                                                // actual values, respectively
        }

        public void MyTest2() { // This method is the second test case
            TestMe object2 = new TestMe( ... ); // Create another instance of TestMe
                                                // with desired parameters
            double y = object2.Method2(...);    // Invoke Method2 on object2
            assertEquals(2.99, y, 0.0001d);     // 2.99 is the expected value; y is the
                                                // actual value; 0.0001 is tolerance level
        }
    }

Figure 3.6 Example test suite.

In order to execute the two test cases, one needs to create an object instance of MyTestSuite and invoke the two methods MyTest1() and MyTest2(). Whether or not the two methods, namely Method1() and Method2(), are to be invoked on two different instances of the class TestMe depends on the individual objectives of those two test cases. In other words, it is the programmer who decides whether or not two instances of the class TestMe are to be created.
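A framework of this kind typically discovers and executes test methods for the programmer. The following simplified sketch (not the actual JUnit runner, which is considerably richer) shows the idea: instantiate the test class, invoke each no-argument method whose name starts with "test" via reflection, catch any failure, and report only the tests that failed. The SampleTests class is hypothetical:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of a JUnit-style test runner.
class MiniRunner {
    static List<String> run(Class<?> testClass) throws Exception {
        List<String> failures = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            if (m.getName().startsWith("test") && m.getParameterCount() == 0) {
                Object instance = testClass.getDeclaredConstructor().newInstance();
                try {
                    m.invoke(instance);        // execute one test case
                } catch (Exception e) {
                    failures.add(m.getName()); // record failure, keep going
                }
            }
        }
        return failures;
    }
}

// A tiny, hypothetical test class for the runner; testFails deliberately fails.
class SampleTests {
    public void testPasses() { /* no exception thrown: the test passes */ }
    public void testFails() { throw new RuntimeException("assertion failed"); }
}
```

Catching the failure inside the loop is what lets one failing test case leave the remaining test cases unaffected, matching the behavior described above.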
This section is by no means a thorough exposition of the capabilities of the JUnit framework. Readers are referred to other sources, such as JUnit Recipes by Rainsberger [27] and Pragmatic Unit Testing by Hunt and Thomas [28]. In addition, tools such as Korat [29], Symstra [30], and Eclat [31] for Java unit testing are being developed and used by researchers. 3.9 TOOLS FOR UNIT TESTING Programmers can benefit from using tools in unit testing by reducing testing time without sacrificing thoroughness. The well-known tools in everyday life are an editor, a compiler, an operating system, and a debugger. However, in some cases, the real execution environment of a unit may not be available to a programmer while the code is being developed. In such cases, an emulator of the environment is useful in testing and debugging the code. Other kinds of tools that facilitate effective unit testing are as follows: 1. Code Auditor: This tool is used to check the quality of software to ensure that it meets some minimum coding standards. It detects violations of programming, naming, and style guidelines. It can identify portions of code that cannot be ported between different operating systems and processors. Moreover, it can suggest improvements to the structure and style of the source code. In addition, it counts the number of LOC, which can be used to measure productivity, that is, LOC produced per unit time, and calculate defect density, that is, number of defects per KLOC. 2. Bound Checker: This tool can check for accidental writes into the instruction areas of memory or to any other memory location outside the data storage area of the application. It fills unused memory space with a signature pattern (distinct binary pattern) as a way of determining at a later time whether any of this memory space has been overwritten. The tool can issue diagnostic messages when boundary violations on data items occur.
It can detect violations of the boundaries of an array, for example, when the array index or pointer is outside its allowed range. For example, if an array z is declared to have a range from z[0] to z[99], it can detect reads and writes outside this range of storage, for example, z[-3] or z[100]. 3. Documenters: These tools read source code and automatically generate descriptions and caller/callee tree diagrams or data models from the source code. 4. Interactive Debuggers: These tools assist software developers in implementing different debugging approaches discussed in this chapter. These tools should have the trace-back and breakpoint capabilities to enable the programmers to understand the dynamics of program execution and to identify problem areas in the code. Breakpoint debuggers are based on deductive logic. Breakpoints are placed according to a heuristic analysis of code [32]. Another popular kind of debugger is known as the omniscient debugger (ODB), in which there is no deduction. It simply follows the trail of "bad" values back to their source—no "guessing" where to put the breakpoints. An ODB is like "the snake in the grass," that is, if you see a snake in the grass and you pull its tail, sooner or later you get to its head. In contrast, breakpoint debuggers suffer from the "lizard in the grass" problem, that is, when you see the lizard and grab its tail, the lizard breaks off its tail and gets away [33]. 5. In-Circuit Emulators: An in-circuit emulator, commonly known as ICE, is an invaluable software development tool in embedded system design. It provides a high-speed Ethernet connection between a host debugger and a target microprocessor, enabling developers to perform common source-level debugging activities, such as watching memory and controlling large numbers of registers, in a matter of seconds. It is vital for board bring-up, solving complex problems, and manufacturing or testing of products.
Many emulators have advanced features, such as performance analysis, coverage analysis, buffering of traces, and advanced trigger and breakpoint possibilities. 6. Memory Leak Detectors: These tools detect memory that an application requests but fails to deallocate. They detect the following overflow problems in application programs: • Illegal reads, that is, accesses to memory which is not allocated to the application or which the application is not authorized to access. • Reads of memory which has not been initialized. • Dynamic memory overwrites to a memory location that has not been allocated to the application. • Reads from a memory location not allocated, or not initialized, prior to the read operation. The tools watch the heap, keep track of heap allocations to applications, and detect memory leaks. The tools also build profiles of memory use, for example, which line-of-code source instruction accesses a particular memory address. 7. Static Code (Path) Analyzer: These tools identify paths to test, based on the structure of the code, such as McCabe's cyclomatic complexity measure (Table 3.3). Such tools are dependent on the source language and require the source code to be recompiled with the tool. These tools can be used to improve productivity, resource management, quality, and predictability by providing complexity measurement metrics. 8. Software Inspection Support: Tools can help schedule group inspections. These can also provide the status of items reviewed and follow-up actions and distribute the reports of problem resolution. They can be integrated with other tools, such as static code analyzers. 9. Test Coverage Analyzer: These tools measure internal test coverage, often expressed in terms of the control structure of the test object, and report the coverage metric. Coverage analyzers track and report what paths were exercised during dynamic unit testing.
Test coverage analyzers are powerful tools that increase confidence in product quality by assuring that tests cover all of the structural parts of a unit or a program. An important aspect in test coverage analysis is to identify parts of source code that were never touched by any dynamic unit test. Feedback from the coverage reports to the source code makes it easier to design new unit test cases to cover the specific untested paths. 10. Test Data Generator: These tools assist programmers in selecting test data that cause a program to behave in a desired manner. Test data generators can offer several capabilities beyond the basics of data generation: • They can generate a large number of variations of a desired data set based on a description of the characteristics which has been fed into the tool. • They can generate test input data from source code. • They can generate equivalence classes and values close to the boundaries. • They can calculate the desired extent of boundary value testing. • They can estimate the likelihood of the test data being able to reveal faults. • They can generate data to assist in mutation analysis.

TABLE 3.3 McCabe Complexity Measure

McCabe's complexity measure is based on the cyclomatic complexity of a program graph for a module. The metric can be computed using the formula v = e − n + 2, where
    v = cyclomatic complexity of the graph,
    e = number of edges (program flow between nodes),
    n = number of nodes (sequential groups of program statements).
If a strongly connected graph is constructed (one in which there is an edge between the exit node and the entry node), the calculation is v = e − n + 1.
Example: Consider a program graph depicting control flow, in which each node represents a sequence of program statements and each directed edge represents flow of control, with n = 8 nodes and e = 9 edges. For this graph the cyclomatic complexity is v = 9 − 8 + 2 = 3.
Source: From ref. 6.
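The McCabe formula in Table 3.3 is simple enough to compute directly once e and n have been counted from the program graph:

```java
// Cyclomatic complexity per Table 3.3: v = e - n + 2 for a program
// graph with e edges and n nodes; v = e - n + 1 when the graph is made
// strongly connected by adding an exit-to-entry edge.
public class McCabe {
    public static int cyclomaticComplexity(int edges, int nodes) {
        return edges - nodes + 2;
    }

    public static int cyclomaticComplexityStronglyConnected(int edges, int nodes) {
        return edges - nodes + 1;
    }

    public static void main(String[] args) {
        // The example graph in Table 3.3 has n = 8 nodes and e = 9 edges.
        System.out.println(McCabe.cyclomaticComplexity(9, 8)); // prints 3
    }
}
```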
Automatic generation of test inputs is an active area of research. Several tools, such as CUTE [34], DART [35], and the EGT system [36], have been developed by researchers to improve test coverage. 11. Test Harness: This class of tools supports the execution of dynamic unit tests by making it almost painless to (i) install the unit under test in a test environment, (ii) drive the unit under test with input data in the expected input format, (iii) generate stubs to emulate the behavior of subordinate modules, and (iv) capture the actual outcome as generated by the unit under test and log or display it in a usable form. Advanced tools may compare the expected outcome with the actual outcome and log a test verdict for each input test data. 12. Performance Monitors: The timing characteristics of software components can be monitored and evaluated by these tools. These tools are essential for any real-time system in order to evaluate the performance characteristics of the system, such as delay and throughput. For example, in telecommunication systems, these tools can be used to calculate the end-to-end delay of a telephone call. 13. Network Analyzers: Network operating systems, such as the software that runs on routers, switches, and client/server systems, are tested by network analyzers. These tools have the ability to analyze the traffic and identify problem areas. Many of these networking tools allow test engineers to monitor performance metrics and diagnose performance problems across the networks. These tools are enhanced to improve the network security monitoring (NSM) capabilities to detect intrusion [37]. 14. Simulators and Emulators: These tools are used to replace the real software and hardware that are currently not available. Both kinds of tools are used for training, safety, and economy reasons. Some examples are flight simulators, terminal emulators, and emulators for base transceiver stations in cellular mobile networks.
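The four harness duties above (install, drive, stub, and capture-and-compare) can be pictured in miniature. The Engine unit and its FuelGauge subordinate below are hypothetical, invented only to make the roles concrete:

```java
// Miniature test harness sketch: drive a unit under test with input,
// substitute a stub for its subordinate module, and compare the actual
// outcome against the expected outcome to log a verdict.

// Interface of the subordinate module; its real implementation may not exist yet.
interface FuelGauge {
    int level();
}

// Stub emulating the subordinate module with a canned response.
class FuelGaugeStub implements FuelGauge {
    public int level() { return 50; }   // canned value for testing
}

// Hypothetical unit under test.
class Engine {
    private final FuelGauge gauge;
    Engine(FuelGauge gauge) { this.gauge = gauge; }
    boolean canStart() { return gauge.level() > 10; }
}

public class Harness {
    // Install the unit with a stub, drive it, capture the outcome,
    // and log a verdict against the expected outcome.
    static String runCase(FuelGauge stub, boolean expected) {
        Engine unit = new Engine(stub);
        boolean actual = unit.canStart();
        return actual == expected ? "pass" : "fail";
    }

    public static void main(String[] args) {
        System.out.println(Harness.runCase(new FuelGaugeStub(), true)); // prints pass
    }
}
```

A real harness generates the stubs and logging automatically; the value of the stub here is that Engine can be tested before any real FuelGauge exists.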
These tools are bundled with traffic generators and performance analyzers in order to generate a large volume of input data. 15. Traffic Generators: Large volumes of data needed to stress the interfaces and the integrated system are generated by traffic generators. These produce streams of transactions or data packets. For example, in testing routers, one needs traffic that simulates streams of varying-size Internet Protocol (IP) packets arriving from different sources. These tools can set parameters for mean packet arrival rate, duration, and packet size. Operational profiles can be used to generate traffic for load and stability testing. 16. Version Control: A version control system provides functionalities to store a sequence of revisions of the software and associated information files under development. From a version control tool perspective, a system release is a collection of the associated files. These files may contain source code, compiled code, documentation, and environment information, such as the version of the tool used to write the software. The objective of version control is to ensure a systematic and traceable software development process in which all changes are precisely managed, so that a software system is always in a well-defined state. With most of the version control tools, the repository is a central place that holds the master copy of all the files. The configuration management system (CMS) extends version control from software and documentation to control the changes made to hardware, firmware, software, documentation, test, test fixtures, test documentation, and execution environments throughout the development and operational life of a system. Therefore, configuration management tools are larger, better variations of version control tools. The characteristics of the version control and configuration management tools are as follows: • Access Control: The tools monitor and control access to components.
One can specify which users can access a component or group of components. One can also restrict access to components currently undergoing modification or testing. • Cross Referencing: The tools can maintain linkages among related components, such as problem reports, components, fixes, and documentation. One can merge files and coordinate multiple updates from different versions to produce one consolidated file. • Tracking of Modifications: The tools maintain records of all modifications to components. These also allow merging of files and coordination of multiple updates from different versions to produce one consolidated file. These can track similarities and differences among versions of code, documentation, and test libraries. They also provide an audit trail or history of the changes from version to version. • Release Generation: The tools can automatically build new system releases and insulate the development, test, and shipped versions of the product. • System Version Management: The tools allow sharing of common components across system versions and controlled use of system versions. They support coordination of parallel development, maintenance, and integration of multiple components among several programmers or project teams. They also coordinate geographically dispersed development and test teams. • Archiving: The tools support automatic archiving of retired components and system versions. 3.10 SUMMARY This chapter began with a description of unit-level testing, which means identifying faults in a program unit analyzed and executed in isolation. Two complementary types of unit testing were introduced: static unit testing and dynamic unit testing. Static unit testing involves visual inspection and analysis of code, whereas a program unit is executed in a controlled manner in dynamic unit testing. Next, we described a code review process, which comprises six steps: readiness, preparation, examination, rework, validation, and exit.
The goal of code review is to assess the quality of the software in question, not the quality of the process used to develop the product. We discussed a few basic metrics that can be collected from the code review process. Those metrics facilitate estimation of review time and resources required for similar projects. Also, the metrics make code review visible to the upper management and allow upper management to be satisfied with the viability of code review as a testing tool. We explained several preventive measures that can be taken during code development to reduce the number of faults in a program. The preventive measures were presented in the form of a set of guidelines that programmers can follow to construct code. Essentially, the guidelines focus on incorporating suitable mechanisms into the code. Next, we studied dynamic unit testing in detail. In dynamic unit testing, a program unit is actually executed, and the outcomes of program execution are observed. The concepts of test driver and stubs were explained in the context of a unit under test. A test driver is a caller of the unit under test and all the "dummy modules" called by the unit are known as stubs. We described how mutation analysis can be used to locate weaknesses in test data used for unit testing. Mutation analysis should be used in conjunction with traditional unit testing techniques such as domain analysis or data flow analysis. That is, mutation testing is not an alternative to domain testing or data flow analysis. With the unit test model in place to reveal defects, we examined how programmers can locate faults by debugging a unit. Debugging occurs as a consequence of a test revealing a defect. We discussed three approaches to debugging: brute force, cause elimination, and backtracking. The objective of debugging is to precisely identify the cause of a failure. Given the symptom of a problem, the purpose is to isolate and determine its specific cause.
We explained a heuristic to perform program debugging. Next, we explained that dynamic unit testing is an integral part of the XP software development process. In the XP process, unit tests are created prior to coding—this is known as test first. The test-first approach sets up checks and balances to improve the chances of getting things right the first time. We then introduced the JUnit framework, which is used to create and execute dynamic unit tests. We concluded the chapter with a description of several tools that can be useful in improving the effectiveness of unit testing. These tools are of the following types: code auditor, bound checker, documenters, interactive debuggers, in-circuit emulators, memory leak detectors, static code analyzers, tools for software inspection support, test coverage analyzers, test data generators, tools for creating test harness, performance monitors, network analyzers, simulators and emulators, traffic generators, and tools for version control. LITERATURE REVIEW The Institute of Electrical and Electronics Engineers (IEEE) standard 1028-1988 (IEEE Standard for Software Reviews and Audits: IEEE/ANSI Standard) describes the detailed examination process for a technical review, an inspection, a software walkthrough, and an audit. For each of the examination processes, it includes an objective, an abstract, special responsibilities, program input, entry criteria, procedures, exit criteria, output, and auditability. Several improvements on Fagan's inspection techniques have been proposed by researchers during the past three decades. Those proposals suggest ways to enhance the effectiveness of the review process or to fit specific application domains. A number of excellent articles address various issues related to software inspection as follows: S. Biffl and M. Halling, "Investigating the Defect Detection Effectiveness and Cost Benefit of Nominal Inspection Teams," IEEE Transactions on Software Engineering, Vol. 29, No. 5, May 2003, pp. 385–397. A.
A. Porter and P. M. Johnson, "Assessing Software Review Meetings: Results of a Comparative Analysis of Two Experimental Studies," IEEE Transactions on Software Engineering, Vol. 23, No. 3, March 1997, pp. 129–145. A. A. Porter, H. P. Siy, C. A. Toman, and L. G. Votta, "An Experiment to Assess the Cost-Benefits of Code Inspection in Large Scale Software Development," IEEE Transactions on Software Engineering, Vol. 23, No. 6, June 1997, pp. 329–346. A. A. Porter and L. G. Votta, "What Makes Inspection Work," IEEE Software, Vol. 14, No. 5, May 1997, pp. 99–102. C. Sauer, D. Jeffery, L. Land, and P. Yetton, "The Effectiveness of Software Development Technical Reviews: A Behaviorally Motivated Program of Research," IEEE Transactions on Software Engineering, Vol. 26, No. 1, January 2000, pp. 1–14. An alternative non-execution-based technique is formal verification of code. Formal verification consists of mathematical proofs to show that a program is correct. The two most prominent methods for proving program properties are those of Dijkstra and Hoare: E. W. Dijkstra, A Discipline of Programming, Prentice-Hall, Englewood Cliffs, NJ, 1976. C. A. R. Hoare, "An Axiomatic Basis for Computer Programming," Communications of the ACM, Vol. 12, No. 10, October 1969, pp. 576–580. Hoare presented an axiomatic approach in which properties of program fragments are described using preconditions and postconditions. An example statement with a precondition and a postcondition is {PRE} P {POST}, where PRE is the precondition, POST is the postcondition, and P is the program fragment. Both PRE and POST are expressed in first-order predicate calculus, which means that they can include the universal quantifier ∀ ("for all") and the existential quantifier ∃ ("there exists"). The interpretation of the above statement is that if the program fragment P starts executing in a state satisfying PRE, then if P terminates, P will do so in a state satisfying POST.
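A small worked example (illustrative, not from the text) makes the notation concrete. For the assignment fragment x := x + 1:

```latex
% A valid Hoare triple for the fragment P:  x := x + 1
\{\, x \ge 0 \,\}\quad x := x + 1 \quad \{\, x \ge 1 \,\}
% Reading: if P starts in a state where x >= 0 and P terminates,
% then P ends in a state where x >= 1.
%
% The precondition x >= 0 is in fact the weakest one for this
% postcondition:
\mathrm{WP}(x := x + 1,\; x \ge 1) \;=\; (x \ge 0)
% Any state with x >= 0, and only such states, is guaranteed to
% satisfy x >= 1 after the assignment.
```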
Hoare's logic led to Dijkstra's closely related "calculus of programs," which is based on the idea of weakest preconditions. The weakest precondition with respect to a program fragment P and a postcondition POST is the set of all states that, when subject to P, will terminate and leave the state of computation in POST. The weakest precondition is written as WP(P, POST). While mutation testing systematically implants faults in programs by applying syntactic transformations, perturbation testing is performed to test a program's robustness by changing the values of program data during run time, so that the subsequent execution will either fail or succeed. Program perturbation is based on a three-part software failure hypothesis, explained in the following: • Execution: A fault must be executed. • Infection: The fault must change the data state of the computation directly after the fault location. • Propagation: The erroneous data state must propagate to an output variable. In the perturbation technique, the programmer injects faults in the data state of an executing program and traces the injected faults on the program's output. A fault injection is performed by applying a perturbation function that changes the program's data state. A perturbation function is a mathematical function that takes a data state as its input, changes the data state according to some specified criteria, and produces a modified data state as output. For interested readers, two excellent references on perturbation testing are as follows: M. A. Friedman and J. M. Voas, Software Assessment—Reliability, Safety, Testability, Wiley, New York, 1995. J. M. Voas and G. McGraw, Software Fault Injection—Inoculating Programs Against Errors, Wiley, New York, 1998. The paper by Steven J. Zeil ("Testing for Perturbations of Program Statements," IEEE Transactions on Software Engineering, Vol. 9, No. 3, May 1983, pp.
335–346) describes a method for deducing sufficient path coverage to ensure the absence of prescribed errors in a program. It models the program computation and potential errors as a vector space. This enables the conditions for nondetection of an error to be calculated. The above article is advanced reading for students who are interested in perturbation analysis.

Those readers actively involved in software configuration management (SCM) systems or interested in a more sophisticated treatment of the topic should read the article by Jacky Estublier, David Leblang, André van der Hoek, Reidar Conradi, Geoffrey Clemm, Walter Tichy, and Darcy Wiborg-Weber (“Impact of Software Engineering Research on the Practice of Software Configuration Management,” ACM Transactions on Software Engineering and Methodology, Vol. 14, No. 4, October 2005, pp. 383–430). The authors discuss the evolution of software configuration management technology, with a particular emphasis on the impact that university and industrial research has had along the way. The article creates a detailed record of the critical value of software configuration management research and illustrates the research results that have shaped the functionality of SCM systems.

REFERENCES

1. M. E. Fagan. Design and Code Inspections to Reduce Errors in Program Development. IBM Systems Journal, July 1976, pp. 182–211; reprinted 1999, pp. 258–287.
2. E. Yourdon. Structured Walkthroughs. Prentice-Hall, Englewood Cliffs, NJ, 1979.
3. D. Parnas and M. Lawford. The Role of Inspection in Software Quality Assurance. IEEE Transactions on Software Engineering, August 2003, pp. 674–676.
4. G. Myers. A Controlled Experiment in Program Testing and Code Walkthroughs/Inspections. Communications of the ACM, September 1978, pp. 760–768.
5. H. Sutter and A. Alexandrescu. C++ Coding Standards: 101 Rules, Guidelines, and Best Practices. Addison-Wesley, Reading, MA, 2004.
6. T. J. McCabe. A Complexity Measure.
IEEE Transactions on Software Engineering, December 1976, pp. 308–320.
7. A. Davis. Dissecting Error Messages. Dr. Dobb’s Journal, June 2005, pp. 34–41.
8. T. Budd and F. Sayward. Users Guide to the Pilot Mutation System, Technical Report 114. Department of Computer Science, Yale University, 1977.
9. R. A. DeMillo, R. J. Lipton, and F. Sayward. Hints on Test Data Selection: Help for the Practicing Programmer. IEEE Computer, April 1978, pp. 34–41.
10. R. G. Hamlet. Testing Programs with the Aid of a Compiler. IEEE Transactions on Software Engineering, July 1977, pp. 279–290.
11. A. J. Offutt. Investigations of the Software Testing Coupling Effect. ACM Transactions on Software Engineering Methodology, January 1992, pp. 3–18.
12. K. S. H. T. Wah. A Theoretical Study of Fault Coupling. Journal of Software Testing, Verification, and Reliability, March 2000, pp. 3–46.
13. R. Geist, A. J. Offutt, and F. Harris. Estimation and Enhancement of Real-Time Software Reliability through Mutation Analysis. IEEE Transactions on Computers, May 1992, pp. 550–558.
14. A. J. Offutt and J. Pan. Automatically Detecting Equivalent Mutants and Infeasible Paths. Journal of Software Testing, Verification, and Reliability, September 1997, pp. 165–192.
15. R. A. DeMillo, D. S. Guindi, K. N. King, W. M. McCracken, and A. J. Offutt. An Extended Overview of the Mothra Software Testing Environment. In Proceedings of the Second Workshop on Software Testing, Verification, and Analysis, Banff, Alberta. IEEE Computer Society Press, New York, July 1988, pp. 142–151.
16. P. Chevalley and P. Thevenod-Fosse. A Mutation Analysis Tool for Java Programs. International Journal on Software Tools for Technology Transfer (STTT), Springer, Berlin/Heidelberg, November 2003, pp. 90–103.
17. A. Kolawa. Mutation Testing: A New Approach to Automatic Error-Detection. STAR-EAST, www.StickyMinds.com, 1999.
18. P. G. Frankl and E. J. Weyuker. An Applicable Family of Data Flow Testing Criteria.
IEEE Transactions on Software Engineering, March 1993, pp. 202–213.
19. A. P. Mathur and W. E. Wong. An Empirical Comparison of Data Flow and Mutation-Based Test Adequacy Criteria. Journal of Software Testing, Analysis, and Verification, March 1994, pp. 9–31.
20. G. Myers. The Art of Software Testing, 2nd ed. Wiley, New York, 2004.
21. R. S. Pressman. Software Engineering: A Practitioner’s Approach. McGraw-Hill, New York, 2005.
22. K. Beck. Test-Driven Development. Addison-Wesley, Reading, MA, 2003.
23. R. Jeffries and G. Melnik. TDD: The Art of Fearless Programming. IEEE Software, May/June 2007, pp. 24–30.
24. H. Erdogmus, M. Morisio, and M. Torchiano. On the Effectiveness of the Test-First Approach to Programming. IEEE Transactions on Software Engineering, March 2005, pp. 226–237.
25. R. C. Martin. Professionalism and Test-Driven Development. IEEE Software, May/June 2007, pp. 32–36.
26. D. B. Bisant and J. R. Lyle. Two Person Inspection Method to Improve Programming Productivity. IEEE Transactions on Software Engineering, October 1989, pp. 1294–1304.
27. J. B. Rainsberger. JUnit Recipes. Manning Publications, Greenwich, CT, 2005.
28. A. Hunt and D. Thomas. Pragmatic Unit Testing in Java with JUnit. The Pragmatic Bookshelf, Lewisville, TX, 2004.
29. C. Boyapati, S. Khurshid, and D. Marinov. Korat: Automated Testing Based on Java Predicates. Paper presented at the ACM International Symposium on Software Testing and Analysis (ISSTA), Rome, Italy, 2002, pp. 123–133.
30. T. Xie, D. Marinov, W. Schulte, and D. Notkin. Symstra: A Framework for Generating Object-Oriented Unit Tests Using Symbolic Execution. In Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, Edinburgh, U.K., Springer, Berlin/Heidelberg, 2005, pp. 365–381.
31. C. Pacheco and M. D. Ernst. Eclat: Automatic Generation and Classification of Test Inputs.
Paper presented at ECOOP 2005 Object-Oriented Programming, 19th European Conference, Glasgow, Scotland, July 2005, pp. 504–527.
32. D. Spinellis. Debuggers and Logging Frameworks. IEEE Software, September 2006, pp. 98–99.
33. B. Lewis. Omniscient Debugging. Dr. Dobb’s Journal, June 2005, pp. 16–24.
34. K. Sen, D. Marinov, and G. Agha. CUTE: A Concolic Unit Testing Engine for C. In Proceedings of the 10th European Software Engineering Conference, held jointly with the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Lisbon, Portugal, September 2005, pp. 263–272.
35. P. Godefroid, N. Klarlund, and K. Sen. DART: Directed Automated Random Testing. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, Chicago, IL, ACM Press, New York, 2005, pp. 213–223.
36. C. Cadar and D. Engler. Execution Generated Test Cases: How to Make Systems Code Crash Itself. Lecture Notes in Computer Science (LNCS), Vol. 3639, 2005, pp. 2–23.
37. R. Bejtlich. The Tao of Network Security Monitoring: Beyond Intrusion Detection. Addison-Wesley, Boston, MA, 2005.

Exercises

1. Study the Yourdon [2] concept of a design walkthrough and the IBM concept [1] of a design inspection. Discuss the similarities and the differences between them.
2. A software engineering group is developing a mission-critical software system that will launch laser-guided missiles to its destinations. This is a new kind of product that was never built by the company. As a quality assurance manager, which code review methodology—walkthrough or inspection—would you recommend? Justify your answer.
3. What size of a review team would you recommend for the project in exercise 2, and why? What are the different roles of each member of the review team? What groups should send representatives to participate in code review?
4. Suppose that the C programming language is chosen in the project in exercise 2.
Recommend a detailed code review checklist to the review team.
5. In addition to code review, what other static unit testing techniques would you recommend for the project in exercise 3? Justify your answer.
6. Describe the special role of a recordkeeper.
7. Discuss the importance of code review rework and validation.
8. Draw a control flow graph for the following sample code. Determine the cyclomatic complexity of the graph.
(a) sum_of_all_positive_numbers(a, num_of_entries, sum)
(b) sum = 0
(c) init = 1
(d) while(init <= num_of_entries)
(e) if a[init] > 0
(f) sum = sum + a[init]
    endif
(g) init = init + 1
    endwhile
(h) end sum_of_all_positive_numbers
9. A test engineer generates 70 mutants of a program P and 150 test cases to test the program P. After the first iteration of mutation testing, the tester finds 58 dead mutants and 4 equivalent mutants. Calculate the mutation score for this test suite. Is the test suite adequate for program P? Should the test engineer develop additional test cases? Justify your answer.
10. There is some debate as to whether code should be compiled before it is reviewed and vice versa. Based on your experience, give an opinion on this matter.
11. Attempt to draw a control flow graph for a module that you have recently developed. Determine the cyclomatic complexity for the module. Is the module too complex?
12. For your current software project, conduct a formal code review as described in Section 3.2.
13. For your current software project, develop dynamic unit test cases for each of the units in the JUnit framework if the code is in Java or in an appropriate xUnit framework.

CHAPTER 4

Control Flow Testing

He who controls the present, controls the past. He who controls the past, controls the future.
— George Orwell

4.1 BASIC IDEA

Two kinds of basic statements in a program unit are assignment statements and conditional statements.
An assignment statement is explicitly represented by using an assignment symbol, “=”, such as x = 2*y;, where x and y are variables. Program conditions are at the core of conditional statements, such as if(), for() loop, while() loop, and goto. As an example, in if(x != y), we are testing for the inequality of x and y. In the absence of conditional statements, program instructions are executed in the sequence they appear. The idea of successive execution of instructions gives rise to the concept of control flow in a program unit. Conditional statements alter the default, sequential control flow in a program unit. In fact, even a small number of conditional statements can lead to a complex control flow structure in a program.

Function calls are a mechanism to provide abstraction in program design. A call to a program function leads to control entering the called function. Similarly, when the called function executes its return statement, we say that control exits from the function. Though a function can have many return statements, for simplicity, one can restructure the function to have exactly one return. A program unit can be viewed as having a well-defined entry point and a well-defined exit point. The execution of a sequence of instructions from the entry point to the exit point of a program unit is called a program path. There can be a large, even infinite, number of paths in a program unit. Each program path can be characterized by an input and an expected output. A specific input value causes a specific program path to be executed; it is expected that the program path performs the desired computation, thereby producing the expected output value. Therefore, it may seem natural to execute as many program paths as possible.

(Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy. Copyright © 2008 John Wiley & Sons, Inc.)

Mere execution of a large number of
paths, at a higher cost, may not be effective in revealing defects. Ideally, one must strive to execute fewer paths for better effectiveness.

The concepts of control flow in computer programs [1], program paths [2], and control flow testing [2–8] have been studied for many decades. Tools are being developed to support control flow testing [9]. Such tools identify paths from a program unit based on a user-defined criterion, generate the corresponding input to execute a selected path, and generate program stubs and drivers to execute the test. Control flow testing is a kind of structural testing, which is performed by programmers to test code written by them. The concept is applied to small units of code, such as a function. Test cases for control flow testing are derived from the source code, such as a program unit (e.g., a function or method), rather than from the entire program. Structurally, a path is a sequence of statements in a program unit, whereas, semantically, it is an execution instance of the unit. For a given set of input data, the program unit executes a certain path. For another set of input data, the unit may execute a different path. The main idea in control flow testing is to appropriately select a few paths in a program unit and observe whether or not the selected paths produce the expected outcome. By executing a few paths in a program unit, the programmer tries to assess the behavior of the entire program unit.

4.2 OUTLINE OF CONTROL FLOW TESTING

The overall idea of generating test input data for performing control flow testing is depicted in Figure 4.1. The activities performed, the intermediate results produced by those activities, and programmer preferences in the test generation process are explained below.

Inputs: The source code of a program unit and a set of path selection criteria are the inputs to a process for generating test data.
In the following, two examples of path selection criteria are given.

Example. Select paths such that every statement is executed at least once.

Example. Select paths such that every conditional statement, for example, an if() statement, evaluates to true and false at least once on different occasions. A conditional statement may evaluate to true in one path and false in a second path.

Generation of a Control Flow Graph: A control flow graph (CFG) is a detailed graphical representation of a program unit. The idea behind drawing a CFG is to be able to visualize all the paths in a program unit. The process of drawing a CFG from a program unit will be explained in the following section. If the process of test generation is automated, a compiler can be modified to produce a CFG.

[Figure 4.1: Process of generating test input data for control flow testing. Inputs (program unit, path selection criteria) feed into: draw a control flow graph; select paths; generate test input data; if the selected paths are not feasible, select new paths; output is the test input data.]

Selection of Paths: Paths are selected from the CFG to satisfy the path selection criteria, and it is done by considering the structure of the CFG.

Generation of Test Input Data: A path can be executed if and only if a certain instance of the inputs to the program unit causes all the conditional statements along the path to evaluate to true or false as dictated by the control flow. Such a path is called a feasible path. Otherwise, the path is said to be infeasible. It is essential to identify certain values of the inputs from a given path for the path to execute.

Feasibility Test of a Path: The idea behind checking the feasibility of a selected path is to meet the path selection criteria. If some chosen paths are found to be infeasible, then new paths are selected to meet the criteria.
4.3 CONTROL FLOW GRAPH

A CFG is a graphical representation of a program unit. Three symbols are used to construct a CFG, as shown in Figure 4.2: a rectangle represents a sequential computation, a diamond represents a decision point (with true and false branches), and a merge point joins branches.

[Figure 4.2: Symbols in a CFG: sequential computation (rectangle), decision point (diamond, with True/False branches), merge point.]

A maximal sequential computation can be represented either by a single rectangle or by many rectangles, each corresponding to one statement in the source code. We label each computation and decision box with a unique integer. The two branches of a decision box are labeled with T and F to represent the true and false evaluations, respectively, of the condition within the box. We will not label a merge node, because one can easily identify the paths in a CFG even without explicitly considering the merge nodes. Moreover, not mentioning the merge nodes in a path will make a path description shorter.

We consider the openfiles() function shown in Figure 4.3 to illustrate the process of drawing a CFG. The function has three statements: an assignment statement int i = 0;, a conditional statement if(), and a return(i) statement. The reader may note that, irrespective of the evaluation of the if(), the function performs the same action, namely, null.

FILE *fptr1, *fptr2, *fptr3; /* These are global variables. */
int openfiles(){
    /* This function tries to open files "file1", "file2", and
       "file3" for read access, and returns the number of files
       successfully opened. The file pointers of the opened files
       are put in the global variables. */
    int i = 0;
    if(
       ((( fptr1 = fopen("file1", "r")) != NULL) && (i++) && (0)) ||
       ((( fptr2 = fopen("file2", "r")) != NULL) && (i++) && (0)) ||
       ((( fptr3 = fopen("file3", "r")) != NULL) && (i++))
      );
    return(i);
}

Figure 4.3 Function to open three files.
In Figure 4.4, we show a high-level representation of the control flow in openfiles(), with three nodes numbered 1, 2, and 3.

[Figure 4.4: High-level CFG representation of openfiles(): entry point; node 1 (i = 0); node 2 (if(), with T and F branches); node 3 (return(i)); exit point.]

The flow graph shows just two paths in openfiles(). A closer examination of the condition part of the if() statement reveals that there are not only Boolean and relational operators in the condition part, but also assignment statements. Some examples are given below:

Assignment statements: fptr1 = fopen("file1", "r") and i++
Relational operator: fptr1 != NULL
Boolean operators: && and ||

Execution of the assignment statements in the condition part of the if statement depends upon the component conditions. For example, consider the following component condition in the if part:

((( fptr1 = fopen("file1", "r")) != NULL) && (i++) && (0))

The above condition is executed as follows:

• Execute the assignment statement fptr1 = fopen("file1", "r").
• Execute the relational operation fptr1 != NULL.
• If the above relational operator evaluates to false, skip the evaluation of the subsequent condition components (i++) && (0).
• If the relational operator evaluates to true, then the current value of (i) is evaluated to true or false. Irrespective of the outcome of this evaluation, i is then incremented, which is the effect of (i++).
• If (i) has evaluated to true, then the condition (0) is evaluated. Otherwise, evaluation of (0) is skipped.

In Figure 4.5, we show a detailed CFG for the openfiles() function. The figure illustrates the fact that a CFG can take on a complex structure even for a small program unit.

We give a Java method, called ReturnAverage(), in Figure 4.6. The method accepts four parameters, namely value, AS, MIN, and MAX, where value is an integer array and AS is the maximum size of the array.
The array can hold fewer elements than AS; such a scenario is semantically represented by having the value −999 denote the end of the array. For example, AS = 15, whereas the 10th element of the array is −999, which means that there are 10 elements, indexed 0–9, in the array. MIN and MAX are two integer values that are used to perform certain computations within the method. The method sums up the values of all those elements of the array which fall within the closed range [MIN, MAX], counts their number, and returns their average value. The CFG of the method is shown in Figure 4.7.

[Figure 4.5: Detailed CFG representation of openfiles(). The numbers 1–21 are the nodes; they cover the three fopen() assignments, the != NULL tests, the (i), (i++), and (0) evaluations, and the final return(i).]

4.4 PATHS IN A CONTROL FLOW GRAPH

We assume that a control flow graph has exactly one entry node and exactly one exit node for the convenience of discussion.

public static double ReturnAverage(int value[], int AS, int MIN, int MAX){
    /* Function: ReturnAverage
       Computes the average of all those numbers in the input array
       in the positive range [MIN, MAX]. The maximum size of the
       array is AS. But, the array size could be smaller than AS in
       which case the end of input is represented by -999. */
    int i, ti, tv, sum;
    double av;
    i = 0; ti = 0; tv = 0; sum = 0;
    while (ti < AS && value[i] != -999) {
        ti++;
        if (value[i] >= MIN && value[i] <= MAX) {
            tv++;
            sum = sum + value[i];
        }
        i++;
    }
    if (tv > 0)
        av = (double)sum/tv;
    else
        av = (double) -999;
    return (av);
}

Figure 4.6 Function to compute average of selected integers in an array. This program is an adaptation of “Figure 2. A sample program” in ref. 10.
(With permission from the Australian Computer Society.) Each node is labeled with a unique integer value. Also, the two branches of a decision node are appropriately labeled with true (T) or false (F). We are interested in identifying entry–exit paths in a CFG. A path is represented as a sequence of computation and decision nodes from the entry node to the exit node. We also specify whether control exits a decision node via its true or false branch while including it in a path.

In Table 4.1, we show a few paths from the control flow graph of Figure 4.7. The reader may note that we have arbitrarily chosen these paths without applying any path selection criterion. We have unfolded the loop just once in path 3, whereas path 4 unfolds the same loop twice, and these are two distinct paths.

[Figure 4.7: A CFG representation of ReturnAverage(). Numbers 1–13 are the nodes: 1: initialize value[], AS, MIN, MAX; 2: i = 0, ti = 0, tv = 0, sum = 0; 3: ti < AS (T/F); 4: value[i] != -999 (T/F); 5: ti++; 6: value[i] >= MIN (T/F); 7: value[i] <= MAX (T/F); 8: tv++, sum = sum + value[i]; 9: i++; 10: tv > 0 (T/F); 11: av = (double)-999; 12: av = (double)sum/tv; 13: return(av).]

TABLE 4.1 Examples of Paths in CFG of Figure 4.7
Path 1: 1-2-3(F)-10(T)-12-13
Path 2: 1-2-3(F)-10(F)-11-13
Path 3: 1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13
Path 4: 1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13

4.5 PATH SELECTION CRITERIA

A CFG, such as the one shown in Figure 4.7, can have a large number of different paths. One may be tempted to test the execution of each and every path in a program unit. For a program unit with a small number of paths, executing all the paths may be desirable and achievable as well. On the other hand, for a program unit with a large number of paths, executing every distinct path may not be practical. Thus, it is more productive for programmers to select a small number of program paths in an effort to reveal defects in the code.
Given the set of all paths, one is faced with the question “What paths do I select for testing?” The concept of path selection criteria is useful in answering the above question. In the following, we state the advantages of selecting paths based on defined criteria:

• All program constructs are exercised at least once. The programmer needs to observe the outcome of executing each program construct, for example, statements, Boolean conditions, and returns.
• We do not generate test inputs which execute the same path repeatedly. Executing the same path several times is a waste of resources. However, if each execution of a program path potentially updates the state of the system, for example, the database state, then multiple executions of the same path may not be identical.
• We know the program features that have been tested and those not tested. For example, we may execute an if statement only once so that it evaluates to true. If we do not execute it once again for its false evaluation, we are, at least, aware that we have not observed the outcome of the program with a false evaluation of the if statement.

Now we explain the following well-known path selection criteria:

• Select all paths.
• Select paths to achieve complete statement coverage.
• Select paths to achieve complete branch coverage.
• Select paths to achieve predicate coverage.

4.5.1 All-Path Coverage Criterion

If all the paths in a CFG are selected, then one can detect all faults, except those due to missing path errors. However, a program may contain a large number of paths, or even an infinite number of paths. The small, loop-free openfiles() function shown in Figure 4.3 contains more than 25 paths. One does not know whether or not a path is feasible at the time of selecting paths, though only eight of all those paths are feasible. If one selects all possible paths in a program, then we say that the all-path selection criterion has been satisfied.
Let us consider the example of the openfiles() function. This function tries to open the three files file1, file2, and file3. The function returns an integer representing the number of files it has successfully opened. A file is said to be successfully opened with “read” access if the file exists. The existence of a file is either “yes” or “no.” Thus, the input domain of the function consists of eight combinations of the existence of the three files, as shown in Table 4.2.

TABLE 4.2 Input Domain of openfiles()
Existence of file1, file2, file3:
No, No, No
No, No, Yes
No, Yes, No
No, Yes, Yes
Yes, No, No
Yes, No, Yes
Yes, Yes, No
Yes, Yes, Yes

We can trace a path in the CFG of Figure 4.5 for each input, that is, each row of Table 4.2. Ideally, we identify test inputs to execute a certain path in a program; this will be explained later in this chapter. We give three examples of the paths executed by the test inputs (Table 4.3).

TABLE 4.3 Inputs and Paths in openfiles()
Input <No, No, No>: path 1-2-3(F)-8-9(F)-14-15(F)-19-21
Input <Yes, No, No>: path 1-2-3(T)-4(F)-6-8-9(F)-14-15(F)-19-21
Input <Yes, Yes, Yes>: path 1-2-3(T)-4(F)-6-8-9(T)-10(T)-11-13(F)-14-15(T)-16(T)-18-20-21

In this manner, we can identify eight possible paths in Figure 4.5. The all-paths selection criterion is desirable since it can detect faults; however, it is difficult to achieve in practice.

4.5.2 Statement Coverage Criterion

Statement coverage refers to executing individual program statements and observing the outcome. We say that 100% statement coverage has been achieved if all the statements have been executed at least once. Complete statement coverage is the weakest coverage criterion in program testing. Any test suite that achieves less than complete statement coverage for new software is considered to be unacceptable. All program statements are represented in some form in a CFG.
Referring to the ReturnAverage() method in Figure 4.6 and its CFG in Figure 4.7, the four assignment statements

i = 0;
ti = 0;
tv = 0;
sum = 0;

have been represented by node 2. The while statement has been represented as a loop, where the loop control condition

(ti < AS && value[i] != -999)

has been represented by nodes 3 and 4. Thus, covering a statement in a program means visiting one or more nodes representing the statement, more precisely, selecting a feasible entry–exit path that includes the corresponding nodes. Since a single entry–exit path includes many nodes, we need to select just a few paths to cover all the nodes of a CFG. Therefore, the basic problem is to select a few feasible paths to cover all the nodes of a CFG in order to achieve the complete statement coverage criterion. We follow these rules while selecting paths:

• Select short paths.
• Select paths of increasingly longer length. Unfold a loop several times if there is a need.
• Select arbitrarily long, “complex” paths.

One can select the two paths shown in Table 4.4 to achieve complete statement coverage.

4.5.3 Branch Coverage Criterion

Syntactically, a branch is an outgoing edge from a node. All the rectangle nodes have at most one outgoing branch (edge). The exit node of a CFG does not have an outgoing branch. All the diamond nodes have two outgoing branches. Covering a branch means selecting a path that includes the branch. Complete branch coverage means selecting a number of paths such that every branch is included in at least one path.

In a preceding discussion, we showed that one can select two paths, SCPath 1 and SCPath 2 in Table 4.4, to achieve complete statement coverage. These two paths cover all the nodes (statements) and most of the branches of the CFG shown in Figure 4.7. The branches which are not covered by these two paths have been highlighted by bold dashed lines in Figure 4.8.
These uncovered branches correspond to the three independent conditions

value[i] != -999
value[i] >= MIN
value[i] <= MAX

evaluating to false. This means that, as a programmer, we have not observed the outcome of the program execution as a result of the conditions evaluating to false. Thus, complete branch coverage means selecting enough paths such that every condition evaluates to true at least once and to false at least once. We need to select more paths to cover the branches highlighted by the bold dashed lines in Figure 4.8. A set of paths for complete branch coverage is given in Table 4.5.

TABLE 4.4 Paths for Statement Coverage of CFG of Figure 4.7
SCPath 1: 1-2-3(F)-10(F)-11-13
SCPath 2: 1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13

[Figure 4.8: Same CFG as Figure 4.7; dashed arrows represent the branches not covered by the statement-coverage paths of Table 4.4.]

TABLE 4.5 Paths for Branch Coverage of CFG of Figure 4.7
BCPath 1: 1-2-3(F)-10(F)-11-13
BCPath 2: 1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13
BCPath 3: 1-2-3(T)-4(F)-10(F)-11-13
BCPath 4: 1-2-3(T)-4(T)-5-6(F)-9-3(F)-10(F)-11-13
BCPath 5: 1-2-3(T)-4(T)-5-6(T)-7(F)-9-3(F)-10(F)-11-13

4.5.4 Predicate Coverage Criterion

We refer to the partial CFG of Figure 4.9a to explain the concept of predicate coverage. OB1, OB2, OB3, and OB are four Boolean variables. The program computes the values of the individual variables OB1, OB2, and OB3; details of their computation are irrelevant to our discussion and have been omitted. Next, OB is computed as shown in the CFG. The CFG checks the value of OB and executes either OBlock1 or OBlock2 depending on whether OB evaluates to true or false, respectively.
We need to design just two test cases to achieve both statement coverage and branch coverage. We select inputs such that the four Boolean conditions in Figure 4.9a evaluate to the values shown in Table 4.6. The reader may note that we have shown just one way of forcing OB to true. If we select inputs so that these two cases hold, then we do not observe the effect of the computations taking place in nodes 2 and 3. There may be faults in the computation parts of nodes 2 and 3 such that OB2 and OB3 always evaluate to false.

[Figure 4.9: Partial CFGs with (a) OR operation and (b) AND operation. In (a), nodes 1–3 compute OB1, OB2, and OB3; node 4 computes OB = OB1 || OB2 || OB3; node 5 tests if(OB), branching to node 6 (OBlock1) on true and node 7 (OBlock2) on false. In (b), the analogous structure computes AB = AB1 && AB2 && AB3 and branches to ABlock1 or ABlock2.]

TABLE 4.6 Two Cases for Complete Statement and Branch Coverage of CFG of Figure 4.9a
Case 1: OB1 = T, OB2 = F, OB3 = F, giving OB = T
Case 2: OB1 = F, OB2 = F, OB3 = F, giving OB = F

Therefore, there is a need to design test cases such that a path is executed under all possible conditions. The false branch of node 5 (Figure 4.9a) is executed under exactly one condition, namely, when OB1 = False, OB2 = False, and OB3 = False, whereas the true branch executes under seven conditions. If all possible combinations of truth values of the conditions affecting a selected path have been explored under some tests, then we say that predicate coverage has been achieved. Therefore, the path taking the true branch of node 5 in Figure 4.9a must be executed for all seven possible combinations of truth values of OB1, OB2, and OB3 which result in OB = True. A similar situation holds for the partial CFG shown in Figure 4.9b, where AB1, AB2, AB3, and AB are Boolean variables.

4.6 GENERATING TEST INPUT

In Section 4.5 we explained the concept of path selection criteria to cover certain aspects of a program with a set of paths.
The program aspects we considered were all statements, true and false evaluations of each condition, and combinations of conditions affecting execution of a path. Now, having identified a path, the question is how to select input values such that, when the program is executed with the selected inputs, the chosen paths get executed. In other words, we need to identify inputs to force the execution of the paths. In the following, we define a few terms and give an example of generating test inputs for a selected path.

1. Input Vector: An input vector is a collection of all data entities read by the routine whose values must be fixed prior to entering the routine. Members of an input vector of a routine can take different forms as listed below:

• Input arguments to a routine
• Global variables and constants
• Files
• Contents of registers in assembly language programming
• Network connections
• Timers

A file is a complex input element. In one case, the mere existence of a file can be considered as an input, whereas in another case, the contents of the file are considered to be inputs. Thus, the idea of an input vector is more general than the concept of input arguments of a function.

Example. An input vector for openfiles() (Figure 4.3) consists of the individual presence or absence of the files file1, file2, and file3.

Example. The input vector of the ReturnAverage() method shown in Figure 4.6 is <value[], AS, MIN, MAX>.

2. Predicate: A predicate is a logical function evaluated at a decision point.

Example. The construct ti < AS is the predicate in decision node 3 of Figure 4.7.

Example. The construct OB is the predicate in decision node 5 of Figure 4.9.

3. Path Predicate: A path predicate is the set of predicates associated with a path. The path in Figure 4.10 indicates that nodes 3, 4, 6, 7, and 10 are decision nodes.
The predicate associated with node 3 appears twice in the path; in the first instance it evaluates to true and in the second instance it evaluates to false. The path predicate associated with the path under consideration is shown in Figure 4.11. We also specify the intended evaluation of the component predicates as found in the path specification. For instance, we specify that value[i] != -999 must evaluate to true in the path predicate shown in Figure 4.11. We keep this additional information for the following two reasons:

• In the absence of this additional information denoting the intended evaluation of a predicate, we will have no way to distinguish between the two instances of the predicate ti < AS, namely 3(T) and 3(F), associated with node 3.

1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13

Figure 4.10 Example of a path from Figure 4.7.

ti < AS           ≡ True
value[i] != -999  ≡ True
value[i] >= MIN   ≡ True
value[i] <= MAX   ≡ True
ti < AS           ≡ False
tv > 0            ≡ True

Figure 4.11 Path predicate for path in Figure 4.10.

• We must know whether the individual component predicates of a path predicate evaluate to true or false in order to generate path forcing inputs.

4. Predicate Interpretation: The path predicate shown in Figure 4.11 is composed of elements of the input vector <value[], AS, MIN, MAX>, a vector of local variables <i, ti, tv>, and the constant -999. The local variables are not visible outside a function but are used to

• hold intermediate results,
• point to array elements, and
• control loop iterations.

In other words, they play no role in selecting inputs that force the paths to execute. Therefore, we can easily substitute all the local variables in a predicate with the elements of the input vector by using the idea of symbolic substitution. Let us consider the method shown in Figure 4.12. The input vector for the method in Figure 4.12 is given by <x1, x2>.
The method defines a local variable y and also uses the constants 7 and 0. The predicate x1 + y >= 0 can be rewritten as x1 + x2 + 7 >= 0 by symbolically substituting y with x2 + 7. The rewritten predicate x1 + x2 + 7 >= 0 has been expressed solely in terms of the input vector <x1, x2> and the constant vector <0, 7>. Thus, predicate interpretation is defined as the process of symbolically substituting operations along a path in order to express the predicates solely in terms of the input vector and a constant vector. In a CFG, there may be several different paths leading up to a decision point from the initial node, with each path doing different computations. Therefore, a predicate may have different interpretations depending on how control reaches the predicate under consideration.

public static int SymSub(int x1, int x2){
    int y;
    y = x2 + 7;
    if (x1 + y >= 0)
        return (x2 + y);
    else
        return (x2 - y);
}

Figure 4.12 Method in Java to explain symbolic substitution [11].

5. Path Predicate Expression: An interpreted path predicate is called a path predicate expression. A path predicate expression has the following properties:

• It is void of local variables and is solely composed of elements of the input vector and possibly a vector of constants.
• It is a set of constraints constructed from the elements of the input vector and possibly a vector of constants.
• Path forcing input values can be generated by solving the set of constraints in a path predicate expression.
• If the set of constraints cannot be solved, there exists no input which can cause the selected path to execute. In other words, the selected path is said to be infeasible.
• An infeasible path does not imply that one or more components of a path predicate expression are unsatisfiable. It simply means that the total combination of all the components in a path predicate expression is unsatisfiable.
• Infeasibility of a path predicate expression suggests that one consider other paths in an effort to meet a chosen path selection criterion.

Example. Consider the path shown in Figure 4.10 from the CFG of Figure 4.7. Table 4.7 shows the nodes of the path in column 1, the corresponding description of each node in column 2, and the interpretation of each node in column 3. The intended evaluation of each interpreted predicate can be found in column 1 of the same row. We show the path predicate expression of the path under consideration in Figure 4.13 for the sake of clarity. The rows of Figure 4.13 have been obtained from Table 4.7 by combining each interpreted predicate in column 3 with its intended evaluation in column 1. Now the reader may compare Figures 4.11 and 4.13 to note that the predicates in Figure 4.13 are interpretations of the corresponding predicates in Figure 4.11.

TABLE 4.7 Interpretation of Path Predicate of Path in Figure 4.10

Node     Node Description                        Interpreted Description
1        Input vector: <value[], AS, MIN, MAX>
2        i = 0, ti = 0, tv = 0, sum = 0
3(T)     ti < AS                                 0 < AS
4(T)     value[i] != -999                        value[0] != -999
5        ti++                                    ti = 0 + 1 = 1
6(T)     value[i] >= MIN                         value[0] >= MIN
7(T)     value[i] <= MAX                         value[0] <= MAX
8        tv++; sum = sum + value[i]              tv = 0 + 1 = 1; sum = 0 + value[0] = value[0]
9        i++                                     i = 0 + 1 = 1
3(F)     ti < AS                                 1 < AS
10(T)    tv > 0                                  1 > 0
12       av = (double)sum/tv                     av = (double)value[0]/1
13       return(av)                              return(value[0])

Note: The bold entries in column 1 denote interpreted predicates.

Example. We show in Figure 4.14 an infeasible path appearing in the CFG of Figure 4.7. The path predicate and its interpretation are shown in Table 4.8, and the path predicate expression is shown in Figure 4.15. The path predicate expression is unsolvable because the constraint 0 > 0 ≡ True is unsatisfiable. Therefore, the path shown in Figure 4.14 is an infeasible path.
0 < AS            ≡ True   ........ (1)
value[0] != -999  ≡ True   ........ (2)
value[0] >= MIN   ≡ True   ........ (3)
value[0] <= MAX   ≡ True   ........ (4)
1 < AS            ≡ False  ........ (5)
1 > 0             ≡ True   ........ (6)

Figure 4.13 Path predicate expression for path in Figure 4.10.

1-2-3(T)-4(F)-10(T)-12-13

Figure 4.14 Another example path from Figure 4.7.

TABLE 4.8 Interpretation of Path Predicate of Path in Figure 4.14

Node     Node Description                        Interpreted Description
1        Input vector: <value[], AS, MIN, MAX>
2        i = 0, ti = 0, tv = 0, sum = 0
3(T)     ti < AS                                 0 < AS
4(F)     value[i] != -999                        value[0] != -999
10(T)    tv > 0                                  0 > 0
12       av = (double)sum/tv                     av = (double)0/0
13       return(av)                              return((double)0/0)

Note: The bold entries in column 1 denote interpreted predicates.

0 < AS            ≡ True   ........ (1)
value[0] != -999  ≡ False  ........ (2)
0 > 0             ≡ True   ........ (3)

Figure 4.15 Path predicate expression for path in Figure 4.14.

AS = 1
MIN = 25
MAX = 35
value[0] = 30

Figure 4.16 Input data satisfying constraints of Figure 4.13.

6. Generating Input Data from Path Predicate Expression: We must solve the corresponding path predicate expression in order to generate input data which can force a program to execute a selected path. Let us consider the path predicate expression shown in Figure 4.13. We observe that constraint 6 is always satisfied. Constraints 1 and 5 must be solved together to obtain AS = 1. Similarly, constraints 2, 3, and 4 must be solved together. We note that MIN <= value[0] <= MAX and value[0] != -999. Therefore, we have many choices in selecting the values of MIN, MAX, and value[0]. An instance of a solution of the constraints of Figure 4.13 is shown in Figure 4.16.

4.7 EXAMPLES OF TEST DATA SELECTION

We give examples of selected test data to achieve complete statement and branch coverage. We show four sets of test data in Table 4.9. The first two data sets cover all statements of the CFG in Figure 4.7.
However, we need all four sets of test data for complete branch coverage. If we execute the method ReturnAverage shown in Figure 4.6 with the four sets of test input data shown in Table 4.9, then each statement of the method is executed at least once, and every Boolean condition evaluates once to true and once to false.

TABLE 4.9 Test Data for Statement and Branch Coverage

Test Data Set    AS    MIN    MAX    value[]
1                1     5      20     [10]
2                1     5      20     [-999]
3                1     5      20     [4]
4                1     5      20     [25]

We have thoroughly tested the method in the sense of complete branch coverage. However, it is possible to introduce simple faults in the method which go undetected when the method is executed with the above four sets of test data. Two examples of fault insertion are given below.

Example. We replace the correct statement

av = (double) sum/tv;

with a faulty statement

av = (double) sum/ti;

in the method. Here the fault is that the method computes the average of the total number of inputs, denoted by ti, rather than the total number of valid inputs, denoted by tv.

Example. We replace the correct statement

sum = sum + value[i];

with a faulty statement

sum = value[i];

in the method. Here the fault is that the method no longer computes the sum of all the valid inputs in the array. In spite of the fault, the first set of test data produces the correct result due to coincidental correctness.

The above two examples of faults lead us to the following conclusions:

• One must generate test data to satisfy certain selection criteria, because those selection criteria identify the aspects of a program that we want to cover.
• Additional tests, which are much longer than the simple tests generated to meet coverage criteria, must be generated after the coverage criteria have been met.
• Given a set of test data for a program, we can inject faults into the program which go undetected by those test cases.
4.8 CONTAINING INFEASIBLE PATHS

Woodward, Hedley, and Hennell [12] have identified some practical problems in applying the idea of path testing. First, a CFG may contain a very large number of paths; therefore, the immediate challenge is to decide which paths to select to derive test cases. Second, it may not be feasible to execute many of the selected paths. Thus, it is useful to apply a path selection strategy: first, select as many short paths as feasible; next, choose longer paths to achieve better coverage of statements, branches, and predicates. A large number of infeasible paths in a CFG complicate the process of test selection. To simplify path-based unit testing, it is useful to reduce the number of infeasible paths in a program unit through language design, program design, and program transformation. Brown and Nelson [13] have demonstrated the possibility of writing code with no infeasible paths. Bertolino and Marre [14] have given an algorithm to generate a set of paths that covers all the branches of a CFG while reducing the number of infeasible paths in the chosen set. Their algorithm is based on the idea of a reduced flow graph, called a ddgraph. The algorithm uses the concepts of dominance and implications among the arcs of a ddgraph. Yates and Malevris [15] have suggested a strategy to reduce the number of infeasible paths in a set of paths to achieve branch coverage. They suggest selecting a path cover, that is, a set of paths, whose constituent paths each involve a minimum number of predicates. In contrast, if a path involves a large number of predicates, it is less likely that all the predicates simultaneously hold, thereby making the path infeasible. They have statistically demonstrated the efficacy of the strategy. McCabe's [16] cyclomatic complexity measure (Table 3.3) gives an interesting graph-theoretic interpretation of a program flow graph.
If we construct as many basis paths of a flow graph as its cyclomatic complexity suggests, it is likely that a few of the constructed paths will be infeasible. The above discussion leads us to conclude that though the ideas of statement coverage and branch coverage appear simple and straightforward, it is not easy to fully achieve those coverage criteria even for small programs.

4.9 SUMMARY

The notion of a path in a program unit is a fundamental concept. Assuming that a program unit is a function, a path is an executable sequence of instructions from the start of execution of the function to a return statement in the function. If there is no branching condition in a program unit, then there is just one path in the function. Generally, there are many branching conditions in a program unit, and thus there are numerous paths. One path differs from another path by at least one instruction. A path may contain one or more loops, but, ultimately, a path is expected to terminate its execution. Therefore, a path is of finite length in terms of the number of instructions it executes. One can have a graphical representation of a program unit, called a control flow graph, to capture the concept of control flow in the program unit. Each path corresponds to a distinct behavior of the program unit, and therefore we need to test each path with at least one test case. If there are a large number of paths in a program, a programmer may not have enough time to test all the paths. Therefore, there is a need to select a few paths by using some path selection criteria. A path selection criterion allows us to select a few paths to achieve a certain kind of coverage of program units. Some well-known coverage metrics are statement coverage, branch coverage, and predicate coverage. A certain number of paths are chosen from the CFG to achieve a desired degree of coverage of a program unit. At an abstract level, each path is composed of a sequence of predicates and assignment (computation) statements.
The predicates can be functions of local variables, global variables, and constants, and those are called path predicates. All the predicates along the path must evaluate to true when control reaches them for the path to be executable. One must select inputs, called path forcing inputs, such that the path predicates evaluate to true in order to be able to execute the path. The process of selecting path forcing inputs involves transforming the path predicates into a form that is void of local variables. Such a form of path predicates is called a path predicate expression. A path predicate expression is solely composed of the input vector and possibly a vector of constants. One can generate values of the input vector, which is considered as a test case, to exercise a path by solving the corresponding path predicate expression. Tools are being designed for generating test inputs from program units. If a program unit makes function calls, it is possible that the path predicates are functions of the values returned by those functions. In such a case, it may be difficult to solve a path predicate expression to generate test cases. Path testing is more applicable to lower level program units than to upper level program units containing many function calls.

LITERATURE REVIEW

Clarke [3] describes an automated system to generate test data from FORTRAN programs. The system is based on the idea of selecting program paths, identifying path conditions, and solving those conditions to generate inputs. When the program is executed with the selected inputs, the corresponding paths are executed. Automatically generating test inputs is a difficult task. The general problem of test generation from source code is an unsolvable problem. To mitigate the problem, there have been suggestions to select paths in certain ways. For example, select paths that execute loops for a restricted number of times.
Similarly, select paths that are restricted to a maximum statement count. This is because longer paths are likely to have more predicates and are likely to be more complex. The system generates test inputs for paths that can be described by a set of linear path constraints. Students are encouraged to read the tutorial by J. C. Huang entitled "An Approach to Program Testing," ACM Computing Surveys, Vol. 8, No. 3, September 1975, pp. 113–128. This article discusses a method for determining path conditions to enable achievement of branch coverage. It introduces the reader to the predicate calculus notation for expressing path conditions. Ramamoorthy, Ho, and Chen [7] discuss the usefulness of symbolic substitution in generating path predicates for testing a path. Array referencing is a major problem in symbolic substitution because index values may not be known during symbolic execution. References to arrays are recorded in a table while performing symbolic execution, and ambiguities are resolved when test inputs are generated to evaluate the subscript expressions. Another major problem is determination of the number of times to execute a loop. Considering that symbolic execution requires complex algebraic manipulations, Korel [17] suggested an alternative idea based on actual execution of the program under test, function minimization methods, and data flow analysis. Test data are gathered for the program using concrete values of the input variables. A program's control flow is monitored while executing the program. If an execution, that is, a program path, is an undesirable one, then function minimization algorithms are used to locate the values of input variables which caused the undesirable path to be executed. In this approach, values of array indexes and pointers are known at each step of program execution. Thus, this approach helps us in overcoming the difficulties in handling arrays and pointers.
An excellent book on path-based program testing is Software Testing Techniques by Beizer [5]. The reader can find a more thorough treatment of the subject in that book. The test tool from ParaSoft [9] allows programmers to perform flow-based testing of program units written in C, C++, and Java. If a program unit under test calls another program unit, the tool generates a stub replacing the called unit. If a programmer wants to control what return values are used, he or she can create a stub table specifying the input–outcome mapping.

REFERENCES

1. F. E. Allen and J. Cocke. Graph Theoretic Constructs for Program Control Flow Analysis, Technical Report RC3923. IBM T. J. Watson Research Center, New York, 1972.
2. J. B. Goodenough and S. L. Gerhart. Toward a Theory of Test Data Selection. IEEE Transactions on Software Engineering, June 1975, pp. 26–37.
3. L. A. Clarke. A System to Generate Test Data and Symbolically Execute Programs. IEEE Transactions on Software Engineering, September 1976, pp. 215–222.
4. W. E. Howden. Reliability of the Path Analysis Testing Strategy. IEEE Transactions on Software Engineering, September 1976, pp. 38–45.
5. B. Beizer. Software Testing Techniques, 2nd ed. Van Nostrand Reinhold, New York, 1990.
6. G. J. Myers. The Art of Software Testing. Wiley, New York, 1979.
7. C. V. Ramamoorthy, S. F. Ho, and W. T. Chen. On the Automated Generation of Program Test Data. IEEE Transactions on Software Engineering, December 1976, pp. 293–300.
8. H. Zhu, P. A. V. Hall, and J. H. R. May. Software Unit Test Coverage and Adequacy. ACM Computing Surveys, December 1997, pp. 366–427.
9. Parasoft Corporation, available: http://www.parasoft.com/. Parasoft Application Development Quality Solution, 1996–2008.
10. P. M. Herman. A Data Flow Analysis Approach to Program Testing. Australian Computer Journal, November 1976, pp. 92–96.
11. J. C. King. Symbolic Execution and Program Testing. Communications of the ACM, July 1976, pp. 385–394.
12. M. R.
Woodward, D. Hedley, and M. A. Hennell. Experience with Path Analysis and Testing of Programs. IEEE Transactions on Software Engineering, May 1980, pp. 278–286.
13. J. R. Brown and E. C. Nelson. Functional Programming, TRW Defence and Space Systems Group for Rome Air Development Center, Technical Report on Contract F30602-76-C-0315, July 1977.
14. A. Bertolino and M. Marre. Automatic Generation of Path Covers Based on the Control Flow Analysis of Computer Programs. IEEE Transactions on Software Engineering, December 1994, pp. 885–899.
15. D. F. Yates and N. Malevris. Reducing the Effects of Infeasible Paths in Branch Testing. ACM SIGSOFT Software Engineering Notes, December 1989, pp. 48–54.
16. T. J. McCabe. A Complexity Measure. IEEE Transactions on Software Engineering, December 1976, pp. 308–320.
17. B. Korel. Automated Software Test Data Generation. IEEE Transactions on Software Engineering, August 1990, pp. 870–879.

int binsearch(int X, int V[], int n){
    int low, high, mid;
    low = 0;
    high = n - 1;
    while (low <= high) {
        mid = (low + high)/2;
        if (X < V[mid])
            high = mid - 1;
        else if (X > V[mid])
            low = mid + 1;
        else
            return mid;
    }
    return -1;
}

Figure 4.17 Binary search routine.

Exercises

You are given the binary search routine in C shown in Figure 4.17. The input array V is assumed to be sorted in ascending order, n is the array size, and you want to find the index of an element X in the array. If X is not found in the array, the routine is supposed to return -1. The first eight questions refer to the binsearch() function.

1. Draw a CFG for binsearch().
2. From the CFG, identify a set of entry–exit paths to satisfy the complete statement coverage criterion.
3. Identify additional paths, if necessary, to satisfy the complete branch coverage criterion.
4. For each path identified above, derive their path predicate expressions.
5. Solve the path predicate expressions to generate test input and compute the corresponding expected outcomes.
6.
Are all the selected paths feasible? If not, select and show that a path is infeasible, if it exists.
7. Can you introduce two faults in the routine so that these go undetected by your test cases designed for complete branch coverage?
8. Suggest a general way to detect the kinds of faults introduced in the previous step.
9. What are the limitations of control flow–based testing?
10. Show that branch coverage includes statement coverage.

CHAPTER 5

Data Flow Testing

An error does not become truth by reason of multiplied propagation, nor does truth become error because nobody sees it.
— Mohandas Karamchand Gandhi

5.1 GENERAL IDEA

A program unit, such as a function, accepts input values, performs computations while assigning new values to local and global variables, and, finally, produces output values. Therefore, one can imagine a kind of "flow" of data values between variables along a path of program execution. A data value computed in a certain step of program execution is expected to be used in a later step. For example, a program may open a file, thereby obtaining a value for a file pointer; in a later step, the file pointer is expected to be used. Intuitively, if the later use of the file pointer is never verified, then we do not know whether or not the earlier assignment of a value to the file pointer variable is all right. Sometimes, a variable may be defined twice without a use of the variable in between. One may wonder why the first definition of the variable is never used. There are two motivations for data flow testing. First, to verify that a memory location corresponding to a program variable is accessed in a desirable way; for example, a memory location should not be read before a value is written into the location. Second, to verify the correctness of a data value generated for a variable; this is performed by observing that all the uses of the value produce the desired results.
The above basic idea about data flow testing tells us that a programmer can perform a number of tests on data values, which are collectively known as data flow testing. Data flow testing can be performed at two conceptual levels: static data flow testing and dynamic data flow testing. As the name suggests, static data flow testing is performed by analyzing the source code, and it does not involve actual execution of the source code. Static data flow testing is performed to reveal potential defects in programs. The potential program defects are commonly known as data flow anomalies. On the other hand, dynamic data flow testing involves identifying program paths from source code based on a class of data flow testing criteria. The reader may note that there is much similarity between control flow testing and data flow testing. Moreover, there is a key difference between the two approaches. The similarities stem from the fact that both approaches identify program paths and emphasize generating test cases from those program paths. The difference between the two lies in the fact that control flow test selection criteria are used in the former, whereas data flow test selection criteria are used in the latter approach. In this chapter, first we study the concept of data flow anomaly as identified by Fosdick and Osterweil [1]. Next, we discuss dynamic data flow testing in detail.

5.2 DATA FLOW ANOMALY

An anomaly is a deviant or abnormal way of doing something. For example, it is an abnormal situation to successively assign two values to a variable without using the first value. Similarly, it is abnormal to use a value of a variable before assigning a value to the variable. Another abnormal situation is to generate a data value and never use it.
In the following, we explain three types of abnormal situations concerning the generation and use of data values. The three abnormal situations are called type 1, type 2, and type 3 anomalies [1]. These anomalies could be manifestations of potential programming errors. We will explain why program anomalies need not lead to program failures.

Defined and Then Defined Again (Type 1): Consider the partial sequence of computations shown in Figure 5.1, where f1(y) and f2(z) denote functions with the inputs y and z, respectively. We can interpret the two statements in Figure 5.1 in several ways as follows:

• The computation performed by the first statement is redundant if the second statement performs the intended computation.
• The first statement has a fault. For example, the intended first computation might be w = f1(y).
• The second statement has a fault. For example, the intended second computation might be v = f2(z).
• A fourth kind of fault can be present in the given sequence in the form of a missing statement between the two. For example, v = f3(x) may be the desired statement that should go in between the two given statements.

:
x = f1(y)
x = f2(z)
:

Figure 5.1 Sequence of computations showing data flow anomaly.

It is for the programmer to make the desired interpretation. However, though one can interpret the given two statements in several ways, it can be said that there is a data flow anomaly in those two statements, indicating that they need to be examined to eliminate any confusion in the mind of a code reader.

Undefined but Referenced (Type 2): A second form of data flow anomaly is to use an undefined variable in a computation, such as x = x - y - w, where the variable w has not been initialized by the programmer. Here, too, one may argue that though w has not been initialized, the programmer intended to use another initialized variable, say y, in place of w.
Whatever may be the real intention of the programmer, there exists an anomaly in the use of the variable w, and one must eliminate the anomaly either by initializing w or by replacing w with the intended variable.

Defined but Not Referenced (Type 3): A third kind of data flow anomaly is to define a variable and then to undefine it without using it in any subsequent computation. For example, consider the statement x = f(x, y) in which a new value is assigned to the variable x. If the value of x is not used in any subsequent computation, then we should be suspicious of the computation represented by x = f(x, y). Hence, this form of anomaly is called "defined but not referenced."

Huang [2] introduced the idea of "states" of program variables to identify data flow anomalies. For example, initially, a variable can remain in an "undefined" (U) state, meaning that just a memory location has been allocated to the variable but no value has yet been assigned. At a later time, the programmer can perform a computation to define (d) the variable in the form of assigning a value to the variable; this is when the variable moves to a "defined but not referenced" (D) state. At a later time, the programmer can reference (r), that is, read, the value of the variable, thereby moving the variable to a "defined and referenced" (R) state. The variable remains in the R state as long as the programmer keeps referencing the value of the variable. If the programmer assigns a new value to the variable, the variable moves back to the D state. On the other hand, the programmer can take an action to undefine (u) the variable. For example, if an opened file is closed, the value of the file pointer is no longer recognized by the underlying operating system, and therefore the file pointer becomes undefined. The above scenarios describe the normal actions on variables and are illustrated in Figure 5.2.
However, programmers can make mistakes by taking the wrong action while a variable is in a certain state. For example, if a variable is in the state U, that is, the variable is still undefined, and a programmer reads (r) the variable, then the variable moves to an abnormal (A) state. The abnormal state of a variable means that a programming anomaly has occurred. Similarly, while a variable is in the state D and the programmer undefines (u) the variable or redefines (d) the variable, the variable moves to the abnormal (A) state. Once a variable enters the abnormal state, it remains in that state irrespective of what action, d, u, or r, is taken. The actions that take a variable from a desired state, such as U or D, to an abnormal state are illustrated in Figure 5.2.

[Figure 5.2: State transition diagram of a program variable. States: U (undefined), D (defined but not referenced), R (defined and referenced), A (abnormal). Actions: d (define), r (reference), u (undefine). (From ref. 2. © 1979 IEEE.)]

Now it is useful to make an association between the type 1, type 2, and type 3 anomalies and the state transition diagram shown in Figure 5.2. The type 1, type 2, and type 3 anomalies are denoted by the action sequences dd, ur, and du, respectively, in Figure 5.2. Data flow anomalies can be detected by using the idea of program instrumentation. Intuitively, program instrumentation means incorporating additional code in a program to monitor its execution status. For example, we can write additional code in a program to monitor the sequence of states, namely U, D, R, and A, traversed by a variable. If the state sequence contains the dd, ur, or du subsequence, then a data flow anomaly is said to have occurred. The presence of a data flow anomaly in a program does not necessarily mean that execution of the program will result in a failure.
A data flow anomaly simply means that the program may fail, and therefore the programmer must investigate the cause of the anomaly. Let us consider the dd anomaly shown in Figure 5.1. If the real intention of the programmer was to perform the second computation and the first computation produces no side effect, then the first computation merely represents a waste of processing power. Thus, the said dd anomaly will not lead to program failure. On the other hand, if a statement is missing in between the two statements, then the program may fail. Programmers must analyze the causes of data flow anomalies and eliminate them.

5.3 OVERVIEW OF DYNAMIC DATA FLOW TESTING

In the process of writing code, a programmer manipulates variables in order to achieve the desired computational effect. Variable manipulation occurs in several ways, such as initialization of the variable, assignment of a new value to the variable, computing a value of another variable using the value of the variable, and controlling the flow of program execution. Rapps and Weyuker [3] convincingly tell us that one should not feel confident that a variable has been assigned the correct value if no test case causes the execution of a path from the assignment to a point where the value of the variable is used. In the above motivation for data flow testing, (i) assignment of a correct value means whether or not a value for the variable has been correctly generated, and (ii) use of a variable refers to further generation of values for the same or other variables and/or control of flow. A variable can be used in a predicate, that is, a condition, to choose an appropriate flow of control. The above idea gives us an indication of the involvement of certain kinds of program paths in data flow testing.
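The point made by Rapps and Weyuker can be seen in a small, hypothetical C unit of our own making: the two test inputs noted below execute every statement and every branch, yet no test executes a path from the definition of y in the first if statement to its use in the second, so a wrong value assigned there could go unnoticed.

```c
#include <assert.h>

/* Hypothetical unit used only for illustration.  The value defined by
   the assignment in the first if statement reaches the c-use of y in
   the second if statement only on inputs with a > 0 and b > 0. */
int scale(int a, int b) {
    int y = 0;
    if (a > 0)
        y = a * 10;    /* definition of y */
    if (b > 0)
        return y + b;  /* c-use of y */
    return b;
}
```

The inputs (a = 1, b = -1) and (a = -1, b = 1) together cover all statements and branches of scale(), but only an input such as (a = 1, b = 1) causes the value assigned by y = a * 10 to flow to the point where y is used.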
Data flow testing involves selecting entry–exit paths with the objective of covering certain data definition and use patterns, commonly known as data flow testing criteria. Specifically, certain program paths are selected on the basis of data flow testing criteria. Following the general ideas in control flow testing that we discussed in Chapter 4, we give an outline of performing data flow testing in the following:

• Draw a data flow graph from a program.
• Select one or more data flow testing criteria.
• Identify paths in the data flow graph satisfying the selection criteria.
• Derive path predicate expressions from the selected paths and solve those expressions to derive test input.

The reader may recall that the process of deriving a path predicate expression from a path has been explained in Chapter 4. The same idea applies to deriving a path predicate expression from a path obtained from a data flow graph. Therefore, in the rest of this chapter we will explain a procedure for drawing a data flow graph from a program unit and discuss data flow testing criteria.

5.4 DATA FLOW GRAPH

In this section, we explain the main ideas in a data flow graph and a method to draw it. In practice, programmers may not draw data flow graphs by hand. Instead, language translators are modified to produce data flow graphs from program units. A data flow graph is drawn with the objective of identifying data definitions and their uses, as motivated in the preceding section. Each occurrence of a data variable is classified as follows:

Definition: This occurs when a value is moved into the memory location of the variable. Referring to the C function VarTypes() in Figure 5.3, the assignment statement i = x; is an example of a definition of the variable i.

Undefinition or Kill: This occurs when the value and the location become unbound.
Referring to the C function VarTypes() in Figure 5.3, the first iptr = malloc(sizeof(int)); statement initializes the integer pointer variable iptr, and *iptr = i + x; initializes the value of the location pointed to by iptr. The second iptr = malloc(sizeof(int)); statement redefines the variable iptr, thereby undefining the location previously pointed to by iptr.

    int VarTypes(int x, int y) {
        int i;
        int *iptr;
        i = x;
        iptr = malloc(sizeof(int));
        *iptr = i + x;
        if (*iptr > y)
            return (x);
        else {
            iptr = malloc(sizeof(int));
            *iptr = x + y;
            return (*iptr);
        }
    }

Figure 5.3 Definition and uses of variables.

Use: This occurs when the value is fetched from the memory location of the variable. There are two forms of uses of a variable, as explained below.

• Computation use (c-use): This directly affects the computation being performed. In a c-use, a potentially new value of another variable or of the same variable is produced. Referring to the C function VarTypes(), the statement *iptr = i + x; gives examples of c-uses of the variables i and x.
• Predicate use (p-use): This refers to the use of a variable in a predicate controlling the flow of execution. Referring to the C function VarTypes(), the statement if (*iptr > y) ... gives examples of p-uses of the variables y and iptr.

A data flow graph is a directed graph constructed as follows:

• A sequence of definitions and c-uses is associated with each node of the graph.
• A set of p-uses is associated with each edge of the graph.
• The entry node has a definition of each parameter and each nonlocal variable which occurs in the subprogram.
• The exit node has an undefinition of each local variable.

Example: We show the data flow graph in Figure 5.4 for the ReturnAverage() example discussed in Chapter 4. The initial node, node 1, represents initialization of the input vector <value, AS, MIN, MAX>. Node 2 represents the initialization of the four local variables i, ti, tv, and sum in the routine.
Next we introduce a NULL node, node 3, keeping in mind that control will come back to the beginning of the while loop. Node 3 also denotes the fact that program control exits from the while loop at the NULL node. The statement ti++ is represented by node 4. The predicate associated with edge (3, 4) is the condition part of the while loop, namely,

    ((ti < AS) && (value[i] != -999))

The statements tv++ and sum = sum + value[i] are represented by node 5. Therefore, the condition part of the first if statement forms the predicate associated with edge (4, 5), namely,

    ((value[i] >= MIN) && (value[i] <= MAX))

The statement i++ is represented by node 6. The predicate associated with edge (4, 6) is the negation of the condition part of the if statement, namely,

    ~((value[i] >= MIN) && (value[i] <= MAX))

The predicate associated with edge (5, 6) is True because there is an unconditional flow of control from node 5 to node 6. Execution of the while loop terminates when its condition evaluates to false. Therefore, the predicate associated with edge (3, 7) is the negation of the predicate associated with edge (3, 4), namely,

    ~((ti < AS) && (value[i] != -999))

It may be noted that there is no computation performed in a NULL node. Referring to the second if statement, av = (double)-999 is represented by node 8, and av = (double)sum/tv is represented by node 9. Therefore, the predicate associated with edge (7, 9) is (tv > 0), and the predicate associated with edge (7, 8) is ~(tv > 0).

[Figure 5.4: Data flow graph of the ReturnAverage() example, with nodes 1 through 10 as described above and the listed predicates associated with its edges.]
Finally, the return(av) statement is represented by node 10, and the predicate True is associated with both the edges (8, 10) and (9, 10).

5.5 DATA FLOW TERMS

A variable defined in a statement is used in another statement which may occur immediately or several statements after the definition. We are interested in finding paths that include pairs of definitions and uses of variables. In this section, we explain a family of path selection criteria that allow us to select paths with varying strength. The reader may note that for every feasible path we can generate a test case. In the following, first we explain a few terms, and then we explain a few selection criteria using those terms.

Global c-use: A c-use of a variable x in node i is said to be a global c-use if x has been defined before in a node other than node i.

Example: The c-use of the variable tv in node 9 is a global c-use, since tv has been defined in nodes 2 and 5 (Figure 5.4).

Definition Clear Path: A path (i - n1 - ... - nm - j), m ≥ 0, is called a definition clear path (def-clear path) with respect to a variable x

• from node i to node j and
• from node i to edge (nm, j)

if x has been neither defined nor undefined in the nodes n1, ..., nm. The reader may note that the definition of a def-clear path is unconcerned about the status of x in nodes i and j. Also, a def-clear path does not preclude loops. Therefore, the path 2-3-4-6-3-4-6-3-4-5, which includes a loop, is a def-clear path.

Example: The paths 2-3-4-5 and 2-3-4-6 are def-clear paths with respect to the variable tv from node 2 to node 5 and from node 2 to node 6, respectively (Figure 5.4).

Global Definition: A node i has a global definition of a variable x if node i has a definition of x and there is a def-clear path with respect to x from node i to some

• node containing a global c-use of x or
• edge containing a p-use of variable x.

The reader may note that we do not define a global p-use of a variable similar to a global c-use.
This is because every p-use is associated with an edge, not a node. In Table 5.1 we show all the global definitions and global c-uses appearing in the data flow graph of Figure 5.4; def(i) denotes the set of variables which have global definitions in node i, and c-use(i) denotes the set of variables which have global c-uses in node i. We show all the predicates and p-uses appearing in the data flow graph of Figure 5.4 in Table 5.2; predicate(i, j) denotes the predicate associated with edge (i, j), and p-use(i, j) denotes the set of variables which have p-uses on edge (i, j).

TABLE 5.1 def() and c-use() Sets of Nodes in Figure 5.4

Node i    def(i)                    c-use(i)
1         {value, AS, MIN, MAX}     {}
2         {i, ti, tv, sum}          {}
3         {}                        {}
4         {ti}                      {ti}
5         {tv, sum}                 {tv, i, sum, value}
6         {i}                       {i}
7         {}                        {}
8         {av}                      {}
9         {av}                      {sum, tv}
10        {}                        {av}

TABLE 5.2 Predicates and p-use() Sets of Edges in Figure 5.4

Edge (i, j)    predicate(i, j)                              p-use(i, j)
(1, 2)         True                                         {}
(2, 3)         True                                         {}
(3, 4)         (ti < AS) && (value[i] != -999)              {i, ti, AS, value}
(4, 5)         (value[i] >= MIN) && (value[i] <= MAX)       {i, MIN, MAX, value}
(4, 6)         ~((value[i] >= MIN) && (value[i] <= MAX))    {i, MIN, MAX, value}
(5, 6)         True                                         {}
(6, 3)         True                                         {}
(3, 7)         ~((ti < AS) && (value[i] != -999))           {i, ti, AS, value}
(7, 8)         ~(tv > 0)                                    {tv}
(7, 9)         (tv > 0)                                     {tv}
(8, 10)        True                                         {}
(9, 10)        True                                         {}

Simple Path: A simple path is a path in which all nodes, except possibly the first and the last, are distinct.

Example: The paths 2-3-4-5 and 3-4-6-3 are simple paths (Figure 5.4).

Loop-Free Path: A loop-free path is a path in which all nodes are distinct.

Complete Path: A complete path is a path from the entry node to the exit node.

Du-path: A path (n1 - n2 - ... - nj - nk) is a definition-use path (du-path) with respect to (w.r.t.) a variable x if node n1 has a global definition of x and either

• node nk has a global c-use of x and (n1 - n2 - ... - nj - nk) is a def-clear simple path w.r.t.
x or
• edge (nj, nk) has a p-use of x and (n1 - n2 - ... - nj) is a def-clear, loop-free path w.r.t. x.

Example: Considering the global definition and global c-use of the variable tv in nodes 2 and 5, respectively, 2-3-4-5 is a du-path.

Example: Considering the global definition of the variable tv in node 2 and its p-use on edge (7, 9), 2-3-7-9 is a du-path.

5.6 DATA FLOW TESTING CRITERIA

In this section, we explain seven types of data flow testing criteria. These criteria are based on two fundamental concepts, namely, definitions and uses of variables, where uses comprise both c-uses and p-uses.

All-defs: For each variable x and for each node i such that x has a global definition in node i, select a complete path which includes a def-clear path from node i to

• node j having a global c-use of x or
• edge (j, k) having a p-use of x.

Example: Consider the variable tv, which has global definitions in nodes 2 and 5 (Figure 5.4 and Tables 5.1 and 5.2). First, we consider its global definition in node 2. We find a global c-use of tv in node 5, and there exists a def-clear path 2-3-4-5 from node 2 to node 5. We choose a complete path 1-2-3-4-5-6-3-7-9-10 that includes the def-clear path 2-3-4-5 to satisfy the all-defs criterion. We also find p-uses of the variable tv on edge (7, 8), and there exists a def-clear path 2-3-7-8 from node 2 to edge (7, 8). We choose a complete path 1-2-3-7-8-10 that includes the def-clear path 2-3-7-8 to satisfy the all-defs criterion. Now we consider the definition of tv in node 5. In node 9 there is a global c-use of tv, and on edges (7, 8) and (7, 9) there are p-uses of tv. There is a def-clear path 5-6-3-7-9 from node 5 to node 9. Thus, we choose a complete path 1-2-3-4-5-6-3-7-9-10 that includes the def-clear path 5-6-3-7-9 to satisfy the all-defs criterion. The reader may note that the single complete path 1-2-3-4-5-6-3-7-9-10 covers the all-defs criterion for the variable tv defined in nodes 2 and 5.
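The def-clear reasoning in the example above can be mechanized. The sketch below, written for this discussion rather than taken from the text, encodes the def() sets of Table 5.1 one character per variable (an encoding assumed here for brevity: v=value, A=AS, m=MIN, M=MAX, i=i, t=ti, T=tv, s=sum, a=av), checks whether a path is def-clear with respect to a variable, and checks whether a complete path covers a given definition-use pair. Undefinitions are ignored because Figure 5.4 has none at interior nodes.

```c
#include <assert.h>
#include <string.h>

/* def(i) sets of Table 5.1 for the data flow graph of Figure 5.4,
   encoded one character per variable (see the lead-in for the codes). */
static const char *def_set[11] = {
    "",      /* index 0 unused; nodes are numbered 1..10 */
    "vAmM",  /* node 1 defines value, AS, MIN, MAX */
    "itTs",  /* node 2 defines i, ti, tv, sum */
    "",      /* node 3 (NULL node) */
    "t",     /* node 4 defines ti */
    "Ts",    /* node 5 defines tv and sum */
    "i",     /* node 6 defines i */
    "",      /* node 7 */
    "a",     /* node 8 defines av */
    "a",     /* node 9 defines av */
    ""       /* node 10 */
};

/* A path i-n1-...-nm-j is def-clear w.r.t. x if no interior node
   n1..nm defines x; the endpoints i and j are not examined. */
int def_clear(const int *path, int len, char x) {
    for (int k = 1; k < len - 1; k++)
        if (strchr(def_set[path[k]], x) != NULL)
            return 0;
    return 1;
}

/* Returns 1 if a complete path covers the def-use pair (di, uj) of
   variable x, i.e., the path visits node di and later reaches node uj
   with no redefinition of x strictly in between. */
int covers_pair(const int *path, int len, char x, int di, int uj) {
    for (int p = 0; p < len; p++) {
        if (path[p] != di)
            continue;
        for (int k = p + 1; k < len; k++) {
            if (path[k] == uj)
                return 1;              /* def-clear stretch di..uj found */
            if (strchr(def_set[path[k]], x) != NULL)
                break;                 /* x redefined; try a later di */
        }
    }
    return 0;
}
```

For instance, on the complete path 1-2-3-4-5-6-3-7-9-10 the tv pairs (2, 5) and (5, 9) are covered, but (2, 9) is not, because node 5 redefines tv in between; this matches the reasoning in the all-defs example. A p-use on an edge (j, k) can be checked the same way by requiring the def-clear stretch to end at node j with k as the next node on the path.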
To satisfy the all-defs criterion, similar paths must be obtained for the variables i, ti, and sum.

All-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all nodes j such that there is a global c-use of x in j.

Example: Let us obtain paths to satisfy the all-c-uses criterion with respect to the variable ti. We find two global definitions of ti, in nodes 2 and 4. Corresponding to the global definition in node 2, there is a global c-use of ti in node 4. However, corresponding to the global definition in node 4, there is no global c-use of ti. From the global definition in node 2, there is a def-clear path to the global c-use in node 4 in the form of 2-3-4. The reader may note that there are four complete paths that include the def-clear path 2-3-4, as follows: 1-2-3-4-5-6-3-7-8-10, 1-2-3-4-5-6-3-7-9-10, 1-2-3-4-6-3-7-8-10, and 1-2-3-4-6-3-7-9-10. One may choose one or more paths from among the four paths above to satisfy the all-c-uses criterion with respect to the variable ti.

All-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all edges (j, k) such that there is a p-use of x on edge (j, k).

Example: Let us obtain paths to satisfy the all-p-uses criterion with respect to the variable tv. We find two global definitions of tv, in nodes 2 and 5. Corresponding to the global definition in node 2, there is a p-use of tv on edges (7, 8) and (7, 9). There are def-clear paths from node 2 to edges (7, 8) and (7, 9), namely, 2-3-7-8 and 2-3-7-9, respectively. Also, there are def-clear paths from node 5 to edges (7, 8) and (7, 9), namely, 5-6-3-7-8 and 5-6-3-7-9, respectively.
In the following, we identify four complete paths that include the above four def-clear paths: 1-2-3-7-8-10, 1-2-3-7-9-10, 1-2-3-4-5-6-3-7-8-10, and 1-2-3-4-5-6-3-7-9-10.

All-p-uses/Some-c-uses: This criterion is identical to the all-p-uses criterion except when a variable x has no p-use. If x has no p-use, then this criterion reduces to the some-c-uses criterion explained below.

Some-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some nodes j such that there is a global c-use of x in node j.

Example: Let us obtain paths to satisfy the all-p-uses/some-c-uses criterion with respect to the variable i. We find two global definitions of i, in nodes 2 and 6. There is no p-use of i in Figure 5.4. Thus, we consider some c-uses of the variable i. Corresponding to the global definition of i in node 2, there is a global c-use of i in node 6, and there is a def-clear path from node 2 to node 6 in the form of 2-3-4-5-6. Therefore, to satisfy the all-p-uses/some-c-uses criterion with respect to the variable i, we select the complete path 1-2-3-4-5-6-3-7-9-10, which includes the def-clear path 2-3-4-5-6.

All-c-uses/Some-p-uses: This criterion is identical to the all-c-uses criterion except when a variable x has no global c-use. If x has no global c-use, then this criterion reduces to the some-p-uses criterion explained below.

Some-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some edges (j, k) such that there is a p-use of x on edge (j, k).

Example: Let us obtain paths to satisfy the all-c-uses/some-p-uses criterion with respect to the variable AS. We find just one global definition of AS, in node 1. There is no global c-use of AS in Figure 5.4. Thus, we consider some p-uses of AS.
Corresponding to the global definition of AS in node 1, there are p-uses of AS on edges (3, 7) and (3, 4), and there are def-clear paths from node 1 to those two edges, namely, 1-2-3-7 and 1-2-3-4, respectively. There are many complete paths that include those two def-clear paths. One such example path is 1-2-3-4-5-6-3-7-9-10.

All-uses: This criterion is the conjunction of the all-p-uses criterion and the all-c-uses criterion discussed above.

All-du-paths: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include all du-paths from node i

• to all nodes j such that there is a global c-use of x in j and
• to all edges (j, k) such that there is a p-use of x on (j, k).

In Chapter 4, we explained a procedure to generate a test input from an entry–exit program path. There is much similarity between control flow–based testing and data flow–based testing. Their difference lies in the ways the two techniques select program paths.

5.7 COMPARISON OF DATA FLOW TEST SELECTION CRITERIA

Having seen a relatively large number of test selection criteria based on the concepts of data flow and control flow, it is useful to find relationships among them. Given a pair of test selection criteria, we should be able to compare the two. If we cannot compare them, we realize that they are incomparable. Rapps and Weyuker [3] defined the concept of an includes relationship to find out if, for a given pair of selection criteria, one includes the other. In the following, by a complete path we mean a path from the entry node of a flow graph to one of its exit nodes.

Definition: Given two test selection criteria c1 and c2, c1 includes c2 if for every def/use graph any set of complete paths of the graph that satisfies c1 also satisfies c2.
Definition: Given two test selection criteria c1 and c2, c1 strictly includes c2, denoted by c1 → c2, provided c1 includes c2 and for some def/use graph there is a set of complete paths of the graph that satisfies c2 but not c1.

It is easy to note that the "→" relationship is transitive. Moreover, given two criteria c1 and c2, it is possible that neither c1 → c2 nor c2 → c1 holds, in which case we call the two criteria incomparable. Proving the strictly includes relationship or the incomparable relationship between two selection criteria in a programming language with arbitrary semantics may not be possible. Thus, to show the strictly includes relationship between a pair of selection criteria, Rapps and Weyuker [3] have considered a restricted programming language with the following syntax:

Start statement: start
Input statement: read x1, ..., xn, where x1, ..., xn are variables.
Assignment statement: y ← f(x1, ..., xn), where y, x1, ..., xn are variables and f is a function.
Output statement: print e1, ..., en, where e1, ..., en are output values.
Unconditional transfer statement: goto m, where m is a label.
Conditional transfer statement: if p(x1, ..., xn), then goto m, where p is a predicate.
Halt statement: stop

[Figure 5.5: Relationship among DF (data flow) testing criteria, a subsumption hierarchy over All-paths, All-du-paths, All-uses, All-c-uses/Some-p-uses, All-p-uses/Some-c-uses, All-c-uses, All-defs, All-p-uses, All-branches, and All-statements. (From ref. 4. © 1988 IEEE.)]

Frankl and Weyuker [4] have further extended the relationship; what they have proved is summarized in Figure 5.5. For example, the all-paths selection criterion strictly includes the all-du-paths criterion. Similarly, the all-c-uses/some-p-uses criterion strictly includes the all-defs criterion. However, we cannot find a strictly includes relationship between the pair all-c-uses and all-p-uses.
Let Pxc be a set of paths selected by the all-c-uses criterion with respect to a variable x. We cannot say with certainty whether or not the path set Pxc satisfies the all-p-uses criterion with respect to the same variable x. Similarly, let Pxp be a set of paths selected by the all-p-uses criterion with respect to the variable x. We cannot say with certainty whether or not the path set Pxp satisfies the all-c-uses criterion with respect to the same variable x. Thus, the two criteria all-c-uses and all-p-uses are incomparable.

Note the relationship between data flow–based test selection criteria and control flow–based test selection criteria, as shown in Figure 5.5. The two control flow–based test selection criteria in Figure 5.5 are all-branches and all-statements. The all-p-uses criterion strictly includes the all-branches criterion, which implies that one can select more paths from a data flow graph of a program unit than from its control flow graph.

5.8 FEASIBLE PATHS AND TEST SELECTION CRITERIA

Given a data flow graph, a path is a sequence of nodes and edges. A complete path is a sequence of nodes and edges starting from the initial node of the graph and ending at one of its exit nodes. A complete path is executable if there exists an assignment of values to input variables and global variables such that all the path predicates evaluate to true, thereby making the path executable. Executable paths are also known as feasible paths. If no such assignment of values to input variables and global variables exists, then we call the path infeasible or inexecutable. Since we are interested in selecting inputs to execute paths, we must ensure that a test selection criterion picks executable paths. Assume that we want to test a program by selecting paths to satisfy a certain selection criterion C. Let PC be the set of paths selected according to criterion C for a given program unit.
As an extreme example, if all the paths in PC are infeasible, then the criterion C has not helped us in any way. For a criterion C to be useful, it must select a set of executable, or feasible, paths. Frankl and Weyuker [4] have modified the definitions of the test selection criteria so that each criterion selects only feasible paths. In other words, we modify the definition of a criterion C to obtain a criterion C* which selects only feasible paths, and C* is called a feasible data flow (FDF) testing criterion. As an example, the criterion (All-c-uses)* is an adaptation of all-c-uses such that only feasible paths are selected by (All-c-uses)*, as defined below.

(All-c-uses)*: For each variable x and for each node i such that x has a global definition in node i, select feasible complete paths which include def-clear paths from node i to all nodes j such that there is a global c-use of x in j.

Thus, the test selection criteria (All-paths)*, (All-du-paths)*, (All-uses)*, (All-c-uses/Some-p-uses)*, (All-p-uses/Some-c-uses)*, (All-c-uses)*, (All-p-uses)*, (All-defs)*, (All-branches)*, and (All-statements)* choose only feasible paths, and, therefore, these are called feasible data flow (FDF) testing criteria. Frankl and Weyuker [4] have shown that the strictly includes relationships among test selection criteria, as shown in Figure 5.5, do not hold if the selection criteria choose only feasible paths. The new relationship among FDF test selection criteria is summarized in Figure 5.6. Though it is seemingly useful to select only feasible paths, and therefore consider only the FDF test selection criteria, we are faced with a decidability problem. More specifically, it is undecidable whether a given set of paths is executable. We cannot automate the application of an FDF test selection criterion if we do not know the executability of the paths.
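In practice, executability of a selected path can be probed dynamically: instrument the unit to record the Figure 5.4 nodes it visits and compare the recorded trace against the target path. The sketch below is an assumed reconstruction of the ReturnAverage() routine of Chapter 4, with node numbering taken from the description of Figure 5.4. Note, for instance, that no input can produce the complete path 1-2-3-7-9-10 identified in Section 5.6: edge (7, 9) requires tv > 0, yet node 5, the only place where tv is incremented, is not on the path.

```c
#include <assert.h>

/* Assumed reconstruction of ReturnAverage() (Chapter 4), instrumented
   to record the sequence of Figure 5.4 nodes visited in one execution. */
#define TRACE_MAX 100
static int trace[TRACE_MAX];
static int trace_len;

static void visit(int node) {
    if (trace_len < TRACE_MAX)
        trace[trace_len++] = node;
}

double ReturnAverage(int value[], int AS, int MIN, int MAX) {
    int i = 0, ti = 0, tv = 0, sum = 0;
    double av;
    trace_len = 0;
    visit(1);                              /* initialize inputs */
    visit(2);                              /* initialize locals */
    visit(3);                              /* NULL node heading the loop */
    while (ti < AS && value[i] != -999) {
        visit(4);
        ti++;
        if (value[i] >= MIN && value[i] <= MAX) {
            visit(5);
            tv++;
            sum = sum + value[i];
        }
        visit(6);
        i++;
        visit(3);
    }
    visit(7);
    if (tv > 0) { visit(9); av = (double)sum / tv; }
    else        { visit(8); av = (double)-999;     }
    visit(10);                             /* return(av) */
    return av;
}
```

With AS = 0 the recorded trace is 1-2-3-7-8-10; with value = {5, -999}, AS = 2, MIN = 0, MAX = 10 it is 1-2-3-4-5-6-3-7-9-10.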
On the other hand, a data flow testing criterion may turn out to be inadequate if all the paths it selects are infeasible. Consequently, a test engineer must make a choice between using an inadequate selection criterion and using one that cannot be completely automated.

[Figure 5.6: Relationship among FDF (feasible data flow) testing criteria, over (All-paths)*, (All-du-paths)*, (All-uses)*, (All-c-uses/Some-p-uses)*, (All-p-uses/Some-c-uses)*, (All-c-uses)*, (All-defs)*, (All-p-uses)*, (All-branches)*, and (All-statements)*. (From ref. 4. © 1988 IEEE.)]

5.9 COMPARISON OF TESTING TECHNIQUES

So far we have discussed two major techniques for generating test data from source code, namely, control flow–based path selection and data flow–based path selection. We also explained a few criteria to select paths from the control flow graph and the data flow graph of a program. Programmers often randomly select test data based on their own understanding of the code they have written. Therefore, it is natural to compare the effectiveness of the three test generation techniques, namely, random test selection, test selection based on control flow, and test selection based on data flow. Comparing those techniques does not seem to be an easy task. An acceptable, straightforward way of comparing them is to apply those techniques to the same set of programs with known faults and express their effectiveness in terms of the following two metrics:

• Number of test cases produced
• Percentage of known faults detected

Ntafos [5] has reported on the results of an experiment comparing the effectiveness of three test selection techniques. The experiment involved seven mathematical programs with known faults. For the control flow–based technique, the branch coverage criterion was selected, whereas the all-uses criterion was chosen for data flow testing. Random testing was also applied to the programs.
The data flow testing, branch testing, and random testing detected 90%, 85.5%, and 79.5%, respectively, of the known defects. A total of 84 test cases were designed to achieve all-uses coverage, 34 test cases were designed to achieve branch coverage, and 100 test cases were designed in the random testing approach. We interpret the experimental results as follows:

• A programmer can randomly generate a large number of test cases to find most of the faults. However, one will run out of test cases to find some of the remaining faults. Random testing is not ineffective, but it incurs higher costs than the systematic techniques, namely, the control flow and the data flow techniques.
• Test selection based on branch coverage produces far fewer test cases than the random technique but achieves nearly the same level of fault detection. Thus, there is a significant saving in the cost of program testing.
• The all-uses testing criterion gives a programmer a new way to design more test cases and reveal more faults than the branch coverage criterion.
• All these techniques have inherent limitations which prevent them from revealing all faults. Therefore, there is a need to use many different testing techniques and to develop new techniques. This idea is depicted in Figure 5.7. Our goal is to reduce the gap between the total number of faults present in a program and the number of faults detected by various test generation techniques.

[Figure 5.7: Limitation of different fault detection techniques. The gap between the total number of faults in a program and the number of faults detected narrows as one moves from random testing to control flow–based testing to data flow–based testing to new testing techniques; the aim is to reduce this gap.]

5.10 SUMMARY

Flow of data in a program can be visualized by considering the fact that a program unit accepts input data, transforms the input data through a sequence of computations, and, finally, produces the output data.
Therefore, one can imagine data values to be flowing from one assignment statement defining a variable to another assignment statement or a predicate where the value is used. Three fundamental actions associated with a variable are undefine (u), define (d), and reference (r). A variable is implicitly undefined when it is created without being assigned a value. On the other hand, a variable can be explicitly undefined. For example, when an opened file is closed, the variable holding the file pointer becomes undefined. We have explained the idea of "states" of a variable, namely, undefined (U), defined (D), referenced (R), and abnormal (A), by considering the three fundamental actions on a variable. The A state represents the fact that the variable has been accessed in an abnormal manner, causing a data flow anomaly.

Individual actions on a variable do not cause a data flow anomaly. Instead, certain sequences of actions lead to data flow anomalies, and those three sequences of actions are dd, ur, and du. Once a variable enters the abnormal state, it remains there irrespective of subsequent actions. The mere presence of a data flow anomaly in a program may not lead to program failure. The programmer must investigate the cause of an anomaly and modify the code to eliminate it. For example, a missing statement in the code might have caused a dd anomaly, in which case the programmer needs to write new code.

The program path is a fundamental concept in testing. One test case can be generated from each executable path. The number of different paths selected for execution is a measure of the extent of testing performed. Path selection based on statement coverage and branch coverage leads to a small number of paths being chosen for execution. Therefore, there exists a large gap between control flow testing and exhaustive testing.
The concept of data flow testing gives us a way to bridge the gap between control flow testing and exhaustive testing: it provides new selection criteria for choosing more program paths to test than we can choose by using the idea of control flow testing alone. Specifically, the data flow test selection criteria are all-du-paths, all-defs, all-c-uses, all-p-uses, all-uses, all-c-uses/some-p-uses, and all-p-uses/some-c-uses. To compare two selection criteria, the concept of a strictly includes relationship is found to be useful.

LITERATURE REVIEW

Osterweil and Fosdick [6] have implemented a system, called DAVE, to analyze FORTRAN programs and detect ur, dd, and du types of data flow anomalies. DAVE detects those anomalies by performing a flow graph search for each variable in a given program unit. For programs with subprogram invocations, the system works in a bottom-up manner; that is, the called subprograms are analyzed before the caller. Programmers need to be aware that it is difficult to apply the idea of data flow analysis to all kinds of data structures and program constructs. The analysis of arrays is one such difficulty. Fosdick and Osterweil [1] have noted that problems arise when different elements of the same array are acted upon in different ways, thereby giving rise to different patterns of definition, reference, and undefinition. Static data flow analysis systems, such as DAVE, do not evaluate index expressions and therefore cannot tell us what array element is being referenced in a given expression. Such systems try to get around this problem by treating an entire array as one single variable, rather than as a set of different variables of the same type. Fosdick and Osterweil have also shown that recursive programs pose difficulty in data flow analysis. A programming style that can pose a difficulty in data flow analysis is to pass a single variable as an argument more than once.
This is because DAVE assumes that all subprogram parameters are distinct variables. Laski and Korel [7] argue that data flow testing bridges the gap between branch testing and all-paths testing. On the one hand, in branch testing, one selects a set of paths to cover all branches of the control flow graph of a program unit; one needs to select a small number of paths to satisfy the criterion. On the other hand, all-paths testing is the same as exhaustive testing. Data flow testing allows programmers to select many more paths than chosen by branch testing. Essentially, in data flow testing, loops are unfolded to exercise the definition–use pairs. Herman [8] had a programmer apply data flow testing to a number of medium-sized program units of about 800 statements. It is interesting to note that faults detected during testing were usually found while attempting to devise test data to satisfy the chosen paths, rather than while examining the test run output. The fact that program faults were found during the process of test design is significant in the sense that system development and selection of tests can simultaneously be done in producing a better quality system. The article by Ural [9] presents a method for generating test cases from the specifications of communications protocols given in the Estelle language. The method involves static data flow analysis of specifications. The method is summarized as follows: (i) transform a specification into a graph containing both the control flow and the data flow aspects; (ii) detect data flow anomalies in the specification; and (iii) generate test cases to cover all definition–use pairs. The article by Ntafos [10] explains an extended overview of data flow testing strategies in terms of their relative coverage of a program’s structure and the number of test cases needed to satisfy each strategy.
In addition, the article extends the subsumption hierarchy introduced by Rapps and Weyuker [3] by including TERn = 1. For details about testing hierarchy levels, denoted by n above, and the test effectiveness ratio (TER), the reader is referred to the article by Woodward, Hedley, and Hennell [11]. The concept of selecting program paths based on data flow has been studied in different ways by different researchers, namely, Laski and Korel [7], Ntafos [5], and Rapps and Weyuker [3]. To facilitate the comparison and simplify the discussion, Clarke, Podgurski, Richardson, and Zeil [12] define all the data flow criteria using a single set of terms. They give a new subsumption hierarchy of the data flow test selection criteria by modifying the subsumption hierarchy of Rapps and Weyuker [3] shown in Figure 5.5. Koh and Liu [13] have presented a two-step approach for generating paths that test both the control flow and the data flow in implementations of communication protocols based on the idea of extended finite-state machines. First, select a set of paths to cover a data flow selection criterion. Second, selectively augment the state transitions in the chosen set of paths with state check sequences so that control flow can be ensured and data flow coverage can be preserved. The test design methodology of Sarikaya, Bochmann, and Cerny [14] also generates paths to achieve joint coverage of control flow and data flow in the Estelle specifications of communication protocols. Researchers have extended the classical data flow testing approach to the testing of object-oriented programs [15–18]. Harrold and Rothermel [16] have applied the concept of data flow testing to the testing of classes in object-oriented programs. The three levels of testing that they have proposed are intramethod testing, intermethod testing, and intraclass testing. Intramethod testing is the same as data flow testing performed on a unit in a procedural programming language, such as C.
Intermethod testing is similar to integrating program units in a procedural programming language. Finally, intraclass testing refers to calling the public methods of a class in a random, acceptable sequence. The concept of classical data flow testing that is applied to one program unit at a time has been extended to interprocedural data flow testing. The idea of interprocedural data flow has been extensively studied in the literature (see ref. 19 and the bibliography of the article by Harrold and Soffa [20]). Often programmers utilize the capability of pointers in the C and C++ languages. Data flow analysis becomes difficult when pointers are passed between procedures. Pande, Landi, and Ryder [21] have defined a term called reaching definitions, which is the set of all points where a value of a variable was last written. For one level of pointer indirection, they give a polynomial time algorithm for the problem. To develop the algorithm, the authors have introduced the concept of an interprocedural control flow graph, which is a hybrid of the control flow graph and call graph. They prove that the general problem of identifying interprocedural reaching definitions is NP-hard. Lemos, Vincenzi, Maldonado, and Masiero [22] have applied the idea of data flow testing to aspect-oriented programs [23]. The concept of aspect-oriented programming was developed to address the difficulty in clearly capturing certain high-level design decisions at the code level. The properties of those design decisions are called aspects, and hence the name aspect-oriented programming. The reason some design decisions are difficult to represent at the code level is that they cross-cut the system’s basic functionality [23]. Naturally, an aspect that is difficult to code is likely to be more difficult to test and verify. To this end, the work of Lemos et al. [22] gains significance. 
They have proposed the concept of an aspect-oriented def-use (AODU) graph, based on the idea of a data flow instruction graph [24], and identified new coverage criteria, such as all-exception-independent-uses, all-exception-dependent-uses, and all-crosscutting-uses. Zhao [25] has applied the idea of data flow testing at a coarse-grain level in aspect-oriented programs. The author extended the concept of class testing studied by Harrold and Rothermel [16]. The 1990 book Software Testing Techniques by Beizer [26] gives an excellent exposition of the concept of data flow testing.

REFERENCES

1. L. D. Fosdick and L. J. Osterweil. Data Flow Analysis in Software Reliability. Computing Surveys, September 1976, pp. 305–330.
2. J. C. Huang. Detection of Data Flow Anomaly through Program Instrumentation. IEEE Transactions on Software Engineering, May 1979, pp. 226–236.
3. S. Rapps and E. J. Weyuker. Selecting Software Test Data Using Data Flow Information. IEEE Transactions on Software Engineering, April 1985, pp. 367–375.
4. P. G. Frankl and E. J. Weyuker. An Applicable Family of Data Flow Testing Criteria. IEEE Transactions on Software Engineering, October 1988, pp. 1483–1498.
5. S. C. Ntafos. On Required Element Testing. IEEE Transactions on Software Engineering, November 1984, pp. 795–803.
6. L. J. Osterweil and L. D. Fosdick. Dave—A Validation, Error Detection, and Documentation System for Fortran Programs. Software—Practice and Experience, October/December 1976, pp. 473–486.
7. J. W. Laski and B. Korel. A Data Flow Oriented Program Testing Strategy. IEEE Transactions on Software Engineering, May 1983, pp. 347–354.
8. P. M. Herman. A Data Flow Analysis Approach to Program Testing. Australian Computer Journal, November 1976, pp. 92–96.
9. H. Ural. Test Sequence Selection Based on Static Data Flow Analysis. Computer Communications, October 1987, pp. 234–242.
10. S. C. Ntafos. A Comparison of Some Structural Testing Strategies.
IEEE Transactions on Software Engineering, June 1988, pp. 868–874.
11. M. R. Woodward, D. Hedley, and M. A. Hennell. Experience with Path Analysis and Testing of Programs. IEEE Transactions on Software Engineering, May 1980, pp. 278–286.
12. L. A. Clarke, A. Podgurski, D. J. Richardson, and S. J. Zeil. A Formal Evaluation of Data Flow Path Selection Criteria. IEEE Transactions on Software Engineering, November 1989, pp. 1318–1332.
13. L. S. Koh and M. T. Liu. Test Path Selection Based on Effective Domains. In Proceedings of the International Conference on Network Protocols, Boston, October 1994, IEEE Press, Piscataway, pp. 64–71.
14. B. Sarikaya, G. v. Bochmann, and E. Cerny. A Test Design Methodology for Protocol Testing. IEEE Transactions on Software Engineering, May 1987, pp. 518–531.
15. R. Doong and P. Frankl. The ASTOOT Approach to Testing Object-Oriented Programs. ACM Transactions on Software Engineering and Methodology, April 1994, pp. 101–130.
16. M. J. Harrold and G. Rothermel. Performing Data Flow Testing on Classes. In Proceedings of ACM SIGSOFT Foundation of Software Engineering, New Orleans, December 1994, ACM Press, New York, pp. 154–163.
17. D. Kung, J. Gao, P. Hsia, Y. Toyoshima, C. Chen, K.-S. Kim, and Y.-K. Song. Developing an Object-Oriented Software Testing and Maintenance Environment. Communications of the ACM, October 1995, pp. 75–86.
18. A. S. Parrish, R. B. Borie, and D. W. Cordes. Automated Flow Graph-Based Testing of Object-Oriented Software Modules. Journal of Systems and Software, November 1993, pp. 95–109.
19. J. M. Barth. A Practical Interprocedural Data Flow Analysis Algorithm. Communications of the ACM, September 1978, pp. 724–736.
20. M. J. Harrold and M. L. Soffa. Efficient Computation of Interprocedural Definition-Use Chains. ACM Transactions on Programming Languages and Systems, March 1994, pp. 175–204.
21. H. Pande, W. Landi, and B. G. Ryder. Interprocedural Def-Use Associations in C Programs.
IEEE Transactions on Software Engineering, May 1994, pp. 385–403.
22. O. A. L. Lemos, A. M. R. Vincenzi, J. C. Maldonado, and P. C. Masiero. Control and Data Flow Structural Testing Criteria for Aspect-Oriented Programs. Journal of Systems and Software, June 2007, pp. 862–882.
23. G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. V. Lopes, J.-M. Loingtier, and J. Irwin. Aspect-Oriented Programming. In Proceedings of the European Conference on Object-Oriented Programming, LNCS 1241, Finland, June 1997, pp. 220–242.
24. A. M. R. Vincenzi, J. C. Maldonado, W. E. Wong, and M. E. Delamaro. Coverage Testing of Java Programs and Components. Science of Computer Programming, April 2005, pp. 211–230.
25. J. Zhao. Data-Flow Based Unit Testing of Aspect-Oriented Programs. In Proceedings of the 27th Annual International Computer Software and Applications Conference, Dallas, Texas, IEEE Press, Piscataway, 2003, pp. 188–197.
26. B. Beizer. Software Testing Techniques, 2nd ed. Van Nostrand Reinhold, New York, 1990.

Exercises

1. Draw a data flow graph for the binsearch() function given in Figure 5.8.

2. Assuming that the input array V[ ] has at least one element in it, find an infeasible path in the data flow graph for the binsearch() function.

   int binsearch(int X, int V[], int n){
       int low, high, mid;
       low = 0;
       high = n - 1;
       while (low <= high) {
           mid = (low + high)/2;
           if (X < V[mid])
               high = mid - 1;
           else if (X > V[mid])
               low = mid + 1;
           else
               return mid;
       }
       return -1;
   }

   Figure 5.8 Binary search routine.

   int modifiedbinsearch(int X, int V[], int n){
       int low, high, mid;
       low = 0;
       high = n - 1;
       while (low <= high) {
           mid = (low + high)/2;
           if (X < V[mid]) {
               high = mid - 1;
               mid = mid - 1;
           }
           else if (X > V[mid])
               low = mid + 1;
           else
               return mid;
       }
       return -1;
   }

   Figure 5.9 Modified binary search routine.

3. Identify a data flow anomaly in the code given in Figure 5.9.

4.
By referring to the data flow graph obtained in exercise 1, find a set of complete paths satisfying the all-defs selection criterion with respect to variable mid.

5. By referring to the data flow graph obtained in exercise 1, find a set of complete paths satisfying the all-defs selection criterion with respect to variable high.

6. Write a function in C such that the all-uses criterion produces more test cases than the all-branches criterion.

7. What is meant by the gap between all-branches testing and all-paths testing, and how does data flow testing fill the gap?

8. Explain why the presence of data flow anomaly does not imply that execution of the program will definitely produce incorrect results.

9. Program anomaly has been defined by considering three operations, namely, define (d), reference (r), and undefine (u). The three sequences of operations identified to be program anomaly are dd, du, and ur. Explain why the rest of the two-operation sequences are not considered to be program anomaly.

10. Identify some difficulties in identifying data flow anomaly in programs.

CHAPTER 6

Domain Testing

Even granting that the genius subjected to the test of critical inspection emerges free from all error, we should consider that everything he has discovered in a given domain is almost nothing in comparison with what is left to be discovered.
— Santiago Ramón y Cajal

6.1 DOMAIN ERROR

Two fundamental elements of a computer program are input domain and program paths. The input domain of a program is the set of all input data to the program. A program path is a sequence of instructions from the start of the program to some point of interest in the program. For example, the end of the program is a point of interest. Another point of interest is when the program waits to receive another input from its environment so that it can continue its execution.
In other words, a program path, or simply path, corresponds to some flow of control in the program. A path is said to be feasible if there exists an input data which causes the program to execute the path. Otherwise, the path is said to be infeasible. Howden [1] identified two broad classes of errors, namely, computation error and domain error, by combining the concepts of input data and program path. The two kinds of errors have been explained in the following.

Computation Error: A computation error occurs when a specific input data causes the program to execute the correct, i.e., desired, path, but the output value is wrong. Note that the output value can be wrong even if the desired path has been executed. This can happen due to a wrong function being executed in an assignment statement. For example, consider a desired path containing the statement result = f(a, b), where a and b are input values. A computation error may occur if the statement is replaced by a faulty one, such as result = f(b, a). Therefore, the result of executing the path can be erroneous because of a fault in the assignment statement, and this can happen in spite of executing a correct path.

Domain Error: A domain error occurs when a specific input data causes the program to execute a wrong, that is, undesired, path in the program. An incorrect path can be selected by a program if there is a fault in one or more of the conditional statements in the program. Let us consider a conditional statement of the form if (p) then f1() else f2(). If there is a fault in the formulation of the predicate p, then the wrong function call is invoked, thereby causing an incorrect path to be executed.

The above two kinds of program errors lead us to view a computer program as performing an abstract mapping function as follows.
Ideally, for each input value, the program assigns a program path to execute; the same program path can be exclusively assigned (i.e., executed) for a subset of the input values. Here, the subset of the input values causing the same path to be executed is referred to as an input domain or subdomain. Thus, the program is said to map a domain to a path within itself. Since there are a large number of values in the input domain of the program and there are a large number of paths in a program, we can view a program as partitioning the input space into a finite number of subdomains and assigning a distinct program path to each of the input subdomains.

We further explain the concept of program domains using Figure 6.1. The set D is the entire input set of a program P (Figure 6.1a). We call D the domain of the entire program. Set D can be an infinite set, and P may not have different computation behavior for each element of D. Instead, P may perform the same computation for all the elements in a certain subset of D. For example, as shown in Figure 6.1b, P performs five different computations, one for each subset D1, . . . , D5. It may be noted that the partition of D is not visible outside P. Instead, P has a conceptual, in-built mechanism, as illustrated in Figure 6.1c, to decide which computation to perform when P is invoked with a certain input. The part of P that decides what computation to invoke for a given element of D is called an input classifier. Such an input classifier may not exist in a program in a single, clearly identifiable form; rather, it can exist as a cross-cutting concept, cross-cutting because portions of the input classifier can be found in different program modules. We show five different computations, computation for D1 through computation for D5, for subsets D1, . . . , D5, respectively (Figure 6.1c).
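The two kinds of errors defined in Section 6.1 can be made concrete with a small, hypothetical C sketch of our own (not from the book): the same specification is implemented correctly, with a computation error (the desired path computes the wrong value), and with a domain error (a faulty predicate sends some inputs down the wrong path).

```c
#include <assert.h>

/* Hypothetical specification: return 2x if x > 10, and 3x otherwise. */
int spec(int x)              { return (x > 10) ? 2 * x : 3 * x; }

/* Computation error: for x > 10 the correct path is taken,
   but the assignment on that path computes the wrong value. */
int computation_error(int x) { return (x > 10) ? 2 * x + 1 : 3 * x; }

/* Domain error: the faulty predicate x >= 10 shifts the boundary,
   so the single input x == 10 executes the wrong path. */
int domain_error(int x)      { return (x >= 10) ? 2 * x : 3 * x; }
```

Here domain_error(10) returns 20 instead of the expected 30 because input 10 falls in the wrong subdomain, while every other input still takes the correct path; computation_error, by contrast, produces a wrong value for every input on the x > 10 path even though the path itself is correct.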
We remind the reader that the structure of a program may not resemble the case we have shown inside the larger circle in Figure 6.1c. The figure simply denotes the fact that a program does different computations for different subsets of its input domain. Programs perform input classification through sequences of predicates, though an input classifier may not exist as a single module. Therefore, a program will perform the wrong computation if there are faults in the input classification portion. With the above backdrop, we define the following two terms:

• A domain is a set of input values for which the program performs the same computation for every member of the set. We are interested in maximal domains such that the program performs different computations on adjacent domains.

• A program is said to have a domain error if the program incorrectly performs input classification. Assuming that adjacent domains perform different computations, a domain error will cause the program to produce incorrect output.

[Figure 6.1 Illustration of the concept of program domains: (a) a program P with input domain D; (b) D partitioned into subdomains D1, . . . , D5; (c) a conceptual input classifier inside P dispatching each input to the computation for D1 through the computation for D5.]

6.2 TESTING FOR DOMAIN ERRORS

The idea of domain testing was first studied by White and Cohen in 1978 [2, 3]. There is a fundamental difference between flow graph–based testing techniques and domain testing. By flow graph we mean control flow graph and data flow graph. The difference is explained as follows:

• Select paths from a control flow graph or a data flow graph to satisfy certain coverage criteria. To remind the reader, the control flow coverage criteria are statement coverage, branch coverage, and predicate coverage.
Similarly, the criteria studied to cover the definition and use aspects of variables in a program are all-defs, all-c-uses, all-p-uses, and all-uses, to name a few. The path predicates were analyzed to derive test data. While selecting paths and the corresponding test data, no assumption is made regarding the actual types of faults that the selected test cases could potentially uncover; that is, no specific types of faults are explicitly considered for detection.

• Domain testing takes an entirely new approach to fault detection. One defines a category of faults, called domain errors, and selects test data to detect those faults. If a program has domain errors, those will be revealed by the test cases.

We discuss the following concepts in detail:

• Sources of Domains: By means of an example program, we explain how program predicates behave as an input classifier.

• Types of Domain Errors: We explain how minor modifications to program predicates, which can be interpreted as programming defects, can lead to domain errors.

• Selecting Test Data to Reveal Domain Errors: A test selection criterion is explained to pick input values. The test data so chosen reveal the specific kinds of domain errors.

6.3 SOURCES OF DOMAINS

Domains can be identified from both specifications and programs. We explain a method to identify domains from source code using the following steps:

• Draw a control flow graph from the given source code.

• Find all possible interpretations of the predicates. In other words, express the predicates solely in terms of the input vector and, possibly, a vector of constants. The reader may note that a predicate in a program may have multiple interpretations, because control may arrive at a predicate node via different paths.

• Analyze the interpreted predicates to identify domains.

In the following, we explain the above procedure to identify domains. We show an example C function in Figure 6.2 to illustrate a procedure to identify domains.
The function accepts two inputs x and y and returns an integer. A control flow graph representation of codedomain() is shown in Figure 6.3. The two predicates in the two if() statements have been represented by nodes 3 and 6 in Figure 6.3.

   int codedomain(int x, int y){
       int c, d, k;
       c = x + y;
       if (c > 5)
           d = c - x/2;
       else
           d = c + x/2;
       if (d >= c + 2)
           k = x + d/2;
       else
           k = y + d/4;
       return(k);
   }

   Figure 6.2 A function to explain program domains.

The predicate P1: c > 5 in the first if() statement has just one interpretation, namely, P1: x + y > 5, because program control reaches the if() statement via only one path from the initial node. However, predicate P2: d ≥ c + 2 in the second if() statement gets two interpretations, because program control can reach the second if() statement along two paths: (i) when the first if() evaluates to true and (ii) when the first if() evaluates to false. These two interpretations are summarized in Table 6.1.

[Figure 6.3 Control flow graph representation of the function in Figure 6.2: node 1 initializes x and y; node 2 computes c = x + y; predicate node 3 tests P1: c > 5; nodes 4 and 5 compute d = c − x/2 and d = c + x/2 on the true and false branches, respectively; predicate node 6 tests P2: d >= c + 2, interpreted as x ≤ −4 when P1 is true and x ≥ 4 when P1 is false; nodes 7 and 8 compute k = x + d/2 and k = y + d/4; node 9 returns k.]

We explain a procedure to obtain domains from the interpretations of P1 and P2 (Figure 6.3). We show a two-dimensional grid labeled x and y in Figure 6.4. The grid size is large enough to show all the domains of the program under consideration. We consider the predicate nodes of the control flow graph one by one (Figure 6.3). Predicate P1 divides the grid into two regions. The P1 boundary is shown by a straight line represented by the equality x + y = 5. All the points above, but excluding this line, satisfy predicate P1.
TABLE 6.1 Two Interpretations of Second if() Statement in Figure 6.2

    Evaluation of P1        Interpretation of P2
    True                    x ≤ −4
    False                   x ≥ 4

[Figure 6.4 Domains obtained from interpreted predicates in Figure 6.3: the x–y grid is divided into the four domains TT, TF, FT, and FF by the P1 boundary x + y = 5, the P2 boundary x = −4 for P1 = True, and the P2 boundary x = 4 for P1 = False.]

Next, we consider the two interpretations of predicate P2. For P1 = True, P2 has the following interpretation:

    P2: x ≤ −4

Therefore, P2 further divides the area, or set of points, defined by P1 = True into two sets corresponding to its two truth values. The P2 boundary, when P1 evaluates to true, is represented by the straight line x = −4. The area to the left of the P2 boundary and above the P1 boundary corresponds to P1P2 = TT, and the area to the right of the P2 boundary and above the P1 boundary corresponds to P1P2 = TF.

For P1 = False, P2 has the following interpretation:

    P2: x ≥ 4

In other words, P2 further divides the area, or set of points, defined by P1 = False into two sets corresponding to its two truth values. The P2 boundary, when P1 evaluates to false, is represented by the straight line x = 4. The area to the right of the P2 boundary and below the P1 boundary corresponds to P1P2 = FT, and the area to the left of the P2 boundary and below the P1 boundary corresponds to P1P2 = FF in Figure 6.4.

The reader may note that if a program contains k predicates in a sequence, the maximum number of domains obtained is 2^k. In practice, the number of domains obtained is much smaller than 2^k, because certain combinations of truth values of those k predicates may not hold simultaneously.

6.4 TYPES OF DOMAIN ERRORS

The reader may recall the following properties of a domain:

• A domain is a set of values for which the program performs identical computations.

• A domain can be represented by a set of predicates. Individual elements of the domain satisfy the predicates of the domain.
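The conceptual input classifier of codedomain() can be written out explicitly using the interpreted predicates of Table 6.1. The sketch below is illustrative only; the function name classify and the string labels are ours, not the book's.

```c
#include <assert.h>
#include <string.h>

/* Classify an input (x, y) of codedomain() into one of the four domains
   of Figure 6.4.  P1 is interpreted as x + y > 5; P2 is interpreted as
   x <= -4 when P1 is true and as x >= 4 when P1 is false (Table 6.1). */
const char *classify(int x, int y) {
    int p1 = (x + y > 5);
    int p2 = p1 ? (x <= -4) : (x >= 4);
    return p1 ? (p2 ? "TT" : "TF")
              : (p2 ? "FT" : "FF");
}
```

For example, the input (−4, 10) satisfies both interpreted predicates and falls in domain TT, whereas (0, 0) satisfies neither and falls in FF.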
Example: The domain TT in Figure 6.4 is mathematically represented by the set of predicates shown in Figure 6.5.

    P1: x + y > 5 ≡ True
    P2: x ≤ −4 ≡ True

    Figure 6.5 Predicates defining the TT domain in Figure 6.4.

A domain is defined, from a geometric perspective, by a set of constraints called boundary inequalities. Properties of a domain are discussed in terms of the properties of its boundaries as follows:

Closed Boundary: A boundary is said to be closed if the points on the boundary are included in the domain of interest.

Example: Consider the domain TT in Figure 6.4 and its boundary defined by the inequality

    P2: x ≤ −4

The above boundary is a closed boundary of the domain TT.

Open Boundary: A boundary is said to be open if the points on the boundary do not belong to the domain of interest.

Example: Consider the domain TT in Figure 6.4 and its boundary defined by the inequality

    P1: x + y > 5

The above boundary is an open boundary of the domain TT.

The reader may notice that it is the equality symbol (=) in a relational operator that determines whether or not a boundary is closed. If the relational operator in a boundary inequality has the equality symbol in it, then the boundary is a closed boundary; otherwise it is an open boundary.

Closed Domain: A domain is said to be closed if all of its boundaries are closed.

Open Domain: A domain is said to be open if some of its boundaries are open.

Extreme Point: An extreme point is a point where two or more boundaries cross.

Adjacent Domains: Two domains are said to be adjacent if they have a boundary inequality in common.

A program path will have a domain error if there is incorrect formulation of a path predicate. After an interpretation of an incorrect path predicate, the path predicate expression causes a boundary segment to

• be shifted from its correct position or

• have an incorrect relational operator.
A domain error can be caused by

• an incorrectly specified predicate or

• an incorrect assignment which affects a variable used in the predicate.

Now we discuss different types of domain errors:

Closure Error: A closure error occurs if a boundary is open when the intention is to have a closed boundary, or vice versa. Some examples of closure error are:

• The relational operator ≤ is implemented as <.

• The relational operator < is implemented as ≤.

Shifted-Boundary Error: A shifted-boundary error occurs when the implemented boundary is parallel to the intended boundary. This happens when the constant term of the inequality defining the boundary takes up a value different from the intended value. In concrete terms, a shifted-boundary error occurs due to a change in the magnitude or the sign of the constant term of the inequality.

Example: Consider the boundary defined by the following predicate (Figure 6.4):

    P1: x + y > 5

If the programmer’s intention was to define a boundary represented by the predicate

    P1′: x + y > 4

then the boundary defined by P1 is parallel, but not identical, to the boundary defined by P1′.

Tilted-Boundary Error: If the constant coefficients of the variables in a predicate defining a boundary take up wrong values, then the tilted-boundary error occurs.

Example: Consider the boundary defined by the following predicate (Figure 6.4):

    P1: x + y > 5

If the programmer’s intention was to define a boundary represented by the predicate

    P1′: x + 0.5y > 5

then the boundary defined by P1 is tilted with respect to the boundary defined by P1′.

The reader may recall that for all the data points in a domain the program performs identical computations. It is not difficult to notice that input data points fall in the wrong domain if there is a closure defect, a shifted boundary, or a tilted boundary.
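The three kinds of boundary defects can be illustrated as small variants of the predicate P1 from Figure 6.4. This is a sketch of our own, not from the book; each faulty predicate misclassifies some inputs relative to the intended one.

```c
#include <assert.h>

/* Intended boundary predicate of Figure 6.4: P1: x + y > 5. */
int p1(double x, double y)          { return x + y > 5.0; }

/* Closure error: > implemented as >=, so points lying exactly on
   the boundary move into the wrong domain. */
int p1_closure(double x, double y)  { return x + y >= 5.0; }

/* Shifted-boundary error: constant term changed from 5 to 4;
   the faulty boundary is parallel to the intended one. */
int p1_shifted(double x, double y)  { return x + y > 4.0; }

/* Tilted-boundary error: coefficient of y changed from 1 to 0.5;
   the faulty boundary is tilted with respect to the intended one. */
int p1_tilted(double x, double y)   { return x + 0.5 * y > 5.0; }
```

For instance, the boundary point (2, 3) is classified differently by p1 and p1_closure; the point (0, 4.5) lies between the intended and shifted boundaries and is misclassified by p1_shifted; and (0, 8) is misclassified by p1_tilted, while points such as (6, 0) on the x axis are unaffected by the tilt.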
Assuming that domains are maximal in size in the sense that adjacent domains perform different computations, a program will produce a wrong outcome because of wrong computations performed on those input data points which fall in the wrong domains.

6.5 ON AND OFF POINTS

In domain testing a programmer targets domain errors, and test cases are designed with the objective of revealing those domain errors, as discussed in Section 6.4. Therefore, it is essential that we consider an important characteristic of domain errors, stated as follows:

    Data points on or near a boundary are most sensitive to domain errors.

In this observation, by sensitive we mean data points falling in the wrong domains. Therefore, the objective is to identify the data points that are most sensitive to domain errors so that errors can be detected by executing the program with those input values. In the following, we define two kinds of data points near domain boundaries, namely, ON point and OFF point:

ON Point: Given a boundary, an ON point is a point on the boundary or “very close” to the boundary. This definition suggests that we can choose an ON point in two ways; therefore, one must know when to choose an ON point in which way:

• If a point can be chosen to lie exactly on the boundary, then choose such a point as an ON point. If the boundary inequality leads to an exact solution, choose such an exact solution as an ON point.

• If a boundary inequality leads to an approximate solution, choose a point very close to the boundary.

Example: Consider the following boundary inequality. This inequality is not related to our running example of Figure 6.4.

    PON1: x + 7y ≥ 6

For x = −1, the predicate PON1 leads to an exact solution of y = 1. Therefore, the point (−1, 1) lies on the boundary. However, if we choose x = 0, the predicate PON1 leads to an approximate solution of y in the form of y = 0.8571428. . . .
Since y does not have an exact solution, we either truncate it to 0.857 or round it off to 0.858. We notice that the point (0, 0.857) does not satisfy the predicate PON1, whereas the point (0, 0.858) does. Thus, (0, 0.858) is an ON point which lies very close to the PON1 boundary.

Example: Consider a domain with the following open boundary:

    PON2: x + 7y < 6

Points lying exactly on the boundary defined by the equality x + 7y = 6 are not a part of the domain under consideration. The point (−1, 1) lies exactly on the boundary and is an ON point. Note that the point (−1, 1) is not a part of the domain under consideration. Similarly, the point (0, 0.858), which is almost on the boundary, that is, very close to the boundary, is an ON point and it lies outside the domain of interest.

OFF Point: An OFF point of a boundary lies away from the boundary. However, while choosing an OFF point, we must consider whether a boundary is open or closed with respect to a domain:

• If the domain is open with respect to the boundary, then an OFF point of that boundary is an interior point inside the domain within an ε-distance from the boundary.

• If the domain is closed with respect to the boundary, then an OFF point of that boundary is an exterior point outside the boundary within an ε-distance.

The symbol ε denotes an arbitrarily small value.

Example: Consider a domain D1 with a closed boundary as follows:

    POFF1: x + 7y ≥ 6

Since the boundary is closed, an OFF point lies outside the domain; this means that the boundary inequality is not satisfied. Note that the point (−1, 1) lies exactly on the boundary and it belongs to the domain. Therefore, (−1, 1) is not an OFF point. However, the point (−1, 0.99) lies outside the domain, and it is not a part of the domain under consideration. This is easily verified by substituting x = −1 and y = 0.99 in the above POFF1 inequality, which produces a value of 5.93. Therefore, (−1, 0.99) is an OFF point.
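The ON and OFF point examples above can be checked mechanically. The sketch below (ours, not the book's) encodes the closed boundary x + 7y ≥ 6 of domain D1 together with the complementary open region x + 7y < 6.

```c
#include <assert.h>

/* Domain D1 has the closed boundary x + 7y >= 6; the complementary
   region, open with respect to the same boundary, is x + 7y < 6. */
int in_D1(double x, double y)        { return x + 7.0 * y >= 6.0; }
int in_adjacent(double x, double y)  { return x + 7.0 * y <  6.0; }
int on_boundary(double x, double y)  { return x + 7.0 * y == 6.0; }
```

This confirms the two examples: the ON point (−1, 1) lies exactly on the boundary and belongs to the closed domain D1, while the OFF point (−1, 0.99) lies outside D1, inside the adjacent open region, an ε-distance away from the boundary.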
Example: Consider a domain D2 which is adjacent to domain D1 in the above example, with an open boundary as follows:

POFF2: x + 7y < 6

It may be noted that we have obtained POFF2 from POFF1 by simply reversing the ≥ inequality. Since the POFF2 boundary is open, an OFF point lies inside the domain. It can be easily verified that the point (−1, 0.99) lies inside D2, and hence it is an OFF point for domain D2 with respect to boundary POFF2.

Summary: The above ideas of ON and OFF points lead to the following conclusions:

• While testing a closed boundary, the ON points are in the domain under test, whereas the OFF points are in an adjacent domain.
• While testing an open boundary, the ON points are in an adjacent domain, whereas the OFF points are in the domain being tested.

The above ideas have been further explained in Figure 6.6, which shows two domains D1 and D2 defined by predicates x < 4 and x ≥ 4, respectively. Therefore, the actual boundary is defined by the following predicate:

PON,OFF: x = 4

This boundary is open with respect to D1 and closed with respect to D2. In the figure, we show two ON points A and B, where A lies exactly on the boundary and B lies "very close" to the boundary. Therefore, we have A = 4 and B = 4.00001, for example. We show an OFF point C lying in D1 away from the boundary. Point C = 3.95 lies inside domain D1 and outside domain D2.

Figure 6.6 ON and OFF points.

6.6 TEST SELECTION CRITERION

In this section, we explain a criterion for test selection and show that test data so selected reveal the domain errors identified in Section 6.4. Before we explain the selection criterion, we state the assumptions made in domain testing as follows:

• A program performs different computations in adjacent domains.
If this assumption does not hold, then data points falling in the wrong domains may not have any influence on the program outcome, and therefore failures will not be observed.

• Boundary predicates are linear functions of input variables. This is not a strong assumption, given that most of the predicates in real-life programs are linear: programmers can easily visualize linear predicates and use them.

We present the following criterion for domain testing and show that test data selected using this criterion reveal domain errors:

Test Selection Criterion: For each domain and for each boundary, select three points A, C, and B in an ON–OFF–ON sequence.

This criterion generates test data that reveal domain errors. Specifically, the following kinds of errors are considered:

1. Closed inequality boundary
   a. Boundary shift resulting in a reduced domain
   b. Boundary shift resulting in an enlarged domain
   c. Boundary tilt
   d. Closure error
2. Open inequality boundary
   a. Boundary shift resulting in a reduced domain
   b. Boundary shift resulting in an enlarged domain
   c. Boundary tilt
   d. Closure error
3. Equality boundary

In our analysis below, we consider two adjacent domains D1 and D2. We assume that the program computations associated with D1 and D2 are f1 and f2, respectively, and f1 ≠ f2.

1a (Closed Inequality) Boundary Shift Resulting in Reduced Domain: The boundary between the two domains D1 and D2 has shifted by a certain amount (see Figure 6.7). The figure shows the actual boundary between the two domains and an arbitrary position of the expected boundary. One must remember that we do not know the exact position of the expected boundary; it has been shown only for conceptual understanding, to explain that the actual boundary has moved away from the expected boundary. The boundary between the two domains is closed with respect to domain D1.
Therefore, the two ON points A and B belong to domain D1, and the OFF point C belongs to domain D2. Hence the actual outputs from the program corresponding to test data A, B, and C are f1(A), f1(B), and f2(C), respectively. It is obvious from Figure 6.7 that in the absence of any boundary shift all the test points belong to domain D1. Therefore, the expected outputs corresponding to test data A, B, and C are f1(A), f1(B), and f1(C), respectively. These outputs are listed in Table 6.2. We observe, by comparing the second and the third columns of Table 6.2, that the actual output and the expected output are not identical for data point C. Hence, data point C reveals the shifted-boundary fault.

Figure 6.7 Boundary shift resulting in reduced domain (closed inequality).

It is important to understand the following at this point:

• We do not need to know the exact position of the expected boundary. This is because what we actually need are the expected program outcomes in response to the three data points A, B, and C, which can be computed from the specification of a program without explicitly finding out the expected boundary.
• All three data points A, B, and C need not reveal the same fault. Our purpose is to show that test data selected according to the stated criterion reveal all domain errors. The purpose is satisfied if at least one data point reveals the fault. Different elements of the set {A, B, C} reveal different kinds of domain errors.
• If point C is away from the boundary by a magnitude of ε, then a boundary shift of magnitude less than ε cannot be detected. This is because the expected output f2(C) is identical to the actual output f2(C).
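The detection argument in case 1a can be simulated directly. In the sketch below (the names f1 and f2 and the boundary values are illustrative assumptions, not from the text), the closed boundary of D1 was intended at x ≤ 10 but has shifted to x ≤ 9, reducing D1; only the OFF point C exposes the difference:

```python
def f1(x):            # computation associated with D1 (illustrative)
    return x + 1

def f2(x):            # computation associated with D2 (illustrative)
    return 2 * x

def actual(x):        # faulty program: boundary shifted to x <= 9
    return f1(x) if x <= 9 else f2(x)

def expected(x):      # specification: intended boundary x <= 10
    return f1(x) if x <= 10 else f2(x)

# ON-OFF-ON points for the actual boundary x = 9 (closed w.r.t. D1):
A, C, B = 9.0, 9.5, 8.9999   # A, B on/near the boundary; C just outside

assert actual(A) == expected(A)    # A does not reveal the shift
assert actual(B) == expected(B)    # B does not reveal the shift
assert actual(C) != expected(C)    # C reveals it: f2(C) differs from f1(C)
```

As in Table 6.2, the expected outputs come from the specification alone; the position of the intended boundary never has to be computed by the tester.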
1b (Closed Inequality) Boundary Shift Resulting in Enlarged Domain: To detect this fault, we use Figure 6.8, where the boundary between the two domains D1 and D2 has shifted from its expected position such that the size of the domain D1 under consideration has enlarged. Once again, we do not know the exact position of the expected boundary. The boundary between the two domains is closed with respect to domain D1. Therefore, the two ON points A and B belong to domain D1, and the OFF point C belongs to domain D2. Hence the actual outputs from the program corresponding to test data A, B, and C are f1(A), f1(B), and f2(C), respectively. From Figure 6.8 it is clear that, in the absence of any boundary shift, all the test points belong to domain D2. Therefore, the expected outputs corresponding to test data A, B, and C are f2(A), f2(B), and f2(C), respectively.

TABLE 6.2 Detection of Boundary Shift Resulting in Reduced Domain (Closed Inequality)

Test Data    Actual Output    Expected Output    Fault Detected
A            f1(A)            f1(A)              No
B            f1(B)            f1(B)              No
C            f2(C)            f1(C)              Yes

Figure 6.8 Boundary shift resulting in enlarged domain (closed inequality).

TABLE 6.3 Detection of Boundary Shift Resulting in Enlarged Domain (Closed Inequality)

Test Data    Actual Output    Expected Output    Fault Detected
A            f1(A)            f2(A)              Yes
B            f1(B)            f2(B)              Yes
C            f2(C)            f2(C)              No

We observe from Table 6.3 that the actual output and the expected output are not identical for data points A and B. Hence, data points A and B reveal the shifted-boundary fault. If the magnitude of the shift is less than ε (the magnitude by which the OFF point is away from the boundary), the boundary shift cannot be detected by these test data.

1c (Closed Inequality) Boundary Tilt: In Figure 6.9 the boundary between the two domains D1 and D2 has tilted by an appreciable amount.
The boundary between the two domains is closed with respect to domain D1. Therefore, the two ON points A and B belong to domain D1, and the OFF point C belongs to domain D2. Hence the actual outputs from the program corresponding to test data A, B, and C are f1(A), f1(B), and f2(C), respectively. It is clear from Figure 6.9 that in the absence of any boundary tilt test point A falls in domain D1 and test points B and C fall in domain D2. Therefore, the expected outputs corresponding to test data A, B, and C are f1(A), f2(B), and f2(C), respectively. By comparing the second and the third columns of Table 6.4 we observe that the actual output and the expected output are not identical for test point B. Hence, test point B reveals the tilted-boundary fault.

Figure 6.9 Tilted boundary (closed inequality).

1d (Closed Inequality) Closure Error: The expected boundary between the two domains in Figure 6.10 is closed with respect to domain D1. However, in an actual implementation, it is open with respect to D1, resulting in a closure error. The boundary between the two domains belongs to domain D2. The two ON points A and B belong to domain D2, and the OFF point C belongs to domain D1. Hence the actual outputs from the program corresponding to test data A, B, and C are f2(A), f2(B), and f1(C), respectively. In the absence of any closure error, all three test points A, B, and C fall in domain D1, so the expected outputs are f1(A), f1(B), and f1(C), respectively. These outputs are listed in Table 6.5. By comparing the second and the third columns of Table 6.5 we observe that the actual output and the expected output are not identical for data points A and B. Therefore, data points A and B reveal the closure boundary fault.
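The closure error of case 1d can be reproduced in a few lines. In this hedged sketch (the boundary value 5 and the computations are assumptions for illustration), the boundary was specified closed with respect to D1 but implemented open, so an ON point lying exactly on the boundary reveals the fault:

```python
def f1(x):            # computation in D1 (illustrative)
    return x - 1

def f2(x):            # computation in D2 (illustrative)
    return 3 * x

def actual(x):        # fault: boundary implemented open w.r.t. D1
    return f1(x) if x < 5 else f2(x)

def expected(x):      # specification: boundary closed w.r.t. D1
    return f1(x) if x <= 5 else f2(x)

A = 5.0               # ON point lying exactly on the boundary
C = 4.9999            # OFF point just inside D1 (implemented boundary is open)

assert actual(A) != expected(A)    # A reveals the closure error
assert actual(C) == expected(C)    # C alone would miss it
```

This mirrors Table 6.5: points at the boundary itself are the ones that discriminate between ≤ and <, which is why at least one ON point should lie exactly on the boundary whenever an exact solution exists.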
2a (Open Inequality) Boundary Shift Resulting in Reduced Domain: To explain the detection of this type of error, we use Figure 6.11, where the boundary between the two domains D1 and D2 has shifted by a certain amount. The boundary between the two domains is open with respect to domain D1. Therefore, the two ON points A and B belong to domain D2, and the OFF point C belongs to domain D1. Hence the actual outputs from the program corresponding to test data A, B, and C are f2(A), f2(B), and f1(C), respectively. It is obvious from Figure 6.11 that, in the absence of any boundary shift, all the test points belong to domain D1. Therefore, the expected outputs corresponding to test data A, B, and C are f1(A), f1(B), and f1(C), respectively.

TABLE 6.4 Detection of Boundary Tilt (Closed Inequality)

Test Data    Actual Output    Expected Output    Fault Detected
A            f1(A)            f1(A)              No
B            f1(B)            f2(B)              Yes
C            f2(C)            f2(C)              No

Figure 6.10 Closure error (closed inequality).

TABLE 6.5 Detection of Closure Error (Closed Inequality)

Test Data    Actual Output    Expected Output    Fault Detected
A            f2(A)            f1(A)              Yes
B            f2(B)            f1(B)              Yes
C            f1(C)            f1(C)              No

Figure 6.11 Boundary shift resulting in reduced domain (open inequality).

By comparing the second and third columns of Table 6.6 we observe that the actual output and the expected output are not identical for data point C. Therefore, data point C reveals the shifted-boundary fault.

2b (Open Inequality) Boundary Shift Resulting in Enlarged Domain: We use Figure 6.12 to explain the detection of this kind of error. The boundary between the two domains D1 and D2 has shifted to enlarge the size of the domain D1 under consideration. The boundary between the two domains is open with respect to domain D1.
Therefore, the two ON points A and B belong to domain D2, and the OFF point C belongs to domain D1. Hence the actual outputs from the program corresponding to test data A, B, and C are f2(A), f2(B), and f1(C), respectively. It follows from Figure 6.12 that, in the absence of any boundary shift, all the test points belong to domain D2. Therefore, the expected outputs corresponding to test data A, B, and C are f2(A), f2(B), and f2(C), respectively.

TABLE 6.6 Detection of Boundary Shift Resulting in Reduced Domain (Open Inequality)

Test Data    Actual Output    Expected Output    Fault Detected
A            f2(A)            f1(A)              Yes
B            f2(B)            f1(B)              Yes
C            f1(C)            f1(C)              No

Figure 6.12 Boundary shift resulting in enlarged domain (open inequality).

These outputs are listed in Table 6.7. From Table 6.7 we observe that data point C reveals the shifted-boundary fault.

2c (Open Inequality) Boundary Tilt: We explain the boundary tilt fault by referring to Figure 6.13, where the boundary between the two domains D1 and D2 has tilted. Once again, we do not know the exact position of the expected boundary. The boundary between the two domains is open with respect to domain D1. Therefore, the two ON points A and B belong to domain D2, and the OFF point C belongs to domain D1. Hence the actual outputs from the program corresponding to test data A, B, and C are f2(A), f2(B), and f1(C), respectively. Figure 6.13 shows that in the absence of any boundary tilt test points A and C fall in domain D1, and test point B falls in domain D2. Therefore, the expected outputs corresponding to test data A, B, and C are f1(A), f2(B), and f1(C), respectively. We compare the second and third columns of Table 6.8 to observe that the actual output and the expected output are not identical for the test point A. Hence, the test point A reveals the tilted-boundary fault.
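A boundary tilt such as the one in case 2c can also be simulated in two dimensions. In the sketch below the two linear predicates are illustrative assumptions: the open boundary of D1 was intended as x + y < 6 but has been implemented, tilted, as x + 2y < 6. Matching the pattern of Table 6.8, only ON point A reveals the fault:

```python
def f1(x, y):           # computation in D1 (illustrative)
    return x + y

def f2(x, y):           # computation in D2 (illustrative)
    return x * y

def actual(x, y):       # tilted boundary, open w.r.t. D1
    return f1(x, y) if x + 2 * y < 6 else f2(x, y)

def expected(x, y):     # intended boundary, open w.r.t. D1
    return f1(x, y) if x + y < 6 else f2(x, y)

A, B = (0, 3), (8, -1)  # ON points lying on the actual boundary x + 2y = 6
C = (0, 2.999)          # OFF point just inside D1 (open boundary)

assert actual(*A) != expected(*A)   # A reveals the tilt
assert actual(*B) == expected(*B)   # B lies on the other side of the pivot
assert actual(*C) == expected(*C)   # C falls in D1 under both boundaries
```

A tilted boundary pivots about the intersection of the actual and expected borders, which is why ON points on opposite sides of the pivot can behave differently and why two ON points are required.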
2d (Open Inequality) Closure Error: Detection of this kind of fault is explained by using the two domains of Figure 6.14, where the expected boundary between the two domains is open with respect to domain D1. However, in an actual implementation it is closed with respect to D1, resulting in a closure error. The two ON points A and B belong to domain D1, and the OFF point C belongs to domain D2. Hence the actual outputs from the program corresponding to test data A, B, and C are f1(A), f1(B), and f2(C), respectively. Figure 6.14 shows that, in the absence of any closure error, all three test points A, B, and C fall in domain D2. Table 6.9 shows the actual outputs and the expected outputs.

TABLE 6.7 Detection of Boundary Shift Resulting in Enlarged Domain (Open Inequality)

Test Data    Actual Output    Expected Output    Fault Detected
A            f2(A)            f2(A)              No
B            f2(B)            f2(B)              No
C            f1(C)            f2(C)              Yes

TABLE 6.8 Detection of Boundary Tilt (Open Inequality)

Test Data    Actual Output    Expected Output    Fault Detected
A            f2(A)            f1(A)              Yes
B            f2(B)            f2(B)              No
C            f1(C)            f1(C)              No

Figure 6.13 Tilted boundary (open inequality).

By comparing the second and the third columns of Table 6.9 we observe that the actual output and the expected output are not identical for data points A and B. Therefore, data points A and B reveal the closure boundary fault.

3. Equality Boundary: Sometimes a domain may consist of an equality boundary sandwiched between two open domains, as shown in Figure 6.15, where D1 and D2 are two domains open with respect to their common equality boundary. In this case, to test the common boundary, we choose two ON points A and B on the boundary and two OFF points C and D, one in each open domain.

Figure 6.14 Closure error (open inequality).
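The equality-boundary selection above (two ON points on the boundary, two OFF points, one in each adjacent open domain) can be sketched as follows; the predicate x + y = 5 is an illustrative assumption:

```python
def classify(x, y):
    """Partition the plane into D1 (x + y < 5), D3 (x + y == 5),
    and D2 (x + y > 5); D3 is the equality boundary."""
    s = x + y
    if s < 5:
        return "D1"
    if s > 5:
        return "D2"
    return "D3"

# Two ON points lying on the equality boundary ...
A, B = (0, 5), (5, 0)
# ... and two OFF points, one in each adjacent open domain.
C, D = (0, 4.999), (0, 5.001)

assert classify(*A) == classify(*B) == "D3"
assert classify(*C) == "D1"
assert classify(*D) == "D2"
```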
TABLE 6.9 Detection of Closure Error (Open Inequality)

Test Data    Actual Output    Expected Output    Fault Detected
A            f1(A)            f2(A)              Yes
B            f1(B)            f2(B)              Yes
C            f2(C)            f2(C)              No

6.7 SUMMARY

Two kinds of program errors, namely, computation errors and domain errors, were identified. A computation error occurs when an input value causes the program to execute the correct path, but the program output is incorrect due to a fault in an assignment statement. A domain error occurs when an input value causes the program to execute the wrong path. A program executes a wrong path because of faults in conditional statements. A program can be viewed as an abstract classifier that partitions the input domain into a finite number of subdomains such that a separate program path executes for each input subdomain. Thus, a program is seen to be mapping the input subdomains to its execution paths. Program subdomains can be identified by considering individual paths in the program and evaluating path predicates. Each subdomain, also called a domain, is defined by a set of boundaries. If there are faults in defining domain boundaries, input data points may fall in a wrong domain, thereby causing the wrong paths to execute. Input domains were characterized by means of a few properties, such as closed boundary, open boundary, closed domain, open domain, extreme point, and adjacent domain. Next, three kinds of boundary errors, namely, closure error, shifted-boundary error, and tilted-boundary error, were identified. Given a domain and its boundaries, the concepts of ON and OFF points were explained. Finally, a test selection criterion was defined to choose test points to reveal domain errors. Specifically, the selection criterion is as follows: For each domain and for each boundary, select three points A, C, and B in an ON–OFF–ON sequence.
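The selection criterion restated above can be packaged as a small helper for one-dimensional boundaries. The sketch below is an illustrative assumption (the chapter prescribes the criterion, not an algorithm): for a boundary x = b it returns A exactly on the boundary, B very close to it, and the OFF point C just outside a closed boundary or just inside an open one:

```python
def on_off_on(b, closed=True, eps=1e-3):
    """Test points (A, C, B) for the boundary x = b of the domain
    D: x <= b (closed=True) or x < b (closed=False)."""
    A = b                                            # ON point on the boundary
    B = b - eps / 100 if closed else b + eps / 100   # ON point very close to it
    C = b + eps if closed else b - eps               # OFF point: outside a closed
                                                     # boundary, inside an open one
    return A, C, B

A, C, B = on_off_on(4.0, closed=True)
assert A == 4.0
assert C > 4.0                 # OFF point lies outside the closed domain
assert 4.0 - 1e-3 < B < 4.0    # second ON point hugs the boundary from inside
```

The choice of eps bounds the detectable shift: as noted in case 1a, a boundary shift smaller than the OFF point's distance from the border goes undetected.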
Figure 6.15 Equality border: domain D3, defined by an equality boundary and associated computation f3, lying between open domains D1 and D2.

LITERATURE REVIEW

Since White and Cohen proposed the concept of domain testing in 1978, it has been analyzed and extended in several ways. In 1982, Clarke, Hassell, and Richardson [4] showed that some domain errors go undetected by the White and Cohen strategy. Next, they proposed a strategy, namely the V × V strategy, to improve domain testing. If a domain border under consideration contains V vertices, then the V × V strategy selects V ON points (one ON point as close as possible to each vertex) and V OFF points. The V OFF points are chosen at a uniform distance from the border. Zeil, Afifi, and White [5] introduced a domain testing strategy to detect linear errors in nonlinear predicate functions. A few other variants of domain testing have been proposed by White and Perera [6] and by Onoma, Yamaura, and Kobayashi [7]. Zeil [8] considers domain errors that may be caused by faults in arithmetic and simple relational expressions. These expressions are restricted to floating-point or integer computations. Fault detection techniques, called perturbation techniques, are presented to reveal domain errors. Koh and Liu [9] have presented an approach for generating paths that test both the control flow and the data flow in implementations of communication protocols. The protocols are assumed to be modeled as extended finite-state machines. The path selection approach consists of two steps: (i) select a set of paths to cover a data flow selection criterion and (ii) selectively augment the state transitions in the chosen set of paths with state check sequences so that control flow can be ensured and data flow coverage can be preserved. Augmentation of state transitions is performed by using the concept of effective domains.
Jeng and Weyuker [10] have proposed a simplified domain testing strategy that is applicable to arbitrary types of predicates and detects both linear and nonlinear errors for both continuous and discrete variable spaces. Moreover, the strategy requires a constant number of test points; that is, the number of test points is independent of the dimension, the type of border, and the number of vertices on the border under consideration. Their simplified technique requires us to generate one ON point and one OFF point for an inequality (i.e., ≤, <, ≥, or >) border. For an equality (i.e., =) or nonequality (i.e., ≠) border, one ON and two OFF test points are required. The test generation technique requires (i) an ON point to lie on the border, (ii) an OFF point to lie outside the border, and (iii) an ON–OFF pair to be as close to each other as possible. Hajnal and Forgacs [11] have given an algorithm to generate ON–OFF points that can be used by the simplified domain testing strategy. In contrast, the test selection strategy of White and Cohen [3] requires the selection of N ON points in all cases, where N is the dimension of the input space, and the Clarke, Hassell, and Richardson [4] strategy requires the selection of V ON points, where V is the number of vertices on the border under consideration. Zhao, Lyu, and Min [12] have studied an approach to generate ON–OFF test points for character string predicate borders associated with program paths. They use the idea of program slicing [13] to compute the current values of variables in the predicates. The same authors have shown in reference [14] that partition testing strategies are relatively ineffective in detecting faults related to small shifts in input domain boundaries, and they have presented a different testing approach based on input domain analysis of specifications and programs. An elaborate treatment of domain testing can be found in the book by Beizer [15].
Beizer explains how the idea of domains can be used in testing interfaces between program units.

REFERENCES

1. W. E. Howden. Reliability of the Path Analysis Testing Strategy. IEEE Transactions on Software Engineering, September 1976, pp. 208–215.
2. L. J. White and E. I. Cohen. A Domain Strategy for Computer Program Testing. Paper presented at the IEEE Workshop on Software Testing and Documentation, Fort Lauderdale, FL, 1978, pp. 335–346.
3. L. J. White and E. I. Cohen. A Domain Strategy for Computer Program Testing. IEEE Transactions on Software Engineering, May 1980, pp. 247–257.
4. L. Clarke, J. Hassell, and D. Richardson. A Close Look at Domain Testing. IEEE Transactions on Software Engineering, July 1982, pp. 380–392.
5. S. J. Zeil, F. H. Afifi, and L. J. White. Detection of Linear Errors via Domain Testing. ACM Transactions on Software Engineering and Methodology, October 1992, pp. 422–451.
6. L. J. White and I. A. Perera. An Alternative Measure for Error Analysis of the Domain-Testing Strategy. In Proceedings of the ACM SIGSOFT/IEEE Workshop on Software Testing, Banff, Canada, IEEE Press, New York, 1986, pp. 122–131.
7. A. K. Onoma, T. Yamaura, and Y. Kobayashi. Practical Approaches to Domain Testing: Improvements and Generalization. In Proceedings of COMPSAC, Tokyo, IEEE Computer Society Press, Piscataway, NJ, 1987, pp. 291–297.
8. S. J. Zeil. Perturbation Technique for Detecting Domain Errors. IEEE Transactions on Software Engineering, June 1989, pp. 737–746.
9. L. S. Koh and M. T. Liu. Test Path Selection Based on Effective Domains. In Proceedings of the International Conference on Network Protocols, Boston, IEEE Computer Society Press, Piscataway, NJ, October 1994, pp. 64–71.
10. B. Jeng and E. J. Weyuker. A Simplified Domain Testing Strategy. ACM Transactions on Software Engineering and Methodology, July 1994, pp. 254–270.
11. A. Hajnal and I. Forgacs. An Applicable Test Data Generation Algorithm for Domain Errors.
In Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis, Clearwater Beach, FL, ACM Press, New York, March 1998, pp. 63–72.
12. R. Zhao, M. R. Lyu, and Y. Min. Domain Testing Based on Character String Predicate. Paper presented at the Asian Test Symposium, Xian, China, IEEE Computer Society Press, Piscataway, NJ, 2003, pp. 96–101.
13. F. Tip. A Survey of Program Slicing Techniques. Journal of Programming Languages, September 1995, pp. 121–189.
14. R. Zhao, M. R. Lyu, and Y. Min. A New Software Testing Approach Based on Domain Analysis of Specifications and Programs. In Proceedings of the 14th Symposium on Software Reliability Engineering, Colorado, IEEE Computer Society Press, New York, 2003, pp. 60–70.
15. B. Beizer. Software Testing Techniques, 2nd ed. Van Nostrand Reinhold, New York, 1990.

Exercises

1. Explain what computation errors and domain errors are.
2. Give an example of code showing a domain error.

Figure 6.16 Domains D1, D2, and D3.

3. Explain the difference between control flow–based testing and domain error–based testing.
4. Recall that the domain testing strategy requires us to select test points on and/or very close to domain boundaries. Why do we not select test points far from the boundaries?
5. Consider the three domains D1, D2, and D3 shown in Figure 6.16. Domain D3 consists of all those points lying on the indicated straight line. Assuming that the maximum X and Y spans of all three domains are [−5, 5] and [−5, 5], respectively, give concrete values of test points for domain D3.
6. State four kinds of domain errors and explain how they occur.
7. Explain the following terms: closed boundary, open boundary, closed domain, open domain, extreme point, adjacent domain.
8. Explain the idea of ON points and OFF points.
9.
Clearly explain the test selection criterion in domain-based testing and show that the closed inequality error (boundary shift resulting in a reduced domain) is detected by the test points chosen by the selection criterion.
10. Identify some difficulties in applying the concept of domain testing to actual program testing.

CHAPTER 7

System Integration Testing

I criticize by creation, not by finding fault.
— Marcus Tullius Cicero

7.1 CONCEPT OF INTEGRATION TESTING

A software module, or component, is a self-contained element of a system. Modules have well-defined interfaces with other modules. A module can be a subroutine, function, procedure, class, or collection of those basic elements put together to deliver a higher level service. A system is a collection of modules interconnected in a certain way to accomplish a tangible objective. A subsystem is an interim system that is not fully integrated with all the modules; it is also known as a subassembly. In moderate to large projects, tens to hundreds of programmers implement their share of the code in the form of modules. Modules are individually tested, which is commonly known as unit testing, by their respective programmers using white-box testing techniques. At the unit testing level, the system exists in pieces under the control of the programmers. The next major task is to put the modules, that is, the pieces, together to construct the complete system. Constructing a working system from the pieces is not a straightforward task because of numerous interface errors. Even constructing a reasonably stable system from the components involves much testing. The path from tested components to constructing a deliverable system contains two major testing phases, namely, integration testing and system testing.
Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy. Copyright © 2008 John Wiley & Sons, Inc.

The primary objective of integration testing is to assemble a reasonably stable system in a laboratory environment such that the integrated system can withstand the rigor of full-blown system testing in the actual environment of the system. The importance of integration testing stems from three reasons, as outlined below:

• Different modules are generally created by groups of different developers. The developers may be working at different sites. In spite of our best efforts in system design and documentation, misinterpretations, mistakes, and oversights do occur in reality. Interface errors between modules created by different programmers, and even by the same programmer, are rampant. We will discuss the sources of interface errors in Section 7.2.
• Unit testing of individual modules is carried out in a controlled environment by using test drivers and stubs. Stubs are dummy modules which merely return predefined values. If a module under unit test invokes several other modules, the effectiveness of unit testing is constrained by the programmer's ability to effectively test all the paths. Therefore, with the inherent limitations of unit testing, it is difficult to predict the behavior of a module in its actual environment after the unit testing is performed.
• Some modules are more error prone than other modules because of their inherent complexity. It is essential to identify the ones causing most failures.

The objective of system integration is to build a "working" version of the system by (i) putting the modules together in an incremental manner and (ii) ensuring that the additional modules work as expected without disturbing the functionalities of the modules already put together.
In other words, system integration testing is a systematic technique for assembling a software system while conducting tests to uncover errors associated with interfacing. We ensure that unit-tested modules operate correctly when they are combined as dictated by the design. Integration testing usually proceeds from small subassemblies containing a few modules to larger ones containing more and more modules. Large, complex software products can go through several iterations of build-and-test cycles before they are fully integrated. Integration testing is said to be complete when the system is fully integrated, all the test cases have been executed, all the severe and moderate defects found have been fixed, and the system is retested.

7.2 DIFFERENT TYPES OF INTERFACES AND INTERFACE ERRORS

Modularization is an important principle in software design, and modules are interfaced with other modules to realize the system's functional requirements. An interface between two modules allows one module to access the service provided by the other. It implements a mechanism for passing control and data between modules. Three common paradigms for interfacing modules are as follows:

• Procedure Call Interface: A procedure in one module calls a procedure in another module. The caller passes on control to the called module. The caller can pass data to the called procedure, and the called procedure can pass data to the caller while returning control back to the caller.
• Shared Memory Interface: A block of memory is shared between two modules. The memory block may be allocated by one of the two modules or a third module. Data are written into the memory block by one module and are read from the block by the other.
• Message Passing Interface: One module prepares a message by initializing the fields of a data structure and sending the message to another module.
This form of module interaction is common in client–server-based systems and web-based systems. Programmers test modules to their satisfaction. The question is: If all the unit-tested modules work individually, why can these modules not work when put together? The problem arises when we "put them together" because of rampant interface errors. Interface errors are those that are associated with structures existing outside the local environment of a module but which the module uses [1]. Perry and Evangelist [2] reported in 1987 that interface errors accounted for up to a quarter of all errors in the systems they examined. They found that, of all errors that required a fix within one module, more than half were caused by interface errors. Perry and Evangelist have categorized interface errors as follows:

1. Construction: Some programming languages, such as C, generally separate the interface specification from the implementation code. In a C program, programmers can write the statement #include header.h, where header.h contains an interface specification. Since the interface specification lies somewhere away from the actual code, programmers may overlook it while writing code. Therefore, inappropriate use of #include statements causes construction errors.
2. Inadequate Functionality: These are errors caused by implicit assumptions in one part of a system that another part of the system would perform a function. However, in reality, the "other part" does not provide the expected functionality, whether intentionally or unintentionally on the part of the programmer who coded it.
3. Location of Functionality: Disagreement on or misunderstanding about the location of a functional capability within the software leads to this sort of error. The problem arises due to the design methodology, since these disputes should not occur at the code level. It is also possible that inexperienced personnel contribute to the problem.
4.
Changes in Functionality: Changing one module without correctly adjusting for that change in other related modules affects the functionality of the program.

5. Added Functionality: A completely new functional module, or capability, was added as a system modification. Any functionality added after the module is checked in to the version control system without a change request (CR) is considered to be an error.

6. Misuse of Interface: One module makes an error in using the interface of a called module. This is likely to occur in a procedure call interface. Interface misuse can take the form of wrong parameter type, wrong parameter order, or wrong number of parameters passed.

7. Misunderstanding of Interface: A calling module may misunderstand the interface specification of a called module. The called module may assume that some parameters passed to it satisfy a certain condition, whereas the caller does not ensure that the condition holds. For example, assume that a called module is expected to return the index of an element in an array of integers. The called module may choose to implement binary search with the assumption that the calling module gives it a sorted array. If the caller fails to sort the array before invoking the second module, we have an instance of interface misunderstanding.

8. Data Structure Alteration: These are similar in nature to the functionality problems discussed above, but they are likely to occur at the detailed design level. The problem arises when the size of a data structure is inadequate or it fails to contain a sufficient number of information fields. The problem has its genesis in the failure of the high-level design to fully specify the capability requirements of the data structure. Let us consider an example in which a module reads data and keeps it in a record structure. Each record holds a person's name followed by their employee number and salary.
Now, if the data structure is defined for 1000 records, then as the number of records grows beyond 1000, the program is bound to fail. In addition, if management decides to award bonuses to a few outstanding employees, there may not be any storage space allocated for the additional information.

9. Inadequate Error Processing: A called module may return an error code to the calling module. However, the calling module may fail to handle the error properly.

10. Additions to Error Processing: These errors are caused by changes to other modules which dictated changes in a module's error handling. In this case, either necessary functionality that would help trace errors is missing from the current error processing, or the current techniques of error processing require modification.

11. Inadequate Postprocessing: These errors are caused by a general failure to release resources no longer required, for example, failure to deallocate memory.

12. Inadequate Interface Support: The actual functionality supplied was inadequate to support the specified capabilities of the interface. For example, a module passes a temperature value in Celsius to a module which interprets the value in Fahrenheit.

13. Initialization/Value Errors: A failure to initialize, or assign, the appropriate value to a variable or data structure leads to this kind of error. Problems of this kind are usually caused by simple oversight. For example, the value of a pointer can change; it might point to the first character in a string, then to the second character, after that to the third character, and so on. If the programmer forgets to reinitialize the pointer before the function is used again, the pointer may eventually point to code.

14. Violation of Data Constraints: A specified relationship among data items was not supported by the implementation. This can happen due to incomplete detailed design specifications.

15.
Timing/Performance Problems: These errors are caused by inadequate synchronization among communicating processes. A race condition is an example of this kind of error. In the classical race, there are two events, event a and event b, happening in communicating processes, process A and process B, respectively. There are logical grounds for expecting event a to precede event b. However, under an abnormal condition, event b may occur before event a. The program will fail if the software developer did not anticipate the possibility of event b preceding event a and did not write any code to deal with the situation.

16. Coordination of Changes: These errors are caused by a failure to communicate changes made to one software module to those responsible for other interrelated modules.

17. Hardware/Software Interfaces: These errors arise from inadequate software handling of hardware devices. For example, a program can send data at a high rate until the input buffer of the connected device is full. Then the program has to pause until the device frees up its input buffer. The program may not recognize the signal from the device that it is no longer ready to receive more data. Loss of data will occur due to a lack of synchronization between the program and the device.

Interface errors cannot be detected by performing unit testing on modules, since unit testing causes computation to happen within a module, whereas interactions are required to happen between modules for interface errors to be detected. It is also difficult to observe interface errors by performing system-level testing, because these errors tend to be buried in system internals. The major advantages of conducting system integration testing are as follows:

• Defects are detected early.
• It is easier to fix defects detected earlier.
• We get earlier feedback on the health and acceptability of the individual modules and on the overall system.
• Scheduling of defect fixes is flexible, and it can overlap with development.
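The "misunderstanding of interface" error of item 7 can be made concrete with a small sketch. The following is a hypothetical Python illustration (the function name find_index and the data are invented, not from the book): the called module implements binary search under the assumption that its input is sorted, while one caller violates that contract.

```python
from bisect import bisect_left

def find_index(values, key):
    """Called module: its interface contract assumes values is sorted."""
    i = bisect_left(values, key)          # binary search
    if i < len(values) and values[i] == key:
        return i
    return -1  # not found

data = [42, 7, 19, 3]

# Caller that honors the contract: sorts the array before the call.
print(find_index(sorted(data), 42))       # → 3

# Caller that misunderstands the interface: passes the unsorted array.
print(find_index(data, 42))               # → -1, although 42 is present
```

Both the caller and the called module pass their unit tests in isolation; the defect surfaces only when the two are integrated, which is exactly why such errors are targets of integration testing.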
System integration testing is performed by the system integration group, also known as a build engineering group. The integration test engineers need to know the details of the software modules. This means that the team of engineers who built the modules needs to be involved in system integration. The integration testers should be familiar with the interface mechanisms. The system architects should be involved in the integration testing of complex software systems because they have the bigger picture of the system.

7.3 GRANULARITY OF SYSTEM INTEGRATION TESTING

System integration testing is performed at different levels of granularity. Integration testing includes both white- and black-box testing approaches. Black-box testing ignores the internal mechanisms of a system and focuses solely on the outputs generated in response to selected inputs and execution conditions. The code is considered to be a big black box by the tester, who cannot examine the internal details of the system. The tester knows the input to the black box and observes the expected outcome of the execution. White-box testing uses information about the structure of the system to test its correctness. It takes into account the internal mechanisms of the system and the modules. In the following, we explain the ideas of intrasystem testing, intersystem testing, and pairwise testing.

1. Intrasystem Testing: This form of testing constitutes low-level integration testing, with the objective of combining the modules together to build a cohesive system. The process of combining modules can progress in an incremental manner, akin to constructing and testing successive builds, as explained in Section 7.4.1. For example, in a client–server-based system both the client and the server are distinct entities running at different locations.
Before the interactions of clients with a server are tested, it is essential to individually construct the client and the server systems from their respective sets of modules in an incremental fashion. The low-level design document, which details the specification of the modules within the architecture, is the source of test cases.

2. Intersystem Testing: Intersystem testing is a high-level testing phase which requires interfacing independently tested systems. In this phase, all the systems are connected together, and testing is conducted from end to end. The term end to end is used in communication protocol systems, and end-to-end testing means initiating a test between two access terminals interconnected by a network. The purpose in this case is to ensure that the systems work together, but not to conduct a comprehensive test. Only one feature is tested at a time and on a limited basis. Later, at the time of system testing, a comprehensive test is conducted based on the requirements, and this includes functional, interoperability, stress, and performance tests, and so on. Integrating a client–server system, after integrating the client module and the server module separately, is an example of intersystem testing. Integrating a call control system and a billing system in a telephone network is another example of intersystem testing. The test cases are derived from the high-level design document, which details the overall system architecture.

3. Pairwise Testing: There can be many intermediate levels of system integration testing between the above two extreme levels, namely intrasystem testing and intersystem testing. Pairwise testing is one such intermediate level of integration testing. In pairwise integration, only two interconnected systems in an overall system are tested at a time.
The purpose of pairwise testing is to ensure that the two systems under consideration can function together, assuming that the other systems within the overall environment behave as expected. The whole network infrastructure needs to be in place to support the test of interactions of the two systems, but the rest of the systems are not subject to tests. The network test infrastructure must be simple and stable during pairwise testing. While pairwise testing may sound simple, several issues can complicate the testing process. The biggest issue is unintended side effects. For example, in testing communication between a network element (radio node) and the element management system, if another device (radio node controller) within the 1xEV-DO wireless data network, discussed in Chapter 8, fails during the test, it may trigger a high volume of traps to the element management system. Untangling this high volume of traps may be difficult.

7.4 SYSTEM INTEGRATION TECHNIQUES

One of the objectives of integration testing is to combine the software modules into a working system so that system-level tests can be performed on the complete system. Integration testing need not wait until all the modules of a system are coded and unit tested. Instead, it can begin as soon as the relevant modules are available. A module is said to be available for combining with other modules when the module's check-in request form, to be discussed in this section, is ready. Some common approaches to performing system integration are as follows:

• Incremental
• Top down
• Bottom up
• Sandwich
• Big bang

In the remainder of this section, we explain the above approaches.

7.4.1 Incremental

In this approach, integration testing is conducted in an incremental manner as a series of test cycles, as suggested by Deutsch [3]. In each test cycle, a few more modules are integrated with an existing and tested build to generate a larger build.
The idea is to complete one cycle of testing, let the developers fix all the errors found, and continue with the next cycle of testing. The complete system is built incrementally, cycle by cycle, until the whole system is operational and ready for system-level testing. The system is built as a succession of layers, beginning with some core modules. In each cycle, a new layer is added to the core and tested to form a new core. The new core is intended to be self-contained and stable. Here, "self-contained" means containing all the necessary code to support a set of functions, and "stable" means that the subsystem (i.e., the new, partial system) can stay up for 24 hours without any anomalies. The number of system integration test cycles and the total integration time are determined by the following parameters:

• Number of modules in the system
• Relative complexity of the modules (cyclomatic complexity)
• Relative complexity of the interfaces between the modules
• Number of modules needed to be clustered together in each test cycle
• Whether the modules to be integrated have been adequately tested before
• Turnaround time for each test–debug–fix cycle

Constructing a build is a process by which individual modules are integrated to form an interim software image. A software image is a compiled software binary. A build is an interim software image for internal testing within the organization. Eventually, the final build will be a candidate for system testing, and such a tested system is released to the customers.
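The cycle-by-cycle growth of a build can be sketched schematically. The following is a hypothetical Python model (the module names, layers, and tests are all invented for illustration): in each cycle a few more unit-tested modules join the build, tests for the new layer are added, and the accumulated test suite is rerun as a regression before the next cycle starts.

```python
def run_tests(build, tests):
    """Run every accumulated test against the current build; return failures."""
    return [name for name, test in tests.items() if not test(build)]

# Invented layers: each cycle contributes a few modules (name -> code).
cycles = [
    {"core": lambda: 1},        # cycle 1: core modules
    {"parser": lambda: 2},      # cycle 2: next layer added to the core
    {"ui": lambda: 3},          # cycle 3: final layer
]

build, tests = {}, {}
for layer in cycles:
    build.update(layer)                         # integrate a few more modules
    for name in layer:                          # add tests for the new layer
        tests[f"test_{name}"] = (lambda n: lambda b: n in b and b[n]() > 0)(name)
    failures = run_tests(build, tests)          # regression over all tests so far
    assert not failures, f"fix before next cycle: {failures}"

print(sorted(build))  # → ['core', 'parser', 'ui'], the fully integrated system
```

The assert models the exit criterion of each test cycle: errors found in a cycle are fixed before the next layer is integrated.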
Constructing a software image involves the following activities:

• Gathering the latest unit-tested, authorized versions of modules
• Compiling the source code of those modules
• Checking the compiled code into the repository
• Linking the compiled modules into subassemblies
• Verifying that the subassemblies are correct
• Exercising version control

A simple build involves only a small number of modules being integrated with a previously tested build on a reliable and well-understood platform. No special tool or procedure needs to be developed and documented for a simple build. On the other hand, organized, well-documented procedures are applied for complex builds. A build process becomes complicated if a large number of modules are integrated together and a significant number of those modules are new with complex interfaces. These interfaces can be between software modules and hardware devices, across platforms, and across networks. For complex builds, a version control tool is highly recommended for automating the build process and for fast turnaround of a test–debug–fix cycle.

Creating a daily build [4] is very popular in many organizations because it facilitates faster delivery of the system. It puts emphasis on small incremental testing, steadily increasing the number of test cases, and regression testing from build to build. The integrated system is tested using automated, reusable test cases. An effort is made to fix the defects that were found during the testing cycle. A new version of the system is constructed from the existing, revised, and newly developed modules and is made available for retesting. Prior versions of the build are retained for reference and rollback. If a defect is not found in a module in the build in which the module was introduced, the module is carried forward from build to build until a defect is found.
Having access to the version where the defective module was originally introduced is useful in debugging and fixing, limiting the side effects of the fixes, and performing a root cause analysis. During system development, integration, and testing, a typical practice is to retain the past 7–10 builds.

The software developer fills out a check-in request form before a new software module or a module with an error fix is integrated into a build. The form is reviewed by the build engineering group for approval. Once it is approved, the module can be considered for integration. The main portions of a check-in form are given in Table 7.1. The idea behind having a check-in request mechanism is fourfold:

1. All the files requiring an update must be identified and known to other team members.
2. The new code must have been reviewed prior to its integration.
3. The new code must have been unit tested.
4. The scope of the check-in is identified.

TABLE 7.1 Check-in Request Form

Author: Name of the person requesting this check-in
Today's date: month, day, year
Check-in request date: month, day, year
Category (identify all that apply): New Feature: (Y, N); Enhancement: (Y, N); Defect: (Y, N); if yes: defect numbers; Are any of these major defects: (Y, N); Are any of these moderate defects: (Y, N)
Short description of check-in: Describe in a short paragraph the feature, the enhancement, or the defect fixes to be checked in.
Number of files to be checked in: Give the number of files to be checked in. Include the file names, if possible.
Code reviewer names: Provide the names of the code reviewers.
Command line interface changes made: (Y, N); if yes, were they: Documented? (Y, N); Reviewed? (Y, N, pending)
Does this check-in involve changes to global header? (Y, N); if yes, include the header file names.
Does this check-in involve changes in output logging? (Y, N); if yes, were they documented? (Y, N)
Unit test description: Description of the unit tests conducted
Comments: Any other comments and issues

A release note containing the following information accompanies a build:

• What has changed since the last build?
• What outstanding defects have been fixed?
• What are the outstanding defects in the build?
• What new modules or features have been added?
• What existing modules or features have been enhanced, modified, or deleted?
• Are there any areas where unknown changes may have occurred?

A test strategy is created for each new build based on the above information. The following issues are addressed while planning a test strategy:

• What test cases need to be selected from the system integration test plan, as discussed in Section 7.6, in order to test the changes? Will these test cases give feature coverage of the new and modified features? If necessary, add new test cases to the system integration test plan.

• What existing test cases can be reused without modification in order to test the modified system? What previously failed test cases should now be reexecuted in order to test the fixes in the new build?

• How should the scope of a partial regression test be determined? A full regression test may not be run on each build because of frequent turnaround of builds. At the least, any earlier test cases which pertain to areas that have been modified must be reexecuted.

• What are the estimated time, resource demand, and cost to test this build? Some builds may be skipped based on this estimate and the current activities, because the integration test engineers may choose to wait for a later build.

7.4.2 Top Down

Systems with hierarchical structures easily lend themselves to top-down and bottom-up approaches to integration. In a hierarchical system, there is a first, top-level module which is decomposed into a few second-level modules. Some of the second-level modules may be further decomposed into third-level modules, and so on.
Some or all the modules at any level may be terminal modules, where a terminal module is one that is not decomposed any further. An internal module, also known as a nonterminal module, performs some computations, invokes its subordinate modules, and returns control and results to its caller. In top-down and bottom-up approaches, a design document giving the module hierarchy is used as a reference for integrating modules. An example of a module hierarchy is shown in Figure 7.1, where module A is the topmost module; module A has been decomposed into modules B, C, and D. Modules B, D, E, F, and G are terminal modules, as these have not been further decomposed.

Figure 7.1 Module hierarchy with three levels and seven modules.

The top-down approach is explained in the following:

Step 1: Let IM represent the set of modules that have already been integrated and the required stubs. Initially, IM contains the top-level module and stubs corresponding to all the subordinate modules of the top-level module. It is assumed that the top-level module has passed its entry criteria.

Step 2: Choose a stub member M′ in set IM. Let M be the actual module corresponding to stub M′. We obtain a new set CM from IM by replacing stub M′ with M and including in CM all stubs corresponding to the subordinate modules of M. We consider CM to be a union of four sets: {M}, CMs, CMi, and CMr, where CMs is the set of stubs, CMi is the set of modules having direct interfaces with M, and CMr is the rest of the modules in CM.

Step 3: Now, test the combined behavior of CM. Testing CM means applying input to the top-level module of the system. It may be noted that though the integration team has access to the top module of the system, all kinds of tests cannot be performed. This is apparent from the fact that CM does not represent the full system.
In this step, the integration team tests a subset of the system functions implemented by the actual modules in CM. The integration team performs two kinds of tests:

1. Run test cases to discover any interface defects between M and members of CMi.
2. Perform regression tests to ensure that integration of the modules in the two sets CMi and CMr is satisfactory in the presence of module M.

One may note that in previous iterations the interfaces between modules in CMi and CMr were tested and the defects fixed. However, those tests were executed with M′, a stub of M, and not M itself. The presence of M in the integrated system up to this moment allows us to test the interfaces between the modules in the combined set of CMi and CMr, because of the possibility of the system supporting more functionalities with M. The above two kinds of tests are continued until the integration team is satisfied that there is no known interface error. In case an interface error is discovered, the error must be fixed before moving on to the next step.

Step 4: If the set CMs is empty, then stop; otherwise, set IM = CM and go to step 2.

Now, let us consider an example of top-down integration using Figure 7.1. The integration of modules A and B by using stubs C′ and D′ (represented by grey boxes) is shown in Figure 7.2. Interactions between modules A and B are severely constrained by the dummy nature of C′ and D′. As stubs are replaced with actual modules, the interactions between A and B become more concrete, and, as a consequence, more tests are performed after additional modules are integrated. Next, as shown in Figure 7.3, stub D′ has been replaced with its actual instance D. We perform two kinds of tests: first, test the interface between A and D; second, perform regression tests to look for interface defects between A and B in the presence of module D. Next, stub C′ has been replaced with the actual module C, and new stubs E′, F′, and G′ have been added to the integrated system (Figure 7.4).
We perform tests as follows: first, test the interface between A and C; second, test the combined modules A, B, and D in the presence of C (Figure 7.4). The rest of the integration process is depicted in Figures 7.5 and 7.6 to obtain the final system of Figure 7.7.

Figure 7.2 Top-down integration of modules A and B.
Figure 7.3 Top-down integration of modules A, B, and D.
Figure 7.4 Top-down integration of modules A, B, D, and C.
Figure 7.5 Top-down integration of modules A, B, C, D, and E.
Figure 7.6 Top-down integration of modules A, B, C, D, E, and F.
Figure 7.7 Top-down integration of modules A, B, C, D, E, F, and G.

The advantages of the top-down approach are as follows:

• System integration test (SIT) engineers continually observe system-level functions as the integration process continues. How soon such functions are observed depends upon their choice of the order in which modules are integrated. Early observation of system functions is important because it gives them better confidence.

• Isolation of interface errors becomes easier because of the incremental nature of top-down integration. However, it cannot be concluded that an interface error is due to a newly integrated module M. The interface error may be due to faulty implementation of a module that was integrated much earlier. This is possible because earlier tests were conducted with a stub for M, and the full capability of M simply allowed the test engineers to conduct more tests that became possible due to M.

• Test cases designed to test the integration of a module M are reused during the regression tests performed after integrating other modules.
• Since test inputs are applied to the top-level module, it is natural that those test cases correspond to system functions, and it is easier to design those test cases than test cases designed to check internal system functions. Those test cases can be reused while performing the more rigorous, system-level tests.

The limitations of the top-down approach are as follows:

• Until a certain set of modules has been integrated, it may not be possible to observe meaningful system functions because of the absence of lower level modules and the presence of stubs. Careful analysis is required to identify an ordering of modules for integration so that system functions are observed as early as possible.

• Test case selection and stub design become increasingly difficult when stubs lie far away from the top-level module. This is because stubs support limited behavior, and any test run at the top level must be constrained to exercise the limited behavior of lower level stubs.

7.4.3 Bottom Up

In the bottom-up approach, system integration begins with the integration of the lowest level modules. A module is said to be at the lowest level if it does not invoke another module. It is assumed that all the modules have been individually tested before. To integrate a set of lower level modules in this approach, we need to construct a test driver module that invokes the modules to be integrated. Once the integration of a desired group of lower level modules is found to be satisfactory, the driver is replaced with the actual module, and one more test driver is used to integrate more modules with the set of modules already integrated. The process of bottom-up integration continues until all the modules have been integrated.

Now we give an example of bottom-up integration for the module hierarchy of Figure 7.1. The lowest level modules are E, F, and G. We design a test driver to integrate these three modules, as shown in Figure 7.8.
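In code, such a driver is simply a throwaway module that calls E, F, and G the way their real caller eventually will. The following is a hypothetical Python sketch (the behaviors of E, F, and G are invented for illustration, not taken from the book):

```python
# Invented stand-ins for the lowest level modules E, F, and G.
def module_E(x):      # e.g., validates and normalizes an input
    return abs(x)

def module_F(x):      # e.g., transforms the normalized value
    return x * 2

def module_G(x):      # e.g., formats the final result
    return f"result={x}"

def test_driver_for_C():
    """Throwaway driver that mimics module C's invocation pattern:
    E's return value feeds F, whose return value feeds G, exercising
    the indirect interface among the three modules."""
    normalized = module_E(-5)
    doubled = module_F(normalized)
    assert module_G(doubled) == "result=10"

test_driver_for_C()
print("E, F, and G integrate correctly under the driver")
```

Once these checks pass, the driver is discarded and replaced by the actual module C, exactly as the text describes next.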
It may be noted that modules E, F, and G have no direct interfaces among them. However, return values generated by one module are likely to be used in another module, creating an indirect interface. The test driver in Figure 7.8 invokes modules E, F, and G in a way similar to their invocations by module C. The test driver mimics module C to integrate E, F, and G in a limited way, because it is much simpler in capability than module C. After the testers are satisfied with the combined behavior of E, F, and G, the test driver is replaced with the actual module, in this case C, and a new test driver is used (Figure 7.9). At this moment, more modules, such as B and D, are integrated with the so-far integrated system. The new test driver mimics the behavior of module A. We need to include modules B and D because those are invoked by A and the test driver mimics A (Figure 7.9). After the testers are satisfied with the integrated system shown in Figure 7.9, the test driver is replaced with module A (Figure 7.10), and further tests are performed.

Figure 7.8 Bottom-up integration of modules E, F, and G.
Figure 7.9 Bottom-up integration of modules B, C, and D with E, F, and G.
Figure 7.10 Bottom-up integration of module A with all others.

The advantages of the bottom-up approach are as follows. If the low-level modules and their combined functions are often invoked by other modules, then it is more useful to test them first so that meaningful, effective integration of the other modules can be done. In the absence of such a strategy, the testers would write stubs to emulate the commonly invoked low-level modules, which would provide only a limited test capability of the interfaces.

The disadvantages of the bottom-up approach are as follows:

• Test engineers cannot observe system-level functions from a partly integrated system.
In fact, they cannot observe system-level functions until the top-level test driver is in place.

• Generally, major design decisions are embodied in the top-level modules, whereas most of the low-level modules largely perform commonly known input–output functions. Discovery of major flaws in system design may not be possible until the top-level modules have been integrated.

Now we compare the top-down and bottom-up approaches in the following:

• Validation of Major Design Decisions: The top-level modules contain major design decisions. Faults in design decisions are detected early if integration is done in a top-down manner. In the bottom-up approach, those faults are detected toward the end of the integration process.

• Observation of System-Level Functions: One applies test inputs to the top-level module, which is akin to performing system-level tests in a very limited way in the top-down approach. This gives an opportunity to the SIT personnel and the development team to observe system-level functions early in the integration process. However, similar observations can be made in the bottom-up approach only at the end of system integration.

• Difficulty in Designing Test Cases: In the top-down approach, as more and more modules are integrated and stubs lie farther away from the top-level module, it becomes increasingly difficult to design stub behavior and test input. This is because stubs return predetermined values, and a test engineer must compute those values for a given test input at the top level. However, in the bottom-up approach, one designs the behavior of a test driver by simplifying the behavior of the actual module.

• Reusability of Test Cases: In the top-down approach, test cases designed to test the interface of a newly integrated module are reused in performing regression tests in the following iteration. Those test cases are reused as system-level test cases.
However, in the bottom-up approach, all the test cases incorporated into test drivers, except for the top-level test driver, cannot be reused. The top-down approach thus saves resources in the form of time and money.

7.4.4 Sandwich and Big Bang

In the sandwich approach, a system is integrated by using a mix of the top-down and bottom-up approaches. A hierarchical system is viewed as consisting of three layers. The bottom layer contains all the modules that are often invoked. The bottom-up approach is applied to integrate the modules in the bottom layer. The top layer contains modules implementing major design decisions. These modules are integrated by using the top-down approach. The rest of the modules are put in the middle layer. In this way we retain the advantages of the top-down approach while writing stubs for the low-level modules is not required. As a special case, the middle layer may not exist, in which case a module falls either in the top layer or in the bottom layer. On the other hand, if the middle layer exists, then this layer can be integrated by using the big-bang approach after the top and the bottom layers have been integrated.

In the big-bang approach, first all the modules are individually tested. Next, all those modules are put together to construct the entire system, which is tested as a whole. Sometimes developers use the big-bang approach to integrate small systems. However, for large systems, this approach is not recommended for the following reasons:

• In a system with a large number of modules, there may be many interface defects. It is difficult to determine whether or not the cause of a failure is due to interface errors in a large and complex system.

• In large systems, the presence of a large number of interface errors is not an unlikely scenario in software development. Thus, it is not cost effective to be optimistic by putting the modules together and hoping it will work.
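The stub mechanism that distinguishes top-down integration can be sketched in the same style as the earlier driver example. The following is a hypothetical Python illustration based on the A–G hierarchy of Figure 7.1 (all module behaviors are invented): subordinates of A are injected so that stubs, which return predetermined values, can be swapped for actual modules one at a time.

```python
# Stubs return predetermined values, as in top-down integration. A test
# engineer computes the canned values that the real modules would return
# for the chosen test input (here, x = 4).
def stub_C(x):
    return 10          # canned value in place of real module C

def stub_D(x):
    return 20          # canned value in place of real module D

def module_B(x):       # an actual, unit-tested module
    return x + 1

def module_A(x, b=module_B, c=stub_C, d=stub_D):
    """Top-level module; subordinates are injected so stubs can be
    replaced by actual modules as integration proceeds."""
    return b(x) + c(x) + d(x)

# Test the A-B interface while C and D are still stubs.
assert module_A(4) == 35               # 5 + 10 + 20

# Later iteration: stub_D is replaced by the actual module D, and the
# earlier test is rerun as a regression test.
def module_D(x):
    return x * 5

assert module_A(4, d=module_D) == 35   # 5 + 10 + 20, now with real D
print("top-down iteration passed")
```

Note how the stub's canned value (20) had to be precomputed to match what real module D returns for the test input, which is exactly the growing difficulty in stub design that the comparison above points out.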
Solheim and Rowland [5] measured the relative efficacy of top-down, bottom-up, sandwich, and big-bang integration strategies for software systems. The empirical study indicated that top-down integration strategies are most effective in terms of defect correction. Top-down and big-bang strategies produced the most reliable systems. Bottom-up strategies are generally least effective at correcting defects and produce the least reliable systems. Systems integrated by the sandwich strategy are moderately reliable in comparison.

CHAPTER 7 SYSTEM INTEGRATION TESTING

7.5 SOFTWARE AND HARDWARE INTEGRATION

A component is a fundamental part of a system, and it is largely independent of other components. Many products require development of both hardware and software components. These two kinds of components are integrated to form the complete product. In addition, a third kind of component, the product documentation, is developed in parallel with the first two components. The product documentation is an integration of different kinds of individual documentation. The overall goal is to reduce the time to market of the product by removing the sequential nature of product development processes. On the hardware side, the individual hardware modules, or components, are diverse in nature, such as a chassis, a printed circuit board, a power supply, a fan tray for cooling, and a cabinet to hold the product. On the documentation side, the modules that are integrated together include an installation manual, a troubleshooting guide, and a user's manual in more than one natural language. It is essential to test both the software and the hardware components individually as much as possible before integrating them. In many products, neither component can be completely tested without the other. Usually, the entry criteria for both the hardware and software components are established and satisfied before beginning to integrate those components.
If the target hardware is not available at the time of system integration, then a hardware emulator is developed. The emulator replaces the hardware platform on which the software is tested until the real hardware is available. However, there is no guarantee that the software will work on the real hardware even if it worked on the emulator. Integration of hardware and software components is often done in an iterative manner. A software image with a minimal number of core software modules is loaded on the prototype hardware. In each step, a small number of tests are performed to ensure that all the desired software modules are present in the build. Next, additional tests are run to verify the essential functionalities. The process of assembling the build, loading on the target hardware, and testing the build continues until the entire product has been integrated. If a problem is discovered early in the hardware/software integration and the problem can be resolved easily, then the problem is fixed without any delay. Otherwise, integration of software and hardware components may continue in a limited way until the root cause of the problem is found and analyzed. The integration is delayed until the fixes, based on the outcome of the root cause analysis, are applied. 7.5.1 Hardware Design Verification Tests A hardware engineering process is viewed as consisting of four phases: (i) planning and specification, (ii) design, prototype implementation, and testing, (iii) integration with the software system, and (iv) manufacturing, distribution, and field service. Testing of a hardware module in the second phase of hardware development without software can be conducted to a limited degree. A hardware design verification test (DVT) plan is prepared and executed by the hardware group before integration with the software system. The main hardware tests are discussed below. 
Diagnostic Test Diagnostic tests are the most fundamental hardware tests. Such tests are often embedded in the basic input–output system (BIOS) component and are executed automatically whenever the system powers up. The BIOS component generally resides on the system's read-only memory (ROM). This test is performed as a kind of sanity test of the hardware module. A good diagnostic test covers all the modules in the system. A diagnostic test is the first test performed to isolate a faulty hardware module. Electrostatic Discharge Test Electrostatic discharge (ESD) testing is a long-established practice; it ensures that the system operation is not susceptible to ESD after commonly accepted precautions have been taken. There are three common industry standards on ESD testing based on three different models: the human body model (HBM), the machine model (MM), and the charged device model (CDM). The HBM is the oldest and the most widely recognized ESD model. It was developed at a time when most ESD damage occurred as people touched hardware components without proper grounding. The capacitance and impedance of the human body vary widely, so the component values in the model were arbitrarily set to facilitate comparative testing. Devices damaged by the HBM generally have thermally damaged junctions, melted metal lines, or other types of damage caused by a high peak current and a high charge dissipated over several hundred nanoseconds. This model still applies whenever people handle devices, so one should perform HBM testing on all new devices. The MM is used primarily in Japan and Europe. This model was developed originally as a "worst-case" HBM to duplicate the type of failures caused by automated pick-and-place machines used to assemble printed circuit boards (PCBs). The model simulates a low-impedance machine that discharges a moderately high capacitance (e.g., 200 pF) through a device.
A discharge produced using the MM can cause damage at relatively low voltages. Finally, the CDM reproduces realistic ESD damage that occurs in small, plastic-packaged devices. As a packaged device slides on a surface, it accumulates charge due to triboelectric (friction) action between the plastic body and the surface. Thus, the device picks up a charge that produces a potential. In the HBM and the MM, something external to the device accumulates the charge. For small devices, the potential can be surprisingly high. Potentials of at least 650 V are needed to duplicate the observed damage.

Electromagnetic Emission Test Tests are conducted to ensure that the system does not emit excessive radiation that could impact the operation of adjacent equipment. Similarly, tests are conducted to ensure that the system does not receive excessive radiation that could impact its own operation. The emissions of concern are as follows:
• Electric field radiated emissions
• Magnetic field radiated emissions
• Alternating-current (AC) power lead conducted emission (voltage)
• AC and direct-current (DC) power and signal lead conducted emission (current)
• Analog voice band lead conducted emission

Electrical Test A variety of electrical tests are performed on products with a hardware component. One such test is called "signal quality" testing, in which different parts of the system are checked for any inappropriate voltages or potential current flows at the externally accessible ports and peripherals. Another type of electrical test is observing how the system behaves in response to various types of power conditions such as AC, DC, and batteries. In addition, tests are conducted to check the safety limits of the equipment when it is exposed to abnormal conditions. An abnormal condition can result from lightning surges or AC power faults.
Thermal Test Thermal tests are conducted to observe whether or not the system can withstand the temperature and humidity conditions it will experience in both operating and nonoperating modes. The system is placed in a thermal chamber and run through a series of temperature and humidity cycles. The heat-producing components, such as CPU and Ethernet cards, are instrumented with thermal sensors to verify whether the components exceed their maximum operating temperatures. A special kind of thermal test is thermal shock, where the temperature changes rapidly. Environmental Test Environmental tests are designed to verify the response of the system to various types of strenuous conditions encountered in the real world. One such test involves shock and vibration from adjacent constructions and highways. Nearby heavy machinery, heavy construction, heavy industrial equipment, truck/train traffic, or standby generators can result in low-frequency vibration, which can induce intermittent problems. It is important to know if such low-frequency vibration will cause long-term problems, such as connectors becoming loose, or short-term problems, such as a disk drive crashing. Environmental tests also cover other surprises the system is likely to encounter. For example, in a battlefield environment, the computers and the base transceiver stations are often subjected to smoke, sand, bullets, fire, and other extreme conditions. Equipment Handling and Packaging Test These tests are intended to determine the robustness of the system to normal shipping and installation activities. Good packing and packaging design ensures that the shipping container will provide damage-free shipment of the system. Early involvement of these design skills will provide input on the positioning of handles and other system protrusions that can be the sources of failure. Selection of reasonable metal thickness and fasteners will provide adequate performance of systems to be installed into their final location.
Acoustic Test Acoustic noise limits are specified to ensure that personnel can work near the system without exceeding the safety limits prescribed by the local Occupational Safety and Health Administration (OSHA) agency or other negotiated levels. For example, noise levels of spinning hard disks, floppies, and other drives must be tested against their limits. Safety Test Safety tests are conducted to ensure that people using or working on or near the system are not exposed to hazards. For example, many portable computers contain rechargeable batteries which frequently include dangerous toxic substances such as cadmium and nickel. Adequate care must be taken to ensure that these devices do not leak the dangerous chemicals under any circumstances. Reliability Test Hardware modules tend to fail over time. It is assumed that (i) modules have constant failure rates during their useful operating life periods and (ii) module failure rates follow an exponential law of distribution. Failure rate is often measured in terms of the mean time between failures (MTBF), expressed as MTBF = total time/number of failures. The probability that a module will work for some time T without failure is given by R(T) = exp(−T/MTBF). The MTBF metric is a reliability measurement metric for hardware modules. It is usually given in units of hours, days, or months. The MTBF for a module or a system is derived from various sources: laboratory tests, actual field failure data, and prediction models. Another way of calculating the reliability and lifetime of a system is to conduct highly accelerated life tests (HALTs). The HALTs rely on the principle of logarithmic time compression to simulate a system's entire life in just a few days. This is done by applying a much higher level of stress than what exists in actual system use. A high level of stress forces failures to occur in significantly less time than under normal operating conditions.
The HALTs generally include rapid temperature cycling, vibrating on all axes, operating voltage variations, and changing clock frequency until the system fails. The HALTs require only a few units of the product and a short testing period to identify the fundamental limits of the technologies in use. Generally, every weak point must be identified and fixed (i.e., redesigned) if it does not meet the system's specified limits. Understanding the concept of product reliability is important for any organization if the organization intends to offer warranty on the system for an extended period of time. One can predict with high accuracy the exact cost associated with the returns over a limited and an extended period of warranty. For example, for a system with an MTBF of 250,000 hours and an operating time of interest of five years (43,800 hours), we have R(43,800) = exp(−43,800/250,000) ≈ 0.839, which says that there is a probability of 0.839 that the product will operate for five years without a failure. Another interpretation of the quantity is that 83.9% of the units in the field will still be working at the end of five years. In other words, 16.1% of the units need to be replaced within the first five years. 7.5.2 Hardware and Software Compatibility Matrix The hardware and software compatibility information is maintained in the form of a compatibility matrix. Such a matrix documents the compatibility between different revisions of the hardware and different versions of the software and is used for official release of the product. An engineering change order (ECO) is a formal document that describes a change to the hardware or software.
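The reliability computation in the example above can be checked numerically with a short sketch; the MTBF of 250,000 hours and the five-year period are the example's values, while the function name is our own.

```python
import math

def reliability(t_hours, mtbf_hours):
    """R(T) = exp(-T/MTBF): probability of surviving T hours without a
    failure, assuming a constant failure rate (exponential model)."""
    return math.exp(-t_hours / mtbf_hours)

five_years = 5 * 365 * 24                 # 43,800 hours
r = reliability(five_years, 250_000)
still_working_pct = round(r * 100, 1)     # about 83.9% survive five years
replaced_pct = round((1 - r) * 100, 1)    # about 16.1% replaced
```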
An ECO document includes the hardware/software compatibility matrix and is distributed to the operation, customer support, and sales teams of the organization. An example compatibility matrix for a 1xEV-DO wireless data network, discussed in Chapter 8, is given in Table 7.2.

TABLE 7.2 Example Software/Hardware Compatibility Matrix

Software Release | RNC Hardware Version | RN Hardware Version | EMS Hardware Version | PDSN Hardware Version | Tested by SIT | Tested by ST
2.0 | hv1.0 | hv2.0 | hv3.2 | hv2.0.3 | Yes | Yes
2.5 | hv2.0 and hv1.0 | hv2.0 | hv4.0 and hv3.2 | hv3.0 and hv2.0.3 | Yes | Yes
3.0 | hv3.0 | hv3.0 and hv2.0 | hv5.0 | hv4.0 and hv3.0 | Yes | Not yet
3.0 | hv4.0 and hv3.0 | hv3.0 | hv5.0 | hv4.5 | Not yet | Not yet
Not yet decided | hv4.0 and hv3.0 | hv3.0 | hv6.0 | hv5.0 | Not yet | Not yet

In the following, we provide the hardware and software ECO approval process in an organization. The first scenario describes the ECO process to incorporate new hardware in the product. The second scenario describes the ECO process for a software revision.

Scenario 1 A hardware ECO process is shown in Figure 7.11. Assume that the hardware group needs to release a new revision of a hardware module or has to recommend a new revision of an original equipment manufacturer (OEM) hardware module to the operation/manufacturing group. The steps of the ECO process to incorporate the new hardware in the product are as follows:
1. The hardware group issues a design change notification for the new hardware revision. This notification includes identification of specific hardware changes likely to impact the software and incompatibilities between the revised hardware and other hardware. The software group reviews the notification with the hardware group to assess the impact of the hardware changes and to identify software changes that affect the hardware structure.
2.
The hardware group creates an ECO and reviews it with the change control board (CCB) to ensure that all impacts of the ECO are understood and agreed upon and that the version numbering rules for software components are followed. The CCB constitutes a group of individuals from multiple departments responsible for reviewing and approving each ECO.
3. The ECO is released, and the hardware group updates the hardware/software compatibility matrix based on the information received from the review process.
4. The system testing group updates the compatibility matrix after it has tested a given combination of released hardware and software versions.

Figure 7.11 Hardware ECO process.

Scenario 2 A software release ECO process is shown in Figure 7.12. Assume that the software group needs to release a new version of software to the operation/manufacturing group. The steps of the ECO process to incorporate the new version of the software product are as follows:
1. The system integration group releases a build with a release note identifying the hardware compatibility information to the system test group.
2. The system test group tests the build and notifies the software group and other relevant groups in the organization of the results of the tests.
3. The system test group deems a particular build to be viable for customer release. The system test group calls a cross-functional readiness review meeting to ensure, by verifying the test results, that all divisions of the organization are prepared for an official release.
4. The software group writes an ECO to officially release the software build and reviews it with the CCB after the readiness review is completed.
The build is considered to be released after the ECO is approved and documented.
5. The software group updates the hardware/software compatibility matrix with information about the new release.
6. The system test group updates the compatibility matrix after it has tested a given combination of released hardware and software versions.

Figure 7.12 Software ECO process.

7.6 TEST PLAN FOR SYSTEM INTEGRATION

System integration requires a controlled execution environment, much communication between the developers and the test engineers, judicious decision making along the way, and much time, on the order of months, in addition to the fundamental tasks of test design and test execution. Integrating a large system is a challenging task, which is handled with much planning in the form of developing an SIT plan. A useful framework for preparing an SIT plan is outlined in Table 7.3.

TABLE 7.3 Framework for SIT Plan
1. Scope of testing
2. Structure of integration levels
a. Integration test phases
b. Modules or subsystems to be integrated in each phase
c. Building process and schedule in each phase
d. Environment to be set up and resources required in each phase
3. Criteria for each integration test phase
a. Entry criteria
b. Exit criteria
c. Integration techniques to be used
d. Test configuration set-up
4. Test specification for each integration test phase
a. Test case ID number
b. Input data
c. Initial condition
d. Expected results
e. Test procedure: how to execute this test; how to capture and interpret the results
5. Actual test results for each integration test phase
6. References
7. Appendix

In the scope of testing section, one summarizes the system architecture.
Specifically, the focus is on the functional, internal, and performance characteristics to be tested. System integration methods and assumptions are included in this section. The next section, structure of integration levels, contains four subsections. The first subsection explains the division of integration testing into different phases, such as functional, end-to-end, and endurance phases. The second subsection describes the modules to be integrated in each of the integration phases. The third subsection describes the build process to be followed: daily build, weekly build, biweekly build, or a combination thereof. A schedule for system integration is given in the third subsection. Specifically, one identifies the start and end dates for each phase of testing. Moreover, the availability windows for unit-tested modules are defined. In the fourth subsection, the test environment and the resources required are described for each integration phase. The hardware configuration, emulators, software simulators, special test tools, debuggers, overhead software (i.e., stubs and drivers), and testing techniques are discussed in the fourth subsection. An important decision to be made for integration testing is establishing the start and stop dates of each phase of integration testing. The start date and stop date for a phase are specified in terms of entry criteria and exit criteria, respectively. These criteria are described in the third section of the plan. A framework for defining entry criteria to start system integration is given in Table 7.4. Similarly, the exit criteria for system integration are given in Table 7.5. Test configuration and integration techniques (e.g., top down or bottom up) to be used in each of these phases are described in this section. The test specification section describes the test procedure to be followed in each integration phase.
The detailed test cases, including the input and expected outcome for each case, are documented in the test specification section. The history of actual test results, problems, or peculiarities is recorded in the fifth section of the SIT plan. Finally, references and an appendix, if any, are included in the test plan.

TABLE 7.4 Framework for Entry Criteria to Start System Integration
Software functional and design specifications must be written, reviewed, and approved.
Code is reviewed and approved.
Unit test plan for each module is written, reviewed, and executed. All of the unit tests passed.
The entire check-in request form must be completed, submitted, and approved.
Hardware design specification is written, reviewed, and approved.
Hardware design verification test is written, reviewed, and executed. All of the design verification tests passed.
Hardware/software integration test plan is written, reviewed, and executed. All of the hardware/software integration tests passed.

TABLE 7.5 Framework for System Integration Exit Criteria
All code is completed and frozen, and no more modules are to be integrated.
All of the system integration tests passed.
No major defect is outstanding.
All the moderate defects found in the SIT phase are fixed and retested.
Not more than 25 minor defects are outstanding.
Two weeks of system uptime in the system integration test environment without any anomalies, i.e., crashes.
System integration test results are documented.

System integration testing is performed in phases of increasing complexity for better efficiency and effectiveness. In the first phase, interface integrity and functional validity within a system are tested. In the second phase, end-to-end and pairwise tests are conducted. Finally, in the third phase, stress and endurance tests are performed. Each of the system integration phases identified in the SIT plan delineates a broad functionality category within the software structure, and it can be related to a specific domain of the system software structure. The categories of system integration tests and the corresponding test cases discussed below are applicable for the different test phases.

Interface Integrity Internal and external interfaces are tested as each module is integrated into the structure. When two modules are integrated, the communication between them is called internal, whereas when they communicate with the outside environment, it is called external. Tests designed to reveal interface errors are discussed in Section 7.2. An important part of interface testing is to map an incoming message format to the expected message format of the receiving module and ensure that the two match. Tests are designed for each message passing through the interface between two modules. Essentially, tests must ensure that:
• The number of parameters sent in a message agrees with the number of parameters expected to be received.
• The parameter order in the messages matches the order expected.
• The field sizes and the data types match.
• The boundaries of each data field in a message match the expected boundaries.
• When a message is generated from stored data prior to being sent, the message truly reflects the stored data.
• When a received message is stored, data copying is consistent with the received message.

Functional Validity Tests are designed to uncover functional errors in each module after it is integrated with the system. Errors associated with local or global data structures are uncovered with such tests. Selected unit tests that were designed for each module are reexecuted during system integration by replacing the stubs with their actual modules.

End-to-End Validity Tests are performed to ensure that a completely integrated system works together from end to end. Interim checkpoints on an end-to-end flow provide a better understanding of internal mechanisms. This helps in locating the sources of failures.
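The interface integrity checks listed above (parameter count, order, types, and field boundaries) can be sketched as a small message validator; the schema and messages below are hypothetical.

```python
# Minimal sketch of interface-integrity checks between two modules: the
# number, order, type, and boundaries of message fields must match the
# receiver's expectations. The schema and messages are hypothetical.

EXPECTED_SCHEMA = [                 # (field name, type, (min, max) size)
    ("msg_id",  int, (0, 65535)),
    ("length",  int, (0, 1024)),
    ("payload", str, (0, 1024)),
]

def validate_message(fields):
    if len(fields) != len(EXPECTED_SCHEMA):       # parameter count
        return False
    for value, (_name, ftype, (lo, hi)) in zip(fields, EXPECTED_SCHEMA):
        if not isinstance(value, ftype):          # type (and order) match
            return False
        size = value if isinstance(value, int) else len(value)
        if not lo <= size <= hi:                  # boundary match
            return False
    return True
```

A real interface test suite would generate one such check per message crossing the interface, in both directions.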
Pairwise Validity Tests are performed to ensure that any two systems work properly when connected together by a network. The difference between pairwise tests and end-to-end tests lies in the emphasis and type of test cases. For example, a toll-free call to an 800 number is an end-to-end test of a telephone system, whereas a connectivity test between a handset and a local private branch exchange (PBX) is a pairwise test.

Interface Stress Stress is applied at the module level during the integration of the modules to ensure that the interfaces can sustain the load. On the other hand, full-scale system stress testing is performed at the time of system-level testing. The following areas are of special interest during interface stress testing:
• Error Handling: Trigger errors that should be handled by the modules.
• Event Handling: Trigger events (e.g., messages, timeouts, callbacks) that should be handled by the modules.
• Configuration: Repeatedly add, modify, and delete managed objects.
• Logging: Turn on the logging mechanism during stress tests to ensure proper operation for boundary conditions.
• Module Interactions: Run tests repeatedly that stress the interactions of a module with other modules.
• CPU Cycles: Induce high CPU utilization by using a CPU overload tool; pay attention to any resulting queue overflows and other producer/consumer overrun conditions.
• Memory/Disk Usage: Artificially reduce the levels of heaps and/or memory buffers and disk space.
• Starvation: Ensure that the processes or tasks are not starved; otherwise, the input queues eventually overflow for the starved processes.
• Resource Sharing: Ensure that resources, such as heap and CPU, are shared among processes without any contention and bottlenecks.
• Congestion: Run tests with a mechanism that randomly discards packets; test modules with congested links.
• Capacity: Run tests to ensure that modules can handle the maximum supported numbers of resources, such as connections and routes.

System Endurance A completely integrated system is expected to stay up continuously for weeks without any crashes. In the case of communication protocol systems, formal rules govern how two systems communicate with each other via an interface, that is, a communication channel. The idea here is to verify that the message formats and the message communication across the interface of the modules work for an extended period.

7.7 OFF-THE-SHELF COMPONENT INTEGRATION

Instead of developing a software component from scratch, organizations occasionally purchase off-the-shelf (OTS) components from third-party vendors and integrate them with their own components [6]. In this process, organizations create less expensive software systems. A major issue that can arise while integrating different components is mismatches among code pieces developed by different parties usually unaware of each other [7, 8]. Vigder and Dean [9] have presented elements of an architecture for integration and have defined rules that facilitate integration of components. They have identified a useful set of supporting components for integrating the actual, serving components. The supporting components are wrappers, glue, and tailoring. A wrapper is a piece of code that one builds to isolate the underlying components from other components of the system. Here isolate means putting restrictions around the underlying component to constrain its capabilities. A glue component provides the functionality to combine different components. Component tailoring refers to the ability to enhance the functionality of a component. Tailoring is done by adding some elements to a component to enrich it with a functionality not provided by the vendor. Tailoring does not involve modifying the source code of the component.
An example of tailoring is "scripting," where an application can be enhanced by executing a script upon the occurrence of some event. Rine et al. [10] have proposed the concept of adapters to integrate components. An adapter is associated with each component; the adapter runs an interaction protocol to manage communications among the components. Components request services from others through their associated adapters. An adapter is responsible for resolving any syntactic interface mismatch.

7.7.1 Off-the-Shelf Component Testing

Buyer organizations perform two types of testing on an OTS component before purchasing: (i) acceptance testing of the OTS component based on the criteria discussed in Chapter 14 and (ii) integration of the component with other components developed in-house or purchased from a third party. The most common cause of problems in the integration phase is inadequate acceptance testing of the OTS component. A lack of clear documentation of the system interface and less cooperation from the vendor may create an ordeal in integration testing, debugging, and fixing defects. Acceptance testing of an OTS component requires the development and execution of an acceptance test plan based on the acceptance criteria for the candidate component. All the issues are resolved before the system integration process begins. During integration testing, additional software components, such as a glue or a wrapper, can be developed to bind an OTS component with other components for proper functioning. These new software components are also tested during the integration phase. Integration of OTS components is a challenging task because of the following characteristics identified by Basili and Boehm [11]:
• The buyer has no access to the source code.
• The vendor controls its development.
• The vendor has a nontrivial installed base.
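The wrapper and glue concepts described above might be sketched as follows; the OTS component and its interface are invented purely for illustration.

```python
# Hypothetical sketch of a wrapper isolating an OTS component (restricting
# its capabilities) and glue code binding it to an in-house module. The
# OTS interface shown here is invented for illustration.

class OtsReportEngine:
    """Stands in for a third-party component whose source we cannot touch."""
    def render(self, data, fmt):
        return f"<{fmt}>{data}</{fmt}>"

class ReportWrapper:
    """Wrapper: constrains the OTS component to the formats we permit."""
    ALLOWED = {"html", "pdf"}

    def __init__(self, engine):
        self._engine = engine

    def render(self, data, fmt):
        if fmt not in self.ALLOWED:
            raise ValueError(f"format {fmt!r} not permitted by wrapper")
        return self._engine.render(data, fmt)

def make_report(data, engine=None):
    """Glue: adapts the wrapped component to what in-house code expects."""
    wrapper = ReportWrapper(engine or OtsReportEngine())
    return wrapper.render(data, "html")
```

Note that neither the wrapper nor the glue modifies the OTS component's source; they only restrict and adapt its interface, which is the point of the technique.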
Voas [12] proposed three types of testing techniques to determine the suitability of an OTS component:
• Black-Box Component Testing: This is used to determine the quality of the component.
• System-Level Fault Injection Testing: This is used to determine how well a system will tolerate a failing component. System-level fault injection does not demonstrate the reliability of the system; instead, it can predict the behavior of the system if the OTS component fails.
• Operational System Testing: This kind of test is used to determine the tolerance of a software system when the OTS component is functioning correctly. Operational system testing is conducted to ensure that an OTS component is a good match for the system.

The OTS components produced by the vendor organizations are known as commercial off-the-shelf (COTS) components. A COTS component is defined by Szyperski et al. [13] (p. 34) as "a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties." Interfaces are the access points of COTS components through which a client component can request a service declared in an interface of the service providing component. Weyuker [14] recommends that vendors building COTS components must try to envision many possible uses of the component and develop a comprehensive range of test scenarios. A component should be defect free because prudent buyers will perform a significant amount of testing while integrating the component with their system. If potential buyers encounter a large number of defects in a COTS component, they may not accept the component. Therefore, it is in the best interest of the component builder to ensure that components are thoroughly tested.
Several related artifacts should be archived and/or modified for each COTS component, because a potential buyer may demand these artifacts to be of high quality. The artifacts to be archived include the following: • Individual requirements, including the pointers between the software functional specification and the corresponding implementations: This information makes it easy to track when either the code or the specification is modified, so that the specification remains up to date. • The test suite, including the requirement traceability matrix: This will show which part of the functionality is tested inadequately or not at all. In addition, it identifies the test cases that (i) must be executed as regression tests or (ii) need to be updated when the component undergoes a change. • The individual pass–fail result of each test case in the test suite: This indicates the quality of the COTS component. • The details of individual test cases, including input and expected output: This facilitates regression testing of changes to a component. • The system test report corresponding to the final release of the COTS component, which includes performance characteristics, scalability limitations, stability observations, and interoperability of the COTS component: This document is a useful resource for potential buyers before they begin acceptance testing of the COTS component. 7.7.2 Built-in Testing A component reused in a new application environment requires real-time detection, diagnosis, and handling of software faults. Built-in test (BIT) methods for producing self-testable software components hold potential for detecting faults during run time. A software component can contain test cases or can possess facilities that are capable of generating test cases which can be accessed by a component user on demand [15]. The corresponding capabilities are called the built-in testing capabilities of software components.
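A minimal sketch of a self-testable component follows. The component, its methods, and its built-in test cases are invented for illustration; they merely show the idea of test cases packaged inside the component and invocable on demand, with the component evaluating its own results:

```python
class StackComponent:
    """Hypothetical BIT-enabled component: in normal mode it behaves like
    any other stack; in maintenance mode its built-in tests can be run."""

    def __init__(self, mode="normal"):
        self.mode = mode
        self._items = []

    # --- Normal functionality -------------------------------------------
    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.pop()

    # --- Built-in testing capability ------------------------------------
    def run_built_in_tests(self):
        if self.mode != "maintenance":
            raise RuntimeError("BIT is available only in maintenance mode")
        results = {}
        for name, test in [("push_pop", self._test_push_pop),
                           ("pop_empty", self._test_pop_empty)]:
            try:
                test()
                results[name] = "pass"
            except AssertionError:
                results[name] = "fail"
        return results   # the component evaluates its own test results

    def _test_push_pop(self):
        s = StackComponent("maintenance")
        s.push(7)
        assert s.pop() == 7

    def _test_pop_empty(self):
        s = StackComponent("maintenance")
        try:
            s.pop()
            assert False, "expected IndexError on empty stack"
        except IndexError:
            pass

component = StackComponent(mode="maintenance")
print(component.run_built_in_tests())  # {'push_pop': 'pass', 'pop_empty': 'pass'}
```

Because the component is a class, a subclass inherits `run_built_in_tests` along with the normal methods, so the BIT capability is reused for free.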
In the BIT methodology, testability is incorporated into software components, so that testing and maintenance can be self-contained. Wang et al. [16] have proposed a BIT model that can operate in two modes, namely, normal mode and maintenance mode. In the normal mode, the BIT capabilities are transparent to the component user, and the component does not differ from other non-BIT-enabled components. In the maintenance mode, however, the component user can test the component with the help of its BIT features. The component user can invoke the respective methods of the component, which execute the tests, autonomously evaluate the results, and output the test summary. The authors describe a generic technical framework for enhancing BIT. One of their assumptions is that the component is implemented as a class. A benefit of such an implementation is that the methods for BIT can be passed to a subclass by inheritance. Hörnstein and Edler [17] have proposed a component + BIT architecture comprising three types of components, namely, BIT components, testers, and handlers. The BIT components are the BIT-enabled components. These components implement certain mandatory interfaces. Testers are components which access the BIT capabilities of BIT components through the corresponding interfaces and contain the test cases in a certain form. Finally, handlers are components that do not directly contribute to testing but provide recovery mechanisms in case of failures. 7.8 SUMMARY This chapter began with the objective and a description of system integration testing. One creates a “working version of the system” by putting the modules together in an incremental manner while performing tests to uncover different types of errors associated with interfacing. Next, we explored various levels of granularity in system integration testing: intrasystem testing, intersystem testing, and pairwise testing.
We then examined five types of commonly used system integration techniques: top-down, bottom-up, sandwich, big-bang, and incremental. We compared those techniques in detail. The incremental technique is widely used in the industry. We described the integration of hardware and software components to form a complete product. This led to the discussion of the hardware engineering process and, specifically, of different types of hardware design verification tests: diagnostic, electrostatic discharge, electromagnetic emission, electrical, thermal, environmental, packaging and handling, acoustic, safety, and reliability. Finally, we described two scenarios of an engineering change order process. The two scenarios are used to keep track of the hardware/software compatibility matrix of a released product. We provided a framework of an integration test plan. The following categories of tests, which are included in an integration test plan, were discussed in detail: interface integrity tests, functional validity tests, end-to-end validity tests, pairwise validity tests, interface stress tests, and endurance tests. Finally, we described the integration of OTS components with other components. An organization, instead of developing a software component from scratch, may decide to purchase COTS software from a third-party source and integrate it with its own software system. A COTS component seller must provide BITs along with the components, whereas a buyer organization must perform three types of testing to assess the COTS software: (i) acceptance testing of the OTS software component to determine the quality of the component, (ii) system-level fault injection testing, which is used to determine the tolerance of the software system to a failing OTS component, and (iii) operational system testing, which is used to determine the tolerance of the software system to a properly functioning OTS component.
LITERATURE REVIEW For those actively involved in software testing or interested in knowing more about common software errors, appendix A of the book by C. Kaner, J. Falk, and H. Q. Nguyen (Testing Computer Software, Wiley, New York, 1999) is an excellent repository of real-life software errors. The appendix contains 12 categories of approximately 400 different types of errors with illustrations. A good discussion of hardware test engineering, such as mechanical, electronics, and accelerated tests, can be found in Patrick O’Connor’s book (Test Engineering: A Concise Guide to Cost-effective Design, Development and Manufacture, Wiley, New York, 2001). The author (i) describes a broad spectrum of modern methods and technologies in hardware test engineering, (ii) offers principles of cost-effective design, development, and manufacture of products and systems, and (iii) gives a breakdown of why product and systems fail and which methods would best prevent these failures. Researchers are continuously working on topics related to certification of COTS components. The interested reader is recommended to read the following articles. Each of these articles helps in understanding the issues of software certification and why it is important: J. Voas, “Certification Reducing the Hidden Costs of Poor Quality,” IEEE Software, Vol. 16, No. 4, July/August 1999, pp. 22–25. J. Voas, “Certifying Software for High-Assurance Environments,” IEEE Software, Vol. 16, No. 4, July/August 1999, pp. 48–54. J. Voas, “Developing Usage-Based Software Certification Process,” IEEE Computer, Vol. 16, No. 8, August 2000, pp. 32–37. S. Wakid, D. Kuhn, and D. Wallace, “Toward Credible IT Testing and Certification,” IEEE Software, Vol. 16, No. 4, July/August 1999, pp. 39–47. Readers actively involved in COTS component testing or interested in a more sophisticated treatment of the topic are recommended to read the book edited by S. Beydeda and V.
Gruhn (Testing Commercial-off-the-Shelf Components and Systems, Springer, Bonn, 2005). The book contains 15 articles that discuss in great detail: (i) testing components context independently, (ii) testing components in the context of a system, and (iii) testing component-based systems. The book lists several excellent references on the subject in a bibliography. REFERENCES 1. V. R. Basili and B. T. Perricone. Software Errors and Complexity: An Empirical Investigation. Communications of the ACM , January 1984, pp. 42–52. 2. D. E. Perry and W. M. Evangelist. An Empirical Study of Software Interface Faults—An Update. In Proceedings of the Twentieth Annual Hawaii International Conference on Systems Sciences, Hawaii, Vol. II, IEEE Computer Society Press, Piscataway, NJ, January 1987, pp. 113–126. 3. M. S. Deutsch. Software Verification and Validation: Realistic Project Approaches. Prentice-Hall, Englewood Cliffs, NJ, 1982, pp. 95–101. 4. M. Cusumano and R. W. Selby. How Microsoft Builds Software. Communications of the ACM , June 1997, pp. 53–61. 5. J. A. Solheim and J. H. Rowland. An Empirical Study of Testing and Integration Strategies Using Artificial Software Systems. IEEE Transactions on Software Engineering, October 1993, pp. 941–949. 6. S. Mahmood, R. Lai, and Y. S. Kim. Survey of Component-Based Software Development. IET Software, April 2007, pp. 57–66. 7. G. T. Heineman and W. T. Councill. Component-Based Software Engineering: Putting the Pieces Together. Addison-Wesley, Reading, MA, 2001. 8. A. Cechich, M. Piattini, and A. Vallecillo. Assessing Component-Based Systems. Component-Based Software Quality. Lecture Notes in Computer Science, Vol. 2693, 2003, pp. 1–20. 9. M. R. Vigder and J. Dean. An Architectural Approach to Building Systems from COTS Software Components. In Proceedings of the 22nd Annual Software Engineering Workshop, NASA Goddard Space Flight Center, Greenbelt, MD, December 1997, pp. 99–113, NRC41610. 10. D. Rine, N. Nada, and K. Jaber.
Using Adapters to Reduce Interaction Complexity in Reusable Component-Based Software Development. In Proceedings of the 1999 Symposium on Software Reusability, Los Angeles, CA, ACM Press, New York, May 1999, pp. 37–43. 11. V. R. Basili and B. Boehm. COTS-Based Systems Top 10 List. IEEE Computer, May 2001, pp. 91–93. 12. J. Voas. Certifying Off-the-Shelf Software Component. IEEE Computer, June 1998, pp. 53–59. 13. C. Szyperski, D. Gruntz, and S. Murer. Component Software: Beyond Object-Oriented Programming, 2nd ed. Addison-Wesley, Reading, MA, 2002. 14. E. J. Weyuker. Testing Component-Based Software: A Cautionary Tale. IEEE Software, September/October 1998, pp. 54–59. 15. S. Beydeda and V. Gruhn. Merging Components and Testing Tools: The Self-Testing COTS Components (STECC) Strategy. In Proceedings of the 29th EUROMICRO Conference (EUROMICRO’03), Belek-Antalya, Turkey, IEEE Computer Society Press, Piscataway, September 2003, pp. 107–114. 16. Y. Wang, G. King, and H. Wickburg. A Method for Built-in Tests in Component-Based Software Maintenance. In Proceedings of the IEEE International Conference on Software Maintenance and Reengineering (CSMR-99), University of Amsterdam, The Netherlands, IEEE Computer Society Press, Piscataway, March 1999, pp. 186–189. 17. J. Hörnstein and H. Edler. Test Reuse in CBSE Using Built-in Tests. In Proceedings of the 9th IEEE Conference and Workshops on Engineering of Computer Based Systems, Workshop on Component-Based Software Engineering, Lund University, Lund, Sweden, IEEE Computer Society Press, Piscataway, 2002. Exercises 1. Describe the difference between black-box and white-box testing techniques. 2. If a program passes all the black-box tests, it means that the program should work properly. Then, in addition to black-box testing, why do you need to perform white-box testing? 3. Describe the difference between unit testing and integration testing. 4.
Why should integration testing be performed? What types of errors can this phase of testing reveal? 5. Discuss the advantages and disadvantages of top-down and bottom-up approaches to integration testing. 6. Does automation of integration tests help the verification of the daily build process? Justify your answer. 7. Using the module hierarchy given in Figure 7.13, show the orders of module integration for the top-down and bottom-up integration approaches. Estimate the number of stubs and drivers needed for each approach. Specify the integration testing activities that can be done in parallel, assuming you have three SIT engineers. Based on the resource needs and the ability to carry out concurrent SIT activities, which approach would you select for this system and why? [Figure 7.13: Module hierarchy of software system.] 8. Suppose that you plan to purchase COTS components and integrate them with your communication software project. What kind of acceptance criteria will you develop to conduct acceptance testing of the COTS components? 9. During integration testing of COTS components with a software system, it may be required to develop a wrapper software around the OTS component to limit what it can do. Discuss the general characteristics that a wrapping software should have in order to be able to integrate COTS with the software system without any problem. 10. Describe the circumstances under which you would apply white-box testing, black-box testing, or both techniques to evaluate a COTS component. 11. For your current test project, develop an integration test plan. 12. Complete Section 5 (i.e., actual test results for each integration test phase) of the integration test plan after executing the integration test cases you developed in exercise 11. CHAPTER 8 System Test Categories As a rule, software systems do not work well until they have been used, and have failed repeatedly, in real applications.
— Dave Parnas Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy Copyright © 2008 John Wiley & Sons, Inc. 8.1 TAXONOMY OF SYSTEM TESTS The objective of system-level testing, also called system testing, is to establish whether an implementation conforms to the requirements specified by the customers. It takes much effort to guarantee that customer requirements have been met and the system is acceptable. A variety of tests are run to meet a wide range of unspecified expectations as well. Since integrated systems, consisting of both hardware and software components, are what is actually used in practice, there is a need to have a much broader view of the behavior of the systems. For example, a telephone switching system not only is required to provide a connection between two users but also is expected to do so even if there are many ongoing connections below a certain upper limit. When the upper limit on the number of simultaneous connections is reached, the system is not expected to behave in an undesired manner. In this chapter, we identify different categories of tests in addition to the core functionality tests. Identifying test categories brings us the following advantages: • Test engineers can accurately focus on different aspects of a system, one at a time, while evaluating its quality. • Test engineers can prioritize their tasks based on test categories. For example, it is more meaningful and useful to identify the limitations of a system only after ensuring that the system performs all basic functions to the test engineer’s satisfaction. Therefore, stress tests, which strive to identify the limitations of a system, are executed after functionality tests. • Planning the system testing phase based on test categorization lets a test engineer obtain a well-balanced test suite. Practical limitations make it difficult to be exhaustive, and economic considerations may restrict the
testing process from continuing any further. However, it is important to design a balanced test suite, rather than an unbalanced one with many test cases in one category and no tests in another. In the following, first we present the taxonomy of system tests (Figure 8.1). Thereafter, we explain each category in detail. [Figure 8.1: Types of system tests.] • Basic tests provide evidence that the system can be installed, configured, and brought to an operational state. • Functionality tests provide comprehensive testing over the full range of the requirements within the capabilities of the system. • Robustness tests determine how well the system recovers from various input errors and other failure situations. • Interoperability tests determine whether the system can interoperate with other third-party products. • Performance tests measure the performance characteristics of the system, for example, throughput and response time, under various conditions. • Scalability tests determine the scaling limits of the system in terms of user scaling, geographic scaling, and resource scaling. • Stress tests put a system under stress in order to determine the limitations of a system and, when it fails, to determine the manner in which the failure occurs. • Load and stability tests provide evidence that the system remains stable for a long period of time under full load. • Reliability tests measure the ability of the system to keep operating for a long time without developing failures. • Regression tests determine that the system remains stable as it cycles through the integration of other subsystems and through maintenance tasks. • Documentation tests ensure that the system’s user guides are accurate and usable.
• Regulatory tests ensure that the system meets the requirements of government regulatory bodies in the countries where it will be deployed. 8.2 BASIC TESTS The basic tests (Figure 8.2) give prima facie evidence that the system is ready for more rigorous tests. These tests provide limited testing of the system in relation to the main features in a requirement specification. The objective is to establish that there is sufficient evidence that a system can operate without trying to perform thorough testing. Basic tests are performed to ensure that commonly used functions, not all of which may directly relate to user-level functions, work to our satisfaction. We emphasize the fact that test engineers rely on the proper implementation of these functions to carry out tests for user-level functions. The following are the major categories of subsystems whose adequate testing constitutes the basic tests. [Figure 8.2: Types of basic tests.] 8.2.1 Boot Tests Boot tests are designed to verify that the system can boot up its software image (or build) from the supported boot options. The boot options include booting from ROM, FLASH card, and PCMCIA (Personal Computer Memory Card International Association) card. The minimum and maximum configurations of the system must be tried while booting. For example, the minimum configuration of a router consists of one line card in its slots, whereas the maximum configuration of a router means that all slots contain line cards. 8.2.2 Upgrade/Downgrade Tests Upgrade/downgrade tests are designed to verify that the system software can be upgraded or downgraded (rolled back) in a graceful manner from the previous version to the current version or vice versa. Suppose that the system is running the (n − 1)th version of the software build and the new nth version of the software build is available.
The question is how one upgrades the build from the (n − 1)th version to the nth version. An upgrade process taking a system from the (n − 1)th version to the nth version may not be successful, in which case the system is brought back to the (n − 1)th version. Tests are designed in this subgroup to verify that the system successfully reverts, that is, rolls back, to the (n − 1)th version. An upgrade process may fail because of a number of different conditions: user-invoked abort (the user interrupts the upgrade process), in-process network disruption (the network environment goes down), in-process system reboot (there is a power glitch), or self-detection of upgrade failure (due to such things as insufficient disk space and version incompatibilities). 8.2.3 Light Emitting Diode Tests The LED (light emitting diode) tests are designed to verify that the system LED status indicators function as desired. The LEDs are located on the front panels of the systems. These provide a visual indication of the module operational status. For example, consider the status of a system chassis: Green indicates that the chassis is operational, off indicates that there is no power, and blinking green may indicate that one or more of its submodules are faulty. The LED tests are designed to ensure that the visual operational status of the system and the submodules is correct. Examples of LED tests at the system and subsystem levels are as follows: • System LED test: green = OK, blinking green = fault, off = no power. • Ethernet link LED test: green = OK, blinking green = activity, off = fault. • Cable link LED test: green = OK, blinking green = activity, off = fault. • User-defined T1 line card LED test: green = OK, blinking green = activity, red = fault, off = no power. 8.2.4 Diagnostic Tests Diagnostic tests are designed to verify that the hardware components (or modules) of the system are functioning as desired. This kind of testing is also known as built-in self-test (BIST).
Diagnostic tests monitor, isolate, and identify system problems without manual troubleshooting. Some examples of diagnostic tests are as follows: • Power-On Self-Test (POST): This is a set of automatic diagnostic routines that are executed during the boot phase of each submodule in the system. The POSTs are intended to determine whether or not the hardware is in a proper state to execute the software image. A POST is not intended to be comprehensive in its analysis of the hardware; instead, it provides a high level of confidence that the hardware is operational. The POSTs execute on the following kinds of elements: memory, address and data buses, and peripheral devices. • Ethernet Loop-Back Test: This test generates and sends out the desired number, which is a tunable parameter, of packets and expects to receive the same number of Ethernet packets through the loop-back interface—external or internal. If an error occurs (e.g., packet mismatch or timeout), an error message indicating the type of error, its probable cause(s), and recommended action(s) is displayed on the console. The data sent out are generated by a random-number generator and put into a data buffer. Each time a packet is sent, it is selected from a different starting point of the data buffer, so that any two consecutively transmitted packets are unlikely to be identical. These tests are executed to ensure that the Ethernet card is functioning as desired. • Bit Error Test (BERT): The on-board BERT provides standard bit error patterns, which can be transmitted over a channel for diagnostic purposes. BERT involves transmitting a known bit pattern and then testing the transmitted pattern for errors. The ratio of the number of bits with errors to the total number of bits transmitted is called the bit error rate (BER). Tests are designed to configure all BERTs from the command line interface (CLI). These tests are executed to ensure that the hardware is functioning as desired.
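The loop-back and BERT ideas can be combined in a small sketch: transmit pseudo-random bits over a (simulated) loop-back channel and compute the bit error rate as errors divided by total bits transmitted. The channel and its fault model (flipping every kth bit) are assumptions for illustration:

```python
import random

# Hypothetical sketch of a loop-back bit error test (BERT): transmit a
# known pseudo-random bit pattern and count bit errors in what comes back.

def loopback_channel(bits, flip_every=0):
    """Simulated loop-back; optionally flips every kth bit to model a fault."""
    out = list(bits)
    if flip_every:
        for i in range(0, len(out), flip_every):
            out[i] ^= 1
    return out

def bert(num_bits, flip_every=0, seed=42):
    rng = random.Random(seed)                       # known, repeatable pattern
    sent = [rng.randint(0, 1) for _ in range(num_bits)]
    received = loopback_channel(sent, flip_every)
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / num_bits                        # bit error rate (BER)

print(bert(10_000))                   # 0.0  -- healthy loop-back
print(bert(10_000, flip_every=100))   # 0.01 -- 1 bit in 100 corrupted
```

A diagnostic test would compare the measured BER against a pass/fail threshold and report the probable cause when the threshold is exceeded.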
8.2.5 Command Line Interface Tests The CLI tests are designed to verify that the system can be configured, or provisioned, in specific ways. This is to ensure that the CLI software module processes the user commands correctly as documented. This includes accessing the relevant information from the system using the CLI. In addition to the above tests, test scenarios may be developed to verify the error messages displayed. 8.3 FUNCTIONALITY TESTS Functionality tests (Figure 8.3) verify the system as thoroughly as possible over the full range of requirements specified in the requirements specification document. This category of tests is partitioned into different functionality subgroups as follows. [Figure 8.3: Types of functionality tests.] 8.3.1 Communication Systems Tests Communication systems tests are designed to verify the implementation of the communication systems as specified in the customer requirements specification. For example, one of the customer requirements can be to support Request for Comment (RFC) 791, which is the Internet Protocol (IP) specification. Tests are designed to ensure that an IP implementation conforms to the RFC 791 standard. Four types of communication systems tests are recommended according to the extent to which they provide an indication of conformance [1]: • Basic interconnection tests provide evidence that an implementation can establish a basic connection before thorough testing is performed. • Capability tests check that an implementation provides the observable capabilities based on the static communication systems requirements. The static requirements describe the options, ranges of values for parameters, and timers. • Behavior tests endeavor to verify the dynamic communication systems requirements of an implementation.
These are the requirements and options that define the observable behavior of a protocol. A large part of behavior tests, which constitute the major portion of communication systems tests, can be generated from the protocol standards. • Systems resolution tests probe to provide definite “yes” or “no” answers to specific requirements. 8.3.2 Module Tests Module tests are designed to verify that all the modules function individually as desired within the system. Mutual interactions among the modules glue these components together into a whole system. The idea here is to ensure that individual modules function correctly within the whole system. One needs to verify that the system, along with the software that controls these modules, operates as specified in the requirement specification. For example, an Internet router contains modules such as line cards, system controller, power supply, and fan tray. Tests are designed to verify each of the functionalities. For Ethernet line cards, tests are designed to verify (i) autosense, (ii) latency, (iii) collisions, (iv) frame types, and (v) frame lengths. Tests are designed to ensure that the fan status is accurately read, reported by the software, and displayed in the supply module LEDs (one green “in service” and one red “out of service”). For T1/E1 line cards, tests are designed to verify: • Clocking: Internal (source timing) and receive clock recovery (loop timing). • Alarms: Detection of loss of signal (LOS), loss of frame (LOF), alarm indication signal (AIS), and insertion of AIS. • Line Coding: Alternate mark inversion (AMI) for both T1 and E1, bipolar 8 zero substitution (B8ZS) for T1 only, and high-density bipolar 3 (HDB3) for E1 only. • Framing: Digital signal 1 (DS1) and E1 framing. • Channelization: Ability to transfer user traffic across channels multiplexed from one or more contiguous or noncontiguous time slots on a T1 or E1 link.
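As an illustration of the line-coding checks above, here is a minimal sketch of alternate mark inversion (AMI): a binary 0 is sent as a zero level, and successive 1s (marks) alternate in polarity, so two consecutive marks of the same polarity signal a bipolar violation. The function names are invented for illustration:

```python
def ami_encode(bits):
    """Alternate mark inversion: 0 -> zero level; each 1 alternates +1/-1."""
    out, polarity = [], 1
    for b in bits:
        if b == 0:
            out.append(0)
        else:
            out.append(polarity)
            polarity = -polarity        # next mark has the opposite polarity
    return out

def bipolar_violations(levels):
    """Count successive marks with the same polarity (line errors in AMI)."""
    marks = [v for v in levels if v != 0]
    return sum(1 for a, b in zip(marks, marks[1:]) if a == b)

encoded = ami_encode([1, 0, 1, 1, 0, 1])
print(encoded)                         # [1, 0, -1, 1, 0, -1]
print(bipolar_violations(encoded))     # 0 -- a valid AMI stream
print(bipolar_violations([1, 0, 1]))   # 1 -- violation: two +1 marks in a row
```

A line-coding test would drive known bit patterns through the card and check the received levels for violations (keeping in mind that B8ZS introduces deliberate violations as zero-substitution markers).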
8.3.3 Logging and Tracing Tests Logging and tracing tests are designed to verify the configurations and operations of logging and tracing. This also includes verification of “flight data recorder” (nonvolatile flash memory) logs when the system crashes. Tests may be designed to calculate the impact on system performance when all the logs are enabled. 8.3.4 Element Management Systems Tests The EMS tests verify the main functions of an element management system, which are to manage, monitor, and upgrade the communication system network elements (NEs). Table 8.1 summarizes the functionalities of an EMS.

TABLE 8.1 EMS Functionalities
Fault Management: alarm handling; trouble detection; trouble correction; test and acceptance; network recovery
Configuration Management: system turn-up; network provisioning; autodiscovery; back-up and restore; database handling
Accounting Management: track service usage; bill for services
Performance Management: data collection; report generation; data analysis
Security Management: control NE access; enable NE functions; access logs

An EMS communicates with its NEs by using, for example, the Simple Network Management Protocol (SNMP) [2] and a variety of proprietary protocols and mechanisms. An EMS is a valuable component of a communication systems network. Not all EMSs will perform all of the tasks listed in Table 8.1. An EMS can support a subset of the tasks. A user working through the EMS graphical user interface (GUI) may accomplish some or all of the tasks. Remote access to an EMS allows the operators to access management and control information from any location. This facilitates the deployment of a distributed workforce that can rapidly respond to failure notifications. This means that thin client workstations can operate over the Internet and service provider intranets. In this subgroup, tests are designed to verify the five functionalities of an EMS (Table 8.1). This includes both the EMS client and the EMS server. Examples of EMS tests are given below.
• Auto-Discovery: EMS discovery software can be installed on the server to discover elements attached to the EMS through the IP network. • Polling and Synchronization: An EMS server detects a system-unreachable condition within a certain time duration. The EMS server synchronizes alarm status, configuration data, and global positioning system (GPS) time from the NE. • Audit Operations: An audit mechanism is triggered whenever an out-of-service network element comes back. The mechanism synchronizes alarms between out-of-service NEs coming back online and the EMS. • Fault Manager: Critical events, such as reboot and reset, are converted to alerts and stored in the EMS database. The EMS can send an email/page to a configured address when an alarm is generated. • Performance Manager: Data are pushed to the EMS server when the data buffer is full in the NE. Data in the server buffer are stored in the backup files once the buffer is full. • Security Manager: It supports authentication and authorization of EMS clients and NEs. The EMS server does the authorization based on user privileges. • Policies: An EMS server supports schedulable log file transfer from the system. The EMS database is periodically backed up to a disk. • Logging: An EMS server supports different logging levels for every major module to debug. An EMS server always logs errors and exceptions. • Administration and Maintenance: This test configures the maximum number of simultaneous EMS clients. The EMS server backs up and restores the database periodically.
• File Transfer: A client can check on-demand file transfer with a progress bar to indicate the progress and abort an on-demand file transfer operation. SNMP Example The SNMP is an application layer protocol that facilitates the exchange of management information between network elements. The SNMP is a part of the Internet network management architecture consisting of three components: network elements, agents, and network management stations (NMSs). A NE is a network node that contains an SNMP agent and that resides on a managed network. Network elements collect and store management information and make this information available to the NMS over the SNMP protocol. Network elements can be routers, servers, radio nodes, bridges, hubs, computer hosts, printers, and modems. An agent is a network management software module that (i) resides on a NE, (ii) has the local knowledge of management information, and (iii) translates that information into a form compatible with the SNMP. An NMS, sometimes referred to as a console, executes management applications to monitor and control network elements. One or more NMSs exist on each managed networks. An EMS can act as an NMS. A management information base (MIB) is an important component of a network management system. The MIB identifies the network elements (or managed objects) that are to be managed. Two types of managed objects exist: • Scalar objects define a single object instance. • Tabular objects define multiple related object instances that are grouped in MIB tables. Essentially, a MIB is a virtual store providing a model of the managed information. For example, a MIB can contain information about the number of packets that have been sent and received across an interface. It contains statistics on the number of connections that exist on a Transmission Control Protocal (TCP) port as well as information that describes each user’s ability to access elements of the MIB. 
The SNMP does not operate on the managed objects directly; instead, the protocol operates on a MIB. In turn, a MIB is the reflection of the managed objects, and its management mechanism is largely proprietary, perhaps through the EMS. The important aspect of a MIB is that it defines (i) the elements that are managed, (ii) how a user accesses them, and (iii) how they can be reported. A MIB can be depicted as an abstract tree with an unnamed root; individual items are represented as leaves of the tree. An object identifier uniquely identifies a MIB object in the tree. The organization of object identifiers is similar to a telephone number hierarchy; they are organized hierarchically with specific digits assigned by different organizations.
8.3 FUNCTIONALITY TESTS 201
The Structure of Management Information (SMI) defines the rules for describing the management information. The SMI specifies that all managed objects have a name, a syntax, and an encoding mechanism. The name is used as the object identifier. The syntax defines the data type of the object. The SMI syntax uses a subset of the Abstract Syntax Notation One (ASN.1) definitions. The encoding mechanism describes how the information associated with the managed object is formatted as a series of data items for transmission on the network.
Network elements are monitored and controlled using four basic SNMP commands: read, write, trap, and traversal operations. The read command is used by an NMS to monitor NEs. An NMS examines different variables that are maintained by NEs. The write command is used by an NMS to control NEs. With the write command, an NMS changes the values of variables stored within NEs. The trap command is used by NEs to asynchronously report events to an NMS. When certain types of events occur, an NE sends a trap to an NMS. The traversal operations are used by the NMS to determine the variables an NE supports and to sequentially gather information in a variable table, such as a routing table.
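The four basic operation styles just described can be illustrated with a small in-memory sketch. The `TinyAgent` class, its OID tuples, and its values are all invented for illustration; they model only the read/write/traversal/trap semantics, not real MIB-II objects or the SNMP wire encoding.

```python
# Hypothetical in-memory "agent" illustrating the four basic SNMP
# operation styles: read (get), write (set), traversal (get_next),
# and trap (an asynchronous event report).

class TinyAgent:
    def __init__(self, mib):
        # MIB modeled as a dict keyed by OID tuples, e.g. (1, 3, 6, 1, 1)
        self.mib = dict(mib)
        self.traps = []              # stands in for an NMS trap receiver

    def get(self, oid):              # read: monitor a variable
        return self.mib[oid]

    def set(self, oid, value):       # write: control a variable
        self.mib[oid] = value

    def get_next(self, oid):         # traversal: next OID in lexicographic order
        for candidate in sorted(self.mib):
            if candidate > oid:
                return candidate, self.mib[candidate]
        return None                  # end of the MIB view

    def send_trap(self, event):      # trap: asynchronous event report
        self.traps.append(event)

agent = TinyAgent({
    (1, 3, 6, 1, 1): "sysDescr-value",   # invented OIDs and values
    (1, 3, 6, 1, 2): 42,
    (1, 3, 6, 1, 3): "up",
})

# Walk the whole MIB with repeated get_next, as a traversal operation would.
oid, walked = (0,), []
while True:
    step = agent.get_next(oid)
    if step is None:
        break
    oid, value = step
    walked.append((oid, value))
print(walked)
```

Because OIDs are ordered lexicographically, the repeated `get_next` loop visits every object exactly once, which is the same mechanism an NMS uses to discover what variables an NE supports.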
The SNMP is a simple request/response protocol that can send multiple requests. Six SNMP operations are defined. The Get operation allows an NMS to retrieve an object instance from an agent. The GetNext operation allows an NMS to retrieve the next object instance from a table or list within an agent. The GetBulk operation allows an NMS to acquire large amounts of related information without repeatedly invoking the GetNext operation. The Set operation allows an NMS to set values for object instances within an agent. The Trap operation is used by an agent to asynchronously inform an NMS of some event. The Inform operation allows one NMS to send trap information to another.
The SNMP messages consist of two parts: a header and a protocol data unit (PDU). The message header contains two fields, namely, a version number and a community name. A PDU has the following fields: PDU type, request ID, error status, error index, and variable bindings. The following descriptions summarize the different fields:
• Version Number: The version number specifies the version of the SNMP being used.
• Community Name: This defines an access environment for a group of NMSs. NMSs within a community are said to exist within the same administrative domain. Community names serve as a weak form of authentication because devices that do not know the proper community name are precluded from SNMP operations.
• PDU Type: This specifies the type of PDU being transmitted.
• Request ID: A request ID is associated with the corresponding response.
• Error Status: This indicates an error and shows an error type. In GetBulk operations, this field becomes a non-repeaters field, which specifies how many of the requested variables, counted from the beginning of the request, should be retrieved no more than once. The field is used when some of the variables are scalar objects with only one instance.
• Error Index: An error index associates the error with a particular object instance.
In GetBulk operations, this field becomes a max-repetitions field. This field defines the maximum number of times that variables other than those specified by the non-repeaters field should be retrieved.
• Variable Bindings (varbinds): This comprises the data of an SNMP PDU. Variable bindings associate particular object instances with their current values (with the exception of Get and GetNext requests, for which the value is ignored).

8.3.5 Management Information Base Tests
The MIB tests are designed to verify (i) standard MIBs, including MIB II, and (ii) enterprise MIBs specific to the system. Tests may include verification that the agent within the system implements the objects it claims to. Every MIB object is tested with the following primitives: Get, GetNext, GetBulk, and Set. It should be verified that all the counters related to the MIBs are incremented correctly and that the agent is capable of generating (i) well-known traps (e.g., coldstart, warmstart, linkdown, linkup) and (ii) application-specific traps (X.25 restart and X.25 reset).

8.3.6 Graphical User Interface Tests
In modern-day software applications, users access functionalities via GUIs. Users of the client-server technology find it convenient to use GUI-based applications. The GUI tests are designed to verify the interface to the users of an application. These tests verify different components (objects) such as icons, menu bars, dialogue boxes, scroll bars, list boxes, and radio buttons. Ease of use (usability) of the GUI design and output reports from the viewpoint of actual system users should be checked. Usefulness of the online help, error messages, tutorials, and user manuals is verified. The GUI can be utilized to test the functionality behind the interface, such as accurate response to database queries. GUIs need to be compatible, as discussed in Section 8.5, and consistent across different operating systems, environments, and mouse-driven and keyboard-driven inputs.
Similar to GUI testing, another branch of testing, called usability testing, has been evolving over the past several years. The usability characteristics which can be tested include the following:
• Accessibility: Can users enter, navigate, and exit with relative ease?
• Responsiveness: Can users do what they want and when they want in a way that is clear? It includes ergonomic factors such as color, shape, sound, and font size.
• Efficiency: Can users do what they want with a minimum number of steps and time?
• Comprehensibility: Do users understand the product structure with a minimum amount of effort?

8.3.7 Security Tests
Security tests are designed to verify that the system meets the security requirements: confidentiality, integrity, and availability. Confidentiality is the requirement that data and the processes be protected from unauthorized disclosure. Integrity is the requirement that data and processes be protected from unauthorized modification. Availability is the requirement that data and processes be protected from the denial of service to authorized users. The security requirements testing approach alone demonstrates whether the stated security requirements have been satisfied, regardless of whether or not those requirements are adequate. Most software specifications do not include negative and constraint requirements. Security testing should therefore include negative scenarios such as misuse and abuse of the software system. The objective of security testing is to demonstrate [3] the following:
• The software behaves securely and consistently under all conditions—both expected and unexpected.
• If the software fails, the failure does not leave the software, its data, or its resources open to attack.
• Obscure areas of code and dormant functions cannot be compromised or exploited.
• Interfaces and interactions among components at the application, framework/middleware, and operating system levels are consistently secure.
• Exception and error handling mechanisms resolve all faults and errors in ways that do not leave the software, its resources, its data, or its environment vulnerable to unauthorized modification or denial-of-service attacks.
The popularity of the Internet and wireless data communications technologies has created new types of security threats, such as unauthorized access to wireless data networks, eavesdropping on transmitted data traffic, and denial-of-service attacks [4]. Even within an enterprise, wireless local area network intruders can operate inconspicuously because they do not need a physical connection to the network [5]. Several new techniques are being developed to combat these kinds of security threats. Tests are designed to ensure that these techniques work, and this is a challenging task. Useful types of security tests include the following:
• Verify that only authorized accesses to the system are permitted. This may include authentication of user ID and password and verification of the expiry of a password.
• Verify the correctness of both encryption and decryption algorithms for systems where data/messages are encoded.
• Verify that illegal reading of files, to which the perpetrator is not authorized, is not allowed.
• Ensure that virus checkers prevent or curtail entry of viruses into the system.
• Ensure that the system is available to authorized users when a zero-day attack occurs.
• Try to identify any "backdoors" in the system usually left open by the software developers. Buffer overflows are the most commonly found vulnerability in code that can be exploited to compromise the system. Try to break into the system by exploiting the backdoors.
• Verify the different protocols used by authentication servers, such as Remote Authentication Dial-In User Service (RADIUS), Lightweight Directory Access Protocol (LDAP), and NT LAN Manager (NTLM).
• Verify the secure protocols for client–server communications, such as the Secure Sockets Layer (SSL). The SSL provides a secure channel between clients and servers that choose to use the protocol for web sessions. The protocol serves two functions: (i) authenticate the web servers and/or clients and (ii) encrypt the communication channel.
• Verify the IPSec protocol. Unlike the SSL, which provides services at layer 4 and secures the communications between two applications, IPSec works at layer 3 and secures communications happening on the network.
• Verify different wireless security protocols, such as the Extensible Authentication Protocol (EAP), the Transport Layer Security (TLS) Protocol, the Tunneled Transport Layer Security (TTLS) Protocol, and the Protected Extensible Authentication Protocol (PEAP).

8.3.8 Feature Tests
Feature tests are designed to verify any additional functionalities which are defined in the requirement specifications but not covered in the above categories. Examples of such tests are data conversion and cross-functionality tests. Data conversion testing is the testing of programs or procedures that are used to convert data from an existing system to a replacement system. An example is testing of a migration tool that converts a Microsoft Access database to MySQL format. Cross-functionality testing provides additional tests of the interdependencies among functions. For example, the verification of the interactions between NEs and an element management system in a 1xEV-DO wireless data network, as illustrated later in Figure 8.5, is considered cross-functionality testing.

8.4 ROBUSTNESS TESTS
Robustness means how sensitive a system is to erroneous input and changes in its operational environment. Tests in this category are designed to verify how gracefully the system behaves in error situations and in a changed operational environment. The purpose is to deliberately break the system, not as an end in itself, but as a means to find errors.
It is difficult to test for every combination of different operational states of the system or undesirable behavior of the environment. Hence, a reasonable number of tests are selected from each group illustrated in Figure 8.4 and discussed below.

Figure 8.4 Types of robustness tests: boundary value, power cycling, on-line insertion and removal, high availability, and degraded node.

8.4.1 Boundary Value Tests
Boundary value tests are designed to cover boundary conditions, special values, and system defaults. The tests include providing invalid input data to the system and observing how the system reacts to the invalid input. The system should respond with an error message or initiate an error processing routine. It should be verified that the system handles boundary values (below or above the valid values) for a subset of configurable attributes. Examples of such tests for the SNMP protocol are as follows:
• Verify that an error response wrong_type is generated when the Set primitive is used to provision a variable whose type does not match the type defined in the MIB.
• Verify that an error response wrong_value is generated when the Set primitive is used to configure a varbind list with one of the varbinds set to an invalid value. For example, if the varbind can have values from the set {0, 1, 2, 3}, then the input value can be −1 to generate a wrong_value response.
• Verify that an error response too_big is generated when the Set primitive is used to configure a list of 33 varbinds. This is because the Set primitive accepts 32 varbinds at a time.
• Verify that an error response not_writable is generated when the Set primitive is used to configure a variable that is defined as read-only in the MIB.
• Assuming that the SNMP protocol can support up to 1024 communities, verify that it is not possible to create the 1025th community.
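The boundary-condition checks above can be sketched as a small oracle that classifies a Set request. The variable table, its names, and the checker itself are invented for illustration; only the error-status names, the example value set {0, 1, 2, 3}, and the 32-varbind limit come from the text.

```python
# Sketch of an oracle that mimics the boundary-condition error
# responses described above (wrong_type, wrong_value, too_big,
# not_writable). Variable names and structure are hypothetical.

MAX_VARBINDS = 32

# name -> (expected Python type, allowed values or None, writable?)
VARIABLES = {
    "adminStatus": (int, {0, 1, 2, 3}, True),   # invented writable variable
    "sysUpTime":   (int, None,         False),  # invented read-only variable
}

def check_set_request(varbinds):
    """Return 'no_error' or the SNMP-style error status for a Set request."""
    if len(varbinds) > MAX_VARBINDS:
        return "too_big"                  # e.g., a list of 33 varbinds
    for name, value in varbinds:
        vtype, allowed, writable = VARIABLES[name]
        if not writable:
            return "not_writable"         # variable is read-only in the MIB
        if not isinstance(value, vtype):
            return "wrong_type"           # type does not match the MIB
        if allowed is not None and value not in allowed:
            return "wrong_value"          # e.g., -1 when 0..3 is legal
    return "no_error"

# Boundary-value cases drawn from the text.
print(check_set_request([("adminStatus", -1)]))        # wrong_value
print(check_set_request([("adminStatus", 1)] * 33))    # too_big
print(check_set_request([("sysUpTime", 10)]))          # not_writable
print(check_set_request([("adminStatus", "up")]))      # wrong_type
```

A test suite for a real agent would drive the same cases over SNMP and compare the agent's error status against this kind of expected classification.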
Examples of robustness tests for the 1xEV-DO network, as shown in Figure 8.5, are as follows:
• Assuming that an EMS can support up to 300 NEs, verify that the EMS cannot support the 301st NE. Check the error message from the EMS when the user tries to configure the 301st network element.

Figure 8.5 Typical 1xEV-DO radio access network. (Courtesy of Airvana, Inc.)

• Assuming that an RNC can support 160,000 simultaneous sessions, verify that the 160,001st session cannot be established on an RNC. Check the error message from the EMS when the user tries to establish the 160,001st session.
• Assuming that a base transceiver station (BTS) can support up to 93 simultaneous users, verify that the 94th user cannot be connected to a BTS.

8.4.2 Power Cycling Tests
Power cycling tests are executed to ensure that, when there is a power glitch in a deployment environment, the system can recover from the glitch and return to normal operation after power is restored. As an example, verify that the boot test is successful every time it is executed during power cycling.

8.4.3 On-Line Insertion and Removal Tests
On-line insertion and removal (OIR) tests are designed to ensure that on-line insertion and removal of modules, incurred during both idle and heavy load operations, are gracefully handled and recovered. The system then returns to normal operation after the failure condition is removed. The primary objective is to ensure that the system recovers from an OIR event without rebooting or crashing any other components. OIR tests are conducted to ensure the fault-free operation of the system while a faulty module is replaced. As an example, while an Ethernet card is being replaced, the system should not crash.

8.4.4 High-Availability Tests
High-availability tests are designed to verify the redundancy of individual modules, including the software that controls these modules.
The goal is to verify that the system gracefully and quickly recovers from hardware and software failures without adversely impacting the operation of the system. The concept of high availability is also known as fault tolerance. High availability is realized by means of proactive methods to maximize service up-time and to minimize downtime. One module operates in the active mode while another module is in the standby mode to achieve 1 + 1 redundancy. For this mode of operation, tests are designed to verify the following:
• A standby module generates an OIR event, that is, hot swapped, without affecting the normal operation of the system.
• The recovery time does not exceed a predefined limit while the system is operational. Recovery time is the time it takes for an operational module to become a standby module and the standby module to become operational.
• A server can automatically switch over from an active mode to a standby mode in case a fail-over event occurs. A fail-over is said to occur when a standby server takes over the workload of an active server. Tests can be designed to verify that a fail-over does not happen without an observable failure. A fail-over without an observable failure is called a silent fail-over. This can only be observed during the load and stability tests described in Section 8.9. Whenever a silent fail-over occurs, a causal analysis must be conducted to determine its cause.

8.4.5 Degraded Node Tests
Degraded node (also known as failure containment) tests verify the operation of a system after a portion of the system becomes nonoperational. It is a useful test for all mission-critical applications. Examples of degraded node tests are as follows:
• Cut one of the four T1 physical connections from one router to another router and verify that load balancing occurs among the remaining three T1 physical connections. Confirm that packets are equally distributed among the three operational T1 connections.
• Disable the primary port of a router and verify that the message traffic passes through alternative ports with no discernible interruption of service to end users. Next, reactivate the primary port and verify that the router returns to normal operation.

Example of 1xEV-DO Wireless Data Networks The code division multiple-access (CDMA) 2000 1xEV-DO (1x evolution, data only) is a standardized technology to deliver a high data rate at the air interface between an access terminal (AT) and a base transceiver station (BTS), also known as a radio node (RN) [6–8]. The 1xEV-DO Revision 0 delivers a peak data rate of 2.54 Mbits/s on the forward link (from a BTS to an AT) using only 1.25 MHz of spectrum width and a peak data rate of 153.6 kbits/s on the reverse link. We show an architecture for connecting all the BTSs with the Internet (IP core network) in Figure 8.5. In this architecture, a base station controller (BSC), also known as a radio network controller (RNC), need not be directly connected by dedicated, physical links with a set of BTSs. Instead, the BTSs are connected to the RNCs via an IP back-haul network. Such an interconnection results in flexible control of the BTSs by the RNCs. The RNCs are connected with the Internet (IP core network) via one or more packet data serving nodes (PDSNs). Finally, the EMS allows the operator to manage the 1xEV-DO network.
The ATs (laptop, PDA, mobile telephone) implement the end-user side of the 1xEV-DO and TCP/IP protocols. The ATs communicate with the RNs over the 1xEV-DO airlink. The RNs are the components that terminate the airlink to/from the ATs. The functions performed by the RNs are (i) control and processing of the physical airlink, (ii) processing of the 1xEV-DO media access control (MAC) layer, and (iii) communication via a back-haul network to the RNC.
In addition, the RNs, in conjunction with the RNCs, perform the softer handoff mobility function of the 1xEV-DO protocol, where an AT is in communication with multiple sector antennas of the same RN.
An RNC is an entity that terminates the higher layer components of the 1xEV-DO protocol suite. An RNC has logical interfaces to RNs, the authentication, authorization, and accounting (AAA) servers, other RNCs, and the PDSN. An RNC terminates 1xEV-DO signaling interactions from the ATs and processes the user traffic to pass it on to the PDSN. It manages radio resources across all the RNs in its domain and performs mobility management in the form of softer and soft handoffs.
The AAA servers are carrier-class computing devices running the RADIUS protocol and having an interface to a database. These servers may be configured for two AAA functions, namely, access network AAA and core network AAA. The access network AAA is connected to the RNCs, which perform the terminal authentication function. The core network AAA is connected to the PDSN, which performs user authentication at the IP level.
The PDSN is a specialized router implementing IP and mobile IP. The PDSN may be implemented as a single, highly available device or as multiple devices clustered together to form a high-availability device. The PDSN is the edge of the core IP network with respect to the AT, that is, the point where (i) the Point-to-Point Protocol (PPP) traffic of the AT is terminated, (ii) the user is authenticated, and (iii) the IP service options are determined. The core IP network is essentially a network of routers.
The EMS server is a system that directly controls the NEs. It is responsible for fault handling, network configuration, and statistics management of the network interfaces. The EMS server interfaces with the NEs via TCP/IP. The EMS server is accessed via a client—and not directly—by the network management staff of a wireless operator.
An EMS client is a workstation from which a network operator manages the radio access network.

8.5 INTEROPERABILITY TESTS
In this category, tests are designed to verify the ability of the system to interoperate with third-party products. An interoperability test typically combines different network elements in one test environment to ensure that they work together. In other words, tests are designed to ensure that the software can be connected with other systems and operated. In many cases, during interoperability tests, users may require the hardware devices to be interchangeable, removable, or reconfigurable. Often, a system will have a set of commands or menus that allow users to make the configuration changes. The reconfiguration activities during interoperability tests are known as configuration testing [9]. Another kind of interoperability test is called a (backward) compatibility test. Compatibility tests verify that the system works the same way across different platforms, operating systems, and database management systems. Backward compatibility tests verify that the current software build works flawlessly with older versions of platforms.
As an example, let us consider a 1xEV-DO radio access network as shown in Figure 8.5. In this scenario, tests are designed to ensure the interoperability of the RNCs with the following products from different vendors: (i) PDSN, (ii) PDA with 1xEV-DO card, (iii) AAA server, (iv) PC with 1xEV-DO card, (v) laptop with 1xEV-DO card, (vi) routers from different vendors, (vii) BTS or RNC, and (viii) switches.

8.6 PERFORMANCE TESTS
Performance tests are designed to determine the performance of the actual system compared to the expected one. The performance metrics that need to be measured vary from application to application. An example of expected performance is: The response time should be less than 1 millisecond 90% of the time in an application of the "push-to-talk" type.
Another example of expected performance is: A transaction in an on-line system requires a response of less than 1 second 90% of the time. One of the goals of router performance testing is to determine the system resource utilization at the maximum aggregate throughput rate with zero packet drops. In this category, tests are designed to verify response time, execution time, throughput, resource utilization, and traffic rate.
For performance tests, one needs to be clear about the specific data to be captured in order to evaluate performance metrics. For example, if the objective is to evaluate the response time, then one needs to capture (i) end-to-end response time (as seen by an external user), (ii) CPU time, (iii) network connection time, (iv) database access time, and (v) waiting time.
Some examples of performance test objectives for an EMS server are as follows:
• Record the CPU and memory usage of the EMS server when 5, 10, 15, 20, and 25 traps per second are generated by the NEs. This test will validate the ability of the EMS server to receive and process that number of traps per second.
• Record the CPU and memory usage of the EMS server when log files of different sizes, say, 100, 150, 200, 250, and 300 kB, are transferred from NEs to the EMS server once every 15 minutes.
Some examples of performance test objectives of SNMP primitives are as follows:
• Calculate the response time of the Get primitive for a single varbind from a standard MIB or an enterprise MIB.
• Calculate the response time of the GetNext primitive for a single varbind from a standard MIB or an enterprise MIB.
• Calculate the response time of the GetBulk primitive for a single varbind from a standard MIB or an enterprise MIB.
• Calculate the response time of the Set primitive for a single varbind from a standard MIB or an enterprise MIB.
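Evaluating an objective such as "less than 1 second 90% of the time" amounts to sampling response times and checking a percentile. A minimal sketch follows; the `dummy_transaction` function is a placeholder, and in a real test it would issue the SNMP Get or on-line transaction under measurement.

```python
# Sketch of a response-time percentile check for an objective such as
# "response time < 1 second 90% of the time". The transaction under
# test is a stand-in; the percentile uses a simple nearest-rank scheme.
import time

def sample_response_times(transaction, n=100):
    """Time n invocations of the transaction and return the samples."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        transaction()
        samples.append(time.perf_counter() - start)
    return samples

def percentile(samples, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]

def dummy_transaction():
    time.sleep(0.001)        # placeholder for the operation under test

samples = sample_response_times(dummy_transaction, n=50)
p90 = percentile(samples, 90)
print(f"90th percentile response time: {p90:.4f} s")
assert p90 < 1.0, "response-time objective violated"
```

The same harness, pointed at real transactions, also yields the raw data (per-sample times) needed to separate end-to-end time into CPU, network, database, and waiting components.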
Some examples of performance test objectives of a 1xEV-DO Revision 0 network are as follows:
• Measure the maximum BTS forward-link throughput.
• Measure the maximum BTS reverse-link throughput.
• Simultaneously generate maximum-rate BTS forward- and reverse-link data capacities.
• Generate the maximum number of permissible session setups per hour.
• Measure the AT-initiated connection setup delay.
• Measure the maximum BTS forward-link throughput per sector carrier for 16 users in the 3-km/h mobility model.
• Measure the maximum BTS forward-link throughput per sector carrier for 16 users in the 30-km/h mobility model.
The results of performance tests are evaluated for their acceptability. If a performance metric is unsatisfactory, then actions are taken to improve it. The performance improvement can be achieved by rewriting the code, allocating more resources, or redesigning the system.

8.7 SCALABILITY TESTS
All man-made artifacts have engineering limits. For example, a car can move at a certain maximum speed in the best of road conditions, a telephone switch can handle a certain maximum number of calls at any given moment, a router has a certain maximum number of interfaces, and so on. In this group, tests are designed to verify that the system can scale up to its engineering limits. A system may work in a limited-use scenario but may not scale up. The run time of a system may grow exponentially with demand and may eventually fail after a certain limit. The idea is to test the limit of the system, that is, the magnitude of demand that can be placed on the system while continuing to meet latency and throughput requirements. A system that works acceptably at one level of demand may not scale up to another level. Scaling tests are conducted to ensure that the system response time remains the same, or increases by a small amount, as the number of users is increased. Systems may scale until they reach one or more engineering limits.
There are three major causes of these limitations:
i. Data storage limitations—limits on counter field size and allocated buffer space
ii. Network bandwidth limitations—Ethernet speed 10 Mbps and T1 card line rate 1.544 Mbps
iii. Speed limit—CPU speed in megahertz
Extrapolation is often used to predict the limit of scalability. The system is tested on an increasingly larger series of platforms or networks or with an increasingly larger series of workloads. Memory and CPU utilizations are measured and plotted against the size of the network or the size of the load. The trend is extrapolated from the measurable and known to the large-scale operation. As an example, for a database transaction system, calculate the system performance, that is, CPU utilization and memory utilization, for 100, 200, 400, and 800 transactions per second; then draw graphs of the number of transactions against CPU and memory utilization. Extrapolate the measured results to 20,000 transactions. The drawback of this technique is that the trend line may not be accurate. The system behavior may not degrade gradually and gracefully as the parameters are scaled up.
Examples of scalability tests for a 1xEV-DO network are as follows:
• Verify that the EMS server can support the maximum number of NEs, say, 300, without any degradation in EMS performance.
• Verify that the maximum number of BTSs, say, 200, can be homed onto one BSC.
• Verify that the maximum number of EV-DO sessions, say, 16,000, can be established on one RNC.
• Verify that the maximum number of EV-DO connections, say, 18,400, can be established on one RNC.
• Verify that the maximum BTS capacity for the three-sector configuration is 93 users per BTS.
• Verify the maximum softer handoff rate with an acceptable number of call drops per BTS. Repeat the process every hour for 24 hours.
• Verify the maximum soft handoff rate with no call drops per BTS. Repeat the process every hour for 24 hours.
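The extrapolation technique described earlier in this section can be sketched as a least-squares trend line fitted to utilization measurements. The measurements below are invented, and the straight-line assumption carries the same caveat as the text: real systems may degrade non-linearly, so the projection is only an estimate.

```python
# Sketch of scalability extrapolation: fit a trend line to CPU
# utilization measured at increasing transaction rates, then project
# it to a larger load. All measurement values are invented.

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Transactions per second vs. measured CPU utilization (%), invented data.
tps = [100, 200, 400, 800]
cpu = [6.0, 11.0, 21.5, 42.0]

slope, intercept = fit_line(tps, cpu)
projected = slope * 2000 + intercept
print(f"projected CPU utilization at 2000 TPS: {projected:.1f}%")
# A projection above 100% suggests the target load exceeds the
# engineering limit of the platform, which is exactly the early
# warning this technique is meant to give.
```

With these invented numbers the projection at 2000 TPS comes out above 100%, illustrating how the trend line flags an engineering limit before the system is actually driven there.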
8.8 STRESS TESTS
The goal of stress testing is to evaluate and determine the behavior of a software component while the offered load is in excess of its designed capacity. The system is deliberately stressed by pushing it to and beyond its specified limits. Stress tests include deliberate contention for scarce resources and testing for incompatibilities. Stress testing ensures that the system can perform acceptably under worst-case conditions at the expected peak load. If the limit is exceeded and the system does fail, then the recovery mechanism should be invoked. Stress tests are targeted to bring out the problems associated with one or more of the following:
• Memory leak
• Buffer allocation and memory carving
One way to design a stress test is to impose the maximum limits on all system performance characteristics at the same time, such as the response time, availability, and throughput thresholds. This provides the set of worst-case conditions under which the system is still expected to operate acceptably.
The best way to identify system bottlenecks is to perform stress testing from different locations inside and outside the system. For example, individually test each component of the system, starting with the innermost components that go directly to the core of the system, progressively move outward, and finally test from remote locations far outside the system. Testing each link involves pushing it to its full-load capacity to determine the correct operation. After all the individual components are tested beyond their highest capacity, test the full system by simultaneously testing all links to the system at their highest capacity. The load can be deliberately and incrementally increased until the system eventually does fail; when the system fails, observe the causes and locations of failures.
This information will be useful in designing later versions of the system; the usefulness lies in improving the robustness of the system or developing procedures for a disaster recovery plan. Some examples of stress tests of a 1xEV-DO network are as follows:
• Verify that repeated establishment and teardown of the maximum number of telnet sessions to the BSC and BTS executed over 24 hours do not result in (i) a leak in the number of buffers or amount of memory or (ii) a significant increase in the degree of fragmentation of available memory. Tests should be done for both graceful and abrupt teardowns of the telnet sessions.
• Stress the two Ethernet interfaces of a BTS by sending Internet traffic for 24 hours and verify that no memory leak or crash occurs.
• Stress the four T1/E1 interfaces of a BTS by sending Internet traffic for 24 hours and verify that no memory leak or crash occurs.
• Verify that repeated establishment and teardown of AT connections through a BSC executed over 24 hours do not result in (i) a leak in the number of buffers or amount of memory or (ii) a significant increase in the degree of fragmentation of available memory. The sessions remain established for the duration of the test.
• Verify that repeated soft and softer handoffs executed over 24 hours do not result in a leak in the number of buffers or amount of memory and do not significantly increase the degree of fragmentation of available memory.
• Verify that repeated execution of all CLI commands over 24 hours does not result in a leak in the number of buffers or amount of memory and does not significantly increase the degree of fragmentation of available memory.
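The repeated setup/teardown leak checks above can be sketched in miniature with the standard library's `tracemalloc` module: churn through many session lifecycles, then compare memory snapshots. The `FakeSession` class is a stand-in for a telnet or AT connection, and the growth threshold is an arbitrary illustrative value.

```python
# Sketch of a leak check in the spirit of the stress tests above:
# repeatedly set up and tear down a session object, then verify that
# traced memory has not grown. tracemalloc is Python standard library.
import tracemalloc

class FakeSession:
    """Stand-in for a telnet or AT connection."""
    def __init__(self):
        self.buffer = bytearray(4096)   # per-session resource
    def close(self):
        self.buffer = None              # graceful teardown releases it

def churn(cycles):
    """Establish and tear down a session repeatedly."""
    for _ in range(cycles):
        session = FakeSession()
        session.close()

tracemalloc.start()
churn(100)                              # warm-up so caches settle
baseline, _ = tracemalloc.get_traced_memory()
churn(10_000)                           # the sustained setup/teardown load
current, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

growth = current - baseline
print(f"memory growth after 10,000 cycles: {growth} bytes")
assert growth < 64 * 1024, "possible memory leak"   # illustrative threshold
```

In a real 24-hour stress run the same idea applies at the system level: record buffer counts and free-memory fragmentation before and after the load, and fail the test on any sustained upward trend.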
Examples of stress tests of an SNMP agent are as follows:

• Verify that repeated walking of the MIBs via SNMP executed over 24 hours does not result in leaks in the number of buffers or amount of memory and does not significantly increase the degree of fragmentation of available memory.
• Verify that an SNMP agent can successfully respond to a GetBulk request that generates a large PDU, preferably of the maximum size, which is 8 kbytes, under the following CPU utilizations: 0, 50, and 90%.
• Verify that an SNMP agent can simultaneously handle multiple GetNext and GetBulk requests over a 24-hour testing period under the following CPU utilizations: 0, 50, and 90%.
• Verify that an SNMP agent can handle multiple Get requests containing a large number of varbinds over a 24-hour testing period under the following CPU utilizations: 0, 50, and 90%.
• Verify that an SNMP agent can handle multiple Set requests containing a large number of varbinds over a 24-hour testing period under the following CPU utilizations: 0, 50, and 90%.

8.9 LOAD AND STABILITY TESTS

Load and stability tests are designed to ensure that the system remains stable for a long period of time under full load. A system might function flawlessly when tested by a few careful testers who exercise it in the intended manner. However, when a large number of users are introduced with incompatible systems and applications that run for months without restarting, a number of problems are likely to occur: (i) the system slows down, (ii) the system encounters functionality problems, (iii) the system silently fails over, and (iv) the system crashes altogether. Load and stability testing typically involves exercising the system with virtual users and measuring the performance to verify whether the system can support the anticipated load. This kind of testing helps one to understand the ways the system will fare in real-life situations.
With such an understanding, one can anticipate and even prevent load-related problems. Often, operational profiles are used to guide load and stability testing [10]. The idea is to test the system the way it will actually be used in the field. The concept of operational profile is discussed in Chapter 15 on software reliability. Examples of load and stability test objectives for an EMS server are as follows:

• Verify the EMS server performance during quick polling of the maximum number of nodes, say, 300. Document how long it takes to quick poll the 300 nodes. Monitor the CPU utilization during quick polling and verify that the results are within the acceptable range. The reader is reminded that quick polling is used to check whether or not a node is reachable by doing a ping on the node using the SNMP Get operation.
• Verify the EMS performance during full polling of the maximum number of nodes, say, 300. Document how long it takes to full poll the 300 nodes. Monitor the CPU utilization during full polling and verify that the results are within the acceptable range. Full polling is used to check the status and any configuration changes of the nodes that are managed by the server.
• Verify the EMS server behavior during an SNMP trap storm. Generate four traps per second from each of the 300 nodes. Monitor the CPU utilization during trap handling and verify that the results are within an acceptable range.
• Verify the EMS server's ability to perform software downloads to the maximum number of nodes, say, 300. Monitor CPU utilization during software download and verify that the results are within an acceptable range.
• Verify the EMS server's performance during log file transfers from the maximum number of nodes. Monitor the CPU utilization during log transfer and verify that the results are within an acceptable range.
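The first objective above can be sketched as a small test driver. Here ping_node() is a hypothetical stub standing in for the SNMP Get on a managed node; a real test would poll live nodes and sample CPU utilization during the poll.

```python
# Sketch of a quick-poll driver for an EMS server. ping_node() is a
# stand-in assumption for an SNMP Get reachability check.
import time

def ping_node(node_id):
    """Stand-in reachability check (an SNMP Get in the real test)."""
    return True

def quick_poll(num_nodes):
    """Quick poll every node; return the reachability map and the
    elapsed time, which the test documents and checks against a bound."""
    start = time.monotonic()
    status = {n: ping_node(n) for n in range(num_nodes)}
    elapsed = time.monotonic() - start
    return status, elapsed

status, elapsed = quick_poll(300)
print(len(status), all(status.values()))
```

The same skeleton applies to full polling: only the per-node operation changes, while the timing and pass/fail bookkeeping stay the same.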
In load and stability testing, the objective is to ensure that the system can operate on a large scale for several months, whereas, in stress testing, the objective is to break the system by overloading it to observe the locations and causes of failures.

8.10 RELIABILITY TESTS

Reliability tests are designed to measure the ability of the system to remain operational for long periods of time. The reliability of a system is typically expressed in terms of mean time to failure (MTTF). As we test the software and move through the system testing phase, we observe failures, try to remove the defects, and continue testing. As this progresses, we record the time durations between successive failures. Let these successive time intervals be denoted by t1, t2, . . ., ti. The average of all the i time intervals is called the MTTF. After a failure is observed, the developers analyze and fix the defects, which consumes some time; let us call this interval the repair time. The average of all the repair times is known as the mean time to repair (MTTR). Now we can calculate a value called mean time between failures (MTBF) as MTBF = MTTF + MTTR. For example, if failures are observed after runs of 10, 12, and 14 hours and each fix takes 2 hours, then MTTF = 12 hours, MTTR = 2 hours, and MTBF = 14 hours. The random testing technique discussed in Chapter 9 is used for reliability measurement. Software reliability modeling and testing are discussed in detail in Chapter 15.

8.11 REGRESSION TESTS

In this category, new tests are not designed. Instead, test cases are selected from the existing pool and executed to ensure that nothing is broken in the new version of the software. The main idea in regression testing is to verify that no defect has been introduced into the unchanged portion of a system due to changes made elsewhere in the system. During system testing, many defects are revealed and the code is modified to fix those defects. As a result of modifying the code, one of four different scenarios can occur for each fix [11]:

• The reported defect is fixed.
• The reported defect could not be fixed in spite of making an effort.
• The reported defect has been fixed, but something that used to work before has been failing.
• The reported defect could not be fixed in spite of an effort, and something that used to work before has been failing.

Given the above four possibilities, it appears straightforward to reexecute every test case from version n − 1 on version n before testing anything new. Such a full test of a system may be prohibitively expensive. Moreover, new software versions often feature many new functionalities in addition to the defect fixes. Therefore, regression tests would take time away from testing new code. Regression testing is an expensive task; a subset of the test cases is carefully selected from the existing test suite to (i) maximize the likelihood of uncovering new defects and (ii) reduce the cost of testing. Methods of test selection for regression testing are discussed in Chapter 13.

8.12 DOCUMENTATION TESTS

Documentation testing means verifying the technical accuracy and readability of the user manuals, including the tutorials and the on-line help. Documentation testing is performed at three levels as explained in the following:

Read Test: In this test a document is reviewed for clarity, organization, flow, and accuracy without executing the documented instructions on the system.

Hands-On Test: The on-line help is exercised and the error messages verified to evaluate their accuracy and usefulness.

Functional Test: The instructions embodied in the documentation are followed to verify that the system works as it has been documented.

The following concrete tests are recommended for documentation testing:

• Read all the documentation to verify (i) correct use of grammar, (ii) consistent use of the terminology, and (iii) appropriate use of graphics where possible.
• Verify that the glossary accompanying the documentation uses standard, commonly accepted terminology and that the glossary correctly defines the terms.
• Verify that there exists an index for each of the documents and that the index is reasonably rich and complete. Verify that the index section points to the correct pages.
• Verify that there is no internal inconsistency within the documentation.
• Verify that the on-line and printed versions of the documentation are the same.
• Verify the installation procedure by executing the steps described in the manual in a real environment.
• Verify the troubleshooting guide by inserting an error and then using the guide to troubleshoot the error.
• Verify the software release notes to ensure that they accurately describe (i) the changes in features and functionalities between the current release and the previous ones and (ii) the set of known defects and their impact on the customer.
• Verify the on-line help for its (i) usability, (ii) integrity, (iii) usefulness of the hyperlinks and cross-references to related topics, (iv) effectiveness of table look-up, and (v) accuracy and usefulness of indices.
• Verify the configuration section of the user guide by configuring the system as described in the documentation.
• Finally, use the documentation while executing the system test cases. Walk through the planned or existing user work activities and procedures using the documentation to ensure that the documentation is consistent with the user work.

8.13 REGULATORY TESTS

In this category, the final system is shipped to the regulatory bodies in those countries where the product is expected to be marketed. The idea is to obtain compliance marks on the product from those bodies. The regulatory approval bodies of various countries are shown in Table 8.2. Most of these regulatory bodies issue safety and EMC (electromagnetic compatibility)/EMI (electromagnetic interference) compliance certificates (emission and immunity). The regulatory agencies are interested in identifying flaws in software that have potential safety consequences.
The safety requirements are primarily based on their own published standards. For example, the CSA (Canadian Standards Association) mark is one of the most recognized, accepted, and trusted symbols in the world. The CSA mark on a product means that the CSA has tested a representative sample of the product and determined that the product meets the CSA's requirements. Safety-conscious and concerned consumers look for the CSA mark on products they buy. Similarly, the CE (Conformité Européenne) mark on a product indicates conformity to the European Union directive with respect to safety, health, environment, and consumer protection. In order for a product to be sold in the United States, the product needs to pass certain regulatory requirements of the Federal Communications Commission (FCC).

Software safety is defined in terms of hazards. A hazard is a state of a system or a physical situation which, when combined with certain environmental conditions, could lead to an accident or mishap. An accident or mishap is an unintended event or series of events that results in death, injury, illness, damage or loss of property, or harm to the environment [12]. A hazard is a logical precondition to an accident. Whenever a hazard is present, the consequence can be an accident. The existence of a hazard state does not mean that an accident will happen eventually. The concept of safety is concerned with preventing hazards.

TABLE 8.2 Regulatory Approval Bodies of Different Countries

Argentina: IRAM is a nonprofit private association and is the national certification body of Argentina for numerous product categories. The IRAM safety mark is rated based on compliance with the safety requirements of a national IRAM standard.

Australia and New Zealand: The Australian Communications Authority (ACA) and the Radio Spectrum Management Group (RSM) of New Zealand have agreed upon a harmonized scheme in producing the C-tick mark that regulates product EMC compliance.

Canada: Canadian Standards Association.

Czech Republic: The Czech Republic is the first European country to adopt conformity assessment regulations based on the European Union CE mark without additional approval certification or testing.

European Union: Conformité Européenne.

Japan: The VCCI mark (Voluntary Control Council for Interference by Information Technology Equipment) is administered by VCCI for information technology equipment (ITE) sold in Japan.

Korea: All products sold in Korea are required to be compliant with and subject to the MIC (Ministry of Information and Communication) mark certification. EMC and safety testing are both requirements.

Mexico: Products must be tested in Mexico for the mandatory NOM (Normality of Mexico) mark.

People's Republic of China: The CCC (China Compulsory Certification) mark is required for a wide range of products sold in the People's Republic of China.

Poland: The Polish safety B-mark (B for bezpieczny, which means "safe") must be shown on all hazardous domestic and imported products. Poland does not accept the CE mark of the European Union.

Russia: GOST-R certification. This certification system is administered by the Russian State Committee on Standardization, Metrology, and Certification (Gosstandart). Gosstandart oversees and develops industry mandatory and voluntary certification programs.

Singapore: The PSB mark is issued by the Singapore Productivity and Standards Board (PSB), the statutory body appointed by the Ministry of Trade and Industry to administer the regulations.

South Africa: The safety scheme for electrical goods is operated by the South African Bureau of Standards (SABS) on behalf of the government. Compliance can be certified by the SABS based on the submission of a test report from any recognized laboratory.

Taiwan: Most products sold in Taiwan must be approved in accordance with the regulations as set forth by the BSMI (Bureau of Standards, Metrology and Inspection).

United States: Federal Communications Commission.

A software module in isolation cannot do physical damage. However, software in the context of a system and an embedding environment could be vulnerable. For example, a software module in a database application is not hazardous by itself, but when it is embedded in a missile navigation system, it could be hazardous. If a missile takes a U-turn because of a software error in the navigation system and destroys the submarine that launched it, then the software is not safe. Therefore, the manufacturers and the regulatory agencies strive to ensure that the software is safe the first time it is released. The organizations developing safety-critical software systems should have a safety assurance (SA) program to eliminate hazards or reduce their associated risk to an acceptable level [13]. Two basic tasks are performed by an SA engineering team as follows:

• Provide methods for identifying, tracking, evaluating, and eliminating hazards associated with a system.
• Ensure that safety is embedded into the design and implementation in a timely and cost-effective manner such that the risk created by user/operator error is minimized. As a consequence, the potential damage in the event of a mishap is minimized.

8.14 SUMMARY

In this chapter we presented a taxonomy of system tests with examples from various domains. We explained the following categories of system tests:

• Basic tests provide evidence that the system can be installed, configured, and brought to an operational state. We described five types of basic tests: boot, upgrade/downgrade, light emitting diode, diagnostic, and command line interface tests.
• Functionality tests provide comprehensive testing over the full range of the requirements within the capabilities of the system.
In this category, we described eight types of tests: communication systems, module, logging and tracing, element management systems, management information base, graphical user interface, security, and feature tests.
• Robustness tests determine the system recovery process from various error conditions or failure situations. In this category, we described five types of robustness tests: boundary value, power cycling, on-line insertion and removal, high availability, and degraded node tests.
• Interoperability tests determine if the system can interoperate with other third-party products.
• Performance tests measure the performance characteristics of the system, for example, throughput and response time, under various conditions.
• Scalability tests determine the scaling limits of the system.
• Stress tests stress the system in order to determine its limitations and the manner in which failures occur if the system fails.
• Load and stability tests provide evidence that, when the system is loaded with a large number of users up to its maximum capacity, the system remains stable for a long period of time under heavy traffic.
• Reliability tests measure the ability of the system to keep operating over long periods of time.
• Regression tests determine that the system remains stable as it cycles through the integration with other projects and through maintenance tasks.
• Documentation tests ensure that the system's user guides are accurate and usable.
• Regulatory tests ensure that the system meets the requirements of government regulatory bodies. Most of these regulatory bodies issue safety, emissions, and immunity compliance certificates.

LITERATURE REVIEW

A good discussion of software safety concepts, such as mishap, hazard, hazard analysis, fault tree analysis, event tree analysis, failure modes and effects analysis, and firewalls, can be found in Chapter 5 of the book by Friedman and Voas [13].
The book presents useful examples and opinions concerning the relationship of these concepts to software reliability.

For those readers who are actively involved in usability testing or are interested in a more detailed treatment of the topic, Jeffrey Rubin's book (Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests, Wiley, New York, 1994) provides an excellent guide. Rubin describes four types of usability tests in great detail: (i) exploratory, (ii) assessment, (iii) validation, and (iv) comparison. The book lists several excellent references on the subject in its bibliography section.

Memon et al. have conducted innovative research on test adequacy criteria and automated test data generation algorithms that are specifically tailored to programs with graphical user interfaces. Interested readers are recommended to study the following articles:

A. M. Memon, "GUI Testing: Pitfalls and Process," IEEE Computer, Vol. 35, No. 8, 2002, pp. 90–91.

A. M. Memon, M. E. Pollack, and M. L. Soffa, "Hierarchical GUI Test Case Generation Using Automated Planning," IEEE Transactions on Software Engineering, Vol. 27, No. 2, 2001, pp. 144–155.

A. M. Memon, M. L. Soffa, and M. E. Pollack, "Coverage Criteria for GUI Testing," in Proceedings of the 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ACM Press, New York, 2001, pp. 256–267.

In the above-mentioned work, a GUI is represented as a series of operators that have preconditions and postconditions related to the state of the GUI. This representation classifies the GUI events into four categories: menu-open events, unrestricted-focus events, restricted-focus events, and system interaction events. Menu-open events are normally associated with the usage of the pull-down menus in a GUI.
The unrestricted-focus events simply expand the interaction options available to a GUI user, whereas the restricted-focus events require the attention of the user before additional interactions can occur. Finally, system interaction events require the GUI to interact with the actual application.

An excellent collection of essays on the usability and security aspects of a system appears in the book edited by L. F. Cranor and S. Garfinkel (Security and Usability, O'Reilly, Sebastopol, CA, 2005). This book contains 34 groundbreaking articles that discuss case studies of usable secure system design. It is useful for researchers, students, and practitioners in the fields of security and usability.

Mathematically rigorous treatments of performance analysis and concepts related to software performance engineering may be found in the following books:

R. Jain, The Art of Computer Systems Performance Analysis, Wiley, New York, 1991.

C. U. Smith, Performance Engineering of Software Systems, Addison-Wesley, Reading, MA, 1990.

Each of these books provides a necessary theoretical foundation for our understanding of performance engineering.

REFERENCES

1. D. Rayner. OSI Conformance Testing. Computer Networks and ISDN Systems, Vol. 14, 1987, pp. 79–98.
2. D. K. Udupa. TMN: Telecommunications Management Network. McGraw-Hill, New York, 1999.
3. J. Jarzombek and K. M. Goertzel. Security in the Software Cycle. Crosstalk, Journal of Defense Software Engineering, September 2006, pp. 4–9.
4. S. Northcutt, L. Zeltser, S. Winters, K. Kent, and R. W. Ritchey. Inside Network Perimeter Security, 2nd ed. Sams Publishing, Indianapolis, IN, 2005.
5. K. Sankar, S. Sundaralingam, A. Balinsky, and D. Miller. Cisco Wireless LAN Security. Cisco Press, Indianapolis, IN, 2004.
6. IOS for 1xEV, IS-878, 3GPP2, http://www.3gpp2.org, June 2001.
7. CDMA2000 IOS Standard, IS-2001, 3GPP2, http://www.3gpp2.org, Nov. 2001.
8.
CDMA2000 High Rate Packet Data Air Interface, IS-856-1, 3GPP2, http://www.3gpp2.org, Dec. 2001.
9. B. Beizer. Software Testing and Quality Assurance. Van Nostrand Reinhold, New York, 1984.
10. A. Avritzer and E. J. Weyuker. The Automatic Generation of Load Test Suites and Assessment of the Resulting Software. IEEE Transactions on Software Engineering, September 1995, pp. 705–716.
11. J. A. Whittaker. What Is Software Testing? And Why Is It So Hard? IEEE Software, January/February 2000, pp. 70–79.
12. N. G. Leveson. Software Safety: Why, What, and How. ACM Computing Surveys, June 1986, pp. 125–163.
13. M. A. Friedman and J. M. Voas. Software Assessment: Reliability, Safety, Testability. Wiley, New York, 1995.

Exercises

1. What is an element management system (EMS)? How is it different from a network management station (NMS)?
2. What are the differences between configuration, compatibility, and interoperability testing?
3. What are the differences between performance, stress, and scalability testing? What are the differences between load testing and stress testing?
4. What is the difference between performance and speed?
5. Buffer overflow is the most commonly found vulnerability in network-aware code that can be exploited to compromise a system. Explain the reason.
6. What are zero-day attacks? Discuss their significance with respect to security testing.
7. Discuss the importance of regression testing when developing a new software release. What test cases from the test suite would be more useful in performing a regression test?
8. What are the differences between safety and reliability? What are the differences between safety testing and security testing?
9. What is the similarity between software safety and fault tolerance?
10. For each of the following situations, explain whether it is a hazard or a mishap:
(a) Water in a swimming pool becomes electrified.
(b) A room fills with carbon dioxide.
(c) A car stops abruptly.
(d) A long-distance telephone company suffers an outage.
(e) A nuclear weapon is destroyed in an unplanned manner.
11. What are the similarities and differences between quality assurance (QA) and safety assurance (SA)?
12. For your current test project, develop a taxonomy of system tests that you plan to execute against the implementation.

CHAPTER 9

Functional Testing

The test of a first-rate intelligence is the ability to hold two opposed ideas in the mind at the same time, and still retain the ability to function.
— F. Scott Fitzgerald

9.1 FUNCTIONAL TESTING CONCEPTS OF HOWDEN

William E. Howden developed the idea of functional testing of programs while visiting the International Mathematics and Statistics Libraries (IMSL) in Houston in 1977–1978. IMSL is presently known as Visual Numerics (http://www.vni.com/). The IMSL libraries are a comprehensive set of mathematical and statistical functions that programmers can embed into their software applications. IMSL uses proven technology that has been thoroughly tested, well documented, and continuously maintained. Howden applied the idea of functional testing to programs from edition 5 of the IMSL package. The errors he discovered can be considered to be of some subtlety to have survived to edition 5 status [1].

A function in mathematics is defined to be a set of ordered pairs (Xi, Yi), where Xi is a vector of input values and Yi is a vector of output values. In functional testing, a program P is viewed as a function that transforms the input vector Xi into an output vector Yi such that Yi = P(Xi).

Examples

1. Let Y1 = √X1. Here, P is a square-root computing function which calculates the square root Y1 of a nonnegative integer X1. The result is assigned to Y1.
2. Let Y2 = C_compiler(X2). The program P is viewed as a C_compiler function that produces object code from C program X2. The object code is held in Y2.
Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy. Copyright © 2008 John Wiley & Sons, Inc.

3. Let Y3 = TelephoneSwitch(X3). A telephone switch program P produces a variety of tones and voice signals represented by the vector Y3 = {idle, dial, ring, fast busy, slow busy tone, voice} by processing input data represented by the vector X3 = {off hook, on hook, phone number, voice}.
4. Let Y4 = sort(X4). The program P in this example is an implementation of a sorting algorithm which produces a sorted array Y4 from the input vector X4 = {A, N}, where A is the array to be sorted and N is the number of elements in A.

The above four examples suggest that sometimes it is easy to view a program as a function in the mathematical sense and sometimes it is more difficult. It is easier to view a program as a function when the input values are algorithmically, or mathematically, transformed into output values, as in the first and the fourth examples above. In the fourth example, Y4 is a certain permutation of the input array A. It is more difficult to view a program as a function when the input values are not directly transformed into the output values. For instance, in the third example above, an off-hook input is not mathematically transformed into a dial tone output.

In functional testing we are not concerned with the details of the mechanism by which an input vector is transformed into an output vector. Instead, a program is treated as a function in the general sense. Three key elements of a function are its input, its output, and the expected transformation of input to output. Ignoring the details of the actual transformation of input to output, we analyze the domains of the input and the output variables of programs to generate test data.
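The functional view can be made concrete with example 4 above: a sketch treating the sort program as a function from the input vector X4 = (A, N) to the output vector Y4, with no concern for the sorting mechanism itself.

```python
# Example 4 viewed as a mathematical function: the sort program P maps
# the input vector X4 = (A, N) to the output vector Y4, a sorted
# permutation of the first N elements of A.
def sort_program(a, n):
    """P: transform the input vector (A, N) into the output vector Y4."""
    return sorted(a[:n])

# A functional test case pairs an input vector with its expected output.
x4 = ([5, 2, 9, 1], 4)
y4 = sort_program(*x4)
print(y4)
```

The test designer only needs the ordered pair (X4, Y4); whether P uses quicksort, mergesort, or anything else is irrelevant to functional testing.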
The four key concepts in functional testing [2] are as follows:

• Precisely identify the domain of each input and each output variable.
• Select values from the data domain of each variable having important properties.
• Consider combinations of special values from different input domains to design test cases.
• Consider input values such that the program under test produces special values from the domains of the output variables.

One can identify the domain of an input or an output variable by analyzing the requirements specification and the design documents. In the following sections, we discuss Howden's method for selecting test data from the domains of input and output variables.

9.1.1 Different Types of Variables

In this section, we consider numeric variables, arrays, substructures, and subroutine arguments and their important values. These types of variables are commonly used as input to and output from a large number of systems for numeric calculations. The MATLAB package and a number of tax filing software systems are examples of such systems.

Numeric Variable: The domain of a numeric variable is specified in one of two ways as follows:

• A set of discrete values: An example of this type of domain is MODE = {23, 79} from the Bluetooth specification. The variable MODE is a numeric variable which takes one of the two values from the set {23, 79}. The MODE value is used in a modulo operation to determine the channel frequency for packet transmission.
• A few contiguous segments of values: As an example, the gross income input to a tax filing software for a person or company is specified as a value from the range {0, . . . , ∞}. Each contiguous segment is characterized by a minimum (MIN) value and a maximum (MAX) value.

Example: The inputs and the output variables of the frequency selection box (FSB) module of the Bluetooth wireless communication system are shown in Figure 9.1.
Bluetooth communication technology uses a frequency hopping spread-spectrum technique for accessing the wireless medium. A piconet channel is viewed as a possibly infinite sequence of slots, where one slot is 625 μs long. The frequency on which a data packet will be transmitted during a given slot is computed by the FSB module illustrated in Figure 9.1. The FSB module accepts three input variables MODE, CLOCK, and ADDRESS and generates values of the output variable INDEX. All four are numeric variables.

[Figure 9.1 Frequency selection box of Bluetooth specification. Inputs: Mode {23, 79}, the lower 28 bits of the 48-bit Address, and the upper 27 bits of the 28-bit Clock; output: Index, 0–22 or 0–78.]

The domains of these variables are characterized as follows:

MODE: The domain of variable MODE is the discrete set {23, 79}.

CLOCK: The CLOCK variable is represented by a 28-bit unsigned number with MIN = 0x0000000 and MAX = 0xFFFFFFF. The smallest increment in CLOCK represents the elapse of 312.5 μs. The FSB module uses the upper 27 bits of the 28-bit CLOCK in frequency calculations.

ADDRESS: The ADDRESS variable is represented by a 48-bit unsigned number. The FSB module uses the lower 28 bits of the 48-bit ADDRESS. Therefore, the range of ADDRESS from the viewpoint of the FSB module is specified as follows:

• MIN = 0xyyyyy0000000, where yyyyy is a 20-bit arbitrary value.
• MAX = 0xzzzzzFFFFFFF, where zzzzz is a 20-bit arbitrary value.

INDEX: This variable assumes values in a given range as specified in the following:

• MIN = 0.
• MAX = 22 if MODE = 23.
• MAX = 78 if MODE = 79.

Having characterized the domain of a numeric variable as a discrete set of values or as a set of contiguous segments, test data are chosen by applying different selection criteria depending upon the use of those variables, namely, (i) input, (ii) output, (iii) dual use, and (iv) multiple type.
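Before turning to the selection criteria, the FSB domain characterization above can be sketched as a few helper functions. The bit masks follow the stated widths; the function names are ours, not part of the Bluetooth specification.

```python
# Sketch of the FSB variable domains characterized above. The helper
# names are illustrative assumptions, not Bluetooth-defined APIs.
CLOCK_MIN, CLOCK_MAX = 0x0000000, 0xFFFFFFF  # 28-bit unsigned CLOCK

def clock_bits(clock):
    """Upper 27 bits of the 28-bit CLOCK, as used by the FSB module."""
    return (clock & 0xFFFFFFF) >> 1

def address_bits(address):
    """Lower 28 bits of the 48-bit ADDRESS, as used by the FSB module."""
    return address & 0xFFFFFFF

def index_domain(mode):
    """Domain (MIN, MAX) of the output variable INDEX for a given MODE."""
    assert mode in (23, 79)
    return (0, mode - 1)

# Boundary test data: the MIN and MAX of each segment.
print(clock_bits(CLOCK_MIN), clock_bits(CLOCK_MAX),
      index_domain(23), index_domain(79))
```

These domains are exactly what the selection criteria below operate on: minimum, maximum, and representative values of each variable.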
A dual-use variable is one that holds an input to the system under test at the beginning of system execution and receives an output value from the system at some point thereafter. A multiple-type variable is one that can hold input (or even output) values of different types, such as numeric and string, at different times. In this section we give an example of such a variable from a real-life system. Next, we explain the four selection criteria.

1. Selection Criteria for Input Variables: If the input domain is a discrete set of values, then tests involving each value are performed. The domain of the input variable MODE in Figure 9.1 consists of the set {23, 79}. The FSB module is tested at least once with MODE = 23 and at least once with MODE = 79. If the domain of a variable consists of one or more segments of values, then test data are selected as follows:
• Consider the minimum value of a segment.
• Consider the maximum value of a segment.
• Consider a typical representative value in a segment.
• Consider certain values which have special mathematical properties. These values include 0, 1, and real numbers with small absolute values.
• Consider, if possible, values lying outside a segment. Here the idea is to observe the behavior of the program in response to invalid input.
• If the conceptual minimum (maximum) of a variable is −∞ (+∞), then choose a large negative (positive) value to represent −∞ (+∞).

2. Selection Criteria for Output Variables: If the domain of an output variable consists of a small set of discrete values, then the program is tested with input which results in the generation of each of the output values. The output variable of the frequency selection box in Figure 9.1 has a domain of two discrete sets {0, . . . , 22} and {0, . . . , 78} for MODE = 23 and MODE = 79, respectively. The frequency selection box must be adequately tested so that it produces each of the output values as desired.
For a large set of discrete values, the program is tested with many different inputs that cause the program to produce many different output values. If an output variable has a domain consisting of one or more segments of numbers, then the program is tested as follows:
• Test the program with different inputs so that it produces the minimum values of the segments.
• Test the program with different inputs so that it produces the maximum values of the segments.
• Test the program with different inputs so that it produces some interior values in each segment.

3. Selection Criteria for Dual-Use Variables: A variable often serves as an input to a program (or function) and holds the output from the program (or function) at the end of the desired computation. Such a variable is called a dual-use variable. Additional test cases are designed to meet the following selection criteria for dual-use variables:
• Consider a test case such that the program produces an output value which is different from the input value of the same dual-use variable. The idea behind this test is to avoid coincidental correctness of the output.
• Consider a test case such that the program produces an output value which is identical to the input value of the same dual-use variable.

4. Selection Criteria for Multiple-Type Variables: Sometimes an input variable can take on values of different types. For example, a variable may take on values of type integer in one program invocation and of type string in another invocation. It is unlikely that a programmer will define a single storage space to hold values of different types. Instead, the program will read the input value into different locations (i.e., variables) depending upon the type of value provided by the user. This scenario requires us to test that the program correctly reads an input value depending upon its type and, subsequently, correctly processes the value.
Such multiple-type variables may arise in real-life systems, and programs must take necessary actions to handle them. If an input or output variable can take on values of different types, then the following criteria are used:
• For an input variable of multiple type, the program is tested with input values of all the types.
• For an output variable of multiple type, the program is tested with different inputs so that the variable holds values of all the different types.

Example: We show a part of the tax forms prepared by Canada Customs and Revenue Agency (CCRA) in Figure 9.2. By analyzing this specification we conclude that a taxpayer (user) inputs a real number or a blank in line 2. The value input in line 2 is a real number representing the net income of the spouse or common-law partner of the user if both of them occupied the same residence on December 31, 2001. Otherwise, if they occupied separate principal residences for the specified reasons, then line 2 must be left blank and the address of the spouse or common-law partner must be provided in box 6089. Clearly, line 2 is an input to a tax filing software system, which must be able to handle the different types of values input in line 2, namely, real values and a blank, where a blank is not the same as 0.0.

Figure 9.2 Part of form ON479 of T1 General (2001), published by the CCRA. (Line 1: enter your net income from line 236 of your return. Line 2: enter your spouse or common-law partner's net income from page 1 of your return. Line 3: add lines 1 and 2 to obtain the income for Ontario credits. Box 6089, involuntary separation: if, on December 31, 2001, you and your spouse or common-law partner occupied separate principal residences for medical, educational, or business reasons, leave line 2 blank and enter his or her address in the area beside box 6089.)
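The two required interpretations of line 2 can be sketched as follows. The function and its return convention are illustrative (not from the text); the point is that a blank is treated as a distinct string-type value rather than being coerced to 0.0:

```python
def read_line2(raw):
    """Read the line 2 field of Figure 9.2, a multiple-type input:
    either a real number (the spouse's or common-law partner's net
    income) or a blank signalling that box 6089 applies."""
    if raw.strip() == "":
        return ("blank", None)      # string type: the single value "blank"
    return ("numeric", float(raw))  # numeric type: a real number
```

A test suite would then exercise both types: the interval criteria (minimum, maximum, interior value) for the numeric interpretation, and the one-member discrete set for the blank.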
If we equate a blank with 0.0, the software system may not know when and how to interpret the information given in box 6089. Referring to the line 2 input of Figure 9.2, a tax filing program must be tested as follows:

1. Interpret line 2 as a numeric variable taking on values from an interval, and apply selection criteria such as selecting the minimum value, the maximum value, and an interior value of the defined interval.
2. Interpret line 2 as a string variable taking on values from a discrete set, where the discrete set consists of just one member, namely a blank.

Arrays

An array holds values of the same type, such as integer or real. Individual elements of an array are accessed by using one or more indices. In some programming languages, such as MATLAB, an array can hold values of both integer and real types. An array has a more complex structure than an individual numeric variable. This is because of the following three distinct properties of an array. The three properties are individually and collectively considered while testing a program.

• An array can have one or more dimensions. For example, A[i][j] is the element at row i and column j of a two-dimensional array A. Array dimensions are considered in testing because their values are likely to be used in controlling for and while loops. Just as we select extremal (both minimum and maximum) values and an intermediate value of a numeric variable, we need to consider arrays of different configurations, such as an array of minimum size, an array of maximum size, an array with a minimum value for the first dimension and a maximum value for the second dimension, and so on.
• Individual array elements are considered as distinct numeric variables. Each value of an array element can be characterized by its minimum value, maximum value, and special values, such as 0, 1, and ε.
All these values need to appear in tests in order to observe the program behavior while processing the extremal and special values.

• A portion of an array can be collectively interpreted as a distinct substructure with specific application-dependent properties. For example, in the field of numerical analysis a matrix structure is a common representation of a set of linear equations. The diagonal elements of a matrix, the lower triangular matrix, and the upper triangular matrix are substructures with significance in numerical analysis. Just as we consider special values of a numeric variable, such as 0, 1, and a small value ε, there exist special values of substructures of an array. For example, some well-known substructures of a two-dimensional array are individual rows and columns, diagonal elements, a lower triangular matrix, and an upper triangular matrix. These substructures are interpreted as a whole.

The selection criteria for array dimensions are based on the following three intuitive steps:

1. Completely specify the dimensions of an array variable. In programming languages such as C and Java, the dimensions of an array are statically defined. On the other hand, in the programming language MATLAB, there is no such concept as static definition of an array dimension. Instead, an array can be dynamically built without any predefined limit.

2. Construct different, special configurations of the array by considering special values of individual array dimensions and their combinations. Consider an array with k dimensions, where each dimension is characterized by a minimum value, a maximum value, and an intermediate value. These selections can be combined to form 3^k different sets of dimensions for a k-dimensional array. Let us take a concrete example with a two-dimensional array, where k = 2. The two dimensions are commonly known as row and column dimensions. The minimum number of rows of an array for an application can be an arbitrary value, such as 1.
Similarly, the maximum number of rows can be 20. An intermediate row number is 10. The minimum number of columns of the array can be 2, the maximum number of columns can be 15, and an intermediate column number can be 8. By considering the three row values and the three column values, one can enumerate the 3^k = 3^2 = 9 combinations of different array configurations. Some examples of those 9 different configurations of arrays are 1 × 2, 1 × 15, 20 × 2, 20 × 15, and so on.

3. Apply the selection criteria of Section 9.1.1 to individual elements of the selected array configurations.

Substructure

In general, a structure means a data type that can hold multiple data elements. In the field of numerical analysis, a matrix structure is commonly used. For example, a set of n linear equations in n unknown quantities is represented by an n × (n + 1) matrix. The n rows and the first n columns of the matrix represent the coefficients of the n linear equations, whereas the (n + 1)th column represents the constants of the n equations. The n rows and the first n columns can be considered as one substructure, whereas the (n + 1)th column is another substructure. In addition, individual columns, rows, and elements of the matrix can be considered distinct substructures. In general, one can identify a variety of substructures of a given structure. Given the fact that there can be a large number of substructures of a structure, it is useful to find functionally identifiable substructures. In the above example of an n × (n + 1) matrix, the (n + 1)th column is one functionally identifiable substructure, whereas the rest of the matrix is another functionally identifiable substructure. Individual columns and individual rows are functionally identifiable substructures as well. For example, a row of identical values means that the coefficients of all the variables and the constant of an equation have identical values.
Sometimes, pairs of columns can form substructures. For example, one may use a pair of columns to represent complex numbers: one column for the "real" part and another column for the "imaginary" part. It is useful to consider both the full structure and the substructures, such as rows, columns, and individual elements, in functional testing involving matrix structures. The following criteria are considered in identifying substructures:

• Dimensions of Substructures: Choose structure dimensions such that substructures can take on all possible dimension values. Examples of substructures taking on special dimension values are as follows: (i) the number of elements in a row is 1 and (ii) the number of elements in a row is "large."
• Values of Elements of Substructures: Choose values of elements of structures such that substructures take on all possible special values. Examples of substructures taking on special values are as follows: (i) all elements of a row take on value 0, (ii) all elements take on value 1, and (iii) all elements take on value ε.
• Combinations of Individual Elements with Array Dimensions: Combine the dimension aspect with the value aspect. For example, one can select a large vector (a row or a column) of elements with identical special values, such as all 0's, all 1's, and so on.

Subroutine Arguments

Some programs accept input variables whose values are the names of functions. Such programs are found in numerical analysis and statistical applications [1]. Functional testing requires that each value of such a variable be included in a test case. Consider a program P calling a function f(g, param_list), where f() accepts a list of parameters denoted by param_list and another parameter g of enumerated type. When f() is invoked, it invokes the function held in g. Let the values of g be represented by the set {g1, g2, g3}.
We design three test cases to execute f(g1, list1), f(g2, list2), and f(g3, list3) such that, eventually, g1(), g2(), and g3() are executed.

9.1.2 Test Vector

A test vector, also called test data, is an instance of the input to a program. It is a certain configuration of the values of all the input variables. Values of individual input variables chosen in the preceding sections must be combined to obtain a test vector. If a program has n input variables var1, var2, . . . , varn which can take on k1, k2, . . . , kn special values, respectively, then there are k1 × k2 × · · · × kn possible combinations of test data.

Example: We show the number of special values of different input variables of the FSB module of Figure 9.1 in Table 9.1. Variable MODE takes on values from a discrete set of size 2, and both values from the discrete set are considered. Variables CLOCK and ADDRESS take on values from one interval each, and three special values for each of them are considered. Therefore, one can generate 2 × 3 × 3 = 18 test vectors from Table 9.1.

Some programs accept a large number of input variables. Tax filing software systems are examples of such programs. Considering all possible combinations of a few special values of a large number of input variables is a challenging task. If a program has n input variables, each of which can take on k special values, then there are k^n possible combinations of test vectors. We know that k is a small number, but n may be large. We have more than three billion test vectors even for k = 3 and n = 20. Therefore, there is a need to identify a method for reducing the number of test vectors obtained by considering all possible combinations of all the sets of special values of input variables. Howden suggested that there is no need to combine values of all input variables to design a test vector if the variables are not functionally related.
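The 18 test vectors of the FSB example can be enumerated mechanically as the cross product of the special-value sets of Table 9.1; a sketch:

```python
from itertools import product

# Special values from Table 9.1 for the FSB module of Figure 9.1
MODE    = (23, 79)
CLOCK   = (0x0000000, 0x000FF00, 0xFFFFFFF)
ADDRESS = (0xFFFFF0000000, 0xFFFFF00FFF00, 0xFFFFFFFFFFFF)

# All combinations of the special values: 2 x 3 x 3 = 18 test vectors
test_vectors = list(product(MODE, CLOCK, ADDRESS))
```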
To reduce the number of input combinations, Howden suggested [3, 4] that we produce all possible combinations of special values of variables falling in the same functionally related subset. In this way, the total number of combinations of special values of the input variables is reduced. It is difficult to give a formal definition of the idea of functionally related variables, but it is easy to identify them. Let us consider the following examples:

• Variables appearing in the same assignment statement are functionally related.
• Variables appearing in the same branch predicate (the condition part of an if statement, for example) are functionally related.

TABLE 9.1 Number of Special Values of Inputs to FSB Module of Figure 9.1

Variable   Number of Special Values (k)   Special Values
MODE       2                              {23, 79}
CLOCK      3                              {0x0000000, 0x000FF00, 0xFFFFFFF}
ADDRESS    3                              {0xFFFFF0000000, 0xFFFFF00FFF00, 0xFFFFFFFFFFFF}

Figure 9.3 Functionally related variables. (a) Program P with input variables x1, . . . , x5 and output z. (b) Internal structure of P: function f1 takes x1 and x2, function f2 takes x3 and x4, and x5 drives the decision d that selects between them.

Example: The program P in Figure 9.3a has five input variables such that x1, . . . , x4 take on three special values each and x5 is a Boolean variable. The total number of combinations of the special values of the five input variables is 3^4 × 2 = 162. Let us assume that program P has an internal structure as shown in Figure 9.3b, where variables x1 and x2 are functionally related and variables x3 and x4 are functionally related. Function f1 uses the input variables x1 and x2. Similarly, function f2 uses the input variables x3 and x4. Input variable x5 is used to decide whether the output of f1 or the output of f2 will be the output of P. We consider 3^2 = 9 different combinations of x1 and x2 as input to f1, 9 different combinations of x3 and x4 as input to f2, and two different values of x5 to the decision box d in P.
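This reduced selection can be sketched as follows. The special values are placeholders (the text does not specify them), and a fixed default value stands in for the pair of variables not being varied:

```python
from itertools import product

# Placeholder special values: x1..x4 each take three, x5 is Boolean.
specials = ("min", "typical", "max")
x5_values = (True, False)
default = "typical"     # value held by the pair not being varied

combos = []
# f1 depends only on (x1, x2): 9 combinations, each with both x5 values.
for a, b in product(specials, specials):
    for sel in x5_values:
        combos.append((a, b, default, default, sel))
# f2 depends only on (x3, x4): another 9 combinations with both x5 values.
for c, d in product(specials, specials):
    for sel in x5_values:
        combos.append((default, default, c, d, sel))
```

This yields (9 + 9) × 2 combinations rather than the 3^4 × 2 of the all-combination approach.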
We need 36 [(9 + 9) × 2] combinations of the five input variables x1, . . . , x5, which is much smaller than 162.

9.1.3 Testing a Function in Context

Let us consider a program P and a function f in P as shown in Figure 9.4. The variable x is an input to P and an input to f as well. Suppose that x can take on values in the range [−∞, +∞] and that f is called only when the predicate x ≥ 20 holds.

Figure 9.4 Function in context.

If we are unaware of the predicate x ≥ 20, then we are likely to select the following set of test data to test P: x = +k, x = −k, x = 0. Here, k is a number with a large magnitude. The reader may note that the function f will be invoked just once, for x = +k, assuming that k ≥ 20; it will not be invoked when P is run with the other two test data because of the conditional execution of f. Testing function f in isolation would require us to generate the same test data as above. It may be noted that the latter two data points are invalid data because they fall outside the range of x for f in P. The valid range of x for f is [20, +∞], and functional testing in context requires us to select the following values of x:

x = k, where k ≫ 20
x = y, where 20 < y ≪ k
x = 20

where the symbols ≫ and ≪ are read as "much larger" and "much smaller," respectively.

9.2 COMPLEXITY OF APPLYING FUNCTIONAL TESTING

In order to have an idea of the difficulty in applying the concept of functional testing, let us summarize the main points in functional testing in the following:

• Identify the input and the output variables of the program and their data domains.
• Compute the expected outcomes as illustrated in Figure 9.5a for selected input values.
• Determine the input values that will cause the program to produce selected outputs as illustrated in Figure 9.5b.
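The contrast between the two directions can be illustrated with a toy stand-in for the FSB computation (this is not the real Bluetooth hop-selection algorithm): the forward direction is a direct computation, while the reverse direction amounts to a search over the input domain.

```python
def fsb(mode, clock):
    """Toy stand-in for the frequency selection computation: it merely
    maps CLOCK onto an index in 0..mode-1 (illustrative only)."""
    return clock % mode

# Forward (Figure 9.5a): compute the expected output for chosen inputs.
expected = fsb(79, 0x000FF00)

# Reverse (Figure 9.5b): find an input that yields a chosen output,
# here by brute-force search over a range of CLOCK values.
target = 22
clock = next(c for c in range(1000) if fsb(23, c) == target)
```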
Figure 9.5 (a) Obtaining output values from an input vector and (b) obtaining an input vector from an output value in functional testing.

Generating test data by analyzing the input domains has the following two characteristics:

• The number of test cases obtained from an analysis of the input domains is likely to be too many because of the need to design test vectors representing different combinations of special values of the input variables.
• Generation of the expected output for a certain test vector is relatively simple. This is because a test designer computes an expected output from an understanding and analysis of the specification of the system.

On the other hand, generating test data by analyzing the output domains has the following characteristics:

• The number of test cases obtained from an analysis of the output domains is likely to be smaller, because there is no need to consider different combinations of special values of the output variables.
• Generating an input vector required to produce a chosen output value requires us to analyze the specification in the reverse direction, as illustrated in Figure 9.5b. Such reverse analysis is a more challenging task than computing an expected value in the forward direction, as illustrated in Figure 9.5a.

So far in this section we have discussed the ways to apply the idea of functional testing to an entire program. However, the underlying concept, that is, analyzing the input and output domains of a program, can be applied to individual modules, functions, or even lines of code as well. This is because every computing element can be described in terms of its input and output domains, and hence the idea of functional testing can be applied to such a computing element.
Referring to Figure 9.6, program P is decomposed into three modules M1, M2, and M3. In addition, M1 is composed of functions f1 and f5, M2 is composed of functions f2 and f3, and M3 is composed of functions f4 and f6. We can apply the idea of functional testing to the entire program P, to the individual modules M1, M2, and M3, and to the individual functions f1, . . . , f6 by considering their respective input and output domains as listed in Table 9.2.

Figure 9.6 Functional testing in general. (Program P with inputs x1, . . . , x4 and output z, decomposed into modules M1 = {f1, f5}, M2 = {f2, f3}, and M3 = {f4, f6}.)

TABLE 9.2 Input and Output Domains of Functions of P in Figure 9.6

Entity Name   Input Variables    Output Variables
P             {x1, x2, x3, x4}   {z}
M1            {x1, x2}           {y5}
M2            {x3, x4}           {y4}
M3            {y4, y5}           {z}
f1            {x1, x2}           {y1}
f2            {x3, x4, y3}       {y2, y4}
f3            {y2}               {y3}
f4            {y4}               {y6}
f5            {y1}               {y5}
f6            {y5, y6}           {z}

Conceptually, one can apply functional testing at any level of abstraction, from a single line of code at the lowest level to the entire program at the highest level. As we consider individual modules, functions, and lines of code, the task of accurately identifying the input and output domains of the computing element under test becomes more difficult. The methodology for developing functional test cases is an analytical process that decomposes the specification of a program into different classes of behaviors. The functional test cases are designed for each class separately by identifying the input and output domains of the class. Identification of input and output domains helps in classifying the specification into different classes. However, often, in practice, the total number of input and output combinations can be very large. Several well-known techniques are available to tackle this issue, which we discuss in the following sections.

9.3 PAIRWISE TESTING

Pairwise testing is a special case of all-combination testing of a system with n input variables. Let us consider n input variables denoted by {v1, v2, . . . , vi, . . .
, vn}. For simplicity, assume that for each variable vi, 1 ≤ i ≤ n, we choose k values of interest. An all-combination testing means considering all the k^n different test vectors. On the other hand, pairwise testing means that each possible combination of values for every pair of input variables is covered by at least one test case. Pairwise testing requires a subset of test cases (vectors) covering all pairwise combinations of values instead of all possible combinations of values of a set of variables. Pairwise testing is also referred to as all-pair or two-way testing. One can also generate three-way or four-way tests to cover all combinations of values of three or four variables. As all combinations of values of more and more variables (e.g., 2, 3, 4, . . .) are considered, the size of the test suite grows rapidly. Empirical results [5] for medical devices and distributed database systems show that two-way testing would detect more than 90% of the defects, whereas four-way testing would detect 100% of the defects. For browser and server applications, pairwise testing would detect approximately 70% of the defects, whereas six-way testing would detect 100% of the defects.

Consider the system S in Figure 9.7, which has three input variables X, Y, and Z. Let the notation D(w) denote the set of values for an arbitrary variable w. For the three given variables X, Y, and Z, their value sets are as follows: D(X) = {True, False}, D(Y) = {0, 5}, and D(Z) = {Q, R}. The total number of all-combination test cases is 2 × 2 × 2 = 8. However, a subset of four test cases, as shown in Table 9.3, covers all pairwise combinations.

Figure 9.7 System S with three input variables.

TABLE 9.3 Pairwise Test Cases for System S

Test Case ID   Input X   Input Y   Input Z
TC1            True      0         Q
TC2            True      5         R
TC3            False     0         R
TC4            False     5         Q

Different test generation strategies for pairwise testing have been reported in the literature [6].
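Whether a candidate suite achieves pairwise coverage can be checked mechanically; a sketch (the four-test suite below is one pairwise-covering suite for these value sets):

```python
from itertools import combinations, product

def covers_all_pairs(tests, domains):
    """Return True if, for every pair of variables, every combination
    of their values is covered by at least one test in the suite."""
    for i, j in combinations(range(len(domains)), 2):
        needed = set(product(domains[i], domains[j]))
        covered = {(t[i], t[j]) for t in tests}
        if not needed <= covered:
            return False
    return True

domains = [(True, False), (0, 5), ("Q", "R")]
# One pairwise-covering suite of 4 tests (out of 8 all-combination tests)
suite = [(True, 0, "Q"), (True, 5, "R"), (False, 0, "R"), (False, 5, "Q")]
```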
In this section, two popular techniques, namely orthogonal array (OA) [7] and in parameter order (IPO) [8], are discussed.

9.3.1 Orthogonal Array

Orthogonal arrays were originally studied as a kind of numerical curiosity by monks [9]. The concept was further studied by C. R. Rao, a statistician, in the late 1940s [10]. Genichi Taguchi first used the idea of orthogonal arrays in his experimental design of total quality management (TQM) [11]. The method, known as the Taguchi method, has been used in experimental design in the manufacturing field and provides an efficient and systematic way to optimize designs for performance, quality, and cost. It has been used successfully in Japan and the United States in designing reliable, high-quality products at low cost in the automobile and consumer electronics industries. Mandl was the first to use the concept of orthogonal array in designing test cases for pairwise testing of compilers [12].

Let us consider the two-dimensional array of integers shown in Table 9.4. The array has an interesting property: choose any two columns at random, and all the pairs (1,1), (1,2), (2,1), and (2,2) can be found in those columns. However, not all combinations of 1's and 2's appear in the table. For example, (2,2,2) is a valid combination, but it is not in the table. Only four of the eight combinations can be found in the table. This is an example of an L4(2^3) orthogonal array. The 4 indicates that the array has four rows, also known as runs.
The 2^3 part indicates that the array has three columns, known as factors, and each cell in the array contains one of two different values, known as levels. Levels mean the maximum number of values that a single factor can take on. In the L4 array, the maximum number of columns at two levels is 3, as recorded in Table 9.5.

TABLE 9.4 L4(2^3) Orthogonal Array

Runs   Factor 1   Factor 2   Factor 3
1      1          1          1
2      1          2          2
3      2          1          2
4      2          2          1

Orthogonal arrays are generally denoted by the pattern L_Runs(Levels^Factors). Commonly used orthogonal arrays are given in Table 9.5.

TABLE 9.5 Commonly Used Orthogonal Arrays

Orthogonal   Number    Maximum Number   Maximum Number of Columns at These Levels
Array        of Runs   of Factors       2     3     4     5
L4           4         3                3     —     —     —
L8           8         7                7     —     —     —
L9           9         4                —     4     —     —
L12          12        11               11    —     —     —
L16          16        15               15    —     —     —
L16          16        5                —     —     5     —
L18          18        8                1     7     —     —
L25          25        6                —     —     —     6
L27          27        13               —     13    —     —
L32          32        31               31    —     —     —
L32          32        10               1     —     9     —
L36          36        23               11    12    —     —
L36          36        16               3     13    —     —
L50          50        12               1     —     —     11
L54          54        26               1     25    —     —
L64          64        63               63    —     —     —
L64          64        21               —     —     21    —
L81          81        40               —     40    —     —

Let us consider our previous example of the system S, where S has three input variables X, Y, and Z. For the three given variables X, Y, and Z, their value sets are as follows: D(X) = {True, False}, D(Y) = {0, 5}, and D(Z) = {Q, R}. Let us map the variables to the factors and the values to the levels of the L4(2^3) orthogonal array (Table 9.4), with the result shown in Table 9.3. In the first column, let 1 = True and 2 = False. In the second column, let 1 = 0 and 2 = 5. In the third column, let 1 = Q and 2 = R. Note that not all combinations of values of all variables have been selected; instead, combinations of values of all pairs of input variables have been covered with four test cases. It is clear from the above example that orthogonal arrays provide a technique for selecting a subset of test cases with the following properties:

• The technique guarantees testing the pairwise combinations of all the selected variables.
• The technique generates fewer test cases than the all-combination approach.
• The technique generates a test suite that has an even distribution of all pairwise combinations.
• The technique can be automated.

In the following, the steps of a technique to generate orthogonal arrays are presented. The steps are further explained by means of a detailed example.

Step 1: Identify the maximum number of independent input variables with which a system will be tested. This will map to the factors of the array: each input variable maps to a different factor.

Step 2: Identify the maximum number of values that each independent variable will take. This will map to the levels of the array.

Step 3: Find a suitable orthogonal array with the smallest number of runs, L_Runs(X^Y), where X is the number of levels and Y is the number of factors [7, 13]. A suitable array is one that has at least as many factors as needed from step 1 and has at least as many levels for each of those factors as identified in step 2.

Step 4: Map the variables to the factors and the values of each variable to the levels on the array.

Step 5: Check for any "left-over" levels in the array that have not been mapped. Choose arbitrary valid values for those left-over levels.

Step 6: Transcribe the runs into test cases.

Web Example. Consider a website that is viewed on a number of browsers with various plug-ins and operating systems (OSs) and through different connections, as shown in Table 9.6. The table shows the variables and their values that are used as elements of the orthogonal array. We need to test the system with different combinations of the input values. Following the steps laid out previously, let us design an orthogonal array to create a set of test cases for pairwise testing:

Step 1: There are four independent variables, namely Browser, Plug-in, OS, and Connection.

Step 2: Each variable can take at most three values.
TABLE 9.6 Various Values That Need to Be Tested in Combinations

Variable     Values
Browser      Netscape, Internet Explorer (IE), Mozilla
Plug-in      Realplayer, Mediaplayer
OS           Windows, Linux, Macintosh
Connection   LAN, PPP, ISDN

Note: LAN, local-area network; PPP, Point-to-Point Protocol; ISDN, Integrated Services Digital Network.

TABLE 9.7 L9(3^4) Orthogonal Array

Runs   Factor 1   Factor 2   Factor 3   Factor 4
1      1          1          1          1
2      1          2          2          2
3      1          3          3          3
4      2          1          2          3
5      2          2          3          1
6      2          3          1          2
7      3          1          3          2
8      3          2          1          3
9      3          3          2          1

Step 3: An orthogonal array L9(3^4), as shown in Table 9.7, is good enough for the purpose. The array has nine rows (runs), three levels for the values, and four factors for the variables.

Step 4: Map the variables to the factors and the values to the levels of the array: factor 1 to Browser, factor 2 to Plug-in, factor 3 to OS, and factor 4 to Connection. Let 1 = Netscape, 2 = IE, and 3 = Mozilla in the Browser column. In the Plug-in column, let 1 = Realplayer and 3 = Mediaplayer. Let 1 = Windows, 2 = Linux, and 3 = Macintosh in the OS column. Let 1 = LAN, 2 = PPP, and 3 = ISDN in the Connection column. The mapping of the variables and the values onto the orthogonal array is given in Table 9.8.

Step 5: There are left-over levels in the array that have not been mapped. The factor 2 has three levels specified in the original array, but there are only two possible values for this variable.
This has caused a level (2) to be left over for variable Plug-in after mapping the factors. One must provide a value in each such cell. The choice of this value can be arbitrary, but to improve coverage, start at the top of the Plug-in column and cycle through the possible values when filling in the left-over levels. Table 9.9 shows the mapping after filling in the remaining levels using the cycling technique mentioned.

TABLE 9.8 L9(3^4) Orthogonal Array after Mapping

Test Case ID   Browser    Plug-in       OS          Connection
TC1            Netscape   Realplayer    Windows     LAN
TC2            Netscape   2             Linux       PPP
TC3            Netscape   Mediaplayer   Macintosh   ISDN
TC4            IE         Realplayer    Linux       ISDN
TC5            IE         2             Macintosh   LAN
TC6            IE         Mediaplayer   Windows     PPP
TC7            Mozilla    Realplayer    Macintosh   PPP
TC8            Mozilla    2             Windows     ISDN
TC9            Mozilla    Mediaplayer   Linux       LAN

TABLE 9.9 Generated Test Cases after Mapping Left-Over Levels

Test Case ID   Browser    Plug-in       OS          Connection
TC1            Netscape   Realplayer    Windows     LAN
TC2            Netscape   Realplayer    Linux       PPP
TC3            Netscape   Mediaplayer   Macintosh   ISDN
TC4            IE         Realplayer    Linux       ISDN
TC5            IE         Mediaplayer   Macintosh   LAN
TC6            IE         Mediaplayer   Windows     PPP
TC7            Mozilla    Realplayer    Macintosh   PPP
TC8            Mozilla    Realplayer    Windows     ISDN
TC9            Mozilla    Mediaplayer   Linux       LAN

Step 6: We generate nine test cases, taking the test case values from each run.

Now let us examine the result:

• Each Browser is tested with every Plug-in, with every OS, and with every Connection.
• Each Plug-in is tested with every Browser, with every OS, and with every Connection.
• Each OS is tested with every Browser, with every Plug-in, and with every Connection.
• Each Connection is tested with every Browser, with every Plug-in, and with every OS.

9.3.2 In Parameter Order

Tai and Lei [8] have given an algorithm called in parameter order (IPO) to generate a test suite for pairwise coverage of input variables. The algorithm generates a test suite that satisfies pairwise coverage for the values of the first two parameters.
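Returning to the web example of Section 9.3.1, steps 4 to 6, including the cycling rule for left-over Plug-in levels, can be sketched as follows (the L9 runs are copied from Table 9.7, and the cycling order is the one described in step 5):

```python
# L9(3^4) orthogonal array runs from Table 9.7, levels numbered 1..3
L9 = [(1,1,1,1), (1,2,2,2), (1,3,3,3), (2,1,2,3), (2,2,3,1),
      (2,3,1,2), (3,1,3,2), (3,2,1,3), (3,3,2,1)]

browser    = {1: "Netscape", 2: "IE", 3: "Mozilla"}
plugin     = {1: "Realplayer", 3: "Mediaplayer"}   # level 2 is left over
os_name    = {1: "Windows", 2: "Linux", 3: "Macintosh"}
connection = {1: "LAN", 2: "PPP", 3: "ISDN"}

# Fill the left-over Plug-in levels by cycling through its valid values.
cycle = ["Realplayer", "Mediaplayer"]
filled = 0
tests = []
for b, p, o, c in L9:
    if p in plugin:
        pv = plugin[p]
    else:                       # left-over level: take the next cycled value
        pv = cycle[filled % len(cycle)]
        filled += 1
    tests.append((browser[b], pv, os_name[o], connection[c]))
```

The nine resulting tuples reproduce the test cases of Table 9.9.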
Then the test suite is extended by the algorithm to satisfy pairwise coverage for the values of the third parameter, and it continues to do so for the values of each additional parameter until all parameters are included in the test suite. The algorithm runs in three phases, namely initialization, horizontal growth, and vertical growth, in that order. In the initialization phase, test cases are generated to cover the first two input variables. In the horizontal growth phase, the existing test cases are extended with the values of the other input variables. In the vertical growth phase, additional test cases are created such that the test suite satisfies pairwise coverage for the values of the new variables. In order to use the IPO test generation technique, one can follow the steps described below. Assume that there are n variables denoted by {pi | 1 ≤ i ≤ n} and that a dash (—) denotes an unspecified value of a variable.

Algorithm: In Parameter Order

Input: Parameters pi and their domains D(pi) = {v1, v2, . . . , vq}, where i = 1, . . . , n.
Output: A test suite T satisfying pairwise coverage.

Initialization Phase:

Step 1: For the first two parameters p1 and p2, generate the test suite
    T := {(v1, v2) | v1 and v2 are values of p1 and p2, respectively}

Step 2: If n = 2, stop. Otherwise, for i = 3, 4, . . . , n, repeat steps 3 and 4.

Horizontal Growth Phase:

Step 3: Let D(pi) = {v1, v2, . . . , vq}. Create a set
    πi := {pairs between values of pi and all values of p1, p2, . . .
, pi−1} If |T| ≤ q, then { for 1 ≤ j ≤ |T|, extend the jth test in T by adding values vj and remove from π i pairs covered by the extended test} else { for 1 ≤ j ≤ q, extend the jth test in T by adding value vj and remove from π i pairs covered by the extended test; for q < j ≤ |T|, extend the jth test in T by adding one value of pi such that the resulting test covers the most numbers of pairs in π i, and remove from π i pairs covered by the extended test }; Vertical Growth Phase: Step 4: Let T := (empty set) and |πi|> 0; for each pair in π i (let the pairs contain value w of pk, 1 ≤ k < i, and values u of pi) { if (T contains a test with — as the value of pk and u as the value of pi) 242 CHAPTER 9 FUNCTIONAL TESTING modify this test by replacing the — with w; else add a new test to T that has w as the value of pk, u as the value of pi, and — as the value of every other parameter; }; T := T ∪ T ; The test cases may contain — values after the generation of test suite T . If pi is the last parameter, each — value pk, 1 ≤ k ≤ i, is replaced with any value of pk. Otherwise, these — values are replaced with parameter values in the horizontal growth phase for pi+1 as follows: Assuming that value v of pi+1 is chosen for the horizontal growth of a test that contains — as the value for pk, 1 ≤ k ≤ i. If there are uncovered pairs involving v and some values of pk, the — for pk is replaced with one of these values of pk. Otherwise, the — for pk is replaced with any value of pk. Example: Consider the system S in Figure 9.7 which has three input parameters X, Y , and Z. Assume that a set D, a set of input test data values, has been selected for each input variable such that D(X) = {True, False}, D(Y ) = {0, 5}, and D(Z) = {P , Q, R}. The total number of possible test cases is 2 × 2 × 3 = 12, but the IPO algorithm generates six test cases. Let us apply step 1 of the algorithm. 
Step 1: Generate a test suite consisting of four test cases with pairwise coverage for the first two parameters X and Y:

        T = { (True, 0),
              (True, 5),
              (False, 0),
              (False, 5) }

Step 2: i = 3 > 2; therefore, steps 3 and 4 must be executed.

Step 3: Now D(Z) = {P, Q, R}. Create a set π_3 := {pairs between values of Z and values of X, Y}, which is

        π_3 = { (True, P),  (True, Q),  (True, R),
                (False, P), (False, Q), (False, R),
                (0, P),     (0, Q),     (0, R),
                (5, P),     (5, Q),     (5, R) }

Since Z has three values, we have q = 3 and |T| = 4 > q = 3. We extend (True, 0), (True, 5), and (False, 0) by adding P, Q, and R, respectively. Next, we remove the pairs (True, P), (True, Q), (False, R), (0, P), (5, Q), and (0, R) from the set π_3 because these pairs are covered by the partially extended test suite. The extended test suite T and π_3 become

        T = { (True, 0, P),
              (True, 5, Q),
              (False, 0, R),
              (False, 5, —) }

        π_3 = { (True, R), (False, P), (False, Q), (0, Q), (5, P), (5, R) }

Now we need to select one of P, Q, and R for (False, 5). If we add P to (False, 5), the extended test (False, 5, P) covers two missing pairs, (False, P) and (5, P). If we add Q to (False, 5), the extended test (False, 5, Q) covers only one missing pair, (False, Q). If we add R to (False, 5), the extended test (False, 5, R) covers only one missing pair, (5, R). Therefore, the algorithm will choose (False, 5, P) as the fourth test case. Remove the pairs (False, P) and (5, P) from the set π_3 because these pairs are covered by the partially extended test suite. Now the extended test suite T and π_3 become

        T = { (True, 0, P),
              (True, 5, Q),
              (False, 0, R),
              (False, 5, P) }

        π_3 = { (True, R), (False, Q), (0, Q), (5, R) }

Step 4: So far the tests in T have not yet covered the four pairs in π_3, namely, (True, R), (False, Q), (0, Q), and (5, R).
The algorithm will generate a set T′ = {(True, —, R), (False, —, Q)} from the first two pairs of π_3, that is, (True, R) and (False, Q). The algorithm changes the test case (False, —, Q) to (False, 0, Q), without adding a new test case, to cover the next pair (0, Q). The algorithm then modifies the test case (True, —, R) to (True, 5, R), again without adding a new test case, to cover the pair (5, R). The union T ∪ T′ gives the six pairwise test cases as follows:

        T = { (True, 0, P),
              (True, 5, Q),
              (False, 0, R),
              (False, 5, P),
              (False, 0, Q),
              (True, 5, R) }

9.4 EQUIVALENCE CLASS PARTITIONING

An input domain may be too large for all its elements to be used as test input (Figure 9.8a). However, the input domain can be partitioned into a finite number of subdomains for selecting test inputs. Each subdomain is known as an equivalence class (EC), and it serves as a source of at least one test input (Figure 9.8b). The objective of equivalence partitioning is to divide the input domain of the system under test into classes, or groups, of inputs. All the inputs in the same class have a similar effect on the system under test [14, 15].

An EC is a set of inputs that the system treats identically when the system is tested. It represents certain conditions, or predicates, on the input domain. An input condition on the input domain is a predicate over the values of the input domain. A valid input to a system is an element of the input domain that is expected to return a nonerror value. An invalid input is an input that is expected to return an error value. Input conditions are used to partition the input domain into ECs for the purpose of selecting inputs.

Guidelines for EC Partitioning Equivalence classes can be derived from an input domain by a heuristic technique. One can approximate the ECs by identifying classes for which different program behaviors are specified. Identification of ECs becomes easier with experience.
Myers suggests the following guidelines to identify ECs [16].

1. An input condition specifies a range [a, b]: Identify one EC for a ≤ X ≤ b and two other classes, for X < a and for X > b, to test the system with invalid inputs.

2. An input condition specifies a set of values: Create one EC for each element of the set and one EC for an invalid member. For example, if the input is selected from a set of N items, then N + 1 ECs are created: (i) one EC for each element of the set, {M1}, {M2}, ..., {MN}, and (ii) one EC for elements outside the set {M1, M2, ..., MN}.

3. An input condition specifies each individual value: If the system handles each valid input differently, then create one EC for each valid input. For example, if the input is from a menu, then create one EC for each menu item.

4. An input condition specifies the number of valid values (say N): Create one EC for the correct number of inputs and two ECs for invalid inputs, one for zero values and one for more than N values. For example, if a program can accept 100 natural numbers for sorting, then three ECs are created: (i) one for 100 valid inputs of natural numbers, (ii) one for no input value, and (iii) one for more than 100 natural numbers.

5. An input condition specifies a "must-be" value: Create one EC for the must-be value and one EC for something that is not a must-be value. For example, if the first character of a password must be a numeric character, then we are required to generate two ECs: (i) one for valid values, {pswd | the first character of pswd has a numeric value}, and (ii) one for invalid values, {pswd | the first character of pswd is not numeric}.

6. Splitting of an EC: If elements in a partitioned EC are handled differently by the system, then split the EC into smaller ECs.

Figure 9.8 (a) Too many test inputs; (b) one input selected from each subdomain.
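The first two guidelines can be sketched in code. This is an illustrative encoding only; the function names and the (label, class) representation are assumptions, not notation from the text:

```python
# Sketch of Myers's first two guidelines for deriving equivalence
# classes. Each class is represented as a (validity label, class) pair;
# this representation is an illustrative assumption.

def classes_for_range(a, b):
    """Guideline 1: a range [a, b] yields one valid EC (a <= X <= b)
    and two invalid ECs (X < a and X > b), each given as a predicate."""
    return [
        ("valid",   lambda x: a <= x <= b),
        ("invalid", lambda x: x < a),
        ("invalid", lambda x: x > b),
    ]

def classes_for_set(members):
    """Guideline 2: a set of N values yields N valid singleton ECs
    plus one invalid EC for anything outside the set."""
    ecs = [("valid", {m}) for m in members]
    ecs.append(("invalid", "any value outside " + repr(sorted(members))))
    return ecs

# The AGI range from the example that follows in the text:
ecs = classes_for_range(1, 29500)
print(len(ecs))                               # 3 classes for the range
print(len(classes_for_set(["a", "b", "c"])))  # 3 + 1 = 4 classes
```

A value such as 100 satisfies the valid predicate, while 0 falls into the X < a class, mirroring the EC1/EC2 split derived for the AGI example below.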
Identification of Test Cases from ECs Having identified the ECs of an input domain of a program, test cases for each EC can be identified by the following steps:

Step 1: Assign a unique number to each EC.

Step 2: For each EC with valid input that has not been covered by test cases yet, write a new test case covering as many uncovered ECs as possible.

Step 3: For each EC with invalid input that has not been covered by test cases, write a new test case that covers one and only one of the uncovered ECs.

In summary, the advantages of EC partitioning are as follows:

• A small number of test cases are needed to adequately cover a large input domain.
• One gets a better idea about the input domain being covered with the selected test cases.
• The probability of uncovering defects with the selected test cases based on EC partitioning is higher than that with a randomly chosen test suite of the same size.
• The EC partitioning approach is not restricted to input conditions alone; the technique may also be used for output domains.

Example: Adjusted Gross Income. Consider a software system that computes income tax based on adjusted gross income (AGI) according to the following rules:

If AGI is between $1 and $29,500, the tax due is 22% of AGI.
If AGI is between $29,501 and $58,500, the tax due is 27% of AGI.
If AGI is between $58,501 and $100 billion, the tax due is 36% of AGI.

TABLE 9.10 Generated Test Cases to Cover Each Equivalence Class

Test Case Number   Test Value     Expected Result                  Equivalence Class Being Tested
TC1                $22,000        $4,840                           EC1
TC2                $46,000        $12,420                          EC3
TC3                $68,000        $24,480                          EC4
TC4                $-20,000       Rejected with an error message   EC2
TC5                $150 billion   Rejected with an error message   EC5

In this case, the input domain is from $1 to $100 billion. There are three input conditions in the example:

1. $1 ≤ AGI ≤ $29,500.
2. $29,501 ≤ AGI ≤ $58,500.
3. $58,501 ≤ AGI ≤ $100 billion.
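A minimal sketch of the computation under test follows; the function name `income_tax` and the rejection behavior for out-of-range AGI are illustrative assumptions, not part of the stated requirements:

```python
def income_tax(agi):
    """Tax due per the three AGI rules. Inputs outside [$1, $100 billion]
    model the invalid classes and are rejected (an assumption here)."""
    if agi < 1 or agi > 100 * 10**9:
        raise ValueError("AGI out of range")
    if agi <= 29500:
        return 0.22 * agi   # first bracket: 22%
    if agi <= 58500:
        return 0.27 * agi   # second bracket: 27%
    return 0.36 * agi       # third bracket: 36%

# One representative input per valid EC, as in Table 9.10:
for agi in (22000, 46000, 68000):
    print(agi, round(income_tax(agi)))
```

Running it on the Table 9.10 inputs yields $4,840, $12,420, and $24,480, matching the expected results; the $-20,000 and $150 billion inputs raise the error, matching TC4 and TC5.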
First we consider condition 1, namely, $1 ≤ AGI ≤ $29,500, to derive two ECs:

EC1: $1 ≤ AGI ≤ $29,500; valid input.
EC2: AGI < $1; invalid input.

Then we consider condition 2, namely, $29,501 ≤ AGI ≤ $58,500, to derive one EC:

EC3: $29,501 ≤ AGI ≤ $58,500; valid input.

Finally, we consider condition 3, namely, $58,501 ≤ AGI ≤ $100 billion, to derive two ECs:

EC4: $58,501 ≤ AGI ≤ $100 billion; valid input.
EC5: AGI > $100 billion; invalid input.

Note that each condition was considered separately in the derivation of the ECs. Conditions are not combined to select ECs. Five test cases are generated to cover the five ECs, as shown in Table 9.10.

In the EC partitioning technique, a single test input is arbitrarily selected to cover a specific EC. We also need to generate specific test inputs by considering the extremes, either inside or outside of the defined EC partitions. This leads us to the next technique, known as boundary value analysis, which focuses on the boundary of the ECs to identify test inputs.

9.5 BOUNDARY VALUE ANALYSIS

The central idea in boundary value analysis (BVA) is to select test data near the boundary of a data domain so that data both within and outside an EC are selected. BVA produces test inputs near the boundaries to find failures caused by incorrect implementation of the boundaries. Boundary conditions are predicates that apply directly on and around the boundaries of input ECs and output ECs. In practice, designers and programmers tend to overlook boundary conditions. Consequently, defects tend to be concentrated near the boundaries between ECs. Therefore, test data are selected on or near a boundary. In that sense, the BVA technique is an extension and refinement of the EC partitioning technique [17]. In the BVA technique, the boundary conditions for each EC are analyzed in order to generate test cases.
Guidelines for BVA As in the case of EC partitioning, the ability to develop high-quality, effective test cases using BVA requires experience. The guidelines discussed below are applicable to both input conditions and output conditions. They are useful in identifying high-quality test cases, by which we mean test cases that can reveal defects in a program.

1. The EC specifies a range: If an EC specifies a range of values, then construct test cases by considering the boundary points of the range and points just beyond the boundaries of the range. For example, let an EC specify the range −10.0 ≤ X ≤ 10.0. This would result in test data {−9.9, −10.0, −10.1} and {9.9, 10.0, 10.1}.

2. The EC specifies a number of values: If an EC specifies a number of values, then construct test cases for the minimum and the maximum values of that number. In addition, select a value smaller than the minimum and a value larger than the maximum. For example, if the specification of a student dormitory states that a housing unit can be shared by one to four students, then test cases that include 1, 4, 0, and 5 students would be developed.

3. The EC specifies an ordered set: If the EC specifies an ordered set, such as a linear list, table, or sequential file, then focus attention on the first and last elements of the set.

Example: Let us consider the five ECs identified in our previous example to compute income tax based on AGI. The BVA technique results in test data as follows for each EC. The redundant data points may be eliminated.

EC1: $1 ≤ AGI ≤ $29,500. This would result in values of $1, $0, $−1, $1.50 and $29,499.50, $29,500, $29,500.50.

EC2: AGI < $1. This would result in values of $1, $0, $−1, $−100 billion.

EC3: $29,501 ≤ AGI ≤ $58,500. This would result in values of $29,500, $29,500.50, $29,501, $58,499, $58,500, $58,500.50, $58,501.
EC4: $58,501 ≤ AGI ≤ $100 billion. This would result in values of $58,500, $58,500.50, $58,501, $100 billion, $101 billion.

EC5: AGI > $100 billion. This would result in values of $100 billion, $101 billion, $10,000 billion.

Remark. Should we test for an AGI value of $29,500.50 (i.e., between the partitions), and if so, what should be the result? Since we have not been told whether the decimal values are actually possible, the best decision to make is to test for this value and report the result.

9.6 DECISION TABLES

A major limitation of EC-based testing is that it considers each input separately; the technique does not consider combining conditions. Different combinations of equivalence classes can be tried by using a new technique, based on the decision table, to handle multiple inputs. Decision tables have been used for many years as a useful tool to model software requirements and design decisions. The decision table is a simple, yet powerful notation to describe complex systems, from library information management systems to embedded real-time systems [18].

The general structure of a decision table is shown in Table 9.11. It comprises a set of conditions (or causes) and a set of effects (or results) arranged in the form of a column on the left of the table. In the second column, next to each condition, we have its possible values: yes (Y), no (N), and don't care (a dash). To the right of the values column, we have a set of rules. For each combination of the three conditions {C1, C2, C3}, there exists a rule from the set {R1, R2, ..., R8}. Each rule comprises a yes, no, or don't care response and contains an associated list of effects {E1, E2, E3}. Then, for each relevant effect, an effect sequence number specifies the order in which the effect should be carried out if the associated set of conditions is satisfied. For example, if C1 and C2 are true but C3 is not true, then E3 should be followed by E1.
The checksum is used for verification of the combinations the decision table represents.

TABLE 9.11 Decision Table Comprising Set of Conditions and Effects

                        Rules or Combinations
             Values     R1  R2  R3  R4  R5  R6  R7  R8
Conditions
  C1         Y, N, —    Y   Y   Y   Y   N   N   N   N
  C2         Y, N, —    Y   Y   N   N   Y   Y   N   N
  C3         Y, N, —    Y   N   Y   N   Y   N   Y   N
Effects
  E1                        2
  E2
  E3                        1
Checksum     8          1   1   1   1   1   1   1   1

Test data are selected so that each rule in a table is exercised, and the actual results are verified against the expected results. In other words, each rule of a decision table represents a test case. The steps in developing test cases using the decision table technique are as follows:

Step 1: Identify the conditions and the effects for each specification unit. A condition is a distinct input condition or an EC of input conditions. An effect is an output condition. Determine the logical relationship between the conditions and the effects.

Step 2: List all the conditions and effects in the form of a decision table. Write down the values each condition can take.

Step 3: Calculate the number of possible combinations. It is equal to the number of different values raised to the power of the number of conditions.

Step 4: Fill the columns with all possible combinations; each column corresponds to one combination of values. For each row (condition) do the following:
  1. Determine the repeating factor (RF): divide the remaining number of combinations by the number of possible values for that condition.
  2. Write RF times the first value, then RF times the next, and so forth, until the row is full.

Step 5: Reduce combinations (rules). Find indifferent combinations, place a dash, and join columns where columns are identical. While doing this, ensure that the effects are the same.

Step 6: Check covered combinations (rules). For each column, calculate the combinations it represents. A dash represents as many combinations as the condition has values.
Multiply for each dash down the column. Add up the total and compare it with the number calculated in step 3; it should be the same.

Step 7: Add effects to the columns of the decision table. Read column by column and determine the effects. If more than one effect can occur in a single combination, then assign a sequence number to each effect, thereby specifying the order in which the effects should be performed.

Step 8: Check the consistency of the decision table. The columns in the decision table are transformed into test cases.

Decision table–based testing is effective under the following conditions:

• The requirements are easily mapped to a decision table.
• The resulting decision table should not be too large. One can break down a large decision table into multiple smaller tables.
• Each column in a decision table is independent of the other columns.

Example: Let us consider the following description of a payment procedure. Consultants working for more than 40 hours per week are paid at their hourly rate for the first 40 hours and at two times their hourly rate for subsequent hours. Consultants working for less than 40 hours per week are paid for the hours worked at their hourly rates, and an absence report is produced. Permanent workers working for less than 40 hours a week are paid their salary, and an absence report is produced. Permanent workers working for more than 40 hours a week are paid their salary.

We need to describe the above payment procedure using a decision table and generate test cases from the table.

Step 1: From the above description, the conditions and effects are identified as follows:

C1: Permanent workers
C2: Worked < 40 hours
C3: Worked exactly 40 hours
C4: Worked > 40 hours

E1: Pay salary
E2: Produce an absence report
E3: Pay hourly rate
E4: Pay 2 × hourly rate

Step 2: The decision table with all the conditions and the effects is shown in Table 9.12.

Step 3: The total number of combinations is 2^4 = 16.
Step 4: The RFs for row 1, row 2, row 3, and row 4 are 16/2 = 8, 8/2 = 4, 4/2 = 2, and 2/2 = 1, respectively. Therefore, the first row is filled with eight Y followed by eight N. The second row is filled with four Y, followed by four N, and so on.

Step 5: If condition C1: Permanent workers is yes and condition C2: Worked < 40 hours is yes, then conditions C3: Worked exactly 40 hours and C4: Worked > 40 hours do not matter. Therefore, rules 1, 2, 3, and 4 can be reduced to a single rule without impacting the effects.

TABLE 9.12 Pay Calculation Decision Table with Values for Each Rule

                      Rules or Combinations
            Values  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
Conditions
  C1        Y, N    Y  Y  Y  Y  Y  Y  Y  Y  N  N   N   N   N   N   N   N
  C2        Y, N    Y  Y  Y  Y  N  N  N  N  Y  Y   Y   Y   N   N   N   N
  C3        Y, N    Y  Y  N  N  Y  Y  N  N  Y  Y   N   N   Y   Y   N   N
  C4        Y, N    Y  N  Y  N  Y  N  Y  N  Y  N   Y   N   Y   N   Y   N
Effects
  E1
  E2
  E3
  E4

If condition C1: Permanent workers is yes and condition C2: Worked < 40 hours is no, then conditions C3: Worked exactly 40 hours and C4: Worked > 40 hours do not matter. Therefore, rules 5, 6, 7, and 8 can be reduced to a single rule; permanent workers get paid regardless.

If condition C1: Permanent workers is no and condition C2: Worked < 40 hours is yes, then conditions C3: Worked exactly 40 hours and C4: Worked > 40 hours are immaterial. Therefore, rules 9, 10, 11, and 12 can be reduced to a single rule without impacting the effects.

If conditions C1: Permanent workers and C2: Worked < 40 hours are no but condition C3: Worked exactly 40 hours is yes, then rules 13 and 14 can be reduced to a single rule. Rules 15 and 16 stand as they are. In summary, the 16 rules can be reduced to a total of 6 rules, which are shown in Table 9.13.

Step 6: The checksums for columns 1, 2, 3, 4, 5, and 6 are 4, 4, 4, 2, 1, and 1, respectively, as shown in Table 9.14. The total checksum is 16, which is the same as calculated in step 3.

Step 7: In this step, the effects are included for each column (rule).
For the first column, if the conditions C1: Permanent workers and C2: Worked < 40 hours are satisfied, then the employee must be paid and an absence report must be generated; therefore, E1: Pay salary and E2: Produce an absence report are marked as 1 and 2 in the decision table, respectively, the 1 and 2 indicating the order in which the effects are expected. The final decision table with effects is shown in Table 9.14. Note that for column 6 no effects are marked.

Step 8: A test case purpose can be generated from column 1, which can be described as follows: if an employee is a permanent worker and worked less than 40 hours per week, then the system should pay his or her salary and generate an absence report. Similarly, other test cases can be generated from the rest of the columns.

TABLE 9.13 Pay Calculation Decision Table after Column Reduction

                      Rules or Combinations
            Values  1  2  3  4  5  6
Conditions
  C1        Y, N    Y  Y  N  N  N  N
  C2        Y, N    Y  N  Y  N  N  N
  C3        Y, N    —  —  —  Y  N  N
  C4        Y, N    —  —  —  —  Y  N
Effects
  E1
  E2
  E3
  E4

TABLE 9.14 Decision Table for Payment Calculation

                      Rules or Combinations
            Values  1  2  3  4  5  6
Conditions
  C1        Y, N    Y  Y  N  N  N  N
  C2        Y, N    Y  N  Y  N  N  N
  C3        Y, N    —  —  —  Y  N  N
  C4        Y, N    —  —  —  —  Y  N
Effects
  E1                1  1
  E2                2     2
  E3                      1  1  1
  E4                            2
Checksum    16      4  4  4  2  1  1

9.7 RANDOM TESTING

In the random testing approach, test inputs are selected randomly from the input domain of the system. We explain the idea of random testing with a simple example of computing √X, where X is an integer. Suppose that the system will be used in an environment where the input X takes on all values from the interval [1, 10^8] with equal likelihood and that the result must be accurate to within 2 × 10^−4. In order to test this program, one can generate uniformly distributed pseudorandom integers within the interval [1, 10^8]. Then we execute the program on each of these inputs t and obtain the output z_t. For each t, we compute z_t^2 and compare it with t.
If any of the outputs fails to be within 2 × 10^−4 of the desired result, the program must be fixed and the test repeated. Based on the above example, random testing can be summarized as a four-step procedure [19]:

Step 1: The input domain is identified.

Step 2: Test inputs are selected independently from the domain.

Step 3: The system under test is executed on these inputs. The inputs constitute a random test set.

Step 4: The results are compared to the system specification. The test is a failure if any input leads to incorrect results; otherwise it is a success.

Random testing corresponds to simple random sampling from the input domain [20]. If the distribution of the selected inputs (step 2) is the same as the distribution of inputs in the expected-use scenario (the operational profile), then statistical estimates for the reliability of the program can be obtained from the test outcomes. Random testing gives us the advantage of easily estimating software reliability from test outcomes. Test inputs are randomly generated according to an operational profile, and failure times are recorded. The data obtained from random testing can then be used to estimate reliability. Other testing methods cannot be used in this way to estimate software reliability.

A large number of test inputs are typically required to obtain meaningful statistical results. Consequently, some kind of automation is required to generate a large number of inputs for random testing. For effective generation of a large set of inputs for statistical estimation, one needs to know the operational profile of the system. On the other hand, the expected results (step 4) are usually not obvious. Computing expected outcomes becomes difficult if the inputs are randomly chosen. Therefore, the technique requires good test oracles to ensure the adequate evaluation of test results. A test oracle is a mechanism that verifies the correctness of program outputs.
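The four-step procedure, applied to the square-root example, might be sketched as follows. Here `math.sqrt` stands in for the implementation under test, and the relative-tolerance check is one illustrative way to encode the 2 × 10^-4 accuracy requirement:

```python
import math
import random

def system_under_test(x):
    # Stand-in for the implementation under test; a real test would
    # call the program being checked instead of the library routine.
    return math.sqrt(x)

def random_test(trials=1000, seed=7):
    random.seed(seed)                      # reproducible sampling
    failures = []
    for _ in range(trials):
        t = random.randint(1, 10**8)       # Step 2: sample the input domain
        z = system_under_test(t)           # Step 3: execute on the input
        if abs(z * z - t) > 2e-4 * t:      # Step 4: compare with the spec
            failures.append(t)             # record any failing input
    return failures

print(len(random_test()))  # number of failing inputs observed
```

For a correct implementation the failure list comes back empty; a test oracle, discussed next, is what makes the step 4 comparison possible in general.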
The term test oracle was coined by William E. Howden [21]. An oracle provides a method to (i) generate expected results for the test inputs and (ii) compare the expected results with the actual results of execution of the implementation under test (IUT). In other words, it consists of two parts: a result generator to obtain expected results and a comparator. Four common types of oracles are as follows [22]:

• Perfect Oracle: In this scheme, the system (IUT) is tested in parallel with a trusted system that accepts every input specified for the IUT and always produces the correct result. A trusted system is a defect-free version of the IUT.

• Gold Standard Oracle: A previous version of an existing application system is used to generate expected results, as shown in Figure 9.9.

• Parametric Oracle: An algorithm is used to extract some parameters from the actual outputs and compare them with the expected parameter values, as shown in Figure 9.10.

• Statistical Oracle: This is a special case of a parametric oracle. In a statistical oracle, statistical characteristics of the actual test results are verified.

Figure 9.9 Gold standard oracle.

Figure 9.10 Parametric oracle.

Moreover, the actual test results are random in the case of randomized software and random testing. Therefore, it is not possible to give an exact expected value. In this scheme, the expected statistical characteristics are compared with the actual test results. A statistical oracle does not check the actual output but only some characteristics of it. Therefore, a statistical oracle cannot decide whether or not a single test case passes.
If a failure occurs, it cannot be attributed to a single test case; rather, the entire group of test cases is credited with the detection. The decision of a statistical oracle is not always correct; in other words, at best the probability of a correct decision can be given.

Figure 9.11 shows the structure of a statistical oracle [23]. It consists of a statistical analyzer and a comparator. The statistical analyzer computes various characteristics, which may be modeled as random variables, and delivers them to the comparator. The comparator computes the empirical sample mean and the empirical sample variance of its inputs. Furthermore, expected values and properties of the characteristics are computed by the comparator based on the distributional parameters of the random test input.

Adaptive Random Testing In adaptive random testing the test inputs are selected from the randomly generated set in such a way that they are evenly spread over the entire input domain. The goal is to select a small number of test inputs to detect the first failure. A number of random test inputs are generated; then the "best" one among them is selected. We need to make sure the selected new test input is not too close to any of the previously selected ones. That is, the selected test inputs should be distributed as spaced out as possible. An adaptive random testing technique proposed by Chen et al. [24] keeps two sets, namely, T and C, as follows:

• The executed set T is the set of distinct test inputs that have been selected and executed without revealing any failure.
• The candidate set C is a set of test inputs that are randomly selected.

Initially the set T is empty, and the first test input is randomly selected from the input domain. The set T is then incrementally updated with the selected element from the set C and executed until a failure is revealed.
Figure 9.11 Statistical oracle.

From the set C, an element that is farthest away from all the elements in the set T is selected as the next test input. The criterion "farthest away" can be defined as follows. Let T = {t_1, t_2, ..., t_n} be the executed set and C = {c_1, c_2, ..., c_k} be the candidate set such that C ∩ T = ∅. The criterion is to select the element c_h such that, for all j ∈ {1, 2, ..., k} and j ≠ h,

    min_{1 ≤ i ≤ n} dist(c_h, t_i) ≥ min_{1 ≤ i ≤ n} dist(c_j, t_i)

where dist is defined as the Euclidean distance. In an m-dimensional input domain, for inputs a = (a_1, a_2, ..., a_m) and b = (b_1, b_2, ..., b_m),

    dist(a, b) = √( Σ_{i=1}^{m} (a_i − b_i)^2 )

The rationale of this criterion is to spread the test inputs evenly by maximizing the minimum distance between the next test input and the already executed test cases. It should be noted that there are various ways to construct the candidate set C, giving rise to various versions of adaptive random testing. For example, a new candidate set of size 10 can be constructed each time a test input is selected. Empirical study shows that adaptive random testing does outperform ordinary random testing by 50% [24]. In the above comparison the performance metric is the size of the test suite used to detect the first failure.

9.8 ERROR GUESSING

Error guessing is a test case design technique in which a test engineer uses experience to (i) guess the types and probable locations of defects and (ii) design tests specifically to reveal the defects. For example, if memory is allocated dynamically, then a good place to look for errors is the portion of the code after the allocated memory is used; there is a possibility that unused memory is not deallocated.
An experienced test engineer can ask the question: Are all the allocated memory blocks correctly deallocated?

Though experience is of much use in guessing errors, it is useful to add some structure to the technique. It is good to prepare a list of the types of errors that can be uncovered. The error list can aid us in guessing where errors may occur. Such a list should be maintained from experience gained in earlier test projects. The following are the critical areas of the code where defects are most likely to be found:

• Different portions of the code have different complexity. One can measure code complexity by means of cyclomatic complexity. Portions of the code with high cyclomatic complexity are likely to have defects. Therefore, it is productive to concentrate more effort on those portions of the code.

• Code that has been recently added or modified can potentially contain defects. The probability of inadvertently introducing defects with the addition and modification of code is high.

• Portions of code with a prior defect history are likely to be error prone. Such code blocks are likely to remain defective, because of the clustering tendency of defects, despite efforts to remove the defects by rewriting the code.

• Parts of a system where new, unproven technology has been used are likely to contain defects. For example, if the code has been automatically generated from a formal specification of the system, then there is a higher possibility of defects embedded in the code.

• Portions of the code for which the functional specification has been loosely defined can be more defective.

• Code blocks that have been produced by novice developers can be defective. If some developers have not been careful during coding in the past, then any code written by these developers should be examined in greater detail.

• Code segments for which a developer may have a low confidence level should receive more attention.
The developers know the internal details of the system better than anyone else. Therefore, they should be quizzed about their comfort levels, and more test effort should be put on those areas where a developer feels less confident about their work.
• Areas where the quality practices have been poor should receive additional attention. An example of poor quality practice is not adequately testing a module at the unit level. Another example of poor quality practice is not performing code review for a critical part of a module.
• A module that involved many developers should receive more test effort. If several developers worked on a particular part of the code, there is a possibility of misunderstanding among the developers and, therefore, a good possibility of errors in those parts of the code.

9.9 CATEGORY PARTITION

The category partition method (CPM) is a generalization and formalization of a classical functional testing approach. The reader is reminded of the two steps of the classical functional testing approach: (i) partition the input domain of the functional unit to be tested into equivalence classes and (ii) select test data from each EC of the partition. The CPM [25] is a systematic, specification-based methodology that uses an informal functional specification to produce a formal test specification. The test designer’s key job is to develop categories, which are defined to be the major characteristics of the input domain of the function under test. Each category is partitioned into ECs of inputs called choices. The choices in each category must be disjoint, and together the choices in each category must cover the input domain. In a later paper [26], Grochtmann and Grimm extend this approach by capturing the constraints in a tree structure to reduce the number of impossible test cases.
The main advantage of this approach is the creation of a formal test specification written in languages such as the test specification language (TSL) [27], Z [28, 29], or the testing and test control notation (TTCN) [30] to represent an informal or natural-language functional specification. The formal test specification gives the test engineer a logical way to control test generation and is easily modified to accommodate changes or mistakes in the functional specification. The use of Z allows more flexibility in the specification of constraints and more formality in the representation. The category partition testing method comprises the following steps:

Step 1: Analyze the Specification. The method begins with the decomposition of a functional specification into functional units that can be separately tested. For each functional unit, identify the following:
• Parameters of the functional unit
• Characteristics of each parameter, that is, the elementary characteristics of the parameters that affect the execution of the unit
• Objects in the environment whose state might affect the operation of the functional unit
• Characteristics of each environment object
Parameters are the explicit input to a functional unit, and environment conditions are the state characteristics of the system at the time of execution of a functional unit.

Step 2: Identify Categories. A category is a classification of the major properties (or characteristics) of a parameter or an environmental condition. Consider, for instance, a program PSORT that reads an input file F containing a variable-length array of values of arbitrary type. The expected output is a permutation of the input with values sorted according to some total ordering criterion. The environmental condition for PSORT is the status of F, which can be classified into three categories, namely, Status of F = Does Not Exist, Status of F = Exists But Empty, and Status of F = Exists and Nonempty.
The properties of the input parameter categories for PSORT are as follows: the array size, the type of elements, the minimum element value, the maximum element value, and the positions in the array of the maximum and minimum values. Categories can be derived directly from the functional specification, implicit design information, or the intuition of the test engineer. Often categories are derived from preconditions and type information about the input parameters and system state components.

Step 3: Partition Categories into Choices. Partition each category into distinct choices that include all the different kinds of values that are possible for the category. Each choice is an equivalence class of values assumed to have identical properties as far as testing and error detection capability are concerned. The choices must be disjoint and together cover the category. While the categories are derived from a functional specification, the choices can be based on the specification and the test engineer’s past experience of designing effective test cases, such as error guessing. In the sorting program PSORT, for example, one possible way to partition the category array size is size = 0, size = 1, 2 ≤ size ≤ 100, and size > 100. These choices are based primarily on experience with likely errors. The selection of a single choice from each category determines a test frame, which is the basis for constructing the actual test cases. A test frame consists of a set of choices from the specification, with each category contributing either zero or one choice. Since the choices in different categories frequently interact with each other in ways that affect the resulting test cases, the choices can be annotated with constraints (see next step) to indicate these relations. In the absence of constraints, the number of potential test frames is the product of the numbers of choices in the categories, and this is likely to be very large.
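As a small illustration of Steps 2 and 3, the test frames for PSORT can be enumerated as the cross product of one choice per category. The following Python sketch is our own, not part of the method’s definition; the “element type” choices are invented for illustration:

```python
from itertools import product

# Categories and their choices for PSORT (category names from the text;
# the "element type" choices are hypothetical examples).
categories = {
    "file status": ["does not exist", "exists but empty", "exists and nonempty"],
    "array size": ["size = 0", "size = 1", "2 <= size <= 100", "size > 100"],
    "element type": ["integer", "string"],
}

def test_frames(categories):
    """Yield every combination of one choice from each category."""
    names = list(categories)
    for combo in product(*(categories[n] for n in names)):
        yield dict(zip(names, combo))

frames = list(test_frames(categories))
# Without constraints, the number of frames is the product of the
# numbers of choices: 3 * 4 * 2 = 24.
```

Step 4’s constraints would then act as a filter over `frames`, discarding impossible combinations (e.g., a nonzero array size when the file does not exist).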
Step 4: Determine Constraints among Choices. Constraints are restrictions among the choices within different categories that can interact with one another. A typical constraint specifies that a choice from one category cannot occur together in a test frame with certain choices from another category. Choices and constraints are derived from the natural-language functional specification but can often be specified by formal methods, thus making their analysis easier to automate. With a careful specification of constraints, the number of potential test frames can be reduced to a manageable number.

Step 5: Formalize and Evaluate the Test Specification. Specify the categories, choices, and constraints using a specification technique, such as TSL, Z, or TTCN, that is compatible with a test generation tool that produces test frames. Most test generation tools also provide automated techniques for evaluating the internal consistency of the formal specification. This evaluation often discovers errors or inconsistencies in the specification of constraints and sometimes leads to the discovery of errors in the source functional specification.

Step 6: Generate and Validate Test Cases. The final step in the test production process is to transform the generated test frames into executable test cases. If the test specification includes postconditions that must be satisfied, then the tool verifies the postconditions. In case a reference implementation is available, the test cases can be validated by executing them and checking their results against the reference implementation. The validation of test cases is a labor-intensive process in the absence of a reference implementation [31].

9.10 SUMMARY

This chapter began with an introduction to the concept of functional testing, which consists of (i) precise identification of the domain of each input and each output variable, (ii) selection of values from a data domain having important properties, and (iii) combination of values of different variables.
Next, we examined four different types of variables, namely, numeric variables, arrays, substructures, and strings. Only functionally related subsets of input variables are considered for combination, to reduce the total number of input combinations [3, 4]. In this way, the total number of combinations of special values of the variables is reduced. Then, we discussed the scope and complexity of functional testing. Next, we introduced seven testing techniques, as summarized in the following:

• Pairwise Testing: Pairwise testing requires that, for a given number of input parameters to the system, each possible combination of values for any pair of parameters be covered by at least one test case. It is a special case of combinatorial testing, which requires that n-way combinations be tested, n = 1, 2, . . . , N, where N is the total number of parameters in the system. We presented two popular pairwise test selection techniques, namely, OA and IPO.
• Equivalence Class Partitioning: The aim of EC partitioning is to divide the input domain of the system under test into classes (or groups) of test cases that have a similar effect on the system. Equivalence partitioning is a systematic method for identifying interesting classes of input conditions to be tested. We provided guidelines to identify (i) ECs and (ii) test data from the ECs that need to be executed.
• Boundary Value Analysis: The boundary conditions for each EC are analyzed in order to generate test cases. Boundary conditions are predicates that apply directly on, above, and beneath the boundaries of input ECs and output ECs. We explained useful guidelines that apply to the input and output conditions of the identified ECs and help in identifying quality test cases.
• Decision Tables: This is a simple but powerful technique to describe a complex system. A decision table comprises a set of conditions placed above a set of effects (or results) to perform, in matrix form.
There exists a rule for each combination of conditions. Each rule comprises a Y (yes), N (no), or — (don’t care) response and contains an associated list of effects. Thus, each rule of the decision table represents a test case.
• Random Testing: The random testing technique can be summarized as a four-step procedure:
1. The input domain is identified.
2. The test inputs are independently selected from this domain.
3. The system under test is executed on a random set of inputs.
4. The results are compared with the system specification. The test is a failure if any input leads to incorrect results; otherwise it is a success.
The procedure requires a test oracle to ensure adequate evaluation of the test results. A test oracle tells us whether a test case passes or not. We discussed four standard types of oracles, namely, perfect, gold, parametric, and statistical. We also examined the concept of adaptive random testing, in which test inputs are selected from a randomly generated set. These test inputs are evenly spread over the entire input domain in order to detect the first failure with a small number of test inputs.
• Error Guessing: This is a test case design technique in which the experience of the testers is used to (i) guess the types and locations of errors and (ii) design tests specifically to expose those errors. Error guessing is an ad hoc approach to designing test cases.
• Category Partition: The category partition methodology requires a test engineer to divide the functional specification into independent functional units that can be tested separately. This method identifies the input parameters and the environmental conditions, known as categories, by relying upon the guidance of the test engineer. Next, the test engineer decomposes each identified category into mutually exclusive choices that describe the partition of the input within the category.
Then the categories, choices, and constraints are specified using a formal specification language such as TSL. The specification is then processed to produce a set of test frames for the functional unit. The test engineer examines the test frames and determines whether any changes to the test specification are necessary. Finally, the test frames are converted to executable test cases.

LITERATURE REVIEW

The IPO algorithm discussed in this chapter has been implemented in a tool called PairTest [8], which provides a graphical interface to make the tool easy to use. A test tool called automatic efficient test generator (AETG) was created at Telcordia and published in the following two articles:

D. M. Cohen, S. R. Dalal, M. L. Fredman, and G. C. Patton, “The AETG System: An Approach to Testing Based on Combinatorial Design,” IEEE Transactions on Software Engineering, Vol. 23, No. 7, July 1997, pp. 437–444.

D. M. Cohen, S. R. Dalal, J. Parelius, and G. C. Patton, “The Combinatorial Design Approach to Automatic Test Generation,” IEEE Software, Vol. 13, No. 5, September 1996, pp. 83–89.

The strategy used in this tool starts with an empty test suite and adds one test case at a time. The tool produces a number of candidate test cases according to a greedy algorithm and then selects the one that covers the most uncovered pairs. Few topics related to software testing techniques seem to be more controversial than the question of whether it is efficient to use randomly generated test input data. The relative effectiveness of random testing (i.e., random selection of test inputs from the entire input domain) versus partition testing (i.e., dividing the input domain into nonoverlapping subdomains and selecting one test input from each subdomain) has been the subject of many research papers. The interested reader is referred to the following papers for some of these discussions:

T. Chen and Y.
Yu, “On the Expected Number of Failures Detected by Subdomain Testing and Random Testing,” IEEE Transactions on Software Engineering, Vol. 22, 1996, pp. 109–119.

J. W. Duran and S. C. Ntafos, “An Evaluation of Random Testing,” IEEE Transactions on Software Engineering, Vol. 10, July 1984, pp. 438–444.

R. Hamlet and R. Taylor, “Partition Testing Does Not Inspire Confidence,” IEEE Transactions on Software Engineering, Vol. 16, 1990, pp. 1402–1411.

W. Gutjahr, “Partition Testing versus Random Testing: The Influence of Uncertainty,” IEEE Transactions on Software Engineering, Vol. 25, 1999, pp. 661–674.

E. J. Weyuker and B. Jeng, “Analyzing Partition Testing Strategies,” IEEE Transactions on Software Engineering, Vol. 17, 1991, pp. 703–711.

The boundary value analysis technique produces test inputs near the boundaries to find failures caused by incorrect implementation of the boundaries. However, boundary value analysis can be adversely affected by coincidental correctness, that is, the system produces the expected output, but for the wrong reason. The article by Rob M. Hierons (“Avoiding Coincidental Correctness in Boundary Value Analysis,” ACM Transactions on Software Engineering and Methodology, Vol. 15, No. 3, July 2006, pp. 227–241) describes how boundary value analysis can be adapted in order to reduce the likelihood of coincidental correctness. The work described by the author can be seen as a formalization and generalization of work suggested by Lori A. Clarke, Johnette Hassell, and Debra J. Richardson (“A Close Look at Domain Testing,” IEEE Transactions on Software Engineering, Vol. 8, No. 4, July 1982, pp. 380–390).

REFERENCES

1. W. E. Howden. Functional Program Testing. IEEE Transactions on Software Engineering, March 1980, pp. 162–169.
2. W. E. Howden. Functional Program Testing and Analysis. McGraw-Hill, New York, 1987.
3. W. E. Howden. Applicability of Software Validation Techniques to Scientific Programs.
ACM Transactions on Software Engineering and Methodology, July 1980, pp. 307–320.
4. W. E. Howden. A Functional Approach to Program Testing and Analysis. IEEE Transactions on Software Engineering, October 1986, pp. 997–1005.
5. D. R. Kuhn, D. R. Wallace, and A. M. Gallo, Jr. Software Fault Interactions and Implications for Software Testing. IEEE Transactions on Software Engineering, June 2004, pp. 418–421.
6. M. Grindal, J. Offutt, and S. F. Andler. Combination Testing Strategies: A Survey. Journal of Software Testing, Verification, and Reliability, September 2005, pp. 97–133.
7. M. S. Phadke. Quality Engineering Using Robust Design. Prentice-Hall, Englewood Cliffs, NJ, 1989.
8. K. C. Tai and Y. Lei. A Test Generation Strategy for Pairwise Testing. IEEE Transactions on Software Engineering, January 2002, pp. 109–111.
9. L. Copeland. Object-Oriented Testing. Software Quality Engineering, STAR East, Orlando, FL, May 2001.
10. A. S. Hedayat, N. J. A. Sloane, and J. Stufken. Orthogonal Arrays: Theory and Applications, Springer Series in Statistics. Springer-Verlag, New York, 1999.
11. R. K. Roy. Design of Experiments Using the Taguchi Approach: 16 Steps to Product and Process Improvement. Wiley, New York, 2001.
12. B. Mandl. Orthogonal Latin Squares: An Application of Experiment Design to Compiler Testing. Communications of the ACM, October 1985, pp. 1054–1058.
13. N. J. A. Sloane. A Library of Orthogonal Arrays. Information Sciences Research Center, AT&T Shannon Labs, 2001. Available at http://www.research.att.com/∼njas/oadir/.
14. B. Beizer. Software Testing Techniques, 2nd ed. Van Nostrand Reinhold, New York, 1990.
15. D. Richardson and L. Clarke. A Partition Analysis Method to Increase Program Reliability. In Proceedings of the 5th International Conference on Software Engineering, San Diego, CA, IEEE Computer Society Press, Piscataway, March 1981, pp. 244–253.
16. G. Myers. The Art of Software Testing, 2nd ed.
Wiley, New York, 2004.
17. B. Beizer. Black Box Testing. Wiley, New York, 1995.
18. D. Thomas. Agile Programming: Design to Accommodate Change. IEEE Software, May/June 2005, pp. 14–16.
19. R. Hamlet. Random Testing. In Encyclopedia of Software Engineering, J. Marciniak, Ed. Wiley, New York, 1994, pp. 970–978.
20. W. Cochran. Sampling Techniques. Wiley, New York, 1977.
21. W. E. Howden. A Survey of Dynamic Analysis Methods. In Software Testing and Validation Techniques, 2nd ed., E. Miller and W. E. Howden, Eds. IEEE Computer Society Press, Los Alamitos, CA, 1981.
22. R. V. Binder. Testing Object-Oriented Systems: Models, Patterns, and Tools. Addison-Wesley, Reading, MA, 2000.
23. J. Mayer and R. Guderlei. Test Oracles Using Statistical Methods. SOQUA/TECOS, Erfurt, Germany, 2004, pp. 179–189.
24. T. Y. Chen, H. Leung, and I. K. Mak. Adaptive Random Testing. In Advances in Computer Science—ASIAN 2004, Higher-Level Decision Making, 9th Asian Computing Science Conference, Dedicated to Jean-Louis Lassez on the Occasion of His 5th Cycle Birthday, Chiang Mai, Thailand, December 8–10, 2004, M. J. Maher, Ed., Lecture Notes in Computer Science, Vol. 3321. Springer, Berlin/Heidelberg, 2004, pp. 320–329.
25. T. J. Ostrand and M. J. Balcer. The Category-Partition Method for Specifying and Generating Functional Tests. Communications of the ACM, June 1988, pp. 676–686.
26. M. Grochtmann and K. Grimm. Classification Trees for Partition Testing. Journal of Software Testing, Verification, and Reliability, June 1993, pp. 63–82.
27. M. Balcer, W. Hasling, and T. J. Ostrand. Automatic Generation of Test Scripts from Formal Test Specifications. In Proceedings of the Third Symposium on Software Testing, Analysis, and Verification, Key West, FL, ACM Press, New York, December 1989, pp. 210–218.
28. G. Laycock. Formal Specification and Testing: A Case Study. Journal of Software Testing, Verification, and Reliability, March 1992, pp. 7–23.
29. P. Ammann and J. Offutt.
Using Formal Methods to Derive Test Frames in Category-Partition Testing. Paper presented at the Ninth Annual Conference on Computer Assurance, Gaithersburg, MD, June 1994, pp. 69–80.
30. C. Willcock, T. Deiss, S. Tobies, S. Keil, F. Engler, and S. Schulz. An Introduction to TTCN-3. Wiley, New York, 2005.
31. K. Naik and B. Sarikaya. Test Case Verification by Model Checking. Formal Methods in System Design, June 1993, pp. 277–321.

Exercises

1. (a) What is the central idea in Howden’s theory of functional testing?
(b) What is a functionally identifiable substructure?
(c) All combinations of special values of input variables can lead to a large number of test cases being selected. What technique can be used to reduce the number of combinations of test cases?
2. Consider the system S in Figure 9.7, which has three input parameters X, Y, and Z. Assume that a set D of input test data values has been selected for each of the input variables such that D(X) = {True, False}, D(Y) = {0, 5}, and D(Z) = {P, Q, R}. Using the orthogonal array method discussed in this chapter, generate pairwise test cases for this system. Compare the results with the test suite generated using the IPO algorithm.
3. Discuss the drawbacks of the orthogonal array methodology compared to the IPO algorithm.
4. Consider a system S which takes n input parameters, each of which can take on m values. For this system answer the following questions:
(a) What is the maximum number of pairs a single test case for this system can cover?
(b) In the best case, how many test cases can provide full pairwise coverage?
(c) Calculate the total number of pairs the test suite must cover.
(d) Suppose that n = 13 and m = 3. What is the minimum number of test cases to be selected to achieve pairwise coverage?
5. Consider the following triangle classification system, originally used by Myers [16]: The system reads in three positive values from the standard input.
The three values A, B, and C are interpreted as representing the lengths of the sides of a triangle. The system then prints a message on the standard output saying whether the triangle is scalene, isosceles, equilateral, or right angled, if a triangle can be formed. Answer the following questions for the above program:
(a) What is the input domain of the system?
(b) What are the input conditions?
(c) Identify the equivalence classes for the system.
(d) Identify test cases to cover the identified ECs.
6. Consider again the triangle classification program with a slightly different specification: The program reads floating-point values from the standard input. The three values A, B, and C are interpreted as representing the lengths of the sides of a triangle. The program then prints a message to the standard output that states whether the triangle, if it can be formed, is scalene, isosceles, equilateral, or right angled. Determine the following for the above program:
(a) For the boundary condition A + B > C (scalene triangle), identify test cases to verify the boundary.
(b) For the boundary condition A = C (isosceles triangle), identify test cases to verify the boundary.
(c) For the boundary condition A = B = C (equilateral triangle), identify test cases to verify the boundary.
(d) For the boundary condition A² + B² = C² (right-angled triangle), identify test cases to verify the boundary.
(e) For the nontriangle case, identify test cases to explore the boundary.
(f) For nonpositive input, identify test points.
7. Consider the triangle classification specification. The system reads in three positive values from the standard input. The three values A, B, and C are interpreted as representing the lengths of the sides of a triangle. The system then prints a message to the standard output saying whether the triangle, if it can be formed, is scalene, isosceles, equilateral, or not a triangle.
Develop a decision table to generate test cases for this specification.
8. What are the advantages and disadvantages of random testing?
9. What is a test oracle? What are the differences between a parametric oracle and a statistical oracle?
10. Discuss the similarity between the decision table–based and category partition–based testing methodologies.

CHAPTER 10

Test Generation from FSM Models

The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work.
— John von Neumann

10.1 STATE-ORIENTED MODEL

Software systems can be broadly classified into two groups, namely, stateless and state-oriented systems. The actions of a stateless system do not depend on the previous inputs to the system. A compiler is an example of a stateless system because the result of compiling a program does not depend on the programs that had been previously compiled. In a state-oriented system, the response of the system to the present input depends on the past inputs to the system. A state-oriented system memorizes the sequence of inputs it has received so far in the form of a state. A telephone switching system is an example of a state-oriented system. The interpretation of digits by a telephone switch depends on the previous inputs, such as a phone going off the hook, the sequence of digits dialed, and the other keys pressed. A state-oriented system can be viewed as having a control portion and a data portion. The control portion specifies the sequences of interactions with its environment, and the data portion specifies the data to be processed and saved.
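The contrast between stateless and state-oriented behavior can be sketched in a few lines of Python (our own toy example; the class name and the seven-digit dialing plan are assumptions, not from the text):

```python
class DigitCollector:
    """State-oriented sketch of a telephone switch front end: the
    interpretation of a dialed digit depends on the inputs received
    so far (whether the phone is off hook, how many digits are dialed)."""

    def __init__(self):
        self.off_hook = False
        self.digits = ""

    def receive(self, event):
        if event == "OFFHOOK":
            self.off_hook, self.digits = True, ""
            return "dial tone"
        if self.off_hook and event.isdigit():
            self.digits += event
            # Assume a seven-digit dialing plan for this sketch.
            return "ringing" if len(self.digits) == 7 else "collecting"
        return "ignored"  # e.g., a digit pressed while the phone is on hook
```

A stateless system, by contrast, would compute its response from the current event alone; here the same digit produces "ignored", "collecting", or "ringing" depending on the inputs already memorized.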
Depending on the characteristics of systems, a system can be predominantly data oriented, predominantly control oriented, or a balanced mix of both data and control, as illustrated in Figure 10.1. In a data-dominated system, the system spends most of its time processing user requests, and the interaction sequences with the user are very simple. Therefore, the control portion is simpler compared to the data processing portion, which is more complex. This situation is depicted in Figure 10.2.

Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy. Copyright © 2008 John Wiley & Sons, Inc.

Figure 10.1 Spectrum of software systems (stateless systems versus state-oriented, or reactive, systems; the latter range from data-dominated through mixed to control-dominated systems).

Figure 10.2 Data-dominated systems.

Example. A web browsing application is an example of a data-dominated system. The system spends a significant amount of time accessing remote data by making HTTP requests and formatting the data for display. The system responds to each command input from the user, and there is not much state information that the system must remember. One need for state information is to perform the Back operation. Moreover, web browsing is not a time-dependent application, except for its dependence on the underlying Transmission Control Protocol/Internet Protocol (TCP/IP) operations.

In a control-dominated system, the system performs complex (i.e., many time-dependent and long-sequence) interactions with its user, while the amount of data being processed is relatively small. Therefore, the control portion is large, whereas the data processing functionality is small. This situation is depicted in Figure 10.3.

Example. A telephone switching system is an example of a control-dominated system.
The amount of user data processed is rather minimal. The data involved are a mapping of phone numbers to equipment details, off- and on-hook events generated by a user, the phone number dialed, and possibly some other events represented by the push of other keys on a telephone.

Figure 10.3 Control-dominated systems.

The control portion of a software system, that is, the interactions between the system and its user or environment (Figures 10.2 and 10.3), can often be modeled as a finite-state machine (FSM). We have modeled the interactions of a user with a dual-boot laptop computer in Figure 10.4.

Figure 10.4 FSM model of a dual-boot laptop computer.

Initially, the laptop is in the OFF state. When a user presses the power ON button, the system moves to the BOOT state, where it receives one of two inputs, LINUX and WINDOWS. If the user input is LINUX, then the system boots with the Linux operating system and moves to the LINUX state, whereas the WINDOWS input causes the system to boot with the Windows operating system and move to the WIN state. Whether the laptop is running Linux or Windows, the user can put the machine into a standby state. The standby state for the Linux mode is LSTND, and for the Windows mode it is WSTND. The computer can be brought back to its operating state LINUX or WIN from the standby state LSTND or WSTND, respectively, with a WAKEUP input. The laptop can be moved between the LINUX and WIN states using RESTART inputs. The laptop can be shut down using the SHUTDOWN input while it is in the LINUX or WIN state. The laptop can also be brought to the OFF state by using the power button, but we have not shown these transitions in Figure 10.4.
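The transitions just described can be written down directly as a lookup table, which is the form in which test generation algorithms typically consume an FSM. The sketch below is our own; the outputs msg0–msg10 of Figure 10.4 are omitted for brevity:

```python
# State transitions of the dual-boot laptop (Figure 10.4), keyed by
# (current state, input) -> next state.
TRANSITIONS = {
    ("OFF", "ON"): "BOOT",
    ("BOOT", "LINUX"): "LINUX",
    ("BOOT", "WINDOWS"): "WIN",
    ("LINUX", "STANDBY"): "LSTND",
    ("LSTND", "WAKEUP"): "LINUX",
    ("WIN", "STANDBY"): "WSTND",
    ("WSTND", "WAKEUP"): "WIN",
    ("LINUX", "RESTART"): "WIN",
    ("WIN", "RESTART"): "LINUX",
    ("LINUX", "SHUTDOWN"): "OFF",
    ("WIN", "SHUTDOWN"): "OFF",
}

def run(inputs, state="OFF"):
    """Drive the FSM with an input sequence; reject undefined inputs."""
    for i in inputs:
        if (state, i) not in TRANSITIONS:
            raise ValueError(f"no transition for input {i} in state {state}")
        state = TRANSITIONS[(state, i)]
    return state
```

For example, run(["ON", "LINUX", "STANDBY", "WAKEUP", "SHUTDOWN"]) walks the machine through BOOT, LINUX, LSTND, and LINUX, and back to OFF.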
The reader may note that for the purpose of generating test cases we do not consider the internal behavior of a system; instead we assume that the external behavior of the system has been modeled as an FSM. To be more precise, the interactions of a system with its environment are modeled as an FSM, as illustrated in Figure 10.5. Now we can make a correspondence between Figures 10.5 and 10.4. The software system block in Figure 10.5 can be viewed as the boot software running on a laptop, and the environment block in Figure 10.5 can be viewed as a user. The FSM shown in Figure 10.4 models the interactions shown by the bidirectional arrows in Figure 10.5. An FSM model of the external behavior of a system describes the sequences of input to and expected output from the system. Such a model is a prime source of test cases. In this chapter, we explain how to derive test cases from an FSM model.

Figure 10.5 Interactions between a system and its environment modeled as an FSM.

10.2 POINTS OF CONTROL AND OBSERVATION

A point of control and observation (PCO) is a well-designated point of interaction between a system and its users. We use the term users in a broad sense to include all entities, including human users and other software and hardware systems, lying outside but interacting with the system under consideration. PCOs have the following characteristics:

• A PCO represents a point where a system receives input from its users and/or produces output for the users.
• There may be multiple PCOs between a system and its users.
• Even if a system under test (SUT) is a software system, for a human user a PCO may be “nearer” to the user than to the software under test. For example, a user may interact with a system via a push button, a touch screen, and so on. We want to emphasize that even if we have a software SUT, we may not have a keyboard and a monitor for interacting with the system.
• In case a PCO is a physical entity, such as a push button, a keyboard, or a speaker, there is a need to find its computer representation so that test cases can be executed automatically.

Example. Assume that we have a software system controlling a telephone switch (PBX) to provide connections between users. The SUT and the users interact via the different subsystems of a telephone. We show the user-interface details of a basic telephone to explain the concept of a PCO (Figure 10.6) and summarize those details in Table 10.1.

Figure 10.6 PCOs on a telephone (mouthpiece, keypad, hook, speaker, and ring indicator).

TABLE 10.1 PCOs for Testing a Telephone PBX

PCO             In/Out View of System   Description
Hook            In                      The system receives off-hook and on-hook events.
Keypad          In                      The caller dials a number and provides other control input.
Ring indicator  Out                     The callee receives ring indication.
Speaker         Out                     The caller receives tones (dial, fast busy, slow busy, etc.) and voice.
Mouthpiece      In                      The caller produces voice input.

Figure 10.7 FSM model of a PBX (a local phone LP and a remote phone RP connected to the PBX).

The reader may notice that even for a simple device such as a telephone we have five distinct PCOs via which a user interacts with the switching software. In real life, users interact with the switching software via these distinct PCOs, and automated test execution systems must recognize those distinct PCOs. However, to make our discussion of test case generation from FSM models simple, clear, and concise, we use fewer PCOs. We designate all the PCOs on a local phone by LP and all the PCOs on a remote phone by RP (Figure 10.7).
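The grouping of the five concrete PCOs of Table 10.1 under the abstract PCOs LP and RP can be captured as a simple mapping (a sketch of our own; the dictionary layout and function name are assumptions):

```python
# Concrete PCOs of a telephone and their in/out view (Table 10.1).
CONCRETE_PCOS = {
    "hook": "in",             # off-hook and on-hook events
    "keypad": "in",           # dialed digits and other control input
    "ring indicator": "out",  # ring indication to the callee
    "speaker": "out",         # tones and voice to the caller
    "mouthpiece": "in",       # voice input from the caller
}

# Each abstract PCO stands for all five concrete PCOs of one phone.
ABSTRACT_PCOS = {"LP": set(CONCRETE_PCOS), "RP": set(CONCRETE_PCOS)}

def pcos_with_view(abstract, view):
    """Concrete PCOs of an abstract PCO with the given in/out view."""
    return {p for p in ABSTRACT_PCOS[abstract] if CONCRETE_PCOS[p] == view}
```

Here pcos_with_view("LP", "in") yields the hook, keypad, and mouthpiece, the three points at which a test harness can apply input to the local phone.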
10.3 FINITE-STATE MACHINE

An FSM M is defined as a tuple M = (S, I, O, s0, δ, λ), where

S is a set of states,
I is a set of inputs,
O is a set of outputs,
s0 is the initial state,
δ : S × I → S is a next-state function, and
λ : S × I → O is an output function.

Note the following points related to inputs and outputs because of the importance of the concept of observation in testing a system:

• Identify the inputs and the outputs which are observed by explicitly specifying a set of PCOs. For each state transition, specify the PCO at which the input occurs and the output is observed.
• There may be many outputs occurring at different PCOs for a single input in a state.

An FSM specification of the interactions between a user and a PBX system is shown in Figure 10.8. The FSM has nine distinct states, as explained in Table 10.2.

Figure 10.8 FSM model of PBX. [figure: state transitions among the states OH, AD, SB, FB, RNG, TK, LON, RON, and IAF, labeled with input/output pairs such as LP: OFH/LP: DT and LP: #1/{LP: RT, RP: RING}.]

TABLE 10.2 Set of States in FSM of Figure 10.8

Abbreviation   Expanded Form          Meaning
OH             On hook                A phone is on hook.
AD             Add digit              The user is dialing a number.
SB             Slow busy              The system has produced a slow busy tone.
FB             Fast busy              The system has produced a fast busy tone.
RNG            Ring                   The remote phone is ringing.
TK             Talk                   A connection is established.
LON            Local on hook          The local phone is on hook.
RON            Remote on hook         The remote phone is on hook.
IAF            Idle after fast busy   The local phone is idle after a fast busy.
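The tuple definition of an FSM maps directly onto a small data structure. The following Python sketch is purely illustrative; the two-state push-button machine in it is made up for the example and is not the PBX model of Figure 10.8:

```python
# Sketch: an FSM M = (S, I, O, s0, delta, lambda) as a Python structure.
# delta and lambda are encoded as one dict: (state, input) -> (output, next state).
# The toggle machine below is a hypothetical illustration.
fsm = {
    ("OFF", "press"): ("click", "ON"),
    ("ON", "press"): ("click", "OFF"),
    ("ON", "hold"): ("beep", "OFF"),
    ("OFF", "hold"): ("silence", "OFF"),
}
s0 = "OFF"

def step(state, symbol):
    """Apply the output function (lambda) and next-state function (delta)."""
    output, next_state = fsm[(state, symbol)]
    return output, next_state

# The remaining components of the tuple, derived from the transition dict:
S = {s for (s, _) in fsm} | {ns for (_, ns) in fsm.values()}
I = {a for (_, a) in fsm}
O = {o for (o, _) in fsm.values()}

out, s1 = step(s0, "press")
print(out, s1)  # click ON
```

A table-driven encoding like this keeps δ and λ together per transition, which mirrors how state transitions are labeled with input/output pairs in the figures of this chapter.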
TABLE 10.3 Input and Output Sets in FSM of Figure 10.8

Input                      Output
OFH: Off hook              DT: Dial tone
ONH: On hook               RING: Phone ringing
#1: Valid phone number     RT: Ring tone
#2: Invalid phone number   SBT: Slow busy tone
NOI: No input              FBT: Fast busy tone
                           IT: Idle tone
                           —: Don’t care

The initial state of the FSM is OH, which appears twice in Figure 10.8 because we wanted to avoid drawing transition lines from states LON, RON, and IAF back to the first occurrence of OH at the top. There are five distinct input symbols (Table 10.3) accepted by the FSM. NOI represents user inaction; that is, the user never provides an input in a certain state. We have introduced the concept of an explicit NOI because we want to describe the behavior of a system without introducing internal events, such as timeouts. There are seven output symbols, one of which denotes a don’t care output. A don’t care output is an absence of output or an arbitrary output which is ignored by the user. There are two abstract PCOs used in the FSM of Figure 10.8. These are called LP and RP to represent a local phone used by a caller and a remote phone used by the callee, respectively. We call LP and RP abstract PCOs because each of LP and RP represents five real, distinct PCOs, as explained in Section 10.2. The input and output parts of a state transition are represented as follows:

PCOi: a/PCOj: b

where input a occurs at PCOi and output b occurs at PCOj. If a state transition produces multiple outputs, we use the notation

PCOi: a/{PCOj: b, PCOk: c}

where input a occurs at PCOi, output b occurs at PCOj, and output c occurs at PCOk. We will represent a complete transition using the following syntax:

<current state, PCOi: a/PCOj: b, next state>

or

<current state, PCOi: a/{PCOj: b, PCOk: c}, next state>

The state transition <OH, LP: OFH/LP: DT, AD> means that if the FSM is in state OH and receives input OFH (off hook) at port (PCO) LP, it produces output DT (dial tone) at the same port LP and moves to state AD (Figure 10.8).
10.4 TEST GENERATION FROM AN FSM

Given an FSM model M of the requirements of a system and an implementation IM of M, the immediate testing task is to confirm that the implementation IM behaves as prescribed by M. The testing process that verifies that an implementation conforms to its specification is called conformance testing. The basic idea in conformance testing is summarized as follows:

• Obtain sequences of state transitions from M.
• Turn each sequence of state transitions into a test sequence.
• Test IM with a set of test sequences and observe whether or not IM possesses the corresponding sequences of state transitions.
• The conformance of IM with M can be verified by carefully choosing enough state transition sequences from M.

In the following sections, first, we explain the ways to turn a state transition sequence into a test sequence. Next, we explain the process of selecting different state transition sequences.

10.5 TRANSITION TOUR METHOD

In this section, we discuss a process to generate a test sequence, or test case, Tc from a state transition sequence St of a given FSM M. Specifically, we consider transition tours, where a transition tour is a sequence of state transitions beginning and ending at the initial state. Naito and Tsunoyama [1] introduced the transition tour method for generating test cases from FSM specifications of sequential circuits. Sarikaya and Bochmann [2] were the first to observe that the transition tour method can be applied to protocol testing. An example of a transition tour obtained from Figure 10.8 is as follows: <OH, LP: OFH/LP: DT, AD>, <AD, LP: ONH/—, OH>. One can easily identify the state, input, and expected output components in the above sequence.

Figure 10.9 Interaction of test sequence with SUT. [figure: a test system executing a test sequence interacts with the SUT via PCO 1 and PCO 2.]
However, it may be noted that a test case is not merely a sequence of <input, expected output> pairs; rather, a complete test case must contain additional behavior such that it can be executed autonomously even if the SUT contains faults. A test system interacting with an SUT is shown in Figure 10.9. A test system consists of a set of test cases and a test case scheduler. The scheduler decides the test case to be executed next depending on the test case dependency constraints specified by a test designer. A test case in execution produces inputs for the SUT and receives outputs from the SUT. It is obvious that a faulty SUT may produce an output which is different from the expected output, and sometimes it may not produce any output at all. Therefore, the test system must be able to handle these exceptional cases in addition to the normal cases. This idea leads us to the following formalization of a process for designing a complete test case:

• A test case contains a sequence of input and expected output data. This information is derived from the FSM specification of the SUT.
• A test case must be prepared to receive unexpected outputs from the SUT.
• A test case must not wait indefinitely to receive an output—expected or unexpected.

Example: Transition Tour. Let us derive a test case from the state transition sequence <OH, LP: OFH/LP: DT, AD>, <AD, LP: ONH/—, OH>. It is useful to refer to a PCO of Figure 10.9, which explains an input–output relationship between the test system and an SUT. A sequence of inputs to the SUT can be obtained from its FSM model. For instance, the state transition sequence contains the input sequence {OFH, ONH}. The test system must therefore produce the output sequence {OFH, ONH} for the SUT. That is, an input in a state transition sequence of an FSM is an output of the test system at the same PCO.
An output is represented by prefixing an exclamation mark (‘!’) to an event (or message) in Figure 10.10.

1  LP !OFH
2    START(TIMER1, d1)
3    LP ?DT
4      CANCEL(TIMER1)
5      LP !ONH                PASS
6    LP ?OTHERWISE
7      CANCEL(TIMER1)
8      LP !ONH                FAIL
9    ?TIMER1
10     CANCEL(TIMER1)
11     LP !ONH                FAIL

Figure 10.10 Derived test case from transition tour.

In line 1 of Figure 10.10, LP !OFH means that the test system outputs an event OFH at PCO LP. An output produced in a state transition of an FSM M is interpreted as an expected output of an implementation IM. Sometimes a faulty implementation may produce unexpected outputs. An output of an SUT becomes an input to the test system. Therefore, an output in a state transition sequence of an FSM is an input to the test system at the same PCO. In Figure 10.10, an input is represented by prefixing a question mark (‘?’) to an event (or message). Therefore, in line 3 of Figure 10.10, LP ?DT means that the test system is ready to receive the input DT at PCO LP. Here the test system expects to receive an input DT at PCO LP, which has been specified as LP ?DT. However, a faulty SUT may produce an unexpected output instead of the expected output DT at PCO LP. In line 6, the test system is ready to receive any event other than DT at PCO LP. The reader may notice that LP ?DT in line 3 and LP ?OTHERWISE in line 6 appear at the same level of indentation and both lines have the same immediate predecessor action START(TIMER1, d1) in line 2. If an SUT fails to produce any output—expected or unexpected—then the test system and the SUT will be deadlocked, which is prevented by including a timeout mechanism in the test system. Before a test system starts to wait for an input, it starts a timer of a certain duration, as shown in line 2. The name of the timer is TIMER1 in line 2, and its timeout duration is d1.
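The branching logic of the test case in Figure 10.10 can be mimicked in ordinary code. The sketch below is a simplified, illustrative driver: the simulated SUT response and the function name are assumptions, and the timer is modeled as a None return value rather than a real clock.

```python
# Sketch of the Figure 10.10 test case: send OFH, then wait for DT,
# an unexpected event, or a timeout, and assign PASS/FAIL accordingly.
# The SUT here is a trivial simulation; a real test system would perform
# I/O at PCO LP and run an actual timer TIMER1 of duration d1.

def run_tour_test(sut_response):
    """sut_response: the event the SUT emits after OFH, or None for no
    output (None plays the role of the TIMER1 expiry in Figure 10.10)."""
    # line 1: LP !OFH -- the test system sends OFH to the SUT
    # line 2: START(TIMER1, d1) -- modeled implicitly by the None case below
    event = sut_response
    if event == "DT":          # line 3: LP ?DT, the expected output
        # lines 4-5: CANCEL(TIMER1); LP !ONH
        return "PASS"
    elif event is not None:    # line 6: LP ?OTHERWISE, any unexpected output
        # lines 7-8: CANCEL(TIMER1); LP !ONH
        return "FAIL"
    else:                      # line 9: ?TIMER1, no output before timeout
        # lines 10-11: CANCEL(TIMER1); LP !ONH
        return "FAIL"

print(run_tour_test("DT"))    # correct SUT: PASS
print(run_tour_test("SBT"))   # faulty SUT, wrong tone: FAIL
print(run_tour_test(None))    # silent SUT, timeout: FAIL
```

Whatever the SUT does, exactly one branch fires, which is the point made in the text: the test system is never deadlocked by a faulty implementation.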
If the SUT fails to produce the expected output DT or any other output within an interval of d1 after receiving input OFH at PCO LP, the test system will produce an internal timeout event called TIMER1, which will be received in line 9. One of the events specified in lines 3, 6, and 9 eventually occurs. This means that the test system is not deadlocked in the presence of a faulty SUT.

Coverage Metrics for Selecting Transition Tours. One can design one test case from one transition tour. One transition tour may not be sufficient to cover an entire FSM, unless it is a long one. Considering the imperative to simplify test design and to test just a small portion of an implementation with one test case, there is a need to design many test cases. A perpetual question is: How many test cases should one design? Therefore, there is a need to identify several transition tours from an FSM. The concept of coverage metrics is used in selecting a set of transition tours. In order to test FSM-based implementations, two commonly used coverage metrics are:

• State coverage
• Transition coverage

Transition Tours for State Coverage. To achieve this coverage criterion, we select a set of transition tours so that every state of the FSM is visited at least once. We have identified three transition tours, as shown in Table 10.4, to cover all the states of the FSM shown in Figure 10.8. One can easily obtain test sequences from the three transition tours in Table 10.4 by following the design principles explained in Section 10.5 to transform a transition tour into a test sequence. State coverage is the weakest among all the selection criteria used to generate test sequences from FSMs. The three transition tours shown in Table 10.4 cover every state of the FSM shown in Figure 10.8 at least once. However, of the 21 state transitions, only 11 are covered by the three transition tours.
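Both coverage metrics can be computed mechanically from a set of tours. The sketch below is illustrative only: it uses a made-up four-state FSM and two made-up tours in place of the PBX model of Figure 10.8.

```python
# Sketch: measure state and transition coverage achieved by a set of
# transition tours. The FSM and the tours below are hypothetical.

# Transitions as (source, input, target); outputs are irrelevant for coverage.
transitions = {
    ("S0", "a", "S1"), ("S1", "b", "S0"), ("S1", "c", "S2"),
    ("S2", "d", "S0"), ("S0", "e", "S3"), ("S3", "f", "S0"),
}
states = {s for (s, _, t) in transitions} | {t for (_, _, t) in transitions}

# Each tour is a list of transitions beginning and ending at the initial state S0.
tours = [
    [("S0", "a", "S1"), ("S1", "b", "S0")],
    [("S0", "e", "S3"), ("S3", "f", "S0")],
]

covered_transitions = {tr for tour in tours for tr in tour}
covered_states = set()
for (src, _, dst) in covered_transitions:
    covered_states.update({src, dst})

state_cov = len(covered_states) / len(states)
trans_cov = len(covered_transitions) / len(transitions)
print(f"state coverage: {state_cov:.0%}, transition coverage: {trans_cov:.0%}")
```

Here the two tours visit 3 of 4 states and 4 of 6 transitions, illustrating the gap the text describes: a tour set can achieve full state coverage long before it achieves full transition coverage.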
The 10 state transitions which have not been covered by those three transition tours are listed in Table 10.5. We next consider a stronger form of coverage criterion, namely, transition coverage.

Transition Tours for Transition Coverage. To achieve this coverage criterion, we select a set of transition tours so that every state transition of the FSM is visited at least once. We have identified nine transition tours, as shown in Table 10.6, to cover all the state transitions of the FSM shown in Figure 10.8. One can easily obtain test sequences from the nine transition tours in Table 10.6 by following the design principles explained in Section 10.5 to transform a transition tour into a test sequence.

TABLE 10.4 Transition Tours Covering All States in Figure 10.8

Serial Number   Transition Tours   States Visited
1               ; ; ; ;
2               ; ; ; ;
3               ; ; ; ;

TABLE 10.5 State Transitions Not Covered by Transition Tours of Table 10.4

Serial Number   State Transitions
1
2
3
4
5
6
7
8
9
10

TABLE 10.6 Transition Tours Covering All State Transitions in Figure 10.8

Serial Number   Transition Tours
1               ; ; ; ;
2               ; ; ; ;
3               ; ; ; ;
4               ; ; ;
5               ; ;
6               ; ; ; ; ; ; ; ;
7               ; ; ; ; ;
8               ; ;
9               ; ; ;

10.6 TESTING WITH STATE VERIFICATION

There are two functions associated with a state transition, namely, an output function (λ) and a next-state function (δ). Test cases generated using the transition tour method discussed in Section 10.5 focused on the outputs. Now we discuss a method for generating test cases by putting emphasis on both the output and the next state of every state transition of an FSM.

Figure 10.11 Conceptual model of test case with state verification. [figure: the SUT is driven from the initial state s0 to state si by a transfer sequence, the state transition under test si —a/b→ sj is exercised, a state verification sequence leads to state sk, and a reset sequence returns the SUT to s0.]

It is easy to verify outputs since they appear at PCOs, which are external to a system under test.
However, verification of the next state of a state transition is not an easy task because the concept of state is purely internal to an SUT. The next state of a state transition is verified by applying further inputs to an SUT and observing its response at the PCOs. A conceptual model of a method to generate test cases from an FSM with both output and state verification is illustrated in Figure 10.11. The five steps of the method are explained in the following, from the standpoint of testing a state transition from state si to state sj with input a.

Methodology for Testing with State Verification

Step 1: Assuming that the FSM is in its initial state, move the FSM from the initial state s0 to state si by applying a sequence of inputs called a transfer sequence, denoted by T(si). It may be noted that different states will have different transfer sequences, that is, T(si) ≠ T(sj) for i ≠ j. For state si, T(si) can be obtained from the FSM. At the end of this step, the FSM is in state si.

Step 2: In this step we apply input a to the SUT and observe its actual output, which is compared with the expected output b of the FSM. At the end of this step, a correctly implemented state transition takes the SUT to its new state sj. However, a faulty implementation can potentially take it to a state different from sj. The new state of the SUT is verified in the following step.

Step 3: Apply a verification sequence VERj to the SUT and observe the corresponding output sequence. An important property of VERj is that λ(sj, VERj) ≠ λ(s, VERj) ∀s ≠ sj. At the end of this step, the SUT is in state sk.

Step 4: Move the SUT back to the initial state s0 by applying a reset sequence RI. It is assumed that an SUT has correctly implemented a reset mechanism.

Step 5: Repeat steps 1–4 for all state transitions in the given FSM.
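The steps above compose into a single input sequence per transition under test. The sketch below shows that composition; all the concrete sequences in it are made-up placeholders, not values derived from Figure 10.8.

```python
# Sketch: compose the input sequence for testing one transition,
# following steps 1-4: transfer to s_i, apply a, verify s_j, then reset.
# The concrete sequences below are hypothetical placeholders.

def transition_test_inputs(transfer, a, verify, reset):
    """Return the flat input sequence: transfer, then a, then verify, then reset."""
    return list(transfer) + [a] + list(verify) + list(reset)

T_si  = ["x", "y"]    # step 1: transfer sequence T(s_i)
a     = "a"           # step 2: input of the transition under test
VER_j = ["v1", "v2"]  # step 3: verification sequence for next state s_j
RI    = ["r"]         # step 4: reset sequence back to s0

seq = transition_test_inputs(T_si, a, VER_j, RI)
print(seq)  # ['x', 'y', 'a', 'v1', 'v2', 'r']
```

Applying this sequence in state s0, and checking the observed outputs against the expected ones, exercises exactly one transition with state verification; step 5 then repeats the composition for every transition of the FSM.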
For a selected transition from state si to state sj with input a, the above four steps induce a transition tour defined by the input sequence T(si)@a@VERj@RI applied to the system in its initial state s0. The symbol ‘@’ represents concatenation of two sequences. Applying the test design principles discussed in Section 10.5, one can derive a test case from such transition tours. Identifying a transfer sequence T(si) out of the input sequence T(si)@a@VERj@RI for state si is a straightforward task. However, it is not trivial to verify the next state of an implementation. There are three kinds of commonly used input sequences to verify the next state of an SUT:

• Unique input–output sequence
• Distinguishing sequence
• Characterizing sequence

In the following sections, we explain the meanings of the three kinds of input sequences and the ways to generate those kinds of sequences.

10.7 UNIQUE INPUT–OUTPUT SEQUENCE

We define a unique input–output (UIO) sequence and explain an algorithm to compute UIO sequences from an FSM M = (S, I, O, s0, δ, λ), where S, I, O, s0, δ, and λ have been explained in Section 10.3. First, we extend the semantics of δ and λ as follows:

• We extend the domain of λ and δ to include strings of input symbols. For instance, for state s0 and input sequence x = a1, . . . , ak, the corresponding output sequence is denoted by λ(s0, x) = b1, . . . , bk, where bi = λ(si−1, ai) and si = δ(si−1, ai) for i = 1, . . . , k, and the final state is δ(s0, x) = sk.
• Similarly, we extend the domain and range of the transition and output functions to include sets of states. For example, if Q is a set of states and x is an input sequence, then δ(Q, x) = {δ(s, x) | s ∈ Q}.
• If x and y are two input strings, the notation x@y denotes concatenation of the two strings in that order.

Next, we define four properties of FSMs in the following. These properties are essential to the computation of UIO sequences.
Completely Specified: An FSM M is said to be completely specified if for each input a ∈ I there is a state transition defined at each state of M.

Deterministic: An FSM M is said to be deterministic if (i) for each input a ∈ I there is at most one state transition defined at each state of M and (ii) there is no internal event causing state transitions in the FSM. A timeout is an example of an internal event that can cause a state transition.

Reduced: An FSM M is said to be reduced if for any pair of states si and sj, i ≠ j, there is an input sequence y such that λ(si, y) ≠ λ(sj, y). Intuitively, an FSM is reduced if no two states are equivalent, that is, there exists an input sequence that distinguishes one state from the other.

Strongly Connected: An FSM M is said to be strongly connected if any state is reachable from any other state.

An input–output sequence y/λ(si, y) is said to be a UIO sequence for state si if and only if y/λ(si, y) ≠ y/λ(sj, y) for all sj ≠ si in M. Hsieh [3] introduced the concept of simple input–output sequences in 1971 to identify FSMs. Sabnani and Dahbura [4] were the first to coin the term UIO sequence and applied the concept to testing communication protocols. In the following, an efficient algorithm to compute UIO sequences is explained [5].

Given an FSM M, a path vector is a collection of state pairs PV = (s1/s1′, . . . , si/si′, . . . , sk/sk′) with the following two properties: (i) si and si′ denote the head and tail states, respectively, of a path, where a path is a sequence of state transitions; (ii) an identical input–output sequence is associated with all the paths in the path vector. Given a path vector PV = (s1/s1′, . . . , si/si′, . . . , sk/sk′), the initial vector (IV) is the ordered collection of head states of PV, that is, IV(PV) = (s1, . . . , si, . . . , sk). Similarly, the current vector (CV) is the ordered collection of tail states of PV, that is, CV(PV) = (s1′, . . . , si′, . . . , sk′). A path vector is said to be a singleton vector if it contains exactly one state pair. A path vector is said to be a homogeneous vector if all members of CV(PV) are identical. It may be noted that a singleton vector is also a homogeneous vector. For an n-state FSM, we define a unique initial path vector (s1/s1, . . . , si/si, . . . , sn/sn) such that a null path is associated with all state pairs. Given a path vector PV and the input–output label a/b of a transition, vector perturbation means computing a new path vector PV′ from PV and a/b. Given PV = (s1/s1′, . . . , si/si′, . . . , sk/sk′) and a transition label a/b, the perturbation of PV with respect to edge label a/b, denoted by PV′ = pert(PV, a/b), is defined as PV′ = {si/si′′ | si′′ = δ(si′, a) ∧ λ(si′, a) = b ∧ si/si′ ∈ PV}. Given a reduced FSM and its initial path vector, we can infinitely perturb all the path vectors for all transition labels. One can imagine the perturbation function PV′ = pert(PV, a/b) as an edge from a node PV to a new node PV′ with edge label a/b. In addition, given PV and a set of transition labels L, we can arrange the new |L| nodes {pert(PV, a/b) ∀a/b ∈ L} on one level. That is, all the path vectors of a given FSM can be arranged in the form of a tree with successive levels 1, 2, . . . , ∞. Such a tree is called a UIO tree. It may be noted that graphically a path vector is represented by two rows of states—the top row denotes IV(PV) and the bottom row denotes CV(PV). Theoretically, a UIO tree is a tree with infinite levels. However, we need to prune the tree based on some conditions, called pruning conditions. After each perturbation PV′ = pert(PV, a/b), we check the following conditions:

C1: CV(PV′) is a homogeneous vector or a singleton vector.
C2: On the path from the initial vector to PV′, there exists a PV′′ such that PV′ ⊆ PV′′.

While constructing a UIO tree, if one of the pruning conditions is satisfied, we declare PV′ to be a terminal path vector.
Given a UIO tree and a state si of an FSM M, si has a UIO sequence if and only if the UIO tree obtained from M has a singleton path vector ψ such that si = IV(ψ), that is, si is the initial vector of the singleton ψ. The input–output sequence associated with the edges of the path from the initial path vector to a singleton path vector is a UIO sequence of the state found in the initial vector of the singleton. In the following, we present an algorithm to compute a finite UIO tree from an FSM.

Algorithm. Generation of UIO Tree
Input: M = (S, I, O, s0, δ, λ) and L.
Output: UIO tree.
Method: Execute the following steps:

Step 1: Let Ψ be the set of path vectors in the UIO tree. Initially, Ψ contains the initial path vector, marked as nonterminal.
Step 2: Find a nonterminal member ψ ∈ Ψ which has not been perturbed. If no such member exists, then the algorithm terminates.
Step 3: Compute ψ′ = pert(ψ, ai/bi) and add ψ′ to Ψ, ∀ai/bi ∈ L. Mark ψ as perturbed and update the UIO tree.
Step 4: If a pert(ψ, ai/bi) computed in step 3 satisfies termination condition C1 or C2, then mark that ψ′ as a terminal node.
Step 5: Go to step 2.

Example. Consider the FSM G1 = (S, I, O, A, δ, λ) of Figure 10.12, where S = {A, B, C, D} is the set of states, I = {0, 1} is the set of inputs, O = {0, 1} is the set of outputs, A is the initial state, δ : S × I → S is the next-state function, and λ : S × I → O is the output function. The set of distinct transition labels is given by L = {0/0, 0/1, 1/0}.

Figure 10.12 Finite-state machine G1. (From ref. 5. © 1997 IEEE.) [figure: a four-state machine over states A, B, C, and D with transition labels 0/0, 0/1, and 1/0.]

The initial path vector is given by ψ1 = (A/A, B/B, C/C, D/D) and is perturbed by using all members of L as follows:

(A/B, B/A) = pert(ψ1, 0/0)
(C/D, D/D) = pert(ψ1, 0/1)
(A/D, B/B, C/A, D/C) = pert(ψ1, 1/0)

Now, we represent the UIO tree in graphical form.
We put three edges from the initial path vector ψ1 = (A/A, B/B, C/C, D/D) to path vectors (A/B, B/A), (C/D, D/D), and (A/D, B/B, C/A, D/C) with transition labels 0/0, 0/1, and 1/0, respectively, as shown in Figure 10.13.

Figure 10.13 UIO tree for G1 in Figure 10.12. (From ref. 5. © 1997 IEEE.)

The reader may recall that the new path vector (C/D, D/D) is a terminal node because its current vector (D, D) is a homogeneous vector. Therefore, path vector (C/D, D/D) is not perturbed further, whereas the other two path vectors (A/B, B/A) and (A/D, B/B, C/A, D/C) are further perturbed, as shown in Figure 10.13. A complete UIO tree is shown in Figure 10.13. Figure 10.13 is redrawn in the form of Figure 10.14 by highlighting a UIO sequence of minimal length for each state. We have identified four singleton path vectors, namely, (A/A), (B/D), (C/A), and (D/A), and highlighted the corresponding paths leading up to them from the initial path vector. We show the UIO sequences of the four states in Table 10.7.

Figure 10.14 Identification of UIO sequences on UIO tree of Figure 10.13.
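The UIO property can be checked directly against G1. The sketch below encodes G1's transitions (as read off the perturbations of the initial path vector above: 0/0 on A→B and B→A, 0/1 on C→D and D→D, and 1/0 on A→D, B→B, C→A, and D→C) and verifies by brute force that each sequence listed in Table 10.7 is indeed a UIO sequence.

```python
# Sketch: verify UIO sequences for G1 by brute force.
# Transition dict: (state, input) -> (output, next state).
G1 = {
    ("A", "0"): ("0", "B"), ("B", "0"): ("0", "A"),
    ("C", "0"): ("1", "D"), ("D", "0"): ("1", "D"),
    ("A", "1"): ("0", "D"), ("B", "1"): ("0", "B"),
    ("C", "1"): ("0", "A"), ("D", "1"): ("0", "C"),
}
STATES = ["A", "B", "C", "D"]

def out_seq(fsm, state, inputs):
    """Extended output function: lambda(state, x) for an input string x."""
    outputs = []
    for a in inputs:
        o, state = fsm[(state, a)]
        outputs.append(o)
    return "".join(outputs)

def is_uio(fsm, state, inputs):
    """True iff inputs/lambda(state, inputs) differs from every other state."""
    mine = out_seq(fsm, state, inputs)
    return all(out_seq(fsm, s, inputs) != mine for s in STATES if s != state)

# The candidate sequences of Table 10.7:
for st, x in [("A", "010"), ("B", "010"), ("C", "1010"), ("D", "11010")]:
    print(st, x, out_seq(G1, st, x), is_uio(G1, st, x))
```

Note that the same input sequence 010 serves both A and B: what makes each a UIO sequence is the pairing of the input string with that state's unique output string.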
TABLE 10.7 UIO Sequences of Minimal Lengths Obtained from Figure 10.14

State   Input Sequence   Output Sequence
A       010              000
B       010              001
C       1010             0000
D       11010            00000

10.8 DISTINGUISHING SEQUENCE

Given an FSM M = (S, I, O, s0, δ, λ), an input sequence x is said to be a distinguishing sequence if and only if λ(si, x) ≠ λ(sj, x) for all si, sj ∈ S with si ≠ sj. Therefore, every state produces a unique output sequence in response to a distinguishing input sequence. The concept of distinguishing sequences was studied to address the machine identification problem in the early days of computing [6–9]. The machine identification problem is as follows: Given an implementation of an FSM in the form of a black box and the number of states of the FSM, derive the state table (equivalently, a state machine) which accurately represents the behavior of the black box. In contrast, the objective of a checking experiment is to decide whether a given black-box implementation of an FSM behaves according to a given state table representation of the FSM.

Given an FSM M = (S, I, O, s0, δ, λ) with |S| = n, a state block is defined as a set B of multisets of S such that the sum of the cardinalities of the multisets of B is equal to n. Thus, a state block may contain just one multiset with all the states of M. Also, a state block may contain n multisets with exactly one element in each of them. Assuming that S = {A, B, C, D}, some examples of state blocks are given in Table 10.8.

Given a state block B = {W1, . . . , Wi, . . . , Wm} and an input symbol a ∈ I, we define a function B′ = dpert(B, a) such that for each multiset member Wi = (wi1, wi2, . . . , wik) of B we obtain one or more members of B′ as follows:

• If two states of Wi produce different outputs in response to input a, then their next states are put in different multisets of B′.
• The next states of all those states in Wi which produce the same output in response to input a are put in the same multiset of B′.
TABLE 10.8 Examples of State Blocks

State block 1   {(ABCD)}
State block 2   {(AB), (CC)}
State block 3   {(AB), (CD)}
State block 4   {(A), (B), (C), (D)}
State block 5   {(A), (A), (B), (C)}

Given a reduced FSM, the initial state block consists of just one element containing the set of states of the FSM. Next, we can infinitely perturb all the state blocks for all the input symbols using the dpert() function. One can view the perturbation function B′ = dpert(B, a) as an edge from a node B to a new node B′ with edge label a. Given a state block and the set of inputs I, we can arrange the new |I| nodes {dpert(B, a), ∀a ∈ I} at the same level. All the state blocks of a given FSM can be arranged in the form of a tree with successive levels 1, 2, . . . , ∞. Such a tree is called a DS tree. Theoretically, a distinguishing sequence (DS) tree is a tree with infinite levels. However, a finite-level tree would serve our purpose. A finite-level tree is obtained by pruning the tree based on some conditions, called pruning conditions, defined using the following terms:

D1: A state block B′ is a homogeneous state block if at least one multiset member Wi of B′ has repeated states.
D2: A state block B′ is a singleton state block if all elements of B′ have exactly one state member.
D3: On the path from the initial state block to the current state block B′, there exists a state block B′′ such that B′ ⊆ B′′.

While constructing a DS tree, if one of the above three pruning conditions is satisfied, we declare state block B′ to be a terminal state block. Given a DS tree of an FSM M, M has a distinguishing sequence if and only if there exists a singleton state block in the DS tree. The sequence of inputs from the initial state block to a singleton state block is a DS of machine M. In the following, we present an algorithm to compute a finite DS tree from an FSM.

Algorithm. Generation of DS Tree
Input: M = (S, I, O, s0, δ, λ).
Output: DS tree.
Method: Execute the following steps:

Step 1: Let Ψ be the set of state blocks in the DS tree. Initially, Ψ contains the initial state block, marked as nonterminal.
Step 2: Find a nonterminal member ψ ∈ Ψ which has not been perturbed. If no such member exists, then the algorithm terminates.
Step 3: Compute ψ′ = dpert(ψ, a) and add ψ′ to Ψ, ∀a ∈ I. Mark ψ as perturbed and update the DS tree.
Step 4: If a dpert(ψ, a) computed in step 3 satisfies the termination condition D1, D2, or D3, then mark that ψ′ as a terminal node.
Step 5: Go to step 2.

Example. Let us consider an FSM G2 = (S, I, O, A, δ, λ), shown in Figure 10.15, where S = {A, B, C, D} is the set of states, I = {0, 1} is the set of inputs, O = {0, 1} is the set of outputs, A is the initial state, δ : S × I → S is the next-state function, and λ : S × I → O is the output function.

Figure 10.15 Finite-state machine G2. [figure: a four-state machine over states A, B, C, and D with transition labels 0/0, 0/1, 1/0, and 1/1.]

Figure 10.16 Distinguishing sequence tree for G2 in Figure 10.15. [figure: the initial state block (ABCD) is perturbed by input 0 into (AB)(DD) and by input 1 into (DC)(BA); (DC)(BA) is perturbed by input 0 into (DD)(BA) and by input 1 into (A)(C)(D)(B).]

The DS tree for G2 is shown in Figure 10.16. The initial state block contains just one multiset {A, B, C, D}, which has been represented as the initial node (ABCD). We have perturbed the initial node (ABCD) with the members of the input set I = {0, 1}. Perturbation of the initial state block with input 0 produces the state block (AB)(DD) for the following reasons:

• States A and B produce the same output 0 with input 0 and move the machine to states B and A, respectively.
• States C and D produce the same output 1 with input 0 and move the FSM to the same state D.

Since (AB)(DD) is a homogeneous state block, it is a terminal node in the DS tree and, thus, it is not perturbed further. Perturbation of the initial state block with input 1 produces the state block (DC)(BA) for the following reasons:

• States A and D produce the same output 0 with input 1 and move the machine to states D and C, respectively.
• States B and C produce the same output 1 with input 1 and move the machine to states B and A, respectively.

We obtain a homogeneous state block (DD)(BA) by perturbing the state block (DC)(BA) with input 0, and we obtain a singleton state block (A)(C)(D)(B) by perturbing state block (DC)(BA) with input 1. The complete DS tree is shown in Figure 10.16. Therefore, the input sequence 11, which takes the DS tree from its initial state block to its only singleton state block, is a DS for FSM G2. The output sequences of FSM G2 in Figure 10.15 in response to the distinguishing input sequence 11 in all four states are shown in Table 10.9.

TABLE 10.9 Outputs of FSM G2 in Response to Input Sequence 11 in Different States

Present State   Output Sequence
A               00
B               11
C               10
D               01

10.9 CHARACTERIZING SEQUENCE

For FSMs which do not possess a DS, it is still possible to determine uniquely the state of the FSM. The FSM shown in Figure 10.17 does not have a DS because there is no singleton state block in its DS tree, as shown in Figure 10.18. The W-method was introduced for FSMs that do not possess a DS [9, 10]. A characterizing set of a state si is a set of input sequences such that, when each sequence is applied to the implementation at state si, the set of output sequences generated by the implementation uniquely identifies state si. Each sequence of the characterizing set of state si distinguishes si from a group of states. Therefore, applying all of the sequences in the characterizing set distinguishes state si from all other states. For an FSM-based specification, a set that consists of characterizing sets of every state is called the W-set = {W1, W2, . . . , Wp} of the FSM. The members of the W-set are called characterizing sequences of the given FSM. The basic test procedure for testing a state transition (si, sj, a/b) using the W-method follows.
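Before walking through the W-method, it is worth confirming mechanically the Section 10.8 claim that 11 is a distinguishing sequence for G2. The sketch below encodes G2's transitions as transcribed from the perturbation discussion above (with input 0: A→B/0, B→A/0, C→D/1, D→D/1; with input 1: A→D/0, D→C/0, B→B/1, C→A/1) and checks the defining property of a DS.

```python
# Sketch: check the distinguishing-sequence property for G2 of Figure 10.15.
# Transition dict: (state, input) -> (output, next state), transcribed from
# the perturbation discussion of the DS tree in Figure 10.16.
G2 = {
    ("A", "0"): ("0", "B"), ("B", "0"): ("0", "A"),
    ("C", "0"): ("1", "D"), ("D", "0"): ("1", "D"),
    ("A", "1"): ("0", "D"), ("B", "1"): ("1", "B"),
    ("C", "1"): ("1", "A"), ("D", "1"): ("0", "C"),
}
STATES = ["A", "B", "C", "D"]

def out_seq(fsm, state, inputs):
    """Extended output function: lambda(state, x) for an input string x."""
    outs = []
    for a in inputs:
        o, state = fsm[(state, a)]
        outs.append(o)
    return "".join(outs)

def is_distinguishing(fsm, inputs):
    """True iff every state yields a distinct output sequence for inputs."""
    responses = [out_seq(fsm, s, inputs) for s in STATES]
    return len(set(responses)) == len(STATES)

print(is_distinguishing(G2, "11"))  # True: 11 is a DS, as in Table 10.9
print(is_distinguishing(G2, "0"))   # False: 0 merges A with B and C with D
print({s: out_seq(G2, s, "11") for s in STATES})
```

The per-state responses to 11 reproduce Table 10.9 exactly (A: 00, B: 11, C: 10, D: 01), so observing the output of 11 suffices to identify the current state of G2.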
Figure 10.17 FSM that does not possess a distinguishing sequence. (From ref. 11. © 1994 IEEE.)

Figure 10.18 DS tree for the FSM of Figure 10.17.

Testing Transition (si, sj, a/b) Using W-Method

Repeat the following steps for each input sequence of the W-set:

Step 1: Assuming that the SUT is in its initial state, bring the SUT from its initial state s0 to state si by applying a transfer sequence T(si), as shown in Figure 10.11.
Step 2: Apply the input a and verify that the SUT generates the output b.
Step 3: Apply a verification sequence from the W-set to the SUT and verify that the corresponding output sequence is as expected. Assume that the SUT is in state sk at the end of this step.
Step 4: Move the SUT back to the initial state s0 by applying a reset sequence RI in state sk.

Example: Characterizing Sequences. Consider the FSM specification M = <S, I, O, A, δ, λ>, shown in Figure 10.17, where S = {A, B, C, D} is the set of states, I = {a, b} is the set of inputs, O = {x, y} is the set of outputs, A is the initial state, δ : S × I → S is the next-state function, and λ : S × I → O is the output function. Kohavi [9] used multiple experiments to construct the W-set for this FSM. Consider the input sequence W1 = aba. The output sequences generated by W1 for each state of the FSM are shown in Table 10.10. The output sequence generated by the input sequence W1 can identify whether the state of an SUT was B or C before W1 was applied. This is because state B leads to the output sequence yyy, whereas state C leads to the output sequence yyx. However, W1 cannot identify the state of an SUT if the FSM is in A or D, because the output sequences are xyx for both states, as shown in Table 10.10.
Now let us examine the response of the SUT to the input sequence W2 = ba for each state. The output sequences generated by W2 for each state of the FSM are shown in Table 10.11.

TABLE 10.10 Output Sequences Generated by FSM of Figure 10.17 as Response to W1

Starting State    Output Generated by W1 = aba
A                 xyx
B                 yyy
C                 yyx
D                 xyx

TABLE 10.11 Output Sequences Generated by FSM of Figure 10.17 as Response to W2

Starting State    Output Generated by W2 = ba
A                 yx
B                 yx
C                 yy
D                 yy

The FSM implementation generates distinct output sequences as a response to W2 if the SUT was at A or D, as shown in Table 10.11. This is because states A and D lead to the distinct output sequences yx and yy, respectively. Therefore, the W-set for the FSM consists of two input sequences: W-set = {W1, W2}, where W1 = aba and W2 = ba. The transfer sequences for all the states are T(B) = bb, T(C) = ba, and T(D) = b. The reset input sequence is RI = bababa. The input sequence for testing the state transition (D, A, a/x) is given in Table 10.12. In Table 10.12, the columns labeled "message to SUT" and "message from SUT" represent the input message sent to the SUT and the expected output message generated by the SUT, respectively. The current state and the expected next state of the SUT are shown in the columns labeled "current state" and "next state," respectively. During testing, the inputs are applied to the SUT in the order denoted by the column "step." In the first step a transfer sequence is applied to bring the SUT to state D. In step 2, the transition is tested. Then W1 = aba is applied to verify the state (steps 3, 4, and 5). At this point, the state transition is only partially tested, since W1 alone is not enough to identify the state of an implementation. The reset sequence RI = bababa (steps 6–11) is applied, followed by the transfer sequence T(D) = b (step 12), to bring the SUT into the initial state and then into state D, respectively.
The test is repeated for the same transition by using W2 = ba (steps 13–21). If all the outputs received from the SUT are as defined by the FSM, the state transition test is completed successfully. If the output of the SUT is not the expected response at any step, an error is detected in the SUT.

TABLE 10.12 Test Sequences for State Transition (D, A, a/x) of FSM in Figure 10.17

Step    Current State    Next State    Message to SUT    Message from SUT
Apply T(D)
1       A                D             b                 y
Test Transition (D, A, a/x)
2       D                A             a                 x
Apply W1
3       A                A             a                 x
4       A                D             b                 y
5       D                A             a                 x
Apply RI
6       A                D             b                 y
7       D                A             a                 x
8       A                D             b                 y
9       D                A             a                 x
10      A                D             b                 y
11      D                A             a                 x
Apply T(D)
12      A                D             b                 y
Test Transition (D, A, a/x)
13      D                A             a                 x
Apply W2
14      A                D             b                 y
15      D                A             a                 x
Apply RI
16      A                D             b                 y
17      D                A             a                 x
18      A                D             b                 y
19      D                A             a                 x
20      A                D             b                 y
21      D                A             a                 x

Source: From ref. 11.

Four major methods—transition tours, distinguishing sequences, characterizing sequences, and unique input–output (UIO) sequences—have been discussed for the generation of tests from an FSM. A question that naturally comes to mind concerns the effectiveness of these techniques, that is, the types of discrepancies detected by each of these methods. Sidhu and Leung [12] present a fault model based on the Monte Carlo simulation technique for estimating the fault coverage of the above four test generation methods. The authors introduced 10 different classes of randomly faulty specifications, each obtained by randomly altering a given specification. For example, class I faults consist of randomly altering an output operation in a given specification. The authors conclude that all the methods, except for the transition tour method, can detect all single faults, as opposed to multiple faults, introduced in a given specification. In addition, it is also shown that distinguishing, characterizing, and UIO sequences have the same fault detection capability.
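The claims embodied in Tables 10.10–10.12 can be checked mechanically. The following Python sketch transcribes the two response tables and the three transitions of Figure 10.17 that Table 10.12 actually exercises, then replays the 21-step input sequence for the transition (D, A, a/x); the variable names are illustrative.

```python
# Responses of the FSM of Figure 10.17, transcribed from
# Tables 10.10 and 10.11.
resp_W1 = {'A': 'xyx', 'B': 'yyy', 'C': 'yyx', 'D': 'xyx'}  # W1 = aba
resp_W2 = {'A': 'yx',  'B': 'yx',  'C': 'yy',  'D': 'yy'}   # W2 = ba

# W1 alone cannot separate A from D, but the pair of responses to
# (W1, W2) identifies every state uniquely: the W-set property.
signatures = {s: (resp_W1[s], resp_W2[s]) for s in 'ABCD'}

# The transitions of Figure 10.17 used by the test of (D, A, a/x),
# read off the rows of Table 10.12: (state, input) -> (next state, output)
delta = {('A', 'a'): ('A', 'x'),
         ('A', 'b'): ('D', 'y'),
         ('D', 'a'): ('A', 'x')}

T_D, W1, W2, RI = 'b', 'aba', 'ba', 'bababa'
# Steps 1-21 of Table 10.12: T(D), the tested input a, W1, RI,
# then T(D), a, W2, RI once more.
test_input = T_D + 'a' + W1 + RI + T_D + 'a' + W2 + RI

state, outputs = 'A', ''
for symbol in test_input:
    state, out = delta[(state, symbol)]
    outputs += out
```

Replaying the sequence from the initial state A yields exactly the column of expected outputs in Table 10.12 and leaves the machine back in state A.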
Another study, similar to the one by Sidhu and Leung, is reported by Dahbura and Sabnani for the UIO sequence method [13].

10.10 TEST ARCHITECTURES

An overview of four abstract test architectures developed by the ISO is presented in this section. The ISO documents [14–16] and Linn [17] and Rayner [18] provide more detail about the test architectures. The ISO test architectures are based on the Open System Interconnection (OSI) reference architecture, which consists of a hierarchical layer structure of entities. The purpose of an entity at layer N is to provide certain services, called N-services, to its upper layer entity. It uses the service provided by the N − 1 layer while isolating the implementation details of the lower layer entities from the upper layers. Peer N entities communicate with each other through an N − 1 service provider by exchanging N-protocol data units [(N)-PDUs], as shown in Figure 10.19. Interactions of an N-entity with its upper and lower entities are defined by N and N − 1 abstract service primitives (ASPs).

Figure 10.19 Abstraction of N entity in OSI reference architecture.

The abstract test architectures are described in terms of the inputs to the IUT, which can be given in a controlled manner, and the corresponding outputs from the IUT, which are observed. Specifically, an abstract test architecture is described by identifying the points closest to the IUT where controls and observations are specified. The abstract test architectures are classified into two major categories: local and external. The local test architectures are characterized by observation and control being specified in terms of events occurring within the SUT at the layer boundaries immediately below and above the IUT, as shown in Figure 10.20. On the other
hand, external test architectures are characterized by the observation and control of the events taking place externally from the SUT, on the other side of the underlying service provider from the IUT, as shown in Figure 10.21. The system in which an IUT resides is a SUT.

Figure 10.20 Abstract local test architecture.

Figure 10.21 Abstract external test architecture.

The external architectures assume that an underlying communications service is used in testing. The local test architectures are applicable only to in-house testing by the suppliers, whereas the external test architectures are applicable both to in-house testing by the supplier and to testing by buyers and third-party test centers. There are three types of external test architectures, which are discussed, along with the local test architecture, in the following.

10.10.1 Local Architecture

A basic assumption in the local architecture is that exposed interfaces above and below the IUT exist. These interfaces serve as PCOs, that is, points at which a real test system can provide inputs to and observe outputs from the IUT. The test architecture includes two logically distinct elements, called the upper tester and the lower tester, as shown in Figure 10.22. The test events are specified in terms of N ASPs above the IUT, and N − 1 ASPs and N PDUs below the IUT. The coordination between the upper and the lower testers is provided by test coordination procedures during testing. In summary, the local test architecture comprises a test harness around the IUT, which coordinates the actions of the lower and upper testers.

Figure 10.22 Local architecture.

The roles of the upper and lower testers are to stimulate the IUT by exchanging test events at the
top and bottom interfaces of the IUT. The local architecture implicitly provides the capability to synchronize and control the upper and lower testers because both of them are elements of the same test harness.

10.10.2 Distributed Architecture

The distributed architecture is illustrated in Figure 10.23. This architecture is one of the three external architectures and makes no assumption about the existence of a PCO below the IUT. This test architecture defines the PCOs as being at the service boundaries above the IUT and at the opposite side of the N − 1 service provider from the IUT. Note that the lower tester and the IUT reside in two different systems. The lower tester and the IUT are connected by an underlying service which offers an N − 1 service using lower layer protocols and the physical media connecting the two systems. The lower tester is obviously a peer entity of the IUT in Figure 10.23. The arrows between the IUT and the N − 1 service provider are not real interfaces, just the conceptual flow of N PDUs. The test events are specified in terms of N ASPs above the IUT and N − 1 ASPs and N PDUs remotely, as shown in Figure 10.23.

Figure 10.23 Distributed architecture.

Three important points should be kept in mind with this architecture:

• The lower tester and the IUT are physically separated, with the implication that they may observe the same test event at different times.
• Delivery out of sequence, data corruption, and loss of data are possible because of the unreliable quality of the lower service provider.
• Synchronization and control (test coordination procedures) between the upper and lower testers are more difficult due to the distributed nature of the test system.
In summary, the distributed test architecture is a logical equivalent of the local architecture, with the lower tester and the IUT interconnected by a communication service. However, the structure of the local architecture implicitly gives the capability to synchronize and control the upper and lower testers because they are elements of the same test harness. Thus, the distributed architecture is not a functional equivalent of the local architecture.

10.10.3 Coordinated Architecture

The coordinated architecture is an enhancement of the distributed architecture. Control and observation of N ASPs are performed by a Test Management Protocol (TMP). There is just one PCO, at the opposite side of the N − 1 service provider from the IUT, as shown in Figure 10.24. Note that, even though a PCO appears between the upper tester and the IUT, it is optional; as a choice made by the implementer, the upper tester may be integrated as part of the IUT. Two features that distinguish the coordinated architecture are as follows:

• No interface is exposed to the upper tester of an IUT (although this is not precluded).
• A standard TMP and TMP data units (TMPDUs) are used to communicate between the upper tester and the lower tester.

Figure 10.24 Coordinated architecture.

Figure 10.25 Remote architecture.

The lower tester is considered to be the master of the upper tester. The actions of the upper tester are controlled by the lower tester through the TMP. Test events are specified in terms of N − 1 ASPs, N PDUs, and TMPDUs, as illustrated in Figure 10.24.

10.10.4 Remote Architecture

The remote architecture is applicable to IUTs that do not have an exposed upper interface.
In the absence of an upper tester, the test architecture identifies a PCO away from the IUT, on the opposite side of the N − 1 service provider. The test events are specified in terms of the N − 1 ASPs and N PDUs, as shown in Figure 10.25. There are two major features of this architecture:

• No interface at the top of the IUT is assumed.
• No explicit test coordination procedure is used. The coordination between the upper and lower testers is manual (e.g., talking over the telephone). The coordination is implicit in the PDUs initiated by the lower tester or provided by the actions taken by an upper layer entity to stimulate the IUT.

The architecture relies on the protocol being tested for synchronization between the lower tester and the IUT. Verdicts must be formulated based on the stimulus provided by the lower tester and the responses of the IUT as observed by the lower tester.

10.11 TESTING AND TEST CONTROL NOTATION VERSION 3 (TTCN-3)

As the name suggests, TTCN-3 is a language for specifying test cases [19]. The language is increasingly being accepted in the industry as a test specification language after it was standardized by the ETSI (European Telecommunications Standards Institute). The language has been designed keeping in mind the needs of testing complex telecommunication systems. Consequently, the language is being used to write test cases to test complex communication protocols, such as the Session Initiation Protocol (SIP) and the Internet Protocol version 6 (IPv6). In the early efforts to develop a test specification language, the acronym TTCN-1 stood for Tree and Tabular Combined Notation version 1. TTCN was developed in the mid-1980s, and it evolved from TTCN-1 to TTCN-2 (Tree and Tabular Combined Notation version 2) while still retaining its core syntax and semantics to a large extent.
Though much effort went into the development of TTCN, it was not widely accepted as a test specification language in the industry. The absence of broad interest in the notation was largely due to a wide gap between its syntax and its execution semantics. In 2001, TTCN-2 got a major face lift and TTCN-3 saw the light of day. Though TTCN-3 still retains the basic characteristics of TTCN-2, the syntax of TTCN-3 was designed in line with procedural programming languages. Programmers and test designers can write test cases the way they write programs, thereby reducing the gap between syntax and execution semantics. The programming language look and feel of TTCN-3 makes it more acceptable. TTCN-3 has seen much improvement since 2001, and the user and support bases are ever expanding. There is increasing tool support and an active team maintaining the language. In this section, we give a brief introduction to TTCN-3. We focus on the core features of TTCN-3, such as modules, data types, templates, ports, components, and test cases.

10.11.1 Module

A module is a fundamental unit for specifying test cases. In terms of programming, a test case comprises some data declarations and some execution behavior, which are specified in the form of one or more modules. The execution behavior of a test case is referred to as the control part. The structure of a module is shown in Figure 10.26.

10.11.2 Data Declarations

TTCN-3 allows the declaration of constants and variables. A constant is denoted by the const keyword, and the value of the constant is assigned at the point of declaration. The value of a constant cannot be changed afterward. TTCN-3 has its own scoping rules for all data definitions. All declarations made at the "top" level, that is, the module level, are accessible throughout the module. Here, top level means before the control part. The concept of a "code block," enclosed by a matching pair of curly braces, helps us understand the scoping rules.
Definitions made within a specific code block are only accessible within that code block, and identifiers are not reused in nested code blocks. In other words, there do not exist two data items with the same name but different scopes. No variable can be defined at the module level, and, therefore, all module-level declarations are constants. The absence of module-level variables means the absence of global variables: since a test case can have distributed components, it is difficult to guarantee the semantics of global variables across distributed test components.

/* One can document a module by writing comments in this way. */
// Additional comments can be included here.
module ExampleTestModule1 { // A module can be empty.
    // First, define some data to be used in the control part
    const integer MaxCount := 15;
    const integer UnitPacket := 256;
    // More data can be defined here ...

    // Second, specify the control part to execute
    control { // The control part is optional
        var integer counter := 0;
        var integer loopcount := MaxCount;
        const integer PacketSize := UnitPacket * 4;
        // Specify more execution behavior here ...
    } // End of the control part
} // End of module ExampleTestModule1

Figure 10.26 Structure of module in TTCN-3.

TTCN-3 supports a powerful set of built-in types, such as integer, float, boolean, (universal) charstring, verdicttype, bitstring, hexstring, octetstring, objid, and default. Such a rich set of built-in types is essential to protocol testing. TTCN-3 also allows the definition of structured types and list types from existing types. Constructs to define structured types are enumerated, record, set, and union. Similarly, constructs to define list types are array, set of, and record of. TTCN-3 allows programmers to use the concept of data subtyping. Subtyping means restricting the values of a type to a subset of all values allowed by the original type.
For example, given the set of all integers, one can create a subtype of all unsigned numbers that can be represented with 16 bits. Such a subtype is useful in representing port numbers while testing over TCP. Two examples of subtyping are shown in Figure 10.27, where TCPPort is a new user-defined type. A variable of type TCPPort can take on values in the range 0, . . . , 65535. Similarly, IPUserProtocol is a subtype of charstring. A variable of type IPUserProtocol can take on values from the given set and not outside it.

type integer TCPPort ( 0 .. 65535 ); // a 16-bit unsigned number
type charstring IPUserProtocol ( "TCP", "UDP", "OSPF", "RIP" );

Figure 10.27 Definitions of two subtypes.

A PDU in a communication protocol can be defined using a record type. In some protocols, PDUs are simply called messages. TTCN-3 allows one to define a PDU field of arbitrary bit length. For example, one can define PDU fields of 1 bit, 4 bits, and 6 bits, which are found in the packet header of IPv4 (Internet Protocol version 4). Often protocols put a limit on the length of some of the PDU fields. Such limits are easily expressed using the length attribute of variables. After defining a PDU or a message type, one creates instances of the type by using concrete values. Such concrete instances have two applications: (i) sending messages to a remote protocol entity and (ii) receiving messages having the desired values. Concrete instances of message types are called templates in TTCN-3. TTCN-3 allows parameterization of templates for easily creating messages. An example of a template is shown in Figure 10.28, where MyMessage is a message type and SFCRequest is an instance of MyMessage. The MyMessage type consists of four fields. The response field can be omitted while creating an instance of the template. An example definition of a response message of type MyMessage is shown in Figure 10.29.
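The matching role of templates such as those in Figures 10.28 and 10.29 can be illustrated with a small Python analogy. The wildcard constant below plays the role of TTCN-3's "?"; the dictionary-based messages and all helper names are purely illustrative, not part of TTCN-3 itself.

```python
# Python sketch of TTCN-3-style template matching (illustrative only).
ANY = '?'   # stands in for TTCN-3's "?" wildcard

def matches(template, message):
    """A received message matches a template if every template field is
    either the wildcard or equal to the corresponding message field."""
    return all(tv == ANY or tv == message.get(field)
               for field, tv in template.items())

def sfc_response(ident, rval):
    """Build a receive template modeled on SFCResponse of Figure 10.29:
    the input field may contain any value and is ignored."""
    return {'identification': ident, 'msgtype': 'Response',
            'input': ANY, 'response': rval}

# A message as it might arrive from the SUT (field values assumed).
received = {'identification': 7, 'msgtype': 'Response',
            'input': 625, 'response': 25}
```

Here sfc_response(7, 25) matches the received message even though the input fields differ, because the template's input field is the wildcard; a template with a different identification or response value does not match.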
One can specify what message is expected to be received from a SUT by defining a response message. The identification field can be used to associate a response message with a request message. A "?" value of the input field in the SFCResponse tells us that the field can contain any value, which is ignored. The response field in the received SFCResponse carries the actual value expected to be received from the SUT.

template MyMessage SFCRequest( Identification id, Input Ival ) := {
    identification := id,
    msgtype := Request,
    input := Ival,
    response := omit // "omit" is a keyword
}

Figure 10.28 Parameterized template for constructing message to be sent.

template MyMessage SFCResponse( Identification id, Response Rval ) := {
    identification := id,
    msgtype := Response,
    input := ?, // This means the field can contain any value
    response := Rval
}

Figure 10.29 Parameterized template for constructing message to be received.

10.11.3 Ports and Components

A test infrastructure may consist of one or more test components, where a test component is an entity that can send and/or receive messages (templates). There is a need for communication between test components and between the SUT and the test components. The points at which communication takes place are called ports in TTCN-3. A port is modeled as an infinite FIFO (first in–first out) queue from the viewpoint of a receiver. Two kinds of communication semantics are associated with a port: message semantics and procedure call semantics. One can specify the kinds of messages or calls a port handles and the input–output direction of the port. A port can be an in (input) port, an out (output) port, or an in–out (both input and output) port.

Figure 10.30 Testing (a) square-root function (SRF) calculator and (b) port between tester and SRF calculator.
We explain the testing of a square-root function calculator (SFC) in terms of a test component and the SFC component, as shown in Figure 10.30a. A port between the two components is shown in Figure 10.30b. The declaration of an in–out port type that handles messages of type MyMessage is shown in Figure 10.31. Figure 10.32 illustrates the attachment of the test component SFCTester to the SFCServerPort. The SFC component is assumed to be running on a port called SFCServerPort.

type port SFCPort message { // The SFCPort type has a "message" semantics
    inout MyMessage // The SFCPort type is of inout type,
                    // handling messages of type MyMessage
}

Figure 10.31 Defining port type.

type component SFCTester {
    port SFCPort SFCServerPort
}

Figure 10.32 Associating port with component.

10.11.4 Test Case Verdicts

A test designer wants to conclude something after having executed a test case. For example, two simple conclusions are whether the SUT has passed or failed the test. If a SUT behaves as expected, then it is natural to say that it has passed the test; otherwise the test has failed. Thus, pass and fail are two obvious test verdicts. However, often the test designer may not be in a position to conclusively say whether the system has passed or failed a test. In such a case, the test designer assigns an inconclusive test verdict, which means that further tests need to be conducted to refine the inconclusive verdict into either a pass or a fail verdict. TTCN-3 provides a mechanism to record test verdicts. Associated with each test component there is an implicitly defined variable of type verdicttype. The initial, default value of the implicitly defined test verdict variable is none. A test designer can assign new values to the test verdict variable by calling the operation setverdict. For example, the calls setverdict(pass), setverdict(fail), and setverdict(inconc) assign the verdicts pass, fail, and inconclusive, respectively.
There is a fourth test verdict, namely, error, which is assigned by the run time system. The run time system assigns the test verdict error to the test verdict variable when a run time error occurs. For example, dividing a numeric value by zero leads to a run time error. TTCN-3 does not allow a test designer to explicitly set the value of the verdict variable to error; that is, the operation setverdict(error) is not allowed in TTCN-3. The value of the verdict assigned so far in a test component can be retrieved with the operation getverdict.

10.11.5 Test Case

A simple test case running on the SFCTester component is shown in Figure 10.33. We assume message-passing semantics of communication between the test component SFCTester and the SUT, namely, SFC. SFCTester sends a request message to the SFC and waits for a response message from the SFC. Since a faulty SFC may not generate a response message, SFCTester must be able to exit from a possible infinite wait. Therefore, we define a timer, namely, responseTimer, by using the keyword timer. Next, the test case sends a message, namely, SFCMessage, and starts a timer with a duration of 5.0 s. SFCMessage contains two important fields, namely, identifier and input.

// A test case description with alternative behavior
testcase SFCtestcase1() runs on SFCTester {
    timer responseTimer; // Define a timer
    SFCPort.send(SFCRequest(7, 625));
    responseTimer.start(5.0);
    alt { // Now handle three alternative cases ...
        // Case 1: The expected result of computation is received.
        [] SFCPort.receive(SFCResponse(7, 25)) {
            setverdict(pass);
            responseTimer.stop;
        }
        // Case 2: An unexpected result of computation is received.
        [] SFCPort.receive {
            setverdict(fail);
            responseTimer.stop;
        }
        // Case 3: No result is received within a reasonable time.
        [] responseTimer.timeout {
            setverdict(fail);
        }
    }
    stop;
} // End of test case

Figure 10.33 Test case for testing SRF calculator.
We are interested in calculating the square root of input, which has been given the value 625. In the given example, the identifier takes on an arbitrary, but known, value of 7. After sending a request to the SFC component and starting a timer, the test component waits for the expected message to arrive from the SFC component. At this point, three different situations can arise. First, the SFC component can correctly respond with the expected result of 25 in a message having an identifier value of 7. Second, the SFC component responds with an incorrect value; for example, the identifier field can have an incorrect value or the response field can have a value not equal to 25. Third, the SFC component may fail to produce a response. In the first case, SFCTester records a pass test verdict. In the second case, the test component records a fail test verdict. In the third case, the test component comes out of an infinite wait and records a fail test verdict. If one of the first two alternatives occurs, the test component naturally stops the timer, whereas in the third alternative the test component gets a timeout. The alternative behaviors have been expressed by using the alt construct, and the individual alternative behaviors have been explicitly represented by using the [ ] symbol. Finally, the test case SFCtestcase1 can be executed by having an execute statement in the control portion of a test module, as shown in Figure 10.34.

module ExampleTestModule2 {
    // Define variables and constants to be used ...
    // Define templates to be used ...
    // Define ports to be used ...
    // Associate test components with ports ...
    // Define test cases, such as SFCtestcase1 ...
    control {
        execute( SFCtestcase1() );
    }
}

Figure 10.34 Executing test case.

10.12 EXTENDED FSMS

Two conceptual components of a software system are flow of control and manipulation of data.
An FSM model is useful for describing the former but has no provision for specifying the latter. Though there are many systems which can be conveniently and accurately modeled as FSMs, many systems in the real world require us to specify the associated computations while a system makes transitions from state to state. The associated computations can take on the following forms:

• Manipulate local variables.
• Start and stop timers.
• Create instances of processes.
• Compare values and make control flow decisions.
• Access databases.

In this section we will frequently refer to the FSM model of a telephone PBX shown in Figure 10.8. There is a need to record the start time of a call, and this can be done by noting the time when the FSM moves to the talk state. Constructs to start and stop timers are essential to the specification of real-time systems. Manipulation of local variables and conditional jumps are central to repeatedly executing a sequence of state transitions a certain number of times. Accessing a database is essential to logging the values of local variables for business purposes, such as billing and maintenance. Therefore, there is a need to augment the basic structure of a state transition with the capability to perform additional computations, such as updating values of variables, manipulating timers, and making decisions. Such an extension of an FSM results in an extended finite-state machine (EFSM). Processes in the Specification and Description Language (SDL) [20, 21] are EFSMs.
SDL processes are built around the following basic concepts:

• System, which is described hierarchically by elements called systems, blocks, channels, processes, services, signals, and signal routes
• Behavior, which is described using an extension of the FSM concept
• Data, which are described using the concept of abstract data types with the addition of a notion of program variables and data structures
• Communication, which is asynchronous via channels that are infinite queues

An SDL specification can be written in two different forms: SDL/GR and SDL/PR. SDL/GR is a graphical syntax which shows most of the language constructs in flow-chart-like graphical form. Data definitions can only be represented textually. On the other hand, SDL/PR is written in textual form for machine processing. The "PR" in SDL/PR stands for phrase representation. A one-to-one mapping is defined between the two forms. We show the structure of a state transition in an FSM in Figure 10.35a. This state transition specifies that if the FSM (the complete machine is not shown here) receives input a in state A, the machine produces an output b and moves to state B. The SDL/GR representation of the same transition is shown in Figure 10.35b. This example shows that one can easily represent an FSM as an SDL process. The state transition of Figure 10.35b has been extended in Figure 10.35c by including a task block that starts a timer. Figure 10.35d shows two state transitions, one from state A to state B and the other from state A to state C, that include task blocks and a decision box. To summarize the discussion of an EFSM, an EFSM is similar to an FSM with computations associated with every state transition.

Figure 10.35 Comparison of state transitions of FSM and EFSM.
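The idea of attaching tasks and decisions to a transition can be sketched in ordinary code. The following Python fragment is an illustrative analog of the two-way transition of Figure 10.35d; the context variables, the predicate on ctx['count'], and the timer flag are invented for the example and are not part of any SDL notation.

```python
# A minimal sketch of one EFSM state transition with tasks and a decision.
class EFSM:
    def __init__(self):
        self.state = 'A'
        # Extended variables ("context") manipulated by transition tasks.
        self.ctx = {'count': 0, 'timer_T_running': False}

    def fire(self, inp):
        """Fire the transition enabled by (state, input), run its tasks,
        and return the output symbol."""
        if self.state == 'A' and inp == 'a':
            if self.ctx['count'] < 3:                # decision box: predicate
                self.ctx['timer_T_running'] = True   # task: start timer T
                self.state = 'B'
                return 'b'                           # output b, move to B
            else:
                self.ctx['count'] = 0                # task: update variables
                self.state = 'C'
                return 'c'                           # output c, move to C
        raise ValueError('no transition for (%s, %s)' % (self.state, inp))
```

A plain FSM would carry only the state change and the output; here the same input can lead to different next states and outputs depending on the context, which is exactly what the decision box in Figure 10.35d expresses.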
In the following, we give an EFSM model of a real-life system.

Example: Door Control System. Let us consider a door control system, as illustrated in Figure 10.36. A door is equipped with an electromechanical unit so that it can be opened and closed by sending electrical signals to the unit from a control panel. The user keys in a four-digit number to access a door. The door control unit compares the user-supplied number with a programmed number. If the two numbers match, then the control system turns the green light on, sends a signal to the electromechanical unit to open the door, and starts a timer called DoorOpen. If the two numbers do not match, then the red light goes on for 5 seconds, followed by a welcome message to enter the PIN.

The door unit detects whether or not someone has passed through the door. If someone passes through the door, the door unit sends a Passed signal to the control unit. If no one passes through the door, the door unit sends a NotPassed signal to the control unit. If the control unit receives neither a Passed nor a NotPassed signal, it will eventually receive a timeout from the DoorOpen timer. The door unit must produce an appropriate signal irrespective of whether a user passes through the door or not. If the door unit fails to produce these signals, then the control unit assumes that the door unit is faulty, turns the yellow light on, displays a message saying that users are not welcome, and waits in the idle state for the necessary repair work to be done.

Figure 10.36 Controlled access to a door.
If the control unit receives a Passed or a NotPassed signal while the green light is on, the DoorOpen timer is stopped, the green light is switched off, a signal is sent to the door to close, and, finally, the system readies itself to handle another user request. A user could change his or her mind while entering the PIN by pressing the cancel button. Moreover, the same effect of cancellation can be achieved by abandoning the PIN entry process, which is detected by a timer.

The above description of a door control system has been specified in SDL in Figures 10.37, 10.38, and 10.39. The system-level diagrams are found in Figure 10.37, and the behavior of the door control system can be found in the process diagrams of Figures 10.38 and 10.39. The reader may be reminded that we have specified the behavior of the door control system as an EFSM in the form of an SDL process shown in Figures 10.38 and 10.39.

Figure 10.37 SDL/GR door control system (signal lists: Light = Off, On; Message = Welcome, NotWelcome; Star = Asterisk; DoorControl = Open, Close; DoorStatus = Passed, NotPassed; UserData = 0, 1, ..., 9, CANCEL).
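The normal behavior described above can be sketched as a small Python class. This is only an illustrative abstraction of the SDL process: timers are replaced by explicit timeout events, the programmed number is fixed to a hypothetical value, and the class, method, and signal names are ours.

```python
# Behavioral sketch of the door control EFSM. States GETDIGIT, RED,
# DOOROPEN, and IDLE follow the description; timers appear as explicit
# timeout events and signal names are illustrative, not SDL identifiers.

class DoorControl:
    PIN = "1234"                       # hypothetical programmed number

    def __init__(self):
        self.state = "GETDIGIT"
        self.digits = ""
        self.outputs = []              # signals sent to lights, door, display

    def emit(self, *signals):
        self.outputs.extend(signals)

    def digit(self, d):                # user keys in one digit
        assert self.state == "GETDIGIT"
        self.digits += d
        self.emit("Asterisk")
        if len(self.digits) == 4:      # analysis of digits is complete
            if self.digits == self.PIN:
                self.emit("GreenOn", "Open")   # DoorOpen timer starts here
                self.state = "DOOROPEN"
            else:
                self.emit("RedOn")             # red light for 5 seconds
                self.state = "RED"
            self.digits = ""

    def red_timeout(self):             # RedTimer expires
        assert self.state == "RED"
        self.emit("RedOff", "Welcome")
        self.state = "GETDIGIT"

    def passed(self):                  # door unit reports Passed
        assert self.state == "DOOROPEN"
        self.emit("GreenOff", "Close", "Welcome")
        self.state = "GETDIGIT"

    def door_open_timeout(self):       # door unit produced no signal: fault
        assert self.state == "DOOROPEN"
        self.emit("GreenOff", "YellowOn", "NotWelcome")
        self.state = "IDLE"
```

Feeding the correct four digits drives the sketch from GETDIGIT to DOOROPEN and back to GETDIGIT on a Passed signal, while a wrong PIN leads through RED, mirroring the state transitions of the SDL process.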
Figure 10.38 Door control behavior specification (states START, GETDIGIT, and RED; timer values in seconds: ReadDigit 5, DoorOpen 10, RedTimer 5).

Figure 10.39 Door control behavior specification (states DOOROPEN, GETDIGIT, and IDLE).

10.13 TEST GENERATION FROM EFSM MODELS

This section explains the mechanisms to generate test cases from an EFSM model of a system. Specifically, we generate test cases from an SDL process. Assume that we have an EFSM model E and a program PE that correctly implements E. Our task is to generate test cases from E to test program PE. Ideally, we want to generate a test suite TE from E so that by testing PE with TE, we verify whether PE implements the functions specified in E. Theoretically, in general, it is impossible to design such a test suite which, when applied to PE, can reveal all the faults with PE. In practice, however, we want to design TE with the goal of verifying that PE behaves as expected for the commonly used input sequences.
This goal is achieved by designing TE in two phases:

Phase 1: Identify a set of state transition sequences such that each sequence of state transitions represents a common use sequence.
Phase 2: Design a test case from each state transition sequence identified above.

In phase 1, the main task is to identify sequences of state transitions with two objectives. The first objective is to check whether or not an implementation PE supports the functions represented by the chosen sequences. The second objective is to reveal faults in the implementation. Here, once again, state transition sequences are chosen to meet some coverage criteria. We remind the reader that a coverage criterion provides us with a systematic method for selecting sequences of state transitions. In addition, a coverage criterion gives us the minimum number of sequences we need to identify. While selecting state transition sequences, we must do the following:

• Perform tests to ensure that the SUT produces expected sequences of outcomes in response to input sequences.
• Perform tests to ensure that the SUT takes the right actions when a timeout occurs.
• Perform tests to ensure that the SUT has appropriately implemented other task blocks, such as resource allocation and deallocation, database accesses, and so on.

As discussed earlier in this chapter, there are two well-known coverage criteria for selecting state transition sequences, namely, state coverage and transition coverage. The former criterion means identifying a set of transition tours such that every state is visited at least once. The latter criterion means identifying a set of transition tours such that every state transition is visited at least once. Phase 2, that is, the process of generating a test case from a transition sequence, is explained in the following by means of an example.

Example: Test Generation from Door Control System.
We show an EFSM model of a door control system in Figures 10.38 and 10.39. The control system has four states, namely, GETDIGIT, RED, DOOROPEN, and IDLE. After a sequence of initialization steps, the EFSM moves to its initial state GETDIGIT. Normal behavior of the system is restricted to the state transitions among the three states GETDIGIT, RED, and DOOROPEN. If the electromechanical door unit fails to respond to signals from the door control system, the EFSM moves from the DOOROPEN state to the IDLE state. The IDLE state can be thought of as an error state. Once faults with the door unit are fixed and the control unit is reset, the system moves back from the IDLE state to the GETDIGIT state. Therefore, by adding a sequence of tasks from the IDLE state to the START of the EFSM, we can make the EFSM strongly connected.

Assuming that the SDL process of Figures 10.38 and 10.39 is in state GETDIGIT, we design a test case that covers the transition tour shown in Figure 10.40. This transition tour executes the loop induced by digit inputs in state GETDIGIT while the read digits are incomplete. We assume that a user keys in an acceptable sequence of digits; that is, analysis of the digits in the decision box is complete and valid.

GETDIGIT ⇒ ... GETDIGIT ⇒ ... GETDIGIT ⇒ DOOROPEN ⇒ GETDIGIT

Figure 10.40 Transition tour from door control system of Figures 10.38 and 10.39.

The reader may notice that control can flow from the DOOROPEN state to the GETDIGIT state in two ways, namely, by receiving Passed and NotPassed inputs, corresponding to someone passing through the door and not passing through the door, respectively. While generating a test case to cover the above transition tour, we assume that the Passed input occurs.
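The transition coverage criterion discussed above can be checked mechanically. The sketch below uses a hand-simplified transition table of the door control EFSM; the input labels are ours, chosen only to name the transitions.

```python
# Checking the transition coverage criterion: do the chosen tours exercise
# every transition of a simplified door control EFSM at least once?
# Transition labels (e.g., "digit/valid") are illustrative.

transitions = {
    ("GETDIGIT", "digit/incomplete"): "GETDIGIT",
    ("GETDIGIT", "digit/valid"):      "DOOROPEN",
    ("GETDIGIT", "digit/invalid"):    "RED",
    ("RED",      "timeout"):          "GETDIGIT",
    ("DOOROPEN", "Passed"):           "GETDIGIT",
    ("DOOROPEN", "NotPassed"):        "GETDIGIT",
    ("DOOROPEN", "timeout"):          "IDLE",
}

def covered(tours):
    """Return the set of transitions exercised by a list of tours,
    where each tour is (start_state, [input, input, ...])."""
    seen = set()
    for state, inputs in tours:
        for inp in inputs:
            seen.add((state, inp))
            state = transitions[(state, inp)]
    return seen

tours = [
    # the tour of Figure 10.40, assuming the Passed input occurs
    ("GETDIGIT", ["digit/incomplete", "digit/valid", "Passed"]),
    # a second tour covering the remaining transitions
    ("GETDIGIT", ["digit/invalid", "timeout",
                  "digit/valid", "NotPassed",
                  "digit/valid", "timeout"]),
]
uncovered = set(transitions) - covered(tours)
```

An empty `uncovered` set confirms that the two tours together satisfy transition coverage (and hence state coverage) for this simplified model.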
In the following, we generate a test case using the three test design principles discussed in Section 10.5:

Step 1: We transform the inputs and outputs in a transition tour into outputs and expected inputs, respectively, to derive the core behavior of a test case. We obtain the ports, namely, DISPLAY, KEYPAD, GREENLIGHT, and DOOR, as shown in Figure 10.41, by analyzing the transition tour of Figure 10.40. The sequence of inputs and outputs found in Figures 10.38 and 10.39, corresponding to the transition tour shown in Figure 10.40, has been transformed into outputs and inputs, respectively, in the TTCN-3 notation of Figure 10.42. The reader may note that the test behavior shown in Figure 10.42 is in terms of the four ports identified in Figure 10.41. If the test system observes that the SUT behaves as specified in Figure 10.42, then it assigns a Pass test verdict. The condition that the test case has output enough digits is expressed informally in line 3 of Figure 10.42.

Figure 10.41 Testing door control system (the test system interacts with the SUT via the DISPLAY, KEYPAD, GREENLIGHT, and DOOR ports).

1.  label label1 KEYPAD.send(digit);
2.  DISPLAY.receive(Asterisk);
3.  if (NOT enough number of digits) goto label1; // Not in TTCN-3 form yet.
4.  DOOR.receive(Open);
5.  GREENLIGHT.receive(On);
6.  DOOR.send(Passed);
7.  GREENLIGHT.receive(Off);
8.  DOOR.receive(Close);
9.  DISPLAY.receive(Welcome);
10. setverdict(Pass);

Figure 10.42 Output and input behavior obtained from transition tour of Figure 10.40.

We have obtained the test behavior shown in Figure 10.43 from Figure 10.42 by refining the if statement in line 3 of Figure 10.42 as follows. We initialize a counter count with value 0 in line 1 of Figure 10.43 and increment the counter in line 4 after the test behavior outputs a digit. The informal condition part of the if statement in line 3 of Figure 10.42 has been refined into a concrete condition in Figure 10.43 (line 5) by assuming that the door control system accepts a sequence of digits of length 4.

1.  count := 0; // count is of type integer.
2.  label label1 KEYPAD.send(digit);
3.  DISPLAY.receive(Asterisk);
4.  count := count + 1;
5.  if (count < 4) goto label1;
6.  else {
7.    DOOR.receive(Open);
8.    GREENLIGHT.receive(On);
9.    DOOR.send(Passed);
10.   GREENLIGHT.receive(Off);
11.   DOOR.receive(Close);
12.   DISPLAY.receive(Welcome);
13.   setverdict(Pass);
14. };

Figure 10.43 Test behavior obtained by refining if part in Figure 10.42.

Step 2: We augment a test behavior to prepare itself to receive events other than the expected events. This is done by including an any event, denoted by a "?," as an alternative event to each expected event. When we apply this transformation step to the test behavior shown in Figure 10.43, we obtain the test behavior shown in Figure 10.44. For example, the any event denoted by a "?" in line 20 of Figure 10.44 is designed to match any event which does not match the expected event specified in line 18. The any events in lines 23, 26, 29, 32, and 36 are alternative events to the expected events in lines 16, 14, 11, 9, and 4, respectively.

1.  count := 0; // count is of type integer.
2.  label label1 KEYPAD.send(digit);
3.  alt {
4.    [] DISPLAY.receive(Asterisk);
5.       count := count + 1;
6.       if (count < 4) goto label1;
7.       else {
8.         alt {
9.           [] DOOR.receive(Open);
10.             alt {
11.               [] GREENLIGHT.receive(On);
12.                  DOOR.send(Passed);
13.                  alt {
14.                    [] GREENLIGHT.receive(Off);
15.                       alt {
16.                         [] DOOR.receive(Close);
17.                            alt {
18.                              [] DISPLAY.receive(Welcome);
19.                                 setverdict(Pass);
20.                              [] DISPLAY.receive(?);
21.                                 setverdict(Fail);
22.                            }
23.                         [] DOOR.receive(?);
24.                            setverdict(Fail);
25.                       }
26.                    [] GREENLIGHT.receive(?);
27.                       setverdict(Fail);
28.                  }
29.               [] GREENLIGHT.receive(?);
30.                  setverdict(Fail);
31.             }
32.          [] DOOR.receive(?);
33.             setverdict(Fail);
34.        }
35.      } // end of else
36.   [] DISPLAY.receive(?);
37.      setverdict(Fail);
38. }

Figure 10.44 Test behavior that can receive unexpected events (derived from Figure 10.43).

Step 3: We augment a test behavior with timers so that a test system does not enter into a deadlock in case the SUT produces no output; that is, before waiting to receive an expected event, the test behavior starts a timer. The corresponding timeout event is specified as an alternative to the expected input and the any event explained before. When we apply this transformation step to the test behavior shown in Figure 10.44, we obtain the test behavior shown in Figure 10.45.

1.  count := 0; // count is of type integer.
2.  label label1 KEYPAD.send(digit);
3.  Timer1.start(d1);
4.  alt {
5.    [] DISPLAY.receive(Asterisk);
6.       Timer1.stop;
7.       count := count + 1;
8.       if (count < 4) goto label1;
9.       else {
10.        Timer2.start(d2);
11.        alt {
12.          [] DOOR.receive(Open);
13.             Timer2.stop; Timer3.start(d3);
14.             alt {
15.               [] GREENLIGHT.receive(On);
16.                  Timer3.stop;
17.                  DOOR.send(Passed);
18.                  Timer4.start(d4);
19.                  alt {
20.                    [] GREENLIGHT.receive(Off);
21.                       Timer4.stop; Timer5.start(d5);
22.                       alt {
23.                         [] DOOR.receive(Close);
24.                            Timer5.stop; Timer6.start(d6);
25.                            alt {
26.                              [] DISPLAY.receive(Welcome);
27.                                 Timer6.stop; setverdict(Pass);
28.                              [] DISPLAY.receive(?);
29.                                 Timer6.stop; setverdict(Fail);
30.                              [] Timer6.timeout;
31.                                 setverdict(Inconc);
32.                            }
33.                         [] DOOR.receive(?);
34.                            Timer5.stop; setverdict(Fail);
35.                         [] Timer5.timeout;
36.                            setverdict(Inconc);
37.                       }
38.                    [] GREENLIGHT.receive(?);
39.                       Timer4.stop; setverdict(Fail);
40.                    [] Timer4.timeout;
41.                       setverdict(Inconc);
42.                  }
43.               [] GREENLIGHT.receive(?);
44.                  Timer3.stop; setverdict(Fail);
45.               [] Timer3.timeout;
46.                  setverdict(Inconc);
47.             }
48.          [] DOOR.receive(?);
49.             Timer2.stop; setverdict(Fail);
50.          [] Timer2.timeout;
51.             setverdict(Inconc);
52.        }
53.      }
54.   [] DISPLAY.receive(?);
55.      Timer1.stop; setverdict(Fail);
56.   [] Timer1.timeout;
57.      setverdict(Inconc);
58. }

Figure 10.45 Core behavior of test case for testing door control system (derived from Figure 10.44).

Finally, the test behavior is augmented with test verdicts. A Pass verdict is assigned if the system under test behaves as expected, and this is shown in line 10 of Figure 10.42. The Pass verdict in line 10 of Figure 10.42 can be found in line 13 of Figure 10.43, line 19 of Figure 10.44, and line 27 of Figure 10.45, as we go on making the test case more and more complete. We assign a Fail verdict when the test system receives an unexpected event in the form of any events and an Inconclusive verdict when a timeout occurs.

10.14 ADDITIONAL COVERAGE CRITERIA FOR SYSTEM TESTING

We discussed two coverage criteria, namely, state coverage and state transition coverage, to select test cases from FSM and EFSM models of software systems in Sections 10.5 and 10.13. Those two criteria focused on sequences of events, possibly including internal events, occurring at PCOs. In this section, we explain some more coverage criteria in line with the concepts of functional testing. The reader may recall from Chapter 9 on functional testing that we identify the domains of input and output variables and select test data based on special values from those domains. For example, if an output variable takes on a small number of discrete values, then test cases are designed to make the SUT produce all those output values. If an output variable takes on values from a contiguous range, test data are selected such that the SUT produces the extreme points and an interior point of the specified range.

In line with the above concept of functional testing, in the following, we explain some coverage criteria for event-driven systems modeled as FSMs or EFSMs. First, we identify the PCOs, also referred to as ports—points where a SUT interacts with the external world.
Next, we apply the following coverage criteria to select test cases:

PCO Coverage: Select test cases such that the SUT receives an event at each input PCO and produces an event at each output PCO.

Sequences of Events at PCOs: Select test cases such that common sequences of inputs and outputs occur at the PCOs. By a common sequence we mean sequences commonly found in the uses of the system.

Events Occurring in Different Contexts: In many applications, a user produces an event for the system by pressing a button, for example. Here, a button represents an input PCO, and pressing the button represents an event. However, the semantics of pressing a button, that is, the interpretation of events at a PCO, depend on data contexts used by the system. For a given context, test data are selected such that all events, both desired and undesired, occur in the context.

Inopportune Events: A system is expected to discard invalid, or erroneous, events. On the other hand, inopportune events are normal events which occur at an inappropriate time.

Example: Automated Teller Machine. Let us consider the user interface of an ATM system as shown in Figure 10.46. A user selects one of the transaction options and specifies a transaction amount using buttons B1 through B6. The meaning of a button changes as the message in the display area changes. For example, a user can choose a transaction option, such as Deposit or Withdraw, by pressing buttons B2 or B4, respectively, as shown in Figure 10.47.

Figure 10.46 User interface of ATM.

Figure 10.47 Binding of buttons with user options.
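The PCO coverage criterion above can be evaluated automatically once test cases are recorded as sequences of port events. The sketch below uses hypothetical ATM-like port names; the suite and event format are our own illustration, not taken from the chapter.

```python
# PCO coverage sketch: verify that a test suite drives an event into every
# input PCO of the SUT and observes an event at every output PCO.
# Port names and the event encoding are illustrative.

input_pcos = {"KEYPAD", "CARD_SLOT"}
output_pcos = {"DISPLAY", "CASH_DOOR", "RECEIPT"}

# Each test case is a list of (direction, pco) pairs from the SUT's view:
# "in" = event given to the SUT, "out" = event produced by the SUT.
suite = [
    [("in", "CARD_SLOT"), ("out", "DISPLAY"), ("in", "KEYPAD"),
     ("out", "CASH_DOOR")],
    [("in", "CARD_SLOT"), ("in", "KEYPAD"), ("out", "DISPLAY"),
     ("out", "RECEIPT")],
]

def pco_coverage(test_cases):
    """Return the input and output PCOs left uncovered by the test cases."""
    ins = {pco for tc in test_cases for d, pco in tc if d == "in"}
    outs = {pco for tc in test_cases for d, pco in tc if d == "out"}
    return input_pcos - ins, output_pcos - outs

missing_in, missing_out = pco_coverage(suite)
```

Running the check on the first test case alone would report the RECEIPT port as uncovered, showing why both test cases are needed to satisfy the criterion.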
However, as shown in Figure 10.48, when it comes to selecting an amount, buttons B2 and B4 represent options $40 and $100, respectively. Moreover, not all the buttons in the amount context of Figure 10.48 represent the same type of data. For example, buttons B1 through B5 represent discrete, integer values, whereas button B6 gives the user an option to specify other values. In the context shown in Figure 10.47, buttons B3 and B5 are undefined, and, thus, they are potential sources of undesirable (erroneous) events. Test cases must be selected to observe how the system responds to undefined events.

The OK button produces a normal event while the user is entering a PIN after inserting the cash card. The OK event essentially tells the system that the user has completely entered a PIN. However, it may not be meaningful to press OK in the context shown in Figure 10.48. Therefore, pressing the OK button produces an inopportune event for the system to handle. Test cases need to be selected to consider inopportune events in addition to normal (valid) and abnormal (invalid or erroneous) events.

Figure 10.48 Binding of buttons with cash amount.

10.15 SUMMARY

This chapter began with the classification of software systems in terms of stateless and state-oriented systems. A stateless system does not memorize the previous inputs, and, therefore, its response to a new input does not depend on the previous inputs. On the other hand, a state-oriented system memorizes the sequence of inputs it has received so far in the form of a state. Next, we examined the concept of ports in the context of software testing. State-oriented systems were modeled as FSMs.
We explained two broad methods for generating test sequences from an implementation of an FSM—one without state verification and one with state verification. One can design weak test sequences in the form of transition tours without using state verification. On the other hand, one can design stronger test sequences by performing state verification with one of three kinds of sequences, namely, unique input–output sequences, distinguishing sequences, and characterizing sequences. With the test generation methods in place, we examined four test architectures: local, distributed, coordinated, and remote. These abstract test architectures are described in terms of controllable inputs given to an IUT and observable outputs from the IUT. The concept of points of control and observation (PCOs) was introduced. A PCO is a point of interaction, called a port, between a test entity and an IUT that is accessible to the test entity.

We provided a brief introduction to Testing and Test Control Notation Version 3 (TTCN-3), which is a common notation to specify test cases for a communication system. Specifically, the following features were described with examples: module, data types, templates, ports, components, and test cases. Finally, we introduced the concept of an extended finite-state machine (EFSM), which has the capability to perform additional computations, such as updating variables, manipulating timers, and making decisions. The concept of a process in SDL allows one to specify a module in the form of an EFSM. We then presented ways to generate test cases from an EFSM model.

LITERATURE REVIEW

An excellent collection of papers on FSM-based testing can be found in the book edited by Richard J. Linn and M. Umit Uyar [11]. Many articles referenced in this chapter have been reprinted in that book. Those interested in knowing more about EFSM-based testing may refer to the excellent book by B.
Sarikaya (Principles of Protocol Engineering and Conformance Testing, Ellis Horwood, Hemel Hempstead, Hertfordshire, 1993). Sarikaya explains the formal specification languages Estelle, SDL, LOTOS, TTCN (the original Tree and Tabular Combined Notation), and Abstract Syntax Notation One (ASN.1) in the first part of his book. In the second part of the book, he explains the generation of transition tours from a unified model of Estelle, SDL, and LOTOS using the concepts of control flow and data flow.

In the past decade, researchers have proposed several techniques for generating test cases from nondeterministic FSMs. In the following we list the commonly referenced ones:

R. Alur, C. Courcoubetis, and M. Yannakakis, "Distinguishing Tests for Nondeterministic and Probabilistic Machines," in Proceedings of the 27th Annual ACM Symposium on Theory of Computing, Las Vegas, Nevada, ACM Press, New York, 1995, pp. 363–372.

R. M. Hierons, "Applying Adaptive Test Cases to Nondeterministic Implementations," Information Processing Letters, Vol. 98, No. 2, April 2006, pp. 56–60.

R. M. Hierons and H. Ural, "Reducing the Cost of Applying Adaptive Test Cases," Computer Networks: The International Journal of Computer and Telecommunications Networking, Vol. 51, No. 1, January 2007, pp. 224–238.

G. Luo, G. v. Bochmann, and A. Petrenko, "Test Selection Based on Communicating Nondeterministic Finite-State Machines Using a Generalized Wp-Method," IEEE Transactions on Software Engineering, Vol. 20, No. 2, February 1994, pp. 149–162.

P. Tripathy and K. Naik, "Generation of Adaptive Test Cases from Nondeterministic Finite State Models," in Proceedings of the Fifth International Workshop on Protocol Test Systems, G. v. Bochmann, R. Dssouli, and A. Das, Eds., Montreal, North-Holland, Amsterdam, 1992, pp. 309–320.

F. Zhang and T.
Cheung, "Optimal Transfer Trees and Distinguishing Trees for Testing Observable Nondeterministic Finite-State Machines," IEEE Transactions on Software Engineering, Vol. 29, No. 1, January 2003, pp. 1–14.

REFERENCES

1. S. Naito and M. Tsunoyama. Fault Detection for Sequential Machines by Transition Tours. In Proceedings of the 11th IEEE Fault Tolerant Computing Symposium, Los Alamitos, CA, IEEE Computer Society Press, 1981, pp. 238–243.
2. B. Sarikaya and G. v. Bochmann. Some Experience with Test Sequence Generation. In Proceedings of the Second International Workshop on Protocol Specification, Testing, and Verification, North-Holland, Amsterdam, The Netherlands, 1982, pp. 555–567.
3. E. P. Hsieh. Checking Experiments for Sequential Machines. IEEE Transactions on Computers, October 1971, pp. 1152–1166.
4. K. K. Sabnani and A. T. Dahbura. A Protocol Testing Procedure. Computer Networks and ISDN Systems, Vol. 15, 1988, pp. 285–297.
5. K. Naik. Efficient Computation of Unique Input/Output Sequences in Finite-State Machines. IEEE/ACM Transactions on Networking, August 1997, pp. 585–599.
6. G. Gonenc. A Method for the Design of Fault Detection Experiments. IEEE Transactions on Computers, June 1970, pp. 551–558.
7. F. C. Hennie. Fault-Detecting Experiments for Sequential Circuits. In Proceedings of the 5th Annual Symposium on Switching Circuit Theory and Logical Design, Princeton University, Princeton, IEEE Press, New York, November 1964, pp. 95–110.
8. A. Gill. State-Identification Experiments in Finite Automata. Information and Control, Vol. 4, 1961, pp. 132–154.
9. Z. Kohavi. Switching and Finite Automata Theory. McGraw-Hill, New York, 1978.
10. T. S. Chow. Testing Software Designs Modeled by Finite State Machines. IEEE Transactions on Software Engineering, May 1978, pp. 178–187.
11. R. J. Linn and M. U. Uyar, Eds. Conformance Testing Methodologies and Architectures for OSI Protocols. IEEE Computer Society Press, Los Alamitos, CA, 1994.
12. D.
Sidhu and T. Leung. Fault Coverage of Protocol Test Methods. In Proceedings of the IEEE INFOCOM, IEEE Press, New Orleans, LA, March 1988, pp. 80–85.
13. A. T. Dahbura and K. K. Sabnani. An Experience in the Fault Coverage of a Protocol Test. In Proceedings of the IEEE INFOCOM, IEEE Press, New Orleans, LA, March 1988, pp. 71–79.
14. Information Processing Systems. OSI Conformance Testing Methodology and Framework. ISO/IEC JTC 1/SC 21 DIS 9646, Part 3, February 1989. International Organization for Standardization, available at http://www.standardsinfo.net/.
15. Information Processing Systems. OSI Conformance Testing Methodology and Framework. ISO/IEC JTC 1/SC 21 DIS 9646, Parts 4–5, March 1989. International Organization for Standardization, available at http://www.standardsinfo.net/.
16. Information Processing Systems. OSI Conformance Testing Methodology and Framework. ISO/IEC JTC 1/SC 21 DIS 9646, Parts 1–2, November 1988. International Organization for Standardization, available at http://www.standardsinfo.net/.
17. R. J. Linn, Jr. Conformance Testing for OSI Protocols. Computer Networks and ISDN Systems, Vol. 18, 1989/1990, pp. 203–219.
18. D. Rayner. OSI Conformance Testing. Computer Networks and ISDN Systems, Vol. 14, 1987, pp. 79–98.
19. C. Willcock, T. Deiss, S. Tobies, S. Keil, F. Engler, and S. Schulz. An Introduction to TTCN-3. Wiley, New York, 2005.
20. F. Belina, D. Hogrefe, and A. Sarma. SDL—With Applications from Protocol Specification. Prentice-Hall, Upper Saddle River, NJ, 1991.
21. CCITT. Specification and Description Language, Recommendation Z.100, CCITT SG X, ITU, Geneva, Switzerland, 1992.

Exercises

1. Considering the FSM of Figure 10.17 discussed in this chapter, provide a test sequence table, similar to Table 10.12, for the state transition (D, B, b/y).
2. What are the fundamental differences between the UIO sequence and distinguishing sequence methods of state verification?
3.
Consider the FSM G = ⟨S, I, O, A, δ, λ⟩ shown in Figure 10.49, where S = {A, B, C} is the set of states, I = {a, b, c} is the set of inputs, O = {e, f} is the set of outputs, A is the initial state, δ : S × I → S is the next-state function, and λ : S × I → O is the output function.
(a) Generate a distinguishing sequence for the FSM, if it exists.
(b) Generate characterizing sequences for the FSM.
(c) Generate UIO sequence(s) for each state of the FSM, if they exist.
(d) Compare the distinguishing sequence with the UIO sequences generated from the FSM. Are there any similarities and/or differences between the two kinds of sequences?

Figure 10.49 FSM G.
Figure 10.50 FSM H.
Figure 10.51 FSM K.

4. Consider the FSM H = ⟨S, I, O, A, δ, λ⟩ of Figure 10.50, where S = {A, B, C, D, E, F, G} is the set of states, I = {ri, a, c, x, z} is the set of inputs, O = {null, b, f, d} is the set of outputs, A is the initial state, δ : S × I → S is the next-state function, and λ : S × I → O is the output function.
(a) Generate UIO sequence(s) for each state of the FSM, if they exist.
(b) Generate test sequences for each transition of the FSM using UIO sequences for state verification. Assume that a reset input ri will always bring the FSM back to its initial state A. The FSM produces a null output in response to the ri input.
(c) Represent the following two test cases in TTCN-3 form with verdicts: (B, C, c/d) and (A, B, a/b).

5. Consider the FSM K = ⟨S, I, O, A, δ, λ⟩ of Figure 10.51, where S = {A, B, C} is the set of states, I = {a, b} is the set of inputs, O = {x, y} is the set of outputs, A is the initial state, δ : S × I → S is the next-state function, and λ : S × I → O is the output function.
(a) Show that the FSM K does not possess a distinguishing sequence.
(b) Can you generate a UIO sequence for each state of this FSM? Justify your answer.
(c) Generate characterizing sequences for the FSM.

6. Consider the nondeterministic FSM NFSM = ⟨S, I, O, A, δ, λ⟩ of Figure 10.52, where S = {A, B, C, D, E, F, G} is the set of states, I = {ri, a, c, x, z} is the set of inputs, O = {null, b, f, d} is the set of outputs, A is the initial state, δ : S × I → S is the next-state function, and λ : S × I → O is the output function. That the NFSM is nondeterministic means that for some input ak ∈ I there is more than one transition, with different outputs, defined for some state.
(a) Generate UIO sequence(s) for each state of the NFSM, if they exist.
(b) Do you think the generated UIO sequences can uniquely identify the states? Justify your answer.
(c) Devise a methodology to uniquely identify the states of the NFSM. (Hint: A tree structure will identify the state of the NFSM uniquely.) Using your methodology, generate UIO trees for each state of the NFSM.
(d) Generate test cases for the following transitions: (a) (F, G, x/d), (b) (B, C, c/d), and (c) (A, B, a/b).
(e) Represent the following two test cases in TTCN-3 form with verdicts: (B, C, c/d) and (A, B, a/b).

Figure 10.52 Nondeterministic FSM.

7. Discuss the effectiveness of the error detection capabilities among the abstract test architectures presented in this chapter.
8. Design a test component to test the system, represented in Figure 10.8, from the standpoint of a local phone alone. In your test component, consider the partial transition tour OH–AD–RNG–TK.
9. Assuming that a called (remote) phone is far away from the calling (local) phone, explain the difficulty you will encounter while designing a test case in TTCN-3 from the following transition tour in Figure 10.8: OH–AD–RNG–TK–LON–TK–RON–AD–OH.
10. Explain the concept of test case verdicts.

CHAPTER 11

System Test Design

Many things difficult to design prove easy to perform.
— Samuel Johnson

11.1 TEST DESIGN FACTORS

The central activity in test design is to identify inputs to and the expected outcomes from a system to verify whether the system possesses certain features. A feature is a set of related requirements. The test design activities must be performed in a planned manner in order to meet some technical criteria, such as effectiveness, and economic criteria, such as productivity. Therefore, we consider the following factors during test design: (i) coverage metrics, (ii) effectiveness, (iii) productivity, (iv) validation, (v) maintenance, and (vi) user skill. In the following, we give motivations for considering these factors.

Coverage metrics concern the extent to which the DUT is examined by a test suite designed to meet certain criteria. Coverage metrics lend us two advantages. First, they allow us to quantify the extent to which a test suite covers certain aspects, such as the functions, structure, and interfaces of a system. Second, they allow us to measure the progress of system testing. The criteria may be path testing, branch testing, or a feature identified from a requirement specification. Each test case is given an identifier to be associated with a set of requirements. This association is done by using the idea of a coverage matrix. A coverage matrix [Aij] is generated for the above idea of coverage [1]. The general structure of the coverage matrix [Aij] is shown in Table 11.1, where Ti stands for the ith test case and Nj stands for the jth requirement to be covered; Aij stands for the coverage of the test case Ti over the tested element Nj. The complete set of test cases, that is, a test suite, and the complete set of tested elements of the coverage matrix are identified as Tc = {T1, T2, ..., Tq} and Nc = {N1, N2, ..., Np}, respectively. A structured test case development methodology must be used as much as possible to generate a test suite.
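The coverage-matrix idea above can be sketched in code. The sketch below is illustrative: the test and requirement identifiers are made up, and the matrix is kept as a dictionary keyed by (test, requirement) pairs, mirroring the [Aij] structure of Table 11.1.

```python
# Sketch of a coverage matrix [Aij]: A_ij = 1 when test case Ti covers
# tested element Nj. Identifiers are illustrative, not from a real suite.

coverage = {
    "T1": {"N1", "N2"},
    "T2": {"N2", "N3"},
    "T3": {"N3"},
}
requirements = ["N1", "N2", "N3", "N4"]

matrix = {(t, n): int(n in reqs)
          for t, reqs in coverage.items()
          for n in requirements}

# Coverage metrics also measure progress: which requirements are
# not yet exercised by any test case.
uncovered = [n for n in requirements
             if not any(matrix[(t, n)] for t in coverage)]

print(matrix[("T1", "N2")])  # 1
print(uncovered)             # ['N4']
```

Querying the matrix by row gives the elements covered by one test; querying by column shows which tests must be rerun when a requirement changes.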
A structured development methodology also minimizes maintenance work and improves productivity. Careful design of test cases in the early stages of test suite development ensures their maintainability as new requirements emerge.

Software Testing and Quality Assurance: Theory and Practice, Edited by Kshirasagar Naik and Priyadarshi Tripathy. Copyright © 2008 John Wiley & Sons, Inc.

TABLE 11.1 Coverage Matrix [Aij]

Test Case    Requirement Identifier
Identifier   N1    N2    ...   Np
T1           A11   A12   ...   A1p
T2           A21   A22   ...   A2p
T3           A31   A32   ...   A3p
...          ...   ...   ...   ...
Tq           Aq1   Aq2   ...   Aqp

The correctness of the requirements is very critical in order to develop effective test cases to reveal defects. Therefore, emphasis must be put on the identification and analysis of the requirements from which test objectives are derived. Test cases are created based on the test objectives. Another aspect of test case production is validation of the test cases to ensure that they are reliable. It is natural to expect that an executable test case meets its specification before it is used to examine another system. This includes ensuring that test cases have adequate error handling procedures and precise pass–fail criteria. We need to develop a methodology to assist the production, execution, and maintenance of the test suite. Another factor to be aware of is the potential users of the test suite. The test suite should be developed with these users in mind; the test suite must be easy to deploy and execute in other environments, and the procedures for doing so need to be properly documented. Our test suite production life cycle considers all six factors discussed above.

11.2 REQUIREMENT IDENTIFICATION

Statistical evidence gathered by Vinter [2] demonstrates the importance of requirements capture in the development of embedded real-time system projects.
Vinter analyzed 1000 defects during his studies. Of the defect reports, 23.9% stemmed from requirement issues, 24.3% from functionality, 20.9% from component structure (the code), 9.6% from data, 4.3% from implementation, 5.2% from integration, 0.9% from architecture, 6.9% from testing, and 4.3% from other causes. Within those defect reports that were associated with requirements problems, Vinter argued that 48% could be classified as "misunderstanding." Typically, disagreement existed over the precise interpretation of a particular requirement. Missing constraints constitute 19% of the defect reports that stemmed from requirements problems, while changed requirements account for 27%. A further 6% were classified as "other" issues. These statistical studies have inspired practitioners to advocate a new vision of requirements identification. Requirements are a description of the needs or desires of users that a system is supposed to implement. There are two main challenges in defining requirements. The first is to ensure that the right requirements are captured, which is essential for meeting the expectations of the users.
It is undesirable for the teams to interpret the requirements in their own ways. There are two severe consequences of different teams interpreting a requirement in different ways. First, the development team and the system test team may have conflicting arguments about the quality of the product while analyzing the requirements before delivery. Second, the product may fail to meet the expectations of the users. Therefore, it is essential to have an unambiguous representation of the requirements and have it made available in a centralized place so that all the stakeholders have the same interpretation of the requirements [3]. We describe a formal model to capture the requirements for review and analysis within an organization in order to achieve the above two goals. Figure 11.1 shows a state diagram of a simplified requirement life cycle starting from the submit state to the closed state. This transition model provides the different phases of a requirement, where each phase is represented by a state. The model represents the life of a requirement from its inception to completion through the following states: submit, open, review, assign, commit, implement, verification, and finally closed. At each of these states certain actions are taken by the owner, and the requirement is moved to the next state after the actions are completed. A requirement may be moved to the decline state from any of the states open, review, assign, implement, and verification for several reasons. For example, a marketing manager may decide that the implementation of a particular requirement may not generate revenue. Therefore, the marketing manager may decline a requirement and terminate the development process.

[Figure 11.1 State transition diagram of a requirement, with the states Submit, Open, Review, Assign, Commit, Implement, Verification, Closed, and Decline.]
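The life cycle of Figure 11.1 can be sketched as a transition table that rejects illegal moves. The table below follows the states and transitions described in the text (decline is reachable from open, review, assign, implement, and verification; the decline-to-submit edge reflects the resubmission described later in this section); the enforcement code itself is our illustration, not part of the book's model.

```python
# Sketch of the requirement life cycle of Figure 11.1 as a transition
# table. move() rejects any transition the state diagram does not allow.

ALLOWED = {
    "submit":       {"open"},
    "open":         {"review", "decline"},
    "review":       {"assign", "decline"},
    "assign":       {"commit", "decline"},
    "commit":       {"implement"},
    "implement":    {"verification", "decline"},
    "verification": {"closed", "decline"},
    "decline":      {"submit"},   # resubmission after rework
    "closed":       set(),        # terminal state
}

def move(state, new_state):
    """Return new_state if the transition is legal; raise otherwise."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

# Walk a requirement through the normal path to completion.
state = "submit"
for nxt in ["open", "review", "assign", "commit",
            "implement", "verification", "closed"]:
    state = move(state, nxt)
print(state)  # closed
```

Keeping the table in one place means a tracking tool's GUI and its database checks cannot drift apart on which moves are legal.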
The state transition diagram showing the life cycle of an individual requirement can be easily implemented using a database system with a graphical user interface (GUI). One can customize any existing open-source tracking system to implement the requirement model described here [4]. The idea behind implementing the state transition diagram is to be able to track the requirements as they flow through the organization [5, 6]. A requirement schema is designed with different tabbed panes using a commonly used GUI and database. Subsequently, requirements can be stored and queries generated for tracking and reporting the status of the requirements. A list of the fields of the schema is given in Table 11.2. It is necessary to implement a secure access system for the different users of the database. Customers may be given restricted access to the database, so that they can see the status of their requirements within the organization.

TABLE 11.2 Requirement Schema Field Summary

requirement_id: A unique identifier associated with the requirement.
title: Title of the requirement; a one-line summary of the requirement.
description: Description of the requirement.
state: Current state of the requirement; can take a value from the set {Submit, Open, Review, Assign, Commit, Implement, Verification, Closed, Decline}.
product: Product name.
customer: Name of the customer who requested this requirement.
note: Submitter's note; any additional information the submitter wants to provide that will be useful to the marketing manager or the director of software engineering.
software_release: Assigned to a software release; the software release number in which the requirement is desired to be available for the end customer.
committed_release: Software release number in which the requirement will be available.
priority: Priority of the requirement; can take a value from the set {high, normal}.
severity: Severity of the requirement; can take a value from the set {critical, normal}.
marketing_justification: Marketing justification for the existence of the requirement.
eng_comment: Software engineering director's comment after review of the requirement; the comments are useful when developing the functional specification, coding, or unit testing.
time_to_implement: Estimated time, in person-weeks, needed to implement the requirement, including developing the functional specification, coding, unit testing, and integration testing.
eng_assigned: Engineer assigned by the software engineering director in order to review the requirement.
functional_spec_title: Functional specification title.
functional_spec_name: Functional specification filename.
functional_spec_version: Latest version of the functional specification.
decline_note: Explanation of why the requirement is declined.
ec_number: Engineering change (EC) document number.
attachment: Attachment (if any).
tc_id: Test case identifier; multiple test case identifiers can be entered; these values may be obtained automatically from the test factory database.
tc_results: Test case result; can take a value from the set {Untested, Passed, Failed, Blocked, Invalid}; can be obtained automatically from the test factory database.
verification_method: Verification method; T—by testing; A—by analysis; D—by demonstration; I—by inspection.
verification_status: Verification state (passed, failed, incomplete) of the requirement.
compliance: Can take a value from the set {compliance, partial compliance, noncompliance}.
testing_note: Notes from the test engineer; may contain an explanation of the analysis, inspection, or demonstration given to the end customer by the test engineer.
defect_id: Defect identifier; the value can be extracted from the test factory database along with the test results. If the tc_results field takes the value "failed," then the defect identifier is associated with the failed test case to indicate the defect that causes the failure.

Customers can generate a traceability matrix from the requirement database system, which gives them confidence about test coverage [7].
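A minimal sketch of such a traceability matrix follows. The requirement and test case identifiers are illustrative; the point is that only the forward map needs to be stored, since the backward map can be derived from it.

```python
# Sketch of two-way traceability between requirements and test cases:
# a stored forward map (requirement -> tests that exercise it) and a
# backward map (test -> requirements) derived from it.

forward = {
    "REQ-1": ["TC-1", "TC-2"],
    "REQ-2": ["TC-2", "TC-3"],
}

backward = {}
for req, tests in forward.items():
    for tc in tests:
        backward.setdefault(tc, []).append(req)

# Forward: which tests must pass to cover REQ-2.
print(forward["REQ-2"])   # ['TC-2', 'TC-3']
# Backward: which requirements must be re-examined if TC-2 changes.
print(backward["TC-2"])   # ['REQ-1', 'REQ-2']
```

In a real tracking system both directions would be queries over the tc_id field of the requirement schema rather than in-memory dictionaries.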
A traceability matrix allows one to find a two-way mapping between requirements and test cases as follows [8]:

• From a requirement to a functional specification to the specific tests which exercise the requirement
• From each test case back to the requirement and functional specifications

A traceability matrix finds two applications: (i) identifying and tracking the functional coverage of a test and (ii) identifying which test cases must be exercised or updated when a system evolves [9]. It is difficult to determine the extent of coverage achieved by a test suite without a traceability matrix. The test suite can contain a sizable number of tests for the features of a system, but without the cross-referencing provided by a traceability matrix it is difficult to discern whether a particular requirement has been adequately covered. The following definition by Gotel and Finkelstein [10] sums up the general view of requirements traceability (p. 97):

The requirements traceability is the ability to describe and follow the life of a requirement, in both forward and backward direction, i.e., from its origins, through its development and specification, to its subsequent deployment and use, and through periods of ongoing refinement and iteration in any of these phases.

Submit State A new requirement is put in the submit state to make it available to others. The owner of this state is the submitter. A new requirement may come from different sources: a customer, a marketing manager, or a program manager. A program manager oversees a software release starting from its inception to its completion and is responsible for delivering it to the customer. A software release is the release of a software image providing new features. For example, one can release an OS to customers every eight months by adding new features and using an appropriate numbering scheme, such as OS release 2.0, OS release 2.1, and so on.
Usually, the requirements are generated by the customers and marketing managers. A defect filed by a test engineer may become a requirement in a future release because it may become apparent that the reported defect is not really a true defect and should be treated as an enhancement request. In case of a dispute between the test engineers and the developers, the program manager may make a final decision in consultation with relevant people. The program manager can submit a new requirement based on the issue raised in the defect if he or she decides that the disputed defect is, in fact, an enhancement request. The following fields of the schema given in Table 11.2 are filled out when a requirement is submitted:

requirement_id: A unique identifier associated with the requirement.
priority: A priority level of the requirement—high or normal.
title: A title for the requirement.
submitter: The submitter's name.
description: A short description of the requirement.
note: Some notes on this requirement, if there are any.
product: Name of the product in which the requirement is desired.
customer: Name of the customer who requested this requirement.

The priority level is an indication of the order in which requirements need to be implemented. Requirements prioritization [11] is an important aspect of a market-driven requirements engineering process [12, 13]. All the high-priority requirements should be considered for implementation before the normal-priority requirements. The submitter can assign a priority level to a requirement that defines the requirement's level of importance. The marketing manager can move the state from submit to open by assigning the ownership to himself.

Open State In this state, the marketing manager is in charge of the requirement and coordinates the following activities:

• Reviews the requirement to find duplicate entries.
The marketing manager can move the duplicate requirement from the open state to the decline state with an explanation and a pointer to the existing requirement. Also, he or she may ensure that there are no ambiguities in the requirement and, if there is any ambiguity, consult with the submitter and update the description and the note fields of the requirement.

• Reevaluates the priority of the requirement assigned by the submitter and either accepts it or modifies it.
• Determines the severity of the requirement. There are two levels of severity defined for each requirement: normal and critical. The severity option provides a tag for the upper management, such as the director of software engineering, in order to review the requirement. If the severity level is critical, then it is a flag to the director to complete the review as soon as possible. Assignment of a severity level is made independent of the priority level.
• Suggests a preferred software release for the requirement.
• Attaches a marketing justification to the requirement.
• Moves the requirement from the open state to the review state by assigning the ownership to the director of software engineering.
• The marketing manager may decline a requirement in the open state and terminate the development process, thereby moving the requirement to the decline state with a proper explanation.

The following fields may be updated by the marketing manager, who is the owner of the requirement in the open state:

priority: Reevaluate the priority—high or normal—of this requirement.
severity: Assign a severity level—normal or critical—to the requirement.
decline_note: Give an explanation of the requirement if declined.
software_release: Suggest a preferred software release for the requirement.
marketing_justification: Provide a marketing justification for the requirement.
description: Describe the requirement, if there is any ambiguity.
note: Make any useful comments, if there is a need.

Review State The director of software engineering is the owner of the requirement in the review state. A requirement stays in the review state until it passes through the engineering process, as explained in the following. The software engineering director reviews the requirement to understand it and to estimate the time required to implement it. The director thus prepares a preliminary version of the functional specification for this requirement. This scheme provides a framework to map the requirement to the functional specification which is to be implemented. The director of software engineering can move the requirement from the review state to the assign state by changing the ownership to the marketing manager. Moreover, the director may decline this requirement if it is not possible to implement it. The following fields may be updated by the director:

eng_comment: Comments generated during the review are noted in this field. The comments are useful in developing a functional specification, generating code, or performing unit-level testing.
time_to_implement: This field holds the estimated time in person-weeks to implement the requirement.
attachment: An analysis document, if there is any, including figures and descriptions that are likely to be useful in the future development of functional specifications.
functional_spec_title: The name of the functional specification that will be written for this requirement.
functional_spec_name: The functional specification filename.
functional_spec_version: The latest version number of the functional specification.
eng_assigned: Name of the engineer assigned by the director to review the requirement.

Assign State The marketing manager is the owner of the requirement in the assign state.
A marketing manager assigns the requirement to a particular software release and moves the requirement to the commit state by changing the ownership to the program manager, who owns that particular software release. The marketing manager may decline the requirement and terminate the development process, thereby moving the requirement to the decline state. The following fields are updated by the marketing manager: decline_note and software_release. The former holds an explanation for declining, if the requirement is moved to the decline state. On the other hand, if the requirement is moved to the commit state, the marketing manager updates the latter field to specify the software release in which the requirement will be available.

Commit State The program manager is the owner of the requirement in the commit state. The requirement stays in this state until it is committed to a software release. The program manager reviews all the requirements that are suggested to be in a particular release which is owned by him. The program manager may reassign a particular requirement to a different software release by consulting with the marketing manager, the software engineering director, and the customer. The requirement may be moved to the implement state by the program manager after it is committed to a particular software release. All the functional specifications should be frozen after a requirement is committed, that is, has exited from the commit state. It is important to stabilize and freeze the functional specification for test design and development. The test engineers must complete the review of the requirement and the relevant functional specification from a testability point of view. Next, the test engineers can start designing and writing test cases for this requirement, as discussed in Section 11.6. The only field to be updated by the program manager, who is the owner of the requirement in the commit state, is committed_release.
The field holds the release number for this requirement.

Implement State The director of software engineering is the owner of the requirement in the implement state. This state implies that the software engineering group is currently coding and unit testing the requirement. The director of software engineering may move the requirement from the implement state to the verification state after the implementation is complete and the software is released for system testing. The director can assign an EC number and explain that the requirement is not doable in its current definition. The EC document is attached to the requirement definition. An outline of an EC document is given in Table 11.3.

TABLE 11.3 Engineering Change Document Information

EC number: A unique number.
Requirement(s) affected: Requirement ID(s) and titles.
Problem/issue description: Brief description of the issue.
Description of change required: Description of the changes needed to the original requirement description.
Secondary technical impact: Description of the impact the EC will have on the system.
Customer impacts: Description of the impact the EC will have on the end customer.
Change recommended by: Name of the engineer(s).
Change approved by: Name of the approver(s).

The director can also decline the requirement, if it is technically not possible to implement it, and move the requirement to the decline state. The following fields may be updated by the director, since he or she is the owner of a requirement in the implement state:

decline_note: An explanation of the reasons the requirement is declined if it is moved to the decline state.
ec_number: The EC document number.
attachment: The EC document.

Verification State The test manager is the owner of the requirement in the verification state. The test manager verifies the requirement and identifies one or more methods for assigning a test verdict: (i) testing, (ii) inspection, (iii) analysis, and (iv) demonstration.
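When testing is the chosen method, the final verdict for a requirement can be derived from the results of its associated test cases. The sketch below uses the value sets of Table 11.2 for tc_results and verification_status; the aggregation rule itself is our assumption for illustration, since the text does not prescribe one.

```python
# Sketch: deriving a requirement's verification_status from its test
# case results (tc_results values of Table 11.2). The aggregation rule
# is an assumption: any failure fails the requirement, all passes pass
# it, and anything else leaves verification incomplete.

def verification_status(tc_results):
    """Map a list of tc_results values to {Passed, Failed, Incomplete}."""
    if any(r == "Failed" for r in tc_results):
        return "Failed"
    if tc_results and all(r == "Passed" for r in tc_results):
        return "Passed"
    return "Incomplete"  # Untested, Blocked, or Invalid cases remain

print(verification_status(["Passed", "Passed"]))    # Passed
print(verification_status(["Passed", "Failed"]))    # Failed
print(verification_status(["Passed", "Untested"]))  # Incomplete
```

A verdict reached by analysis, inspection, or demonstration would instead be recorded directly in the verification_status and testing_note fields.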
If testing is a method for verifying a requirement, then the test case identifiers and their results are provided. This information is extracted from the test factory discussed in Section 11.6. Inspection means a review of the code. Analysis means mathematical and/or statistical analysis. Demonstration means observing the system in a live operation. A verdict is assigned to the requirement by providing the degree-of-compliance information: full compliance, partial compliance, or noncompliance. A testing note is included if a method other than testing is used for verification. The notes may contain an explanation of the analysis, inspection, or demonstration given to the customer. The requirement may get an EC number from the test manager as a testing note. The EC document specifies any deficiency in the implementation of the requirement. A deviation or an error discovered at this stage can rarely be corrected. It is often necessary to negotiate with the customer through an EC document attached to the requirement. The program manager coordinates this negotiation activity with the customer. The test manager may decline the implementation with an EC number, explaining that the implementation does not conform to the requirement, and move the requirement to the decline state. The test manager may move the requirement to the closed state after it has been verified and the value of the verification_status field set to "passed." The following fields are updated by the test manager, since he or she is the owner of the requirement in the verification state:

decline_note: The reasons to decline this requirement.
ec_number: An EC document number.
attachment: The EC document.
verification_method: Can take one of the four values from the set {Testing, Analysis, Demonstration, Inspection}.
verification_status: Can take one of the three values from the set {Passed, Failed, Incomplete}, indicating the final verification status of the requirement.
compliance: Can take one of the three values from the set {compliance, partial compliance, noncompliance}, which indicates the extent to which the software image complies with the requirement.
tc_id: The test case identifiers that cover this requirement.
tc_results: The test case results for the above tests. It can take one of the five values from the set {Untested, Passed, Failed, Blocked, Invalid}. These values are extracted from the test factory database, which is discussed in Section 11.7.
defect_id: A defect identifier. If the tc_results field takes the value "failed," then the defect identifier is associated with the failed test case to indicate the defect that causes the failure. This value is extracted from the test factory database.
testing_note: May hold an explanation of the analysis, inspection, or demonstration given to the end customer by the test engineer.

Closed State The requirement is moved to the closed state from the verification state by the test manager after it is verified.

Decline State In this state, the marketing department is the owner of the requirement. A requirement comes to this state for some of the following reasons:

• The marketing department rejected the requirement.
• It is technically not possible to implement this requirement and, possibly, there is an associated EC number.
• The test manager declined the implementation with an EC number.

The marketing group may move the requirement to the submit state after reviewing it with the customer. The marketing manager may reduce the scope of the requirement after discussing it with the customer based on the EC information and resubmit the requirement by moving it to the submit state.

11.3 CHARACTERISTICS OF TESTABLE REQUIREMENTS

System-level tests are designed based on the requirements to be verified.
A test engineer analyzes the requirement, the relevant functional specifications, and the standards to determine the testability of the requirement. This task is performed in the commit state. Testability analysis means assessing the static behavioral characteristics of the requirement to reveal test objectives. One way to determine whether a requirement description is testable is as follows:

• Take the following requirement description: The system must perform X.
• Then encapsulate the requirement description to create a test objective: Verify that the system performs X correctly.
• Review this test objective by asking the question: Is it workable? In other words, find out if it is possible to execute it, assuming that the system and the test environment are available.
• If the answer to the above question is yes, then the requirement description is clear and detailed enough for testing purposes. Otherwise, more work needs to be done to revise or supplement the requirement description.

As an example, let us consider the following requirement: The software image must be easy to upgrade/downgrade as the network grows. This requirement is too broad and vague to determine the objective of a test case. In other words, it is a poorly crafted requirement. One can restate the previous requirement as: The software image must be easy to upgrade/downgrade for 100 network elements. Then one can easily create a test objective: Verify that the software image can be upgraded/downgraded for 100 network elements. It takes time, clear thinking, and courage to change things. In addition to the testability of the requirements, the following items must be analyzed by the system test engineers during the review:

• Safety: Have the safety-critical requirements [14] been identified? The safety-critical requirements specify what the system shall not do, including means for eliminating and controlling hazards and for limiting any damage in the case that a mishap occurs.
• Security: Have the security requirements [15], such as confidentiality, integrity, and availability, been identified?
• Completeness: Have all the essential items been completed? Have all possible situations been addressed by the requirements? Have all the irrelevant items been omitted?
• Correctness: Are the requirements understandable, and have they been stated without error? Are there any incorrect items?
• Consistency: Are there any conflicting requirements?
• Clarity: Are the requirement materials and the statements in the document clear, useful, and relevant? Are the diagrams, graphs, and illustrations clear? Have they been expressed using proper notation to be effective? Do they appear in the proper places? Is the writing style clear?
• Relevance: Are the requirements pertinent to the subject? Are the requirements unnecessarily restrictive?
• Feasibility: Are the requirements implementable?
• Verifiable: Can tests be written to demonstrate conclusively and objectively that the requirements have been met? Can the functionality of the system be measured in some way that will assess the degree to which the requirements are met?
• Traceable: Can each requirement be traced to the functions and data related to it so that changes in a requirement can lead to easy reevaluation?

Functional Specification A functional specification provides:

i. A precise description of the major functions the system must perform to fulfill the requirements, a description of the implementation of the functions, and an explanation of the technological risks involved
ii. External interfaces with other software modules
iii. Data flow, such as flowcharts, transaction sequence diagrams, and FSMs describing the sequence of activities
iv. Fault handling, memory utilization, and performance estimates
v. Any engineering limitation, that is, inferred requirements that will not be supported
vi.
The command line interface or element management system to provision/configure the feature in order to invoke the software implementation related to this feature

Once again, the functional specification must be reviewed from the point of view of testability. The characteristics of testable functional specifications are outlined in Table 11.4. The functional specifications are more likely to be testable if they satisfy all the items in Table 11.4. Common problems with functional specifications include lack of clarity, ambiguity, and inconsistency. The following are the objectives that are kept in mind while reviewing a functional specification [16]:

• Achieving Requirements: It is essential that the functional specification identify the formal requirements to be achieved. One determines, by means of review, whether the requirements have been addressed by the functional specification.
• Correctness: Whenever possible, the specification parts should be compared directly to an external reference for correctness.
• Extensible: The specification is designed to easily accommodate future extensions that can be clearly envisioned at the time of review.
• Comprehensible: The specification must be easily comprehensible. By the end of the review process, if the reviewers do not understand how the system works, the specification or its documentation is likely to be flawed.

TABLE 11.4 Characteristics of Testable Functional Specifications

Purpose, goals, and exceptions are clearly stated.
Address the right objectives.
Contain the requirements and standards with which this document complies.
Clearly stated operating environment: for what hardware, OS, and software release the feature is targeted; the minimum hardware configuration that supports this application.
Clearly list the major functions which the system must perform.
Clearly define the success criteria which the system must fulfill to be effective.
• Provide an understandable, organized, and maintainable model of the processes and/or data or objects, using a standard structured method and the principle of functional decomposition.
• Use standard and clearly defined terminology (key words, glossary, syntax, and semantics).
• Display a heavy use of models and graphics (e.g., SDL-GR, finite-state models), not primarily English narrative.
• Document the assumptions.
• The document should have a natural structure/flow, with each atomic feature labeled with an identifier for easy cross-referencing to specific test cases.
• Should have standard exception handling procedures, consistent error messages, and on-line help functions.
• External interfaces, such as CLI, MIBs, EMS, and Web interfaces, are defined clearly.
• Clearly state possible trade-offs between speed, time, cost, and portability.
• Performance requirements are defined, usually in terms of packets per second, transactions per second, recovery time, response time, or other such metrics.
• Scaling limits and resource utilization (CPU utilization, memory utilization) are stated precisely.
• Documentation of unit tests.
Note: SDL-GR, Specification and Description Language Graphical Representation.

Such specifications and documentations need to be reworked to make them more comprehensible.
• Necessity: Each item in the document should be necessary.
• Sufficiency: The specification should be examined for missing or incomplete items. All functions must be described, as well as important properties of input and output data such as volume and magnitude.
• Implementable: It is desirable to have a functional specification that is implementable within the resource constraints of the target environment, such as hardware, processing power, memory, and network bandwidth. One should be able to implement a specification in a short period of time without a technological breakthrough.
• Efficient: The functional specification must optimize those parts of the solution that contribute most to the performance (or lack thereof) of the system. The reviewers have the discretion of rejecting the specification on the grounds of ineffectiveness in specifying efficiency requirements.
• Simplicity: In general, it is easier to achieve and verify requirements stated in the form of simple functional specifications.
• Reusable Components: The specification should reuse existing components as much as possible and be modular enough that the common components can be extracted to be reused.
• Consistency with Existing Components: The general structure of the specification should be consistent with the choices made in the rest of the system. It should not require the design paradigm of a system to be changed for no compelling reason.
• Limitations: The limitations should be realistic and consistent with the requirements.

11.4 TEST OBJECTIVE IDENTIFICATION

The question "What do I test?" must be answered with another question: "What do I expect the system to do?" We cannot test the system comprehensively if we do not understand it. Therefore, the first step in identifying the test objectives is to read, understand, and analyze the functional specification. For a successful analysis, it is essential to have a background familiarity with the subject area, the goals of the system, the business processes, and the system users. Let us consider our previously revised requirement: The software image must be easy to upgrade/downgrade for 100 network elements. The test engineer needs to ask one question: What do I need to know to develop a comprehensive set of test objectives for the above requirement? An inquisitive test engineer may also ask the following questions:
• Do we have to upgrade the software image sequentially on each of the network elements or at the same time on all 100 elements? Or do we proceed in a batch of 20 elements at a time?
• What is the source of the upgrade? Will the source be on an element management server?
• What does "easy" mean here? Does it refer to the length of time, say, 200 seconds, taken by the upgrade process?
• Can we have a mix of old and new software images on different network elements on the same network? In other words, is a new software image compatible with the old image?
• If we support old and new software images on different network elements on the same network, then the EMS should be capable of managing two versions of the software installed on the same network. Is it possible for an EMS to manage network elements running different software versions?
• To what release will the software be downgraded? Suppose the software image is upgraded to the nth release from the (n − 1)th release. Now, if the software image is to be downgraded, to what release should it be downgraded: the (n − 1)th or the (n − 2)th release?
• While a system is being upgraded, do we need to observe the CPU utilization of the network elements and the EMS server? What is the expected CPU utilization?

We critically analyze requirements to extract the inferred requirements that are embedded in them. An inferred requirement is one that a system is expected to support but that is not explicitly stated. Inferred requirements need to be tested just like explicitly stated requirements. As an example, let us consider the requirement that the system must be able to sort a list of items into a desired order. One obvious test objective is: Verify that the system can sort an unsorted list of items. However, there are several unstated requirements not being verified by the above test objective. Many more test objectives can be identified for the requirement:
• Verify that the system produces the sorted list of items when an already sorted list of items is given as input.
• Verify that the system produces the sorted list of items when lists of items of varying lengths are given as input.
• Verify that the number of output items is equal to the number of input items.
• Verify that the contents of the sorted output records are the same as the input record contents.
• Verify that the system produces an empty list of items when an empty list of items is given as input.
• Check the system behavior and the output list by giving an input list containing one or more empty (null) records.
• Verify that the system can sort a list containing a very large number of unsorted items.

After the test objectives have been identified, they are put together to form a test group or a subgroup. A set of (sub)groups of test cases is logically combined to form a larger group. A hierarchical structure of test groups, as shown in Figure 11.2, is called a test suite. It is necessary to identify the test groups based on test categories and to refine the test groups into sets of test objectives. Individual test cases are created for each test objective within the subgroups; this is explained in the next section with an example. Test groups may be nested to an arbitrary depth. They may be used to aid system test planning and execution, which are discussed in Chapters 12 and 13, respectively.

Figure 11.2 Test suite structure.

11.5 EXAMPLE

The Frame Relay Forum (FRF) defines two kinds of frame relay (FR)/asynchronous transfer mode (ATM) interworking scenarios: network interworking [17] and service interworking [18]. These two interworking functions provide a means by which the two technologies, namely ATM and FR, can interoperate. Simply stated, network interworking provides a transport between two FR devices (or entities). Service interworking enables an ATM user to transparently interwork with an FR user, and neither knows that the other end uses a different technology.
Suppose that one of the requirements is to support service interworking as described in FRF.8 [18] on a switch. The director of software engineering develops a functional specification after the requirements are approved by the marketing manager. The test group develops test categories based on the requirements and the functional specification. Since the actual functional specification is not available, we assume the following for simplicity:
• The term FrAtm refers to the software component that provides the FR–ATM permanent virtual connection (PVC) service interworking functionality.
• FrAtm supports a variety of ATM cell-based physical interfaces on the ATM side: OC3, E3, DS3, E1, and DS1.
• FrAtm supports a variety of frame-based physical interfaces on the FR side: V.11, V.35, DS1, E1, DS3, and E3.
• The subcomponents of FrAtm are as follows: local management interface (LMI) and data-link connection identifier (DLCI).
• The FrAtm software components are being implemented on a combined FR and ATM switch. In other words, both FR and ATM functionality is available on the same switch.

Let us briefly analyze the service interworking functionality before we develop the different categories of tests. Figure 11.3 illustrates the service interworking between FR and ATM.

Figure 11.3 Service interworking between FR and ATM services.

Service interworking applies when (i) an FR service user interworks with an ATM service user, (ii) the ATM service user performs no frame relaying specific functions, and (iii) the frame relaying service user performs no ATM service-specific functions.
Broadband customer premises equipment (B-CPE) has no knowledge that a distant device is attached to an FR network. As shown in Figure 11.3, an FR user sends traffic on a PVC through the FR network to an interworking function (IWF), which then maps it to an ATM PVC. The FR PVC address to ATM PVC address mapping and other options are configured by the network management system associated with the IWF. Again, the IWF can be external to the networks as shown, but it is more likely to be integrated into the ATM network switch or the FR switch. Note that there is always one ATM PVC per FR PVC in the case of service interworking.

The IWF can be explained using the protocol stack model described in Figure 11.3. This protocol stack uses a "null" service-specific convergence sublayer (SSCS) for describing the IWF. This SSCS provides interfaces using standard primitives to the Q.922 DL core on one side and to the AAL5 (ATM adaptation layer 5) CPCS (common part convergence sublayer) on the other side within the IWF. Figure 11.4 shows the transformation of an FR frame to ATM cells.

Frame Formatting and Delimiting
• FR to ATM: The FR frame is mapped into an AAL5 PDU; the frame flags, inserted zero bits, and CRC-16 are stripped. The Q.922 frame header is removed, and some of the fields of the header are mapped into the ATM cell header fields.
• ATM to FR: The message delineation provided by AAL5 is used to identify frame boundaries; zero bits, CRC-16, and flags are inserted. Protocol fields and functions of the ATM AAL5 PDU are translated into the protocol fields and functions of the FR frame.
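The FR-to-ATM direction above (strip flags and CRC-16, wrap the payload in an AAL5 CPCS PDU, segment into 48-octet SAR SDUs) can be sketched in a few lines. This is an illustrative model only: `zlib.crc32` stands in for the AAL5 CRC-32 (the real AAL5 computation differs), and the layout follows the standard 8-byte AAL5 trailer (CPCS-UU, CPI, 2-byte length, 4-byte CRC-32).

```python
import struct
import zlib

CELL_PAYLOAD = 48  # octets carried by each ATM cell (one SAR SDU)

def aal5_segment(payload: bytes, cpcs_uu: int = 0) -> list:
    """Wrap an FR frame payload in an AAL5 CPCS PDU and segment it.

    Pads the payload so that payload plus the 8-byte trailer is a
    multiple of 48, appends CPCS-UU, CPI, length, and CRC-32, then
    slices the PDU into 48-octet SAR SDUs (one per ATM cell).
    """
    pad_len = (-(len(payload) + 8)) % CELL_PAYLOAD
    body = payload + b"\x00" * pad_len
    # Trailer fields before the CRC: CPCS-UU (1), CPI (1), Length (2).
    partial = body + struct.pack("!BBH", cpcs_uu, 0, len(payload))
    crc = zlib.crc32(partial) & 0xFFFFFFFF  # stand-in for AAL5 CRC-32
    pdu = partial + struct.pack("!I", crc)
    return [pdu[i:i + CELL_PAYLOAD] for i in range(0, len(pdu), CELL_PAYLOAD)]
```

Reassembly in the ATM-to-FR direction reverses the steps: concatenate the SAR SDUs, check the CRC, use the length field to strip the pad, and then restore the FR flags and CRC-16.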
Figure 11.4 Transformation of FR to ATM cell.

Discard Eligibility and Cell Loss Priority Mapping
• FR to ATM: Mode 1 or mode 2 may be selected per PVC at subscription time; the default is mode 1 operation:
Mode 1: The discard eligibility (DE) field in the Q.922 core frame shall be mapped to the ATM CLP (cell loss priority) field of every cell generated by the segmentation process of the AAL5 PDU containing the information of that frame.
Mode 2: The ATM CLP field of every cell generated by the segmentation process of the AAL5 PDU containing the information of that frame shall be set to a constant value (either 0 or 1) configured at service subscription time.
• ATM to FR: Mode 1 or mode 2 may be selected per PVC at subscription time; the default is mode 1 operation:
Mode 1: If one or more cells of a frame have their CLP fields set, the IWF shall set the DE field of the Q.922 core frame.
Mode 2: The DE field of the Q.922 core frame shall be set to a constant value (either 0 or 1) configured at service subscription time.
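The two modes in each direction reduce to a small amount of logic. A sketch (the function names are illustrative, not from the specification):

```python
def de_to_clp(de: int, mode: int = 1, constant: int = 0) -> int:
    """FR to ATM: derive the CLP bit for every cell of a frame.

    Mode 1 copies the frame's DE bit into each cell; mode 2 uses a
    constant CLP value fixed at service subscription time.
    """
    return de if mode == 1 else constant

def clp_to_de(cell_clps, mode: int = 1, constant: int = 0) -> int:
    """ATM to FR: derive the DE bit of the reassembled frame.

    Mode 1 sets DE if any cell of the frame arrived with CLP set;
    mode 2 uses a constant DE value.
    """
    return (1 if any(cell_clps) else 0) if mode == 1 else constant
```

Test objectives for the mapping then follow directly: drive each mode with frames/cells on both sides of the boundary and compare against these expected values.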
Forward Congestion Indication Mapping
• FR to ATM: Mode 1 or mode 2 may be selected per PVC at subscription time; the default is mode 1 operation:
Mode 1: The FECN (forward explicit congestion notification) field in the Q.922 core frame shall be mapped to the ATM EFCI (explicit forward congestion indication) field of every cell generated by the segmentation process of the AAL5 PDU containing the information of that frame.
Mode 2: The FECN field in the Q.922 core frame shall not be mapped to the ATM EFCI field of cells generated by the segmentation process of the AAL5 PDU containing the information of that frame. The EFCI field is always set to "congestion not experienced."
• ATM to FR: If the EFCI field in the last cell of a segmented frame received is set to "congestion experienced," the IWF will set the FECN field of the Q.922 core frame to "congestion experienced."

Backward Congestion Indication Mapping
• FR to ATM: The BECN (backward explicit congestion notification) field is ignored.
• ATM to FR: The BECN field of the Q.922 core frame is set to 0.

Command/Response Field Mapping
• FR to ATM: The C/R bit of the Q.922 core frame is mapped to the least significant bit of the common part convergence sublayer user-to-user (CPCS_UU) field of the CPCS PDU.
• ATM to FR: The least significant bit of the CPCS_UU field of the CPCS PDU is mapped to the C/R bit of the Q.922 core frame.

DLCI Field Mapping There is a one-to-one mapping between the Q.922 DLCI and the VPI/VCI (virtual path identifier/virtual circuit identifier) field in the ATM cells. The mapping is defined when the PVC is established. The association may be arbitrary or systematic.

Traffic Management Frame relay quality-of-service (QoS) parameters (CIR, Bc, Be) will be mapped to ATM QoS parameters (PCR, SCR, MBS) using method 1, the one-to-one mapping of the ATM Forum B-ICI (BISDN Inter-Carrier Interface) specification, which is given in Table 11.5. The value of the frame size variable used in the calculations is configurable per PVC.
Also configurable per PVC will be the CDVT.

TABLE 11.5 Mapping of FR QoS Parameters to ATM QoS Parameters

PCR0+1 = AR/8 × OHA(n)
SCR0 = CIR/8 × OHB(n)
MBS0 = ⌈Bc/8 × (1/(1 − CIR/AR)) + 1⌉ × OHB(n)
SCR1 = EIR/8 × OHB(n)
MBS1 = ⌈Be/8 × (1/(1 − EIR/AR)) + 1⌉ × OHB(n)
CDVT = 1/PCR0+1

where
n = number of user information octets in a frame
AR = access line rate
CIR = committed information rate, Bc/T (bits/s)
EIR = excess information rate, Be/T (bits/s), with CIR + EIR < AR
Bc = committed burst size (bits)
Be = excess burst size (bits)
T = measurement interval
PCR = peak cell rate (cells/s)
SCR = sustained cell rate (cells/s)
MBS = maximum burst size (number of cells)
CDVT = cell delay variation tolerance
⌈X⌉ = smallest integer greater than or equal to X
OHA(n) = ⌈(n + h1 + h2)/48⌉/(n + h1 + h2), the overhead factor for the access rate (cells/byte)
OHB(n) = ⌈(n + h1 + h2)/48⌉/n, the overhead factor for the committed/excess rate (cells/byte)
h1 = FR header size (octets); 2-, 3-, or 4-octet headers
h2 = FR HDLC overhead size of CRC-16 and flags (4 octets)
The subscript 0+1, 0, or 1 applied to PCR, SCR, or MBS denotes the parameter value for the CLP = 0+1, CLP = 0, or CLP = 1 cell stream, respectively.
Note: This method characterizes FR traffic using three generic cell rate algorithms (GCRAs) described in Appendix B of the ATM Forum UNI specification, 1993. Frames carry n user information octets.

PVC Mapping
• FR to ATM: When the IWF is notified that an FR PVC changes state from active to inactive, it will begin sending AIS (alarm indication signal) F5 OAM (operation, administration, and management) cells on the corresponding ATM PVC. An AIS F5 OAM cell is sent once per second until the state of the PVC changes from inactive to active.
• ATM to FR: When the IWF is notified that an ATM PVC changes state from active to inactive, it will set the FR PVC status bit to inactive.
Note: An ATM PVC is considered inactive if (i) AIS or RDI (remote defect indication) OAM cells are received, (ii) the ILMI (interim local management interface) MIB variable atmVccOperStatus is either localDown or end2endDown, or (iii) the ATM interface is down. ATM responds to received AIS cells by sending RDI cells.

Upper Layer User Protocol Encapsulation The network provider can configure one of the following two modes of operation for each pair of interoperable FR and ATM PVCs regarding upper layer protocol encapsulations:
• Mode 1, transparent mode: The upper layer encapsulation methods are not mapped between the ATM and FR standards.
• Mode 2, translation mode: The encapsulation methods for carrying multiple upper layer user protocols (e.g., LAN to LAN) over an FR PVC and an ATM PVC conform to the standards RFC 1490 [19] and RFC 1483 [20], respectively. The IWF shall perform mapping between the two encapsulations because of the incompatibilities of the two methods. This mode supports the interworking of internetworking (routed and/or bridged) protocols.

Fragmentation and Reassembly
• FR to ATM: When fragmented packets are received on an FR PVC by the IWF, reassembly should be performed and the assembled frame forwarded to the ATM PVC.
• ATM to FR: Fragmentation should be performed on a received CPCS PDU before forwarding it to the FR PVC if the CPCS PDU is greater than the maximum frame size supported on the FR PVC.

FR–ATM PVC Service Interworking Test Category The FR–ATM PVC service interworking requirement is divided into six main categories: (i) functionality, (ii) robustness, (iii) performance, (iv) stress, (v) load and stability, and (vi) regression. The complete structure of the test categories is given in Figure 11.5.

Functionality Tests Functionality tests are designed to verify the FrAtm software as thoroughly as possible over all the requirements specified in the FRF.8 document.
This category has been further subdivided into six functional subgroups. For each identified functional subgroup, test objectives are formed that adequately exercise the function. Within each subgroup, test objectives are designed to cover the valid values. The following major functionalities of the FrAtm feature must be tested adequately: (1) configuration and monitoring tests, (2) traffic management tests, (3) congestion tests, (4) service interworking function (SIWF) translation mode mapping tests, (5) alarm tests, and (6) interface tests.

1. Configuration and Monitoring Tests: Tests are designed to verify all the configurable attributes of the FrAtm component using CLI commands. These configurable attributes are implementation dependent and should be defined in the functional specification. Configuration tests configure the FrAtm using the CLI. Monitoring tests verify the ability to use the CLI to determine the status and functioning of FrAtm. Statistics counters are validated for accuracy by using known amounts of traffic.
2. Traffic Management Tests: Tests are designed to verify the mapping between the ATM traffic parameters PCR0+1, SCR0+1, and MBS and the FR rate enforcement parameters CIR, Bc, and Be. The mapping between these parameters is subject to engineering consideration and is largely dependent on the balance between acceptable loss probabilities and network costs.

Figure 11.5 FrAtm test suite structure.

3. Congestion Tests: Various tests are designed to verify the extent of the FrAtm congestion control mechanism. The test objectives are as follows:
• Verify that the long-term maximum rate for the committed burst is CIR.
• Verify that the long-term maximum rate for the excess burst is EIR.
• Verify that DE = 0 and DE = 1 frames are counted toward the committed burst and the excess burst, respectively.
• Verify that FrAtm sets the BECN bit to signal local congestion to the FR user.
4. Service Interworking Function Translation Mode Mapping Tests: Tests are designed to verify the SIWF translation mode mapping from FR to ATM and vice versa, with the following objectives:
• Verify that for frames from FR to ATM the IWF replaces the RFC 1490 encapsulation with the RFC 1483 encapsulation header.
• Verify that for frames from ATM to FR the IWF replaces the RFC 1483 encapsulation with the RFC 1490 encapsulation header.
• In the FR-to-ATM direction, verify that the FECN bit of a given frame maps directly into the EFCI bit of every cell that constitutes the frame if the EFCI attribute is configured as "preserve"; otherwise, the EFCI bit of every cell generated is set to zero.
• In the ATM-to-FR direction, verify that the EFCI bit of the last cell constituting a given frame is mapped directly into the FECN bit of that frame.
• In the FR-to-ATM direction, verify that the DE bit of a given frame maps directly into the CLP bit of every cell that constitutes that frame if so configured.
• In the ATM-to-FR direction, verify that the CLP-to-DE mapping can be configured with the values "preserve," "always 0," and "always 1."
• In the FR-to-ATM direction, verify that the command/response (C/R) bit of the FR frame is mapped directly to the least significant bit of the UU data in the CPCS of the AAL5 encapsulation.
• In the ATM-to-FR direction, verify that the least significant bit of the UU data in the CPCS is mapped directly to the C/R bit of the constituting FR frame.
• In the FR-to-ATM direction, verify that the IWF sends AIS F5 OAM cells on the ATM PVC when the FR PVC changes state from active to inactive.
Also, verify that the OAM cells are sent once per second until the state of the PVC changes from inactive to active.
• In the ATM-to-FR direction, verify that the IWF sets the FR PVC status bit to inactive when the ATM PVC changes state from active to inactive. Verify that RDI cells are sent in response to the received AIS cells.
• In the FR-to-ATM direction, verify that the IWF reassembles the fragmented packets received on an FR PVC and forwards the assembled frame to the ATM PVC.
• In the ATM-to-FR direction, verify that fragmentation is performed by the IWF on the received CPCS PDUs before forwarding them to the FR PVC. Ensure that the CPCS PDU size is greater than the maximum frame size supported on the FR PVC.
5. Alarm Tests: Tests are designed to make sure that the various alarms are generated for the FR–ATM service. The test objectives are as follows:
• Verify that FrAtm generates the appropriate alarms per PVC for individual DLCIs.
• Verify that FrAtm generates state change notification (SCN) alarms when the operational state changes. A state change can occur when the link comes up or goes down. A state change can also occur by locking and unlocking FrAtm.
• Verify that FrAtm generates alarms to indicate the receipt of unexpected messages.
• Verify that alarms are generated to indicate the failure to allocate storage to create the service or to create subcomponents such as DLCIs.
6. Interface Tests: Verify that FrAtm supports the V.11, V.35, DS1, E1, DS3, and E3 interfaces on the FR side and the OC3, E3, and DS3 interfaces on the ATM side.
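The traffic management tests above need expected ATM parameter values computed from the configured FR descriptors. A sketch of the Table 11.5 formulas follows; ⌈·⌉ is the ceiling implied by the |X| notation, and applying an outer ceiling to MBS so that it comes out in whole cells is our interpretation, not something the table states.

```python
from math import ceil

def fr_to_atm_qos(ar, cir, bc, n, h1=2, h2=4):
    """Map FR descriptors (AR, CIR, Bc in bits; n information octets
    per frame) to ATM parameters per the B-ICI method of Table 11.5.

    h1 is the FR header size (2, 3, or 4 octets) and h2 the HDLC
    overhead of CRC-16 plus flags (4 octets).
    """
    cells = ceil((n + h1 + h2) / 48)
    oha = cells / (n + h1 + h2)   # overhead factor, access rate (cells/byte)
    ohb = cells / n               # overhead factor, committed rate (cells/byte)
    pcr = ar / 8 * oha            # peak cell rate, CLP = 0+1 stream
    scr0 = cir / 8 * ohb          # sustained cell rate, CLP = 0 stream
    # Outer ceiling is our interpretation to yield a whole number of cells.
    mbs0 = ceil((bc / 8 * (1 / (1 - cir / ar)) + 1) * ohb)
    cdvt = 1 / pcr                # cell delay variation tolerance
    return pcr, scr0, mbs0, cdvt
```

For example, a T1 access line (AR = 1,536,000 bits/s) carrying 1500-octet frames yields a PCR of roughly 4080 cells/s, which a traffic management test would compare against the value the switch actually programs.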
Robustness Tests One verifies the robustness of the FrAtm implementation in error situations:

FrAtm Tests: Verify that (i) the FrAtm software component can handle continuous lock/unlock operations without any crash, (ii) the DLCI software component can handle multiple lock/unlock commands generated at a high rate, and (iii) the DLCI software component can handle lock/unlock while traffic is going through the switch, without any crash.

Boundary Value Tests: Verify that FrAtm handles boundary values as well as values below and above the valid boundary for a subset of configurable attributes. For example, if the maximum number of configurable DLCIs is 1024, then try to configure 1025 DLCI subcomponents and verify that the 1025th DLCI subcomponent is not created.

Performance Tests Tests are designed to measure data delay and throughput across different interfaces on both the FR and the ATM sides. Tests are conducted to measure the delay and throughput over a full range of ATM interface cards and FR interface cards for frame sizes of 64, 128, 256, 512, 1024, 2048, and 4096 bytes.

Stress Tests Tests are designed to observe and capture the behavior of the FR–ATM services under various types of load profiles. The test objectives are as follows:
• Verify that FrAtm works without any hiccups when it is stressed for 48 hours or more while lock/unlock activities and data transfer activity are going on.
• Verify that FrAtm works without any hiccups when the maximum number of DLCIs on the FR side using one FR card and the maximum number of ATM VCCs (virtual channel connections) are configured.

Load and Stability Tests Tests are designed to simulate a real customer configuration by sending real data traffic. For these tests, setting up the customer environment in the laboratory is challenging and expensive. An example of this kind of test is to verify that a user file can be transferred correctly from an FR network to an ATM network by using the FR–ATM PVC service interworking feature.
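Boundary value checks like the 1024-DLCI limit above automate well. A pytest-style sketch, where `FakeSwitch`, `create_dlci`, and `MAX_DLCI` are hypothetical stand-ins; a real harness would drive the switch's CLI or EMS instead:

```python
MAX_DLCI = 1024  # assumed configurable limit from the functional spec

class FakeSwitch:
    """Stand-in for the switch under test, holding created DLCIs."""
    def __init__(self):
        self.dlcis = set()

    def create_dlci(self, dlci_id: int) -> bool:
        if len(self.dlcis) >= MAX_DLCI:
            return False  # creation beyond the limit must be rejected
        self.dlcis.add(dlci_id)
        return True

def test_dlci_boundary():
    sw = FakeSwitch()
    # At and below the boundary: all 1024 creations succeed.
    assert all(sw.create_dlci(i) for i in range(1, MAX_DLCI + 1))
    # One above the boundary: the 1025th DLCI must not be created.
    assert not sw.create_dlci(MAX_DLCI + 1)
    assert len(sw.dlcis) == MAX_DLCI
```

The same pattern (exercise the boundary, one below, one above) applies to every bounded attribute listed in the functional specification.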
Regression Tests New test cases are not designed; rather, tests are selected from the groups explained before. In our opinion, a subset of test cases from each subgroup may be selected as regression tests. In addition, test cases from the FR and ATM test suites must be selected and executed to ensure that the previously supported functionality works with the new software component FrAtm.

11.6 MODELING A TEST DESIGN PROCESS

Test objectives are identified from a requirement specification, and one test case is created for each test objective. Each test case is designed as a combination of modular components called test steps. Test cases are clearly specified so that testers can quickly understand, borrow, and reuse them. Figure 11.6 illustrates the life-cycle model of a test case in the form of a state transition diagram.

Figure 11.6 State transition diagram of a test case.

The state transition model shows the different phases, or states, in the life cycle of a test case from its inception to its completion through the following states: create, draft, review, deleted, released, update, and deprecated. Certain actions are taken by the "owner" of a state, and the test case moves to a next state after the actions are completed. In the following, the states are explained one by one. One can easily implement a database of test cases using the test case schema shown in Table 11.6. We refer to such a database of test cases as a test factory.

Create State A test case is put in this initial state by its creator, called the owner or creator, who initiates the design of the test case. The creator initializes the following mandatory fields associated with the test case: requirement_ids, tc_id, tc_title, originator_group, creator, and test_category. The test case is expected to verify the requirements referred to in the requirement_ids field.
The originator_group is the group who found a need for the test. The creator may assign the test case to a specific test engineer, including himself, by filling out the eng_assigned field, and move the test case from the create to the draft state.

Draft State The owner of this state is the test group, that is, the system test team. In this state, the assigned test engineer enters the following information: tc_author, objective, setup, test_steps, cleanup, pf_criteria, candidate_for_automation, and automation_priority. After completion of all the mandatory fields, the test engineer may reassign the test case to the creator to go through the test case. The test case stays in this state until it is walked through by the creator.

TABLE 11.6 Test Case Schema Summary
tc_id: Test case identifier assigned by test author (80 characters)
tc_title: Title of test case (120 characters)
creator: Name of person who created test case
status: Current state of the record: create, draft, review, released, update, deleted, or deprecated
owner: Current owner of test case
eng_assigned: Test engineer assigned to write test procedure
objective: Objective of test case (multiline string)
tc_author: Name of test case author (user name)
originator_group: Group that originates the test (performance testing group, functional testing group, scaling testing group, etc.)
test_category: Test category name (performance, stress, interoperability, functionality, etc.)
setup: List of steps to perform prior to test
test_steps: List of test steps
cleanup: List of posttest activities
pf_criteria: List of pass/fail criteria
requirement_ids: List of references to requirement IDs from the requirement database
candidate_for_automation: Whether the test can be/should be automated
automation_priority: Automation priority
review_action: Action items from review meeting minutes
approver_names: List of approver names
After that, the creator may move the state from the draft state to the review state by entering all the approvers' names in the approver_names field.

Review and Deleted States The owner of the review state is the creator of the test case. The owner invites test engineers and developers to review and validate the test case. They ensure that the test case is executable and that the pass–fail criteria are clearly specified. Action items are created for the test case if any field needs modification. Action items from a review meeting are entered in the review_actions field, and the action items are executed by the owner to effect changes to the test case. The test case moves to the released state after all the reviewers approve the changes. If the reviewers decide that this is not a valid test case or that it is not executable, then the test case is moved to the deleted state. For a test case to be deleted, a review action item must say to delete it.

Released and Update States A test case in the released state is ready for execution, and it becomes part of a test suite. A test case in the update state, on the other hand, is in the process of being modified to enhance its reusability, to fine-tune its pass–fail criteria, and/or to fix its detailed test procedure. For example, a reusable test case should be parameterized rather than hard coded with data values. Moreover, a test case should be updated to adapt it to changes in system functionality or the environment. By moving a test case through the released–update loop a small number of times, one can improve its repeatability so that others can quickly understand, borrow, and reuse it. This also provides the foundation and justification for automating the test case. A test case should be platform independent.
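The life cycle of Figure 11.6 can be captured as a small transition table; this is an illustrative model of the legal moves described above, not an implementation from the book:

```python
# Legal transitions of the test case life cycle (Figure 11.6).
# States absent from the table (deleted, deprecated) are terminal.
TRANSITIONS = {
    "create": {"draft"},
    "draft": {"review"},
    "review": {"released", "deleted"},
    "released": {"update", "deprecated"},
    "update": {"released", "review"},
}

class TestCaseRecord:
    def __init__(self):
        self.state = "create"

    def move_to(self, new_state: str):
        """Move to new_state, rejecting moves the diagram does not allow."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal move: {self.state} -> {new_state}")
        self.state = new_state
```

A test factory implemented on such a table makes it impossible to release a test case that has not been reviewed, which is exactly the discipline the state model is meant to enforce.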
If an update involves a small change, the test engineer may move the test case back to the released state after the fix. Otherwise, the test case is subject to a further review, which is achieved by moving it to the review state. A test case may be revised after each execution.

Deprecated State An obsolete test case may be moved to the deprecated state. Ideally, if a test case has not been executed for a year, it should be reviewed to decide whether it warrants continued existence. A test case may become obsolete over time for the following reasons. First, the functionality of the system being tested has changed significantly, and, for lack of test case maintenance, the test case has become obsolete. Second, as an old test case is updated, some of the requirements of the original test case may no longer be fulfilled. Third, the reusability of test cases tends to degrade over time as the situation changes; this is especially true of test cases that are not designed with adequate attention to possible reuse. Finally, test cases may be carried forward carelessly long after their original justifications have disappeared; nobody may know the original justification for a particular test case, yet it continues to be used.

11.7 MODELING TEST RESULTS

A test suite schema can be used by a test manager to design a test suite after a test factory is created. A test suite schema, as shown in Table 11.7, is used to group test cases for testing a particular release. The schema requires a test suite ID, a title, an objective, and a list of test cases to be managed by the test suite. One also identifies the individual test cases to be executed (test cycles 1, 2, 3, and/or regression) and the requirements that the test cases satisfy. The idea here is to gather a selected number of released test cases and repackage them to form a test suite for a new project. Test engineers concurrently execute test cases from a selected test suite on different test beds.
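The test-case life cycle described in the preceding subsections (create, draft, review, released, update, deleted, deprecated) amounts to a small state machine. The sketch below encodes the legal transitions as I read them from the text; the transition table is an interpretation, not a figure reproduced from the book.

```python
# Legal test-case state transitions, as described in the text:
# create -> draft -> review -> {released, deleted}; released <-> update;
# update may also return through review; released -> deprecated.
TRANSITIONS = {
    "create":     {"draft"},
    "draft":      {"review"},
    "review":     {"released", "deleted"},
    "released":   {"update", "deprecated"},
    "update":     {"released", "review"},
    "deleted":    set(),        # terminal
    "deprecated": set(),        # terminal
}

def advance(state: str, new_state: str) -> str:
    """Move a test case to new_state, rejecting transitions the model forbids."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state

# Walk a test case through a typical life cycle, including one update loop.
state = "create"
for nxt in ["draft", "review", "released", "update", "released", "deprecated"]:
    state = advance(state, nxt)
print(state)  # → deprecated
```

A guard like `advance` is the natural place to attach ownership rules as well, e.g., that only the creator may move a test case from draft to review.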
The results of executing those test cases are recorded in the test factory database for gathering and analyzing test metrics. In a large, complex system with many defects, the result of a test execution has several possibilities, not merely passed or failed. Therefore, we model the results of test execution by using a state transition diagram, as shown in Figure 11.7; the corresponding schema is given in Table 11.8. Figure 11.7 illustrates the state diagram of a test case result, starting from the untested state and moving to one of four states: passed, failed, blocked, and invalid.

TABLE 11.7 Test Suite Schema Summary

Field — Description
test_suite_id — Unique identifier assigned by originator
test_suite_title — One-line title for test suite
test_suite_objective — Objective of test suite, short description
tests — Reference list of test case identifiers
test_id — Test case ID, selected
test_title — Test case title, read only, filled in when test case is created
test_category — Category name (performance, stress, interoperability, functionality, etc.)
tester — Engineer responsible for testing
sw_dev — Software developer responsible for this test case, who will assist the test engineer in executing the test case
priority — Priority of test case
requirement_ids — Requirement identifier, read only, filled in when test case is created
cycle 1–3 — Check box to indicate test is a cycle 1, 2, or 3 test case
regression — Check box to indicate test is a regression test case

TABLE 11.8 Test Result Schema Summary

Field — Description
tc_id — Reference to test case identifier record
test_title — Test case title, read only, filled in when result is created
test_category — Test category name (performance, stress, interoperability, functionality, etc.)
status — State of test case result: passed, failed, blocked, invalid, or untested; initially the status of a test case is "untested"
run date — Date test case was run
time — Time spent in executing test case
tester — Name of person who ran test
release — Software release (e.g., 03.00.00)
build — Software integration number (1–100)
defect_ids — List of defects that caused the test to fail; values can come from the bug tracking database
test_suite — Test suite this result pertains to

The execution status of a test case is put in its initial state of untested after the test case is designed or selected. If the test case is not valid for the current software release, the test case result is moved to the invalid state. In the untested state, the test suite identifier is noted in a field called test_suite_id.

Figure 11.7 State transition diagram of test case result (states: untested, passed, failed, blocked, invalid).

After execution of a test case is started, the state of the test result may change to one of the following states: passed, failed, invalid, or blocked. A test engineer may move the test case result from the untested state to the passed state if the test case execution is complete and satisfies the pass criteria. If the execution is complete and satisfies the fail criteria, the test engineer moves the result from the untested state to the failed state and associates the defect with the test case by initializing the defect_ids field. The test case must be reexecuted when a new build containing a fix for the defect is received. If the reexecution is complete and satisfies the pass criteria, the test result is moved to the passed state. The test case result is moved to the blocked state if it is not possible to execute it completely. If known, the defect number that blocks the execution of the test case is recorded in the defect_ids field. The test case may be reexecuted when a new build addressing a blocked test case is received.
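The result-state model described here can also be encoded as a small transition table. This sketch follows the narrative around Figure 11.7; keeping failed and blocked results in place on an unsuccessful re-run is my reading of the text, not a diagram from the book.

```python
# Result-state transitions: a result starts untested and moves to passed,
# failed, blocked, or invalid; failed and blocked results are re-executed
# on later builds. (Self-loops on failed/blocked are an assumption.)
RESULT_TRANSITIONS = {
    "untested": {"passed", "failed", "blocked", "invalid"},
    "failed":   {"passed", "failed"},             # re-run on a build with the fix
    "blocked":  {"passed", "failed", "blocked"},  # re-run on a build that unblocks it
    "passed":   set(),                            # terminal for this release
    "invalid":  set(),                            # terminal for this release
}

def record_result(current: str, outcome: str) -> str:
    """Record a new execution outcome, enforcing the transition diagram."""
    if outcome not in RESULT_TRANSITIONS[current]:
        raise ValueError(f"illegal result transition: {current} -> {outcome}")
    return outcome

status = record_result("untested", "blocked")  # first run hit a blocking defect
status = record_result(status, "passed")       # a new build unblocked it; it passed
print(status)  # → passed
```

In a real test factory database, `record_result` would also stamp the run date, tester, release, and build fields from Table 11.8.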
If the execution is complete and satisfies the pass criteria, the test result is moved to the passed state. On the other hand, if it satisfies the fail criteria, the test result is moved to the failed state. If the execution is unsuccessful due to a new blocking defect, the test result remains in the blocked state and the new defect that blocked the test case is listed in the defect_ids field.

11.8 TEST DESIGN PREPAREDNESS METRICS

Management may be interested in knowing the progress, coverage, and productivity aspects of the test case preparation work being done by a team of test engineers. Such information lets them (i) know whether a test project is progressing according to schedule and whether more resources are required and (ii) plan their next project more accurately. The following metrics can be used to represent the level of preparedness of test design.

Preparation Status of Test Cases (PST): A test case can go through a number of phases, or states, such as draft and review, before it is released as a valid and useful test case. Thus, it is useful to periodically monitor the progress of test design by counting the test cases lying in different states of design—create, draft, review, released, and deleted. It is expected that all the planned test cases created for a particular project eventually move to the released state before the start of test execution.

Average Time Spent (ATS) in Test Case Design: It is useful to know the amount of time it takes for a test case to move from its initial conception, that is, the create state, to when it is considered usable, that is, the released state. This metric is useful in allocating time to the test preparation activity in a subsequent test project; hence, it is useful in test planning.

Number of Available Test (NAT) Cases: This is the number of test cases in the released state from existing projects. Some of these test cases are selected for regression testing in the current test project.
Number of Planned Test (NPT) Cases: This is the number of test cases that are in a test suite and ready for execution at the start of system testing. This metric is useful in scheduling test execution. As testing continues, new, unplanned test cases may be required to be designed. A large number of new test cases compared to NPT suggests that the initial planning was not accurate.

Coverage of a Test Suite (CTS): This metric gives the fraction of all requirements covered by a selected number of test cases or a complete test suite. The CTS is a measure of the number of test cases that need to be selected or designed to have good coverage of the system requirements.

11.9 TEST CASE DESIGN EFFECTIVENESS

The objectives of the test case design effectiveness metric are to (i) measure the "defect revealing ability" of the test suite and (ii) use the metric to improve the test design process. During system-level testing, defects are revealed due to the execution of planned test cases. In addition to these defects, new defects are found during testing for which no test cases had been planned. For these new defects, new test cases are designed; these are called test case escapes (TCE). Test escapes occur because of deficiencies in the test design process. This happens because the test engineers get new ideas while executing the planned test cases. A metric commonly used in the industry to measure test case design effectiveness is the test case design yield (TCDY), defined as

    TCDY = [NPT / (NPT + number of TCE)] × 100%

The TCDY is also used to measure the effectiveness of a particular testing phase. For example, the system integration manager may want to know the TCDY value for his or her system integration testing.

11.10 SUMMARY

This chapter began with an introduction to six factors that are taken into consideration during the design of test cases: coverage metrics, effectiveness, productivity, validation, maintenance, and user skill.
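Restated in code, the TCDY formula from Section 11.9 is a one-line computation. The sample figures below are the system-testing numbers from Exercise 5 at the end of the chapter (235 planned test cases, 35 added during testing).

```python
def tcdy(npt: int, tce: int) -> float:
    """Test case design yield: NPT / (NPT + number of TCE) x 100 (Section 11.9)."""
    return 100.0 * npt / (npt + tce)

# System-testing figures from Exercise 5: 235 planned test cases, 35 added later.
print(f"{tcdy(235, 35):.1f}%")  # → 87.0%
```

The same helper applies phase by phase, so the unit and integration figures of Exercise 5 can be computed with the same call.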
Then we discussed the reasons to consider these factors in designing system tests. Next, we discussed the requirement identification process. We provided a state transition model to track individual requirements as they flow through the organization. At each state of the transition model, certain actions are taken by the owner, and the requirement is moved to a new state after the actions are completed. We presented a requirement schema that can be used to generate a traceability matrix. A traceability matrix finds two applications: (i) identifying and tracking the functional coverage of a test suite and (ii) identifying which test cases must be exercised or updated when a system undergoes a change. A traceability matrix allows one to find a mapping between requirements and test cases as follows:

• From a requirement to a functional specification to the specific tests that exercise them
• From each test case back to the requirements and functional specifications

Next, we examined the characteristics of testable requirements and functional specifications. We provided techniques to identify test objectives from the requirements and functional specifications. A requirement must be analyzed to extract the inferred requirements that are embedded in it. An inferred requirement is anything that a system is expected to do but that is not explicitly stated. Finally, we showed ways to create a hierarchical structure of test groups, called a test suite. As an example, we illustrated in detail the design of a system test suite for the FR–ATM service-interworking protocol. Next, we provided a state transition model of a test case from its inception to completion. At each state of the transition model, certain actions are taken by the owner, and the test case is moved to a new state after the actions are completed. We presented a test case schema to create a test factory that can be used to design a test suite and monitor the test case preparedness metrics.
Finally, we provided a metric used in the industry to measure test case design effectiveness, known as the test case design yield. The objectives of the test case effectiveness metric are to (i) measure the "defect revealing ability" of the test suite and (ii) use the metric to improve the test design process.

11.11 LITERATURE REVIEW

A good discussion of testing requirements is presented in the article by Suzanne Robertson, entitled "An Early Start to Testing: How to Test Requirements," which is reprinted in Appendix A of the book by E. Dustin, J. Rashka, and J. Paul (Automated Software Testing: Introduction, Management, and Performance, Addison-Wesley, Reading, MA, 1999). The author describes a set of requirement tests that cover relevance, coherency, traceability, completeness, and other qualities that successful requirements must have.

The requirements traceability reference models described in the article by B. Ramesh and M. Jarke ("Towards Reference Models for Requirements Traceability," IEEE Transactions on Software Engineering, Vol. 27, No. 1, January 2001, pp. 58–93) are based on several empirical studies. Data collection for the work spanned a period of over three years. The main study comprised 30 focus group discussions in 26 organizations conducted in a wide variety of industries, including defense, aerospace, pharmaceuticals, electronics, and telecommunications. The participants had an average of 15.5 years of experience in several key areas of systems development, including software engineering, requirements management, software testing, system integration, systems analysis, maintenance, and software implementation. The participants are categorized into two distinct groups with respect to their traceability practices; these groups are referred to as low-end and high-end traceability users. Separate reference models for the two groups are discussed in the article.
The IEEE standard 829-1983 (IEEE Standard for Software Test Documentation: IEEE/ANSI Standard) provides templates for test case specifications and test procedure specifications. A test case specification consists of the following components: test case specification identifier, test items, input specification, output specification, special environment needs, special procedural requirements, and test case dependencies. A test procedure specification consists of the following components: test procedure specification identifier, purpose, specific requirements, and procedure steps.

Another approach to measuring test case effectiveness has been proposed by Yuri Chernak ("Validating and Improving Test-Case Effectiveness," IEEE Software, Vol. 18, No. 1, January/February 2001, pp. 81–86). The effectiveness metric, called test case escapes, is defined as

    TCE = (number of defects found by test cases / total number of defects) × 100%

The total number of defects in the above equation is the sum of the defects found by the test cases and the defects found by chance, which the author calls "side effects." Chernak illustrates his methodology with a client–server application, using a baseline TCE value (< 75 for this case) to evaluate test case effectiveness and make test process improvements. Incomplete test design and incomplete functional specifications were found to be the main causes of test escapes.

Formal verification of test cases is presented in the article by K. Naik and B. Sarikaya, "Test Case Verification by Model Checking," Formal Methods in Systems Design, Vol. 2, June 1993, pp. 277–321. The authors identified four classes of safety properties and one liveness property and expressed them as formulas in branching-time temporal logic. They presented a model-checking algorithm to verify those properties.

REFERENCES

1. H. S. Wang, S. R. Hsu, and J. C. Lin. A generalized optimal path-selection model for the structure program testing.
Journal of Systems and Software, July 1989, pp. 55–62.
2. O. Vinter. From Problem Reports to Better Products. In Improving Software Organizations: From Principles to Practice, L. Mathiassen, J. Pries-Heje, and O. Ngwenyama, Eds. Addison-Wesley, Reading, MA, 2002, Chapter 8.
3. M. Glinz and R. J. Wieringa. Stakeholders in Requirements Engineering. IEEE Software, March/April 2007, pp. 18–20.
4. N. Serrano and I. Ciordia. Bugzilla, ITtracker, and Other Bug Trackers. IEEE Software, March/April 2005, pp. 11–13.
5. D. Jacobs. Requirements Engineering So Things Don't Get Ugly. Crosstalk, the Journal of Defense Software Engineering, October 2004, pp. 19–25.
6. P. Carlshamre and B. Regnell. Requirements Lifecycle Management and Release Planning in Market-Driven Requirements Engineering Process. In Proceedings of the 11th International Workshop on Database and Expert System Applications, Greenwich, UK, IEEE Computer Society Press, Piscataway, NJ, September 2000, pp. 961–966.
7. J. Bach. Risk and Requirements-Based Testing. IEEE Computer, June 1999, pp. 113–114.
8. B. J. Brown. Assurance of Quality Software. SEI Curriculum Module, SEI-CM-7-1.1, July 1987.
9. M. Lehman and J. Ramil. Software Evolution—Background, Theory, Practice. Information Processing Letters, October 2003, pp. 33–44.
10. O. C. Z. Gotel and A. C. W. Finkelstein. An Analysis of the Requirements Traceability Problem. In Proceedings of First International Conference on Requirements Engineering, Colorado Springs, IEEE Computer Society Press, Piscataway, NJ, 1994, pp. 94–101.
11. J. Karlsson and K. Ryan. A Cost-Value Approach for Prioritizing Requirements. IEEE Software, September/October 1997, pp. 67–74.
12. E. Carmel and S. Becker. A Process Model for Packaged Software Development. IEEE Transactions on Engineering Management, February 1995, pp. 50–61.
13. P. Soffer, L. Goldin, and T. Kuflik. A Unified RE Approach for Software Product Evolution.
In Proceedings of SREP'05, Paris, Printed in Ireland by the University of Limerick, August 2005, pp. 200–210.
14. C. W. Johnson and C. M. Holloway. Questioning the Role of Requirements Engineering in the Causes of Safety-Critical Software Failures. In Proceedings of the IET 1st International Conference on System Safety, IEEE Computer Society Press, Piscataway, NJ, London, June 2006.
15. S. L. Pfleeger. A Framework for Security Requirements. Computers & Security, October 1991, pp. 515–523.
16. W. E. Howden. Validation of Scientific Programs. ACM Computing Surveys, June 1982, pp. 193–227.
17. Frame Relay/ATM PVC Network Inter-Working-Implementation Agreement, FRF.5. Frame Relay Forum, December 1994. Available at http://www.ipmplsforum.org/Approved/FRF.5/FRF5.TOC.shtml.
18. Frame Relay/ATM PVC Service Inter-Working-Implementation Agreement, FRF.8. Frame Relay Forum, April 1995. Available at http://www.ipmplsforum.org/Approved/FRF.8/FRF8.TOC.shtml.
19. Internet Engineering Task Force (IETF). Multiprotocol Interconnect over Frame Relay, RFC 1490. IETF, July 1993. Available at http://www.apps.ietf.org/rfc/rfc1490.html.
20. Internet Engineering Task Force (IETF). Multiprotocol Encapsulation over AAL, RFC 1483. IETF, July 1993. Available at http://www.apps.ietf.org/rfc/rfc1483.html.

Exercises

1. Explain the difference between coverage metrics and a traceability matrix.
2. Explain the difference between requirement testability and software testability.
3. Justify the statement that software testability and fault tolerance are opposites of each other and that attaining both at the same time for the same piece of software is not feasible. When do you want high testability, and when do you not, during the life cycle of a critical software system? Justify your answer.
4. What are the differences between software testability and reliability? Which is more important in software: high testability or high reliability? Justify your answer.
5.
In a software test project, the numbers of unit-, integration-, and system-level test cases specified are 250, 175, and 235, respectively. The numbers of test cases added during the unit, integration, and system testing phases are 75, 60, and 35, respectively. Calculate the TCDY for the unit, integration, and system testing phases.

6. Prepare a checklist of items that will serve as the focal point for reviewing test cases.
7. Under what circumstances can the execution result of a test case be declared as blocked?
8. Implement the requirement model discussed in this chapter.
9. Implement the test factory model discussed in this chapter.
10. For your current test project, develop the test suite hierarchy and identify a test objective for each test case within each test (sub)group.
11. Develop detailed test cases using the schema defined in Table 11.6 as a template for the test objectives you identified in the previous exercise.

CHAPTER 12 System Test Planning and Automation

When planning for a year, plant corn. When planning for a decade, plant trees. When planning for life, train and educate people.
— Chinese proverb

12.1 STRUCTURE OF A SYSTEM TEST PLAN

A good plan for performing system testing is the cornerstone of a successful software project. In the absence of a good plan, it is highly unlikely that the desired level of system testing is performed within the stipulated time and without overusing resources such as manpower and money. Moreover, in the absence of a good plan, it is highly likely that a low-quality product will be delivered, even at a higher cost and later than the expected date. The purpose of system test planning, or simply test planning, is to get ready and organized for test execution. Starting a system test in an ad hoc way, after all the modules are checked in to the version control system, is ineffective.
Working under deadline pressure, people, that is, test engineers, have a tendency to take shortcuts and "just get the job done," which leads to the shipment of a highly defective product. Consequently, the customer support group of the organization has to spend a lot of time dealing with unsatisfied customers and is forced to release several patches to demanding customers. Test planning is essential in order to complete system testing and ship a quality product to the market on schedule. Planning for system testing is part of the overall planning for a software project. It provides the framework, scope, resources, schedule, and budget for the system testing part of the project. With a good test plan, test efficiency can be monitored and improved, and unnecessary delay can be avoided.

TABLE 12.1 Outline of System Test Plan
1. Introduction
2. Feature description
3. Assumptions
4. Test approach
5. Test suite structure
6. Test environment
7. Test execution strategy
8. Test effort estimation
9. Scheduling and milestones

The purpose of a system test plan is summarized as follows:

• It provides guidance for the executive management to support the test project, thereby allowing them to release the necessary resources to perform the test activity.
• It establishes the foundation of the system testing part of the overall software project.
• It provides assurance of test coverage by creating a requirement traceability matrix.
• It outlines an orderly schedule of events and test milestones that are tracked.
• It specifies the personnel, financial, equipment, and facility resources required to support the system testing part of a software project.

The activity of planning for system testing combines two tasks: research and estimation.
Research allows us to define the scope of the test effort and the resources already available in-house. Each major functional test suite, consisting of test objectives, can be described in a bounded fashion using the system requirements and functional specification as references. The reader may refer to Chapter 11 for a discussion of test suites described in a bounded fashion.

A system test plan is outlined in Table 12.1; in what follows, we explain how one can create such a plan. The test plan is released for review and approval after the author, that is, the leader of the system test group, completes it with all the pertinent details. The review team must include software and hardware development staff, customer support group members, system test team members, and the project manager responsible for the project. The author(s) should solicit reviews of the test plan and ask for comments prior to the meeting; the comments can then be addressed at the review meeting. The system test plan must be completed before the software project is committed.

12.2 INTRODUCTION AND FEATURE DESCRIPTION

The introduction section of the system test plan describes the structure and the objective of the test plan. This section includes (i) the test project name, (ii) revision history, (iii) terminology and definitions, (iv) the names of the approvers and the date of approval, (v) references, and (vi) a summary of the rest of the test plan. The feature description section summarizes the system features that will be tested during the execution of this test plan. In other words, a high-level description of the functionalities of the system is presented in this section. The feature description for the FR/ATM service interworking example is discussed in Chapter 11.

12.3 ASSUMPTIONS

The assumptions section describes the areas for which test cases will not be designed in this plan, for several reasons. First, the necessary equipment to carry out scalability testing may not be available.
Second, it may not be possible to procure third-party equipment in time to conduct interoperability testing. Finally, it may not be possible to conduct compliance tests for regulatory bodies and environment tests in the laboratory. These assumptions must be considered while reviewing a system test plan.

12.4 TEST APPROACH

The overall test approach is an important aspect of a testing project, which consists of the following:

• Lessons learned from past test projects are useful in focusing on problematic areas in the testing process. Issues discovered by customers that were not caught during system testing in past projects are discussed. For example, if a customer encountered a memory leak in the system in a past project, action must be taken to flush out any memory leak early in system test execution. An appropriate response may be to develop effective test cases using sophisticated software tools, such as memory leak detectors available on the market.
• If there are any outstanding issues that need to be tested differently, for example, issues requiring a specific hardware and software configuration, these need to be discussed here.
• A test automation strategy for writing scripts is a topic of discussion.
• Test cases available in the test factory that can be reused in this test plan should be identified.
• An outline should be prepared of the tools, formats, and organizing schemes, such as a traceability matrix, that will be used and followed during the test project.
• Finally, the first level of test categories, as discussed in Chapter 8, that are likely to apply to the present situation is identified.

12.5 TEST SUITE STRUCTURE

Detailed test groups and subgroups are outlined in the test suite structure section based on the test categories identified in the test approach section.
Test objectives are created for each test group and subgroup based on the system requirements and functional specification discussed in Chapter 11. A traceability matrix is generated to make an association between requirements and test objectives to provide the highest degree of confidence. Note that at this stage only the test suites, along with their test objectives, are identified, but not the detailed test cases. Identification of test objectives provides a clue to the total number of new test cases that need to be developed for the project. If some existing test cases, automated or manual, need to be run as regression tests, those test cases must be included in the test suite. This information is useful in estimating the time required to create the test cases and to execute the test suite, as discussed in Section 12.8. The test suite structure for the FR/ATM service interworking example is discussed in Chapter 11.

12.6 TEST ENVIRONMENT

It is necessary to plan for and design the test environment, also called a test bed or a test laboratory, to make the execution of system-level test cases effective. It is a challenge to design test environments that contain only a small proportion of the equipment and facilities used in actual operation, because of budget limitations. The central idea in using a small proportion of equipment and facilities is to do more with less. The objective here is to achieve effective testing, whereby most of the defects in the system are revealed by utilizing a limited quantity of resources. One has to be innovative in designing a test bed such that the test objectives are fulfilled by executing the test cases on the test bed. One must consider alternatives, or at least a scaled-down version of the deployment environment, from the standpoint of cost-effectiveness. Efforts must be made to create a deployment environment by using simulators, emulators, and third-party traffic generation tools.
Such tools are found to be useful in conducting scalability, performance, load, and stress testing. An emulator may not be an ideal substitute for real equipment, but as long as it satisfies the purpose, it is worth investing in. We explained in Chapter 8 that there are different categories of test cases designed at the system level. Therefore, multiple test environments are constructed in practice for the following reasons:

• To run scalability tests, we need more resources than are needed to run functional tests.
• Multiple test beds are required to reduce the length of system testing time.

Preparing a test environment is a great challenge in test planning. This is especially true in testing distributed systems and computer networks, where a variety of equipment is connected through communication protocols. For example, such equipment includes user computers, servers, routers, base stations in a wireless network, authentication servers, and billing servers. It may take several months to set up an effective test bed for large, complex systems. It requires careful planning, procurement of test equipment, and installation of the equipment in a test facility different from the software development facilities so that system testing is performed effectively. Developers have their own test environments to perform unit tests and integration tests. However, a separate, dedicated system test laboratory, different from the ones used in unit and integration testing, is essential for the following reasons:

• Test engineers need the ability to reconfigure a test environment.
• Test activities must not interfere with development activities or live operations.
• Increased productivity is achieved by having a dedicated test laboratory.

A central issue in setting up a system test laboratory is the justification to procure the equipment. Note that there is a need to justify each item to be procured.
A good justification for procuring the equipment can be made by answering the following questions:

• Why do we need this equipment?
• What will be the impact of not having this equipment?
• Is there an alternative to procuring this equipment?

The technical leader of the system test engineering group should gather some facts and perform some preparation activities in order to answer these questions. The following items are part of a good fact-gathering process:

• Review the system requirements and the functional specification.
• Participate in the review processes to better understand the system and raise potential concerns related to the migration of the system from the development environment to the deployment environment.
• Document his or her findings.

The following preparation activities are conducted to support the development of a system test bed:

• Obtain information about the customer deployment architecture, including hardware, software, and their manufacturers. For example, the real deployment network diagram, along with the software configuration, is useful in designing a scaled-down version of the system in the laboratory. The manufacturer names will be handy in procuring the exact equipment for interoperability and compatibility testing.
• Obtain a list of third-party products to be integrated with the SUT. Identification of the external products is important because of the need to perform interoperability testing.
• List the third-party test tools to be used to monitor, simulate, and/or generate real traffic. This traffic will be used as input to the SUT.
• Identify the third-party software tools to be used under licenses.
• Identify the hardware equipment necessary to support special features specified in the requirements/test objectives, such as high availability and backup/recovery exercises within the test environment.
• List the number of hardware copies needed to carry out system testing if the project involves new hardware.
• Analyze the functional, performance, stress, load, and scalability test objectives to identify elements of the test environment that will be required to support those tests.
• Identify the security requirements for the test environment. Ensure that the security test cases can be executed using the test environment and that an intruder cannot disrupt the stress and stability tests that may be running overnight or over the weekends.
• List the small but necessary networking gear that may be required to set up the test laboratory, such as switches, terminal servers, hubs, attenuators, splitters, personal computers, servers, and different kinds and sizes of cables to interconnect this gear.
• List any other accessories required to facilitate system testing, such as racks, vehicles, and special shielding to prevent radiation.

After the above fact-gathering and research activities, the team leader develops a schematic diagram of one or more test beds in terms of the following two items:

• A high-level graphic layout of the test architectures
• A table of types of equipment, their quantities, and their descriptions to support the test architectures

The equipment list should be reviewed to determine the equipment available in-house and the equipment that needs to be procured. The list of equipment to be procured constitutes a test equipment purchase list. The list needs to include quantities required, unit price information including maintenance cost, and justification, as outlined in the template shown in Table 12.2. In the justification column, the team leader must specify, for each item, the justification for the item and the impact it will have on system testing in terms of quality and the time-to-market schedule. The team leader may obtain quotes from the suppliers to get accurate unit prices.
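As a small illustration of how the purchase list described above might be tallied, the following sketch computes the capital and maintenance totals for a Table 12.2 style list. The dictionary field names are illustrative assumptions, not taken from the book.

```python
def purchase_totals(items):
    """Tally a test equipment purchase list (Table 12.2 style).
    Each item is a dict with illustrative keys: "equipment", "quantity",
    "unit_price", "maintenance_cost" (per unit), and "justification"."""
    capital = sum(i["quantity"] * i["unit_price"] for i in items)
    maintenance = sum(i["quantity"] * i["maintenance_cost"] for i in items)
    return capital, maintenance
```

Such a tally makes it easy to present the total procurement cost alongside the per-item justifications when the budget is reviewed.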
The test team leader needs to keep track of the equipment received and its installation after the budget is approved and the orders are placed. The leader needs to ensure that these activities are on track to meet the overall software project schedule.

TABLE 12.2 Equipment Needed to Be Procured

Equipment to Procure | Quantity | Unit Price | Maintenance Cost | Justification

12.7 TEST EXECUTION STRATEGY

It is important to have a proper game plan ready before the system test group receives a software system for the first time for performing system testing. The game plan is in the form of a system test execution strategy to carry out the task [1]. We address the following concerns by putting a game plan in place before initiating system testing:

• How many times are the test cases executed and when?
• What does one do with the failed test cases?
• What happens when too many test cases fail?
• In what order are the test cases executed?
• Which test cases are to be run just before ending the system testing phase?

In the absence of a good system test execution strategy, it is highly unlikely that the desired system-level test is performed within the given time and without overusing resources.

Let us consider a simple execution strategy. Assume that a system test team has designed a test suite T for a system S. Due to the detection and removal of defects, and the possibility of introducing new defects in the process, S is an evolving system that can be characterized by a sequence of builds B0, B1, B2, . . . , Bk, where each build is expected to have fewer defects than its immediate predecessor.
A simple strategy to perform system testing is as follows: Run the test set T on B0 to detect a set of defects D0; let B1 be a build obtained from B0 by fixing the defects in D0; run T on B1 to detect a set of defects D1; let B2 be a build obtained from B1 by fixing the defects in D1; and so on, until the quality of the system reaches at least the desired quality level. One can adopt such a simple execution strategy if all the test cases in T can be executed independently, that is, if no test case is blocked due to a defect in the system. If all the reported defects are immediately fixed and no new defects are introduced, then system testing can be considered to be over after running T just twice: once on B0 to detect D0 and a second time on B1 for confirmation. In the above discussion, we refer to running T, or a subset thereof, on a new build of a software SUT as a system test cycle.

However, the processes of test execution, defect detection, and defect fixing are intricately intertwined. The key characteristics of those processes are as follows:

• Some test cases cannot be executed unless certain defects are detected and fixed.
• A programmer may introduce new defects while fixing one defect; moreover, a fix may not be successful.
• The development team releases a new build for system testing by working on a subset of the reported defects, rather than all of them.
• It is a waste of resources to run the entire test set T on a build if too many test cases fail.
• As the number of reported defects reduces significantly over a few iterations of testing, it is not necessary to run the entire test set T for regression testing; a carefully chosen subset of T is expected to be adequate.

Therefore, the simple test execution strategy explained above is not practical. An effective and efficient execution strategy must take into account those characteristics of test execution, defect detection, and defect removal.
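The naive build-and-test loop described above can be sketched as follows. The callables run_tests, fix_defects, and target_quality are hypothetical stand-ins for running T on a build, producing the next build by fixing the reported defects, and checking the desired quality level; they are not part of the book's notation.

```python
def system_test(build, test_suite, run_tests, fix_defects, target_quality):
    """Sketch of the simple multicycle strategy: run the whole suite T on
    build B_k to collect defects D_k, produce B_{k+1} by fixing D_k, and
    repeat until the desired quality level is reached. Assumes fixes
    eventually succeed, so the loop terminates."""
    cycle = 0
    while True:
        defects = run_tests(build, test_suite)   # run T on B_k, collect D_k
        cycle += 1
        if target_quality(defects):              # desired quality level reached
            return build, cycle
        build = fix_defects(build, defects)      # B_{k+1} obtained from B_k
```

If every fix succeeds and no new defects appear, the loop confirms the text's observation: it returns after exactly two cycles, the second being the confirmation run.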
We present a process model for a metric-based, multicycle test execution strategy which clearly defines an entry criterion to start system testing, how to prioritize the execution of test cases, when to move from one test cycle to the next, when to rerun failed test cases, when to suspend a test cycle and initiate root cause analysis, and how to choose a subset of test cases in the final test cycle for regression testing. We characterize each test cycle by a set of six parameters: goals, assumptions, test execution, revert and extension criteria, actions, and exit criteria. The idea behind a parameterized test cycle is twofold: (i) the quality level of a system is raised in a measurable, controlled, and incremental manner, possibly from an initial low level to the final high level, and (ii) the process of system testing is broken down into a sequence of repeatable steps. The parameter values in each test cycle allow the system test leader to effectively control the progress of system testing so that the system moves one step closer to the desired, final quality level.

12.7.1 Multicycle System Test Strategy

The idea of a multicycle test execution strategy with three cycles, for example, is illustrated in Figure 12.1a. Figure 12.1b shows how the quality of a product increases from test cycle to test cycle. In one test cycle, all the test cases in the entire test suite T, or a carefully chosen subset of T, are executed at least once. It is expected that a number of defects are fixed within the life span of a test cycle so that the quality level of the system is raised by a certain amount, as illustrated in Figure 12.1b. The need for two root cause analyses, one by the development team and one by the system test team, is explained later in this section.
The total number of test cycles to be used in an individual test project is a matter of management decision, based on the level of delivered quality one wants to achieve, the extent to which the test suite covers the requirements of the system, and the effectiveness of the defect fixes.

12.7.2 Characterization of Test Cycles

Each test cycle is characterized by a set of six parameters: goals, assumptions, test execution, revert and extension criteria, actions, and exit criteria. Appropriate values must be assigned to these parameters while preparing a concrete test execution strategy. In Section 12.7.6, we give real-life, representative instances of these parameters for three test cycles, where the third test cycle is the final one.

[Figure 12.1 Concept of cycle-based test execution strategy: (a) progress of system testing in terms of test cycles, with a revert path and root cause analyses by the development team and by the system test team; (b) desired increase in quality level from cycle to cycle.]

Goals

A system test team sets its own goals to be achieved in each test cycle. These goals are ideal in the sense that they are very high standard goals and, consequently, may not be achievable in a given test cycle. The motivation for setting ideal goals is that it is not known what weaker goal is desirable and achievable. Setting a weaker goal carries the danger of discouraging the development and test teams from achieving stronger goals. Therefore, we aim to achieve ideal goals in each test cycle, but we are ready to exit from a test cycle even if the goals are not fully achieved. An exit criterion specifies the termination of a test cycle. Goals are specified in terms of the number of test cases to pass within a test cycle.
Though goals can also be specified in terms of the number of defects remaining in the software product at the end of each system test cycle [2], predicting the remaining number of defects is a very difficult task [3].

Remark 1. The goal of a test cycle may be changed due to unforeseen circumstances. For example, a test cycle may be terminated prematurely because of the poor quality of the software. However, the system test group may proceed to complete the test cycle to gain hands-on experience and to be able to identify needs for additional test cases. In this situation, the specified goal of the test cycle can be changed to understanding and improving the effectiveness of the test cases.

Assumptions

A SIT group and a system test group work in an autonomous but cooperative manner in an organization. The SIT group produces builds almost regularly, whereas the system test group selects builds for testing on its own schedule, say, once every two weeks. As a consequence, not all builds are accepted for system testing. In the assumptions part, the system test group records its own assumptions about when to select builds for system testing. Appropriate assumptions must be made in order to achieve the goals. For example, consider a goal that 90% of the test cases should pass in a five-week test cycle and an assumption that the system test team will accept just one new build from the SIT group during the test cycle. Assume that at the end of the fourth week of testing the percentages of passed, failed, and blocked test cases are 70, 7, and 23%, respectively. A test case is said to be blocked if a defect prevents its execution. Since the blocked test cases cannot be executed until a new build fixes the blocking defects, the goal cannot be achieved by the end of the fifth week. The team can avoid this situation by accepting builds from the SIT group on a daily basis.
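The feasibility check in the example above (70% passed, 7% failed, 23% blocked at the end of week 4, against a 90% pass goal) can be expressed as a deliberately simplified sketch. The model is an assumption of ours: failed tests may still pass when rerun against a fix, but blocked tests cannot run at all unless another build arrives.

```python
def goal_reachable(passed_pct, failed_pct, blocked_pct, goal_pct, new_build_coming):
    """Best-case final pass rate for the remainder of a test cycle.
    Failed test cases may pass after their defects are fixed, but blocked
    test cases contribute only if a new build will remove the blocking
    defects. All figures are percentages of the whole test suite."""
    best = passed_pct + failed_pct
    if new_build_coming:
        best += blocked_pct
    return best >= goal_pct
```

Under this model, with no further build the best achievable pass rate in the example is 77%, so the 90% goal is out of reach, exactly the situation daily build acceptance is meant to avoid.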
However, picking a daily build for system testing means spending a significant amount of time setting up the test beds by installing a new build every day. Therefore, a test manager must make appropriate assumptions that allow the team to achieve the desired goal.

Test Execution

Test cases are executed simultaneously in multiple test environments by using the concept of test prioritization. The prioritization of test execution changes between test cycles. This offers a trade-off between prioritizing test cases just once for the entire duration of system testing and prioritizing test cases from build to build. Some basic ideas in test prioritization are as follows: (i) test cases that exercise basic functionalities have higher priority than the rest in the first test cycle; (ii) test cases that have failed in one test cycle have a higher priority in the following test cycle; and (iii) test cases in certain test groups have higher priority than the others. There are three differences between the test prioritization studied in the past [4–6] and the new approach presented in this book:

1. Test prioritization has been considered largely for regression testing. In our test strategy, test cases are prioritized for development testing as well. Test cases are prioritized during system testing in order to exercise the likely vulnerable parts of the code early so that developers will have more time to debug and fix the defects, if there are any. For example, if defects are revealed toward the end of a test cycle, the developers may not have enough time to fix them; moreover, the system test team may not be able to validate the fixes in a very short time. Srivastava and Thiagarajan [7] present a test prioritization technique for regression testing during development. Their test prioritization algorithm is based on block-level differences between the compiled binary code of the previous version of a system and its current version to be tested.
The prioritization algorithm tries to pick, from the remaining tests, the test that will cover the maximum number of the remaining impacted, that is, new or modified, blocks.

2. Test prioritization produces a single sequence of ordered test cases, while the need and opportunity for producing smaller, concurrent test execution sequences have been ignored. In our approach, the simultaneity of test executions and the importance of test cases have been factored into test prioritization. Multiple, concurrent test environments allow us to shorten the length of system testing. Kapfhammer [8] proposes the idea of running regression tests on multiple communicating machines.

3. It is highly unlikely that tens of engineers working on a test project have identical interest and expertise in all aspects of a system. For example, one person might be an expert in performance evaluation, whereas another person may be useful in user-interface testing. Test cases falling in their respective categories are separately prioritized.

Revert and Extension Criteria

Each test cycle must be completed within a stipulated time. However, the duration of a test cycle may be extended under certain circumstances, and in the worst case, the test cycle must be restarted from the beginning. The conditions for prematurely terminating a test cycle and for extending a test cycle must be precisely stated. Essentially, the concept of a revert criterion is used to ensure that system testing contributes to the overall quality of the product at reduced cost. It may not be useful to continue a test cycle if it is found that the software is of too poor a quality. On the other hand, a test cycle can be extended for various reasons, such as (i) a need to reexecute all the test cases in a particular test group because a large fraction of the test cases within the group failed or (ii) a significantly large number of new test cases being added while test execution was in progress.
The idea of a revert criterion discussed above is similar to the idea of stopping rules for testing [9].

Actions

Two unexpected events may occur while a test cycle is in progress: (i) too many test cases fail and (ii) the system test team has to design a large number of new test cases during the test cycle. On the one hand, when too many test cases fail, the development team tries to understand what went wrong during development. On the other hand, when too many new test cases have to be designed, the system test team tries to understand what went wrong in the test planning phase. The two kinds of events require different kinds of responses from the development team and the system test team, as explained below.

It is useful to alert the development team once the number of failed test cases reaches a predefined level. Consequently, the developers take an action in the form of root cause analysis (RCA). In this analysis, the developers study the software components to identify whether a defect was in new, old, rewritten, or refixed code, as discussed in Chapter 13. Next, the following question is asked: Would it have been possible to detect the defect during an earlier phase of the development cycle, such as design review, code review, unit testing, or integration testing? If the answer is yes, then corrective actions are taken by updating the design specification, reviewing the code, and adding new test cases for unit testing and integration testing to improve the quality of these error-prone software components. Detailed defect analysis is discussed in Chapter 13 on system test execution.

Test engineers may find the need to design new test cases as test executions reveal defects and those defects are fixed. The total number of test cases keeps increasing as testing continues.
A significant increase in the number of test cases implies that the initial planning for test design was not very accurate, and this may adversely affect the test execution schedule. In addition, management may want to know the reason the initial number of test cases was low. The system test team initiates an RCA when the total number of newly designed test cases crosses a certain threshold. The test team studies all the new test cases and categorizes them into different groups based on the functional requirements. Next, the functional specification and/or any other relevant documents are studied to understand why the test team was unable to identify the objectives of these new test cases in the first place.

Exit Criteria

We know when a test cycle has completed by applying an exit criterion. Mere execution of all the test cases in a test suite, or a subset thereof, does not mean that a cycle has completed. There is a need to monitor some quality metrics associated with the test cycle. Examples of these metrics are explained in detail for the three test cycles. Determining the exit criteria of the final test cycle is a complex issue. It involves the nature of the product (e.g., a shrink-wrap application versus an operating system), the business strategy related to the product, marketing opportunities and timing, and customer requirements, to name just a few factors [10]. The exit criteria considered here take the view that quality is an important attribute and that time-to-market with the desired qualities is a major goal.

12.7.3 Preparing for First Test Cycle

It is desirable to understand the following concepts in order to get ready for the first test cycle: (i) the different states of a known defect, in the form of a life cycle from the new state to the closed state; (ii) the assignment of test cases to test engineers; and (iii) an entry criterion telling us when to start the first test cycle. These concepts are explained below.
Life Cycle of a Defect

The idea behind giving a life-cycle model to defects is to be able to track them from the time they are detected to the time they are closed. The life cycle of a defect is represented by a state transition diagram with five states: new, assigned, open, resolved, and closed. The states of defects are used in the description of the individual test cycles. A defect is put in the new state by a test engineer, called the submitter, as soon as it is revealed by a failed test case. After a defect in the new state is reviewed and presumed to be a true defect, it is moved to the assigned state, which means that the defect has been assigned to a software developer. The software developer moves the defect from the assigned state to the open state as soon as he or she starts working on it. Once the developer is satisfied by means of unit testing that the defect has been fixed, the defect is moved to the resolved state. After a defect moves to the resolved state, the submitter verifies the fix by executing the associated failed test cases against the new build prepared by the development team. If the submitter is satisfied by means of system testing that a defect in the resolved state has been truly fixed, the defect is moved to the closed state, and the associated test cases are declared as passed. Otherwise, the defect is moved back to the open state, and the associated test cases are still considered to have failed. A detailed model of a defect life cycle is discussed in Chapter 13.

Assignment of Test Cases

Since no single engineer is going to execute all the test cases alone, it is desirable to assign test cases to appropriate test engineers by considering their expertise and interest.
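The five-state life cycle described above can be encoded as a small transition table; a minimal sketch follows, assuming the class and method names, which are ours, not the book's.

```python
# Allowed transitions in the five-state defect life cycle described above:
# new -> assigned -> open -> resolved -> closed, with resolved -> open
# when the submitter's verification of the fix fails.
TRANSITIONS = {
    "new": {"assigned"},             # reviewed and presumed a true defect
    "assigned": {"open"},            # developer starts working on it
    "open": {"resolved"},            # developer believes the fix is done
    "resolved": {"closed", "open"},  # submitter's verification passes or fails
    "closed": set(),                 # terminal state
}

class Defect:
    def __init__(self):
        self.state = "new"           # submitter files the defect

    def move(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

The resolved-to-open back edge is the interesting one: it models a fix that the submitter's system testing rejects, which keeps the associated test cases in the failed category.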
The assignment of test cases to test engineers may be changed from test cycle to test cycle for three reasons: (i) when a test case is assigned to a different engineer, the test case is likely to be executed from a different perspective; (ii) an opportunity to learn something new makes a positive impact on employee morale; and (iii) knowledge about the test cases is distributed throughout the test group, so if a test engineer is temporarily unavailable for the task, it is easy to find a competent replacement.

Entry Criteria for First Test Cycle

The entry criteria for the first test cycle, given in Table 12.3, tell us when we should start executing system tests. The criteria consist of five groups corresponding to five cross-functional groups: marketing, hardware, software, technical publication, and system testing. Each group has one or more condition items to be satisfied. System testing should not start until all the groups of the entry criteria are satisfied. Otherwise, a revert criterion may be triggered soon after system test execution begins.

The first group of the entry criteria concerns the completion of the project plan. The marketing team provides a business justification for the proposed product and its requirements. The engineering team defines a development approach, an execution environment, a schedule of milestones, and the possible risks involved, identifying the dependencies of the product on other elements. The software project plan must be reviewed and approved by the software, hardware, marketing, and technical publication groups before system testing is initiated.

System testing may include new hardware, and, therefore, the second group of the entry criteria is about adequate testing of any new hardware system or subsystem used by the software system. The pass-and-fail results obtained from executing test cases cannot be considered dependable without a stable hardware base.
It is important to ensure that the hardware has gone through the following three phases: (i) planning and specification; (ii) design, prototype implementation, and testing; and (iii) integration with the software system.

The third group of the entry criteria is about the completion of the development and testing work by the software development group. It consists of seven criteria, which provide evidence that the system is sufficiently stable to withstand the rigor of system testing. The decision to start system testing should be based on metrics and evidence of stability, rather than statements from software managers. For example, documentation of the weekly passed-and-failed status of all the unit and integration test cases should be available for review. The unit test and system integration test reports must be thoroughly reviewed by the system test team before the start of the first test cycle.

The fourth group of the entry criteria concerns the readiness of the user guides written by the technical writers. Unless the user guides are ready, the system test engineers will not be able to verify their accuracy and usability.

The fifth group of the entry criteria relates to a system test plan. Detailed test cases are included in the test plan. By documenting the test cases, one can estimate the quality of the software early in the system testing phase [11]. The system test plan must be reviewed and approved by the software, hardware, marketing, and technical publication groups before the start of system testing.

Cross-functional review meetings must be conducted to track the status of the readiness criteria, outlined in Table 12.3, at least four weeks prior to the start of the first test cycle. Representatives from the five groups must attend the cross-functional review meetings to provide the status in their respective areas. Any exceptions to these criteria must be documented, discussed, and agreed upon at the final cross-functional status review meeting.

TABLE 12.3 Entry Criteria for First System Test Cycle

Group: 1. Marketing
    1. Project plan and/or system requirements document is complete and updated.

Group: 2. Hardware
    2. All approved hardware versions for field deployment are available in-house. A list of these hardware versions should be provided.
    3. Hardware version control process is in place and documented.
    4. Hardware DVT plan is completed and results are available.

Group: 3. Software
    5. All functional specifications (FSs) are complete and have been updated to be in sync with the implementation. A list of individual FSs, including version number and status, must be provided.
    6. All design specifications (DSs) are complete and have been updated to be in sync with the implementation. A list of DSs, including version number and status, must be provided.
    7. All code is complete and frozen; code changes are allowed only for fixing defects, not for adding features.
    8. A software version control system is in place.
    9. 100% of unit tests are executed and passed.
    10. 100% of system integration tests are executed and passed.
    11. Not more than two unique crashes have been observed during the last two weeks of integration testing.

Group: 4. Technical publication
    12. A draft version of the user guide is available.

Group: 5. System testing
    13. The system test plan is in place (reviewed and approved).
    14. Test execution working document is in place and complete.

12.7.4 Selecting Test Cases for Final Test Cycle

Though it is desirable to reexecute all the test cases in the final test cycle, a lack of time and additional cost may not allow the system test team to do so. The concept of regression testing is applied only in the final test cycle.
In our approach, test cases are selected based on (i) the results of their prior execution; (ii) their membership in certain test groups: basic, functionality, robustness, interoperability, stress, scalability, performance, and load and stability; and (iii) their association with software components that have been modified. Test cases are selected in three steps. In the first step, the test suite is partitioned into four different bins (red, yellow, green, and white) based on certain criteria, which are described in the selection procedure given below. The red bin is used to hold the test cases that must be executed. The yellow bin is used to hold the test cases that are useful to execute. The green bin is used to hold the test cases that will not add any value to regression testing and thus can be skipped. The rest of the test cases, that is, those for which no concrete decision can be made in the first step, are put in the white bin. In the second step, test cases from the white bin are moved to the other bins by considering the software components that were modified during system testing and the test cases that are associated with those components. Finally, in the third step, the red and yellow bins are selected for regression testing. In the following, we present the test selection procedure:

Step 1: The test suite is partitioned into red, yellow, green, and white bins as follows:

• Red: The red bin holds the following kinds of test cases, which must be executed:
  - Test cases that failed at least once in the previous test cycles.
  - Test cases from those test groups for which RCA was conducted by the development team in the previous test cycles.
  - Test cases from the stress, scalability, and load and stability test groups. These test cases are more likely to reveal system crash defects. They are selected to ensure that the final build is stable and the probability of the system crashing at the customer site is extremely low.
  - Test cases from the performance test category. The performance characteristics must be measured against the final build that is going to be released to the customers.

• Yellow: The yellow bin holds the test cases that are useful to execute. This bin includes those test cases whose objectives are similar to the objectives of the test cases in the red bin. For example, let a test case with the following objective be in the red bin: While software upgradation is in progress, the CPU utilization should not exceed 60%. Then a test case with the following objective is put in the yellow bin: Verify that the software image can be upgraded to the nth release from the (n − 1)th release in less than 300 seconds. The condition, that is, less than 60% CPU utilization, tells us when to execute the upgradation test.

• Green: The green bin holds the test cases that will not add any value to regression testing and thus can be skipped. This bin includes those test cases whose objectives are implicitly covered by the execution of the test cases in the red and yellow bins. For example, if a test case with the objective "Verify that a software image can be upgraded to the nth release from the (n − 1)th release in less than 300 seconds" is in the yellow bin, then a basic test case with the objective "The software can be upgraded to the nth release from the (n − 1)th release" is included in the green bin.

• White: The test cases for which no concrete decision can be made in the first step are put in the white bin. This bin includes the rest of the test cases, that is, those not falling in the red, yellow, or green bin.

Step 2: Test cases from the white bin are moved to the other bins by considering the software components that were modified during system testing and the test cases that are associated with those components. The software developers identify all the software components that have been modified after the start of the first test cycle.
Each test case from the white bin is mapped to the identified software components. This mapping is done by analyzing the objective of the test case and then checking whether the modified code of the identified software components is exercised by executing the test case. The test cases that are mapped to more than one, exactly one, or zero software components are moved to the red, yellow, or green bin, respectively.

Step 3: Test cases from the red and yellow bins are selected for regression testing as follows:

• All the test cases from the red bin are selected for the final test cycle.
• Depending on the schedule, time to market, and customer demand, test cases from the yellow bin are selected for the final test cycle.

Remark 2. It is useful to summarize the selection strategy explained above. The red bin holds the test cases that must be selected in the final test cycle. The test cases falling in the "must execute" category are (i) the test cases which failed in the previous test cycles; (ii) the test cases from the groups for which RCA was performed in the previous test cycles; (iii) the test cases from the stress, scalability, and load and stability test groups; and (iv) the test cases which concern modified software components. We remind the reader that RCA for a test group is performed if too many test cases from that group failed. We put emphasis on such a test group by selecting its test cases in the final test cycle. Test cases from the stress, scalability, and load and stability groups are also included in the final test cycle by default, even though those test cases may not concern the modified components. The rationale for their inclusion is that one must run a final check on the stress, scalability, load, and stability aspects of software systems, such as servers, Internet routers, and base stations in wireless communication, which are likely to serve thousands of simultaneous end users.
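The three-step selection procedure above can be sketched as follows. The per-test attributes (failed_before, rca_group, similar_to_red, covered_by_red, components) are modeling assumptions of ours that stand in for the judgments the text describes, such as analyzing a test objective to map it to modified components.

```python
# Test groups placed in the red bin by default (Step 1 of the procedure).
MUST_RUN_GROUPS = {"stress", "scalability", "load and stability", "performance"}

def select_regression_tests(tests, modified_components, include_yellow=True):
    """Red/yellow/green/white partitioning for the final test cycle.
    `tests` maps a test name to a dict with keys: failed_before (bool),
    rca_group (bool), group (str), similar_to_red (bool),
    covered_by_red (bool), components (set of components it exercises)."""
    bins = {"red": [], "yellow": [], "green": [], "white": []}
    # Step 1: initial partition of the test suite.
    for name, t in tests.items():
        if t["failed_before"] or t["rca_group"] or t["group"] in MUST_RUN_GROUPS:
            bins["red"].append(name)
        elif t["similar_to_red"]:
            bins["yellow"].append(name)
        elif t["covered_by_red"]:
            bins["green"].append(name)
        else:
            bins["white"].append(name)
    # Step 2: map white-bin tests to the modified software components.
    for name in bins["white"][:]:
        hits = len(tests[name]["components"] & modified_components)
        dest = "red" if hits > 1 else "yellow" if hits == 1 else "green"
        bins[dest].append(name)
        bins["white"].remove(name)
    # Step 3: red always runs; yellow runs if the schedule permits.
    return bins["red"] + (bins["yellow"] if include_yellow else [])
```

The include_yellow flag models the Step 3 judgment call: yellow-bin tests are taken only when schedule, time to market, and customer demand allow.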
12.7.5 Prioritization of Test Cases

Prioritization of test cases means ordering the execution of test cases according to certain test objectives. Formulating test objectives for prioritization of individual test cases is an extremely difficult task. Here, we discuss test prioritization in terms of groups of test cases with common properties. An example of a test objective for prioritization is: Execute the maximum number of test cases without being blocked. Furthermore, a major concern of software developers is that test engineers report critical defects (e.g., defects related to system crash) toward the end of test cycles. This does not give them enough time to fix those defects. In addition, blocking defects need to be fixed earlier in the test cycles in order to execute the blocked test cases. Therefore, we need to prioritize the execution of tests to detect critical defects early in the test cycles.

In a multicycle-based test execution strategy, it is desirable to have different test objectives for prioritization in different test cycles for three reasons: (i) initially the quality level of the system under test is not very high, (ii) the quality of the system keeps improving from test cycle to test cycle, and (iii) a variety of defects are detected as testing progresses. Below we explain test prioritization in individual test cycles.

Test Prioritization in Test Cycle 1

Principle. Prioritize the test cases to allow the maximum number of test cases to completely execute without being blocked.

Test engineers execute their assigned test cases in different test environments. Each engineer prioritizes the execution of their subset of test cases as follows:

• A high priority is assigned to the test cases in the basic and functionality test groups.

• A medium priority is assigned to the test cases in the robustness and interoperability test groups.
• A low priority is assigned to the test cases in the following groups: documentation, performance, stress, scalability, and load and stability tests.

The basic tests give prima facie evidence that the system is ready for more rigorous tests. The functionality tests provide comprehensive testing over the full range of the requirements within the capabilities of the system. Both of these test groups are given high priority in the first test cycle to ensure that any functionality defects are fixed first. Functionality defects can block the execution of other test groups. Stress, performance, scalability, load, and stability tests need complex configurations across different platforms, operating systems, and database management systems. Execution of these tests depends on the outcome of the interoperability and robustness tests. Therefore, interoperability and robustness test cases are executed next to flush out any issues that may block the execution of stress, performance, scalability, load, and stability tests.

Test Prioritization in Test Cycle 2

Principle. Test cases which failed in the previous test cycle are executed early in the test cycle.

In the second test cycle, the test cases are reassigned to the test engineers based on their interest and expertise. The process described in Section 12.7.4 is used to distribute all the test cases into three different bins: red, yellow, and green. In this cycle, we are not selecting a subset of the test suite; instead, the idea of partitioning a test suite is further used in prioritizing test cases. Each test engineer prioritizes the execution of test cases in their subset as follows:

• A high priority is assigned to the test cases in the red bin.

• A medium priority is assigned to the test cases in the yellow bin.

• A low priority is assigned to the test cases in the green bin.

Test Prioritization in Test Cycle 3

Principle.
Test prioritization is similar to that in the second test cycle, but it is applied to the subset of test cases selected for regression testing. Once again, the test cases are reassigned based on interest and expertise among the test engineers. Then each test engineer prioritizes the execution of test cases in their assigned subset as follows:

• A high priority is assigned to the test cases in the red bin.

• A low priority is assigned to the test cases in the yellow bin.

The reader may recall from the discussion of Section 12.7.4 that the test cases in the green bin are not executed in the final test cycle.

12.7.6 Details of Three Test Cycles

As we move from test cycle to test cycle, new test cases may be included, and the revert criteria and the exit criteria are made more stringent so that the quality of a system improves, rather than deteriorates, as testing progresses. Note that we have given concrete values for the parameters of each test cycle. These values are customizable according to the testing capabilities and needs of an organization; the values used here are actual values used in real test projects. We have used concrete values in the description of test cycles to make the descriptions meaningful and to be able to observe improvement in quality from test cycle to test cycle.

Test Cycle 1

In this cycle, we try to detect most of the defects by executing all the test cases. The six characteristic parameters of the first test cycle are described in the following.

Goals
We intend to execute all the test cases from the test suite and maximize the number of passed test cases. The goal is to ensure that 98% of the test cases have passed.

Assumptions
The system test group accepts a software image once every week for the first four weeks and once every two weeks afterward.
The possibility of some test cases being blocked is higher in the first four weeks of the test cycle, because more priority 1 ("critical") defects are reported early in the cycle due to the execution of higher priority test cases. Unless these defects are resolved quickly, the test execution rate may slow down. Therefore, it is useful to accept new software images every week. We have observed that software images become more stable after four weeks and test execution is no longer blocked. Subsequently, the system test team may accept software images once every two weeks.

Test Execution
Test cases are executed according to the prioritization strategy for this test cycle explained in Section 12.7.5.

Revert and Extension Criteria
Essentially, if the number of failed test cases reaches 20% of the total number of test cases to be executed, the system test team abandons this test cycle. The test cycle is restarted when the development team claims that the defects have been fixed. Assume that there are 1000 test cases to be executed. If the system test team observes that 200 test cases, which is 20% of 1000, out of the first, say, 700 test cases have failed, there is no point in continuing with the test cycle. This is because the quality of the product is too low, and any further testing before the defects are fixed is a waste of testing resources. If more than two unique crashes are observed during the test cycle, the system test team runs regression tests after the crash defects are fixed. If the number of failed test cases for any group of test cases, such as functionality, performance, and scalability, reaches 20% of the number of test cases in the group during the test cycle, the system test team reexecutes all the test cases in that group in this cycle after the defects are presumed to be fixed. Consequently, the duration of the test cycle is extended.
Similarly, if the number of new test cases increases by 10% of the system test cases, the test cycle is extended to document the additional test cases.

Action
The software development group initiates an RCA during the test cycle if the total number of failed test cases reaches some preset values, as shown in Table 12.4. For example, if 25% of all the test cases executed in a single week from a single group fail, then the development team performs an RCA. The system test group initiates an RCA if the number of new test cases increases by 10% of the total number of test cases designed before the test cycle was started.

TABLE 12.4 Test Case Failure Counts to Initiate RCA in Test Cycle 1

                            Test Case Failure Count (%)
                          Single Week    Cumulative Weeks
    Single test group          25                20
    All test groups            20                15

Exit Criteria
The test cycle is considered to have completed when the following predicates hold: (i) new test cases are designed and documented for those defects that were not detected by the existing test cases, referred to as test case escapes; (ii) all test cases are executed at least once; (iii) 95% of test cases pass; and (iv) all the known defects are in the closed state.

Remark 3. Test case escapes occur because of deficiencies in the test design process. They are identified when test engineers find defects or when they encounter conditions that are not described in the test plan. This happens by accident or when a new test scenario occurs to test engineers while executing the planned test cases. As test engineers learn more about the product, they develop innovative ways to test the product and find new defects. These test cases had escaped from the test case design effort; such escapes are also known as side effects [12].

Remark 4. A software development group may take more time to fix defects that cause the failure of 5% of test cases.
In that case, there is no point in waiting for the fix and indefinitely delaying the completion of the first test cycle until an additional 3% of test cases pass in order to achieve the stated goal of 98% passed test cases set out for the first test cycle. It is indeed the case that some defects take much more time to be fixed, and some are deferred until the next software release. It is advisable to exit from the first test cycle and start the second cycle for effective and efficient utilization of resources. In any case, our strategy in the second test cycle is to execute all the test cases, which includes those failed 5% of test cases as well.

Test Cycle 2
The fixes for the defects found in the first test cycle are verified in the second test cycle. One of four possibilities occurs for each modification made while fixing a defect: (i) the defect is fixed without introducing a new defect; (ii) the defect is fixed, but something that was working is broken; (iii) neither is the defect fixed nor are new defects introduced; and (iv) the defect is not fixed, but something else is broken. Moreover, all the entry criteria may not have been satisfied before the start of the first test cycle, and the outstanding items may be resolved by cross-functional groups during the execution of the first test cycle. Sometimes, a demanding customer may cause a last-minute feature check-in before the start of the second test cycle; otherwise, the customer will not deploy the product. Given these uncertainties, it is desirable to reexecute every test case during the second test cycle to ensure that the changes did not adversely impact the quality of the software. The six characteristic parameters of the second test cycle are described below.

Goals
All test cases are executed once again to ensure that there is no collateral damage due to the fixes applied in the first test cycle.
The number of passed test cases is maximized, ensuring that 99% of the test cases have passed at the end of the second test cycle.

Assumption
The system test group accepts software images every two weeks. This is because the system is relatively stable after having gone through the first test cycle.

Test Execution
Test cases are executed according to the prioritization strategy for this test cycle explained in Section 12.7.5.

Revert and Extension Criteria
At any instant in the test cycle, if the total number of failed test cases reaches 10% of the total number of test cases to be executed, the system test team stops testing. The test cycle is restarted from the beginning after the development team claims that the defects have been fixed. If more than one unique crash is observed during this test cycle, the system test team runs regression tests after the crash defects are fixed. This means that the duration of the test cycle is extended. If the number of failed test cases reaches 10% of the total number of test cases in a group during the test cycle, the system test team reexecutes all the test cases from that group in this test cycle after the defects are claimed to be fixed. Therefore, the duration of the test cycle needs to be extended. The test cycle is also extended to document the additional test cases if the number of new test cases increases by 5% of the total number of test cases.

Action
The software development group initiates an RCA during the test cycle if the counts of failed test cases reach some preset values, as shown in Table 12.5. For example, if 15% of all the test cases executed in a single week from a single group fail, then the development team performs an RCA. The system test group initiates an RCA if the number of new test cases increases by 5% of the total number of test cases designed before the start of the second test cycle.
A specific explanation is required if new test cases were added during the first test cycle and new test cases are added again during the second test cycle.

TABLE 12.5 Test Case Failure Counts to Initiate RCA in Test Cycle 2

                            Test Case Failure Count (%)
                          Single Week    Cumulative Weeks
    Single test group          15                10
    All test groups            10                 5

Exit Criteria
The test cycle is considered to have completed when the following hold: (i) new test cases are designed and documented for those defects that were not detected by the existing test cases, (ii) all test cases are executed at least once, (iii) 98% of test cases pass, and (iv) all the known defects are in the closed state.

Remark 5. One of the exit criteria is that 98% of test cases must pass, which is higher than in the first test cycle. Once again, there is no point for test engineers to wait for the fixes and indefinitely postpone the completion of the second test cycle until another 1% of the failed test cases pass in order to achieve the 99% pass goal set before the start of the second test cycle.

Test Cycle 3
In the third and final test cycle, a selected subset of test cases of the original test suite is reexecuted to ensure that the software is stable before it is released to the customer. The objective of this test cycle is to execute all the selected test cases against a single software image.

Goals
A selected subset of test cases from the test suite is reexecuted. The process of selecting test cases for this test cycle has been explained in Section 12.7.4. At the end of the cycle, the software is released to the customer. Therefore, 100% of the selected test cases should pass.

Assumption
The system test group accepts just one software image at the beginning of the test cycle. In exceptional circumstances, the system test group may accept a second image during this cycle, but no less than three weeks before the end of the test cycle.
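The RCA triggers of Tables 12.4 and 12.5 above amount to simple percentage checks applied per single week and cumulatively. The sketch below illustrates them for the first two cycles; the threshold values come from the tables, while the function name and data layout are assumptions made for illustration.

```python
# Illustrative check of the RCA trigger thresholds in Tables 12.4 and 12.5.
# Thresholds are percentages of failed test cases, taken from the tables.
RCA_THRESHOLDS = {
    # cycle: {scope: (single-week %, cumulative %)}
    1: {"single_group": (25, 20), "all_groups": (20, 15)},
    2: {"single_group": (15, 10), "all_groups": (10, 5)},
}

def rca_needed(cycle, scope, weekly_fail_pct, cumulative_fail_pct):
    """True if the weekly or the cumulative failure percentage reaches
    its preset threshold for the given cycle and scope."""
    weekly_limit, cumulative_limit = RCA_THRESHOLDS[cycle][scope]
    return (weekly_fail_pct >= weekly_limit
            or cumulative_fail_pct >= cumulative_limit)

# 25% of a single group's test cases failed in one week of cycle 1 -> RCA
print(rca_needed(1, "single_group", weekly_fail_pct=25, cumulative_fail_pct=12))
```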
Test Execution
Test cases are executed according to the prioritization strategy for this test cycle explained in Section 12.7.5.

Revert and Extension Criteria
At any instant during the test cycle, if the total number of failed test cases exceeds 5% of the total number of test cases to be executed, the system test team stops testing. Testing is also stopped if a single crash is observed. The test cycle restarts from the beginning after the development team claims that the defects have been fixed, since this is the final test cycle before the release. All the selected test cases need to be reexecuted to ensure that there is no collateral damage due to the fix. Therefore, this test cycle can be terminated and restarted, but not extended.

Exit Criteria
The final test cycle is considered to have completed if all of these predicates hold: (i) all the selected test cases are executed; (ii) the results of all the tests are available; (iii) 98% of test cases pass; (iv) the 2% failed test cases are not from the stress, performance, scalability, and load and stability test groups; (v) the system does not crash in the final three weeks of testing; and (vi) the test report is completed and approved. The test report summarizes the test results of all three test cycles, performance characteristics, scaling limitations (if any), stability observations, and interoperability of the system.

Remark 6. One of the exit criteria is that 98% of test cases must pass in the third test cycle. One can ask: Why is it not 100%? After all, 100% was the goal set before the start of the third test cycle. It must be noted that the exit criteria of the final test cycle are influenced by a trade-off among time, cost, and quality. As Yourdon [13] has argued, sometimes less than perfect is good enough. Only business goals and priority determine how much less than perfect is acceptable.
Ultimately, the exit criteria must be related to the business goals of the organization. Exit criteria are generally not based solely on quality; rather, they take into consideration the innovation and timeliness of the product to be released to the market. Of course, for any mission-critical application, the pass rate of system test cases must be 100%.

Our experience tells us that fewer than three test cycles is not good enough to give us much confidence in a large software system unless everything works to perfection as planned—which is a rare case. By "perfection" we mean that (i) the product does not change apart from the defect fixes, (ii) the fixes work as intended, and (iii) the test engineers do not add any new test cases. Rarely are these three conditions satisfied. More than three test cycles can be scheduled with appropriate exit criteria in terms of an increasing percentage of test cases passing in successive test cycles. However, more than three test cycles incurs a much higher cost while delaying the launch of a product "just in time" to the market. Management does not want to delay the launch of a product unless it is a mission-critical project, where the goal is to have a zero-defect product. It is better to plan for three-cycle-based system testing but skip the third test cycle if there is no need. On the other hand, it is not desirable to budget for two test cycles and fall behind schedule when the software turns out to be of low quality.

12.8 TEST EFFORT ESTIMATION

The system test group needs to estimate the testing effort to produce a schedule of test execution. Intuitively, testing effort defines the amount of work that needs to be done. In concrete terms, this work has two major components:

• The number of test cases created by one person in one day

• The number of test cases executed by one person in one day

The above components are also referred to as test productivity. Unfortunately, there is no mathematical formula to compute the test productivity numbers.
Rather, the test effort data are gathered by measuring the test creation and execution time in real projects by taking a microscopic view of real testing processes. If productivity data are collected for many test projects, an average test productivity measure can be estimated. Our experience on this issue is discussed in this section.

In the planning stage it is natural to estimate the cost of the test and the time to complete the test. Together, the two parameters cost and time are called the test effort. In most cases, estimation of the system test effort is combined with estimation of the entire software project. However, it is useful to separate the test effort estimate from the estimate of the entire project so that enough time is allocated to plan for the system testing from the beginning and conduct it as soon as the entry criteria are satisfied. The three key factors in estimating test effort are as follows:

1. Number of test cases to be designed for the software project

2. Effort required to create a detailed test case

3. Effort required to execute a test case and analyze the result of execution

Some other factors affecting the test effort are as follows:

• Effort needed to create test environments

• Effort needed to train test engineers on the details of the project

• Availability of test engineers for system testing as they are needed

We now discuss the ways to estimate the key factors affecting test effort. Test planners rely on their past experience in testing similar projects and a thorough knowledge of the current project to provide accurate estimation of the above three key items.

12.8.1 Number of Test Cases

It is useful to understand the commonly used term "test case" before it is used for estimation. The granularity of test cases has a significant impact on the estimation of the number of test cases needed.
For example, one person may use the term test case to mean one atomic test step, which is just a single input–output interaction between the tester and the SUT. Another person may use the same term to mean hundreds of test steps. Our definition of test case is independent of the number of test steps needed to construct it. The granularity of a test case is tightly coupled with the test objective, or purpose. A simple test case typically contains 7–10 test steps, while a complex test case can easily contain 10–50 test steps. The test steps are atomic building blocks which are combined to form a test case satisfying a well-defined objective. We discuss several ways to estimate the number of test cases.

Estimation Based on Test Group Category
It is straightforward to estimate the number of test cases after the test suite structure and the test objectives are created, as discussed in Section 12.5, by simply counting the number of test objectives. However, this will give an underestimation of the total number of test cases designed by the end of the system test execution cycle. This is because, as system testing progresses with the initially estimated and designed test cases, test engineers want to observe additional behavior of the system prompted by their observations of unforeseen failures. Consequently, the number of test cases used by the end of the system testing phase exceeds its initial estimation. A more accurate estimation of the number of test cases for each group is obtained by adding a "fudge factor" of 10–15% to the total number of test objectives identified for the group. On the one hand, underestimation of the number of test cases causes reduced allocation of resources to system testing at its beginning; it then demands more resources when the need is felt, causing uncontrollable project delay. On the other hand, overestimation of the number of test cases leads to their inefficient utilization.
For pragmatic reasons, for any moderate-size, moderate-risk testing project, a reasonable factor of safety, in the form of a fudge factor, needs to be included in the estimate in order to provide a contingency for uncertainties. The test team leader can generate a table consisting of the test (sub)groups and, for each (sub)group, the estimated number of test cases. Additional columns may be included for the time required to create and the time required to execute the test cases, which are discussed later in Sections 12.8.2 and 12.8.3, respectively.

Example: FR–ATM PVC Service Interworking. Let us consider the FR–ATM PVC service interworking example discussed in Chapter 11. A test effort estimation is given in Table 12.6 based on the test categories and the FrAtm structure discussed in Chapter 11. The first column is estimated based on the number of test objectives identified for the system; the procedure to obtain the estimations is given below. The estimations in the next two columns of Table 12.6 are discussed in Sections 12.8.2 and 12.8.3, respectively.

One test case is required for each configurable attribute of the FrAtm component. Therefore, we estimate that 40 test cases will be required to test the configuration functionality, with our assumption of 40 configuration attributes. Similarly, 30 test cases need to be created in order to test the monitoring feature, with our assumption that 30 monitoring attributes are available in the implementation of an FrAtm component.
The configuration and monitoring attributes information should be available in the functional specification of the FrAtm component.

TABLE 12.6 Test Effort Estimation for FR–ATM PVC Service Interworking

    Test (Sub)Group                  Estimated Number    Person-Days    Person-Days
                                     of Test Cases       to Create      to Execute
    Configuration                         40                 4               6
    Monitoring                            30                 4               4
    Traffic management                     3                 2               3
    Congestion                             4                 2               4
    SIWF translation mode mapping         12                 4               4
    Alarm                                  4                 1               1
    Interface                              9                 5               3
    Robustness                            44 (4 + 40)        4               6
    Performance                           18                 6              10
    Stress                                 2                 1.5             2
    Load and stability                     2                 1.5             2
    Regression                           150 (60 + 90)       0              15
    Total                                318                35              60

The traffic management test cases are estimated based on the number of QoS parameters to be mapped from FR to ATM, which is approximately 3. The number of interface tests is estimated based on the number of FR and ATM cards supported by the FrAtm component, which is 9; one test case needs to be created for each interface card. Therefore, 9 test cases are estimated for the interface test group. Forty-four test cases are estimated for the robustness group: 4 for the FrAtm subgroup and 40 for boundary value tests. One boundary value test case needs to be designed for each configuration attribute; therefore, we estimated 40 test cases for boundary value testing. Performance test cases are estimated based on the following assumptions:

• Delay measurement for FrAtm with 3 types of ATM cards. This requires us to design 3 test cases.

• Delay measurement for FrAtm with 6 types of FR cards. Therefore, we need to design 6 test cases.

• Throughput measurement using one FR card at a time. There are 6 types of FR cards, and thus we need 6 test cases.

• Throughput measurement using one ATM card at a time. There are 3 types of ATM cards, and thus we need 3 test cases.

The estimated number of test cases for performance testing may change if combinations of cards are considered with different objectives.
Finally, a number of test cases for regression testing are selected from the existing FR and ATM test suites. For simplicity, we assume that 60 and 90 test cases are selected from the existing FR and ATM test suites, respectively.

Estimation Based on Function Points
The concept of function points is used to estimate resources by analyzing a requirements document. This methodology was first proposed by A. J. Albrecht in 1979 [14, 15]. The function point method is becoming more popular in the industry, though it is felt that it is still in an experimental state. The central idea in the function point method is as follows: Given a functional view of a system in the form of the number of user inputs, the number of user outputs, the number of user on-line queries, the number of logical files, and the number of external interfaces, one can estimate the project size in number of lines of code required to implement the system and the number of test cases required to test the system.

Albrecht [14] has shown that it is useful to measure a software system in terms of the number of "functions" it performs and gave a method to "count" the number of those functions. The count of those functions is referred to as function points. The function point of a system is a weighted sum of the numbers of inputs, outputs, master files, and inquiries produced to or generated by the software. The four steps of computing the function point of a system are described below:

Step 1: Identify the following five types of components, also known as "user function types," in a software system and count them by analyzing the requirements document:

• Number of external input types (NI): This is the distinct number of user data or user control input types entering the boundary of the system.

• Number of external output types (NO): This is the distinct number of user data or control output types that leave the external boundary of the system.
• Number of external inquiry types (NQ): This is the distinct number of unique input–output combinations, where an input causes an immediate output to be generated. Each distinct input–output pair is considered as an inquiry type.

• Number of logical internal file types (NF): This is the distinct number of major logical groups of user data or control information in the system. Each logical group is treated as a logical file. These groups of data are generated, used, and maintained by the system.

• Number of external interface file types (NE): This is the distinct number of files passed or shared between systems. Each major group of data or control that leaves the system boundary is counted as an external interface file type.

Step 2: Analyze the complexity of each of the above five types of user functions and classify each to one of three levels of complexity, namely simple, average, or complex. For example, the external input types may be simple, the logical internal file types of average complexity, and the external interface file types complex in nature. A weighting factor is associated with each level of complexity for each type of user function. Two types of user functions with the same level of complexity need not have the same weighting factor. For example, simple external input types may have a weighting factor of 3, whereas simple external inquiry types may have a weighting factor of 4. Let WFNI denote the weighting factor of external input types, WFNO denote the weighting factor of external output types, and so on. Now, the unadjusted, or crude, function point, denoted by UFP, is computed as

UFP = WFNI × NI + WFNO × NO + WFNQ × NQ + WFNF × NF + WFNE × NE

Step 3: Compute the unadjusted function point by using the form shown in Table 12.7. Albrecht [14] has identified 14 factors that affect the required development effort for a project.
A grade—between 0 and 5—is assigned to each of the 14 factors for a certain project, where 0 = not present or no influence if present, 1 = insignificant influence, 2 = moderate influence, 3 = average influence, 4 = significant influence, and 5 = strong influence. The sum of these 14 grades is known as the processing complexity adjustment (PCA) factor. The 14 factors are listed in Table 12.8.

TABLE 12.7 Form for Computing Unadjusted Function Point

                          Complexity (Use One Item in Each Row)
    No.  Identifier   Simple         Average         Complex          Total
    1    NI           — × 3 = —      — × 4 = —       — × 6 = —
    2    NO           — × 4 = —      — × 5 = —       — × 7 = —
    3    NQ           — × 7 = —      — × 10 = —      — × 15 = —
    4    NF           — × 5 = —      — × 7 = —       — × 10 = —
    5    NE           — × 3 = —      — × 4 = —       — × 6 = —
                                                     Total (UFP)

TABLE 12.8 Factors Affecting Development Effort

1. Requirement for reliable backup and recovery
2. Requirement for data communication
3. Extent of distributed processing
4. Performance requirements
5. Expected operational environment
6. Extent of on-line data entries
7. Extent of multiscreen or multioperation data input
8. Extent of on-line updating of master files
9. Extent of complex inputs, outputs, on-line queries, and files
10. Extent of complex data processing
11. Extent that currently developed code can be designed for reuse
12. Extent of conversion and installation included in the design
13. Extent of multiple installations in an organization and variety of customer organizations
14. Extent of change and focus on ease of use

Step 4: Now, we compute the function point (FP) of a system using the following empirical expression:

FP = UFP × (0.65 + 0.01 × PCA)

By multiplying the unadjusted function point UFP by the expression 0.65 + 0.01 × PCA, we get an opportunity to adjust the function point by ±35%. This adjustment is explained as follows.
Let us analyze the two extreme values of PCA, namely 0 and 70 (= 14 × 5):

PCA = 0:  FP = UFP × (0.65 + 0.01 × 0) = 0.65 × UFP
PCA = 70: FP = UFP × (0.65 + 0.01 × 70) = 1.35 × UFP

Therefore, the value of FP can range from 0.65 × UFP to 1.35 × UFP. Hence, we adjust the value of FP within a range of ±35% by using intermediate values of PCA.

12.8 TEST EFFORT ESTIMATION 383

The function point metric can be utilized to estimate the number of test cases in the following ways:

• Indirect Method: Estimate the code size from the function points and then estimate the number of test cases from the code size.

• Direct Method: Estimate the number of test cases directly from function points.

Indirect Method

The function points of a software system are computed by examining the details of the requirements of the system from the requirement database. At the time of such a computation, the programming language in which the system will be implemented may not be known. Implementation of the system in different programming languages will produce different measures of lines of code (LOC). Therefore, the choice of a programming language has a direct impact on the LOC metric. Capers Jones [16] gave a relationship between function points and the code size of a software system. Specifically, he gave the number of LOC per function point for a number of programming languages, as shown in Table 12.9. Given the total number of function points for a system, one can predict the number of LOC for the system by assuming a programming language in which it will be implemented. At this point, one needs to draw on experience to estimate the number of test cases to be designed for system testing. This estimation practice varies among organizations and even within the same organization for different categories of software systems. The 30-year empirical study of Hitachi Software [11] gives a standard of one test case per 10–15 LOC.
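Putting the four steps together, the function point computation can be sketched in a few lines of Python. The weights follow Table 12.7; the counts and the PCA grade in the example are invented for illustration.

```python
# Sketch of Albrecht's function point computation (Steps 1-4 above).
# Weights per complexity level are taken from Table 12.7.

WEIGHTS = {  # identifier -> (simple, average, complex)
    "NI": (3, 4, 6),
    "NO": (4, 5, 7),
    "NQ": (7, 10, 15),
    "NF": (5, 7, 10),
    "NE": (3, 4, 6),
}

def unadjusted_fp(counts):
    """counts: identifier -> (n_simple, n_average, n_complex)."""
    return sum(n * w
               for ident, ns in counts.items()
               for n, w in zip(ns, WEIGHTS[ident]))

def function_points(ufp, pca):
    """Adjust UFP by the PCA factor, the sum of the 14 grades (0-70)."""
    return ufp * (0.65 + 0.01 * pca)

# Invented example: a small system, PCA graded to 28.
counts = {"NI": (2, 1, 0), "NO": (1, 0, 0), "NQ": (0, 0, 0),
          "NF": (1, 0, 0), "NE": (0, 0, 0)}
ufp = unadjusted_fp(counts)    # 2*3 + 1*4 + 1*4 + 1*5 = 19
fp = function_points(ufp, 28)  # 19 * (0.65 + 0.28) = 17.67
```

Note that `function_points(ufp, 0)` and `function_points(ufp, 70)` reproduce the two extremes, 0.65 × UFP and 1.35 × UFP, analyzed above.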
A system with 100 function points, implemented in C, will produce a code size of 100 × 128 = 12,800 lines. Following the Hitachi Software standard, between 850 and 1280 test cases are designed for system-level testing of the software system under consideration. It must be remembered that these test cases do not include the test cases needed for unit testing and integration testing. This is because the test engineers in the system test group must be unbiased and never reuse the test cases of the programmers. Instead, the system testing team designs the test cases from scratch.

TABLE 12.9 Empirical Relationship between Function Points and LOC

Programming Language   Average LOC/FP
Assembly language      320
C                      128
COBOL                  106
FORTRAN                106
C++                    64
Visual Basic           32

Direct Method

Capers Jones [16] gave a direct relationship between function points and the total number of test cases created:

Total number of test cases = (function points)^1.2

The number of test cases estimated above encompasses all forms of testing done on a software system, such as unit testing, integration testing, and system testing. It does not distinguish the system test size from the total testing effort. It is now useful to compare the test estimation method used by Hitachi Software [11] and the Capers Jones method [16]. The method used by Hitachi Software estimates the number of test cases from the code size in LOC, whereas Capers Jones estimates the number of test cases from function points. For a meaningful comparison, we start with a certain number of function points and estimate the code size therefrom. For example, we have considered 100 function points in the two preceding examples. Next, we derive the LOC information by assuming a certain programming language so that the Hitachi Software method can be applied. Assuming that our target language is C, the estimated number of LOC to be produced is 100 × 128 = 12,800 LOC.
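Both estimates for the running 100-function-point example can be reproduced with a short sketch; the LOC/FP ratios follow Table 12.9, and the "1 test case per 10–15 LOC" rule is the Hitachi Software guideline quoted above.

```python
# Sketch reproducing the two estimates for the 100-FP example above.

LOC_PER_FP = {"Assembly language": 320, "C": 128, "COBOL": 106,
              "FORTRAN": 106, "C++": 64, "Visual Basic": 32}

def indirect_estimate(fp, language):
    """Hitachi-style (low, high) range of test cases from code size."""
    loc = fp * LOC_PER_FP[language]
    return loc // 15, loc // 10

def direct_estimate(fp):
    """Capers Jones: total test cases across all levels of testing."""
    return round(fp ** 1.2)

indirect_estimate(100, "C")   # -> (853, 1280), i.e., roughly 850-1280
direct_estimate(100)          # -> 251
```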
The method adopted by Hitachi Software produces 850–1280 test cases to test a system with 12,800 LOC, whereas the Capers Jones method estimates 251 test cases. Note that there is a four- to five-fold difference between the numbers of test cases estimated by the two methods. The difference is interpreted as follows: The Japanese software industry (e.g., Hitachi Software) uses the term test point rather than test case, and a test point is similar to a test step. A test case normally contains between 1 and 10 test steps. The number of test cases estimated by using the Hitachi Software method therefore represents the number of test steps (or test points). Now, if we assume that a test case contains an average of five test steps, the number of test cases estimated by the Hitachi Software method is between 170 and 256, and this range is close to the 251 estimated by the Capers Jones method.

12.8.2 Test Case Creation Effort

It is necessary to allocate time to create test cases after the test suite structure and the test objectives are identified. On average, the productivity of manual test case creation is summarized in Table 12.10. The time represents the duration from the entry to the create state to the entry to the released state of the state transition diagram of a test case discussed in Chapter 11.
TABLE 12.10 Guidelines for Manual Test Case Creation Effort

Size of Test Case                                    Average Number of Test Cases per Person-Day
Small, simple test case (1–10 atomic test steps)     7–15
Large, complex test case (10–50 atomic test steps)   1.5–3

The activities involved in creating test cases have been discussed in detail in Chapter 11 and are summarized here:

• Reading and understanding the system requirements and functional specifications documents
• Creating the test cases
• Entering all the mandatory fields, including test steps and pass–fail criteria
• Reviewing and updating the test cases

The skill and effectiveness of a test engineer are significant factors that influence the estimation of test case creation effort. The guideline provided in Table 12.10 assumes that a test engineer is skilled in developing test cases, that is, he or she has four to six years of prior experience in developing test cases in the relevant area of expertise. Capers Jones [17] estimated that test cases are created at a rate of 30–300 per person-month; in other words, 1.5–15 test cases can be created by one person in one day. Our estimation of the effort to create test cases for each group of FR–ATM service interworking is given in the third column of Table 12.6, based on the guidelines provided in Table 12.10.

12.8.3 Test Case Execution Effort

The time required to execute a test case depends on the complexity of the test case and the expertise of the executer on the subject matter of the system. Arguably, the test cases in the telecommunications area are among the most complex, requiring an understanding of the configuration of switches, routers, and different protocols.
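Returning to the creation guideline, the rates of Table 12.10 translate into person-day estimates as in the following sketch; the test case counts are invented for illustration.

```python
# Sketch: person-day effort for test case creation, using the
# productivity ranges of Table 12.10 (test cases per person-day).

CREATION_RATE = {"small": (7, 15), "large": (1.5, 3)}

def creation_effort_days(n_cases, size):
    """Return (best_case, worst_case) person-days."""
    low_rate, high_rate = CREATION_RATE[size]
    return n_cases / high_rate, n_cases / low_rate

# Invented example: a suite of 120 small and 48 large test cases.
creation_effort_days(120, "small")   # -> (8.0, ~17.1) person-days
creation_effort_days(48, "large")    # -> (16.0, 32.0) person-days
```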
In any case, one must consider the following factors in order to estimate the work effort for manual test execution:

• Understanding the test cases, if the test cases have not been created by the executer
• Configuring the test beds
• Executing the test steps and evaluating the test steps
• Determining the execution status (pass–fail) of the test case
• Logging the test results in the test factory database
• Collecting and analyzing the data if performance and stress–load test cases are being executed
• Updating the test case in the test factory, as discussed in Chapter 11
• In the event of a failure, performing the following tasks:
    Trying to reproduce the failure
    Executing different scenarios related to the failure observed
    Collecting details of the configuration, logs, and hardware configuration
    Filing a defect report in the defect tracking system
• If the software developers cannot reproduce a failure, following up with them in debugging and localizing the problem

Intuitively—and it can also be observed—the test execution rate is linearly proportional to the pass rate of the test cases. In other words, a high test case failure rate slows down the test case execution rate. This is because a test engineer needs to help the developers replicate the problem and identify its root cause by trial and error, which takes away a lot of time from the execution of test cases. Based on our experience, we have summarized the execution rates of test cases for new and regression tests in Tables 12.11 and 12.12, respectively. The execution time for newly created test cases is different from the time for regression tests. The difference is due to the fact that the regression test cases were run before, and their test steps and pass–fail criteria were validated earlier.
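The execution-rate guidelines of Tables 12.11 and 12.12 can likewise be read as a lookup table; a sketch follows, in which the category names are ours.

```python
# Sketch: execution-effort estimate per Tables 12.11 and 12.12.
# Rates are test cases per person-day, stored as (low, high).

EXECUTION_RATE = {
    ("new", "small"): (7, 15),
    ("new", "large"): (1.5, 2.5),
    ("regression, first time", "small"): (10, 15),
    ("regression, first time", "large"): (1, 2.5),
    ("regression, repeat", "small"): (10, 20),
    ("regression, repeat", "large"): (2.5, 5),
}

def execution_effort_days(n_cases, kind, size):
    """Return (best_case, worst_case) person-days."""
    low_rate, high_rate = EXECUTION_RATE[(kind, size)]
    return n_cases / high_rate, n_cases / low_rate

# e.g., re-running 150 small regression tests one has run before:
execution_effort_days(150, "regression, repeat", "small")  # -> (7.5, 15.0)
```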
However, if a test engineer has not executed a particular group of regression tests before, he or she may have to spend more time in understanding the tests and in setting up the test bed in order to execute the assigned regression tests. As a consequence, for an inexperienced engineer the execution time for a regression test is almost the same as that for a newly created test case. Therefore, test automation must be considered for regression tests, which is discussed in a later section of this chapter. Our estimation of test case execution effort for each group of FR–ATM service interworking is given in the fourth column of Table 12.6, based on the guidelines provided in Tables 12.11 and 12.12. It may be noted that if automated regression tests had been available, the total execution time could have been reduced.

TABLE 12.11 Guidelines for Manual Test Case Execution Effort

Size of Test Case                                    Average Number of Test Cases per Person-Day
Small, simple test case (1–10 atomic test steps)     7–15
Large, complex test case (10–50 atomic test steps)   1.5–2.5

TABLE 12.12 Guidelines for Estimation of Effort to Manually Execute Regression Test Cases

                                                     Average Number of Regression Test Cases per Person-Day
Size of Test Case                                    Did Not Execute Test Cases Earlier   Test Cases Executed Earlier
Small, simple test case (1–10 atomic test steps)     10–15                                10–20
Large, complex test case (10–50 atomic test steps)   1–2.5                                2.5–5

12.9 SCHEDULING AND TEST MILESTONES

For reasons of economy and to meet contractual deadlines, it is important to outline a schedule and the milestones for the test project. Because of today's high competitiveness in the market, organizations are constrained to release products within the time frame recommended by their marketing and sales departments. The system test group is constrained to distribute its testing effort to accommodate the marketing requirement to release the product within a prescribed time frame.
One must consider different avenues to complete the system testing phase on time without delaying product delivery and without compromising the quality of the product. In scheduling the activities in system testing, the leader of the system test group attempts to coordinate the available resources to achieve the projected productivity. The leader considers any interdependency among tasks and schedules tasks in parallel whenever possible. The milestones, reviews, and test deliverables are specified in the test schedule to accurately reflect the progress of the test project. Scheduling system testing is a portion of the overall scheduling of a software development project. After a schedule for system testing is produced, it is merged with the overall software project plan. It is essential to understand and consider the following steps in order to schedule a test project effectively:

• Develop a detailed list of tasks, such as:
    Procurement of equipment to set up the test environments
    Setting up the test environments
    Creation of detailed test cases for each group and subgroup in the test factory
    Creation of a test suite in the test factory
    Execution of test cases during the system test cycles
    Writing the final test report after completion of system testing

• List all the major milestones to be achieved during the test project, such as:
    Review of the test plan with cross-functional team members
    Completion of the approved version of the test plan
    Review of the newly created test cases and the chosen test cases for regression testing
    Completion of test cases; all test cases are in their released states so that they can be part of a test suite, as discussed in Chapter 11
    Creation of the test suite to be executed, as discussed in Chapter 11
    Preparation of test environments
    Date of the entry criteria readiness review meeting
    Official release of the software image to the system test team
    Start of the first system test cycle
    End of the first system test cycle
    Start of the second system test cycle
    End of the second system test cycle
    Start of the final system test cycle
    End of the final system test cycle
    Final test report delivery date

• Identify the interdependencies among the test tasks and any software milestones that may influence the flow of work, such as the official release of the software image to the system test group. In addition, identify the tasks that can be performed concurrently. Some examples of concurrent test tasks are as follows:
    Test beds can be set up concurrently with the creation of test cases.
    Test cases in different groups can be created concurrently.
    Test cases from different groups can be executed concurrently on different test beds.

• Identify the different kinds of resources and expertise needed to fulfill each task on the list.

• Estimate the quantity of resources needed for each task, such as equipment and human resources, as discussed earlier in this chapter.

• Identify the types and quantities of resources available for the testing project, such as:
    Human resources: persons available and their dates of availability, part- or full-time availability of each individual, and each individual's area of expertise
    Hardware/software resources: availability of all the hardware and software needed to build the test environment, as discussed in Section 12.6

• Assume that the rest of the resources will be available by certain dates.

• Allocate the resources to each task. This allocation is done task by task in sequence, starting backward from the end date of the test project.

• Schedule the start and end dates of each task. For example, the test execution task can be scheduled based on the estimates discussed in Section 12.8.3. Allow for delays due to unforeseen events, such as illness, vacation, training, and meetings.

• At this point, insert the rest of the milestones into the overall schedule.
You may have to adjust the resource allocation and the schedule dates.

• Determine the earliest and the latest possible start and end dates for each task by taking into account any interdependency between tasks identified earlier.

• Review the schedule for the reasonableness of the assumptions. At this stage, it is a good idea to ask "what if" questions. Analyze and change the schedule through iterations as the assumptions change.

• Identify the conditions that must be satisfied for the schedule to hold. In addition, prepare a contingency plan for the possibility that those conditions are not met. For example, if it is not possible to hire a full-time test engineer by a certain date, one may engage a contractor or move in test engineers from a different project.

• Document the assumptions, such as (i) a new test engineer must be hired by a certain date, (ii) new equipment must be available by a certain date, and (iii) space must be available to set up the test environment.

• Review the schedule with the test team and get their feedback. The schedule may not be met without their active participation.

The test effort estimate and schedule need to go through a few rounds of iteration in order to make them reliable. A good scheduling tool is valuable for juggling the test schedule. A major problem the test team leader may encounter from management is pressure to shorten the test case execution time. Some tasks might be eliminated entirely by reducing the scope and depth of testing. If that is not possible, one may bring in skilled test professionals with high proficiency, ask people to work overtime, or add more people and test beds. Another way to reduce the execution time is to automate all the test cases up front so that execution time is reduced drastically.
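The earliest-start computation in the steps above is the classic critical-path calculation; a minimal sketch follows, with task names and durations that are illustrative only (loosely based on the example in the next section).

```python
# Sketch: earliest start/finish (in working days) for a set of
# interdependent test tasks, the calculation behind the scheduling
# steps above. Task names and durations are illustrative.

def earliest_schedule(durations, deps):
    """durations: task -> days; deps: task -> prerequisite tasks.
    Returns task -> (earliest_start, earliest_finish)."""
    schedule = {}

    def finish(task):
        if task not in schedule:
            # A task can start only after all its prerequisites finish.
            start = max((finish(p) for p in deps.get(task, [])), default=0)
            schedule[task] = (start, start + durations[task])
        return schedule[task][1]

    for task in durations:
        finish(task)
    return schedule

durations = {"test plan": 5, "set up test beds": 10,
             "create test cases": 18, "first test cycle": 15}
deps = {"set up test beds": ["test plan"],
        "create test cases": ["test plan"],
        "first test cycle": ["set up test beds", "create test cases"]}
earliest_schedule(durations, deps)
# The first test cycle cannot start before day 23 (5 + 18).
```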
Gantt Chart

A Gantt chart is often used to represent a project schedule, including the duration of individual tasks, their dependencies, and their ordering. A typical Gantt chart graphically displays the start and end points of each task and hence the total duration needed to complete each task. As the project progresses, it displays the percentage of completion of each task. It allows the planner to assess the duration of a project, identify the resources needed, and lay out the order in which tasks need to be carried out. It is useful in managing the dependencies between tasks. It is widely used for project planning and scheduling in order to:

• Assess the time characteristics of a project
• Show the task order
• Show the links between scheduled tasks
• Define the resources involved
• Monitor project completion

In a Gantt chart, each task takes one row. Dates run along the top in increments of days, weeks, or months, depending on the length of the project. The expected time for each task is represented by a horizontal rectangle or line. Tasks may overlap or run in sequence or in parallel. Often a project has important events that one would want to show on the project timeline, for example, the completion of a prototype or the date of a test plan review. One can enter these on a Gantt chart as milestone events and mark them with a special symbol, often a diamond. The tasks should be kept to a manageable number (no more than 15 or 20) during the construction of a Gantt chart so that the chart fits on a single page. More complex projects may require subordinate charts which detail the timing of all the subtasks of the main tasks. For team projects, it often helps to have an additional column containing numbers or initials identifying the team member responsible for each task.

Remark. The idea of a Gantt chart was originally proposed in 1917 by the American engineer Henry L. Gantt.
He developed the first Gantt chart for use in production flow planning. Accepted as a commonplace project management tool today, it was an innovation of worldwide importance in the 1920s. Gantt charts have been successfully used in large construction projects, such as the Hoover Dam project started in 1931 and the interstate highway network started in 1956.

Example: FR–ATM PVC Service Interworking. Let us consider the FR–ATM PVC service interworking example discussed in Chapter 11. A high-level test schedule for this test project is shown in the Gantt chart of Figure 12.2. We made the following assumptions in planning the test schedule:

• Four test engineers, Alex, Rohan, Inu, and Lucy, are available for this project from day 1, and Alex is the test team leader for this project.
• All four engineers are well trained to generate test cases in the test factory.
• All of them are knowledgeable in the area of the FR and ATM protocols.
• It took five days for Alex to develop the test plan for this project.
ID  Task Name                               Duration  Start   Finish
1   Development of the test plan            5 days    Jan 3   Jan 7
2   Review the test plan                    0 days    Jan 7   Jan 7
3   Update the test plan                    1 day     Jan 10  Jan 10
4   Test plan approved                      0 days    Jan 10  Jan 10
5   Procurement of equipment                7 days    Jan 11  Jan 19
6   Set up the test environment             10 days   Jan 20  Feb 2
7   Creation of test cases in test factory  18 days   Jan 11  Feb 3
8   Review the test cases                   0 days    Feb 3   Feb 3
9   Update the test cases in test factory   4 days    Feb 4   Feb 9
10  Test cases are in released state        0 days    Feb 9   Feb 9
11  Creation of test suite in test factory  1 day     Feb 10  Feb 10
12  Entrance criteria readiness meeting     1 day     Feb 11  Feb 11
13  Official release of S/W to system test  0 days    Feb 14  Feb 14
14  Start of first test cycle               0 days    Feb 14  Feb 14
15  First test cycle                        15 days   Feb 14  Mar 4
16  End of first test cycle                 0 days    Mar 4   Mar 4
17  Start of second test cycle              0 days    Mar 4   Mar 4
18  Second test cycle                       15 days   Mar 7   Mar 25
19  End of second test cycle                0 days    Mar 25  Mar 25
20  Beta release criteria review meeting    0 days    Mar 25  Mar 25
21  Release the software to beta customer   1 day     Mar 28  Mar 28
22  Start of third test cycle               0 days    Mar 28  Mar 28
23  Third test cycle                        5 days    Mar 29  Apr 4
24  End of third test cycle                 0 days    Apr 4   Apr 4
25  Test report preparation                 4 days    Apr 5   Apr 8
26  Release criteria review meeting         0 days    Apr 8   Apr 8
27  Release the software to the customer    1 day     Apr 13  Apr 13

(The timeline bars of the chart, spanning December through April, also indicate the engineers, Alex, Rohan, Inu, and Lucy, assigned to each task.)

Figure 12.2 Gantt chart for FR–ATM service interworking test project.
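A schedule like Figure 12.2 can be rendered as a rough text Gantt chart with a few lines of code; the sketch below uses a simplified subset of the chart's tasks, with time bucketed into weeks.

```python
# Sketch: rendering a task list as a text Gantt chart, one row per
# task and one '#' per week. Tasks are a simplified subset of Fig. 12.2.

def text_gantt(tasks):
    """tasks: list of (name, start_week, duration_weeks) tuples."""
    rows = []
    for name, start, duration in tasks:
        # Pad the task name, offset the bar, draw one '#' per week.
        rows.append(f"{name:<30}" + " " * start + "#" * duration)
    return "\n".join(rows)

schedule = [
    ("Development of the test plan", 0, 1),
    ("Creation of test cases",       1, 4),
    ("Set up the test environment",  2, 2),
    ("First test cycle",             6, 3),
]
print(text_gantt(schedule))
```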
Lucy is assigned to set up the test beds, whereas the rest of the engineers are assigned to design test cases. We have estimated approximately 35 person-days to create the 168 test cases, excluding the regression test cases, as given in Table 12.6. Therefore, we have allocated 18 days to design the test cases, distributed among three test engineers, Alex, Inu, and Rohan, with a combined 54 person-days. This includes training and understanding of the FR–ATM service interworking protocol. The first test cycle starts after the entry criteria are verified at the readiness review meeting. According to our estimation, it will take 60 person-days to execute all the selected 318 test cases, including the regression tests. Therefore, we have allocated 15 days to execute all the test cases with four test engineers, Alex, Inu, Rohan, and Lucy, a combined resource of 60 person-days. After the second test cycle, the software is scheduled to be released to beta customers. A subset of the test cases is selected for regression testing in the final test cycle; therefore, only one week is allocated for this task. The software is released to the customers after the final test report is created and reviewed.

12.10 SYSTEM TEST AUTOMATION

It is absolutely necessary for any testing organization to move forward to become more efficient, in particular in the direction of test automation. The reasons for automating test cases are given in Table 12.13. It is important to think about automation as a strategic business activity. A strategic activity requires senior management support; otherwise it will most likely fail due to lack of funding. It should be aligned with the business mission and goals and a desire to speed up delivery of the system to the market without compromising quality. However, automation is a long-term investment; it is an on-going process.
It cannot be achieved overnight; expectations need to be managed to ensure that they are realistically achievable within a certain time period.

TABLE 12.13 Benefits of Automated Testing

1. Test engineer productivity
2. Coverage of regression testing
3. Reusability of test cases
4. Consistency in testing
5. Test interval reduction
6. Reduced software maintenance cost
7. Increased test effectiveness

The organization must assess and address a number of considerations before test automation can proceed. The following prerequisites need to be considered in assessing whether or not the organization is ready for test automation:

• The system is stable and its functionalities are well defined.
• The test cases to be automated are unambiguous.
• The test tools and infrastructure are in place.
• The test automation professionals have prior successful experience with automation.
• An adequate budget has been allocated for the procurement of software tools.

The system must be stable enough for automation to be meaningful. If the system is constantly changing or frequently crashing, the cost of maintaining the automated test suite, that is, of keeping the test cases up to date with the system, will be rather high. Test automation will not succeed unless detailed test procedures are in place. It is very difficult to automate a test case that is not defined well enough to be executed manually. If tests are executed in an ad hoc manner, without developed test objectives, detailed test procedures, and pass–fail criteria, then they are not ready for automation. If the test cases are designed as discussed in Chapter 11, then automation is likely to be more successful. The test engineers should have significant programming experience. It is not possible to automate tests without using programming languages, such as Tcl (Tool Command Language), C, Perl, Python, Java, and Expect. It takes months to learn a programming language.
The development of an automation process will fail if the testers do not have the necessary programming skills or are reluctant to develop them. Adding temporary contractors to the test team in order to automate test cases may not work. The contractors may assist in developing test libraries but will not be able to maintain an automated test suite on an on-going basis. An adequate budget should be available to purchase and maintain new software and hardware tools to be used in test automation. The organization should set aside funds to train the staff on new software and hardware tools. Skilled professionals with good automation backgrounds may need to be added to the test team in order to carry out the test automation project; therefore, additional head count should be budgeted by the senior executives of the organization.

12.11 EVALUATION AND SELECTION OF TEST AUTOMATION TOOLS

A test automation tool is a software application that assists in the automation of test cases that would otherwise be run manually. Some tools are commercially available in the market, but for testing complex, embedded, real-time systems, very few commercial test tools exist. Therefore, most organizations build their own test automation frameworks using programming languages such as C and Tcl. It is essential to combine both hardware and software in real-time testing tools. This is due to the fact that special kinds of interface cards are required to be connected to the SUT. The computing power of personal computers with network interface cards may not be good enough to send traffic to the SUT. Test professionals generally build their own test tools in high-technology fields, such as telecommunication equipment and applications based on IP. Commercial third-party test tools are usually not available during the system testing phase.
For example, there were no commercially available test tools during the testing of the 1xEV-DO system described in Chapter 8. The second author of this book developed in-house software tools to simulate access terminals using their own products. However, we advocate that testers should build their own test automation tools only if they have no alternative. Building and maintaining one's own test automation tool from scratch is a time-consuming task and an expensive undertaking. Test tool evaluation criteria are formulated for the selection of the right kind of software tool. There may be no tool that fulfills all the criteria; therefore, we should be a bit flexible during the evaluation of off-the-shelf automation tools available in the market. The broad criteria for evaluating test automation tools have been classified into eight categories, as shown in Figure 12.3: test development, test maintenance, test execution, test results, test management, GUI testing capability, vendor qualification, and pricing.

Figure 12.3 Broad criteria of test automation tool evaluation.

1. Test Development Criteria: An automation test tool should provide a high-level, preferably nonproprietary, easy-to-use test scripting language such as Tcl. It should have the ability to interface with and drive modules that can be easily written in, for example, C, Tcl, Perl, or Visual Basic. The tool must provide facilities to directly access, read, modify, and control the internals of the automated test scripts. The input test data should be stored separately from the test script but easily cross-referenced to the corresponding test scripts, if necessary. The tool should have built-in templates of test scripts, test cases, tutorials, and demo application examples that show how to develop automated test cases. Finally, no changes should be made to the SUT in order to use the tool. The vendor's
recommended environment should match the real test laboratory execution environment.

2. Test Maintenance Criteria: The tool should possess a rich set of features, such as version control capability for test cases and test data and migration of test cases across different platforms. The tool must provide powerful, easy-to-use facilities to browse, navigate, modify, and reuse the test suites. The tool should have the ability to select a subset of test cases to form a group for a particular test run based on one or more distinguishing characteristics. A tool needs to have features to allow modification and replication of test cases, easy addition of new test cases, and import of test cases from another test suite. The tool should have the capability to add multiple tags to a test case and modify those tags so that the test case can be easily selected into a subgroup of test cases sharing a common characteristic.

3. Test Execution Criteria: An automation tool should allow test cases to be executed individually, as a group, or in a predefined sequence. The user should have the ability to check the interim results during the execution of a group of tests and exercise other options for the remainder of the tests based on the interim results. The user should have the option to pause and resume the execution of a test suite. The tool should have the facility to execute the test suite over the Internet. The tool should allow simultaneous execution of several test suites that can be distributed across multiple machines for parallel execution; this substantially reduces the time needed for testing if multiple test machines are available. The test tool should have a capability for monitoring, measuring, and diagnosing performance characteristics of the SUT. Finally, the tool should have the capability to be integrated with other software tools which are either in use or expected to be used.

4.
Test Results Criteria: The test tool must provide a flexible, comprehensive logging process during execution of the test suite, which may include detailed records of each test case, test results, time and date, and pertinent diagnostic data. A tool should have the capability to cross-reference the test results back to the right versions of the test cases. The test result log should be archivable in an industry-standard data format, and the tool should provide an effective way to access and browse the archived test results. The tool should provide query capability to extract test results, analyze the test status and trend, and produce graphical reports of the test results. Finally, the tool should have the capability to collect and analyze response time and throughput as an aid to performance testing.

5. Test Management Criteria: A tool should have the ability to provide a test structure, or hierarchy, that allows test cases to be stored and retrieved in the manner in which the test organization wants to organize them. The tool should have the capability to allocate tests or groups of tests to specific test engineers and compare the work status with the plan through a graphic display. A tool needs to have authorization features; for example, a test script developer may be authorized to create and update the test scripts, while the test executer can only access them in the run mode. The tool should have the capability to send out e-mails with the test results after completion of test suite execution.

6. GUI Testing Capability Criteria: An automated GUI test tool should include a record/playback feature which allows the test engineers to create, modify, and run automated tests across many environments. These tools should have the capability to recognize and deal with all types of GUI objects, such as list boxes, radio buttons, icons, joysticks, hot keys, and bit-map images with changes in color shades and presentation fonts.
The keystrokes entered by the test engineer during the recording activity can be represented as scripts in a high-level programming language and saved for future replay. The tools must allow test engineers to modify test scripts to create reusable test procedures to be played back on a new software image for comparison. The performance of a GUI test tool needs to be evaluated. One may consider the question: How fast can the tool record and play back a complex test scenario or a group of test scenarios?

7. Vendor Qualification Criteria: Many questions need to be asked about the vendor's financial stability, the age of the vendor company, and its capability to support the tool. The vendor must be willing to fix problems that arise with the tool. A future roadmap must exist for the product. Finally, the maturity and market share of the product must be evaluated.

8. Pricing Criteria: Pricing is an important aspect of the product evaluation criteria. One can ask a number of questions: Is the price competitive? Is it within the estimated price range for an initial tool purchase? For a large number of licenses, a pricing discount can be negotiated with the vendor. Finally, the license must explicitly cap the maintenance cost of the test tool from year to year.

Tool vendors may guarantee the functionality of the test tool; however, experience shows that test automation tools often do not work as expected within a particular test environment. Therefore, it is recommended to evaluate the test tool by using it before making the decision to purchase it. The test team leader needs to contact the tool vendor to request a demonstration. After a demonstration of the tool, if the test team believes that the tool holds potential, then the test team leader may ask for a temporary license of the tool for evaluation. At this point enough resources are allocated to evaluate the test tool.
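As an illustration, the outcome of such a hands-on evaluation can be summarized in a simple weighted-scoring sheet over the criteria discussed above. The sketch below is only one possible way a team might tabulate its findings, not a method prescribed here; the criterion weights and the per-tool ratings are hypothetical values assumed for the example.

```python
# Hypothetical weighted-scoring sheet for comparing candidate test tools.
# Criterion names follow the evaluation criteria discussed above; the
# weights (relative importance) and ratings are illustrative assumptions.

CRITERIA_WEIGHTS = {
    "test maintenance": 3,
    "test execution": 3,
    "test results": 2,
    "test management": 2,
    "gui testing": 2,
    "vendor qualification": 1,
    "pricing": 1,
}

def weighted_score(ratings):
    """Combine per-criterion ratings (0-5 scale) into one weighted score."""
    return sum(weight * ratings[criterion]
               for criterion, weight in CRITERIA_WEIGHTS.items())

# Ratings the evaluation team might assign to two hypothetical tools.
tool_a = dict(zip(CRITERIA_WEIGHTS, [4, 5, 4, 3, 4, 5, 2]))
tool_b = dict(zip(CRITERIA_WEIGHTS, [4, 4, 5, 4, 3, 4, 3]))

scores = {"Tool A": weighted_score(tool_a), "Tool B": weighted_score(tool_b)}
best = max(scores, key=scores.get)  # the recommendation for the report
```

Such a table, together with the technical findings behind each rating, can go directly into the evaluation report presented to executive management.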
The evaluator should have a clear understanding of the tool requirements and should make a test evaluation plan based on the criteria outlined previously. The goal here is to ensure that the test tool performs as advertised by the vendor and that the tool is the best product for the requirement. Following the hands-on evaluation process, an evaluation report is prepared. The report documents the hands-on experience with the tool. This report should contain background information, a tool summary, technical findings, and a conclusion. This document is designed to address the management concerns, because eventually it has to be approved by executive management.

12.12 TEST SELECTION GUIDELINES FOR AUTOMATION

Test cases should be automated only if there is a clear economic benefit over manual execution. Some test cases are easy to automate, while others are more cumbersome. The general guidelines shown in Figure 12.4 may be used to evaluate the suitability of test cases for automation:

Less Volatile: A test case is stable and is unlikely to change over time. The test case should have been executed manually before. It is expected that the test steps and the pass–fail criteria are not likely to change any more.

Repeatability: Test cases that are going to be executed several times should be automated. However, one-time test cases should not be considered for automation. Poorly designed test cases which tend to be difficult to reuse are not economical to automate.

High Risk: High-risk test cases are those that are routinely rerun after every new software build. The objectives of these test cases are so important that one cannot afford not to reexecute them. In some cases the propensity of the test cases to break is very high. These test cases are likely to be fruitful in the long run and are the right candidates for automation.
Easy to Automate: Test cases that are easy to automate using automation tools should be automated. Some features of the system are easier to test than other features, based on the characteristics of a particular tool. Custom objects with graphic and sound features are likely to be more expensive to automate.

Manually Difficult: Test cases that are very hard to execute manually should be automated. Manual test execution can be a big problem; for example, having to look at too many screens for too long in a GUI test causes eye strain, and it is strenuous to observe transient results in real-time applications. These nasty, unpleasant test cases are good candidates for automation.

Boring and Time Consuming: Test cases that are repetitive in nature and need to be executed for long periods of time should be automated. The tester's time should be utilized in the development of more creative and effective test cases.

Figure 12.4 Test selection guideline for automation.

12.13 CHARACTERISTICS OF AUTOMATED TEST CASES

The largest component of test case automation is programming. Unless test cases are designed and coded properly, their execution and maintenance may not be effective. The design characteristics of effective test cases were discussed in Chapter 11. A formal model of a standard test case schema was also provided in Chapter 11. In this section, we include some key points which are pertinent to the coding of test cases. The characteristics of good automated test cases are given in Figure 12.5 and explained in the following.

1. Simple: The test case should have a single objective. Multiobjective test cases are difficult to understand and design. There should not be more than 10–15 test steps per test case, excluding the setup and cleanup steps.
Multipurpose test cases are likely to break or give misleading results. If the execution of a complex test leads to a system failure, it is difficult to isolate the cause of the failure.

2. Modular: Each test case should have a setup and cleanup phase before and after the execution of the test steps, respectively. The setup phase ensures that the initial conditions are met before the start of the test steps. Similarly, the cleanup phase puts the system back in the initial state, that is, the state prior to setup. Each test step should be small and precise. One input stimulus should be provided to the system at a time and the response verified (if applicable) with an interim verdict. The test steps are building blocks from reusable libraries that are put together to form multistep test cases.

3. Robust and Reliable: A test case verdict (pass–fail) should be assigned in such a way that it is unambiguous and understandable. Robust test cases can ignore trivial failures, such as a one-pixel mismatch in a graphical display. Care should be taken so that false test results are minimized. The test cases must have built-in mechanisms to detect and recover from errors. For example, a test case need not wait indefinitely if the SUT has crashed. Rather, it can wait for a while and terminate an indefinite wait by using a timer mechanism.

Figure 12.5 Characteristics of automated test cases.

4. Reusable: The test steps are built to be configurable, that is, variables should not be hard coded. They can take values from a single configurable file. Attention should be given while coding test steps to ensure that a single global variable is used instead of multiple, decentralized, hard-coded variables. Test steps are made as independent of test environments as possible.
The automated test cases are categorized into different groups so that subsets of test steps and test cases can be extracted to be reused for other platforms and/or configurations. Finally, in GUI automation, hard-coded screen locations must be avoided.

5. Maintainable: Any changes to the SUT will have an impact on the automated test cases and may require changes to the affected test cases. Therefore, an assessment of the test cases that need to be modified should be conducted before the project to change the system is approved. The test suite should be organized and categorized in such a way that the affected test cases are easily identified. If a particular test case is data driven, it is recommended that the input test data be stored separately from the test case and accessed by the test procedure as needed. The test cases must comply with coding standard formats. Finally, all the test cases should be controlled with a version control system.

6. Documented: The test cases and the test steps must be well documented. Each test case gets a unique identifier, and the test purpose is clear and understandable. The creator's name, the date of creation, and the last time it was modified must be documented. There should be traceability to the features and requirements being checked by the test case. The situation under which the test case cannot be used is clearly described. The enviro