Model-Based Testing for Embedded Systems
Edited by Justyna Zander, Ina Schieferdecker, and Pieter J. Mosterman
Boca Raton, London, New York

Computational Analysis, Synthesis, and Design of Dynamic Systems Series
Series Editor: Pieter J. Mosterman, MathWorks, Natick, Massachusetts; McGill University, Montréal, Québec

Published titles:
Discrete-Event Modeling and Simulation: A Practitioner's Approach, Gabriel A. Wainer
Discrete-Event Modeling and Simulation: Theory and Applications, edited by Gabriel A. Wainer and Pieter J. Mosterman
Model-Based Design for Embedded Systems, edited by Gabriela Nicolescu and Pieter J. Mosterman
Model-Based Testing for Embedded Systems, edited by Justyna Zander, Ina Schieferdecker, and Pieter J. Mosterman
Multi-Agent Systems: Simulation and Applications, edited by Adelinde M. Uhrmacher and Danny Weyns

Forthcoming titles:
Computation for Humanity: Information Technology to Advance Society, edited by Justyna Zander and Pieter J. Mosterman
Real-Time Simulation Technologies: Principles, Methodologies, and Applications, edited by Katalin Popovici and Pieter J. Mosterman

CRC Press is an imprint of the Taylor & Francis Group, an informa business.

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742. © 2012 by Taylor & Francis Group, LLC. No claim to original U.S. Government works. Version Date: 20110804. International Standard Book Number-13: 978-1-4398-1847-3 (eBook - PDF).

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface
Editors
MATLAB Statement
Contributors
Technical Review Committee
Book Introduction

Part I Introduction
1 A Taxonomy of Model-Based Testing for Embedded Systems from Multiple Industry Domains (Justyna Zander, Ina Schieferdecker, and Pieter J. Mosterman)
2 Behavioral System Models versus Models of Testing Strategies in Functional Test Generation (Antti Huima)
3 Test Framework Architectures for Model-Based Embedded System Testing (Stephen P. Masticola and Michael Gall)

Part II Automatic Test Generation
4 Automatic Model-Based Test Generation from UML State Machines (Stephan Weißleder and Holger Schlingloff)
5 Automated Statistical Testing for Embedded Systems (Jesse H. Poore, Lan Lin, Robert Eschbach, and Thomas Bauer)
6 How to Design Extended Finite State Machine Test Models in Java (Mark Utting)
7 Automatic Testing of LUSTRE/SCADE Programs (Virginia Papailiopoulou, Besnik Seljimi, and Ioannis Parissis)
8 Test Generation Using Symbolic Animation of Models (Frédéric Dadeau, Fabien Peureux, Bruno Legeard, Régis Tissot, Jacques Julliand, Pierre-Alain Masson, and Fabrice Bouquet)

Part III Integration and Multilevel Testing
9 Model-Based Integration Testing with Communication Sequence Graphs (Fevzi Belli, Axel Hollmann, and Sascha Padberg)
10 A Model-Based View onto Testing: Criteria for the Derivation of Entry Tests for Integration Testing (Manfred Broy and Alexander Pretschner)
11 Multilevel Testing for Embedded Systems (Abel Marrero Pérez and Stefan Kaiser)
12 Model-Based X-in-the-Loop Testing (Jürgen Großmann, Philip Makedonski, Hans-Werner Wiesbrock, Jaroslav Svacina, Ina Schieferdecker, and Jens Grabowski)

Part IV Specific Approaches
13 A Survey of Model-Based Software Product Lines Testing (Sebastian Oster, Andreas Wübbeke, Gregor Engels, and Andy Schürr)
14 Model-Based Testing of Hybrid Systems (Thao Dang)
15 Reactive Testing of Nondeterministic Systems by Test Purpose-Directed Tester (Jüri Vain, Andres Kull, Marko Kääramees, Maili Markvardt, and Kullo Raiend)
16 Model-Based Passive Testing of Safety-Critical Components (Stefan Gruner and Bruce Watson)

Part V Testing in Industry
17 Applying Model-Based Testing in the Telecommunication Domain (Fredrik Abbors, Veli-Matti Aho, Jani Koivulainen, Risto Teittinen, and Dragos Truscan)
18 Model-Based GUI Testing of Smartphone Applications: Case S60™ and Linux® (Antti Jääskeläinen, Tommi Takala, and Mika Katara)
19 Model-Based Testing in Embedded Automotive Systems (Pawel Skruch, Miroslaw Panek, and Bogdan Kowalczyk)

Part VI Testing at the Lower Levels of Development
20 Testing-Based Translation Validation of Generated Code (Mirko Conrad)
21 Model-Based Testing of Analog Embedded Systems Components (Lee Barford)
22 Dynamic Verification of SystemC Transactional Models (Laurence Pierre and Luca Ferro)

Index

Preface

The ever-growing pervasion of software-intensive systems into technical, business, and social areas not only consistently increases the number of requirements on system functionality and features but also puts forward ever-stricter demands on system quality and reliability.
In order to successfully develop such software systems and to remain competitive on top of that, early and continuous consideration and assurance of system quality and reliability are becoming vitally important. To achieve effective quality assurance, model-based testing has become an essential ingredient that covers a broad spectrum of concepts, including, for example, automatic test generation, test execution, test evaluation, test control, and test management. Model-based testing results in tests that can already be utilized in the early design stages and that contribute to high test coverage, thus providing great value by reducing cost and risk. These observations are a testimony to both the effectiveness and the efficiency of testing that can be derived from model-based approaches, with opportunities for better integration of system and test development. Model-based test activities comprise different methods that are best applied complementing one another in order to scale with respect to the size and conceptual complexity of industry systems.

This book presents model-based testing from a number of different perspectives that combine various aspects of embedded systems, embedded software, their models, and their quality assurance. As system integration has become critical to dealing with the complexity of modern systems (and, indeed, systems of systems), with software as the universal integration glue, model-based testing has come to present a persuasive value proposition in system development. This holds, in particular, in the case of heterogeneity such as components and subsystems that are partially developed in software and partially in hardware, or that are developed by different vendors with off-the-shelf components.

This book provides a collection of internationally renowned work on current technological achievements that assure the high-quality development of embedded systems. Each chapter contributes to the currently most advanced methods of model-based testing, not in the least because the respective authors excel in their expertise in system verification and validation. Their contributions deliver supreme improvements to current practice, both in a qualitative as well as in a quantitative sense, by automation of the various test activities, exploitation of combined model-based testing aspects, integration into the model-based design process, and focus on overall usability. We are thrilled and honored by the participation of this select group of experts. They made it a pleasure to compile and edit all of the material, and we sincerely hope that the reader will find the endeavor of intellectual excellence as enjoyable, gratifying, and valuable as we have.

In closing, we would like to express our genuine appreciation and gratitude for all the time and effort that each author has put into his or her chapter. We gladly recognize that the high quality of this book is solely thanks to their common effort, collaboration, and communication. In addition, we would like to acknowledge the volunteer services of those who joined the technical review committee and to extend our genuine appreciation for their involvement. Clearly, none of this would have been possible had it not been for the continuous support of Nora Konopka and her wonderful team at Taylor & Francis. Many thanks to all of you! Finally, we would like to gratefully acknowledge support by the Alfried Krupp von Bohlen und Halbach Stiftung.

Justyna Zander
Ina Schieferdecker
Pieter J. Mosterman
Editors

Justyna Zander is a postdoctoral research scientist at Harvard University (Harvard Humanitarian Initiative) in Cambridge, Massachusetts (since 2009) and project manager at the Fraunhofer Institute for Open Communication Systems in Berlin, Germany (since 2004). She holds a PhD (2008) and an MSc (2005), both in the fields of computer science and electrical engineering from Technical University Berlin in Germany, a BSc (2004) in computer science, and a BSc in environmental protection and management from Gdansk University of Technology in Poland (2003). She graduated from Singularity University, Mountain View, California, as one of 40 participants selected from 1200 applications in 2009. For her scientific efforts, Dr. Zander received grants and scholarships from institutions such as the Polish Prime Ministry (1999-2000), the Polish Ministry of Education and Sport (2001-2004), which is awarded to 0.04% of students in Poland, the German Academic Exchange Service (2002), the European Union (2003-2004), the Hertie Foundation (2004-2005), IFIP TC6 (2005), IEEE (2006), Siemens (2007), Metodos y Tecnologia (2008), Singularity University (2009), and Fraunhofer Gesellschaft (2009-2010). Her doctoral thesis on model-based testing was supported by the German National Academic Foundation with a grant awarded to 0.31% of students in Germany (2005-2008).

Ina Schieferdecker studied mathematical computer science at Humboldt-University Berlin and earned her PhD in 1994 at Technical University Berlin on performance-extended specifications and analysis of quality-of-service characteristics. Since 1997, she has headed the Competence Center for Testing, Interoperability and Performance (TIP) at the Fraunhofer Institute on Open Communication Systems (FOKUS), Berlin, and now heads the Competence Center Modelling and Testing for System and Service Solutions (MOTION). She has been a professor on engineering and testing of telecommunication systems at Technical University Berlin since 2003. Professor Schieferdecker has worked since 1994 in the area of design, analysis, testing, and evaluation of communication systems using specification-based techniques such as the Unified Modeling Language, message sequence charts, and the Testing and Test Control Notation (TTCN-3). Professor Schieferdecker has written many scientific publications in the area of system development and testing. She is involved as an editorial board member with the International Journal on Software Tools for Technology Transfer. She is a cofounder of Testing Technologies IST GmbH, Berlin, and a member of the German Testing Board. In 2004, she received the Alfried Krupp von Bohlen und Halbach Award for Young Professors, and she became a member of the German Academy of Technical Sciences in 2009. Her work on this book was partially supported by the Alfried Krupp von Bohlen und Halbach Stiftung.

Pieter J. Mosterman is a senior research scientist at MathWorks in Natick, Massachusetts, where he works on core Simulink® simulation and code generation technologies, and he is an adjunct professor at the School of Computer Science of McGill University. Previously, he was a research associate at the German Aerospace Center (DLR) in Oberpfaffenhofen. He has a PhD in electrical and computer engineering from Vanderbilt University in Nashville, Tennessee, and an MSc in electrical engineering from the University of Twente, the Netherlands.
His primary research interests are in Computer Automated Multiparadigm Modeling (CAMPaM) with principal applications in design automation, training systems, and fault detection, isolation, and reconfiguration. He designed the Electronics Laboratory Simulator, nominated for the Computerworld Smithsonian Award by Microsoft Corporation in 1994. In 2003, he was awarded the IMechE Donald Julius Groen Prize for a paper on HyBrSim, a hybrid bond graph modeling and simulation environment. Professor Mosterman received the Society for Modeling and Simulation International (SCS) Distinguished Service Award in 2009 for his services as editor-in-chief of SIMULATION: Transactions of SCS. He is or has been an associate editor of the International Journal of Critical Computer Based Systems, the Journal of Defense Modeling and Simulation, the International Journal of Control and Automation, Applied Intelligence, and IEEE Transactions on Control Systems Technology.

MATLAB Statement

MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please contact: The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA. Tel: 508-647-7000. Fax: 508-647-7001. E-mail: info@mathworks.com. Web: www.mathworks.com

Contributors

Fredrik Abbors, Department of Information Technologies, Åbo Akademi University, Turku, Finland
Veli-Matti Aho, Process Excellence, Nokia Siemens Networks, Tampere, Finland
Lee Barford, Measurement Research Laboratory, Agilent Technologies, Reno, Nevada; and Department of Computer Science and Engineering, University of Nevada, Reno, Nevada
Thomas Bauer, Fraunhofer Institute for Experimental Software Engineering (IESE), Kaiserslautern, Germany
Fevzi Belli, Department of Electrical Engineering and Information Technology, University of Paderborn, Paderborn, Germany
Fabrice Bouquet, Computer Science Department, University of Franche-Comté/INRIA CASSIS Project, Besançon, France
Manfred Broy, Institute for Computer Science, Technische Universität München, Garching, Germany
Mirko Conrad, The MathWorks, Inc., Natick, Massachusetts
Frédéric Dadeau, Computer Science Department, University of Franche-Comté/INRIA CASSIS Project, Besançon, France
Thao Dang, VERIMAG, CNRS (French National Center for Scientific Research), Gieres, France
Gregor Engels, Software Quality Lab (s-lab), University of Paderborn, Paderborn, Germany
Robert Eschbach, Fraunhofer Institute for Experimental Software Engineering (IESE), Kaiserslautern, Germany
Luca Ferro, TIMA Laboratory, University of Grenoble, CNRS, Grenoble, France
Michael Gall, Siemens Industry, Inc., Building Technologies Division, Florham Park, New Jersey
Jens Grabowski, Institute for Computer Science, University of Goettingen, Goldschmidtstraße 7, Goettingen, Germany
Jürgen Großmann, Fraunhofer Institute FOKUS, Kaiserin-Augusta-Allee 31, Berlin, Germany
Stefan Gruner, Department of Computer Science, University of Pretoria, Pretoria, Republic of South Africa
Axel Hollmann, Department of Applied Data Technology, Institute of Electrical and Computer Engineering, University of Paderborn, Paderborn, Germany
Antti Huima, President and CEO, Conformiq-Automated Test Design, Saratoga, California
Antti Jääskeläinen, Department of Software Systems, Tampere University of Technology, Tampere, Finland
Jacques Julliand, Computer Science Department, University of Franche-Comté, Besançon, France
Marko Kääramees, Department of Computer Science, Tallinn University of Technology, Tallinn, Estonia
Stefan Kaiser, Fraunhofer Institute FOKUS, Berlin, Germany
Mika Katara, Department of Software Systems, Tampere University of Technology, Tampere, Finland
Jani Koivulainen, Conformiq Customer Success, Conformiq, Espoo, Finland
Bogdan Kowalczyk, Delphi Technical Center Kraków, ul. Podgórki Tynieckie 2, Kraków, Poland
Andres Kull, ELVIOR, Tallinn, Estonia
Bruno Legeard, Research and Development, Smartesting/University of Franche-Comté, Besançon, France
Lan Lin, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee
Philip Makedonski, Institute for Computer Science, University of Goettingen, Goldschmidtstraße 7, Goettingen, Germany
Maili Markvardt, Department of Computer Science, Tallinn University of Technology, Tallinn, Estonia
Abel Marrero Pérez, Daimler Center for Automotive IT Innovations, Berlin Institute of Technology, Berlin, Germany
Pierre-Alain Masson, Computer Science Department, University of Franche-Comté, Besançon, France
Stephen P. Masticola, System Test Department, Siemens Fire Safety, Florham Park, New Jersey
Pieter J. Mosterman, MathWorks, Inc., Natick, Massachusetts; and School of Computer Science, McGill University, Montreal, Quebec, Canada
Jesse H. Poore, Ericsson-Harlan D. Mills Chair in Software Engineering, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee
Sebastian Oster, Real-Time Systems Lab, Technische Universität Darmstadt, Darmstadt, Germany
Sascha Padberg, Department of Applied Data Technology, Institute of Electrical and Computer Engineering, University of Paderborn, Paderborn, Germany
Miroslaw Panek, Delphi Technical Center Kraków, ul. Podgórki Tynieckie 2, Kraków, Poland
Virginia Papailiopoulou, INRIA, Rocquencourt, France
Ioannis Parissis, Grenoble INP, Laboratoire de Conception et d'Intégration des Systèmes, University of Grenoble, Valence, France
Fabien Peureux, Computer Science Department, University of Franche-Comté, Besançon, France
Laurence Pierre, TIMA Laboratory, University of Grenoble, CNRS, Grenoble, France
Alexander Pretschner, Karlsruhe Institute of Technology, Karlsruhe, Germany
Kullo Raiend, ELVIOR, Tallinn, Estonia
Ina Schieferdecker, Fraunhofer Institute FOKUS, Kaiserin-Augusta-Allee 31, Berlin, Germany
Holger Schlingloff, Fraunhofer Institute FIRST, Kekulestraße, Berlin, Germany
Andy Schürr, Real-Time Systems Lab, Technische Universität Darmstadt, Darmstadt, Germany
Besnik Seljimi, Faculty of Contemporary Sciences and Technologies, South East European University, Tetovo, Macedonia
Pawel Skruch, Delphi Technical Center Kraków, ul. Podgórki Tynieckie 2, Kraków, Poland
Jaroslav Svacina, Fraunhofer Institute FIRST, Kekulestraße, Berlin, Germany
Tommi Takala, Department of Software Systems, Tampere University of Technology, Tampere, Finland
Risto Teittinen, Process Excellence, Nokia Siemens Networks, Espoo, Finland
Bruce Watson, Department of Computer Science, University of Pretoria, Pretoria, Republic of South Africa
Régis Tissot, Computer Science Department, University of Franche-Comté, Besançon, France
Stephan Weißleder, Fraunhofer Institute FIRST, Kekulestraße 7, Berlin, Germany
Dragos Truscan, Department of Information Technologies, Åbo Akademi University, Turku, Finland
Hans-Werner Wiesbrock, IT Power Consultants, Kolonnenstraße 26, Berlin, Germany
Mark Utting, Department of Computer Science, University of Waikato, Hamilton, New Zealand
Andreas Wübbeke, Software Quality Lab (s-lab), University of Paderborn, Paderborn, Germany
Jüri Vain, Department of Computer Science/Institute of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
Justyna Zander, Harvard University, Cambridge, Massachusetts; and Fraunhofer Institute FOKUS, Kaiserin-Augusta-Allee 31, Berlin, Germany

Technical Review Committee

Lee Barford, Fevzi Belli, Fabrice Bouquet, Mirko Conrad, Frédéric Dadeau, Thao Dang, Thomas Deiss, Vladimir Entin, Alain-Georges Vouffo Feudjio, Gordon Fraser, Ambar Gadkari, Michael Gall, Jeremy Gardiner, Juergen Grossmann, Stefan Gruner, Axel Hollmann, Mika Katara, Bogdan Kowalczyk, Yves Ledru, Pascale LeGall, Jenny Li, Levi Lucio, José Carlos Maldonado, Eda Marchetti, Steve Masticola, Swarup Mohalik, Pieter J. Mosterman, Sebastian Oster, Jan Peleska, Abel Marrero Pérez, Jesse H. Poore, Stacy Prowell, Holger Rendel, Axel Rennoch, Markus Roggenbach, Bernhard Rumpe, Ina Schieferdecker, Holger Schlingloff, Diana Serbanescu, Pawel Skruch, Paul Strooper, Mark Utting, Stefan van Baelen, Carsten Wegener, Stephan Weißleder, Martin Wirsing, Karsten Wolf, Justyna Zander

Book Introduction

Justyna Zander, Ina Schieferdecker, and Pieter J. Mosterman

The purpose of this handbook is to provide a broad overview of the current state of model-based testing (MBT) for embedded systems, including the potential breakthroughs, the challenges, and the achievements observed from numerous perspectives. To attain this objective, the book offers a compilation of 22 high-quality contributions from world-renowned industrial and academic authors. The chapters are grouped into six parts.

• The first part comprises the contributions that focus on key test concepts for embedded systems. In particular, a taxonomy of MBT approaches is presented, an assessment of the merit and value of system models and test models is provided, and a selected test framework architecture is proposed.

• In the second part, different test automation algorithms are discussed for various types of embedded system representations.

• The third part contains contributions on the topic of integration and multilevel testing. Criteria for the derivation of integration entry tests are discussed, an approach for reusing test cases across different development levels is provided, and an X-in-the-Loop testing method and notation are proposed.

• The fourth part is composed of contributions that tackle selected challenges of MBT, such as testing software product lines, conformance validation for hybrid systems and nondeterministic systems, and understanding safety-critical components in the passive test context.
• The fifth part highlights testing in industry, including application areas such as telecommunication networks, smartphones, and automotive systems.

• Finally, the sixth part presents solutions for lower-level tests and comprises an approach to validation of automatically generated code, contributions on testing analog components, and verification of SystemC models.

To scope the material in this handbook, an embedded system is considered to be a system that is designed to perform a dedicated function, typically with hard real-time constraints, limited resources and dimensions, and low-cost and low-power requirements. It is a combination of computer software and hardware, possibly including additional mechanical, optical, and other parts that are used in the specific role of actuators and sensors (Ganssle and Barr 2003). Embedded software is the software that is part of an embedded system.

Embedded systems have become increasingly sophisticated and their software content has grown rapidly in the past decade. Applications now consist of hundreds of thousands or even millions of lines of code. Moreover, the requirements that must be fulfilled while developing embedded software are complex in comparison to standard software. In addition, embedded systems are often produced in large volumes, and the software is difficult to update once the product is deployed. Embedded systems interact with the physical environment, which often requires models that embody both continuous-time and discrete-event behavior. In terms of software development, the increased product complexity that derives from all of those characteristics combines with shortened development cycles and higher customer expectations of quality to underscore the utmost importance of software testing (Schäuffele and Zurawka 2006).

MBT relates to a process of test generation from various kinds of models by application of a number of sophisticated methods. MBT is usually the automation of black-box testing (Utting and Legeard 2006). Several authors (Utting, Pretschner, and Legeard 2006; Kamga, Herrmann, and Joshi 2007) define MBT as testing in which test cases are derived in their entirety or in part from a model that describes some aspects of the system under test (SUT) based on selected criteria. In addition, authors highlight the need for having dedicated test models to make the most out of MBT (Baker et al. 2007; Schulz, Honkola, and Huima 2007). MBT clearly inherits the complexity of the related domain models. It allows tests to be linked directly to the SUT requirements and makes tests easier to read, understand, and maintain. It helps to ensure a repeatable and scientific basis for testing and has the potential for known coverage of the behaviors of the SUT (Utting 2005). Finally, it is a way to reduce the effort and cost for testing (Pretschner et al. 2005).

This book provides an extensive survey and overview of the benefits of MBT in the field of embedded systems. The selected contributions present successful test approaches where different algorithms, methodologies, tools, and techniques result in important cost reduction while assuring the proper quality of embedded systems.

Organization

This book is organized into the following six parts: (I) Introduction, (II) Automatic Test Generation, (III) Integration and Multilevel Testing, (IV) Specific Approaches, (V) Testing in Industry, and (VI) Testing at the Lower Levels of Development.
An overview of each of the parts, along with a brief introduction of the contents of the individual chapters, is presented next. The following figure depicts the organization of the book.

[Figure: Organization of the book. Model-based development artifacts (embedded system specification, model, code) are related to model-based testing artifacts (test specification, test model, executable test case), with the six parts (I. Introduction, II. Automatic test generation, III. Integration and multilevel testing, IV. Specific approaches, V. Testing in industry, VI. Testing at the lower levels of development) mapped onto this flow.]

Part I. Introduction

The chapter “A Taxonomy of Model-Based Testing for Embedded Systems from Multiple Industry Domains” provides a comprehensive overview of MBT techniques using different dimensions and categorization methods. Various kinds of test generation, test evaluation, and test execution methods are described, using examples that are presented throughout this book and in the related literature.

In the chapter “Behavioral System Models versus Models of Testing Strategies in Functional Test Generation,” the distinction between diverse types of models is discussed extensively. In particular, models that describe intended system behavior and models that describe testing strategies are considered from both practical as well as theoretical viewpoints. It shows the difficulty of converting the system model into a test model by applying the mental and explicit system model perspectives. Then, the notion of a polynomial-time limit on test case generation is included in the reasoning about the creation of tests based on finite-state machines.

The chapter “Test Framework Architectures for Model-Based Embedded System Testing” provides reference architectures for building a test framework. The test framework is understood as a platform that runs the test scripts and performs other functions such as, for example, logging test results. It is usually a combination of commercial and purpose-built software. Its design and character are determined by the test execution process, common quality goals that control test harnesses, and testability antipatterns in the SUT that must be accounted for.

Part II. Automatic Test Generation

The chapter “Automatic Model-Based Test Generation from UML State Machines” presents several approaches for the generation of test suites from UML state machines based on different coverage criteria. The process of abstract path creation and concrete input value generation is extensively discussed using graph traversal algorithms and boundary value analysis. Then, these techniques are related to random testing, evolutionary testing, constraint solving, model checking, and static analysis.

The chapter “Automated Statistical Testing for Embedded Systems” applies statistics to solving problems posed by industrial software development. A method of modeling the population of uses is established to reason according to first principles of statistics. The Model Language and the Java Usage Model Builder Library are employed for the analysis. Model validation and revision through estimates of long-run use statistics are introduced based on a medical device example, while paying attention to test management and process certification.

In the chapter “How to Design Extended Finite State Machine Test Models in Java,” extended finite-state machine (EFSM) test models that are represented in the Java programming language are applied to an SUT. ModelJUnit is used for generating the test cases by stochastic algorithms. Then, a methodology for building an MBT tool using Java reflection is proposed. Code coverage metrics are exploited to assess the results of the method, and an example referring to the GSM 11.11 protocol for mobile phones is presented.
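To make the EFSM idea more concrete, the following minimal sketch shows how a test model can be expressed as a plain Java class: the control state and extended state variables are fields, each action has a guard, and a stochastic generator performs a random walk over the currently enabled actions. This is an illustrative assumption-based example only; it does not use the ModelJUnit API, and the PIN-verification behavior, the three-attempt limit, and all names are invented for illustration.

// Illustrative sketch only: a hand-written EFSM test model in plain Java, loosely in the
// spirit of the chapter summarized above. It does not use the ModelJUnit API; the
// PIN-verification behavior, the three-attempt limit, and all names are assumptions.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PinEfsmModel {
    enum State { LOCKED, VERIFIED, BLOCKED }

    private State state = State.LOCKED; // control state of the EFSM
    private int wrongTries = 0;         // extended state variable

    void reset() { state = State.LOCKED; wrongTries = 0; }

    // Each action has a guard (xxxEnabled) and a transition effect.
    boolean verifyCorrectPinEnabled() { return state == State.LOCKED; }
    void verifyCorrectPin() { state = State.VERIFIED; wrongTries = 0; }

    boolean verifyWrongPinEnabled() { return state == State.LOCKED; }
    void verifyWrongPin() { if (++wrongTries >= 3) state = State.BLOCKED; }

    // Minimal stochastic test generation: a random walk over the enabled actions.
    public static void main(String[] args) {
        PinEfsmModel model = new PinEfsmModel();
        Random random = new Random(42);
        List<String> abstractTestCase = new ArrayList<>();
        model.reset();
        for (int step = 0; step < 10; step++) {
            List<String> enabled = new ArrayList<>();
            if (model.verifyCorrectPinEnabled()) enabled.add("verifyCorrectPin");
            if (model.verifyWrongPinEnabled())   enabled.add("verifyWrongPin");
            if (enabled.isEmpty()) break;        // no action enabled in this state
            String action = enabled.get(random.nextInt(enabled.size()));
            if (action.equals("verifyCorrectPin")) model.verifyCorrectPin();
            else model.verifyWrongPin();
            abstractTestCase.add(action + " -> expected state " + model.state);
        }
        abstractTestCase.forEach(System.out::println); // the generated abstract test case
    }
}

In a tool-supported setting, the recorded action sequence and expected states would be mapped by an adapter onto concrete calls to the SUT; here the sequence is simply printed as an abstract test case.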
The chapter “Automatic Testing of LUSTRE/SCADE Programs” addresses the automation of functional test generation using a Lustre-like language in the Lutess V2 tool and refers to the assessment of the created test coverage. The testing methodology includes the definitions of the domain, environment dynamics, scenarios, and an analysis based on safety properties. A program control flow graph for SCADE models allows a family of coverage criteria to assess the effectiveness of the test methods and serves as an additional basis for the test generation algorithm. The proposed approaches are illustrated by a steam-boiler case study.

In the chapter “Test Generation Using Symbolic Animation of Models,” symbolic execution (i.e., animation) of B models based on set-theoretical constraint solvers is applied to generate the test cases. One of the proposed methods focuses on the creation of tests that reach specific test targets to satisfy structural coverage, whereas the other is based on manually designed behavioral scenarios and aims at satisfying dynamic test selection criteria. A smartcard case study illustrates the complementarity of the two techniques.

Part III. Integration and Multilevel Testing

The chapter “Model-Based Integration Testing with Communication Sequence Graphs” introduces a notation for representing the communication between discrete-behavior software components on a meta-level. The models are directed graphs enriched with semantics for integration-level analysis that do not emphasize internal states of the components, but rather focus on events. In this context, test case generation algorithms for unit and integration testing are provided. Test coverage criteria, including mutation analysis, are defined, and a robot-control application serves as an illustration.

In the chapter “A Model-Based View onto Testing: Criteria for the Derivation of Entry Tests for Integration Testing,” components and their integration architecture are modeled early on in development to help structure the integration process. Fundamentals for testing complex systems are formalized. This formalization allows exploiting architecture models to establish criteria that help minimize the entry-level testing of components necessary for successful integration. The tests are derived from a simulation of the subsystems and reflect behaviors that usually are verified at integration time. Providing criteria to enable shifting effort from integration testing to component entry tests illustrates the value of the method.

In the chapter “Multilevel Testing for Embedded Systems,” the means for a smooth integration of multiple test-level artifacts based on a continuous reuse of test models and test cases are provided. The proposed methodology comprises the creation of an invariant test model core and a test-level-specific test adapter model that represents a varying component. Numerous strategies to obtain the adapter model are introduced. The entire approach results in an increased optimization of the design effort across selected functional abstraction levels and allows for the easier traceability of the test constituents. A case study from the automotive domain (i.e., automated light control) illustrates the feasibility of the solution.
The chapter “Model-Based X-in-the-Loop Testing” provides a methodology for technology-independent specification and systematic reuse of testing artifacts for closed-loop testing across different development stages. Simulink®-based environmental models are coupled with a generic test specification designed in the notation called TTCN-3 embedded. It includes dedicated means for specifying the stimulation of an SUT and assessing its reaction. The notions of time and sampling, streams, stream ports, and stream variables are paid specific attention, as well as the definition of statements to model a control flow structure akin to hybrid automata. In addition, an overall test architecture for the approach is presented. Several examples from the automotive domain illustrate the vertical and horizontal reuse of test artifacts. The test quality is discussed as well.

Part IV. Specific Approaches

The chapter “A Survey of Model-Based Software Product Lines Testing” presents an overview of the testing that is necessary in software product line engineering methods. Such methods aim at improving reusability of software within a range of products sharing a common set of features. First, the requirements and a conception of MBT for software product lines are introduced. Then, the state of the art is provided and the solutions are compared to each other based on selected criteria. Finally, open research objectives are outlined and recommendations for the software industry are provided.

The chapter “Model-Based Testing of Hybrid Systems” describes a formal framework for conformance testing of hybrid automaton models and their adequate test generation algorithms. Methods from computer science and control theory are applied to reason about the quality of a system. An easily computable coverage measure is introduced that refers to testing properties such as safety and reachability based on the equal-distribution degree of a set of states over their state space. The distribution degree can be used to guide the test generation process, while the test creation is based on the rapidly exploring random tree algorithm (Lavalle 1998), which represents a probabilistic motion planning technique in robotics. The results are then explored in the domain of analog and mixed-signal circuits.

The chapter “Reactive Testing of Nondeterministic Systems by Test Purpose-Directed Tester” provides a model-based construction of an online tester for black-box testing. The notation of nondeterministic EFSM is applied to formalize the test model. The synthesis algorithm allows for selecting a suboptimal test path at run time by finding the shortest path to cover the test purpose. The rules enabling an implementation of online reactive planning are included. Coverage criteria are discussed as well, and the approach is compared with related algorithms. A feeder-box controller of a city lighting system illustrates the feasibility of the solution.

The chapter “Model-Based Passive Testing of Safety-Critical Components” provides a set of passive-testing techniques in a manner that is driven by multiple examples. First, general principles of the approach to passive quality assurance are discussed. Then, complex software systems, network security, and hardware systems are considered as the targeted domains. Next, a step-by-step illustrative example for applying the proposed analysis to a concurrent system designed in the form of a cellular automaton is introduced.
As passive testing usually takes place after the deployment of a unit, the ability of a component to monitor and self-test in operation is discussed. The benefits and limitations of the presented approaches are described as well.

Part V. Testing in Industry

The chapter “Applying Model-Based Testing in the Telecommunication Domain” refers to testing practices at Nokia Siemens Networks at the industrial level and explains the state of MBT in the trenches. The presented methodology uses a behavioral system model designed in UML and SysML for generating the test cases. The applied process, model development, validation, and transformation aspects are extensively described. Technologies such as the MATERA framework (Abbors, Bäcklund, and Truscan 2010), UML-to-QML transformation, and OCL guideline checking are discussed. Also, test generation, test execution aspects (e.g., load testing, concurrency, and run-time executability), and the traceability of all artifacts are discussed. The case study illustrates testing the functionality of a Mobile Services Switching Center Server, a network element, using offline testing.

The chapter “Model-Based GUI Testing of Smartphone Applications: Case S60™ and Linux®” discusses the application of MBT along two case studies. The first one considers built-in applications in a smartphone model S60, and the second tackles the problem of a media player application in a variant of mobile Linux. Experiences in modeling and adapter development are provided, and potential problems (e.g., the expedient pace of product creation) encountered in the industrial deployment of the technology for graphical user interface (GUI) testing of smartphone applications are reported. In this context, the TEMA toolset (Jääskeläinen 2009), designed for test modeling, test generation, keyword execution, and test debugging, is presented. The benefits and business aspects of the process adaptation are also briefly considered.

The chapter “Model-Based Testing in Embedded Automotive Systems” provides a broad overview of MBT techniques applied in the automotive domain based on experiences from Delphi Technical Center, Kraków (Poland). Key automotive domain concepts specific to MBT are presented, as well as everyday engineering issues related to MBT process deployment in the context of system-level functional testing. Examples illustrate the applicability of the techniques for industrial-scale mainstream production projects. In addition, the limitations of the approaches are outlined.

Part VI. Testing at the Lower Levels of Development

The chapter “Testing-Based Translation Validation of Generated Code” provides an approach for model-to-code translation that is followed by a validation phase to verify the target code produced during this translation. Systematic model-level testing is supplemented by testing for numerical equivalence between models and generated code. The methodology follows the objectives and requirements of safety standards such as IEC 61508 and ISO 26262 and is illustrated using a Simulink-based code generation tool chain.

The chapter “Model-Based Testing of Analog Embedded Systems Components” addresses the problem of determining whether an analog system meets its specification as given either by a model of correct behavior (i.e., the system model) or of incorrect behavior (i.e., a fault model). The analog model-based test follows a two-phase process.
First, a pretesting phase including system selection, fault model selection, excitation design, and simulation of fault models is presented. Next, an actual testing phase comprising measurement, system identification, behavioral simulation, and reasoning about the faults is extensively described. Examples are provided, while benefits, limitations, and open questions in applying analog MBT are included.

The chapter “Dynamic Verification of SystemC Transactional Models” presents a solution for verifying logic and temporal properties of communication in transaction-level modeling designs from simulation. To this end, a brief overview of SystemC is provided. Issues related to globally asynchronous/locally synchronous, multiclocked systems, and auxiliary variables are considered in the approach.

Target Audience

The objective of this book is to be accessible to engineers, analysts, and computer scientists involved in the analysis and development of embedded systems, software, and their quality assurance. It is intended for both industry-related professionals and academic experts, in particular those interested in verification, validation, and testing. The most important objectives of this book are to help the reader understand how to use model-based testing and test harnesses to a maximum extent. The various perspectives serve to:

- Get an overview of MBT and its constituents;
- Understand the MBT concepts, methods, approaches, and tools;
- Know how to choose modeling approaches fitting the customers' needs;
- Be able to select appropriate test generation strategies;
- Learn about successful applications of MBT;
- Get to know best practices of MBT; and
- See prospects of further developments in MBT.

References

Abbors, F., Bäcklund, A., and Truscan, D. (2010). MATERA—An integrated framework for model-based testing. In Proceedings of the 17th IEEE International Conference and Workshop on Engineering of Computer-Based Systems (ECBS 2010), pages 321-328. IEEE Computer Society Conference Publishing Services (CPS).

Baker, P., Ru Dai, Z., Grabowski, J., Haugen, O., Schieferdecker, I., and Williams, C. (2007). Model-Driven Testing, Using the UML Testing Profile. ISBN 978-3-5407-2562-6. Springer Verlag.

Ganssle, J. and Barr, M. (2003). Embedded Systems Dictionary. ISBN-10: 1578201209, ISBN-13: 978-1578201204, 256 pages.

Jääskeläinen, A., Katara, M., Kervinen, A., Maunumaa, M., Pääkkönen, T., Takala, T., and Virtanen, H. (2009). Automatic GUI test generation for smartphone applications—an evaluation. In Proceedings of the Software Engineering in Practice track of the 31st International Conference on Software Engineering (ICSE 2009), pages 112-122. IEEE Computer Society (companion volume).

Kamga, J., Herrmann, J., and Joshi, P. (2007). Deliverable: D-MINT automotive case study—Daimler. Deliverable 1.1, Deployment of Model-Based Technologies to Industrial Testing, ITEA2 Project.

Lavalle, S.M. (1998). Rapidly-Exploring Random Trees: A New Tool for Path Planning. Technical Report 98-11, Computer Science Department, Iowa State University. http://citeseer.ist.psu.edu/311812.html.

Pretschner, A., Prenninger, W., Wagner, S., Kühnel, C., Baumgartner, M., Sostawa, B., Zölch, R., and Stauner, T. (2005). One evaluation of model-based testing and its automation. In Proceedings of the 27th International Conference on Software Engineering, St. Louis, MO, pages 392-401. ISBN: 1-59593-963-2. ACM, New York.

Schäuffele, J. and Zurawka, T. (2006). Automotive Software Engineering. ISBN: 3528110406. Vieweg.
Schulz, S., Honkola, J., and Huima, A. (2007). Towards model-based testing with architecture models. In Proceedings of the 14th Annual IEEE International Conference and Workshops on the Engineering of Computer-Based Systems (ECBS '07), pages 495-502. IEEE Computer Society, Washington, DC. DOI: 10.1109/ECBS.2007.73, http://dx.doi.org/10.1109/ECBS.2007.73.

Utting, M. (2005). Model-based testing. In Proceedings of the Workshop on Verified Software: Theory, Tools, and Experiments (VSTTE 2005).

Utting, M. and Legeard, B. (2006). Practical Model-Based Testing: A Tools Approach. ISBN-13: 9780123725011. Elsevier Science & Technology Books.

Utting, M., Pretschner, A., and Legeard, B. (2006). A Taxonomy of Model-Based Testing. ISSN: 1170-487X.

Part I: Introduction

1 A Taxonomy of Model-Based Testing for Embedded Systems from Multiple Industry Domains

Justyna Zander, Ina Schieferdecker, and Pieter J. Mosterman

CONTENTS
1.1 Introduction
1.2 Definition of Model-Based Testing
1.2.1 Test dimensions
1.2.1.1 Test goal
1.2.1.2 Test scope
1.2.1.3 Test abstraction
1.3 Taxonomy of Model-Based Testing
1.3.1 Model
1.3.2 Test generation
1.3.2.1 Test selection criteria
1.3.2.2 Test generation technology
1.3.2.3 Result of the generation
1.3.3 Test execution
1.3.4 Test evaluation
1.3.4.1 Specification
1.3.4.2 Technology
1.4 Summary
References
1.1 Introduction

This chapter provides a taxonomy of Model-Based Testing (MBT) based on the approaches that are presented throughout this book as well as in the related literature. The techniques for testing are categorized using a number of dimensions to familiarize the reader with the terminology used throughout the chapters that follow. In this chapter, after a brief introduction, a general definition of MBT and related work on available MBT surveys are provided. Next, the various test dimensions are presented. Subsequently, an extensive taxonomy is proposed that classifies the MBT process according to the MBT foundation (referred to as the MBT basis), the definition of various test generation techniques, the consideration of test execution methods, and the specification of test evaluation. The taxonomy is an extension of previous work by Zander and Schieferdecker (2009), and it is based on contributions of Utting, Pretschner, and Legeard (2006). A summary concludes the chapter with the purpose of encouraging the reader to further study the contributions of the collected chapters in this book and the specific aspects of MBT that they address in detail.

1.2 Definition of Model-Based Testing

This section provides a brief survey of selected definitions of MBT available in the literature. Next, certain aspects of MBT are highlighted in the discussion on test dimensions, and their categorization is illustrated.

MBT relates to a process of test generation from models of/related to a system under test (SUT) by applying a number of sophisticated methods. The basic idea of MBT is that, instead of creating test cases manually, a selected algorithm generates them automatically from a model. MBT usually comprises the automation of black-box test design (Utting and Legeard 2006); however, recently it has been used to automate white-box tests as well. Several authors such as Utting (2005) and Kamga, Herrmann, and Joshi (2007) define MBT as testing in which test cases are derived in their entirety or in part from a model that describes some aspects of the SUT based on selected criteria. Utting, Pretschner, and Legeard (2006) elaborate that MBT inherits the complexity of the domain or, more specifically, of the related domain models. Dai (2006) refers to MBT as model-driven testing (MDT) because of the context of the model-driven architecture (MDA) (OMG 2003) in which MBT is proposed.

Advantages of MBT are that it allows tests to be linked directly to the SUT requirements, which renders readability, understandability, and maintainability of tests easier. It helps ensure a repeatable and scientific basis for testing. Furthermore, MBT has been shown to provide good coverage of all the behaviors of the SUT (Utting 2005) and to reduce the effort and cost for testing (Pretschner et al. 2005).
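To illustrate this basic idea, the following minimal sketch derives abstract test cases from a small, explicitly enumerated state machine so that every transition is covered: for each transition, a breadth-first search computes an input sequence that reaches its source state, and the transition's input is then appended. This is an assumption-based toy example rather than a method prescribed in this chapter; the Off/Idle/Running states and their inputs are hypothetical.

// Illustrative sketch only: deriving abstract test cases from a small explicit state
// machine so that every transition is covered (an "all-transitions" selection criterion).
// The states and inputs are hypothetical and not taken from the text.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FsmTestGenerator {
    record Transition(String from, String input, String to) {}

    public static void main(String[] args) {
        List<Transition> model = List.of(
                new Transition("Off", "powerOn", "Idle"),
                new Transition("Idle", "start", "Running"),
                new Transition("Running", "stop", "Idle"),
                new Transition("Idle", "powerOff", "Off"));
        String initialState = "Off";

        // One abstract test case per transition: reach its source state, then fire it.
        for (Transition target : model) {
            List<String> inputs = new ArrayList<>(shortestPath(model, initialState, target.from()));
            inputs.add(target.input());
            System.out.println("Test case " + inputs + " -> expected state " + target.to());
        }
    }

    // Breadth-first search for the shortest input sequence from 'start' to 'goal'.
    static List<String> shortestPath(List<Transition> model, String start, String goal) {
        Map<String, List<String>> reachedBy = new HashMap<>();
        reachedBy.put(start, List.of());
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            String state = queue.poll();
            if (state.equals(goal)) return reachedBy.get(state);
            for (Transition t : model) {
                if (t.from().equals(state) && !reachedBy.containsKey(t.to())) {
                    List<String> path = new ArrayList<>(reachedBy.get(state));
                    path.add(t.input());
                    reachedBy.put(t.to(), path);
                    queue.add(t.to());
                }
            }
        }
        throw new IllegalStateException("State not reachable: " + goal);
    }
}

The expected target state recorded with each input sequence plays the role of a simple test oracle; richer selection criteria and oracles are discussed later in this chapter.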
The term MBT is widely used today with subtle differences in its meaning. Surveys on different MBT approaches are provided by Broy et al. (2005), Utting, Pretschner, and Legeard (2006), the D-Mint Project (2008), and Schieferdecker et al. (2011). In the automotive industry, MBT describes all testing activities in the context of Model-Based Design (MBD), as discussed, for example, by Conrad, Fey, and Sadeghipour (2004) and Lehmann and Krämer (2008). Rau (2002), Lamberg et al. (2004), and Conrad (2004a, 2004b) define MBT as a test process that encompasses a combination of different test methods that utilize the executable model in MBD as a source of information. As a single testing technique is insufficient to achieve a desired level of test coverage, different test methods are usually combined to complement each other across all the specified test dimensions (e.g., functional and structural testing techniques are frequently applied together). If sufficient test coverage has been achieved on the model level, properly designed test cases can be reused for testing the software created based on or generated from the models within the framework of back-to-back tests, as proposed by Wiesbrock, Conrad, and Fey (2002). With this practice, the functional equivalence between the specification, executable model, and code can be verified and validated (Conrad, Fey, and Sadeghipour 2004).

The most generic definition of MBT is testing in which the test specification is derived in its entirety or in part from both the system requirements and a model that describes selected functional and nonfunctional aspects of the SUT. The test specification can take the form of a model, executable model, script, or computer program code. The resulting test specification is intended to ultimately be executed together with the SUT so as to provide the test results. The SUT again can exist in the form of a model, code, or even hardware. For example, in Conrad (2004b) and Conrad, Fey, and Sadeghipour (2004), no additional test models are created, but the already existing functional system models are utilized for test purposes. In the test approach proposed by Zander-Nowicka (2009), the system models are exploited as well. In addition, however, a test specification model (also called test case specification, test model, or test design in the literature; see Pretschner (2003b), Zander et al. (2005), and Dai (2006)) is created semi-automatically. Concrete test data variants are then automatically derived from this test specification model.

The application of MBT is as widespread as the interest in building embedded systems. For example, case studies borrowed from such widely varying domains as medicine, automotive, control engineering, telecommunication, entertainment, or aerospace can be found in this book. MBT then appears as part of specific techniques that are proposed for testing a medical device, the GSM 11.11 protocol for mobile phones, a smartphone graphical user interface (GUI), a steam boiler, a smartcard, a robot-control application, a kitchen toaster, automated light control, analog and mixed-signal electrical circuits, a feeder-box controller of a city lighting system, and other complex software systems.

1.2.1 Test dimensions

Tests can be classified depending on the characteristics of the SUT and the test system. In this book, such SUT features comprise, for example, safety-critical properties, deterministic and nondeterministic behavior, load and performance, analog characteristics, network-related qualities, and user-friendliness. Furthermore, systems that exhibit behavior of a discrete, continuous, or hybrid nature are analyzed in this book. The modeling paradigms for capturing a model of the SUT and the tests combine different approaches, such as history-based and functional data flow combined with transition-based semantics.
As it is next to impossible for one single classification scheme to successfully apply to such a wide range of attributes, selected dimensions have been introduced in previous work to isolate certain aspects. For example, Neukirchen (2004) aims at testing communication systems and categorizes testing in the dimensions of test goals, test scope, and test distribution. Dai (2006) replaces the test distribution by a dimension describing the different test development phases, since she is testing both local and distributed systems. Zander-Nowicka (2009) refers to test goals, test abstraction, test execution platforms, test reactiveness, and test scope in the context of embedded automotive systems. In the following, the specifics related to test goal, test scope, and test abstraction (see Figure 1.1) are introduced to provide a basis for a common vocabulary, simplicity, and a better understanding of the concepts discussed in the rest of this book.

[FIGURE 1.1: Selected test dimensions. Test goal (static; dynamic: structural, functional, nonfunctional), test scope (component, integration, system), and test abstraction (abstract, nonabstract).]

1.2.1.1 Test goal

During software development, systems are tested with different purposes (i.e., goals). These goals can be categorized as static testing, also called review, and dynamic testing, where the latter is based on test execution and further distinguishes between structural, functional, and nonfunctional testing. After the review phase, the test goal is usually to check the functional behavior of the system. Nonfunctional tests appear in later development stages.

• Static test: Testing is often defined as the process of finding errors, failures, and faults. Errors in a program can be revealed without execution by just examining its source code (International Software Testing Qualification Board 2006). Similarly, other development artifacts can be reviewed (e.g., requirements, models, or the test specification itself).

• Structural test: Structural tests cover the structure of the SUT during test execution (e.g., control or data flow), and so the internal structure of the system (e.g., code or model) must be known. As such, structural tests are also called white-box or glass-box tests (Myers 1979; International Software Testing Qualification Board 2006).

• Functional test: Functional testing is concerned with assessing the functional behavior of an SUT against the functional requirements. In contrast to structural tests, functional tests do not require any knowledge about system internals. They are therefore called black-box tests (Beizer 1995). A systematic, planned, executed, and documented procedure is desirable to make them successful. This category also includes functional safety tests that determine the safety of a software product.

• Nonfunctional test: Similar to functional tests, nonfunctional tests (also called extra-functional tests) are performed against a requirements specification of the system. In contrast to pure functional testing, nonfunctional testing aims at assessing nonfunctional requirements such as reliability, load, and performance. Nonfunctional tests are usually black-box tests. Nevertheless, internal access during test execution is required for retrieving certain information, such as the state of the internal clock.
For example, during a robustness test, the system is tested with invalid input data that are outside the permitted ranges to check whether the system is still safe and operates properly. 1.2.1.2 Test scope Test scopes describe the granularity of the SUT. Because of the composition of the system, tests at different scopes may reveal different failures (Weyuker 1988; International Software Testing Qualification Board 2006; and D-Mint Project 2008). This leads to the following order in which tests are usually performed: • Component: At the scope of component testing (also referred to as unit testing), the smallest testable component (e.g., a class in an object-oriented implementation or a single electronic control unit [ECU]) is tested in isolation. • Integration: The scope of integration test combines components with each other and tests those as a subsystem, that is, not yet a complete system. It exposes defects in the interfaces and in the interactions between integrated components or subsystems (International Software Testing Qualification Board 2006). • System: In a system test, the complete system, including all subsystems, is tested. Note that a complex embedded system is usually distributed with the single subsystems Taxonomy of MBT for Embedded Systems 7 connected, for example, via buses using different data types and interfaces through which the system can be accessed for testing (Hetzel 1988). 1.2.1.3 Test abstraction As far as the abstraction level of the test specification is considered, the higher the abstraction, the better test understandability, readability, and reusability are observed. However, the specified test cases must be executable at the same time. Also, the abstraction level should not affect the test execution in a negative way. An interesting and promising approach to address the effect of abstraction on execution behavior is provided by Mosterman et al. (2009, 2011) and Zander et al. (2011) in the context of complex system development. In their approach, the error introduced by a computational approximation of the execution is accepted as an inherent system artifact as early as the abstract development stages. The benefit of this approach is that it allows eliminating the accidental complexity of the code that makes the abstract design executable while enabling high-level analysis and synthesis methods. A critical enabling element is a high-level declarative specification of the execution logic so that its computational approximation becomes explicit. Because it is explicit and declarative, the approximation can then be consistently preserved throughout the design stages. This approach holds for test development as well. Whenever the abstract test suites are executed, they can be refined with the necessary concrete analysis and synthesis mechanisms. 1.3 Taxonomy of Model-Based Testing In Utting, Pretschner, and Legeard (2006), a broad taxonomy for MBT is presented. Here, three general classes are identified: model, test generation, and test execution. Each of the classes is divided into further categories. The model class consists of subject, independence, characteristics, and paradigm categories. The test generation class consists of test selection criteria and technology categories. The test execution class contains execution options. Zander-Nowicka (2009) completes the overall view with test evaluation as an additional class. Test evaluation refers to comparing the actual SUT outputs with the expected SUT behavior based on a test oracle. 
Such a test oracle enables a decision to be made as to whether the actual SUT outputs are correct. The test evaluation is divided into two categories: specification and technology. Furthermore, in this chapter, the test generation class is extended with an additional category called result of the generation. Also, the semantics of the class model is different in this taxonomy than in its previous incarnations. Here, a category called MBT basis indicates what specific element of the software engineering process is the basis for MBT process. An overview of the resulting MBT taxonomy is illustrated in Figure 1.2. All the categories in the presented taxonomy are decomposed into further elements that influence each other within or between categories. The “A/B/C” notation at the leaves indicates mutually exclusive options. In the following three subsections, the categories and options in each of the classes of the MBT taxonomy are explained in depth. The descriptions of the most important options are endowed with examples of their realization. 8 Model-Based Testing for Embedded Systems Classes: Categories: Options: Model MBT basis System model Test model Coupled system model and test model Properties Test selection criteria Test generation Technology Test execution Result of the generation Execution options Test evaluation Specification Technology + Mutation-analysis based Structural model coverage Data coverage Requirements coverage Test case specification Random and stochastic Fault-based Automatic/manual Random generation Graph search algorithm Model checking Symbolic execution Theorem proving Online/offline Executable test models Executable test scripts Executable code MiL / SiL / HiL / PiL (simulation) Reactive/nonreactive Generating test logs Reference signal-feature based Reference signal based Requirements coverage Test evaluation specification Automatic/manual Online/offline FIGURE 1.2 Overview of the taxonomy for Model-Based Testing. 1.3.1 Model The models applied in the MBT process can include both system-specific and test-specific development artifacts. Frequently, the software engineering practice for a selected project determines the basis for incorporating the testing into the process and thus, selecting the MBT type. In the following, selected theoretical viewpoints are introduced and join points between them are discussed. To specify the system and the test development, the methods that are presented in this book employ a broad spectrum of notations such as Finite State Machines (FSM) (e.g., Chapter 2), Unified Modeling Language (UML r ) (e.g., state machines, use cases), UML Testing Profile (UTP) (see OMG 2003, 2005), SysML (e.g., Chapter 4), The Model Language (e.g., Chapter 5), Extended FSM, Labeled State Transition System notation, Java (e.g., Chapter 6), Lustre, SCADE r (e.g., Chapter 7), B-Notation (e.g., Chapter 8), Communication Sequence Graphs (e.g., Chapter 9), Testing and Test Control Notation, version 3 (TTCN-3) (see ETSI 2007), TTCN-3 embedded (e.g., Chapter 12), Transaction Level Models, Property Specification Language, SystemC (e.g., Chapter 22), Simulink r (e.g., Chapter 12, Chapter 19, or Chapter 20), and so on. Taxonomy of MBT for Embedded Systems 9 Model-Based Testing basis In the following, selected options referred to as the MBT basis are listed and their meaning is described. • System model : A system model is an abstract representation of certain aspects of the SUT. 
A typical application of the system model in the MBT process leverages its behavioral description for derivation of tests. Although this concept has been extensively described in previous work (Conrad 2004a; Utting 2005), another instance of using a system model for testing is the approach called architecture-driven testing (ADT) introduced by Din and Engel (2009). It is a technique to derive tests from architecture viewpoints. An architecture viewpoint is a simplified representation of the system model with respect to the structure of the system from a specific perspective. The architecture viewpoints not only concentrate on a particular aspect but also allow for the combination of the aspects, relations, and various models of system components, thereby providing a unifying solution. The perspectives considered in ADT include a functional view, logical view, technical view, and topological view. They enable the identification of test procedures and failures on certain levels of detail that would not be recognized otherwise. • Test model: If the test cases are derived directly from an abstract test model and are decoupled from the system model, then such a test model is considered to constitute the MBT basis. In practice, such a method is rarely applied as it requires substantial effort to introduce a completely new test model. Instead, the coupled system and test model approach is used. • Coupled system and test model: UTP plays an essential role for the alignment of system development methods together with testing. It introduces abstraction as a test artifact and counts as a primary standard in this alignment. UTP is utilized as the test modeling language before test code is generated from a test model. Though, this presupposes that an adequate system model already exists and will be leveraged during the entire test process (Dai 2006). As a result, system models and test models are developed in concert in a coupled process. UTP addresses concepts, such as test suites, test cases, test configuration, test component, and test results, and enables the specification of different types of testing, such as functional, interoperability, scalability, and even load testing. Another instantiation of such a coupled technique is introduced in the Model-in-theLoop for Embedded System Test (MiLEST) approach (Zander-Nowicka 2009) where Simulink system models are coupled with additionally generated Simulink-based test models. MiLEST is a test specification framework that includes reusable test patterns, generic graphical validation functions, test data generators, test control algorithms, and an arbitration mechanism all collected in a dedicated library. The application of the same modeling language for both system and test design brings about positive effects as it ensures that the method is more transparent and it does not force the engineers to learn a completely new language. A more extensive illustration of the challenge to select a proper MBT basis is provided in Chapter 2 of this book. 1.3.2 Test generation The process of test generation starts from the system requirements, taking into account the test objectives. It is defined in a given test context and results in the creation of test cases. A number of approaches exist depending on the test selection criteria, generation technology, and the expected generation results. They are reviewed next. 
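Before reviewing these categories, a deliberately simplified sketch may help to fix intuitions. The following Python fragment derives abstract test cases from a toy system model given as a finite-state machine, using a graph search that satisfies an all-transitions structural coverage criterion; both notions reappear in the subsections below. The "media player" model, its state and stimulus names, and the chosen coverage goal are illustrative assumptions only and are not taken from any tool cited in this chapter.

# Minimal illustration: deriving abstract test cases from a toy system model.
# The model below (a small state machine for a hypothetical media player)
# is purely illustrative; industrial MBT tools operate on far richer models.
from collections import deque

# System model: state -> {stimulus: (next_state, expected_observation)}
SYSTEM_MODEL = {
    "Off":     {"power_on":  ("Idle",    "display=menu")},
    "Idle":    {"play":      ("Playing", "display=track"),
                "power_off": ("Off",     "display=blank")},
    "Playing": {"stop":      ("Idle",    "display=menu"),
                "power_off": ("Off",     "display=blank")},
}
INITIAL_STATE = "Off"

def generate_tests(model, initial):
    """Breadth-first search that returns one test case per transition
    (an all-transitions structural coverage criterion). Each test case
    is a sequence of (stimulus, expected_observation) pairs, i.e., both
    the test data and the expected SUT reaction come from the model."""
    # Shortest stimulus sequence reaching every state.
    paths = {initial: []}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for stimulus, (nxt, obs) in model[state].items():
            if nxt not in paths:
                paths[nxt] = paths[state] + [(stimulus, obs)]
                frontier.append(nxt)
    # One test case per transition: reaching prefix plus the transition itself.
    tests = []
    for state, transitions in model.items():
        for stimulus, (_, obs) in transitions.items():
            tests.append(paths[state] + [(stimulus, obs)])
    return tests

if __name__ == "__main__":
    for i, test in enumerate(generate_tests(SYSTEM_MODEL, INITIAL_STATE), 1):
        print(f"Test case {i}:")
        for stimulus, expected in test:
            print(f"  apply {stimulus!r:12} expect {expected!r}")

Note that each generated test case pairs stimuli with the observations predicted by the model, so the same model acts both as the source of test data and as the test oracle used later during test evaluation.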
10 Model-Based Testing for Embedded Systems 1.3.2.1 Test selection criteria Test selection criteria define the facilities that are used to control the generation of tests. They help specify the tests and do not depend on the SUT code. In the following, the most commonly used criteria are investigated. Clearly, different test methods should be combined to complement one another so as to achieve the best test coverage. Hence, there is no best suitable solution for generating the test specification. Subsequently, the test selection criteria are described in detail. • Mutation-analysis based: Mutation analysis consists of introducing a small syntactic change in the source of a model or program in order to produce a mutant (e.g., replacing one operator by another or altering the value of a constant). Then, the mutant behavior is compared to the original. If a difference can be observed, the mutant is marked as killed. Otherwise, it is called equivalent. The original aim of the mutation analysis is the evaluation of a test data applied in the test case. Thus, it can be applied as a foundational technique for test generation. One of the approaches to mutation analysis is described in Chapter 9 of this book. • Structural model coverage criteria: These exploit the structure of the model to select the test cases. They deal with coverage of the control-flow through the model, based on ideas from the flow of control in computer program code. Previous work (Pretschner 2003) has shown how test cases can be generated that satisfy the modified condition/decision coverage (MC/DC) coverage criterion. The idea is to first generate a set of test case specifications that enforce certain variable valuations and then generate test cases for them. Similarly, safety test builder (STB) (GeenSoft 2010a) or Reactis Tester (Reactive Systems 2010; Sims and DuVarney 2007) generate test sequences covering a set of Stateflow r test objectives (e.g., transitions, states, junctions, actions, MC/DC coverage) and a set of Simulink test objectives (e.g., Boolean flow, look-up tables, conditional subsystems coverage). • Data coverage criteria: The idea is to decompose the data range into equivalence classes and select one representative value from each class. This partitioning is usually complemented by a boundary value analysis (Kosmatov et al. 2004), where the critical limits of the data ranges or boundaries determined by constraints are selected in addition to the representative values. An example is the MATLAB r Automated Testing Tool (MATT 2008) that enables black-box testing of Simulink models and code generated from them by Real-Time Workshop r (Real-Time Workshop 2011). MATT furthermore enables the creation of custom test data for model simulations by setting the types of test data for each input. Additionally, accuracy, constant, minimum, and maximum values can be provided to generate the test data matrix. Another realization of this criterion is provided by Classification Tree Editor for Embedded Systems (CTE/ES) implementing the Classification Tree Method (Grochtmann and Grimm 1993; Conrad 2004a). The SUT inputs form the classifications in the roots of the tree. From here, the input ranges are divided into classes according to the equivalence partitioning method. The test cases are specified by selecting leaves of the tree in the combination table. A row in the table specifies a test case. CTE/ES provides a way of finding test cases systematically by decomposing the test scenario design process into steps. 
Visualization of the test scenario is supported by a GUI. Taxonomy of MBT for Embedded Systems 11 • Requirements coverage criteria: These criteria aim at covering all informal SUT requirements. Traceability of the SUT requirements to the system or test model/code aids in the realization of this criterion. It is targeted by almost every test approach (ZanderNowicka 2009). • Test case definition: When a test engineer defines a test case specification in some formal notation, the test objectives can be used to determine which tests will be generated by an explicit decision and which set of test objectives should be covered. The notation used to express these objectives may be the same as the notation used for the model (Utting, Pretschner, and Legeard 2006). Notations commonly used for test objectives include FSMs, UTP, regular expressions, temporal logic formulas, constraints, and Markov chains (for expressing intended usage patterns). A prominent example of applying this criterion is described by Dai (2006), where the test case specifications are derived from UML models and transformed into executable tests in TTCN-3 by using MDA methods (Zander et al. 2005). The work of Pretschner et al. (2004) is also based on applying this criterion (see symbolic execution). • Random and stochastic criteria: These are mostly applicable to environment models because it is the environment that determines the usage patterns of the SUT. A typical approach is to use a Markov chain to specify the expected SUT usage profile. Another example is to use a statistical usage model in addition to the behavioral model of the SUT (Carter, Lin, and Poore 2008). The statistical model acts as the selection criterion and chooses the paths, while the behavioral model is used to generate the oracle for those paths. As an example, Markov Test Logic (MaTeLo) (All4Tec 2010) can generate test suites according to several algorithms. Each of them optimizes the test effort according to objectives such as boundary values, functional coverage, and reliability level. Test cases are generated in XML/HTML format for manual execution or in TTCN-3 for automatic execution (Dulz and Fenhua 2003). Another instance, Java Usage Model Builder Library (JUMBL) (Software Quality Research Laboratory 2010) (cf. Chapter 5) can generate test cases as a collection of test cases that cover the model with minimum cost, by random sampling with replacement, by interleaving the events of other test cases, or in order by probability. An interactive test case editor supports creating test cases by hand. • Fault-based criteria: These rely on knowledge of typically occurring faults, often captured in the form of a fault model. 1.3.2.2 Test generation technology One of the most appealing characteristics of MBT is its potential for automation. The automated generation of test cases usually necessitates the existence of some form of test case specifications. In the proceeding paragraphs, different technologies applied to test generation are discussed. • Automatic/Manual technology: Automatic test generation refers to the situation where, based on given criteria, the test cases are generated automatically from an information source. Manual test generation refers to the situation where the test cases are produced by hand. 12 Model-Based Testing for Embedded Systems • Random generation: Random generation of tests is performed by sampling the input space of a system. 
It is straightforward to implement, but it takes an undefined period of time to reach a certain satisfying level of model coverage as Gutjahr (1999) reports. • Graph search algorithms: Dedicated graph search algorithms include node or arc coverage algorithms such as the Chinese Postman algorithm that covers each arc at least once. For transition-based models, which use explicit graphs containing nodes and arcs, there are many graph coverage criteria that can be used to control test generation. The commonly used are all nodes, all transitions, all transition pairs, and all cycles. The method is exemplified by Lee and Yannakakis (1994), which specifically addresses structural coverage of FSM models. • Model checking: Model checking is a technology for verifying or falsifying properties of a system. A property typically expresses an unwanted situation. The model checker verifies whether this situation is reachable or not. It can yield counterexamples when a property is falsified. If no counterexample is found, then the property is proven and the situation can never be reached. Such a mechanism is implemented in safety checker blockset (GeenSoft 2010b) or in EmbeddedValidator (BTC Embedded Systems AG 2010). The general idea of test case generation with model checkers is to first formulate test case specifications as reachability properties, for example, “eventually, a certain state is reached or a certain transition fires.” A model checker then yields traces that reach the given state or that eventually make the transition fire. Wieczorek et al. (2009) present an approach to use Model Checking for the generation of Integration Tests from Choreography Models. Other variants use mutations of models or properties to generate test suites. • Symbolic execution: The idea of symbolic execution is to run an executable model not with single input values but with sets of input values instead (Marre and Arnould 2000). These are represented as constraints. With this practice, symbolic traces are generated. By instantiation of these traces with concrete values, the test cases are derived. Symbolic execution is guided by test case specifications. These are given as explicit constraints and symbolic execution may be performed randomly by respecting these constraints. Pretschner (2003) presents an approach to test case generation with symbolic execution built on the foundations of constraint logic programming. Pretschner (2003a and 2003b) concludes that test case generation for both functional and structural test case specifications reduces to finding states in the state space of the SUT model. The aim of symbolic execution of a model is then to find a trace that represents a test case that leads to the specified state. • Theorem proving: Usually theorem provers are employed to check the satisfiability of formulas that directly occur in the models. One variant is similar to the use of model checkers where a theorem prover replaces the model checker. For example, one of the techniques applied in Simulink r Design VerifierTM (The MathWorks r , Inc.) uses mathematical procedures to search through the possible execution paths of the model so as to find test cases and counterexamples. • Online/offline generation technology: With online test generation, algorithms can react to the actual outputs of the SUT during the test execution. This idea is exploited for implementing reactive tests as well. Offline testing generates test cases before they are run. 
A set of test cases is generated once and can be executed many times. Also, the test generation and test execution can Taxonomy of MBT for Embedded Systems 13 be performed on different machines, at different levels of abstractions, and in different environments. If the test generation process is slower than test execution, then there are obvious advantages to minimizing the number of times tests are generated (preferably only once). 1.3.2.3 Result of the generation Test generation usually results in a set of test cases that form test suites. The test cases are expected to ultimately become executable to allow for observation of meaningful verdicts from the entire validation process. Therefore, in the following, the produced test cases are described from the execution point of view, and so they can be represented in different forms, such as test scripts, test models, or code. These are described next. • Executable test models: Similarly, the created test models (i.e., test designs) should be executable. The execution engine underlying the test modeling semantics is the indicator of the character of the test design and its properties (cf. the discussion given in, e.g., Chapter 11). • Executable test scripts: The test scripts refer to the physical description of a test case (cf., e.g., Chapter 2). They are represented by a test script language that has to then be translated to the executables (cf. TTCN-3 execution). • Executable code: The code is the lowest-level representation of a test case in terms of the technology that is applied to execute the tests (cf. the discussion given in, e.g., Chapter 6). Ultimately, every other form of a test case is transformed to a code in a selected programming language. 1.3.3 Test execution In the following, for clarity reasons, the analysis of the test execution is limited to the domain of engineered systems. An example application in the automotive domain is recalled in the next paragraphs. Chapters 11, 12, and 19 provide further background and detail to the material in this subsection. Execution options In this chapter, execution options refer to the execution of a test. The test execution is managed by so-called test platforms. The purpose of the test platform is to stimulate the test object (i.e., SUT) with inputs and to observe and analyze the outputs of the SUT. In the automotive domain, the test platform is typically represented by a car with a test driver. The test driver determines the inputs of the SUT by driving scenarios and observes the reaction of the vehicle. Observations are supported by special diagnosis and measurement hardware/software that records the test data during the test drive and that allows the behavior to be analyzed offline. An appropriate test platform must be chosen depending on the test object, the test purpose, and the necessary test environment. In the proceeding paragraphs, the execution options are elaborated more extensively. • Model-in-the-Loop (MiL): The first integration level, MiL, is based on a behavioral model of the system itself. Testing at the MiL level employs a functional model or implementation model of the SUT that is tested in an open loop (i.e., without a plant model) or closed loop (i.e., with a plant model and so without physical hardware) (Sch¨auffele and Zurawka 2006; Kamga, Herrman, and Joshi 2007; Lehmann and Kra¨mer 2008). The test purpose is prevailingly functional testing in early development phases in simulation environments such as Simulink. 
14 Model-Based Testing for Embedded Systems • Software-in-the-Loop (SiL): During SiL, the SUT is software tested in a closed-loop or open-loop configuration. The software components under test are typically implemented in C and are either handwritten or generated by code generators based on implementation models. The test purpose in SiL is mainly functional testing (Kamga, Herrmann, and Joshi 2007). If the software is built for a fixed-point architecture, the required scaling is already part of the software. • Processor-in-the-Loop (PiL): In PiL, embedded controllers are integrated into embedded devices with proprietary hardware (i.e., ECU). Testing on the PiL level is similar to SiL tests, but the embedded software runs on a target board with the target processor or on a target processor emulator. Tests on the PiL level are important because they can reveal faults that are caused by the target compiler or by the processor architecture. It is the last integration level that allows debugging during tests in an inexpensive and manageable manner (Lehmann and Kra¨mer 2008). Therefore, the effort spent by PiL testing is worthwhile in most any case. • Hardware-in-the-Loop (HiL): When testing the embedded system on the HiL level, the software runs on the target ECU. However, the environment around the ECU is still simulated. ECU and environment interact via the digital and analog electrical connectors of the ECU. The objective of testing on the HiL level is to reveal faults in the low-level services of the ECU and in the I/O services (Sch¨auffele and Zurawka 2006). Additionally, acceptance tests of components delivered by the supplier are executed on the HiL level because the component itself is the integrated ECU (Kamga, Herrmann, and Joshi 2007). HiL testing requires real-time behavior of the environment model to ensure that the communication with the ECU is the same as in the real application. • Vehicle: The ultimate integration level is the vehicle itself. The target ECU operates in the physical vehicle, which can either be a sample or be a vehicle from the production line. However, these tests are expensive, and, therefore, performed only in the late development phases. Moreover, configuration parameters cannot be varied arbitrarily (Lehmann and Kra¨mer 2008), hardware faults are difficult to trigger, and the reaction of the SUT is often difficult to observe because internal signals are no longer accessible (Kamga, Herrmann, and Joshi 2007). For these reasons, the number of in-vehicle tests decreases as MBT increases. In the following, the execution options from the perspective of test reactiveness are discussed. Reactive testing and the related work on the reactive/nonreactive are reviewed. Some considerations on this subject are covered in more detail in Chapter 15. • Reactive/Nonreactive execution: Reactive tests are tests that apply any signal or data derived from the SUT outputs or test system itself to influence the signals fed into the SUT. As a consequence, the execution of reactive test cases varies depending on the SUT behavior. This contrasts with the nonreactive test execution where the SUT does not influence the test at all. Reactive tests can be implemented in, for example, AutomationDesk (dSPACE GmbH 2010a). Such tests react to changes in model variables within one simulation step. Scripts that capture the reactive test behavior execute on the processor of the HiL system in real time and are synchronized with the model execution. 
The Reactive Test Bench (SynaptiCAD 2010) allows for specification of single timing diagram test benches that react to the user’s Hardware Description Language (HDL) design files. Markers are placed in the timing diagram so that the SUT activity is Taxonomy of MBT for Embedded Systems 15 recognized. Markers can also be used to call user-written HDL functions and tasks within a diagram. Dempster and Stuart (2002) conclude that a dynamic test generator and checker are not only more effective in creating reactive test sequences but also more efficient because errors can be detected immediately as they happen. • Generating test logs: The execution phase can produce test logs on each test run that are then used for further test coverage analysis (cf. e.g., Chapter 17). The test logs contain detailed information on test steps, executed methods, covered requirements, etc. 1.3.4 Test evaluation The test evaluation, also called the test assessment, is the process that relies on the test oracle. It is a mechanism for analyzing the SUT output and deciding about the test result. The actual SUT results are compared with the expected ones and a verdict is assigned. An oracle may be the existing system, test specification, or an individual’s expert knowledge. 1.3.4.1 Specification Specification of the test assessment algorithms may be based on different foundations depending on the applied criteria. It generally forms a model of sorts or a set of ordered reference signals/data assigned to specific scenarios. • Reference signal-based specification: Test evaluation based on reference signals assesses the SUT behavior comparing the SUT outcomes with the previously specified references. An example of such an evaluation approach is realized in MTest (dSPACE GmbH 2010b, Conrad 2004b) or SystemTestTM (MathWorks r , 2010). The reference signals can be defined using a signal editor or they can be obtained as a result of a simulation. Similarly, test results of back-to-back tests can be analyzed with the help of MEval (Wiesbrock, Conrad, and Fey 2002). • Reference signal-feature-based specification: Test evaluation based on features of the reference signal∗ assesses the SUT behavior by classifying the SUT outcomes into features and comparing the outcome with the previously specified reference values for those features. Such an approach to test evaluation is supported in the time partitioning test (TPT) (Lehmann 2003, PikeTec 2010). It is based on the scripting language Python extended with some syntactic test evaluation functions. By the availability of those functions, the test assessment can be flexibly designed and allow for dedicated complex algorithms and filters to be applied to the recorded test signals. A library containing complex evaluation functions is available. A similar method is proposed in MiLEST (Zander-Nowicka 2009), where the method for describing the SUT behavior is based on the assessment of particular signal features specified in the requirements. For that purpose, an abstract understanding of a signal is defined and then both test case generation and test evaluation are based on this ∗A signal feature (also called signal property by Gips and Wiesbrock (2007) and Schieferdecker and Großmann (2007) is a formal description of certain defined attributes of a signal. It is an identifiable, descriptive property of a signal. 
It can be used to describe particular shapes of individual signals by providing means to address abstract characteristics (e.g., increase, step response characteristics, step, maximum) of a signal. 16 Model-Based Testing for Embedded Systems concept. Numerous signal features are identified, and for all of these, feature extractors, comparators, and feature generators are defined. The test evaluation may be performed online because of the application of those elements that enable active test control and unlock the potential for reactive test generation algorithms. The division into reference-based and reference signal-feature-based evaluation becomes particularly important when continuous signals are considered. • Requirements coverage criteria: Similar to the case of test data generation, these criteria aim to cover all the informal SUT requirements, but in this case with respect to the expected SUT behavior (i.e., regarding the test evaluation scenarios) specified during the test evaluation phase. Traceability of the SUT requirements to the test model/code provides valuable support in realizing this criterion. • Test evaluation definition: This criterion refers to the specification of the outputs expected from the SUT in response to the test case execution. Early work of Richardson, O’Malley, and Tittle (1998) already describes several approaches to specification-based test selection and extends them based on the concept of test oracle, faults, and failures. When a test engineer defines test scenarios in a certain formal notation, these scenarios can be used to determine how, when, and which tests will be evaluated. 1.3.4.2 Technology The technology selected to implement the test evaluation specification enables an automatic or manual process, whereas the execution of the test evaluation occurs online or offline. Those options are elaborated next. • Automatic/Manual technology: The execution option can be interpreted either from the perspective of the test evaluation definition or its execution. Regarding the specification of the test evaluation, when the expected SUT outputs are defined by hand, then it is a manual test specification process. In contrast, when they are derived automatically (e.g., from the behavioral model), then the test evaluation based on the test oracle occurs automatically. Typically, the expected reference signals/data are defined manually; however, they may be facilitated by parameterized test patterns application. The test assessment itself can be performed manually or automatically. Manual specification of the test evaluation is supported in Simulink r Verification and ValidationTM (MathWorks 2010), where predefined assertion blocks can be assigned to test signals defined in a Signal Builder block in Simulink. This practice supports verification of functional requirements during model simulation where the evaluation itself occurs automatically. • Online/Offline execution of the test evaluation: The online (i.e., “on-the-fly”) test evaluation happens already during the SUT execution. Online test evaluation enables the concept of test control and test reactiveness to be extended. Offline means that the test evaluation happens after the SUT execution, and so the verdicts are computed after analyzing the execution test logs. Watchdogs defined in Conrad and Ho¨tzer (1998) enable online test evaluation. It is also possible when using TTCN-3. 
TPT means for online test assessment are limited and are used as watchdogs for extracting any necessary information for making test cases reactive (Lehmann and Kra¨mer 2008). The offline evaluation is more sophisticated in TPT. It offers means for more complex evaluations, including operations such as comparisons with external reference data, limit-value monitoring, signal filters, and analyses of state sequences and time conditions. Taxonomy of MBT for Embedded Systems 17 1.4 Summary This introductory chapter has extended the Model-Based Testing (MBT) taxonomy of previous work. Specifically, test dimensions have been discussed with pertinent aspects such as test goals, test scope, and test abstraction described in detail. Selected classes from the taxonomy have been illustrated, while all categories and options related to the test generation, test execution, and test evaluation have been discussed in detail with examples included where appropriate. Such perspectives of MBT as cost, benefits, and limitations have not been addressed here. Instead, the chapters that follow provide a detailed discussion as they are in a better position to capture these aspects seeing how they strongly depend on the applied approaches and challenges that have to be resolved. As stated in Chapter 6, most published case studies illustrate that utilizing MBT reduces the overall cost of system and software development. A typical benefit achieves 20%–30% of cost reduction. This benefit may increase up to 90% as indicated by Clarke (1998) more than a decade ago, though that study only pertained to test generation efficiency in the telecommunication domain. For additional research and practice in the field of MBT, the reader is referred to the surveys provided by Broy et al. (2005), Utting, Pretschner, and Legeard (2006), Zander and Schieferdecker (2009), Shafique and Labiche (2010), as well as every contribution found in this collection. References All4Tec, Markov Test Logic—MaTeLo, commercial Model-Based Testing tool, http:// www.all4tec.net/ [12/01/10]. Beizer, B. (1995). Black-Box Testing: Techniques for Functional Testing of Software and Systems. ISBN-10: 0471120944. John Wiley & Sons, Inc., Hoboken, NJ. Broy, M., Jonsson, B., Katoen, J. -P., Leucker, M., and Pretschner, A. (Editors) (2005). Model-Based Testing of Reactive Systems, Editors: no. 3472. In LNCS, Springer-Verlag, Heidelberg, Germany. BTC Embedded Systems AG, EmbeddedValidator, commercial verification tool, http://www.btc-es.de/ [12/01/10]. Carnegie Mellon University, Department of Electrical and Computer Engineering, Hybrid System Verification Toolbox for MATLAB—CheckMate, research tool for system verification, http://www.ece.cmu.edu/∼webk/checkmate/ [12/01/10]. Carter, J. M., Lin, L., and Poore, J. H. (2008). Automated Functional Testing of Simulink Control Models. In Proceedings of the 1 st Workshop on Model-based Testing in Practice—MoTip 2008, Editors: Bauer, T., Eichler, H., Rennoch, A., ISBN: 978-38167-7624-6, Fraunhofer IRB Verlag, Berlin, Germany. Clarke, J. M. (1998). Automated Test Generation from Behavioral Models. In the Proceedings of the 11th Software Quality Week (QW’98), Software Research Inc., San Francisco, CA. 18 Model-Based Testing for Embedded Systems Conrad, M. (2004a). A Systematic Approach to Testing Automotive Control Software, Detroit, MI, SAE Technical Paper Series, 2004-21-0039. Conrad, M. (2004b). Modell-basierter Test eingebetteter Software im Automobil: Auswahl und Beschreibung von Testszenarien. PhD thesis. 
Deutscher Universita¨tsverlag, Wiesbaden (D). (In German). Conrad, M., Fey, I., and Sadeghipour, S. (2004). Systematic Model-Based Testing of Embedded Control Software—The MB3T Approach. In Proceedings of the ICSE 2004 Workshop on Software Engineering for Automotive Systems, Edinburgh, United Kingdom. Conrad, M., and Ho¨tzer, D. (1998). Selective Integration of Formal Methods in the Development of Electronic Control Units. In Proceedings of the ICFEM 1998, 144-Electronic Edition, Brisbane Australia, ISBN: 0-8186-9198-0. Dai, Z. R. (2006). An Approach to Model-Driven Testing with UML 2.0, U2TP and TTCN-3. PhD thesis, Technical University Berlin, ISBN: 978-3-8167-7237-8. Fraunhofer IRB Verlag. Dempster, D., and Stuart, M. (2002). Verification methodology manual, Techniques for Verifying HDL Designs, ISBN: 0-9538-4822-1. Teamwork International, Great Britain, Biddles Ltd., Guildford and King’s Lynn. Din, G., and Engel, K. D. (2009). An Approach for Test Derivation from System Architecture Models Applied to Embedded Systems, In Proceedings of the 2nd Workshop on Model-based Testing in Practice (MoTiP 2009), In Conjunction with the 5th European Conference on Model-Driven Architecture (ECMDA 2009), Enschede, The Netherlands, Editors: Bauer, T., Eichler, H., Rennoch, A., Wieczorek, S., CTIT Workshop Proceedings Series WP09-08, ISSN 0929-0672. D-Mint Project (2008). Deployment of model-based technologies to industrial testing. http://d-mint.org/ [12/01/10]. dSPACE GmbH, AutomationDesk, commercial tool for testing, http://www.dspace.com/de/ gmb/home/products/sw/expsoft/automdesk.cfm [12/01/2010a]. dSPACE GmbH, MTest, commercial MBT tool, http://www.dspaceinc.com/ww/en/inc/ home/products/sw/expsoft/mtest.cfm [12/01/10b]. Dulz, W., and Fenhua, Z. (2003). MaTeLo—Statistical Usage Testing by Annotated Sequence Diagrams, Markov Chains and TTCN-3. In Proceedings of the 3 rd International Conference on Quality Software, Page: 336, ISBN: 0-7695-2015-4. IEEE Computer Society Washington, DC. ETSI (2007). European Standard. 201 873-1 V3.2.1 (2007-02): The Testing and Test Control Notation Version 3; Part 1: TTCN-3 Core Language. European Telecommunications Standards Institute, Sophia-Antipolis, France. GeenSoft (2010a). Safety Test Builder, commercial Model-Based Testing tool, http:// www.geensoft.com/en/article/safetytestbuilder/ [12/01/10]. GeenSoft (2010b). Safety Checker Blockset, commercial Model-Based Testing tool, http://www.geensoft.com/en/article/safetycheckerblockset app/ [12/01/10]. Taxonomy of MBT for Embedded Systems 19 Gips C., Wiesbrock, H. -W. (2007). Notation und Verfahren zur automatischen U¨ berpru¨fung von temporalen Signalabha¨ngigkeiten und -merkmalen fu¨r modellbasiert entwickelte Software. In Proceedings of Model Based Engineering of Embedded Systems III, Editors: Conrad, M., Giese, H., Rumpe, B., Scha¨tz, B.: TU Braunschweig Report TUBS-SSE 2007-01. (In German). Grochtmann, M., and Grimm, K. (1993). Classification Trees for Partition Testing. In Software Testing, Verification & Reliability, 3, 2, 63–82. Wiley, Hoboken, NJ. Gutjahr, W. J. (1999). Partition Testing vs. Random Testing: The Influence of Uncertainty. In IEEE Transactions on Software Engineering, Volume 25, Issue 5, Pages: 661–674, ISSN: 0098–5589. IEEE Press Piscataway, NJ. Hetzel, W. C. (1988). The Complete Guide to Software Testing. Second edition, ISBN: 0-89435-242-3. QED Information Services, Inc., Wellesley, MA. International Software Testing Qualification Board (2006). 
Standard glossary of terms used in Software Testing. Version 1.2, produced by the Glossary Working Party, Editor: van Veenendaal E., The Netherlands. IT Power Consultants, MEval, commercial tool for testing, http://www.itpower.de/ 30-0-Download-MEval-und-SimEx.html [12/01/10]. Kamga, J., Herrmann, J., and Joshi, P. Deliverable (2007). D-MINT automotive case study—Daimler, Deliverable 1.1, Deployment of model-based technologies to industrial testing, ITEA2 Project, Germany. Kosmatov, N., Legeard, B., Peureux, F., and Utting, M. (2004). Boundary Coverage Criteria for Test Generation from Formal Models. In Proceedings of the 15 th International Symposium on Software Reliability Engineering. ISSN: 1071–9458, ISBN: 0-7695-2215-7, Pages: 139–150. IEEE Computer Society, Washington, DC. Lamberg, K., Beine, M., Eschmann, M., Otterbach, R., Conrad, M., and Fey, I. (2004). Model-Based Testing of Embedded Automotive Software Using MTest. In Proceedings of SAE World Congress, Detroit, MI. Lee, T., and Yannakakis, M. (1994). Testing Finite-State Machines: State Identification and Verification. In IEEE Transactions on Computers, Volume 43, Issue 3, Pages: 306–320, ISSN: 0018–9340. IEEE Computer Society, Washington, DC. Lehmann, E. (then Bringmann, E.) (2003). Time Partition Testing, Systematischer Test des kontinuierlichen Verhaltens von eingebetteten Systemen, PhD thesis, Technical University Berlin. (In German). Lehmann, E., and Kra¨mer, A. (2008). Model-Based Testing of Automotive Systems. In Proceedings of IEEE ICST 08, Lillehammer, Norway. Marre, B., and Arnould, A. (2000). Test Sequences Generation from LUSTRE Descriptions: GATEL. In Proceedings of ASE of the 15 th IEEE International Conference on Automated Software Engineering, Pages: 229–237, ISBN: 0-7695-0710-7, Grenoble, France. IEEE Computer Society, Washington, DC. MathWorks , Inc., Real-Time Workshop , http://www.mathworks.com/help/toolbox/rtw/ [12/01/10]. 20 Model-Based Testing for Embedded Systems MathWorks , Inc., Simulink r Design Verifier TM, commercial Model-Based Testing tool, MathWorks , Inc., Natick, MA, http://www.mathworks.com/products/ sldesignverifier [12/01/10]. MathWorks , Inc., Simulink r , MathWorks , Inc., Natick, MA, http://www.mathworks. com/products/simulink/ [12/01/10]. MathWorks , Inc., Simulink r Verification and ValidationTM, commercial model-based verification and validation tool, MathWorks , Inc., Natick, MA, http://www. mathworks.com/products/simverification/ [12/01/10]. MathWorks , Inc., Stateflow r , MathWorks , Inc., Natick, MA, http://www.mathworks. com/products/stateflow/ [12/01/10]. MathWorks , Inc., SystemTestTM, commercial tool for testing, MathWorks , Inc., Natick, MA, http://www.mathworks.com/products/systemtest/ [12/01/2010]. MATLAB Automated Testing Tool—MATT. (2008). The University of Montana, research Model-Based Testing prototype, http://www.sstc-online.org/Proceedings/ 2008/pdfs/JH1987.pdf [12/01/10]. Mosterman, P. J., Zander, J., Hamon, G., and Denckla, B. (2009). Towards Computational Hybrid System Semantics for Time-Based Block Diagrams. In Proceedings of the 3rd IFAC Conference on Analysis and Design of Hybrid Systems (ADHS’09), Editors: A. Giua, C. Mahulea, M. Silva, and J. Zaytoon, pp. 376–385, Zaragoza, Spain, Plenary paper. Mosterman, P. J., Zander, J., Hamon, G., and Denckla, B. (2011). A computational model of time for stiff hybrid systems applied to control synthesis, Control Engineering Practice Journal (CEP), 19, Elsevier. Myers, G. J. (1979). The Art of Software Testing. 
ISBN-10: 0471043281. John Wiley & Sons, Hoboken, NJ. Neukirchen, H. W. (2004). Languages, Tools and Patterns for the Specification of Distributed Real-Time Tests, PhD thesis, Georg-August-Universia¨t zu G¨ottingen. OMG. (2003). MDA Guide V1.0.1. http://www.omg.org/mda/mda files/MDA Guide Version1-0.pdf [12/01/10 TODO]. OMG. (2003). UML 2.0 Superstructure Final Adopted Specification, http://www.omg.org/ cgi-bin/doc?ptc/03-08-02.pdf [12/01/10]. OMG. (2005). UML 2.0 Testing Profile. Version 1.0 formal/05-07-07. Object Management Group. PikeTec, Time Partitioning Testing—TPT, commercial Model-Based Testing tool, http://www.piketec.com/products/tpt.php [12/01/2010]. Pretschner, A. (2003). Compositional Generation of MC/DC Integration Test Suites. In Proceedings TACoS’03, Pages: 1–11. Electronic Notes in Theoretical Computer Science 6. Pretschner, A. (2003a). Compositional Generation of MC/DC Integration Test Suites. In Proceedings TACoS’03, Pages: 1–11. Electronic Notes in Theoretical Computer Science 6. http://citeseer.ist.psu.edu/633586.html. Taxonomy of MBT for Embedded Systems 21 Pretschner, A. (2003b). Zum modellbasierten funktionalen Test reaktiver Systeme. PhD thesis. Technical University Munich. (In German). Pretschner, A., Prenninger, W., Wagner, S., Ku¨hnel, C., Baumgartner, M., Sostawa, B., Z¨olch, R., and Stauner, T. (2005). One Evaluation of Model-based Testing and Its Automation. In Proceedings of the 27 th International Conference on Software Engineering, St. Louis, MO, Pages: 392–401, ISBN: 1-59593-963-2. ACM New York. Pretschner, A., Slotosch, O., Aiglstorfer, E., and Kriebel, S. (2004). Model Based Testing for Real—The Inhouse Card Case Study. In International Journal on Software Tools for Technology Transfer. Volume 5, Pages: 140–157. Springer-Verlag, Heidelberg, Germany. Rau, A. (2002). Model-Based Development of Embedded Automotive Control Systems, PhD thesis, University of Tu¨bingen. Reactive Systems, Inc., Reactis Tester, commercial Model-Based Testing tool, http:// www.reactive-systems.com/tester.msp [12/01/10a]. Reactive Systems, Inc., Reactis Validator, commercial validation and verification tool, http://www.reactive-systems.com/reactis/doc/user/user009.html, http://www. reactive-systems.com/validator.msp [12/01/10b]. Richardson, D, O’Malley, O., and Tittle, C. (1998). Approaches to Specification-Based Testing. In Proceedings of ACM SIGSOFT Software Engineering Notes, Volume 14, Issue 8, Pages: 86–96, ISSN: 0163–5948. ACM, New York. Sch¨auffele, J., and Zurawka, T. (2006). Automotive Software Engineering, ISBN: 3528110406. Vieweg. Schieferdecker, I., and Großmann, J. (2007). Testing Embedded Control Systems with TTCN-3. In Proceedings Software Technologies for Embedded and Ubiquitous Systems SEUS 2007, Pages: 125–136, LNCS 4761, ISSN: 0302–9743, 1611–3349, ISBN: 978-3540–75663-7 Santorini Island, Greece. Springer-Verlag, Berlin/Heidelberg. Schieferdecker, I., Großmann, J., and Wendland, M.-F. (2011). Model-Based Testing: Trends. Encyclopedia of Software Engineering DOI: 10.1081/E-ESE-120044686, Taylor & Francis. Schieferdecker, I., and Hoffmann, A. (2011). Model-Based Testing. Encyclopedia of Software Engineering DOI: 10.1081/E-ESE-120044686, Taylor & Francis. Schieferdecker, I., Rennoch, A., and Vouffo-Feudjio, A. (2011). Model-Based Testing: Approaches and Notations. Encyclopedia of Software Engineering DOI: 10.1081/ E-ESE-120044686, Taylor & Francis. Shafique, M., and Labiche, Y. (2010). 
A Systematic Review of Model Based Testing Tool Support, Carleton University, Technical Report, SCE-10-04, http://squall.sce. carleton.ca/pubs/tech report/TR SCE-10-04.pdf [03/22/11]. Sims S., and DuVarney D. C. (2007). Experience Report: The Reactis Validation Tool. In Proceedings of the ICFP ’07 Conference, Volume 42, Issue 9, Pages: 137–140, ISSN: 0362–1340. ACM, New York. 22 Model-Based Testing for Embedded Systems Software Quality Research Laboratory, Java Usage Model Builder Library—JUMBL, research Model-Based Testing prototype, http://www.cs.utk.edu/sqrl/esp/jumbl.html [12/01/10]. SynaptiCAD, Waveformer Lite 9.9 Test-Bench with Reactive Test Bench, commercial tool for testing, http://www.actel.com/documents/reactive tb tutorial.pdf [12/01/10]. Utting, M. (2005). Model-Based Testing. In Proceedings of the Workshop on Verified Software: Theory, Tools, and Experiments VSTTE 2005. Utting, M., and Legeard, B. (2006). Practical Model-Based Testing: A Tools Approach. ISBN-13: 9780123725011. Elsevier Science & Technology Books. Utting, M., Pretschner, A., and Legeard, B. (2006). A taxonomy of model-based testing, ISSN: 1170-487X, The University of Waikato, New Zealand. Weyuker, E. (1988). The Evaluation of Program-Based Software Test Data Adequacy Criteria. In Communications of the ACM, Volume 31, Issue 6, Pages: 668–675, ISSN: 0001-0782. ACM, New York, NY. Wieczorek, S., Kozyura, V., Roth, A., Leuschel, M., Bendisposto, J., Plagge, D., and Schieferdecker, I. (2009). Applying Model Checking to Generate Model-based Integration Tests from Choreography Models. 21st IFIP Int. Conference on Testing of Communicating Systems (TESTCOM), Eindhoven, The Netherlands, ISBN 978-3-642-05030-5. Wiesbrock, H. -W., Conrad, M., and Fey, I. (2002). Pohlheim: Ein neues automatisiertes Auswerteverfahren fu¨r Regressions und Back-to-Back-Tests eingebetteter Regelsysteme. In Softwaretechnik-Trends, Volume 22, Issue 3, Pages: 22–27. (In German). Zander, J., Dai, Z. R., Schieferdecker, I., and Din, G. (2005). From U2TP Models to Executable Tests with TTCN-3—An Approach to Model Driven Testing. In Proceedings of the IFIP 17 th Intern. Conf. on Testing Communicating Systems (TestCom 2005 ), ISBN: 3-540–26054-4, Springer-Verlag, Heidelberg, Germany. Zander, J., Mosterman, P. J., Hamon, G., and Denckla, B. (2011). On the Structure of Time in Computational Semantics of a Variable-Step Solver for Hybrid Behavior Analysis, 18th World Congress of the International Federation of Automatic Control (IFAC), Milano, Italy. Zander, J., and Schieferdecker, I. (2009). Model-Based Testing of Embedded Systems Exemplified for the Automotive Domain, Chapter in Behavioral Modeling for Embedded Systems and Technologies: Applications for Design and Implementation, Editors: Gomes, L., Fernandes, J. M., DOI: 10.4018/978-1-60566-750-8.ch015. Idea Group Inc. (IGI), Hershey, PA, ISBN 1605667501, 9781605667508, pp. 377–412. Zander-Nowicka, J. (2009). Model-Based Testing of Embedded Systems in the Automotive Domain, PhD Thesis, Technical University Berlin, ISBN: 978-3-8167-7974-2. Fraunhofer IRB Verlag, Germany. http://opus.kobv.de/tuberlin/volltexte/2009/2186/pdf/ zandernowicka justyna.pdf. 2 Behavioral System Models versus Models of Testing Strategies in Functional Test Generation Antti Huima CONTENTS 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.1.1 Finite-state machines . . . . . . . . . . . . . . . . . . . . . 
26
2.1.2 Arithmetics 27
2.1.3 Advanced tester model execution algorithms 28
2.1.4 Harder arithmetics 29
2.1.5 Synchronized finite-state machines 30
2.2 Simple Complexity-Theoretic Approach 30
2.2.1 Tester model-based coverage criteria 31
2.2.2 Multitape Turing Machines 32
2.2.3 System models and tester models 32
2.2.4 Tester models are difficult to construct 34
2.2.4.1 General case 34
2.2.4.2 Models with bounded test complexity 35
2.2.4.3 Polynomially testable models 35
2.2.4.4 System models with little internal state 36
2.2.5 Discussion 37
2.3 Practical Approaches 37
2.3.1 System-model-driven approaches 37
2.3.1.1 Conformiq Designer 38
2.3.1.2 Smartesting Test Designer 38
2.3.1.3 Microsoft SpecExplorer 38
2.3.2 Tester-model-driven approaches 38
2.3.2.1 UML Testing Profile 38
2.3.2.2 ModelJUnit 39
2.3.2.3 TestMaster 39
2.3.2.4 Conformiq Test Generator 39
2.3.2.5 MaTeLo
. . . 39 2.3.2.6 Time Partition Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.3.3 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.4 Compositionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.5 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 2.6 Nondeterministic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 2.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 An important dichotomy in the concept space of model-based testing is between models that directly describe intended system behavior and models that directly describe testing strategies. The purpose of this chapter is to shed light on this dicothomy from both practical as well as theoretical viewpoints. In the proceedings, we will call these two types of models “system models” and “tester models,” respectively, for brevity’s sake. 23 24 Model-Based Testing for Embedded Systems When it comes to the dividing line between system models and tester models (1) it certainly exists, (2) it is important, and (3) it is not always well understood. One reason why the reader may not have been fully exposed to this distinction between the two types of models is that for explicit finite-state models the distinction disappears by sleight of hand of deterministic polynomial-time complexity, something not true at all for more expressive modeling formalisms. The main argument laid forth in this chapter is that converting a system model into a tester model is computationally hard, and this will be demonstrated both within practical as well as theoretical frameworks. Because system designers and system tester already have some form of a mental model of the system’s correct operation in their minds (Pretschner 2005), this then means that constructing tester models must be hard for humans also. This observation correlates well with the day-to-day observations of test project managers: defining, constructing, and expressing testing strategies is challenging regardless of whether the strategies are eventually expressed in the form of models or not. Even though eventually a question for cognitive psychology, it is generally agreed that, indeed, test designers create or possess mental models of the systems under test. Alexander Pretschner writes (Pretschner 2005): Traditionally, engineers form a vague understanding of the system by reading the specification. They build a mental model. Inventing tests on the grounds of these mental models is a creative process. . . 
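To make the dichotomy tangible, the following toy Python sketch contrasts the two kinds of models. The counter example, its stimulus names, and its wrap-around bound are invented purely for illustration and are not drawn from any tool or case study discussed in this chapter.

# Toy contrast between the two kinds of models discussed in this chapter.
# Names and behavior are illustrative assumptions only.

# A *system model* describes intended behavior: here, a counter that wraps
# at a bound. It says nothing about how to test the counter.
class CounterSystemModel:
    BOUND = 3
    def next_state(self, state, stimulus):
        """Return (next_state, expected_output) for a stimulus."""
        if stimulus == "inc":
            nxt = (state + 1) % self.BOUND
            return nxt, f"value={nxt}"
        if stimulus == "reset":
            return 0, "value=0"
        raise ValueError(f"unknown stimulus {stimulus!r}")

# A *tester model* directly encodes a testing strategy that a human has
# already worked out: concrete stimuli together with expected outputs.
TESTER_MODEL = [
    # (stimulus, expected output): exercise the wrap-around, then reset
    ("inc", "value=1"), ("inc", "value=2"), ("inc", "value=0"),
    ("reset", "value=0"),
]

def tests_from_tester_model(tester_model):
    # Trivial: the strategy already is the test; generation is enumeration.
    yield list(tester_model)

if __name__ == "__main__":
    # Sanity check: the hand-written strategy agrees with the system model.
    model, state = CounterSystemModel(), 0
    for stimulus, expected in TESTER_MODEL:
        state, observed = model.next_state(state, stimulus)
        assert observed == expected, (stimulus, observed, expected)
    for test in tests_from_tester_model(TESTER_MODEL):
        print(test)

The contrast mirrors the argument above: turning the tester model into executable tests is a trivial, linear-time enumeration, because the intellectual work of selecting the wrap-around scenario was already done by the human who wrote it, whereas deriving an equally purposeful test from the system model alone requires a search over its state space, which is precisely the computational effort that Section 2.2 examines.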
This line of reasoning leads immediately to the result that creating system models should be “easier” than tester models—and also reveals the other side of the coin, namely that software tools that generate tests from system models are much more difficult to construct than those generating tests from tester models. To illustrate this, consider Figure 2.1. Testing experts have mental models of the systems they should test. They can convert those mental models into explicit (computer-readable) system models (arrow 1). These system models could be converted into tester models algorithmically (arrow 2); we do not need to postulate here anything about the nature of those algorithms as for this discussion it is enough to assume for them to exist.∗ Alternatively, test engineers can create tester models directly based on their mental models of the systems (arrow 3). Now let us assume that engineers could create good tester models “efficiently,” that is tester models that cover all the necessary testing conditions (however they are defined) and are correct. This would open the path illustrated in Figure 2.2 to efficiently implement the Mental system model 1 Explicit system model 3 2 Explicit tester model FIGURE 2.1 Conversions between mental and explicit models. ∗And they of course do—a possible algorithm would be to (1) guess a tester model, (2) limit the test cases that can be generated from the tester model to a finite set, and then (3) verify that the generated test cases pass against the system model. For the purposes of showing existence, this is a completely valid algorithm—regardless of it being a nondeterministic one. Functional Test Generation 25 Mental system model Explicit system model Explicit tester model FIGURE 2.2 Illustration of the difficulty of constructing tester models. arrow number 2 from Figure 2.1. Now clearly this alternative path is not algorithmic in a strict sense as it involves human cognition, but still if this nonalgorithmic path would be efficient, it would provide an efficient way to implement an algorithmically very difficult task, namely the conversion of a computer-readable system model into a computer-readable tester model. This would either lead to (1) showing that the human brain is an infinitely more potent computational device than a Turing machine, thus showing that the Church–Turing thesis∗ does not apply to human cognition, or (2) the collapse of the entire computational hierarchy. Both results are highly unlikely given our current knowledge. To paraphrase, as long as it is accepted that for combinatorial problems (such as test case design) the human brain is not infinitely or categorically more efficient and effective than any known mechanical computing devices, it must be that tester model construction is very difficult for humans, and in general much more difficult than the construction of explicit system models. Namely, the latter is a translation problem, whereas the former is a true combinatorial and computational challenge. In summary, the logically derived result that tester models are more difficult to construct than system models but easier to handle for computers leads to the following predictions, which can be verified empirically: 1. Tester model-based test generation tools should be easier to construct; hence, there should be more of them available. 2. System models should be quicker to create than the corresponding tester models. 
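The guess-and-verify argument in the footnote can be made concrete with a small sketch. It is only an illustration under simplified, assumed conditions: the system model is represented here as an executable function from an input word to an output word (a stand-in for the formal machines defined later in this chapter), a candidate tester model is reduced to the finite set of (input, expected output) pairs it generates, and every name below is hypothetical.

def passes_against_system_model(system_model, candidate_tests):
    """Step (3) of the guess-and-verify scheme: simulate the deterministic
    system model on every proposed input and compare the predicted output.
    The hard part, left entirely to the "guess", is producing a candidate
    that also covers everything worth testing."""
    return all(system_model(inp) == expected for inp, expected in candidate_tests)

# Hypothetical stand-ins, used only for illustration.
def echo_twice(word):
    """A trivial 'system model': the system answers every input twice over."""
    return word + word

assert passes_against_system_model(echo_twice, [("ab", "abab"), ("", "")])
assert not passes_against_system_model(echo_twice, [("ab", "ab")])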
For the purposes of this chapter, we need to fix some substantive definitions for the nature of system models and tester models, otherwise there is not much that can be argued. On an informal level, it shall be assumed that tester models can be executed for test generation efficiently, that is, in polynomial time in the length of the produced test inputs and predicted test outputs. The key here is the word efficiently; without it, we cannot argue about the relative “hardness” of constructing tester models from system models. This leads to basic complexity-theoretic† considerations in Section 2.2. ∗The Church–Turing thesis in its modern form basically states that everything that can be calculated can be calculated by a Turing machine. However, it is in theory an open question whether the human brain can somehow calculate Turing-uncomputable functions. At least when it comes to combinatorial problems, we tend to believe the negative. In any case, the argument that tester model design is difficult for humans is eventually based, in this chapter, on the assumption that for this kind of problems, the human brain is not categorically more efficient than idealized computers. †This chapter is ultimately about the system model versus tester model dichotomy, not about complexity theory. Very basic computational complexity theory is used in this chapter as a vehicle to establish a theoretical foundation that supports and correlates with practical field experience, and some convenient shortcuts are taken. For example, even though test generation is eventually a “function problem,” it is mapped to the decision problem complexity classes in this presentation. This does not affect the validity of any of the conclusions presented here. 26 Model-Based Testing for Embedded Systems This polynomial-time limit on test case generation from tester models is a key to understanding this chapter. Even though in practice system model and tester model-driven approaches differ substantively, on an abstract level both these approaches fit the modelbased testing pattern “model → tests.” It is most likely not possible to argue definitively about the differences between the two approaches based on semantical concepts alone, as those are too much subject to interpretation. The complexity-theoretic assumption that tester models generate the corresponding test suites efficiently (1) matches the current practice and (2) makes sense because tester models encode (by their definition) testing strategies previously construed by humans using their intelligence, and thus it can be expected that the actual test case generation should be relatively fast. 2.1 Introduction In this section, we will proceed through a series of examples highlighting how the expressivity of a language for modeling systems affects the difficulty of constructing the corresponding tester models. This is not about constructing the models algorithmically, but an informal investigation into the growing mental challenge of constructing them (by humans). Namely, in the tester model-driven, model-based testing approach, the tester models are actually constructed by human operators, and as the human operators already must have some form of a mental model about the system’s requirements and intended behavior, all the operators must go through an analogous process in order to construct the tester models. 
In this section, we shall use formal (computer-readable) system models and their algorithmic conversion into tester models as an expository device to illustrate the complexity of the tester model derivation process regardless of whether it is carried out by a computer or a human operator. 2.1.1 Finite-state machines Consider the system model, in the form of a finite-state machine, shown in Figure 2.3. Each transition from one state to another is triggered by an input (lowercase letters) and produces an output (uppercase letters). Inputs that do not appear on a transition leaving from a state are not allowed. Assume the testing goal is to test a system specified like this and to verify that every transition works as specified. Assume further that the system under test has an introspective facility so that its internal control state can be verified easily. After reset, the system starts from state 1, which is thus the initial state for this model. ?a!A ?b!B 1 2 3 ?c!A ?a!C ?a!A ?c!C ?c!C 4 5 6 ?b!C ?a!B FIGURE 2.3 A finite-state machine. Functional Test Generation 27 One approach for producing a tester model for this system is to enumerate test cases explicitly and call that list of test cases a tester model. In the case of a finite-state machine, test cases can be immediately constructed using simple path traversal procedures, such as Chinese Postman tours (Edmonds and Johnson 1973, Aho et al. 1991). One possible way to cover all the transitions in a single test case here is shown in Figure 2.4. Here, the numbers inside parentheses correspond to verifying the internal control state of the system under test. Note also that inputs and outputs have been reversed as this is a test case—what is an input to the system under test is an output for the tester. The path traversal procedures are so efficient for explicitly represented finite-state machines (polynomial time) that, however, the system model itself can be repurposed as a tester model as illustrated in Figure 2.5. Two things happen here: inputs and outputs are switched, and the state numbers in the tester model now refer to internal states of the system under test to be verified. Path traversal procedures can now be executed for this tester model in order to produce test cases, which can be either executed online (during their construction) or offline (after they have been constructed). This results in the polynomial-time generation of a polynomial-size test suite. Obviously, the system model and the tester model are the same model for all practical purposes, and they are both computationally easy to handle. This explains the wide success of finite-state machine approaches for relatively simple test generation problems and also the confusion that sometimes arises about whether the differences between tester models and system models are real or not. For explicitly represented finite-state models, there are no fundamental differences. 2.1.2 Arithmetics Let us now move from completely finite domains into finite-state control with some numerical variables (extended finite-state machines). For example, consider the extended state machine shown in Figure 2.6. The state machine communicates in terms of rational numbers Q instead of symbols, can do conditional branches, and has internal variables (x, y, and z) for storing data. (1) !a?A (2) !b?B (3) !c?C (6) !a?B (5) !c?C (3) !c?C (6) !a?B (5) !b?C (4) !c?A (1) !a?A (2) !a?C (5) !a?A (1) FIGURE 2.4 A tour through the state machine. 
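To make Section 2.1.1 concrete, the following sketch (Python is used purely for illustration and implies none of the tools discussed in this chapter) encodes the finite-state machine of Figure 2.3, with the transition table read off Figures 2.3 and 2.4, derives a transition-covering tour with a naive breadth-first walk rather than an optimal Chinese Postman tour, and then "flips" each step into tester form as in Figure 2.5: the tester sends the input, expects the output, and verifies the internal control state.

from collections import deque

# The machine of Figure 2.3, as read off Figures 2.3 and 2.4:
# state -> {input: (output, next_state)}.
SYSTEM_MODEL = {
    1: {"a": ("A", 2)},
    2: {"b": ("B", 3), "a": ("C", 5)},
    3: {"c": ("C", 6)},
    4: {"c": ("A", 1)},
    5: {"c": ("C", 3), "b": ("C", 4), "a": ("A", 1)},
    6: {"a": ("B", 5)},
}

def path_through_uncovered(model, start, uncovered):
    """Shortest input sequence from `start` whose final step takes a
    still-uncovered transition (plain breadth-first search over states)."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        for inp, (_, nxt) in model[state].items():
            if (state, inp) in uncovered:
                return path + [inp]
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [inp]))
    return None  # nothing uncovered is reachable from here

def covering_tour(model, initial=1):
    """Greedy transition coverage; not an optimal Chinese Postman tour."""
    uncovered = {(s, i) for s, edges in model.items() for i in edges}
    state, tour = initial, []
    while uncovered:
        inputs = path_through_uncovered(model, state, uncovered)
        if inputs is None:
            break
        for inp in inputs:
            out, nxt = model[state][inp]
            tour.append((state, inp, out, nxt))
            uncovered.discard((state, inp))
            state = nxt
    return tour

def as_tester_steps(tour):
    """Flip the perspective as in Figure 2.5: send the input (!), expect the
    output (?), and then verify the internal control state of the SUT."""
    return [f"({s}) !{i}?{o} then verify ({n})" for s, i, o, n in tour]

if __name__ == "__main__":
    print("\n".join(as_tester_steps(covering_tour(SYSTEM_MODEL))))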
1 !a?A 2 !b?B 3 !c?A !a?C !a?A !c?C !c?C 4 5 6 !b?C !a?B FIGURE 2.5 Repurposing the system model as a tester model. 28 Model-Based Testing for Embedded Systems !x ?x,y,z Yes Yes 1 x>y x < 2z 2 !z !y No No 3 4 FIGURE 2.6 State machine with internal variables. !x,y,z 1 FIGURE 2.7 Prefix of the state machine when inverted. x = rnd(−10,10),...!x,y,z 1 FIGURE 2.8 Randomized input selection strategy. This model cannot be inverted into a tester model directly in the same manner as we inverted the finite-state machine in the previous section because the inverted model would start with the prefix shown in Figure 2.7. Notably, the variables x, y, and z would be unbound—the transition cannot be executed. A simple way to bind the variables would be to use a random testing approach (Duran and Ntafos 1984), for instance to bind every input variable to a random rational number between, say, −10 and 10, as illustrated in Figure 2.8. The corresponding tester model would then require randomized execution. In this case, the probability for a single loop through the tester model to reach any one of the internal states would be relatively high. However, if the second condition in the original machine would be changed as shown in Figure 2.9, then the same strategy for constructing the tester would result in a tester model that would have very slim chances of reaching the internal state (2) within a reasonable number of attempts. This exemplifies the general robustness challenges of random testing approaches for test generation. 2.1.3 Advanced tester model execution algorithms Clearly, one could postulate a tester model execution algorithm that would be able to calculate forwards in the execution space of the tester model and resolve the linear equations backwards in order to choose suitable values for x, y, and z so that instead of being initialized to random values, they would be initialized more “intelligently.” Essentially, the same deus ex machina argument could be thrown in all throughout this chapter. However, it would violate the basic assumption that tester models are (polynomial time) efficient to execute Functional Test Generation 29 Yes Yes x < (1 + 10−6)y No FIGURE 2.9 Changed condition in the state machine illustrates the challenges with random test data selection. because even though it is known that linear equations over rational numbers can be solved in polynomial time, the same is not true for computationally harder classes of data constraints showing up in system models, as will be discussed below. Another, slightly more subtle reason for rejecting this general line of reasoning is the following. If one argues that (1) a system model can be always converted straightforwardly into a tester model by flipping inputs and outputs and (2) a suitably clever algorithm will then select the inputs to the system under test to drive testing intelligently, then actually what is claimed is system model-based test generation—the only difference between the system model and the tester model is, so to speak, the lexical choice between question and exclamation marks! This might be even an appealing conclusion to a theorist, but it is very far from the actual, real-world practice on the field, where system models and tester models are substantially different and handled by separate, noninterchangeable toolsets. 
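The robustness problem of the random-binding strategy of Section 2.1.2 can be quantified with a minimal Monte Carlo sketch, which also shows why something cleverer than random binding is needed in the first place. The guard and the sampling range are read off Figures 2.8 and 2.9 and the surrounding text, so treat the concrete numbers as assumptions; under uniform sampling of x and y from [-10, 10], the window x > y and x < (1 + 10⁻⁶)y has probability on the order of 10⁻⁷, so a million random bindings will typically never reach internal state 2.

import random

# A minimal Monte Carlo sketch of the random-binding strategy of Figure 2.8,
# applied to the changed guard of Figure 2.9 (values read off the figures).

def reaches_state_2(x, y):
    # State 2 is reached only when x > y and x < (1 + 1e-6) * y.
    return x > y and x < (1.0 + 1e-6) * y

def estimate_hit_rate(trials=1_000_000, seed=1):
    rng = random.Random(seed)
    hits = sum(
        1
        for _ in range(trials)
        if reaches_state_2(rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))
    )
    return hits / trials

if __name__ == "__main__":
    # The third input z of Figure 2.6 is omitted: the changed guard ignores it.
    print("estimated probability of reaching state 2:", estimate_hit_rate())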
At this point, one could raise the general question of whether this entire discussion is somehow unfair, as it will become obvious that a system model-based test generation tool must have exactly the kind of "intelligent" data selection capacity that was just denied to tester model-based tools! This may look like a key question, but actually it is not, and it was already answered above. Namely, a tester model is a model of a testing strategy, a strategy that has been devised by a human. The reason why humans labor in creating testing strategy models is that afterwards the models are straightforward to execute for test generation. The cost of straightforward execution of the models is the relative labor of constructing them. The "intelligent" capacities of system model-based test generation tools must exist in order to compensate for the eliminated human labor in constructing testing strategies. A tester model-based test generation tool with all the technical capacities of a system model-based one is actually equivalent to a system model-based test generation tool, as argued above. Therefore, testing strategy models exist because of the unsuitability, unavailability, undesirability, or cost of system model-based test generation tools in the first place, and they are characterized by the fact that they can be straightforwardly executed. This analysis corresponds with the practical dichotomy between tester model- and system model-based tools in today's marketplace.

2.1.4 Harder arithmetics

Whereas linear equations over rationals admit practical (Nelder and Mead 1965) and polynomial-time (Bland et al. 1981) solutions, the same is not true for more general classes of arithmetical problems, which may present themselves in system specifications. A well-known example is linear arithmetic over integers; solving a linear equation system restricted to integer solutions is an np-complete problem (Karp 1972, Papadimitriou 1981). This shows that the "flipping" of a system model into a tester model would not work in the case of integer linear arithmetic, because the postulated "intelligent" tester model execution algorithm would then not be efficient. Similarly, higher-order equations restricted to integer solutions (Diophantine equations) are unsolvable (Matiyasevich 1993) and thus do not admit any general testing strategy construction algorithm.

2.1.5 Synchronized finite-state machines

The reachability problem for synchronized finite-state machines is pspace-complete (Demri, Laroussinie, and Schnoebelen 2006), so a system model expressed in terms of synchronized finite-state machines cannot be efficiently transformed into an efficient tester model. The fundamental problem is that a system model composed of n synchronized finite-state machines, each having k states, has internally kⁿ states, which is exponential in the size of the model's description. Again, a synchronized system model cannot be "flipped" into a tester model by exchanging inputs and outputs, because a hypothetical test generation algorithm running on the flipped model could not necessarily find all the reachable output transitions in polynomial time. When a human operator tries to create testing strategies for distributed and multithreaded systems, the operator faces the same challenge mentally: it is difficult to synchronize and keep track of the states of the individual components in order to create comprehensive testing strategies.
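The kⁿ blow-up of Section 2.1.5 is easy to observe directly. The sketch below uses hypothetical components, namely n identical k-state cyclic counters that are merely interleaved and do not even communicate; adding synchronization only makes the reachability analysis harder still. The description that is written down grows linearly (n·k local states), while the number of reachable global states grows as kⁿ.

# Hypothetical example: n identical k-state cyclic counters composed by
# interleaving.  The written-down description has n * k local states, but
# the composed machine has k**n reachable global states.

def tick(state, k):
    return (state + 1) % k

def reachable_global_states(n, k):
    seen = {(0,) * n}
    frontier = [(0,) * n]
    while frontier:
        current = frontier.pop()
        for i in range(n):  # fire the "tick" transition of any one component
            succ = current[:i] + (tick(current[i], k),) + current[i + 1:]
            if succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return len(seen)

if __name__ == "__main__":
    k = 4
    for n in range(1, 8):
        print(f"n={n}: {n * k:2d} local states in the description, "
              f"{reachable_global_states(n, k):6d} reachable global states")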
In the context of distributed system testing, Ghosh and Mathur (1999) write: Test data generation in order to make the test sets adequate with respect to a coverage criteria is a difficult task. Experimental and anecdotal evidence reveals that it is difficult to obtain high coverage for large systems. This is because in a large software system it is often difficult to design test cases that can cover a low-level element. 2.2 Simple Complexity-Theoretic Approach In this section, we approach the difficulty of tester model creation from a complexitytheoretic viewpoint. The discussion is more informal than rigorous and the four propositions below are given only proof outlines. The reader is directed to any standard reference in complexity theory such as Papadimitriou (Papadimitriou 1993) for a thorough exposition of the theoretical background. In general, test case generation is intimately related to reachability problems in the sense that being able to solve reachability queries is a prerequisite for generating test cases from system models. This is also what happens in the mind of a person who designs testing strategies mentally. For instance, in order to be able to test deleting a row from a database, a test engineer must first figure out how to add the row to be deleted in the first place. And to do that, the engineer must understand how to log into the system. The goal to test row deletion translates into a reachability problem—how to reach a state where a row has been deleted—and the test engineer proceeds mentally to solve this problem. This is significant because there is a wide body of literature available about the complexity of reachability problems. As a matter of fact, reachability is one of the prototypical complexity-theoretic problems (Papadimitriou 1993). In this chapter, we introduce a simple framework where system models and tester models are constructed as Turing machines. This framework is used only to be able to argue about the computational complexity of the system model → tester model conversion and does not Functional Test Generation 31 readily lead to any practical algorithms. A practically oriented reader may consider jumping directly to the discussion in Section 2.2.4. In order for us to be able to analyze tester construction in detail, we must settle on a concept for test coverage. Below, we shall use only one test coverage, criterion, namely covering output transitions in the system models (as defined below). This is a prototypical model-based black-box testing criterion and suits our purpose well. 2.2.1 Tester model-based coverage criteria The reader may wonder why we do not consider tester model-driven black-box testing criteria, such as covering all the transitions of a tester model. Such coverage criteria are, after all, common in tester model-based test generation tools. This section attempts to answer this question. Tester model-based coverage criteria are often used as practical means to select a subset of all possible paths through a (finite-state) tester model. In that sense, tester model-based coverage is a useful concept and enables practical tester model-driven test generation. However, in order to understand the relationship between tester model-based coverage criteria and the present discussion, we must trace deeper into the concept of model-driven black-box coverage criteria. It is commonly accepted that the purpose of tests is to (1) detect faults and (2) prove their absence. 
It is an often repeated “theorem” that tests can find faults but cannot prove their absence. This is not actually true, however, because tests can for example prove the absence of systematic errors. For instance, if one can log into a database at the beginning of a test session, it proves that there is no systematic defect in the system, which would always prevent users from logging into it. This is the essence of conformance and feature testing, for instance. Given that the purpose of testing is thus to test for the presence or absence of certain faults, it is clear that tests should be selected so that they actually target “expected” faults. This is the basis for the concept of a fault model. Informally, a fault model represents a hypothesis about the possible, probable, or important errors that the system under test may contain. Fault models can never be based on the implementation of the system under test alone because if the system under test works as the sole reference for the operation of itself, it can never contain any faults. Model-based test coverage criteria are based on the—usually implicit—assumption that the system under test resembles its model. Furthermore, it is assumed that typical and interesting faults in the system under test correlate with, for example, omitting or misimplementing transitions in a state chart model or implementing arithmetic comparisons, which appear in a system model, incorrectly in the system under test. These considerations form the basis for techniques such as boundary-value testing and mutant-based test assessment. A tester model does not bear the same relationship with the system because it does not model the system but a testing strategy. Therefore, for instance, attempting to cover all transitions of a testing strategy model does not have similar, direct relationship with the expected faults of the system under test as the transitions of a typical system model in the form of state machine. In other words, a testing strategy model already encodes a particular fault model or test prioritization that is no longer based on the correspondence between a system model and a system under test correspondence but on the human operator’s interpretation. So now in our context, where we want to argue about the challenge of creating efficient and effective tests, we will focus on system model-based coverage criteria because they bear a direct relationship with the implicit underlying fault model; and as a representative of that set of coverage criteria, we select the simple criterion of covering output transitions as defined below. 32 Model-Based Testing for Embedded Systems 2.2.2 Multitape Turing Machines We will use multitape Turing machines as devices to represent system and tester models. The following definition summarizes the technical essence of a multitape Turing machine. Computationally, multitape Turing machines can be reduced via polynomial-time transformations into single-tape machines with only a negligible (polynomial) loss of execution efficiency (Papadimitriou 1993). Definition 1 (Turing Machines and Their Runs). A multitape deterministic Turing machine with k tapes is a tuple Q, Σ, q, b, δ , where Q is a finite set of control states, Σ is a finite set of symbols, named the alphabet, q ∈ Q is the initial state, b ∈ Σ is blank symbol, and δ : Q × Σk → Q × (Σ × {l, n, r})k is the partial transition function. A configuration of a k-multitape Turing machine is an element of Q × (Σ∗ × Σ × Σ∗)k. 
A configuration is mapped to a next-state configuration by the rule

⟨q, ⟨w1, σ1, u1⟩, . . . , ⟨wk, σk, uk⟩⟩ → ⟨q′, ⟨w′1, σ′1, u′1⟩, . . . , ⟨w′k, σ′k, u′k⟩⟩

iff it holds that δ(q, ⟨σ1, . . . , σk⟩) = ⟨q′, ⟨α1, κ1⟩, . . . , ⟨αk, κk⟩⟩ and for every 1 ≤ i ≤ k it holds that

κi = n =⇒ w′i = wi ∧ u′i = ui ∧ σ′i = αi   (2.1)
κi = l =⇒ w′i σ′i = wi ∧ u′i = αi ui   (2.2)
κi = r =⇒ w′i = wi αi ∧ (σ′i u′i = ui ∨ (σ′i = b ∧ ui = ε ∧ u′i = ε))   (2.3)

where ε denotes the empty word.

In the sequel, we assume Σ to be fixed and that it contains a designated separator symbol, which is only used to separate test case inputs and outputs (see below). A run of a Turing machine is a sequence of configurations starting from a given initial configuration c and proceeding in a sequence of next-state configurations c → c1 → c2 → · · · . If the sequence enters a configuration ck without a next-state configuration, the computation halts and the computation's length is k steps; if the sequence is infinitely long, then the computation is nonterminating. A k-tape Turing machine can be encoded over a given, finite alphabet containing more than one symbol in O(|Q||Σ|ᵏ(log |Q| + k log |Σ|)) space.

A tape of a multitape Turing machine can be restricted to be an input tape, which means that the machine is not allowed to change its symbols. A tape can also be an output tape, meaning that the machine is never allowed to move left on that tape.

2.2.3 System models and tester models

A system model is represented by a three-tape Turing machine. One of the tapes is initialized at the beginning of the model's execution with the entire input provided to the system. The machine then reads the input at its own pace. Similarly, another tape is reserved for output, and during execution the machine writes symbols that correspond to the system outputs. The third tape is for internal bookkeeping and general computations. This model effectively precludes nondeterministic system models because the input must be fixed before execution starts. This is intentional, as introducing nondeterministic models (ones that could produce multiple different outputs on the same input) would complicate the exposition. However, the issue of nondeterministic models will be revisited in Section 2.6.

Definition 2 (System Model). A system model is a three-tape deterministic machine, with one input tape, one output tape, and one tape for internal state. One run of the system model corresponds to initializing the contents of the input tape, clearing the output tape and the internal tape, and then running the machine till it halts. At this point, the contents of the output tape correspond to the system's output. We assume that all system models eventually halt on all inputs.∗

The next definition fixes the model-based test coverage criteria considered within the rest of this section.

Definition 3 (Output Transition). Given a system model, an output transition in the system model is a tuple ⟨q, σi, σo, σ⟩ iff δ(q, ⟨σi, σo, σ⟩) is defined (the machine can take a step forward) and the transition moves the head on the output tape right (thus committing the machine to a new output symbol). A halting† run of a system model covers an output transition if a configuration matching the output transition shows up in the run.

We define two complexity metrics for system models: their run-time complexity (how many steps the machine executes) and their testing complexity (how long an input is necessary to test any of the output transitions).
Making a distinction between these two measures helps highlight the fact that even constructing short tests is, in general, difficult.

Definition 4 (Run-Time and Testing Complexity for System Models). Given a family of system models S, the family has

• run-time complexity f iff every system model s ∈ S terminates in O(f(|i|)) steps on any input i, and
• testing complexity f iff every output transition in s is either unreachable or can be covered by a run over an input i such that the length of i is O(f(|s|)).

Whereas a system model is a three-tape machine, a tester model has only two tapes: one for internal computations and one for outputting the test suite generated by the tester model traversal algorithm, which has been conveniently packed into the tester model itself. This does not cause any loss of generality because the algorithm itself can be considered to have a fixed size, so from a complexity point of view its effect is an additive constant and it thus vanishes in asymptotic considerations.

A tester model outputs a test suite in the form of a string w1 u1 w2 u2 · · · , in which consecutive words are separated by the designated separator symbol, the wi are inputs, and the ui are expected outputs. Thus, a test is an input/output pair and a test suite is a finite collection of them. Note that this does not imply in any way that the analysis here is limited to systems that can accept only a single input, because an input string in the current framework can denote a stream of multiple messages, for instance. The fact that the entire input can be precommitted is an effect of the determinism of system models (see Section 2.6 for further discussion).

Definition 5 (Tester Model). A tester model is a two-tape deterministic machine, with one output tape and one tape for internal state. The machine is deterministic and takes no input, so it always carries out the same computation. When the machine halts, the output tape is supposed to contain pairs of input and output words separated by the designated separator symbol. A tester model is valid with respect to a system model if it produces a sequence of tests that would pass against the system model. This means that if the model produces the output w1 u1 · · · wn un, then for every wi that is given as an input to the system model in question, the system model produces the output ui. The outputs are called test suites.

∗Assuming that system models halt on all inputs makes test construction for system models that can be tested with bounded inputs a decidable problem. This assumption can be lifted without changing much of the content of the present section. It, however, helps strengthen Proposition 1 by showing that even if system models eventually terminate, test construction in general is still undecidable.
†Every run of a system model is assumed to halt.

The run-time complexity of a tester model is measured by how long it takes to produce the output given the output length.

Definition 6 (Run-Time Complexity for Tester Models). Given a family of tester models T, the family has run-time complexity f if any t ∈ T executes O(f(|o|)) steps before halting and having produced the machine-specific output string (test suite) o.

Finally, we consider tester construction strategies, that is, algorithms for converting system models into tester models. These could also be formulated as Turing machines, but for the sake of brevity, we simply present them as general algorithms.

Definition 7 (Tester Construction Strategy).
A tester construction strategy for a family of system models S is a computable function S that deterministically maps system models in S into valid and complete tester models, that is, tester models that generate tests that pass against the corresponding system models, and that cover all the reachable output transitions on those system models. The run-time complexity of a tester construction strategy is measured in how long it takes to produce the tester model given the size of the input system model, as defined below. Definition 8 (Run-Time Complexity for Tester Construction Strategies). Given a tester construction strategy S for a family of system models S, the strategy has runtime complexity f if for any s ∈ S the strategy computes the corresponding tester model in O(f (|s|)) steps, where |s| is a standard encoding of the system model. This completes our framework. In this framework, tester model-based test generation is divided into two tasks: (1) construct a tester model machine and (2) execute it to produce the tests that appear on the output tape of the tester model machine. Step (1) is dictated by a chosen tester construction strategy and step (2) is then straightforward execution of a deterministic Turing machine. Complexity-wise, the computational complexity of system models is measured by how fast the machines execute with respect to the length of the input strings. We introduced another complexity measure for system models also, namely their testing complexity—this measures the minimum size of any test suite that covers all the reachable output transitions of a given system model. Tester models are measured by how quickly they output their test suites, which makes sense because they do not receive any inputs in the first place. A tester construction strategy has a run-time complexity measure that measures how quickly the strategy can construct a tester model given an encoding of the system model as an input. We shall now proceed to argue that constructing tester models is difficult from a basic computational complexity point of view. 2.2.4 Tester models are difficult to construct 2.2.4.1 General case In general, tester models are impossible to construct. This is a consequence of the undecidability of the control state reachability problem for general Turing machines. Proposition 1. Tester construction strategies do not exist for all families of system models. Proof outline. This is a consequence of the celebrated Halting Theorem. We give an outline of a reduction to the result that the reachability of a given Turing machine’s given halting control state is undecidable. Let m be a one-tape deterministic Turing machine with a Functional Test Generation 35 halting control state q (among others). We translate m to a system model s by (1) adding a “timer” to the model, which causes the model to halt after n test execution steps, where n is a natural number presented in binary encoding on the input tape and (2) adding a control state q and an entry in the transition function which, upon entering the control state q, writes symbol “1” on the output tape, moves right on the output tape, and then enters the control state q that has no successors, that is, q is a halting state. Now, clearly, this newly added output transition (q → q ) can be covered by a tester model if and only if q is reachable in s given large enough n on the input tape. 
Note that the generated test suite would be either empty or “b1 · · · bk 1 ” depending on the reachability of q, where b1 · · · bk is a binary encoding of n. 2.2.4.2 Models with bounded test complexity We consider next the case of system models whose test complexity is bounded, that is, system models for which there is a limit (as function of the model size) on the length of the input necessary to test any of the output transitions. Proposition 2. Tester construction is decidable and r-hard∗ for families of system models with bounded test complexity. Proof outline. Let S be a family of system models whose test complexity is given by a f . For any s ∈ S, every reachable output transition can be reached with a test input whose length is thus bounded by f (|s|). Because the system model is assumed to halt on all inputs, it can be simulated on any particular input. Thus, a tester model can be constructed by enumerating all inputs within the above bound on length, simulating them against the system model, and choosing a set that covers all the reachable output transitions. This shows that the problem is computable. To show that it is r-hard, it suffices to observe that in a system model s, the reachability of an output transition can still depend on any eventually halting computation, including those with arbitrary complexities with respect to their inputs, and the inputs themselves can be encoded within the description of s itself in polynomial space. 2.2.4.3 Polynomially testable models We will now turn our attention to “polynomially testable” models, that is, models that require only polynomially long test inputs (as function of model size) and which can be executed in polynomial time with respect to the length of those inputs. Definition 9. A family of system models S is polynomially testable if there exist univariate polynomials P1 and P2 such that (1) the family has run-time complexity P1 and (2) the family has testing complexity P2. The next definition closes a loophole where a tester model could actually run a superpolynomial (exponential) algorithm to construct the test suite by forcing the test suite to be of exponential length. Since this is clearly not the intention of the present investigation, we will call tester construction strategies “efficient” if they do not produce tester models that construct exponentially too large test suites. Definition 10. A tester construction strategy S for a family S of system models is efficient if for every s ∈ S, the tester S(s) = t produces, when executed, a test suite whose size is O(P(f (|s|))) where f is the testing complexity of the family S. ∗This means that the problem is as hard as any computational problem that is still decidable. r stands for the complexity class of recursive functions. 36 Model-Based Testing for Embedded Systems Given a family of polynomially testable system models, it is possible to fix a polynomial P such that every system model s has a test suite that tests all its reachable output transitions and can be executed in P(|s|) steps. This is not a new definition but follows logically from the definitions above and the reader can verify this. The following proposition now demonstrates that constructing efficient testers for polynomially testable models is np-complete. 
This means that it is hard, but some readers might wonder if it actually looks suspiciously simple as np-complete problems are still relatively easy when compared for example to pspace-complete ones, and it was argued in the introduction that for example the reachability problem for synchronized state machines is already a pspace-complete problem. But there is no contradiction—the reason why tester construction appears relatively easy here is that the restriction that the system models must execute in polynomial time is a complexity-wise severe one and basically excludes system models for computationally heavy algorithms. Proposition 3. Efficient tester construction is np-complete for families of polynomially testable system models, that is, unless p = np, efficient tester construction strategies for polynomially testable system models cannot in general have polynomial run-time complexity bounds. Proof outline. We give an outline of reductions in both directions. First, to show nphardness, consider a family of system models, each encoding a specific Boolean satisfaction problem (sat) instance. If the length of an encoded sat instance is , the corresponding system model can have control states that proceed in succession when the machine starts and write the encoding one symbol at a time on the internal tape. Then, the machine enters in a general portion that reads an assignment of Boolean values to the variables of the sat instance from the input tape and outputs either “0” or “1” on the output tape depending on whether the given assignment satisfies the sat instance or not. It is easy to see that the system model can be encoded in a time polynomial in , that is, O(P( )) for a fixed polynomial P. If there would exist a polynomial-time tester construction strategy S for this family of models, sat could be solved in polynomial time by (1) constructing the system model s as above in O(P( )) time, (2) running S(s) producing a tester model t in time polynomial in |s| as it must be that |s| = O(P( )) also, and (3) running t still in polynomial time (because S is efficient) and producing a test suite that contains a test case to cover the output transition “1” iff the sat instance was satisfiable. To show that the problem is in NP, first note that because the output transitions of the system model can be easily identified, the problem can be presented in a form where every output transition is considered separately—and the number of output transitions in the system model is clearly polynomial in the size of the machine’s encoding. Now, for every single output transition, a nondeterministic algorithm can first guess a test input that covers it and then simulate it against the system model in order to verify that it covers the output transition in question. Because the system model is polynomially testable, this is possible. The individual tester models thus constructed for individual output transitions can be then chained together to form a single tester model that covers all the reachable output transitions. 2.2.4.4 System models with little internal state In this section, we consider system models with severely limited internal storage, that is, machines whose internal read/write tape has only bounded capacity. This corresponds to system models that are explicitly represented finite-state machines. Functional Test Generation 37 Proposition 4. Polynomial-time tester construction strategies exist for all families of system models with a fixed bound B on the length of the internal tape. 
Proof outline. Choose any such family. Every system model in the family can have at most |Q||Σ|B internal states and the factor |Σ|B is obviously constant in the family. Because |Q| = O(|s|) for every system model s, it follows that the complete reachability graph for the system model s can be calculated in O(P(|s|)) time for a fixed polynomial P. The total length of the test suite required to test the output transitions in the reachability graph is obviously proportional to the size of the graph, showing that the tester can be constructed in polynomial time with respect to |s|. 2.2.5 Discussion In general, constructing tester models is an undecidable problem (Proposition 1). This means that in general, it is impossible to create tester models from system models if there is a requirement that the tester models must actually be able to test all reachable parts of the corresponding system models. If there is a known bound on the length of required tests, tester model construction is decidable (because system models are assumed to halt on all inputs) but of unbounded complexity (Proposition 2). This shows that tester construction is substantially difficult, even when one must not search for arbitrarily large test inputs. The reason is that system models can be arbitrarily complex internally even if they do not consume long input strings. In the case of system models, which can be tested with polynomial-size test inputs and can be simulated efficiently, tester construction is an np-complete problem (Proposition 3). This shows that even when there are strict bounds on both a system model’s internal complexity as well as the complexity of required test inputs, tester model construction is hard. In the case of explicitly represented finite-state machines, tester models can be constructed in polynomial time, that is, efficiently (Proposition 4), demonstrating the reason why the dichotomy that is the subject of this chapter does not surface in research that focuses on explicitly represented finite-state machines. Thus, it has been demonstrated that constructing tester models is hard. Assuming that producing system models is a translation problem, it can be concluded that computerreadable tester models are harder to construct than computer-readable system models. We will now move on to examine the practical approaches to system model- and tester model-driven test generation. One of our goals is to show that the theoretically predicted results can be actually empirically verified in today’s practice. 2.3 Practical Approaches In this section, we survey some of the currently available or historical approaches for modelbased testing with emphasis on the system model versus tester model question. Not all available tools or methodologies are included as we have only chosen a few representatives. In addition, general academic work on test generation from finite-state models or other less expressive formalisms is not included. 2.3.1 System-model-driven approaches At the time of writing this chapter, there are three major system-model-driven test generation tools available on the market. 38 Model-Based Testing for Embedded Systems 2.3.1.1 Conformiq Designer Conformiq Designer is a tool developed by Conformiq that generates executable and humanreadable test cases from behavioral system models. The tool is focused on functional blackbox testing. The modeling language employed in the tool consists of UML statecharts and Java-compatible action notation. 
All usual Java constructs are available, including arbitrary data structures and classes as well as model-level multithreading (Huima 2007, Conformiq 2009a). The tool uses internally constraint solving and symbolic state exploration as the basic means for test generation. Given the computational complexity of this approach, the tool encounters algorithmic scalability issues with complex models. As a partial solution to this problem, Conformiq published a parallelized and distributed variant of the product in 2009 (Conformiq 2009b). 2.3.1.2 Smartesting Test Designer Smartesting Test Designer generates test cases from behavioral system models. The tool uses UML statecharts, UML class diagrams, and OCL-based action notation as its modeling language. In the Smartesting tool, the user must enter some of the test data in external spreadsheets instead of getting the data automatically from the model (Fredriksson 2009), but the approach is still system model driven (Smartesting 2009). The tool uses internally constraint solving and symbolic state exploration as the basic means for test generation. 2.3.1.3 Microsoft SpecExplorer Microsoft SpecExplorer is a tool for generating test cases from system models expressed in two possible languages Spec# (Barnett et al. 2005) and the Abstract State Machine Language (AsmL) (Gurevich, Rossman, and Schulte et al. 2005). Spec# is an extended variant of C#. Developed originally by Microsoft Research, the tool has been used to carry out model-based testing of Windows-related protocols inside Microsoft. The tool works based on system models but avoids part of the computational complexity of test case generation by a methodology the vendor has named “slicing.” In practice, this means reducing the system model’s input data domains into finite domains so that a “slice” of the system model’s explicitly represented state space can be fully calculated. The approach alleviates the computational complexity but puts more burden on the designer of the system model. In practice, the “slices” represent the users’ views regarding the proper testing strategies and are also educated guesses. In that sense, the present SpecExplorer approach should be considered a hybrid between system-model- and tester-model-driven approaches (Veanes et al. 2008, Microsoft Research 2009). 2.3.2 Tester-model-driven approaches As predicted in the discussion above, there are more usable tester-model-driven tools available on the market than there are usable system-model-driven tools. In this section, we mention just some of the presently available or historical tester-model-driven tools and approaches. 2.3.2.1 UML Testing Profile The UML Testing Profile 1.0 (UTP) defines a “language for designing, visualizing, specifying, analyzing, constructing and documenting the artifacts of test systems. It is a test modeling language that can be used with all major object and component technologies and Functional Test Generation 39 applied to testing systems in various application domains. The UML Testing Profile can be used stand alone for the handling of test artifacts or in an integrated manner with UML for a handling of system and test artifacts together” (Object Management Group 2005). As in itself, UTP is not a tool nor even exactly a methodology but mostly a language. However, methodologies around UTP have been defined later, such as by Baker et al. (2007). For example, UML message diagrams can be used to model testing scenarios and statecharts to model testing behaviors. 
There exist also tools, for example, for converting UTP models into TTCN-3 code templates. 2.3.2.2 ModelJUnit ModelJUnit is a model-based test generation tool that generates tests by traversal of extended finite-state machines expressed in Java code. It is clearly a tester-model-driven approach given that the user must to implement, as part of model creation, methods that will “include code to call the methods of your (system under test) SUT, check their return value, and check the status of the SUT” (Utting et al. 2009). The tool itself is basically a path traversal engine for finite-state machines described in Java (Utting and Legeard 2006). 2.3.2.3 TestMaster The Teradyne Corporation produced a model-based testing tool called TestMaster, but the product has been since discontinued, partially because of company acquisitions, which orphanized the TestMaster product to some extent. The TestMaster concept was based on creating finite-state machine models augmented with handcrafted test inputs as well as manually designed test output validation commands. The tool then basically generated different types of path traversals through the finite-state backbone and collected the input and output commands from those paths into test scripts. Despite of being nowadays discontinued, the product enjoyed some success at least in the telecommunications domain. 2.3.2.4 Conformiq Test Generator Conformiq Test Generator is a historical tool from Conformiq that is no longer sold. On a high level, it was similar to TestMaster in concept, but it focused on online testing instead of test script generation. Also, the modeling language was UML statecharts with a proprietary action notation instead of the TestMaster’s proprietary state machine notation. Conformiq Test Generator was also adopted by a limited number of companies before it was discontinued. 2.3.2.5 MaTeLo MaTeLo is a tool for designing testing strategies and then generating test cases using a statistical, Markov-chain related approach (Dulz and Zhen 2003). This approach is often called statistical use case modeling and is a popular method for testing, for instance, user interfaces. 2.3.2.6 Time Partition Testing Time Partition Testing (TPT) is a method as well as a tool from Piketec. It combines test case or test strategy modeling with combinatorial generation of test case variation based on the testing strategy model. In addition to the test inputs, the user also implements the desired output validation criteria (Bringmann and Kra¨mer 2006). TPT has special focus on the automotive industry as well as continuous-signal control systems. 40 Model-Based Testing for Embedded Systems 2.3.3 Comparison The approaches based on a tester model that are presented above are all fundamentally based on path traversals of finite-state machines, even though in the MaTeLo approach the traversals can be statistically weighted and the TPT system adds finite, user-defined combinatorial control for path selection. In all these approaches, the user must define output validation actions manually because no system model exists that could be simulated with the generated test inputs to produce the expected outputs. The path generation procedures are well understood, relatively simple to implement, and of relatively low practical complexity. 
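The shared core of the tester-model-driven tools surveyed above can be sketched as follows. The sketch is deliberately generic; it mirrors the TestMaster/ModelJUnit style described in the text but does not reproduce any particular tool's API, and every name in it is illustrative. The essential point is that the user hand-writes both the stimuli and the output-validation predicates, and the tool merely collects them along paths through a finite-state strategy model.

import random

# A hypothetical testing-strategy model: each transition bundles a handcrafted
# stimulus with a handcrafted validation predicate.  Nothing here predicts the
# system's output; the checks encode the human-designed testing strategy.
STRATEGY_MODEL = {
    "LoggedOut": [("login admin/secret", lambda r: "welcome" in r, "LoggedIn")],
    "LoggedIn":  [("add row 42",         lambda r: "ok" in r,      "RowAdded"),
                  ("logout",             lambda r: True,           "LoggedOut")],
    "RowAdded":  [("delete row 42",      lambda r: "deleted" in r, "LoggedIn")],
}

def generate_script(model, start="LoggedOut", steps=6, seed=0):
    """Collect one test script by a random walk over the strategy model.
    Real tools use coverage-directed, combinatorial, or statistically
    weighted traversals instead of a plain random walk."""
    rng, state, script = random.Random(seed), start, []
    for _ in range(steps):
        stimulus, check, next_state = rng.choice(model[state])
        script.append((stimulus, check))
        state = next_state
    return script

def run_script(script, send_to_sut):
    """Execute an offline-generated script through a SUT adapter, which is
    assumed to return the SUT's textual response to each stimulus."""
    return all(check(send_to_sut(stimulus)) for stimulus, check in script)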
The system-model-driven approaches, on the other hand, are all, based on reports from their respective authors, founded on systematic exploration of the system model’s state space, and they aim at generating both the test inputs as well as the expected test outputs automatically by this exploration procedure. For the modeling formalisms available in Conformiq Designer, Smartesting Test Designer, and SpecExplorer, the reachability problem and henceforth test generation problem is undecidable. A finite-state machine path traversal cannot be applied in this context because every simulated input to the system model can cause an infinite or at least hugely wide branch in the state space, and it can be difficult to derive a priori upper bounds on the lengths of required test inputs (Huima 2007). The tester-model-driven tools have two distinguishing benefits: (1) people working in testing understand the concept of a (finite-state) scenario model easily and (2) the tools can be relatively robust and efficient because the underlying algorithmic problems are easy. Both these benefits correspond to handicaps of the system-model-driven approach as (1) test engineers can feel alienated toward the idea of modeling the system directly and (2) operating the tools can require extra skills to contain the computational complexity of the underlying procedure. At the same time, the system-model-driven test generation approach has its own unique benefits, as creating system models is straightforward and less error prone than constructing the corresponding tester models because the mental steps involved in actually designing testing strategies are simply omitted. 2.4 Compositionality An important practical issue which has not been covered above is that of compositionality of models. Consider Figure 2.10, which shows two subsystems A and B that can be composed together to form the composite system C. In the composite system, the two subsystems are connected together via the pair of complementary interfaces b and b . The external interfaces a b A b′ c B FIGURE 2.10 System composition. a b/b′ c A B C Functional Test Generation 41 of C are then a and c, that is, those interfaces of A and B that have not been connected together and are thus hidden from direct observation and control. It is intuitively clear that a system model for A and system model for B should be able to be easily connected together to form the system model for the composite system C, and this has been also observed in practice. This leads to a compositional approach for model-based testing where the same models can be used in principle to generate component, function, system, and end-to-end tests. When system models are composed together basically all the parts of the component models play a functional role in the composite model. However, in the case of test models, it is clear that those parts of the models that are responsible for verifying correct outputs on interfaces hidden in the composite system are redundant; removing them from the composite model would not change how tests can be generated. This important observation hints toward the fact that testing strategy models are not in practice compositional—a practical issue that will be elaborated next. The two key problems are that (1) tester models do not predict system outputs fully, but usually implement predicates that check only for certain properties or parts of the system outputs, and that (2) tester models typically admit only a subset of all the possible input sequences. 
To consider the issue (1) first, suppose that Figures 2.11 and 2.12 represent some concrete testing patterns for the components A and B, respectively. Assuming M2 = P1 and M3 = P4, it appears that these two patterns can be composed to form the pattern shown in Figure 2.13. For system models, it is easy to describe how this kind of composition is achieved. Consider first A as an isolated component. A system-model-driven test generator first “guesses” inputs M1 and M3 (in practice using state-space exploration or some other method) and can then simulate the system model with the selected inputs in order to obtain the predicted outputs M2 and M4. Now, the system model for A is an executable description of how the component operates on behavioral level, so it is straightforward to connect it with FIGURE 2.11 A testing pattern. FIGURE 2.12 A testing pattern. a A b M1 M2 M3 M4 b′ B c P1 P2 P3 P4 42 Model-Based Testing for Embedded Systems a A B c M1 M2 = P1 P2 P3 M3 = P4 M4 FIGURE 2.13 A composed testing pattern. the system model for B in the same way as two Java classes, say, can be linked together by cross-referencing to create a compound system without any extra trickery. Now, the system model generator first “guesses” inputs M1 and P3 using exactly the same mechanism (the integrated model containing both A and B is simply another system model). Then, the integrated model is simulated to calculate the expected outputs. First, the chosen value for M1 is sent to that part of the integrated model which corresponds to the component model A. Then, this component sends out the predicted value for M2. It is not recorded as an expected output but instead sent as an input to the component model for B; at this point, the output M2 becomes the input P1. Then, the simulated output from component B with input P1 becomes the expected value for the output P2. The output value M4 is calculated analogously. It is clear that this is a compositional and modular approach and does not require extra modeling work. For tester models, the situation is less straightforward. Consider, for the sake of argument, an extended finite-state model where every admissible (fulfilling path conditions) path through the state machine generates a test case. Now, some such paths through a tester model for the component A generate test cases for the scenario above. Instead of being a sequence of input and output messages, it is actually a sequence of (1) input messages and (2) predicates that check the corresponding outputs. Thus, a particular test case corresponding to the component A scenario looks like !m1, ?φ2, !m3, ?φ4, (2.4) where m1 is a value for M1 and m3 is value for M3, but φ2 and φ4 are user-defined predicates to check the correctness of the actual outputs corresponding to M2 and M4. One problem now is that φ2 in practice returns true for multiple concrete outputs. This is how people in practice create tester models because they want to avoid the labor of checking for details in observed outputs, which actually do not contribute toward testing the present testing purposes. Namely, suppose now that we also have a tester model for the component B, and similarly we can generate test sequences consisting of inputs and validation predicates from that model. Let such a sequence be denoted !p1, ?ψ2, !p3, ?ψ4 (2.5) analogously to what was presented above. Is it now true that if φ2(p1) and ψ4(m3) evaluate to true the sequence !m1, ?ψ2, !p3, ?φ4 is guaranteed to be a valid test case for the compound system? The answer is no! 
Consider the first output check ψ2 in the compound system test case. It checks the actual value of the output P2 produced by the component B. This value in general depends on the input P1, which must have the value p1 from (2.5) in order for ψ2 to return true, because the predicate ψ2 was derived assuming that input in the first place. However, there is no guarantee that the component A will produce the output p1 when given the input m1, as the only guaranteed assertion is that the output after m1 fulfills φ2, and φ2 and p1 do not necessarily have any relationship to each other because they come from two independent and separate models.

This discussion has so far focused on the fact that compositionality breaks for tester models that do not predict system outputs fully. The other problem mentioned previously was that tester models do not usually admit all required input sequences. Continuing the present example, it is plausible that for a particular test sequence (2.5), there are no test sequences (2.4) that could be generated from the tester model for component A such that the output triggered by m1 would actually match p1, which is a necessary condition for an integrated test case to exist. The reason why this is possible is that a tester model can be very far from admitting all possible input sequences. As a matter of fact, the entire idea of use case modeling, which is very near to tester modeling, is to focus the models on a set of representative, interesting input sequences. Now, when the component models for A and B are created, possibly independently, there are no guarantees that the outputs from the model A would match the inputs generated from the model B. But to even get to this point would first require the tester models to generate the full expected outputs in order to avoid the problem with partially specified outputs, which was highlighted first.

It can hence be concluded that the current tester-model-based tooling and methodology leads to noncompositionality of tester models. We conjecture that fully compositional tester models are indistinguishable from the corresponding system models, because it seems that it is the capability to predict system outputs that makes models compositional in the usual sense of component-based software compositionality. On the other hand, the compositionality of system models may also be of limited value, because testing subsystems (such as A and B) separately based on their models leads to stronger testing than testing an integrated system (such as the composite formed of A and B) based on the integrated model. The reason is that in the integrated system, it can be impossible to trigger, for example, error conditions around the internal, hidden interfaces.

2.5 Scalability

The underlying problem of deriving test cases from a (mental) system model is difficult, so both the system-model and the tester-model approaches must suffer from scalability challenges. This is a natural consequence of the material presented above. The scalability challenges, however, differ. The main scalability challenge for tester-model-driven test generation is the human operators' ability to construct good tester models for increasingly complex systems under test. For system-model-driven test generation, the major scalability challenge is the algorithmic infeasibility of deriving a comprehensive test suite from increasingly complex system models. This leads to the situation depicted in Figure 2.14.
For the pure system-model-driven paradigm, the main scalability problem is algorithmic complexity. The complexity grows when the system under test becomes more complex from a testing perspective (leftmost dashed arrow). For the tester-model-driven approach, the main scalability issue is the cognitive difficulty of producing and maintaining good tester models (rightmost dashed arrow). Some solutions, SpecExplorer for instance, provide a hybrid methodology aiming to strike a balance between the two ends of the spectrum (dotted line in the figure). This leads to two practical, context-dependent questions: (1) should one embrace the system-model-driven or the tester-model-driven approach, and (2) how can these two methodologies be predicted to evolve in the future?

FIGURE 2.14 Scalability issues. (The figure places the approaches on axes of algorithmic complexity versus mental complexity, with system models at one end, tester models at the other, and hybrid approaches in between.)

The best answer to (1) is that tools and methodologies that work best should be embraced. The system-model-driven approach is theoretically attractive, compositional, and requires less manual work than the tester-model-driven approach, but it can fail because of inadequate tooling. Given our current affiliation with Conformiq Inc., we can be open about the fact that none of the currently available system-model-driven test generation tools are known to scale to complex models without challenges, even though the exact nature of those challenges is tool specific. In contrast, the tester-model-driven approach is easily supported by robust (both free and commercial) tools, but it still leaves the design of testing strategies to the user, and thus provides less upside for productivity improvement. In some contexts, the present processes may enforce, for example, a use-case-centric test design methodology, which may make certain types of tester-model-driven tools attractive.

To answer (2), observe that given that the main challenge for system-model-driven test generation is one of algorithmic complexity, it can be predicted that this approach will gain in capacity and popularity in the future. It will follow the same trajectory as, for instance, programming languages and hardware circuit design methods have followed in the past. When high-level programming language compilers came, they eventually replaced handwritten object code. Automatic circuit layout replaced manual layout, and since the 1990s digital systems have been verified not by hand but by computerized methods based on recent advances in algorithmic problems such as Boolean satisfiability. There is no fundamental reason to believe that the same transition would not take place in due time around the algorithmic challenges of system-model-driven test generation.

2.6 Nondeterministic Models

The complexity-theoretic analysis above excludes nondeterministic system models, that is, system models whose behavior is not completely determined by their inputs. Most of the offline test generation tools, that is, tools which generate executable test scripts, do not support nondeterministic systems because the test scripts are usually linear. However, SpecExplorer supports nondeterministic system models even though the slicing requirement forces the nondeterminism on the system's side to cause only finite and in practice relatively narrow branching in the state space.
SpecExplorer exports the computed testing strategy as a finite-state model, and it can then be executed by a test execution subsystem that supports branching based on the system under test's variable responses. Conformiq Designer originally supported online testing of nondeterministic systems based on nondeterministic system models, but the support was removed later. Similarly, Conformiq Test Generator, a tester-model-driven tool, was capable of running online tests against a nondeterministic system.

The two main reasons why nondeterministic systems were excluded above are that (1) it is complicated to define what a "complete" test suite is for a nondeterministic system, because the actual model-based test coverage can be calculated only during test execution and depends on how the system under test implements its nondeterministic choices; and (2) in practice, it is a recognized principle that a system should be as deterministic as possible in order to admit good testing. This second point is certainly a methodological one and is related more to practice than theory. That being said, the conclusions presented previously in this chapter can be transferred to the case of nondeterministic systems as well: tester models are difficult to construct for nondeterministic systems too (even more so), and generating tester models from system models of nondeterministic systems is computationally difficult (even more so). So the main arguments stand.

2.7 Conclusions

The tester-model-driven test generation approach is well established in the industry and has been adopted in its different forms by a multitude of engineering teams. The system-model-driven solution in its modern form was developed in the early 2000s by several independent research teams and is currently, at the time of writing this chapter, in its early adoption phase.

Ultimately, the system model versus tester model dichotomy is about the computation platform's capability to deliver an algorithmically complex result (system-model-driven test generation) versus organizations' capability and desire to spend human labor to carry out the same task by hand. Thus, it is a choice between either an industrial, mechanical solution or a solution using human labor. To put it succinctly, the choice is a matter of industrialization. Thus, given the history of industrialization in general as well as in the area of software engineering, it can be safely and confidently predicted that as science continues its perpetual onward march, the system-model-driven test generation approach will ultimately become dominant over the tester-model-driven solution. This would be a logical conclusion of the presented facts, as illustrated by Figure 2.15. The timescale of that transition is, however, yet veiled from our eyes.

FIGURE 2.15 Industrialization of test design. (The figure places manual test design, tester models, system models, and an open question mark along axes of computational capability versus use of human labor.)

References

Aho, A. V., Dahbura, A. T., Lee, D., and Uyar, M. Ü. (1991). An optimization technique for protocol conformance test generation based on UIO sequences and rural Chinese postman tours. IEEE Transactions on Communications, 39(11):1604–1615.

Baker, P., Dai, Z. R., Grabowski, J., Haugen, Ø., Schieferdecker, I., and Williams, C. (2007). Model-Driven Testing Using the UML Testing Profile. Springer.

Barnett, M., Leino, K. R. M., and Schulte, W. (2005). The Spec# programming system: An overview.
In Construction and Analysis of Safe, Secure, and Interoperable Smart Devices, Lecture Notes in Computer Science, pages 49–69.

Bland, R. G., Goldfarb, D., and Todd, M. J. (1981). Ellipsoid method: A survey. Operations Research, 29(6):1039–1091.

Bringmann, E., and Krämer, A. (2006). Systematic testing of the continuous behavior of automotive systems. In International Conference on Software Engineering, pages 13–20. ACM.

Conformiq. (2009a). http://www.conformiq.com/.

Conformiq. (2009b). http://www.conformiq.com/news.php?tag=qtronic-hpc-release.

Demri, S., Laroussinie, F., and Schnoebelen, P. (2006). A parametric analysis of the state-explosion problem in model checking. Journal of Computer and System Sciences, 72(4):547–575.

Dulz, W., and Zhen, F. (2003). MaTeLo—statistical usage testing by annotated sequence diagrams, Markov chains and TTCN-3. In Third International Conference on Quality Software, pages 336–342. IEEE.

Duran, J. W., and Ntafos, S. C. (1984). Evaluation of random testing. IEEE Transactions on Software Engineering, SE-10(4):438–444.

Edmonds, J., and Johnson, E. L. (1973). Matching, Euler tours and the Chinese postman. Mathematical Programming, 5(1):88–124.

Fredriksson, H. (2009). Experiences from using model based testing in general and with Qtronic in particular. In Fritzson, P., Krus, P., and Sandahl, K., editors, 3rd MODPROD Workshop on Model-Based Product Development. See http://www.modprod.liu.se/workshop 2009.

Ghosh, S., and Mathur, A. P. (1999). Issues in testing distributed component-based systems. In First International ICSE Workshop on Testing Distributed Component-Based Systems.

Gurevich, Y., Rossman, B., and Schulte, W. (2005). Semantic essence of AsmL. Theoretical Computer Science, 343:370–412.

Huima, A. (2007). Implementing Conformiq Qtronic. In Petrenko, A., et al., editors, Testing of Software and Communicating Systems, number 4581/2007 in LNCS, pages 1–12. Springer, Berlin/Heidelberg.

Karp, R. M. (1972). Reducibility among combinatorial problems. In Miller, R. E., and Thatcher, J., editors, Complexity of Computer Computations, pages 85–103. Plenum.

Matiyasevich, Y. (1993). Hilbert's 10th Problem. The MIT Press.

Microsoft Research. (2009). http://research.microsoft.com/en-us/projects/specexplorer/.

Nelder, J. A., and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4):308–313.

Object Management Group. (2005). UML testing profile 1.0. Published standard.

Papadimitriou, C. H. (1981). On the complexity of integer programming. Journal of the ACM, 28(4):765–768.

Papadimitriou, C. H. (1993). Computational Complexity. Addison-Wesley.

Pretschner, A. (2005). Model-based testing in practice. In Proc. Formal Methods 2005, number 3582 in LNCS, pages 537–541. Springer.

Smartesting. (2009). http://www.smartesting.com/.

Utting, M., and Legeard, B. (2006). Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann.

Utting, M., Perrone, G., Winchester, J., Thompson, S., Yang, R., and Douangsavanh, P. (2009). http://www.cs.waikato.ac.nz/~marku/mbt/modeljunit/.

Veanes, M., Campbell, C., Grieskamp, W., Schulte, W., Tillmann, N., and Nachmanson, L. (2008). Model-based testing of object-oriented reactive systems with Spec Explorer. In Hierons, R. M., et al., editors, Formal Methods and Testing, number 4949/2007 in LNCS, pages 39–76.

3 Test Framework Architectures for Model-Based Embedded System Testing

Stephen P. Masticola and Michael Gall

CONTENTS
3.1 Introduction  49
3.1.1 Purpose and structure of this chapter  50
3.1.2 Quality attributes of a system test framework  51
3.1.3 Testability antipatterns in embedded systems  52
3.2 Preliminary Activities  52
3.2.1 Requirements gathering  53
3.2.2 Evaluating existing infrastructure  54
3.2.3 Choosing the test automation support stack  54
3.2.4 Developing a domain-specific language for modeling and testing the SUT  56
3.2.5 Architectural prototyping  56
3.3 Suggested Architectural Techniques for Test Frameworks  57
3.3.1 Software product-line approach  57
3.3.2 Reference layered architecture  58
3.3.3 Class-level test framework reference architecture  60
3.3.4 Methods of device classes  62
3.3.5 Supporting global operations  65
3.3.6 Supporting periodic polling  66
3.3.7 Supporting diagnosis  67
3.3.8 Distributed control  68
3.4 Brief Example  69
3.5 Supporting Activities in Test Framework Architecture  70
3.5.1 Support software  71
3.5.2 Documentation and training  72
3.5.3 Iterating to a good solution  73
References  73
3.1 Introduction

Model-based testing (MBT) (Dias Neto et al. 2007) refers to the use of models to generate tests of components or entire systems. There are two distinct kinds of MBT, which we will call here behavior based (Utting and Legeard 2006) and use based (Hartmann et al. 2005), depending on whether we are modeling, respectively, the system under test (SUT) itself or the use cases which the SUT is intended to support. Regardless of the modeling technique used, once the tests are generated from the model, they must be executed. If possible, automated execution is preferable to reduce cost and human error. Automation usually requires some sort of a test harness around the SUT. This is especially true for embedded systems. In designing test harnesses, we are concerned not with the modeling itself, but with executing the tests that are generated from the models. Test engineers will frequently also want to write some tests manually, as well as generate them from models.

Test harnesses for embedded systems are employed in both production testing to identify manufacturing defects and engineering testing to identify design defects. Modern test harnesses for either type of application are almost universally controlled by software. Software control allows the test harness to exercise the SUT thoroughly and repeatably. It is useful to divide test harness software into two categories: test scripts, which specify how the SUT is to be exercised, and the test framework, which runs the scripts and performs other "housekeeping" functions such as logging test results. Test scripts are almost always written by the test team that must test the SUT. The test framework is usually some combination of commercial and purpose-built software. Figure 3.1 shows the test framework and the types of script creation that must be supported in MBT. To some degree, the test framework is typically customized or custom designed for the SUT.

FIGURE 3.1 Test generation and execution process. (The test engineer either creates a model from which test scripts are generated or creates test scripts directly; the test framework loads the scripts and controls and monitors the SUT.)

3.1.1 Purpose and structure of this chapter

This chapter describes a reference architecture for a test framework for embedded systems. The frameworks we describe are model based, in the sense that the systems under test are explicitly modeled in specialized scripting languages. This naturally supports a use-based MBT context. The remainder of Section 3.1 presents several common quality goals in software test frameworks that control test harnesses for testing embedded and mechatronic systems, and testability "antipatterns" in the systems under test that such frameworks must support. The rest of this chapter describes a practical way in which a test system architect can meet
Section 3.4 then presents a brief description of an implementation of the reference architecture more fully described in Masticola and Subramanyan (2009). Finally, Section 3.5 reviews supporting activities, such as iterating to a satisfactory architecture, planning for support software, and creating and presenting documentation and training materials. Throughout this chapter, we will note “important points” and “hints.” The important points are facts of life in the sort of MBT automation frameworks that we describe. The hints are methods we have found useful for solving specific problems. 3.1.2 Quality attributes of a system test framework The quality of the test framework’s architecture has a major impact on whether an automated testing project succeeds or fails. Some typical quality attributes of a test framework architecture include: • Ease of script development and maintenance, or in other words, the cost and time required to write a test suite for the SUT. This is a concern for manually developed scripts, but not for scripts automatically generated from a model. If scripts are manually written, then they must be as easy as possible to write. • Ease of specialization to the SUT, or the cost and time to adapt the test framework to the SUT. This is especially a concern if multiple systems have to be tested by the same test group. • Endurance. This is necessary to support longevity or soak testing to make sure that the performance of the SUT remains acceptable when it is run for long periods of time. The test framework thus must be able to run for such long periods, preferably unattended. • Scalability. If the SUT can scale, then the test framework must support a corresponding scaling of the test harness. • Ability to interchange and interoperate simulators and field hardware. Simply modifying the test framework may not be enough. For best productivity in test development, the test framework should support such interchange without modifying the test scripts. Easy integration with simulation environments is a necessary precondition. • Support for diagnosis. The test framework must support engineers in diagnosing failures from logged data in long-running tests, without sacrificing ease of script development. These failures include both failures in the SUT and failures in the test harness. Diagnosis support from logs is particularly important in scenarios where the tests are run unattended. • Timing accuracy. The requirements for time measurement accuracy for a specific SUT sometimes contain subtleties. For example, it may be necessary to measure reaction time to only 100 ms, but synchronization to 5 ms. Monitoring overhead and jitter should be kept as low as practicable. 52 Model-Based Testing for Embedded Systems • MBT environment support. Easy integration to MBT environments, especially to test generation facilities, enhances the value of both the test framework and the MBT tools. • Flexible test execution. Support for graphical user interface (GUI), manual test execution, and automated test execution, may be needed, depending on the SUT. 3.1.3 Testability antipatterns in embedded systems Embedded systems have certain recurring particular problems. We cannot always avoid these “antipatterns” (Brown et al. 1998). Instead, the best we can do is live with them. We list some testability antipatterns and workarounds for them here. The SUT does not have a test interface. 
Designing, programming, and maintaining a test interface in an embedded system cost money and other resources that are often in short supply. When budgets must be tightened, test interfaces are often one of the features that are considered disposable. In this case, we are forced to either modify the SUT or, somehow, mimic its human users via automation. At the very worst, we can try to use robotics technology to literally push the buttons and watch the screen. Sometimes, though, the situation is better than that, and we can use extensibility mechanisms (such as buses for plug-in modules or USB interfaces) that already exist in the SUT to implement our own test interface. The SUT has a test interface, but it does not work well enough. Controllability and runtime state sensing of the SUT may not be sufficient to automate the tests you need, or may not be set up in the correct places in the system. Alternatively, the communication format might be incorrect—we have seen systems in which it was possible to subscribe to stateupdate messages, but where we could never be sure that we had ever obtained a baseline state to update. We know of no elegant solution to this antipattern short of the redesign of the SUT’s test interface. Modifying the SUT for test is not possible. In the absence of a reasonable test interface, it is tempting to try to solder wires onto the SUT, or otherwise modify it so that tests can be automated. There are a variety of reasons why this may not be possible. The technology of the SUT may not permit this, or the SUT may be an expensive or one-of-a-kind system such that the stakeholders may resist. The stakeholders might resist such modifications for fear of permanent damage. Regardless of the reason, we are forced to test the SUT in a way that does not modify it, at least not permanently. Stakeholders may often be willing to accept temporary modifications to the SUT. The SUT goes into an unknown state, and the test framework must recover. Here, we are trying to run a large test suite and the SUT becomes unstable. In high-stakes and agile projects, we cannot afford to let the SUT remain idle. We have to force it to reset and resume testing it. Working around this testability antipattern often requires extra support in the hardware and software of the test harness. Putting this support into place can, with good fortune, be done without running afoul of any other antipattern. In addition, before we initiate recovery, we must ensure that all diagnostic data are safely recorded. 3.2 Preliminary Activities Before we architect a test framework for the SUT, we must conduct some preliminary information-gathering and creative activities. This is true regardless of whether we are Test Framework Architectures for MBT 53 custom creating a test framework, specializing one from a software product line, or, indeed, creating the product line. 3.2.1 Requirements gathering The first activity is to gather the requirements for testing the SUT (Berenbach et al. 2009). These testing requirements are different from the functional and quality requirements for the SUT itself and often are not well developed before they are needed. However, the testing requirements will always support verification of the SUT requirements. Some examples of the testing requirements that must be gathered are the following: • The SUT’s external interfaces. These interfaces between the SUT and its environment drive some aspects of the low-level technology selection for the test framework. 
• The scenarios that must be tested. Frequently, project requirements documents for the SUT will list and prioritize operational scenarios. These can be used as a starting point. Be aware, though, that not all scenarios can necessarily be found in the project documents. One such scenario is interchanging components with different firmware versions to test firmware compatibility. Important Point: When you capture these scenarios, keep track of the major components of the SUT with which the tester directly interacts and the actions that he or she performs to test the system. These will become the “nouns” (i.e., the grammatical subjects and objects) and “verbs” (i.e., actions) of the domain model of the SUT, which is a model of the SUT’s usage in, and interaction with, the domain for which it is intended. If you are capturing the scenarios as UML sequence diagrams (Fowler 2003), then the tester-visible devices of the SUT and the test harness are represented as the business objects in the sequence diagrams, and the actions are the messages between those business objects. The classes of the business objects are also usually captured, and they correspond to device classes in the test framework, as described in Section 3.3.3. As we will see in Section 3.2.4, these scenarios are crucial in forming the test model of the SUT. • Performance requirements of the test harness. These derive from the performance requirements of the SUT. For example, suppose the SUT is an intelligent traffic system. If it is supposed to scale to support 1000 traffic sensors, then the test harness must be capable of driving 1000 real or simulated traffic sensors at their worst-case data rates. If the SUT is a hard real-time or mechatronic system (Broekman and Notenboom 2003), then the time-measuring accuracy of the test harness will be important. These performance requirements will drive such basic decisions as whether to distribute control among multiple test computers. • Test execution speed. This requirement usually stems from the project parameters. If, for example, you are using the test harness to run smoke tests on the SUT for an overnight build and you must run 1000 test cases, then you will be in trouble if each test case takes any longer than about 30 s to run, including all setup and logging. • Quiescent state support. The SUT may, sometimes, have one or more idle states when it is running but there is no activity. We call such a state a quiescent state. It is very useful, at the start and end of a test case, to verify that the system is in a quiescent state. If a quiescent state exists, you will almost certainly want to support it in your test framework. 54 Model-Based Testing for Embedded Systems • Data collection requirements. In addition to simple functional testing, the test framework may also be called upon to gather data during a test run. This data may include the response time of the SUT, analog signals to or from the SUT, or other data streams. You will have to determine the data rates and the required precision of measurement. For very large systems in which the test framework must be distributed, clock synchronization of the test framework computers during data collection may become an important issue. You may also have to reduce some of the collected data while a test is running, for example, to determine pass/fail criteria for the test. If so, then supporting this data reduction at run time is important. This set of requirements is not exhaustive but only a starting point. 
Plan, at the start of the project, to enumerate the testing scenarios you will have to support. Be as specific as you can about the requirements of the test framework. The system test plan and system test design for the SUT can provide much of the information on the requirements of the test framework. For example, the system test plan will often have information about the SUT’s external interfaces, data collection requirements, performance requirements of the test harness, and at least a few scenarios to be tested. The system test design will provide the remainder of the scenarios. If the test framework is being designed before the system test plan, then the requirements gathered for the test framework can also provide information needed for the system test plan and at least some typical scenarios for the system test design. Interview the stakeholders for the test harness and people with knowledge you can use. The manual testers who are experienced with the SUT or systems like it are often the best source of information about what it will take to test the SUT. Developers can often tell you how to work around problems in the test interfaces because they have had to do it themselves. Project managers can provide the project parameters that your test harness must support. 3.2.2 Evaluating existing infrastructure Few projects are built in “green fields” anymore. There will almost always be existing test artifacts that you can reuse. Inventory the existing hardware and software infrastructure that is available for testing your system. Pay special attention to any test automation that has been done in the past. Evaluate ways in which improvement is needed—check with the stakeholders. Find out what the missing pieces are and what can be reused. You will have to make a decision about whether to use existing infrastructure. Be aware that some stakeholders have “pet” tools and projects, and they may plead (or demand) that you use them. You will have to evaluate on a case-by-case basis whether this is sensible. 3.2.3 Choosing the test automation support stack We define the test automation support stack as the generic hardware and software necessary to implement a test harness. The test automation support stack consists of three major subsystems: • The test executive, which parses and controls the high-level execution of the test scripts. The test executive may also include script editing and generic logging facilities. • The adaptation software, which adapts the high-level function calls in the test scripts to low-level device driver calls. The software test frameworks that this chapter describes are implemented in the adaptation software. Test Framework Architectures for MBT 55 • The low-level device drivers necessary for the interface between the test system and the hardware interface to the SUT. Hint: When deciding on the support stack, prefer single vendors who provide the entire stack, rather than mixing vendors. Vendors who provide the entire stack have a strong motivation to make the entire stack work together. Be prepared, however, to do some architectural prototyping to evaluate the support stack for your specific needs. See Section 3.2.5 for a description of architectural prototyping. A test executive is a system that executes test scripts and logs results. It is a key component of the test framework. Some examples of test executives include Rational System Test, National Instruments TestStand, and Froglogic Squish. 
Many test executives are patterned around integrated development environments and support features such as breakpoints, single-stepping, and variable inspection in the test scripts. It is the part of a test framework that the test engineer sees the most. Test executives are also often designed to support different markets, for example, GUI testing versus test automation for embedded and mechatronic products. You will have to decide whether to buy one or build one. Each of these choices leads to other decisions, for example, which test executive to buy, which scripting language platform to build your own test executive upon, etc. If you are going to buy the test executive, then evaluate the commercial off-the-shelf (COTS) alternatives. The choice you make should demonstrate, to your satisfaction, that it supports the requirements you have gathered. In most cases, you will want to experiment with evaluation versions of each COTS test executive to decide whether it will meet the testing requirements. If you decide that no COTS test executive meets your requirements, you will be forced to build one. In special circumstances, this may be a wise decision. If you make this decision, be aware that building a commercial-quality test executive is a lengthy and expensive process, and you will probably not be able to build one that is as sophisticated as the ones you can buy. In addition to deciding on your test executive, you may also have to decide on the scripting language that test engineers will use to write tests and that the MBT generator will emit. Many test executives allow you to choose from among several scripting languages, such as VBScript, Python, Perl, etc. The preferences and knowledge of the test team play a large part in making this choice. Vendors generally add at least some test-executive-specific functions to the scripting language’s standard library. Changes to syntax for purposes of vendor lock-in (commonly known as “vendorscripts”) should be avoided. Adaptation software is the software that glues the SUT to the test harness. Most of the rest of this chapter is about the architecture of the adaptation software. Important Point: The adaptation software should support a strong objectoriented development model. This is necessary in order to make the implementation of the test framework (as described here) tractable. At the lowest level of the adaptation, software is the interface between the test framework and the SUT (or the electronic test harness attached to the SUT). Often, a change of software technology is forced at this interface by availability of test harness components, and different software technologies do not always interoperate well. If the test harness supports it, you will also want to report events in the SUT to the test framework without having to poll the state of the SUT. The test framework must function efficiently through this interface, so you will likely have to do some architectural prototyping at the low-level 56 Model-Based Testing for Embedded Systems interface level to make sure that it will. (Refer to Section 3.2.5 for details on architectural prototyping.) 3.2.4 Developing a domain-specific language for modeling and testing the SUT A domain-specific language (DSL) (Kelly and Tolvanen 2008) is a computer language that has been created or adapted to model a specific domain. DSLs are not necessarily programming languages because their users are not necessarily doing programming. 
In test automation frameworks, a test automation DSL is a DSL that represents the SUT and the test framework in test scripts. A test automation DSL is often represented in the test framework as a library of functions that are used to access, control, and check the SUT. The test automation DSL is usually specialized for a specific SUT or class of similar SUTs. In the requirements gathering phase of creating a test framework (Section 3.2.1), you should start to understand what parts of the SUT you will have to control and monitor and what information must be passed. You should always assume that some scripting will have to be done manually to prevent forcing the test team to depend solely on the modelbased test generator. Therefore, the test automation DSL will have to have a good level of developer friendliness. The scenarios that you identified in the requirements gathering phase are the key input into developing the test automation DSL. They contain the objects of the SUT that the tester interacts with and the kinds of interactions themselves. Important Point: The test automation DSL should be specified from the point of view of a tester testing the system. Do not waste time specifying the entire test automation DSL in the early stages. It will change as the team learns more about automating the tests for the SUT. A test modeling DSL is used in use-based test modeling (as defined in Section 3.1). It is typically created by customizing a generic use-based test modeling language to the SUT. The design of the test automation DSL should be coordinated with the design of the test modeling DSL, as we mentioned in Section 3.2.1. Fortunately, this coordination can be done quite efficiently since the same requirements engineering and customization process can be used to create the test modeling DSL and the test automation DSL. A single DSL can thus be used for both test modeling and test automation, as long as the DSL includes the semantics necessary for both the test modeling and test automation tasks. Using the same DSL for both improves project efficiency. Often, a modeler will want to represent the same objects and activities in the models that a test developer is representing in their tests. Doing this will make both the models and the test scripts intuitively analogous to the SUT. 3.2.5 Architectural prototyping Before starting serious construction of the test framework, it is advisable to do some architectural prototyping (Bardram, Christensen, and Hansen 2004) to make sure that the known technical risks are addressed. An architectural prototype is a partial implementation of a risky part of the system and is created to ensure that the issue can be solved. Functionality and performance risks—in particular, requirements for scalability and timing accuracy—can often be addressed early by architectural prototyping. Architectural prototyping helps you work out the known problems, but not the unknown ones. Only a complete and successful implementation of the test framework can totally Test Framework Architectures for MBT 57 eliminate any risk that it will fail. The goal of architectural prototyping is not that stringent. Through architectural prototyping, you will eliminate the severe risks that you know about before development starts. 3.3 Suggested Architectural Techniques for Test Frameworks Once the preliminary activities have been completed, the test framework is architected and implemented. 
Here, we describe some architectural techniques that are useful in creating test frameworks for embedded systems, especially in model-driven environments.

3.3.1 Software product-line approach

A software product line (Clements and Northrup 2002) is a set of related systems that are specialized from reusable, domain-specific core assets (see Figure 3.2). Specialization of the core assets to a product may be done by hand or may be assisted by special-purpose tools. The great business advantage of a product-line approach is that it can allow the enterprise to efficiently create related systems or products. A product line of test frameworks would thus consist of core assets that could be specialized for a particular class of SUTs. If the SUT is configurable, if it is one of several similar systems you must test, or if the SUT is developed using a product-line approach,∗ then consider a product-line approach for the test framework.

FIGURE 3.2 Software product-line approach for test framework. (The test engineer specializes the core assets to generate the test framework, which controls and monitors the SUT.)

∗If the SUT is part of a product line, then the product line will usually include the test framework as a specializable core asset.

Trying to build a software product line before the first SUT has been placed under test will probably result in a suboptimal design. Generally, at least three instances of any product (including test frameworks) are necessary before one can efficiently separate core assets from specialized assets.

3.3.2 Reference layered architecture

For the following example, we assume that the SUT is a typical embedded system that contains several components that the test engineer must exercise. These include the following:

• A human–machine interface (HMI). We will assume that the HMI is interfaced to the test harness electronics via custom hardware.
• Several sensors of various types. These may be either actual hardware as intended for use in the SUT, or they may be simulators. Actual hardware is interfaced via custom electronics, while simulators are interfaced via a convenient standard bus.
• Several effectors of various types. Again, these may be actual or simulated.
• Adapters of various types for interfacing the SUT to other systems. These may be based on either proprietary technology or standard interface stacks.
• A logging printer, interfaced via RS-232.

Figure 3.3 shows a layered architecture for the test framework software. As we shall see, this layering system can accommodate product-line approaches or one-of-a-kind test framework implementations.

FIGURE 3.3 Layered architecture of the test framework. (Custom test scripts in the Test Script Layer, such as startup, normal operation, and network failure tests, sit on the adaptation layer formed by the Script Support Layer with script control, special logging, and reset; the Device Abstraction Layer with devices and components such as sensor, HMI, adapter, printer, and button; and the Hardware Interface Layer with digital I/O, RS-232, TCP socket, physical sensor, simulated sensor, and hardware simulator, all on top of vendor-provided infrastructure. The SSL core, DAL, and HIL form the core assets of the test framework.)

The layers are as follows:

• The Test Script Layer (TSL) contains the test scripts that execute on the test executive. These are written in the scripting language of the test executive, which has been extended with the test automation DSL.
Represent the SUT in the layers under the TSL. To maintain this clear boundary between the TSL and other layers, it may prove helpful to implement the TSL in a different language than the adaptation layer software uses, or to otherwise limit access to the adaptation layer software from test scripts. Fortunately, testers seem to prefer scripting languages to object-oriented (OO) languages, and the reverse is true for software developers, so this decision is culturally easy to support. • The Script Support Layer (SSL) contains helper functions for the test scripts. Some examples we have used include system configuration, resetting to a quiescent state, and checking for a quiescent state. Again, it is important to maintain clear layer boundaries. Since the SSL is generally written in the same programming language as the two layers below, it is more difficult to ensure separation of the layers. Hint: The SSL can be a good place to put test-script-visible functionality that concerns multiple devices or the entire SUT. If the functionality concerns single devices or components of devices, consider putting it in the layer(s) below. Functionality that concerns multiple devices does not always have to be kept in the SSL, though. For example, we have found that there is a need for Factory design patterns (Gamma et al. 1995) that yield concrete Device instances, even at the lowest layers. This functionality concerns multiple devices, but is not part of the test automation DSL and should not be accessed through the test scripts. In fact, we have put information-hiding techniques into place to prevent test scripts from directly accessing the device factories either accidentally or deliberately. Hint: You can also use the SSL to hide the layers below from the test scripts of the TSL. We have, for instance, used the SSL to hide incompatible technology in the lower layers from the test scripts, or to simplify the test automation DSL, by defining a common interface library within the SSL to act as a fac¸ade for the lower layers. • The Device Abstraction Layer (DAL) contains abstract classes that represent the devices in the SUT, but with an abstract implementation. Continuing the example from above, the device class representing an HMI would typically have methods for activating touchscreens and buttons, but the interface to the hardware that actually does this work must be left unspecified. The DAL contains both core assets and system-specific abstract device classes. • The Hardware Interface Layer (HIL) contains one or more concrete implementor class for each abstract device class. These implementor classes form the “view” of the SUT that is represented more abstractly in the DAL. Again continuing the same example, if the sensors in the SUT are actual field hardware interfaced by analog I/O devices, then a class HardwareSensor in the HIL would implement the class Sensor in the DAL. HardwareSensor would also contain configuration information for the analog I/O devices. If the sensors in the SUT are simulated, then a class SimulatedSensor would implement Sensor and would contain configuration information about the simulator. The decision of which implementor of Sensor to use can be deferred until run time, when the test framework is being initialized, by using Bridge and Factory design patterns (Gamma et al. 1995). This allows dynamic configurability of the test framework. (See Figure 3.8 for an example of the Bridge pattern.) 
As software projects age, layers and other information-hiding mechanisms have a tendency to become blurred. If possible, the test system architect should therefore put some mechanism into place to defer this blurring as long as possible.

Hint: Use the directory structure of the source code to help keep the layers separated and to keep products cleanly separated in a product line.

We have had good success in encouraging and enforcing layer separation by including the lower layer directories within the upper layer directories. Sibling directories within each layer directory can be further used to separate core assets from SUT-specific assets and to separate assets for different SUTs from each other, in a product-line approach. Directory permissions can be used to limit access to specific groups of maintainers to further avoid blurring the structure of the framework. Figure 3.4 shows an example of a directory structure that encourages layer separation and product separation.

FIGURE 3.4 Directory structure used to encourage layer separation.

3.3.3 Class-level test framework reference architecture

Figure 3.5 shows a class-level reference architecture for a test framework. We have successfully implemented this reference architecture to meet the quality attribute requirements outlined in Section 3.1.2 (Masticola and Subramanyan 2009).

FIGURE 3.5 Reference class architecture for a test framework. (In the SSL, a SUT class with ResetSystem, CheckQuiescentState, and CreateSystem operations and the TestExecInterface package; in the DAL, the named Device class, a Device Registry with Lookup, Clear All Devices, and GetAllDevices operations, the Component class, and abstract devices and components with Set, Sense, and CheckQuiescentState methods, including a commonly used Led component with Sense, Array Sense, Detect Blinking, and CheckQuiescentState; in the HIL, a device factory with CreateDevice that creates the concrete device and component implementors, connected to the DAL abstractions via the Bridge pattern.)

From the SSL, the script-visible functionality is kept in a package labeled TestExecInterface. All the functionality in this package is specific to a particular test executive. Much of it is also specific to a particular SUT product type in a software factory. In the DAL (and hence in the HIL), we have found it helpful to define two abstract classes that represent the SUT: Device and Component. An instance of a Device can be individually referenced by the test script and configured to a particular hardware interface in the SUT. Components exist only as pieces of devices and cannot be individually configured. Often, a Device serves as a container for Components, and there are cases in which this is the only functionality that the Device has.

Important Point: The test steps in the script address the Devices by their names. Therefore, Devices have names; Components do not.

Since Components do not exist independently of Devices, it is not necessary (from the standpoint of test framework architecture) to give them names. However, it may be convenient to assign them handles or similar identifiers in some specific implementations.

Important Point: If a tester-visible unit of the SUT must be addressed individually by the test script, make it a Device.

Hint: The business classes in the testing scenarios that were captured in the requirements gathering phase of Section 3.2.1 identify the tester-visible devices of the SUT.
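A minimal Python sketch of the Device/Component split might look as follows. Apart from the Device, Component, Button, and HMI names discussed in this section, the methods and structure here are our own assumptions for illustration rather than the reference implementation.

class Component:
    """Part of a Device; not individually named or configured."""
    def check_quiescent_state(self):
        return True                    # overridable default, harmless if unimplemented

class Device:
    """Tester-visible unit of the SUT; addressed by name from test scripts."""
    def __init__(self, name):
        self.name = name               # only Devices carry names
        self._components = []
    def add(self, component):
        self._components.append(component)
        return component
    def check_quiescent_state(self):
        # A Device is quiescent when all of its Components are.
        return all(c.check_quiescent_state() for c in self._components)

class Button(Component):
    pass                               # generic core-asset component

class Hmi(Device):
    """Example DAL device composed of smaller Components."""
    def __init__(self, name):
        super().__init__(name)
        self.start_button = self.add(Button())
        self.stop_button = self.add(Button())

hmi = Hmi("operator_panel")
print(hmi.name, hmi.check_quiescent_state())   # operator_panel True

Only the Device carries a name, so test steps can address it directly, while its Components remain internal structure of the device.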
The Device instances will, again, mostly correspond to the business objects in the SUT that are identified in the scenario capture of Section 3.2.1. Continuing the example of Section 3.3.2, the scenarios will identify the HMI as a business object in the testing process. The test developer will therefore have to access the HMI in test scripts. As the test system architect, you might also decide that it is more useful to represent and configure the parts of the HMI together as a single device than to treat them individually. Under these circumstances, HMI can usefully be a subclass of Device.

Hint: Introduce Components when there is repeated functionality in a Device, or when the Device has clearly separable concerns.

Again, from the continuing example of Section 3.3.2, a typical HMI may have a video display, a touchscreen overlaying the video, "hard" buttons, LEDs, and a buzzer. The HMI class may then be usefully composed of HmiDisplay, HmiTouchscreen, HmiButtons, HmiLeds, and HmiBuzzer subclasses of Component. HmiButtons and HmiLeds are containers for even smaller Components.

Hint: In a product-line architecture, create generic components in the core architecture if they are repeated across the product line.

Continuing the example from Section 3.3.2, HmiButtons and HmiLeds can be composed of generic Button and Led subclasses of Component, respectively. Buttons and LEDs are found in so many SUTs that it is sensible to include them in the core assets of the test framework.

3.3.4 Methods of device classes

English sentences have a "subject-verb-object" structure, and the test automation DSL has the same parts of speech. The tester-visible Device objects make up the "subjects" of the test automation DSL that is used in the test scripts to control the test state. The methods of these tester-visible Device objects that are visible from the test scripts correspond, indirectly, to the "verbs" of the test automation DSL. Expected results or test stimulus data are the "object phrases" of the test automation DSL.∗

Hint: The actions in the testing scenarios that were captured in the requirements gathering phase of Section 3.2.1 can help identify the "verbs" of the test automation DSL.

However, direct translation of the actions into test automation DSL verbs is usually not desirable. You will have to form a test automation DSL that represents everything that the verbs represent. Keep in mind, though, that you want a test automation DSL that is easy to learn and efficient to use. We present here some common verbs for test automation DSLs.

Setting and sensing device state. The test script commands some devices in the SUT, for example, pushing buttons or activating touchscreen items in the HMI, or controlling simulated (or physical) sensor values. For this, Device objects often have Set methods. Because different types of devices are controlled differently, the Set methods cannot typically be standardized to any useful type signature. Furthermore, not all objects have Set methods. For instance, effectors typically would not, since the SUT is controlling them and the test framework is only monitoring the behavior of the SUT in controlling them. Sense methods are similar to Set methods in that they cannot be standardized to a single type signature. On the other hand, devices may have more than one Sense method. If the scripting language supports polymorphism, this can help in disambiguating which Sense method to call.
Otherwise, the disambiguation must be done by naming the “Sense-like” methods differently. One could envision abstract classes to represent the Actual State and Expected State arguments of Set and Sense methods, but our persuasion is that this is usually overengineering that causes complications in the test automation DSL and scripting language and may force the test engineers to do extra, superfluous work. For these reasons, the base Device and Component classes of Figure 3.5 do not define Set or Sense methods. Instead, their subclasses in the DAL should define these methods and their type signatures, where necessary. ∗We have never had a situation in which a Device was part of an “object phrase” in the test automation DSL, but it is possible. An example would be if two Devices had to be checked for a state that is “compatible,” in some sense that would be difficult or tedious to express in the scripting language. Since the corresponding “verb” involves an operation with multiple devices, any such “object phrase” properly belongs in the test executive interface package in the SSL. Test Framework Architectures for MBT 63 Hint: Do not include Set and Sense methods in the Device base class. However, do keep the abstract device classes that have Sense and Set methods as similar as you can. Such regularity improves the ease with which developers can understand the system. Checking for expected state. In embedded systems, the usual sequence of test steps is to apply a stimulus (via setting one or more device states) and then to verify that the system reaches an expected state in response. If the actual state seen is not an expected one, then the actual and expected states are logged, the cleanup procedure for the test case is executed, and the test case ends with a failure. Many commercial test executives support this control flow in test cases. This sequence may appear to be simple, but in the context of a test framework, there are some hidden subtleties in getting it to work accurately and efficiently. Some of these subtleties include: • The response does not usually come immediately. Instead, there are generally system requirements that the expected state be reached within a specified time. Polling for the expected state is also usually inefficient and introduces unnecessary complexity in the test script. Hint: Include expected states and a timeout parameter as input parameters in the Sense methods. The Sense methods should have at least two output parameters: (a) the actual state sensed and (b) a Boolean indicating whether that state matched the expected state before the timeout. • There may not be a single expected state, but multiple (or even infinite) acceptable expected states. Hint: Add a “don’t care” option to the expected states of the low-level components. Hint: If the expected state of a device cannot be easily expressed as the cross-product of fixed states of its components with don’t-cares included, add extra check functions to the SSL or the abstract device to support checking the expected state. For example, suppose that two effectors in a single device must represent the sine and cosine of a given angle whose value can vary at run-time. An extra check function would verify that the effectors meet these criteria within a specified time and precision. Hint: Add a singleton “SUT” class at the SSL layer to support test-harnesswide “global” operations (see Figure 3.5). 
The SUT class is useful when there are system states that exist independently of the configuration of the system, such as a quiescent state. We will talk more about how to specifically implement this in Section 3.3.5. • The test frameworks described here are designed around devices. A check for an arbitrary expected state of several devices must therefore be composed of individual checks for the states of one or more individual devices. Note that in some special cases, such as a system-wide quiescent state, we add support for checking the state of the entire system. In testing, we usually want to check that several devices, or the entire system, reach their expected states within a specified time. Most scripting languages are basically sequential, so we list the devices to check in some sequence. If we check devices sequentially, we often cannot be certain which one is going to reach its expected state first. If any device has to wait, all the devices after it in the sequence may have to wait, even if they are 64 Model-Based Testing for Embedded Systems already in their expected states. Hardcoding the individual device wait times in the test script may thus result in spurious test failures (or successes) if the devices do not reach their expected states in the order they are checked. Hint: Sample a “base time” when the stimulus is applied and sense that each device reaches its expected state within a specified interval from the base time. This mechanism does not always guarantee that slow-responding devices will not cause spurious failures. • Executing the Sense methods takes some time. Even with the base time approach, it is possible that a slow Sense method call may cause a later Sense method call to start later than its timeout with respect to the base time. The result is that the second Sense method call instantly times out. Hint: Consider adding a SenseMultiple container class to the SSL. (Not shown in Figure 3.5.) This is a more robust alternative to the “base time” idea above. Before the stimulus is applied, the test script registers devices, their expected states, and their timeout parameters with the SenseMultiple container. After the stimulus is applied, the SenseMultiple container is called in the test script to check that all its registered devices reach their expected states before their individual timeouts. Such a SenseMultiple container class may require or benefit from some specific support from the devices, for example, callbacks on state change, to avoid polling. Some SUTs may also have a global “quiescent state” that is expected between test cases. It is useful to add support to the test framework for checking that the system is in its quiescent state. Hint: Include overridable default CheckQuiescentState methods in the Device and Component base classes to check that these are in a quiescent state. Implement these methods in the DAL or HIL subclasses, as appropriate. The default methods can simply return true so that they have no effect if they are unimplemented. The use of these CheckQuiescentState methods to reset the system is described in Section 3.3.5. Accumulating data for later checking and analysis. In some cases, we wish to record data for later analysis, but not analyze it as the test runs. One example of such data would be analog signals from the effectors that are to be subjected to later analysis. Sometimes it is desirable to start and stop recording during a test. 
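As a sketch of what such recording control could look like, consider the following Python outline. The class and method names, and the idea of delegating to a data-acquisition driver, are assumptions made purely for illustration.

```python
class AnalogRecorder:
    """Hypothetical pseudo-device whose only job is to control signal recording."""
    def __init__(self, name, daq_driver):
        self.name = name
        self._daq = daq_driver        # concrete data-acquisition driver in the HIL
        self._recording = False

    def configure(self, file_path, sample_rate_hz):
        # Minimum configuration: where to store the data and how fast to sample.
        self._daq.open_log(file_path, sample_rate_hz)

    def start_recording(self):
        self._daq.start()
        self._recording = True

    def stop_recording(self):
        self._daq.stop()
        self._recording = False
```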
Usually the data recording must at least be configured, for example, to specify a file to store the data. Hint: If a data stream is associated with a particular kind of device, include methods for recording and storing it in the Device class. Otherwise, consider adding a Device class for it as a pseudo-device. The pseudo-device will do nothing but control the data recording. Configuring the device. Configuring a device involves declaring that an instance of the abstract device exists, how the test scripts can look the instance up, the concrete implementor class, and how the implementor is configured. These are the four parameters to the Configure method of an abstract device. Hint: Include abstract Configure methods in the Device and Component base classes. Test Framework Architectures for MBT 65 Variant device configuration parameters to these Configure methods may be serialized or otherwise converted to a representation that can be handled by any device or component. The Configure method must be implemented in the concrete device class in the HIL. A DeviceFactory class in the HIL produces and configures concrete device instances and registers them with the framework. Resetting the device. As mentioned before, it is very convenient to be able to sense that the SUT is in a quiescent state. It is likewise convenient to be able to put the SUT back into a known state (usually the quiescent state) with a minimum amount of special-purpose scripting when a test is finished. Returning the SUT to its quiescent state can sometimes be done by individually resetting the devices. Hint: Include overridable default Reset methods in the Device and Component base classes to return the test harness controls to their quiescent values. Override these methods in the DAL or HIL subclasses, as appropriate. In some common cases, however, resetting the individual devices is insufficient. The SUT must be navigated through several states (for example, through a series of calls to the HMI Set and Sense methods) to reach the quiescent state, in a manner similar to, though usually much simpler than, recovery techniques in fault-tolerant software (Pullum 2001). This is usually the case if the SUT is stateful. The use of these Reset methods to reset the system is described in Section 3.3.5. Hint: If necessary, include in the SSL a method to return the SUT to a quiescent state, by navigation if possible, but by power cycling if necessary. If the system fails to reset automatically through navigation, the only way to return it to a quiescent state automatically may be to power cycle it or to activate a hard system reset. The test harness must include any necessary hardware to support hard resets. 3.3.5 Supporting global operations The test framework must support a small number of global operations that affect the entire SUT. We present some of these here. Specific SUTs may require additional global operations, which often can be supported via the same mechanisms. Finding configured devices. Since the Device classes address specific devices by name, it is necessary to look up the configured implementors of devices at almost every test step. This argues for using a very fast lookup mechanism with good scalability (O(log n) in the number of configured devices) and low absolute latency. Hint: Create a Device Registry class in the DAL. The Device Registry, as shown in Figure 3.5, is simply a map from device names to their implementors. 
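A minimal Python sketch of such a registry follows; the class and method names are illustrative assumptions, and a hash map is used because it meets (and exceeds) the O(log n) lookup goal stated above.

```python
class DeviceRegistry:
    """DAL-level map from device names to their configured implementors."""

    def __init__(self):
        self._by_name = {}   # dict lookup: O(1) on average, low absolute latency

    def register(self, name, implementor):
        if name in self._by_name:
            raise ValueError(f"device '{name}' is already configured")
        self._by_name[name] = implementor

    def lookup(self, name):
        return self._by_name[name]

    def find_by_type(self, implementor_class):
        """Optional lookup by implementor type, e.g., for scalability testing."""
        return [d for d in self._by_name.values()
                if isinstance(d, implementor_class)]
```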
The Device base class can hide the accesses to the Device Registry from the abstract device classes and implementors and from the test scripts. Occasionally, we also wish to find large numbers of individual Devices of a specific type without hardcoding the Device names into the test script. This is useful for, for example, scalability testing. If this sort of situation exists, the Device Registry can also support lookup by other criteria, such as implementor type, regular expression matching on the name, etc. Configuring the system. The Configure method described above configures individual devices. To configure the entire system, we usually read a configuration file (often XML) and create and configure the specific Device implementors. Hint: Create a Configuration singleton class in the SSL. 66 Model-Based Testing for Embedded Systems The Configuration singleton will parse the test harness configuration file and create and configure all devices in the system. We have also used the Configuration class to retain system configuration information to support configuration editing. Avoid retaining information on the configured Devices in the Configuration class as this is the job of the Device Registry. Hint: Create a Device Factory singleton class in the HIL. This class creates Device implementors from their class names, as mentioned above, and thus supports system configuration and configuration editing. The Device Factory is implementation specific and thus belongs in the HIL. Resetting the system. Some controllable Devices have a natural quiescent state as mentioned above and can be individually supported by a Reset method. Testing for common system-wide states. The same strategy that is used for Reset can be used for testing a system-wide state, such as the quiescent state. Hint: Create static ResetAllDevices and CheckQuiescentState methods in the Device class. Expose the system-wide ResetSystem and CheckQuiescentState methods to test scripts via the static SUT class in the SSL. The system-wide Reset method should simply call the Reset methods of all registered Devices. Similarly, the system-wide CheckQuiescentState method should simply calculate the logical “and” of the CheckQuiescentState methods of all registered Devices. We suggest that these methods may be implemented in the Device Registry and exposed to scripts through a Fac¸ade pattern in the Device class (Gamma et al. 1995). This avoids exposing the Device Registry to scripts. The test harness can provide additional hardware support for resetting the SUT, including “hard reset” or power cycling of the SUT. If the test harness supports this, then ResetAllDevices should likewise support it. Alternatively, ResetAllDevices might optionally support navigation back to the quiescent state without power cycling, if doing so would be reasonably simple to implement and robust to the expected changes in the SUT. 3.3.6 Supporting periodic polling Polling is usually done to support updates to the state of values in the test harness that must be sensed by the test script. Even though polling is usually inefficient, it will likely be necessary to poll periodically. For example, polling will be necessary at some level if we must sense the state of any component that does not support some kind of asynchronous update message to the test harness. There are good ways and bad ways to support polling. It is usually a very bad idea to poll in the test script. 
Where it is necessary, the test framework should support polling with as little overhead as possible. Additionally, the Devices and Components that require polling should not be hard-coded into the test framework. Hint: Create an abstract Monitor class that represents something that must be polled and a concrete Monitor Engine class with which instances of the class Monitor are registered. Subclass Monitor to implement configuration and polling (this is not shown in Figure 3.5). The Monitor Engine is initiated and runs in the background, calling the Poll method on each of its registered Monitors periodically. Monitors are created, configured, and registered with the Monitor Engine when the Devices or Components they support are configured. An alternative, though somewhat less flexible, method to support polling would be to implement callback functions in Devices or Components. Monitors can support polling Test Framework Architectures for MBT 67 at different rates, and thus a Device or Component using Monitors can support polling at different rates. A Device or Component with a polling method would have to be polled at a single rate. Using Monitors also helps prevent races within Devices or Components by cleanly separating the data to be used in the primary script thread and the one in which the Monitor Engine runs. 3.3.7 Supporting diagnosis The SUT can fail tests, and we wish to log sufficient information to understand the reason for the failure, even in a long-running test. It is therefore important to log sufficient information from the SUT to diagnose failures. This is primarily a requirement of the test harness design. Hint: Create Monitors to periodically log the important parts of the SUT state. Since Monitors run in the background, this avoids having to clutter the test scripts with explicit logging steps. Important Point: In addition to the SUT, the test harness can also fail. Failure can occur because of a mechanical or electronic breakdown or a software error. For this reason, errors must be reported throughout the test framework. Hint: Create an Error class and add Error In and Error Out parameters to all methods in the test framework (this is not shown in Figure 3.5). This is the standard methodology used in National Instruments’ LabVIEW product (Travis and Kring 2006), and it can also be applied to procedural and OO languages. Any specific information on the error is logged in the Error object. If the Error In parameter of any method indicates that an error has occurred previously, then the method typically does nothing and Error Out is assigned the value of Error In. This prevents the test framework from compounding the problems that have been detected. Important Point: Error logging should include as much information as is practical about the context in which the error occurred. Ideally, the calling context would be similar to that provided by a debugger (i.e., the stack frames of all threads executing when the error occurred, etc). Hint: If there is a reasonably good source-level debugger that works in the language(s) of the adaptation software, consider using it to help log the error context. However, in normal testing, do not force the test execution to stop because of an error. Instead, just log the debug information, stop executing the current test case, restore the system to quiescent state, and continue with the next test case. It may alternatively be worthwhile to try to recover from the error rather than aborting the current test case. 
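The following Python sketch pulls together the Error In/Error Out convention and the context logging discussed above. The Error class layout and the example Sense function are assumptions made for this illustration; they are not taken from LabVIEW or from the chapter's framework.

```python
import traceback

class Error:
    """Error status threaded through every framework call."""
    def __init__(self):
        self.occurred = False
        self.message = ""
        self.context = ""          # e.g., a captured stack trace

    def record(self, message):
        self.occurred = True
        self.message = message
        self.context = traceback.format_exc()

def read_latch_state(timeout_s):
    """Stand-in for the real HIL call that senses the latch."""
    return "locked"

def sense_latch(expected, timeout_s, error_in):
    """Example Sense-like call following the Error In/Error Out pattern."""
    error_out = error_in
    if error_in.occurred:
        # An earlier step already failed: do nothing, just pass the error on.
        return None, False, error_out
    try:
        actual = read_latch_state(timeout_s)
        return actual, actual == expected, error_out
    except Exception:
        error_out.record("sense_latch failed")
        return None, False, error_out
```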
It is also worth mentioning that the test framework should be designed defensively. Such a defensive design requires you to know the type, severity, and likelihood of the failures in the test harness that can stop the execution of a test. The countermeasures you choose will depend on the failure modes you expect. For example, if power failure may cause the test harness to fail, then you may have to install an uninterruptible power supply. Hint: Consider conducting a risk analysis of the test harness to identify the scenarios for which you must support error recovery (Bach 1999). If you are testing a high-cost system, the number of scenarios may be considerable because stopping testing once it has begun can incur the cost of idle time in the SUT.
3.3.8 Distributed control
If the SUT is very large or complex, it is possible that a single test computer will be inadequate to control it. If this is the case, you will have to distribute control of the test framework. If distributed control is not necessary, then it is best to avoid the added expense and complexity. Important Point: If you think that it may be necessary to distribute your test framework, then conduct experiments in the architectural prototyping phase to determine whether it is indeed necessary. It is better to learn the truth as early as possible. If distributed control proves to be necessary, some additional important technical issues will have to be resolved. Remoting system. Often the test automation support stack will determine what remoting system(s) you may use for distributed control. There still may be multiple choices, depending on the level in the stack at which you decide to partition. Partitioning scheme. There are several ways to divide the load among control computers. You can have the test script run on a single control computer and distribute the Device instances by incorporating remoting information into the Device Registry entries. Alternatively, you can partition the test script into main and remote parts. Hint: Prefer partitioning at the test script level if possible. This can avoid creating a bottleneck at the main control computer. Load balancing. You will have to balance the utilization of all types of resources (processor, memory, file I/O, network traffic) among the control computers in a distributed framework. Important Point: Consider all the different types of load that your test framework will generate when balancing. This includes network and disk traffic because of data collection and logging, as well as the more obvious CPU load. Clock synchronization and timing. The Network Time Protocol (NTP) (Mills, RFC 1305—Network Time Protocol (Version 3) Specification, Implementation and Analysis 1992) and the Simple Network Time Protocol (SNTP) (Mills, RFC 1361—Simple Network Time Protocol (SNTP) 1992) are widely used to synchronize clocks in distributed computer systems. Accuracy varies from 10 ms to 200 µs, depending on conditions. NTP and SNTP perform better when the network latency is not excessive. Important Point: Dedicate one computer in the test harness as an NTP or SNTP server and set it up to serve the rest of the harness. The time server should probably not be the main control computer. Instead, make it a lightly loaded (or even dedicated) computer. Hint: Minimize network delays between the test harness computers.
Designing the local area network (LAN) for the test harness as an isolated subnet with minimum latency will allow more accurate time synchronization. Sensing and verifying distributed state. You will still have to sense whether the SUT is in a particular state to verify it against expected results. This becomes more complicated when the system is distributed, if there are Byzantine cases in which parts of the system go into and out of the expected state. In a na¨ıve implementation, the framework may report that the SUT was in the expected state when it never completely was. Although the likelihood of such an error is low, it is good practice to eliminate it to the extent possible. Test Framework Architectures for MBT 69 Hint: Instead of just returning a Boolean pass–fail value, consider having the Sense methods for the distributed computers report the time intervals during which the SUT was in the expected state. This at least ensures that, whenever the main control computer reports that the SUT as a whole was in a given state, it actually was in that state, within the limits of time synchronization. The downsides of reporting time intervals rather than pass–fail values are increased data communication and processing loads. Slightly longer test step times are also necessary because of the need for the Sense methods to wait for a short interval after reaching their expected state to provide a safe overlap. 3.4 Brief Example This section presents an example instance of the test framework reference architecture outlined in this chapter. The test framework in this example was designed for testing complex fire-safety systems and is more fully described in Masticola and Subramanyan (2009). Figure 3.6 shows an example script which uses the test automation DSL of the example test framework. The test executive is National Instruments TestStand (Sumathi and Sureka 2007). The view of the script shows the test steps, but not the test data, which are visible in a separate window in the TestStand user interface, as shown in Figure 3.7. For this specific test automation system, we chose to relax the recommendation, in Section 3.2.3, against vendorscripts in an engineering tradeoff against the other benefits of the automation stack. We also confirmed, through interviews with other users of TestStand, FIGURE 3.6 Example script using a test automation DSL for fire-safety system testing. 70 Model-Based Testing for Embedded Systems FIGURE 3.7 Variables window in TestStand. that the vendor typically provided reasonable pathways for customers to upgrade customer software based on their test automation support stack. We decided that the risk of planned obsolescence of the scripting language was sufficiently small, thus TestStand was selected. Referring to the reference class architecture of Figure 3.5, we can see how some of the classes and methods are filled in. CreateSystem, in the SUT class in the SSL, is implemented using the DeviceFactory class in the HIL. CheckQuiescentState in the SUT class is implemented using the individual CheckQuiescentState methods of the device implementers in the HIL and a collection of configured devices in the SSL. The “nouns” in the script shown in Figure 3.6 correspond to components of the fire-safety system and of the test automation DSL (RPM, PMI, ZIC4A, ZIC8B, etc.). The “nouns” also correspond to subclasses of Device in the DAL. The “verbs” correspond to the actions that the test executive is to take with these components. 
For example, “Check ZIC4A state” is a “sense” method and “Clear Fault on ZIC4A 5” is a “set” method. Figure 3.8 shows the concrete realization of a portion of the reference class architecture. Many of the devices in the SUT have repeated components, which are implemented by subclasses of Component. For example, Person–Machine Interface (PMI) has several LEDs and function keys. Classes representing a LED and a function key exist as subclasses of Component. In the HIL, Device and Component implementor classes map the abstract PMI, and other devices, and their components, to concrete digital I/O devices that control and monitor the corresponding physical devices. For example, a LED can only be sensed, so HardwareLed class maps a particular digital input line to a particular LED on the PMI. 3.5 Supporting Activities in Test Framework Architecture Once the preliminary test framework is architected and implemented, support software, documentation, and training material must be prepared. In this section, we describe typical test framework support software, provide key points to include in the test framework documentation, and discuss iterating the test framework architecture to a good solution. Test Framework Architectures for MBT Pmi 71 FunctionKeys dwarePmi Bridge pattern to support script transparency between HW and simulator testbed Led GetState() FunctionKey PressAndRelease() SimulatedLed SimulatedFunctionKey HardwareLed HardwareFunctionKey HardwareLed(digitalIo, bit) HardwareFunctionKey(digitalIo, bit) FIGURE 3.8 Concrete realization of a part of the test framework (the device and component classes are not shown). (Reproduced from Masticola, S., and Subramanyan, R., Experience with developing a high-productivity test framework for scalable embedded and mechatronic systems, 2009 ASME/IEEE International Conference on Mechatronic and Embedded Systems and Applications (MESA09), c 2009 IEEE.) 3.5.1 Support software The test framework you create is likely going to require some support software to perform functions in addition to test execution. Some examples of support software that we have developed on past projects include: • Editing configuration files. Manually editing complicated XML configuration files is both tedious and error-prone. Unless the test framework has been designed to be robust in the presence of mistakes in its configuration files, it makes good economic sense to provide tool support to reduce the likelihood of such errors. Hint: Incorporate support for configuration editing into the test framework. One way we did this was to require each device implementor to implement EditConfig and GetConfig functions (Masticola and Subramanyan 2009). EditConfig pops up an edit page with the current configuration values for the device and allows this information to be edited. A similar GetConfig method returns a variant containing the current (edited) configuration of the device. Editing the configuration was thus implemented by actually editing the set of configured devices in a Config Editor GUI. • Importing configuration files from the tools that manage the configuration of the SUT. The SUT configuration can usually provide some of the information necessary for test framework configuration. For example, the list of Devices present in the SUT may be derivable from the SUT configuration. 
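As an illustration of such an import facility, the sketch below extracts skeleton device entries from a SUT configuration file. The XML element and attribute names (device, name, type) are purely hypothetical, since the real schema is defined by the SUT's own configuration tools.

```python
import xml.etree.ElementTree as ET

def import_devices(sut_config_path):
    """Derive skeleton test-framework device entries from a SUT configuration.

    The element and attribute names used here are hypothetical; a real
    importer must follow the SUT's own schema.
    """
    tree = ET.parse(sut_config_path)
    entries = []
    for dev in tree.getroot().iter("device"):
        entries.append({
            "name": dev.get("name"),
            "sutType": dev.get("type"),
            # The implementor class is test-framework knowledge, so it is
            # left for the test engineer (or a product builder) to fill in.
            "implementor": None,
        })
    return entries
```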
Generally, the SUT configuration will contain information not required by the test framework (e.g., the configuration of devices not interfaced to the test framework) and vice versa (e.g., the classes of the device implementors). The import facility also helps to keep the test framework configuration synchronized with the SUT configuration. 72 Model-Based Testing for Embedded Systems • Test log analysis tools. For example, you may wish to perform extract-transform-load operations on the test logs to keep a project dashboard up to date, or to assist in performance analysis of the SUT. • Product builders. These are the tools in a software product line (Clements and Northrup 2002) that create components of the software product from “core assets.” Here, we consider the test framework as the software product we are specializing. A core asset in this context would probably not be part of the core test framework, but might instead be a template-like component class, device class, or subsystem. If you are implementing the test framework as part of a software product line, then it may be economical to create product builders to automate the specialization of the test framework to a SUT product type. 3.5.2 Documentation and training Two forms of documentation are necessary: documentation for test automation developers and documentation for the software developers of the test framework. These groups of users of the test framework have different goals. A single document will not serve both purposes well. A document for test developers will typically include: • Details on how to execute tests, read the test log, and resolve common problems with the test harness. • A description of the test automation DSL you have created to automate the testing of the SUT. • Examples of test scripts for common situations that the test developers may face, including examples of the test automation DSL. • Information on how to configure the test framework to match the configuration of the test harness and SUT. • Information on how to use the support software described in Section 3.5.1. A document for the software developers of the test framework will typically include: • The architectural description of the core test framework and an example specialization for a SUT product. These may be described in terms of architectural views (Hofmeister, Nord, and Soni 2000). • Detailed descriptions of the core components of the test framework. • The conventions used in maintaining the test framework, such as naming conventions and directory structure. • An example of how to specialize the core test framework for a particular SUT product. This will include specializing the Device and Component classes, configuration and editing, etc. Documentation is helpful, but it is seldom sufficient to ensure efficient knowledge transfer. Hands-on training and support are usually necessary to transfer the technology to its users. If your organization has the budget for it, consider organizing a hands-on training for your test framework when it is mature enough for such a training to be beneficial. Test Framework Architectures for MBT 73 3.5.3 Iterating to a good solution Hint: Be prepared to have to iterate on the test framework architecture. You will probably have to rework it at least once. On a recent project (Masticola and Subramanyan 2009), for example, we explored three different technologies for the software test framework via architectural prototyping before we found the choice that we considered the best. 
Even after that, we did two major reworks of the framework architecture before we reached results that would realize all of the quality attributes listed in Section 3.1.2. The test framework works well as intended, but with greater load or tighter requirements, it may require another revision. Hint: Implement the test framework in small steps. Learn from your mistakes and previous experience. Small steps reduce the cost of a rewrite. You can learn from your own experience, from that of your colleagues on the project, and from experts outside of your organization, such as the authors of this book. The authors welcome comments and suggestions from other practitioners who have architected model-based frameworks for test automation control, especially of embedded and mechatronic systems. References Bach, J. Heuristic risk-based testing. Software Testing and Quality Magazine, November 1999:99. Bardram, J., Christensen, H., and Hansen, K. (2004). Architectural Prototyping: An Approach for Grounding Architectural Design and Learning. Oslo, Norway: Fourth Working IEEE/IFIP Conference on Software Architecture (WICSA 2004). Berenbach, B., Paulish, D., Kazmeier, J., and Rudorfer, A. (2009). Software & Systems Requirements Engineering in Practice. McGraw-Hill: New York, NY. Broekman, B., and Notenboom, E. (2003). Testing Embedded Software. Addison-Wesley: London. Brown, W., Malveau, R., McCormick, H., and Mowbray, T. (1998). AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis. Wiley: New York, NY. Clements, P., and Northrup, L. (2002). Software Product Lines: Practices and Patterns. Addison-Wesley: Boston, MA. Dias Neto, A., Subramanyam, R., Vieira, M., and Travassos, G. (2007). A survey on model-based testing approaches: a systematic review. Atlanta, GA: 1st ACM International Workshop on Empirical Assessment of Software Engineering Languages and Technologies. Fowler, M. (2003). UML Distilled: A Brief Introduction to the Standard Object Modeling Language. Addison-Wesley: Boston, MA. Gamma, E., Helm, R., Johnson, R., and Vlissides, J. M. (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley: Reading, MA. 74 Model-Based Testing for Embedded Systems Graham, D., Veenendaal, E. V., Evans, I., and Black, R. (2008). Foundations of Software Testing: ISTQB Certification. Intl Thomson Business Pr: Belmont, CA. Hartmann, J., Vieira, M., Foster, H., and Ruder, A. (2005). A UML-based approach to system testing. Innovations in Systems and Software Engineering, Volume 1, Number 1, Pages: 12–24. Hofmeister, C., Nord, R., and Soni, D. (2000). Applied Software Architecture. AddisonWesley: Reading, MA. Kelly, S., and Tolvanen. J. -P. (2008). Domain-Specific Modeling: Enabling Full Code Generation. Wiley-Interscience: Hoboken, NJ. Masticola, S., and Subramanyan, R. (2009). Experience with Developing a HighProductivity Test Framework for Scalable Embedded and Mechatronic Systems. San Diego, CA: 2009 ASME/IEEE International Conference on Mechatronic and Embedded Systems and Applications (MESA09). Mills, D. (1992). RFC 1305—Network Time Protocol (Version 3) Specification, Implementation and Analysis. Internet Engineering Task Force. http://www.ietf.org/rfc/ rfc1305.txt?number=1305. Mills, D. (1992). RFC 1361—Simple Network Time Protocol (SNTP). Internet Engineering Task Force. http://www.ietf.org/rfc/rfc1361.txt?number=1361. Pullum, L. (2001). Software Fault Tolerance Techniques and Implementation. Artech House: Boston, MA. Sumathi, S., and Sureka, P. (2007). 
LabVIEW based Advanced Instrumentation Systems. Springer: New York, NY. Travis, J., and Kring, J. (2006). LabVIEW for Everyone: Graphical Programming Made Easy and Fun (3rd Edition). Prentice Hall: Upper Saddle River, NJ. Utting, M., and Legeard, B. (2006). Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann: San Francisco, CA. Part II Automatic Test Generation This page intentionally left blank 4 Automatic Model-Based Test Generation from UML State Machines Stephan Weißleder and Holger Schlingloff CONTENTS 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.1.1 UML state machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.1.2 Example—A kitchen toaster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.1.3 Testing from UML state machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.1.4 Coverage criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.1.5 Size of test suites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 4.2 Abstract Test Case Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.2.1 Shortest paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.2.2 Depth-first and breadth-first search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.3 Input Value Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.3.1 Partition testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.3.2 Static boundary value analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.3.3 Dynamic boundary value analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.4 Relation to Other Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.4.1 Random testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.4.2 Evolutionary testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 4.4.3 Constraint solving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.4.4 Model checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.4.5 Static analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.4.6 Abstract interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . 99 4.4.7 Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 4.1 Introduction Model-based testing is an efficient testing technique in which a system under test (SUT) is compared to a formal model that is created from the SUT’s requirements. Major benefits of model-based testing compared to conventional testing techniques are the automation of test case design, the early validation of requirements, the traceability of requirements from model elements to test cases, the early detection of failures, and an easy maintenance of test suites for regression testing. This chapter deals with state machines of the Unified Modeling Language (UML) [91] as a basis for automated generation of tests. The UML is a widespread semiformal modeling language for all sorts of computational systems. In particular, UML state machines can be used to model the reactive behavior of embedded systems. We present and compare several approaches for the generation of test suites from UML state machines. 77 78 Model-Based Testing for Embedded Systems For most computational systems, the set of possible behaviors is infinite. Thus, complete testing of all behaviors in finite time is impossible. Therefore, the fundamental question of every testing methodology is when to stop the testing process. Instead of just testing until the available resources are exhausted, it is better to set certain quality goals for the testing process and to stop testing when these goals have been met. A preferred metrics for the quality of testing is the percentage to which certain aspects of the SUT have been exercised; these aspects could be the requirements, the model elements, the source code, or the object code of the SUT. Thus, test generation algorithms often strive to generate test suites satisfying certain coverage criteria. The definition of a coverage criterion, however, does not necessarily entail an algorithm how to generate tests for this criterion. For model-based testing, coverage is usually measured in terms of covered model elements. The standard literature provides many different coverage criteria, for example, focusing on data flow, control flow, or transition sequences. Most existing coverage criteria had been originally defined for program code and have now been transferred and applied to models. Thus, these criteria can be used to measure the quality of test suites that are generated from models. Test generation algorithms can be designed and optimized with regard to specific coverage criteria. In this chapter, we present several test generation approaches that strive to satisfy different coverage criteria on UML state machines. This chapter is structured as follows: in the following, we give an introduction to UML state machines and present the basic ideas of testing from UML state machines. 
Subsequently, we describe abstract path generation and concrete input value generation as two important aspects in automatic test generation from state machines: the former is shown in Section 4.2 by introducing graph traversal techniques. The latter is shown in Section 4.3 by presenting boundary value analysis techniques. In Section 4.4, we describe the relation of these two aspects to other techniques. We go into random testing, evolutionary testing, constraint solving, model checking, and static analysis.
4.1.1 UML state machines
The UML [91] is a widely used modeling language standardized and maintained by the Object Management Group (OMG). In version 2, it comprises models of 13 different diagrams, which can be grouped into two general categories: Structure diagrams are used to represent information about the (spatial) composition of the system. Behavior diagrams are used to describe the (temporal) aspects of the system's actions and reactions. All UML diagram types are defined in a common meta model, so the same modeling elements may be used in different types of diagrams, and there is no distinct separation between the various diagram types. Among the behavior diagrams, state machine diagrams are the most common way to specify the control flow of reactive systems. Intuitively, a UML state machine can be seen as a hierarchical parallel automaton with an extended alphabet of actions. In order to precisely describe test generation algorithms, we give a formal definition of the notion of UML state machines used in this chapter. A labeled transition system is a tuple M = (A, S, T, s0), where A is a finite nonempty alphabet of labels, S and T are finite sets of states and transitions, respectively, T ⊆ S × A × S, and s0 is the initial state. In UML, the initial state is a so-called pseudostate (not belonging to the set of states) and marked by a filled circle. Assume a set E of events, a set C of conditions, and a set A of actions. A simple state machine is a labeled transition system where A = 2^E × C × 2^A, that is, each label consists of a set e of input events, a condition c, and a set a of output actions. The input events of a transition are called its triggers, the condition is the guard, and the set of actions is the effect of the transition. The transition (s, (e, c, a), s′) is depicted as s −e[c]/a→ s′, where sets are just denoted by their elements, and empty triggers, guards, and effects can be omitted. States s and s′ are the source and target of the transition, respectively. A (finite) run of a transition system is any word w = (s0, t0, s1, t1, . . . , tn−1, sn) such that s0 is the initial state, and (si, ti, si+1) ∈ T for all i < n. The trace of a run is the sequence (t0, t1, . . . , tn−1). For a simple state machine, we assume that there is an evaluation relation |= ⊆ S × C that is established iff a condition c ∈ C is satisfied in a state s ∈ S. A word w is a run of the state machine if in addition to s0 being initial, for all i < n and ti = (ei, ci, ai) it holds that si |= ci. Moreover, it must be true that
1. ei = ∅ and (si, ti, si+1) ∈ T, or
2. ei = {e} and (si, (e′i, ci, ai), si+1) ∈ T for some trigger set e′i containing e, or
3. ei = {e}, (si, (e′i, ci, ai), si+1) ∉ T for any trigger set e′i containing e, and si+1 = si.
These clauses reflect the semantics of UML state machines, which allows for the following:
1. Completion transitions (without trigger).
2. Transitions being enabled if any one of its triggers is satisfied.
3. A trigger being lost if no transition for this trigger exists.
In order to model data dependencies, simple state machines can be extended with a concept of variables. Assume a given set of domains or classes with Boolean relations defined between elements. The domains could be integer or real numbers with values 0, 1, <, ≤, etc. An extended state machine is a simple state machine augmented by a number of variables (x, y, ...) on these domains. In an extended state machine, a guard is a Boolean expression involving variables. For example, a guard could be (x > 0 ∧ y ≤ 3). A transition effect in the state machine may involve the update (assignment) of variables. For example, an effect could be (x := 0; y := 3). The UML standard does not define the syntax of assignments and Boolean expressions; it suggests that the Object Constraint Language (OCL) [90] may be used here. For our purposes, we rely on an intuitive understanding of the relevant concepts. In addition to simple states, UML state machines allow a hierarchical and orthogonal composition of states. Formally, a UML state machine consists of a set of regions, each of which contains vertices and transitions. A vertex can be a state, a pseudostate, or a connection point reference. A state can be either simple or composite, where a state is composite if it contains one or more regions. Pseudostates can be, for example, initial or fork pseudostates, whereas connection point references are used to link certain pseudostates. A transition is a connection from a source vertex to a target vertex, and it can contain several triggers, a guard, and an effect. A trigger references an event, for example, the reception of a message or the execution of an operation. Similar to extended state machines, a guard is a Boolean condition on certain variables, for instance, class attributes. Additionally, UML also has a number of further predicates that may be used in guards. Finally, an effect can be, for example, the assignment of a value to an attribute, the triggering of an event, or a postcondition defined in OCL. In Figure 4.1, this syntax is graphically described as part of the UML meta model, a complete description of which can be found in [91].
FIGURE 4.1 Part of the meta model for UML state machines.
The UML specification does not give a definite semantics of state machines. However, there is a generally agreed common understanding on the meaning of the above concepts. A state machine describes the behavior of all instances of its context class. The status of each instance is given by the values of all class attributes and the configuration of the state machine, where a configuration of the machine is a set of concurrently active vertices. Initially, all those vertices are active that are connected to the outgoing transitions of the initial pseudostates of the state machine's regions. A transition can be traversed if its source vertex is active, one of the triggering events occurs, and the guard evaluates to true.
As a consequence, the source vertex becomes inactive, the actions in the effect are executed, and the target vertex becomes active. In this way, a sequence of configurations and transitions is obtained, which forms a run of the state machine. Similarly as defined for the labeled transition system, the semantics of a state machine is the set of all these runs.
4.1.2 Example—A kitchen toaster
State machines can be used for the high-level specification of the behavior of embedded systems. As an example, we consider a modern kitchen toaster. It has a turning knob to choose a desired browning level, a side lever to push down the bread and start the toasting process, and a stop button to cancel the toasting process. When the user inserts a slice of bread and pushes down the lever, the controller locks the retainer latch and switches on the heating element. In a basic toaster, the heating time depends directly on the selected browning level. In more advanced products, the intensity of heating can be controlled, and the heating period is adjusted according to the temperature of the toaster from the previous toasting cycle. When the appropriate time has elapsed or the user pushes the stop button, the heating is switched off and the latch is released. Moreover, we require that the toaster has a "defrost" button that, when activated, causes the toaster to heat the slice of bread at low temperature (defrosting) for a designated time before beginning the actual toasting process. In the following, we present several ways of describing the behavior of this kitchen toaster with state machines: we give a basic state machine, a semantically equivalent hierarchical machine, and an extended state machine that makes intensive use of variables. First, the toaster can be modeled by a simple state machine as shown in Figure 4.2. The alphabets are I = {push, stop, time, inc, dec, defrost, time_d} and O = {on, off}. The toaster can be started by pushing (push) down the latch. As a reaction, the heater is turned on (on). The toaster stops toasting (off) after a certain time (time) or after the stop button (stop) has been pressed. Furthermore, the toaster has two heating power levels, one of which can be selected by increasing (inc) or decreasing (dec) the heating temperature. The toaster also has a defrost function (defrost) that results in an additional defrosting time (time_d) for frozen toast.
FIGURE 4.2 Simple state machine model of a kitchen toaster.
Note that time is used in our modeling only in a qualitative way, that is, quantitative aspects of timing are not taken into account. This simple machine consists of two groups of states: s0 . . . s3 for regular heating and s4 . . . s7 for heating with increased heating power. From the first group of states, the machine accepts an increase of the heating level, which brings it into the appropriate high-power state; vice versa, from this state, it can be brought back by decreasing the heating level. Thus, in this machine only two heating levels are modeled. It is obvious how the model could be extended for three or more such levels. However, with a growing number of levels the diagram would quickly become illegible. It is clear that this modeling has other deficits as well.
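To relate this model back to the formal definitions of Section 4.1.1, the simple machine can be written down directly as a transition relation. The Python sketch below encodes a plausible fragment of Figure 4.2 (the exact transition set cannot be read reliably here, so it is illustrative only) together with a check of the run condition.

```python
# States s0..s3: regular heating; s4..s7: increased heating power.
# Labels are (trigger, effect) pairs; guards are omitted because the simple
# machine has none. The transitions listed are a plausible reading of the
# figure, not a verified transcription.
T = {
    ("s0", ("push", "on"), "s1"),
    ("s1", ("time", "off"), "s0"),
    ("s1", ("stop", "off"), "s0"),
    ("s0", ("defrost", None), "s2"),
    ("s2", ("push", "on"), "s3"),
    ("s3", ("time_d", None), "s1"),
    ("s0", ("inc", None), "s4"),
    ("s4", ("dec", None), "s0"),
    ("s4", ("push", "on"), "s5"),
    ("s5", ("time", "off"), "s4"),
}

def is_run(word, initial="s0", transitions=T):
    """Check that the word (s0, t0, s1, t1, ..., sn) is a run of the system."""
    states, labels = word[0::2], word[1::2]
    if states[0] != initial:
        return False
    return all((states[i], labels[i], states[i + 1]) in transitions
               for i in range(len(labels)))

# Example: pushing the latch and letting the timer elapse is a run.
assert is_run(("s0", ("push", "on"), "s1", ("time", "off"), "s0"))
```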
Conceptually, the setting of the heating level and the defrosting cycle are independent of the operation of the latch and stop button. Thus, they should be modeled separately. Moreover, the decision of whether to start a preheating phase before the actual toasting is "local" to the part dealing with the busy operations of the toaster. Furthermore, the toaster is either inactive or active, and so active is a superstate that consists of the substates defrosting and toasting. To cope with these issues, UML offers the possibility of orthogonal regions and hierarchical nesting of states. This allows a compact representation of the behavior. Figure 4.3 shows a hierarchical state machine with orthogonal regions. It has the same behavior as the simple state machine in Figure 4.2. The hierarchical state machine consists of the three regions: side latch, set temperature, and set defrost. Each region describes a separate aspect of the toaster: in the region side latch, the reactions to moving the side latch, pressing the stop button, and waiting for a certain time are described. The state active contains the substates defrosting and toasting, as well as a choice pseudostate. The region set temperature depicts the two heating levels and how to select them. In the region set defrost, setting up the defrost functionality is described. The defroster can only be (de)activated if the toaster is not currently in the state active. Furthermore, the defroster is deactivated after each toasting process.
FIGURE 4.3 A hierarchical state machine model.
Both the models in Figures 4.2 and 4.3 are concerned with the control flow only. Additionally, in any computational system the control flow is also influenced by data. In both of the above toaster models, the information about the current toaster setting is encoded in the states of the model. This clutters the information about the control flow and leads to an excessive set of states. Therefore, it is preferable to use a data variable for this purpose. One option to do so is via extended finite state machines, where, for instance, the transitions may refer to variables containing numerical data. Figure 4.4 shows a more detailed model of a toaster that contains several variables. This model also contains the region residual heat to describe the remaining internal temperature of the toaster. Since a hot toaster reaches the optimal toasting temperature faster, the internal temperature is used in the computation of the remaining heating time. The state machine consists of the four regions: side latch, set heating time, residual heat, and set defrost. The names of the regions describe their responsibilities. The region side latch describes the reaction to pressing the side latch: If the side latch is pushed (push), the heater is turned on (releasing the event on and setting h = true). As a result, the toaster is in the state active. If the defrost button has been pressed (d = true), the toaster will be in the state defrosting for a certain time (time_d). The heating intensity (h_int) for the toasting process will be set depending on the set heat (s_ht) for the browning level and the residual heat (r_ht).
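Reading the effect annotation on the transition into toasting in Figure 4.4, the heating intensity is computed from the set heat and the residual heat roughly as follows; the sketch assumes both values range over 0..5, which is an interpretation of the guards shown in the figure rather than a stated fact.

```python
def heating_intensity(s_ht, r_ht):
    """Heating intensity on entering 'toasting', following the effect
    h_int = 5 + s_ht - r_ht annotated in Figure 4.4 (value ranges of 0..5
    for both s_ht and r_ht are an assumption)."""
    assert 0 <= s_ht <= 5 and 0 <= r_ht <= 5
    return 5 + s_ht - r_ht

# A cold toaster (r_ht = 0) at the lowest browning level (s_ht = 0) toasts
# with intensity 5; a hot toaster needs correspondingly less.
print(heating_intensity(0, 0))   # 5
print(heating_intensity(3, 4))   # 4
```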
The details of regulating the temperature are described in the composite state toasting: Depending on the computed value h_int, the toaster will raise the temperature (fast) or hold it at the current level. The toaster performs these actions for the time period time and then stops toasting. As an effect of stopping, it triggers the event off and sets h = false. The region set heating time allows the temperature to be set to one of the levels 0 to 6. In the region residual heat, the heating up and the cooling down of the internal toaster temperature are described. The region set defrost allows the defrost mode to be (de)activated. After completing one toasting cycle, the defrost mode will be deactivated.
FIGURE 4.4 A UML state machine with variables.
4.1.3 Testing from UML state machines
Testing is the process of systematically experimenting with an object in order to detect failures, measure its quality, or create confidence in its correctness. One of the most important quality attributes is functional correctness, that is, determining whether the SUT satisfies the specified requirements. To this end, the requirements specification is compared to the SUT. In model-based testing, the requirements are represented in a formal model, and the SUT is compared to this model. A prominent approach for the latter is to derive test cases from the model and to execute them on the SUT. Following this approach, requirements, test cases, and the SUT can be described by a validation triangle as shown in Figure 4.5.
FIGURE 4.5 Validation triangle.
A test case is the description of a (single) test; a test suite is a set of test cases. Depending on the aspect of an SUT that is to be considered, test cases can have several forms—see Table 4.1. This table is neither a strict classification nor exhaustive. As a consequence, systems can be in more than one category, and test cases can be formulated in many different ways. Embedded systems are usually modeled as deterministic reactive systems, and thus, test cases are sequences of events. The notion of test execution and test oracle has to be defined for each type of SUT. For example, the execution of reactive system tests consists of feeding the input events into the SUT and comparing the corresponding output events to the expected ones. For our example, the models describe the control of the toaster. They specify (part of) its observable behavior. Therefore, the observable behavior of each run of the state machine can be used as a test case.
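For a deterministic reactive model such as the toaster, a test case is then just an alternating sequence of stimuli and expected observations. The following Python sketch shows one possible encoding and a naive executor; the adapter interface (send, observe) is a hypothetical stand-in for the stage-specific test adapters discussed in the next paragraphs.

```python
# One test case: stimuli the tester applies and the reactions expected from the SUT.
test_case = [
    ("in",  "push"),   # push down the side lever
    ("out", "on"),     # expect the heater to be switched on
    ("in",  "stop"),   # press the stop button
    ("out", "off"),    # expect the heater to be switched off
]

def run_test(adapter, steps):
    """Feed inputs to the SUT and compare its outputs with the expected ones."""
    for direction, event in steps:
        if direction == "in":
            adapter.send(event)              # e.g., press a button
        else:
            observed = adapter.observe()     # e.g., read the heater line
            if observed != event:
                return f"FAIL: expected {event}, observed {observed}"
    return "PASS"
```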
We can execute such a test case as follows: If the transition is labeled with an input to the SUT (pushing down the lever or pressing a button), we perform the appropriate action, whereas if it is labeled with an output of the SUT (locking or releasing the latch, turning heating on or off), we see whether we can observe the appropriate reaction. As shown in Table 4.2, model-based tests can be performed on various interface levels, depending on the development stage of the SUT. An important fact about model-based testing is that the same logical test cases can be used on all these stages, which can be achieved by defining for each stage a specific test adapter that maps abstract events to the concrete testing interfaces.

TABLE 4.1 Different SUT Aspects and Corresponding Test Cases
SUT Characteristics | Test Case
functional | pair (input value, output value)
reactive | sequence of events
nondeterministic | decision tree
parallel | partial order
interactive | test script or program
real time | timed event structure
hybrid | set of real functions

TABLE 4.2 Model-Based Testing Levels
Acronym | Stage | SUT | Testing Interfaces
MiL | Model-in-the-Loop | System model | Messages and events of the model
SiL | Software-in-the-Loop | Control software (e.g., C or Java code) | Methods, procedures, parameters, and variables of the software
PiL | Processor-in-the-Loop | Binary code on a host machine emulating the behavior of the target | Register values and memory contents of the emulator
HiL | Hardware-in-the-Loop | Binary code on the target architecture | I/O pins of the target microcontroller or board
(none) | System-in-the-Loop | Actual physical system | Physical interfaces, buttons, switches, displays, etc.

For example, the user action of pushing the stop button can be mapped to send the event stop to the system model, to call the Java AWT ActionListener method actionPerformed(stop), to write 1 into address 0x0CF3 in a certain emulator running, for example, Java byte code, or to set the voltage at pin GPIO5 of a certain processor board to high. System-in-the-loop tests are notoriously difficult to implement. In our example, we would have to employ a robot that is able to push buttons and observe the browning of a piece of toast.
4.1.4 Coverage criteria
Complete testing of all possible behaviors of a reactive system is impossible. Therefore, an adequate subset has to be selected, which is used in the testing process. Often, coverage criteria are used to control the test generation process or to measure the quality of a test suite. Coverage of a test suite can be defined with respect to different levels of abstraction of the SUT: requirements coverage, model coverage, or code coverage. If a test suite is derived automatically from one of these levels, coverage criteria can be used to measure the extent to which it is represented in the generated test suite. In the following, we present coverage criteria as a means to measure the quality of a test suite. Experience has shown that there is a direct correlation between the various coverage notions and the fault detection capability of a test suite. The testing effort (another quality aspect) is measured in terms of the size of the test suite. In practice, one has to find a balance between minimal size and maximal coverage of a test suite. Model coverage criteria can help to estimate to which extent the generated test suite represents the modeled requirements.
Usually, a coverage criterion is defined independent of any specific test model, that is, at the meta-model level. Therefore, it can be applied to any instance of that meta-model. A model coverage criterion applied to a certain test model results in a set of test goals, which are specific for that test model. A test goal can be any model element (state, transition, event, etc.) or combination of model elements, for example, a sequence describing the potential behavior of model instances. A test case achieves a certain test goal if it contains the respective model element(s). A test suite satisfies (or is complete for) a coverage criterion if for each test goal of the criterion there is a test case in the suite that contains this test goal. The coverage of a test suite with respect to a coverage criterion is the percentage of test goals in the criterion, which are achieved by the test cases of the test suite. In other words, a test suite is complete for a coverage criterion iff its coverage is 100%. Typical coverage criteria for state machine models are as follows: 1. All-States: for each state of the machine, there is a test case that contains this state. 2. All-Transitions: for each transition of the machine, there is a test case that contains this transition. 3. All-Events: the same for each event that is used in any transition. 4. Depth-n: for each run (s0, a1, s1, a2, . . . , an, sn) of length at most n from the initial state or configuration, there is a test case containing this run as a subsequence. 5. All-n-Transitions: for each run of length at most n from any state s ∈ S, there is a test case that contains this run as a subsequence (All-2-Transitions is also known as All-Transition-Pairs; All-1-Transitions is the same as All-Transitions, and All-0-Transitions is the same as All-States). 6. All-Paths: all possible transition sequences on the state machine have to be included in the test suite; this coverage criterion is considered infeasible. 86 Model-Based Testing for Embedded Systems In general, satisfying only All-States on the model is considered too weak. The main reason is that only the states are reached but the possible state changes are only partially covered. Accordingly, All-Transitions is regarded a minimal coverage criterion to satisfy. Satisfying the All-Events criterion can also be regarded as an absolute minimal necessity for any systematic black-box testing process. It requires that every input is provided at least once, and every possible output is observed at least once. If there are input events that have never been used, we cannot say that the system has been thoroughly tested. If there are specified output actions that could never be produced during testing, chances are high that the implementation contains a fault. Depth-n and All-n-Transitions can result in test suites with a high probability to detect failures. On the downside, the satisfaction of these criteria also often results in big test suites. The presented coverage criteria are related. For instance, in a connected state machine, that is, if for any two simple states there is a sequence of transitions connecting them, the satisfaction of All-Transitions implies the satisfaction of All-States. In technical terms, All-Transitions subsumes All-States. In general, coverage criteria subsumption is defined as follows: if any test suite that satisfies coverage criterion A also always satisfies the coverage criterion B, then A is said to subsume B. 
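These definitions translate directly into a simple measurement procedure. The following sketch (hypothetical types; each test case is reduced to the set of states or transitions it contains) computes the coverage of a test suite for the All-States and All-Transitions criteria as the percentage of achieved test goals; a suite is complete for the criterion when the result is 100%.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CoverageMeasure {

    // Percentage of state test goals contained in at least one test case of the suite.
    static double allStatesCoverage(Set<String> allStates, List<List<String>> suiteStates) {
        Set<String> visited = new HashSet<>();
        suiteStates.forEach(visited::addAll);
        visited.retainAll(allStates);
        return 100.0 * visited.size() / allStates.size();
    }

    // Percentage of transition test goals contained in at least one test case of the suite.
    static double allTransitionsCoverage(Set<String> allTransitions,
                                         List<List<String>> suiteTransitions) {
        Set<String> covered = new HashSet<>();
        suiteTransitions.forEach(covered::addAll);
        covered.retainAll(allTransitions);
        return 100.0 * covered.size() / allTransitions.size();
    }
}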
The subsuming coverage criterion is considered stronger than the subsumed one. However, this does not mean that a test suite satisfying coverage criterion A necessarily detects more failures than a test suite satisfying B. All-Transition-Pairs subsumes All-Transitions. There is no such relation between All-Events and All-Transitions: there may be untriggered transitions that are not executed by a test suite that calls all events; likewise, a transition may be activated by more than one event, and a test suite that covers all transitions does not use all of these events. Likewise, Depth-n is unrelated to All-Events and All-Transitions. For practical purposes, besides the All-Transitions criterion, the Depth-n criterion is often used, where n is set to the diameter of the model. The criterion All-n-Transitions is more extensive; for n ≥ 3, it often results in a very large test suite. Clearly, All-n-Transitions subsumes Depth-n, All-(n+1)-Transitions subsumes All-n-Transitions for all n, and All-Paths subsumes all of the previously mentioned coverage criteria except All-Events. Figure 4.6 shows the corresponding subsumption hierarchy. The relation between All-n-Transitions and Depth-n is dotted because it only holds if the n for All-n-Transitions is at least as large as the n of Depth-n.

FIGURE 4.6 Subsumption hierarchy of structural coverage criteria.

Beyond simple states, UML state machines can contain orthogonal regions, pseudostates, and composite states. Accordingly, the All-States criterion can be modified to entail the following:

1. All reachable configurations,
2. All pseudostates, or
3. All composite states.

Likewise, other criteria such as the All-Transitions criterion can be modified such that all triggering events of all transitions or all pairs of configurations and outgoing transitions are covered [69]. Since there are potentially exponentially more configurations than simple states, constructing a complete test suite for all reachable configurations is often infeasible. Conditions in UML state machine transitions are usually formed from atomic conditions with the Boolean operators {and, or, not}, so the following control-flow-based coverage criteria focused on transition conditions have been defined [115]:

1. Decision Coverage, which requires that for every transition guard c from any state s, there is one test case where s is reached and c is true, and one test case where s is reached and c is false.
2. Condition Coverage, which requires the same as Decision Coverage for each atomic condition of every guard.
3. Condition/Decision Coverage, which requires that the test suite satisfies both Condition Coverage and Decision Coverage.
4. Modified Condition/Decision Coverage (MC/DC) [32, 31], which additionally requires showing that each atomic condition has an isolated impact on the evaluation of the guard.
5. Multiple Condition Coverage, which requires test cases for all combinations of atomic conditions in each guard.

Multiple Condition Coverage is the strongest control-flow-based coverage criterion. However, if a transition condition is composed of n atomic conditions, a minimal test suite that satisfies Multiple Condition Coverage may require up to 2^n test cases. MC/DC [32] is still considered very strong; it is part of DO-178B [107] and requires only linear test effort.
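The exponential growth of Multiple Condition Coverage can be illustrated with a small sketch that enumerates its test goals for a single guard; the helper below is purely illustrative and not part of any of the cited tools.

import java.util.ArrayList;
import java.util.List;

class MultipleConditionGoals {

    // Enumerate all 2^n truth-value combinations of n atomic conditions.
    // Each boolean[] is one test goal of Multiple Condition Coverage for the guard.
    static List<boolean[]> combinations(int n) {
        List<boolean[]> goals = new ArrayList<>();
        for (int bits = 0; bits < (1 << n); bits++) {
            boolean[] values = new boolean[n];
            for (int i = 0; i < n; i++) {
                values[i] = ((bits >> i) & 1) == 1;
            }
            goals.add(values);
        }
        return goals;
    }

    public static void main(String[] args) {
        // A guard with three atomic conditions, e.g. [a and (b or c)]:
        System.out.println(combinations(3).size() + " combinations");  // prints 8
        // MC/DC, in contrast, can typically be satisfied with n + 1 = 4 well-chosen combinations.
    }
}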
The subsumption hierarchy of control-flow-based coverage criteria is shown in Figure 4.7. There are further coverage criteria that are focused on the data flow in a state machine, for example, on the definition and use of variables. 4.1.5 Size of test suites The existence of a unique, minimal, and complete test suite, for each of the coverage criteria mentioned above, cannot be guaranteed. For the actual execution of a test suite, its size is an important figure. The size of a test suite can be measured in several ways or combinations of the following: 1. The number of all events, that is, the lengths of all test cases. 2. The cardinality, that is, the number of test cases in the test suite. 3. The number of input events. At first glance, the complexity of the execution of a test suite is determined by the number of all events that occur in it. At a closer look, resetting the SUT after one test in order to run the next test turns out to be a very costly operation. Hence, it may be 88 Model-Based Testing for Embedded Systems Multiple condition coverage Modified condition/ decision coverage Condition/decision coverage Decision coverage Condition coverage FIGURE 4.7 Subsumption hierarchy of condition-based coverage criteria. advisable to minimize the number of test cases in the test suite. Likewise, for manual test execution, the performance of a (manual) input action can be much more expensive than the observation of the (automatic) output reactions. Hence, in such a case the number of inputs must be minimized. These observations show that there is no universal notion of minimality for test suites; for each testing environment different complexity metrics may be defined. A good test generation algorithm takes these different parameters into account. Usually, the coverage increases with the size of the test suite; however, this relation is often nonlinear. 4.2 Abstract Test Case Generation In this section, we present the first challenge of automatic test generation from UML state machines: creating paths on the model level to cover test goals of coverage criteria. State machines are extended graphs, and graph traversal algorithms can be used to find paths in state machines [3, 62, 80, 82, 84, 88]. These paths can be used as abstract test cases that are missing the details about input parameters. In Section 4.3, we present approaches to generate the missing input parameters. Graph traversal has been thoroughly investigated and is widely used for test generation in practice. For instance, Chow [33] creates tests from a finite state machine by deriving a testing tree using a graph search algorithm. Offutt and Abdurazik [92] identify elements in a UML state machine and apply a graph search algorithm to cover them. Other algorithms also include data flow information [23] to search paths. Harman et al. [67] consider reducing the input space for search-based test generation. Gupta et al. [61] find paths and propose a relaxation method to define suitable input parameters for these paths. We apply graph traversal algorithms that additionally compute the input parameter partitions [126, 127]. Graph traversing consists of starting at a certain start node nstart in the graph and traversing edges until a certain stopping condition is satisfied. Such stopping conditions are, for example, that all edges have been traversed (see the Chinese postman problem in [98]) or a certain node has been visited (see structural coverage criteria [115]). There are many different approaches to graph traversal. 
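The traversal algorithms presented in the following figures operate on a plain graph view of the state machine. As an orientation, the data structures they assume might look like the following sketch; the names are chosen to loosely match the pseudocode and are not taken from any specific tool.

import java.util.ArrayList;
import java.util.List;

// A state of the flattened state machine.
class Node {
    final String name;
    final List<Transition> outgoing = new ArrayList<>();
    final List<Transition> incoming = new ArrayList<>();
    Node(String name) { this.name = name; }
}

// A transition, labeled with its triggering event.
class Transition {
    final Node source, target;
    final String event;
    Transition(Node source, String event, Node target) {
        this.source = source;
        this.event = event;
        this.target = target;
        source.outgoing.add(this);
        target.incoming.add(this);
    }
}

// A (partial) test case: the sequence of transitions traversed in the model.
class Sequence extends ArrayList<Transition> {
    void addToFront(Transition t) { add(0, t); }
}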
One choice is whether to apply forward or backward searching. In forward searching, transitions are traversed forward from the Automatic Model-Based Test Generation 89 start state to the target state until the stopping condition is satisfied, or it is assumed that the criterion cannot be satisfied. This can be done in several ways such as, for instance, breadth-first, depth-first, or weighted breadth-first as in Dijkstra’s shortest path algorithm. In backward searching, the stopping condition is to reach the start state. Typical nodes to start this backward search from are, for example, the states of the state machine in order to satisfy the coverage criterion All-States. Automated test generation algorithms strive to produce test suites that satisfy a certain coverage criterion, which means reaching 100% of the test goals according to the criterion. The choice of the coverage criterion has significant impact on the particular algorithm and the resulting test suite. However, none of the above described coverage criteria uniquely determines the resulting test suite; for each criterion, there may be many different test suites achieving 100% coverage. For certain special cases of models, it is possible to construct test suites that satisfy a certain coverage criterion while consisting of just one test case. The model is strongly connected, if for any two states s and s there exists a run starting from s and ending in s . If the model is strongly connected, then for every n there exists a one-element test suite that satisfies All-n-Transitions: from the initial state, for all states s and sequence of length n from s, the designated run traverses this sequence and returns to the initial state. An Eulerian path is a run that contains each transition exactly once, and a Hamilton path is a run that contains each state exactly once. An Eulerian or Hamiltonian cycle is an Eulerian or Hamiltonian path that ends in the initial state, respectively. Trivially, each test suite containing an Eulerian or Hamiltonian path is complete for All-Transitions or AllStates, respectively. There are special algorithms to determine whether such cycles exist in a graph and to construct them if so. In the following, we present different kinds of search algorithms: Dijkstra’s shortest path, depth-first, and breadth-first. The criteria of when to apply which algorithm depend on many aspects. Several test generation tools implement different search algorithms. For instance, the Conformiq Test Designer [38] applies forward breadth-first search, whereas ParTeG [122] applies backward depth-first search. 4.2.1 Shortest paths Complete coverage for All-States in simple state machines can be achieved with Dijkstra’s single-source shortest path algorithm [108]. Dijkstra’s algorithm computes for each node the minimal distance to the initial node via a greedy search. For computing shortest paths, it can be extended such that it also determines each node’s predecessor on this path. The algorithm is depicted in Figures 4.8 and 4.9: Figure 4.8 shows the algorithm to compute shortest path information for all nodes of the graph. With the algorithm in Figure 4.9, a shortest path is returned for a given node of the graph. The generated test suite consists of all maximal paths that are constructed by the algorithm, that is the shortest paths for all nodes that are not covered by other shortest paths. For our toaster example in Figure 4.2, this algorithm can generate the test cases depicted in Figure 4.10. 
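The selection of these maximal paths can be sketched as follows, assuming that the shortest path of every state has already been extracted with the algorithm of Figure 4.9 and is given as a list of state names; the helper is hypothetical and not the actual ParTeG or Conformiq implementation. A state's shortest path enters the test suite only if the state does not already occur on the shortest path of another state.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MaximalPathSelection {

    // paths maps each state name to its shortest path from the initial state,
    // given as the list of visited state names (ending with the state itself).
    static List<List<String>> selectMaximalPaths(Map<String, List<String>> paths) {
        List<List<String>> suite = new ArrayList<>();
        for (Map.Entry<String, List<String>> candidate : paths.entrySet()) {
            boolean coveredElsewhere = false;
            for (Map.Entry<String, List<String>> other : paths.entrySet()) {
                if (other.getKey().equals(candidate.getKey())) {
                    continue;
                }
                // the candidate's target state already occurs on another shortest path
                if (other.getValue().contains(candidate.getKey())) {
                    coveredElsewhere = true;
                    break;
                }
            }
            if (!coveredElsewhere) {
                suite.add(candidate.getValue());
            }
        }
        return suite;
    }
}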
The same algorithm can be used for covering All-Transitions by inserting a pseudostate in every transition as described in [124]. Furthermore, the generated path is extended by the outgoing transition of the just inserted pseudostate. In the generated test suite, only those sequences must be included, which are not prefixes (initial parts) of some other path. This set can be constructed in two ways: 1. In decreasing length, where common prefixes are eliminated. 2. In increasing length, where new test cases are only added if their length is maximal. 90 Model-Based Testing for Embedded Systems 01 void Dijkstra(StateMachine sm, Node source) { 02 for each node n in sm { 03 dist[n] = infinity; // distance function from source to n 04 previous[n] = undefined; // Previous nodes determine optimal path 05 } 06 dist[source] = 0: // initial distance for source 07 set Q = all nodes in sm; 08 while Q is not empty { 09 u = node in Q with smallest value dist[u]; 10 if (dist[u] = infinity) 11 break; // all remaining nodes cannot be reached 12 remove u from Q; 13 for each neighbor v of u { 14 alt = dist[u] + dist_between(u, v); 15 if alt < dist[v] { 16 dist[v] = alt; 17 previous[v] = u; 18 } } } } FIGURE 4.8 Computing shortest distance for all nodes in the graph by Dijkstra. 01 Sequence shortestPath(Node target) { 02 S = new Sequence(); 03 Node u = target; 04 while previous[u] is defined { 05 insert u at the beginning of S; 06 u = previous[u]; 07 } } FIGURE 4.9 Shortest path selection by Dijkstra. TC1: (s0, (push, , on), s1, (dec, , ), s7) TC2: (s0, (defrost, , ), s2, (push, , on), s3, (inc, , ), s5) TC3: (s0, (inc, , ), s6, (defrost, , ), s4) FIGURE 4.10 Test cases generated by the shortest path algorithm by Dijkstra. The presented shortest path generation algorithm is just one of several alternatives. In the following, we will introduce further approaches. 4.2.2 Depth-first and breadth-first search In this section, we describe depth-first and breadth-first graph traversal strategies. We defined several state machines that describe the behavior of a toaster. Here, we use the flat state machine of Figure 4.2 to illustrate the applicability of depth-first and breadth-first. The algorithm to find a path from the initial pseudostate of a state machine to certain state s via depth-first search is shown in Figure 4.11. The returned path is a sequence of transitions. The initial call is depthF irstSearch(initialN ode, s). Automatic Model-Based Test Generation 91 01 Sequence depthFirstSearch(Node n, Node s) { 02 if(n is equal to s) { // found state s? 03 return new Sequence(); 04 } 05 for all outgoing transitions t of n { // search forward 06 Node target = t.target; // target state of t 07 Sequence seq = depthFirstSearch(target, s); 08 if(seq is not null) { // state s has been found before 09 seq.addToFront(t); // add the used transitions 10 return seq; 11 } } 12 if(n has no outgoing transitions) // abort depth-search 13 return null; 14 } FIGURE 4.11 Depth-first search algorithm. TC: (s0, (push, , on), s1, (inc, , ), s7, (time, , off), s6, (defrost, , ), s4, (dec, , ), s2, (push, , on), s3, (inc, , ), s5, (stop, , off), s6) FIGURE 4.12 Test case generated by ParTeG for All-States. 
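The prefix elimination mentioned above can be written as a small helper, sketched below with paths represented as lists of state names (a hypothetical utility, not taken from ParTeG): a path is dropped whenever it is an initial part of a strictly longer path in the suite.

import java.util.ArrayList;
import java.util.List;

class PrefixElimination {

    // Keep only those paths that are not prefixes (initial parts) of a longer path.
    static List<List<String>> removePrefixes(List<List<String>> paths) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> candidate : paths) {
            boolean isPrefix = false;
            for (List<String> other : paths) {
                if (other.size() > candidate.size()
                        && other.subList(0, candidate.size()).equals(candidate)) {
                    isPrefix = true;
                    break;
                }
            }
            if (!isPrefix) {
                result.add(candidate);
            }
        }
        return result;
    }
}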
01 Sequence breadthFirstSearch(Node n, Node s) { 02 TreeStructure tree = new TreeStructure(); 03 tree.addNode(n); 04 while(true) { // run forever (until sequence is returned with this loop) 05 NodeSet ls = tree.getAllLeaves(); // get all nodes without outgoing transitions 06 for all nodes/leaves l in ls { 07 if(l references s) { // compare to searched state 08 Sequence seq = new Sequence(); 09 while (l.incoming is not empty) { // there are incoming transitions 10 seq.addToFront(l.incoming.get(0)); // add incoming transition 11 l = l.incoming.get(0).source; } // l is set to l’s predecessor 12 return seq; 13 } // else 14 for all outgoing transitions t of l { // search forward - build tree 15 Node target = t.target; // target state of t 16 new_l = tree.addNode(target); // get tree node that references target 17 tree.addTransitionFromTo(t, l, new_l); // add an edge from node l 18 // to node new_l; this new edge references transition t 19 } } } } FIGURE 4.13 Breadth-first search algorithm. For the example in Figure 4.2, ParTeG generates exactly one test case to satisfy All-States. Figure 4.12 shows this test case in the presented notation. Figure 4.13 shows an algorithm for breadth-first search. Internally, it uses a tree structure to keep track of all paths. Just like a state machine, a tree is a directed graph with nodes and edges. Each node has incoming and outgoing edges. The nodes and edges of the tree 92 Model-Based Testing for Embedded Systems reference nodes and edges of the state machine, respectively. It is initiated with the call breadthF irstSearch(initialN ode, s). Both algorithms start at the initial pseudostate of the state machine depicted in Figure 4.2. They traverse all outgoing transitions and keep on traversing until s has been visited. Here, we present the generated testing tree for breadth-first search in the toaster example. We assume that the goal is to visit state S5. The testing tree is shown in Figure 4.14. It contains only edges and nodes; events are not presented here. Because of loops in transition sequences, the result may be in general an infinite tree. The tree, however, is only built and maintained until the desired condition is satisfied, that is, the identified state is reached. In this example, the right-most path reaches the state S5. A finite representation of this possibly infinite tree is a reachability tree, where each state is visited only once. Figure 4.15 shows such a reachability tree for the toaster example. Again, the figure depicts only edges and nodes, but no event or effect information. Graph traversal approaches can also be applied to hierarchical state machines such as presented in Figure 4.3. For each hierarchical state machine, there exists an equivalent simple state machine; for instance, the models in Figures 4.3 and 4.2 have exactly the same behavior. Basically, each state in the flat state machine corresponds to a state configuration, that is, a set of concurrently active states, in the parallel state machine. Extended state machine such as the one presented in Figure 4.4 can contain variables on infinite domains, and transitions can have arithmetic guard conditions and effects of S0 S2 S1 S6 S0 S3 S4 S0 S0 S7 S7 S0 S4 S1 S6 S1 S6 S1 S6 S6 S2 S5 FIGURE 4.14 Testing tree that shows the paths for breadth-first search. S0 S2 S1 S6 S3 S7 S4 S5 FIGURE 4.15 Reachability tree that shows only the paths to reach all states. Automatic Model-Based Test Generation 93 arbitrary complexity. 
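A reachability tree such as the one in Figure 4.15 can be obtained from the breadth-first search by remembering which states have already been added, so that every state enters the tree only once. The following sketch (a hypothetical helper with states given by name and an adjacency map) illustrates this.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityTree {

    // successors maps each state name to the target states of its outgoing transitions.
    // The returned map represents the tree: every reachable state (except the root)
    // is mapped to the state from which it was first reached.
    static Map<String, String> build(String root, Map<String, List<String>> successors) {
        Map<String, String> parent = new HashMap<>();
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String state = queue.remove();
            for (String next : successors.getOrDefault(state, List.of())) {
                if (visited.add(next)) {   // each state is added only once
                    parent.put(next, state);
                    queue.add(next);
                }
            }
        }
        return parent;
    }
}

For the extended machine of Figure 4.4, however, reaching a state is no longer a pure graph problem, because the guards over the variables must also be satisfiable along the chosen path.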
The problem of reaching a certain state or transition in an extended state machine is therefore non trivial and, in the general case, undecidable. Therefore, for such models, the state set is partitioned into equivalence classes, and representatives from the equivalence classes are selected. These methods will be described in the next section. 4.3 Input Value Generation In this section, we present the second challenge for automatic test generation: selecting concrete input values for testing. All previously presented test generation techniques are focused on the satisfaction of coverage criteria that are applied to state machines. The corresponding test cases contain only the necessary information to traverse a certain path. Such test cases are called abstract—information about input parameters is given only partly as a partition of the possible input value space. Boundary value analysis is a technique that is focused on identifying representatives of partitions that are as close as possible to the partition boundaries. In the following, we present partition testing, as well as static and dynamic boundary value analysis. 4.3.1 Partition testing Partition testing is a technique that consists of defining input value partitions and selecting representatives of them [64, 128, 89, 24, page 302]. There are several variants of partition testing. For instance, the category partition method [96] is a test generation method that is focused on generating partitions of the test input space. An example for category partitioning is the classification tree method (CTM) [60, 46], which enables testers to manually define partitions and to select representatives. The application of CTM to testing embedded systems is demonstrated in [83]. Basanieri and Bertolino use the category classification approach to derive integration tests using case diagrams, class diagrams, and sequence diagrams [13]. Alekseev et al. [5] show how to reuse classification tree models. The CostWeighted Test Strategy (CoWTeSt) [14, 15] is focused on prioritizing test cases to restrict their absolute number. CoWTeSt and the corresponding tool CowSuite have been developed by the PISATEL laboratory [103]. Another means to select test cases by partitioning and prioritization is the risk-driven approach presented by Kolb [79]. For test selection, a category partition table could list the categories as columns and test cases as rows. In each row, the categories that are tested are marked with an X. For the toaster, such a category partition table could look like depicted in Table 4.3. There are two test cases TC1 and TC2 that cover all of the defined categories. Most of the presented partition testing approaches are focused on functional black-box testing that are solely based on system input information. For testing with UML state machines, the structure of the state machine and the traversed paths have to be included in TABLE 4.3 Category Partition Table Test Cases Defrost TC1 TC2 X No Defrost X High Browning Level X Low Browning Level X 94 Model-Based Testing for Embedded Systems the computation of reasonable input partitions. Furthermore, the selection of representatives from partitions is an important issue. Boundary value analysis (BVA) consists of selecting representatives close to the boundaries of a partition, that is, values whose distances to representatives from other partitions are below a certain threshold. Consider the example in Figure 4.4. 
For the guard condition s ht > 0, 1 is a meaningful boundary value for s ht to satisfy the condition, and 0 is a meaningful value to violate the condition. The task is to derive these boundary values automatically. Here, we present two approaches of integrating boundary value analysis and automatic test generation with UML state machines: static and dynamic boundary value analysis [125]. 4.3.2 Static boundary value analysis In static boundary value analysis, BVA is included by static changes of the test model. For model-based test generation, this corresponds to transforming the test model. Model transformations for including BVA in test generation from state machines have been presented in [26]. The idea is to, for example, split a guard condition of the test model into several ones. For instance, a guard [x >= y] is split into the three guards [x = y], [x = y + 1], and [x > y + 1]. Figure 4.16 presents this transformation applied to a simple state machine. The essence of this transformation is to define guard conditions that represent boundary values of the original guard’s variables. As a consequence, the satisfaction of the transformed guards forces the test generator to also select boundary values for the guard variables. This helps to achieve the satisfaction of, for example, All-Transitions [115, page 117] requires the satisfaction of each guard and thus the inclusion of static BVA. There are such approaches for model checkers or constraint solvers that include the transformation or mutation of the test model. As one example, the Conformiq Test Designer [38] implements the approach of static BVA. The advantages of this approach are the easy implementation and the linear test effort. However, this approach has also several shortfalls regarding the resulting test quality. In [125], we present further details. 4.3.3 Dynamic boundary value analysis In dynamic boundary value analysis, the boundary values are defined dynamically during the test generation process and separately for each abstract test case. Thus, in contrast to static BVA, the generated boundary values of dynamic BVA are specific for each abstract test case. There are several approaches to implement dynamic BVA. In this section, we present a short list of such approaches. In general, for dynamic boundary value analysis no test model transformations are necessary. For instance, an evolutionary approach can be used to create tests that cover certain parts of the model. In this case, a fitness function that returns good fitness values for parameters that are close to partition boundaries results in test cases with such input parameters that are close to these boundaries. Furthermore, any standard test generation approach can A [x>= y] B [x = y] [x = y+1] A B [x > y+1] FIGURE 4.16 Semantic-preserving test model transformation for static BVA. Automatic Model-Based Test Generation 95 be combined with a constraint solver that is able to include linear optimization, for example, lp solve [19] or Choco [112], for generating input parameter values. There are many constraint solvers [58, 53, 11, 117, 48] that could be used for this task. Besides the presented approaches to dynamic BVA, there are industrial approaches to support dynamic BVA for automatic test generation with UML or B/Z [81, 110]. All these approaches to dynamic BVA are based on searching forward. Another approach of searching backward instead of forward is called abstract backward analysis. 
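Whether the partitions are obtained statically or dynamically, the final step is to select representatives at the partition borders. The sketch below (a hypothetical helper, not part of any of the cited tools) does this for an integer partition given as a closed interval, returning the boundary values, their inner neighbors, and the values just outside the interval.

import java.util.LinkedHashSet;
import java.util.Set;

class BoundaryValues {

    // Representatives for an input partition given as the closed interval [lo, hi]:
    // the boundaries, their inner neighbors, and the values just outside the interval
    // (the latter serve as representatives of the neighboring partitions).
    static Set<Long> select(long lo, long hi) {
        Set<Long> values = new LinkedHashSet<>();
        values.add(lo);
        values.add(Math.min(lo + 1, hi));
        values.add(Math.max(hi - 1, lo));
        values.add(hi);
        values.add(lo - 1);   // violates the lower bound
        values.add(hi + 1);   // violates the upper bound
        return values;
    }

    public static void main(String[] args) {
        // Partition derived from the guard [s_ht > 0] with s_ht ranging over 0 to 6: interval [1, 6].
        System.out.println(select(1, 6));   // prints [1, 2, 5, 6, 0, 7]
    }
}

Abstract backward analysis, mentioned above, is one systematic way to derive such intervals for each abstract test case.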
It is based on the weakest precondition calculus [49, 129, 30] and on searching backward. During the generation of abstract test cases, all guards to enable the abstract test case are collected and transformed into constraints of input parameters. As a result, the generated abstract test case also contains constraints about the enabling input parameters. These constraints define partitions and thus can be used for BVA. This approach has been implemented in the model-based test generation prototype ParTeG [122, 126, 123]. In this implementation, the test generation algorithm starts at certain model elements that are specified by the applied structural coverage criterion and iterates backward to the initial node. As a result, the corresponding structural [115] and boundary-based [81] coverage criteria can be combined. 4.4 Relation to Other Techniques The previous two sections dealt with the basic issues of generating paths in the state machine and selecting meaningful input data, respectively. In this section, we show several other techniques that may be used to support the two basic issues. In the following, we present random testing in Section 4.4.1, evolutionary testing in Section 4.4.2, constraint solving in Section 4.4.3, model checking in Section 4.4.4, and static analysis in Section 4.4.5. 4.4.1 Random testing Many test generation approaches put a lot of effort in generating test cases from test models in a “clever” way, for instance, finding a shortest path to the model element to cover. It has been questioned whether this effort is always justified [104]. Any sort of black-box testing abstracts from internal details of the implementation, which are not in the realm of the test generation process. Nevertheless, these internals could cause the SUT to fail. Statistical approaches to testing such as random testing have proven to be successful in many application areas [21, 85, 97, 34, 116, 35, 36]. Therefore, it has been suggested to apply random selection also to model-based test generation. In random testing, model coverage is not the main concern. The model abstracts from the SUT, but it is assumed that faults are randomly distributed across the entire SUT. Thus, random testing has often advantages over any kind of guided test generation. The model is used to create a large number of test cases without spending much effort on the selection of single tests. Therefore, random algorithms quickly produce results, which can help to exhibit design flaws early in the development process, while the model and SUT are still under development. There are several publications on the comparison of random test generation techniques and guided test generation techniques. Andrews et al. [8] use a case study to show that random tests can perform considerably worse than coverage-guided test suites in terms of fault detection and cost-effectiveness. However, the effort of applying coverage criteria cannot be easily measured, and it is still unclear which approach results in higher costs. Mayer and Schneckenburger [86] present a systematic comparison of adaptive random testing 96 Model-Based Testing for Embedded Systems techniques. Just like Gutjahr [63], Weyuker and Jeng [128] also focus their work on the comparison of random testing to partition testing. Major reasons for the success of random testing techniques are that other techniques are immature to a certain extent or that the used requirements specifications are partly faulty. 
Finally, developers as well as testers make errors (see Beizer [17] for the prejudice Angelic Testers). For instance, testers can forget some cases or simply do not know about them. Random test generation can also be applied to model-based testing with UML state machines. For instance, this approach can be combined with the graph traversal approach of the previous section so as the next transition to traverse is selected randomly. Figure 4.17 shows one possible random test generation algorithm. First, it defines the desired length of the test case (line 03). Then, it selects and traverses one of the current node’s outgoing transitions (line 06). This step is repeated until the current node has no outgoing transitions (line 07) or the desired test length has been reached (line 05). The resulting sequence is returned in line 13. Figure 4.18 shows several randomly generated test cases for our toaster example in Figure 4.2. 4.4.2 Evolutionary testing Evolutionary test generation consists of adapting an existing test suite until its quality, for example, measured with a fitness function, reaches a certain threshold. The initial test suite can be created using any of the above approaches. Based on this initial test suite, evolutionary testing consists of four steps: measuring the fitness of the test suite, selecting only the fittest test cases, recombining these test cases, and mutating them. In evolutionary testing, the set of test cases is also called population. Figure 4.19 depicts the process of evolutionary test generation. The dotted lines describe the start and the end of the test 01 Sequence randomSearch(Node source) { 02 Sequence seq = new Sequence(); 03 int length = random(); 04 Node currentNode = source; 05 for(int i = 0; i < length; ++i) { 06 transitions = currentNode.getOutgoing(); 07 if (transitions.isEmpty()) { break; } 08 traverse = randomly select a representative of transitions; 09 seq.add(traverse); 10 // set current node to target node of traverse 11 currentNode = traverse.getTarget(); 12 } 13 return seq; 14 } FIGURE 4.17 Random search algorithm. TC1: (s0, (push, , on), s1) TC2: (s0, (inc, , ), s6, (dec, , ), s0, (push, , ), s1, (stop, , ), s0) TC3: (s0, (inc, , ), s6, (push, , ), s7, (dec, , ), s1) FIGURE 4.18 Randomly generated test cases. Automatic Model-Based Test Generation 97 Initial population Test case mutation Current population Measuring fitness Test case recombination Final population Test case selection FIGURE 4.19 Evolutionary testing process. generation process, that is, the initial population and—given that the measured fitness is high enough—the final population. There are several approaches to steer test generation or execution with evolutionary approaches [87, 99, 78, 68, 119]. An initial (e.g., randomly created or arbitrarily defined) set of test input data is refined using mutation and fitness functions to evaluate the quality of the current test suite. For instance, Wegener et al. [120] show application fields of evolutionary testing. A major application area is the area of embedded systems [111]. Wappler and Lammermann apply these algorithms for unit testing in object-oriented programs [118]. Bu¨hler and Wegener present a case study about testing an autonomous parking system with evolutionary methods [25]. Baudry et al. [16] present bacteriological algorithms as a variation of mutation testing and as an improvement of genetic algorithms. 
The variation from the genetic approach consists of the insertion of a new memory function and the suppression of the crossover operator. They use examples in Eiffel and a .NET component to test their approach and show its benefits over the genetic approach for test generation. 4.4.3 Constraint solving The constraint satisfaction problem is defined as a set of objects that must satisfy a set of constraints. The process of finding these object states is known as constraint solving. There are several approaches to constraint solving depending on the size of the application domain. We distinguish large but finite and small domains. For domains over many-valued variables, such as scheduling or timetabling, Constraint Programming (CP) [106], Integer Programming (IP) [105], or Satisfiability Modulo Theories (SMT) [12] with an appropriate theory is used. For extensionally representable domains, using solvers for Satisfiability (SATSolver) [20] and Answer Set Programming (ASP) [10, 57] is state of the art. SAT is often used for hardware verification [50]. There are many tools (solvers) to support constraint solving techniques. Examples for constraint programming tools are the Choco Solver [112], MINION [58], and Emma [53]. Integer programming tools are OpenOpt [94] and CVXOPT [45]. An example for SMT solvers is OpenSMT [109]. There are several competitions for solvers [11, 117, 48]. Constraint solving is also used for testing. Gupta et al. [61] use a constraint solver to find input parameter values that enable a generated abstract test case. Aichernig and Salas [4] use constraint solvers and mutation of OCL expressions for model-based test generation. Calame et al. [27] use constraint solving for conformance testing. 98 Model-Based Testing for Embedded Systems 4.4.4 Model checking Model checking determines whether a model (e.g., a state machine) satisfies a certain property (e.g., a temporal logic formula). The model checking algorithm traverses the state space of the model and formula to deduce whether the model meets the property for certain (e.g., the initial or all) states. Typical properties are deadlock- or live-lock-freedom, absence of race conditions, etc. If a model checker deduces that a given property does not hold, then it returns a path in the model as a counter example. This feature can be used for automatic test generation [7, 56, 55]. For that, each test goal is expressed as a temporal logic formula, which is negated and given to the model checker. For example, if the test goal is to reach “state 6,” then the formula expresses “state 6 is unreachable.” The model checker deduces that the test model does not meet this formula and returns a counter example. In the example, the counter example is a path witnessing that state 6 is indeed reachable. This path can be used to create a test case. In this way, test cases for all goals of the coverage criterion can be generated such that the resulting test suite satisfies the coverage criterion. For our toaster example, the hierarchical state machine model depicted in Figure 4.3 can be coded in the input language of the NuSMV model checker as shown in Figure 4.20. The property states that states “toasting” and “on d” are not reachable simultaneously. NuSMV finds that this is not true and delivers the path (test case) shown in Figure 4.21. Model checking and test generation have been combined in different ways. Our example above is based on the work described in Hong et al. 
[72], which discuss the application of model checking for automatic test generation with control-flow-based and data-flow-based coverage criteria. They define state machines as Kripke structures [37] and translate them to inputs of the model checker SMV [73]. The applied coverage criteria are defined and negated as properties in the temporal logic CTL [37]. Callahan et al. [28] apply user-specified temporal formulas to generate test cases with a model checker. Gargantini and Heitmeyer [56] also consider control-flow-based coverage criteria. Abdurazik et al. [1] present an evaluation of specification-based coverage criteria and discuss their strengths and weaknesses when used with a model checker. In contrast, Ammann et al. [7] apply mutation analysis to measure the quality of the generated test suites. Ammann and Black [6] present a set of important questions regarding the feasibility of model checking for test generation. Especially, the satisfaction of more complex coverage criteria such as MC/DC [32, 31] is difficult because their satisfaction often requires pairs of test cases. Okun and Black [93] also present a set of issues about software testing with model checkers. They describe, for example, the higher abstraction level of formal specifications, the derivation of logic constraints, and the visibility of faults in test cases. Engler and Musuvathi [51] compare model checking to static analysis. They present three case studies that show that model checking often results in much more effort than static analysis although static analysis detects more errors than model checking. In [76], a tool is demonstrated that combines model checking and test generation. Further popular model checkers are the SPIN model checker [18], NuSMV [74], and the Java Pathfinder [70]. 4.4.5 Static analysis Static analysis is a technique for collecting information about the system without executing it. For that, a verification tool is executed on integral parts of the system (e.g., source code) to detect faults (e.g., unwanted or forbidden properties of system attributes). There are several approaches and tools to support static analysis that vary in their strength from analyzing only single statements to including the entire source code of a program. Static analysis is known as a formal method. 
Popular static analysis tools are the PC-Lint tool [59] Automatic Model-Based Test Generation 99 MODULE main VAR state_sidelatch : {inactive, active_defrosting, active_toasting}; state_settemp : {warm, hot}; state_setdefrost : {off_d, on_d}; action : {push, stop, inc, dec, defrost, on, off, time, time_d}; ASSIGN init(state_sidelatch) := inactive; init(state_settemp) := warm; init(state_setdefrost) := off_d; next(state_sidelatch) := case state_sidelatch=inactive & action=push & state_setdefrost=on_d : active_defrosting; state_sidelatch=inactive & action=push : active_toasting; state_sidelatch=active_defrosting & action=time_d : active_toasting; state_sidelatch=active_toasting & action=time : inactive; state_sidelatch=active_defrosting & action=stop : inactive; state_sidelatch=active_toasting & action=stop : inactive; 1 : state_sidelatch; esac; next(state_settemp) := case state_settemp=warm & action=inc : hot; state_settemp=hot & action=dec : warm; 1 : state_settemp; esac; next(state_setdefrost) := case state_setdefrost=off_d & action=defrost & state_sidelatch=inactive : on_d; state_setdefrost=on_d & action=off : off_d; state_setdefrost=on_d & action=defrost & state_sidelatch=inactive : off_d; 1 : state_setdefrost; esac; next(action) := case state_sidelatch=inactive & action=push : on; state_sidelatch=active_toasting & action=time : off; state_sidelatch=active_defrosting & action=stop : off; state_sidelatch=active_toasting & action=stop : off; 1 : {push, stop, inc, dec, defrost, on, off, time, time_d}; esac; SPEC AG ! (state_sidelatch=active_toasting & state_setdefrost=on_d) FIGURE 4.20 SMV code for the hierarchical state machine toaster model. for C and C++ or the IntelliJ IDEA tool [77] for Java. There are also approaches to apply static analysis on test models for automatic test generation [22, 95, 44, 100, 101]. Abdurazik and Offutt [2] use static analysis on UML collaboration diagrams to generate test cases. In contrast to state-machine-based approaches that are often focused on describing the behavior of one object, this approach is focused on the interaction of several objects. Static and dynamic analysis are compared in [9]. Ernst [52] argues for focusing on the similarities of both techniques. 4.4.6 Abstract interpretation Abstract interpretation was initially developed by Patrick Cousot. It is a technique that is focused on approximating the semantics of systems [40, 42] by deducing information without executing the system and without keeping all information of the system. An abstraction of the real system is created by using an abstraction function. Concrete values can be represented as abstract domains that describe the boundaries for the concrete values. Several properties of the SUT can be deduced based on this abstraction. For mapping these 100 Model-Based Testing for Embedded Systems >NuSMV.exe toaster-hierarch.smv *** This is NuSMV 2.5.0 zchaff (compiled on Mon May 17 14:43:17 UTC 2010) -- specification AG !(state_sidelatch = active_toasting & state_setdefrost = on_d) -- is false as demonstrated by the following execution sequence Trace Description: CTL Counterexample Trace Type: Counterexample -> State: 1.1 <- state_sidelatch = inactive state_settemp = warm state_setdefrost = off_d action = push -> State: 1.2 State: 1.3 State: 1.4 State: 1.5 State: 1.6 State: 1.7 State: 1.8 State: 1.9 [--type=] [--suffix=] jumbl Check jumbl Prune Use in the Testing Process Convert a constructed usage model from one format to another format (SM by default). 
The model formats supported by the JUMBL include SM, TML, MML, EMML, GML, CSV, MOD, DOT, GDL, and HTML. Check if the usage model has correct structure. If so, it also reports some overall model statistics. Prune a bad usage model. Remove unreachable nodes (from the source) and trapped nodes (that cannot reach the sink). continued Automated Statistical Testing for Embedded Systems 145 Testing Process Model analysis and validation Test planning JUMBL Command jumbl Flatten [--collapse] jumbl Analyze [--key=] [--suffix=] [--model engine=] jumbl GenTest --min [--key=] jumbl GenTest [--num=] [--key=] jumbl GenTest --weight [--num=] [--key=] [--sum] jumbl CraftTest jumbl ManageTest List jumbl ManageTest Add ( | )+ jumbl ManageTest Insert ( | )+ Use in the Testing Process Flatten a usage model that con- tains references to component models by either collapsing or instantiating (the default). The result is a single “flat” model. Analyze a usage model with the specified distribution key and model analysis engine, and generate a comprehensive report of model statistics in HTML. Supported model analysis engines include Quick (the default), Fast, Simple, and Simulation). Generate minimum coverage test cases (test cases that cover all the arcs of the model with the minimum cost or by default the minimum number of test steps). Generate a specified number of random test cases (by default a single random test case) from the model with the specified distribution key. Generate a specified number of weighted test cases (by default a single weighted test case) from the model with the specified distribution key in either decreasing order of probability (by default) or increasing order of weight (arc weights are summed). Create test cases by hand, or edit existing test cases. Display a directory of the content of a test record. Add test cases to a test record. The test cases are added at the end of the test record. Add test cases to a test record. The test cases are added starting at a given index. continued 146 Testing Process Testing Product and process measurement Model-Based Testing for Embedded Systems JUMBL Command jumbl ManageTest Delete jumbl ManageTest Export [--type=] [--suffix=] [-extension=.] jumbl ManageTest ReadResults + jumbl ManageTest WriteResults jumbl RecordResults ∗ jumbl RecordResults --file= jumbl ManageTest ShowResults jumbl Analyze [--key=] [--suffix=] [--test engine=] Use in the Testing Process Remove selected test cases from a test record. Write individual test cases in a test record to separate files (usually used to write test cases in executable form for automated testing). By default the individual test cases are written in TXT files. Read one or more test result files containing execution information and apply the information to the test record. Write the test execution information stored in a test record to a test result file in XML format. Record the results of executing one or more test cases in a test record. Record failure steps and indicate whether testing stopped after the last failure step for failed test cases. Display a directory of the content of a test record with results of test execution shown, along with some simple reliability measures. Analyze a test record with the specified distribution key and test analysis engine, and generate a comprehensive report of use statistics in HTML, including reliabilities and measures of test sufficiency. Supported test analysis engines include Simple. 
6 How to Design Extended Finite State Machine Test Models in Java Mark Utting CONTENTS 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 6.1.1 What is model-based testing? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 6.1.2 What are the pros and cons? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 6.2 Different Kinds of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 6.3 How to Design a Test Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 6.3.1 Designing an FSM model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 6.3.2 From FSM to EFSM: Writing models in Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 6.4 How to Generate Tests with ModelJUnit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 6.5 Writing Your Own Model-Based Testing Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 6.6 Automating the Execution of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 6.6.1 Offline testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 6.6.2 Online testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 6.6.3 Test execution results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 6.6.4 Mutation analysis of the effectiveness of our testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 6.6.5 Testing with large amounts of data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 6.7 Testing an Embedded System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 6.7.1 The SIM card model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 6.7.2 Connecting the test model to an embedded SUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 6.8 Related Work and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 6.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 6.1 Introduction Above all others, the key skill that is needed for model-based testing (MBT) is the ability to write good test models that capture just the essential aspects of your system under test (SUT). This chapter focuses on developing the skill of modeling for MBT. 
After this introduction, which gives an overview of MBT and its pros and cons, Section 6.2 compares two of the most common styles of models used for MBT—SUT input models and finite state models (FSM)—and discusses their suitability for embedded systems. Then, in Section 6.3, we develop some simple graphical FSM test models for testing a well-known kind of Java collection (Set) and show how this model can be expressed as an extended finite state machine (EFSM) model in Java. Section 6.4 illustrates how the ModelJUnit tool (ModelJUnit 2010) can be used to generate a test suite from this model and discusses several different kinds of test generation algorithms. Section 6.5 describes one of the simplest test generation algorithms possible and shows how you can implement a complete MBT tool in just a couple of dozen lines of code, 147 148 Model-Based Testing for Embedded Systems using Java reflection. Section 6.6 turns to the practical issues of connecting the generated tests to some implementation of Set and reports on what happens when we execute those tests on a HashSet object and on an implementation of Set that has an off-by-one bug. It also describes how we can estimate the strength of the generated test suite using SUT code coverage metrics and the Jumble mutation analysis tool (Jumble 2010). As well as illustrating general EFSM testing techniques, Sections 6.3 through 6.6 are also useful as a brief tutorial introduction to using ModelJUnit. Section 6.7 discusses the modeling and testing of a larger, embedded system example—a subset of the GSM 11-11 protocol used within mobile phones, Section 6.8 discusses related work and tools, and Section 6.9 draws some brief conclusions. 6.1.1 What is model-based testing? The basic idea of MBT is that instead of designing dozens or hundreds of test cases manually, we design a small model of the desired behavior of the SUT and then select an algorithm to automatically generate some tests from that model (El-Far and Whittaker 2002, Utting and Legeard 2007). In this chapter, most of the models that we write will be state machines, which have some internal state that represents the current state of the SUT, and some actions that represent the behaviors of the SUT. We will express these state machine models in the Java programming language, so some programming skills will be required when designing the models. The open-source ModelJUnit tool can then take one of these models, use reflection to automatically explore the model, visualize the model, and generate however many test cases you want. It can also measure how well the generated tests cover the various aspects of the model, which can give us some idea of how comprehensive the test suite is. 6.1.2 What are the pros and cons? Like all test automation techniques, MBT has advantages and disadvantages. One of the advantages is that generating the tests automatically can save large amounts of time, compared to designing tests by hand. However, this is partially offset by the time taken to design the test model. Most published case studies show that MBT reduces overall costs (Dalal et al. 1999, Farchi, Hartman, and Pinter 2002, Bernard et al. 2004, Horstmann, Prenninger, and El-Ramly 2005, Jard et al. 2005), typically by 20% –30% , but sometimes more dramatically—up to 90% (Clark 1998). Another advantage of MBT is that it is easy to generate lots of tests, far more than could be designed by hand. For example, it can be useful to generate and execute thousands of tests overnight, with everything automated. 
Of course, having more tests does not necessarily mean that we have better tests. But MBT can produce a test suite that systematically covers all the combinations of behavior in the model, and this is likely to be less ad hoc than a manually design test suite where it is easy to miss some cases. Case studies have shown that model-based test suites are often as good at fault detection as manually designed test suites (Dalal et al. 1999, Farchi, Hartman, and Pinter 2002, Bernard et al. 2004, Pretschner et al. 2005). In addition, model-based test suites can be better at detecting requirements errors than manually designed test suites because typically half or more of all the faults found by a model-based test suite are because of errors in the model (Stobie 2005). Detecting these model errors is very useful since they often point to requirements issues and misunderstandings about the expected behavior of the SUT. The process of modeling the SUT exposes requirements issues as well. The main disadvantage of MBT is the time and expertise necessary to design the model. A test model has to give an accurate description of the expected SUT behavior, so precise executable models are needed. They may be expressed in some programming How to Design Extended Finite State Machine Test Models in Java 149 language, in a precise subset of UML with detailed state machines, or using some finitestate machine notation such as graphs. So the person designing the model needs to have some programming or modeling skills as well as SUT expertise. It takes some experience to be able to design a test model at a good level of abstraction so that it is not overly detailed and large, but it still captures the essence of the SUT that we want to test. This chapter will give examples of how to develop such models for several different kinds of SUT. One last advantage that we must mention is evolution. When requirements change, updating a large manually designed test suite can be a lot of work. But with MBT, it is not necessary to update the tests—we can just update the test model and regenerate a new test suite. Since a good test model is much smaller than the generated test suite, this can result in faster response to changing requirements. 6.2 Different Kinds of Models The term “model-based testing” can be used to describe many different kinds of test generation (Utting and Legeard 2007, page 7). This is because different kinds of models are appropriate for different kinds of SUT. Two of the most widely used kinds of models for MBT are input models and finite-state models, so we shall start with a brief overview and comparison of these two kinds. If your SUT is batch oriented (it takes a collection of input values, processes them, and then produces some output), then one simple kind of model is to just define a small set of test values for each input variable. For example, if we are testing a print function that must print several different kinds of documents onto several different kinds of printers and work on several different operating systems, we might define an input model that simply defines several important test values for each input variable: document: {plain text, rich text+images, html+images, PDF} printer: {color inkjet printer, black&white laser, postscript printer} op.system: {Windows XP, Windows Vista, Linux, Mac OS X} Given this input model, we could then choose between several different algorithms to generate a test suite. 
If we want to test all combinations of these test inputs, our test suite would contain 4 × 3 × 4 = 48 test cases. If we want to test all pairs of test input values (Czerwonka 2008), then 16 test cases would suffice. If we are happy with the dangerous strategy of testing all input values but ignoring any interactions between different choices, then four test cases could cover all the test input values. This is an example of how we can model the possible inputs of our SUT in a very simple way and then choose a test generation strategy/algorithm to generate a test suite from that model of the inputs. This kind of input-only model is useful for generating test inputs in a systematic way, but it does not help us to know what the expected output is or to determine whether the test passes or fails. Another example of input-only models is grammar-based testing (Coppit and Lian 2005), where various random generation algorithms are used to generate complex input values (such as sample programs to test a compiler or SQL queries to test a database system) from a regular expression or a context free grammar. In this chapter, we focus on testing state-based SUTs, where the behavior of the SUT varies depending upon what state it is in. For such systems, our test cases usually contain a sequence of actions that interact with the SUT, sending it a sequence of input commands and values, as well as specifying the expected outputs of the SUT. The output 150 Model-Based Testing for Embedded Systems of the SUT depends on the current state of the SUT, as well as upon the current input value. For example, if we call the isEmpty() method of a Java collection object, it will sometimes return true and sometimes false, depending on whether the internal state of the collection object is empty or not. Similarly, if we send a “TurnLeft” command to a wheelchair controller, it may respond differently depending on whether the wheelchair is currently moving or stationary. Embedded systems that contain software are usually best modeled as state-based systems. For these state-based systems, it is important to use a richer state-based model of the expected behavior of the SUT that keeps track of the current state of the SUT. This means that the model cannot only be used to generate input values to send to the SUT, but it can also tell us the expected response of the SUT because the model knows roughly what state the SUT is in. For modeling state-based systems, it is common to use finite-state machines or UML state machines (Lee and Yannakakis 1996, Binder 1999, Utting and Legeard 2007, Jacky et al. 2008). In this chapter, we will see how one style of extended finite-state machine can be written in Java and used to generate test sequences that send input values and actions to the SUT as well as checking the expected SUT outputs. By using this kind of rich model of the SUT, we can generate test cases from the model automatically, and those test cases can automate the verdict assignment problem of deciding whether each test has passed or failed when it is executed. If you want the generated tests to automate the pass/fail verdict, your model must capture the current state or expected outputs of the SUT, so use a finite-state model, not an input-only model. 6.3 How to Design a Test Model We will start by designing a test model for a very small system that we want to test. We will model the Java Set interface, which is an interface to a collection of objects of type E. 
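The state dependence just described is easy to observe on the Set interface itself. The fragment below (plain Java against java.util.HashSet, shown only as an illustration) makes the same isEmpty() call three times and gets different answers depending on the calls made beforehand; this is exactly the history that a test model has to keep track of.

import java.util.HashSet;
import java.util.Set;

public class StateDependenceDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<String>();
        System.out.println(set.isEmpty()); // true: nothing has been added yet
        set.add("s1");
        System.out.println(set.isEmpty()); // false: the internal state has changed
        set.remove("s1");
        System.out.println(set.isEmpty()); // true again: same call, different history
    }
}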
In later sections, we will generate tests from this model and execute those tests on a couple of different implementations of sets. Here is a summary of the main methods defined in the Set interface. We divide them into two groups: the mutator methods that can change the state of the set and the query methods that return information about the set but do not change its state. Mutator Methods Query Methods Result Type boolean boolean void boolean boolean boolean int Iterator SetMethod add(E obj) remove(Object obj) clear() contains(Object obj) equals(Object obj) isEmpty() size() iterator() Description adds obj to this set removes obj from this set removes all elements from this set true if this set contains obj compares this set with obj true if this set contains no elements the number of elements in this set iterates over the elements in this set How to Design Extended Finite State Machine Test Models in Java 151 The first step of modeling any embedded system is the same: identify the input commands that change the state of the SUT and the query/observation points that allow us to observe the state of the SUT without changing its state. 6.3.1 Designing an FSM model To understand the idea of a state-based model, let us start by drawing a diagram of the states that a set may go through as we call some of its mutator methods. Starting from a newly constructed empty set, imagine that we add some string called s1, then add a second string s2, then remove s2, then remove s1 to get an empty set again. If we draw a diagram of this sequence of states, we get Figure 6.1. Each circle represents one state of the set (a snapshot of what we would see if we could look inside the set object), with the contents of the set written inside the circle. Each arrow represents an action (a call to a mutator method) that changes the set from one state to another state. Of course, a moments thought makes us realize that the first and last states are both empty and are actually indistinguishable. All our query methods give exactly the same results for a newly constructed empty set as they do for a set that has just had all its members removed. So we should redraw our state diagram to merge these two states into one. Similarly for the two states that contain just the s1 string. They are indistinguishable, so should be merged. This gives us a smaller diagram, where some of the arrows form loops (Figure 6.2). This state diagram is a big improvement over our first state diagram because it has several loops, and these loops give us more ways of going through the diagram and generating tests. Note that any path through the state diagram defines a sequence of method calls, and we can view any sequence as a test sequence. The more loops, choices, and alternative paths we have in our model, the better because they enable us to generate a wider variety of test sequences. For example, the leftmost loop tells us that the remove(s1) method undoes the effect of the add(s1) method because it returns to the same empty state. So no matter how many times we go around the add(s1);remove(s1) loop, the set should still be empty. Similarly, the rightmost loop shows that remove(s2) undoes the effect of the add(s2) method. add(s1) empty s1 add(s2) remove(s2) s1,s2 s1 remove(s1) empty FIGURE 6.1 Example states of a Set object. add(s1) add(s2) empty s1 s1,s2 remove(s1) remove(s2) FIGURE 6.2 States from Figure 6.1, with identical states merged. 
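To make the link between paths and tests concrete, the path of Figure 6.1, taken as a test sequence and executed by hand against java.util.HashSet, would look roughly like the JUnit test below (written manually here; later sections generate such sequences automatically). The expected return values and the extra size()/contains()/isEmpty() checks anticipate the verdict discussion that follows.

import static org.junit.Assert.*;
import java.util.HashSet;
import java.util.Set;
import org.junit.Test;

public class SetPathTest {
    @Test
    public void addAddRemoveRemovePath() {
        Set<String> set = new HashSet<String>(); // state: empty
        assertTrue(set.add("s1"));               // empty --add(s1)--> {s1}
        assertTrue(set.add("s2"));               // {s1} --add(s2)--> {s1,s2}
        assertEquals(2, set.size());
        assertTrue(set.remove("s2"));            // {s1,s2} --remove(s2)--> {s1}
        assertTrue(set.contains("s1"));
        assertTrue(set.remove("s1"));            // {s1} --remove(s1)--> empty
        assertTrue(set.isEmpty());
    }
}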
152 Model-Based Testing for Embedded Systems We do not want to just generate lots of test sequences; we also want to be able to execute each test sequence on a SUT and automatically determine whether the test has passed or failed. There are two ways we can do this. For methods that return results, we can annotate each transition of our state diagram with the expected result of each method call. For example, the add(s1) transition from the empty state should return true because the s1 string was not a member of the empty set—so we could write this transition as add(s1)/true to indicate the expected result. The other way of checking whether a test sequence has passed or failed is to check that the internal state of the SUT agrees with the expected state of the model. It is not always possible to do this because if the internal state of the SUT is private, we may not be able to observe it. But most SUTs provide a few query methods that give us some information about the current state of the SUT, and this allows us to check if that state agrees with our model. For our Set example, we can use the size() method to check that a SUT contains the expected number of strings, and we can use the contains(String) method to check if each of the expected strings is in the set. In fact, it is a good strategy to call as many of the query methods as possible after each state transition since this helps to test all the query methods (checking that they are consistent with each other) and also verifies that the SUT state is correct. We could explicitly show every query method as a self-transition in our state diagram, but this would clutter the state diagram too much. So we will show only the mutator methods in our state diagrams here, but we will see later how the query methods can be added into the model after each transition. Our state diagram is already a useful little test model that captures some of the expected behavior of a Set implementation, but it does not really test the full functionality yet. The clear() method is never used, and we are testing only two strings so far. We need to add some more transitions and states to obtain a more comprehensive model. This raises the most important question of MBT: How big does our model have to be? The answer usually is, the smaller the better. A small model is quicker to write, easier to understand, and will not give an excessive number of tests. A good model will have a high level of abstraction, which means that it will omit all details that are not essential for describing the behavior that we want to test. However, we still want to meet our test goals, which in this case is to test all the mutator methods. So we will add some clear() transitions into our model. Also, it is often a good goal to ensure that the model is complete, which means that we have modeled the behavior of every mutator method call in every state. Our state diagram above calls add(s1) from the empty state, but not from the other states, so it is currently incomplete. If we expand it to include all five actions (clear(), add(s1), add(s2), remove(s1), remove(s2)) in every state, we get the state diagram shown in Figure 6.3. Note how our goal of having a complete model forced us to consider several additional cases that we might not have considered if we were designing test sequences in a more ad hoc fashion. 
For example, the add(s1) transition out of the s1 state models the behavior of add(s1) when the string s1 is already in the set and checks that we do not end up with two copies of s1 in the set. Similarly, the remove(s1) transition out of the s2 state models what should happen when the member to be removed is not in the set—the remove method should leave the set unchanged and should return false. The clear() transition out of the empty state might not have occurred to a manual test designer, but it serves the useful purpose of ensuring that clear() can be called multiple times in a row without crashing. The point is that designing a model (especially a complete model) leads us to consider all How to Design Extended Finite State Machine Test Models in Java add(s1) remove(s2) remove(s1) remove(s2) clear() add(s1) empty remove(s1) remove(s2) add(s2) add(s2) s1 remove(s2) clear() add(s1) s2 remove(s1) s1,s2 add(s2) remove(s1) FIGURE 6.3 Finite-state diagram for Set with two strings, s1 and s2. 153 add(s1) add(s2) the possible cases in a very systematic way, which can improve the quality of our testing, and is a good way of finding omissions and errors in the requirements (Stobie 2005). The remaining question about our model that we should discuss is how many different string values should we test? Why have we tested just two strings? A real implementation can handle hundreds or millions of strings, so should we not test large numbers of strings, too? This is another question about how abstract our model should be. To keep our model small, we want to model as few strings as possible, but still exercise the essential features of sets. Zero strings would be rather uninteresting since the set would always be empty. One string would mean that the set is either empty or contains just that one string. This would allow us to check that the set ignores duplicate adds and duplicate removes, but it would not allow us to test that adding a string leaves all other strings in the set unchanged. Two is the minimum number of strings that covers the main behaviors of a set, so this is the best number of strings to use in our model. If we expanded our model to three different strings, it would have 8 states and 7 actions, with a total of 56 transitions. This would be significantly more time consuming to design, but it would give little additional testing power. One of the key skills of developing good test models is finding a good level of abstraction, to minimize the size of the model, while still covering the essential SUT features that you want to test. 6.3.2 From FSM to EFSM: Writing models in Java Embedded systems often have quite complex behavior, so they require reasonably large models to accurately summarize their behavior. As models become larger, it quickly becomes tedious to draw them graphically. Instead, we will write them as Java classes, following an EFSM style. An extended finite-state machine is basically a finite-state machine with some state variables added to the model to keep track of more details about the current SUT state and actions (code that updates the state variables) added to the transitions. These features can make models much more concise because the state variables can define many 154 Model-Based Testing for Embedded Systems different states, and one Java method can define many similar transitions in the model. We will use the ModelJUnit style of writing the models because it is simple and effective. 
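Before turning to the ModelJUnit conventions, here is a minimal, tool-independent sketch of the EFSM idea (the class is ours and exists purely for illustration): a single integer state variable stands for eleven different states, and one Java method describes a whole family of similar transitions that a drawn FSM would need separate arrows for.

/** A tiny EFSM sketch: one int state variable defines the states 0..10. */
public class CounterModel {
    private int count = 0;                        // the state variable

    public String getState() { return "count=" + count; }

    public void reset() { count = 0; }

    /** One method models eleven similar transitions: 0->1, 1->2, ..., 9->10, plus a self-loop at 10. */
    public void increment() { if (count < 10) count++; }

    /** Likewise for the transitions going back down. */
    public void decrement() { if (count > 0) count--; }
}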
ModelJUnit is an open-source tool that aims to be the simplest possible introduction to MBT for Java programmers (Utting and Legeard 2007). The models are written in Java so that you do not have to learn some new modeling language. In fact, a model is just a Java class that implements a certain interface (FsmModel). The state variables of the class are used to define all the possible states of the state machine model, and the “Action” methods of the Java class define the transitions of the state machine model. Figure 6.4 shows some Java code that defines our two-string test model of the Set interface—we model just the three mutator operations at this stage. We now discuss each feature of this class, showing how it defines our two-string test model. Line 02 defines a class called SimpleSet and says that it implements the FsmModel interface defined by ModelJUnit. This tells us that the class can be used for MBT and means that it must define the getState and reset methods. Line 04 defines a Boolean variable for each of the two strings that we are interested in. The programmer realized that the two strings can be treated independently and that all we need to know about each string is whether it is in the set or not. So when the variable s1 is true, it means that the first string is in the set, and when the variable s2 is true, it means that the second string is in the set. (We will decide on the precise contents of the two strings later). Choosing the state variables of the model is the step that requires the most insight and creativity from the programmer. Lines 06–07 define the getState() method, which allows ModelJUnit to read the current state of the model at any time. It returns a string that shows the values of the two Boolean variables, with each Boolean converted to a single “T” or “F” character to make the state names shorter. Lines 09–10 define the reset method, which is called each time a new test sequence is started. It sets both Boolean variables to false, meaning that the set is empty. The remaining lines of the model give five action methods. These define the transitions of the state machine because the code inside these methods changes the state variables of the model. For example, the addS1 method models the action of adding the first string into the model, so it sets the s1 flag to true to indicate that the first string should now be in 01: /** A model of a set with two elements: s1 and s2. */ 02: public class SimpleSet implements FsmModel 03: { 04: protected boolean s1, s2; 05: 06: public Object getState() 07: { return (s1 ? "T" : "F") + (s2 ? "T" : "F"); } 08: 09: public void reset(boolean testing) 10: { s1 = false; s2 = false; } 11: 12: @Action public void addS1() { s1 = true;} 13: @Action public void addS2() { s2 = true;} 14: @Action public void removeS1() { s1 = false;} 15: @Action public void removeS2() { s2 = false;} 16: @Action public void clear() { s1 = false; s2 = false;} 17: } FIGURE 6.4 Java code for the SimpleSet model. How to Design Extended Finite State Machine Test Models in Java 155 the set. These action methods are marked with a @Action annotation, to distinguish them from other auxiliary methods that are not intended to define transitions of the model. 
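Putting the pieces together, the contract that SimpleSet has to fulfil is small. Based on the description above, the FsmModel interface amounts to something like the following (paraphrased here for readability; consult the ModelJUnit distribution for the authoritative declaration and its documentation):

public interface FsmModel {
    /** A snapshot of the current model state, used to label states and transitions. */
    Object getState();

    /** Return the model to its initial state, ready to start a new test sequence. */
    void reset(boolean testing);
}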
6.4 How to Generate Tests with ModelJUnit ModelJUnit provides a graphical user interface (GUI) that can load a model class, explore that model interactively or automatically, visualize the state diagram that the model produces, generate any number of tests from the model, and analyze how well the generated tests cover the model. If we compile our SimpleSet model (using a standard Java compiler) and then load it into the ModelJUnit GUI, we see something like Figure 6.5, where the “Edit Configuration” panel shows several test generation options that we can choose between. If we accept the default options and use the “Random Walk” test generation algorithm to generate the default size test suite of 10 tests, the small test sequence shown in the left panel of Figure 6.5 is generated. Each triple (Sa, Action, Sb) indicates one step of the test sequence, where Action is the test method that is being executed, starting in state Sa and finishing in state Sb. For example, the first line tells us to start with an empty set (state = “FF”), add the second string, and then check that the set corresponds to state FT (i.e., it contains the second string but not the first string). Then, the second and third lines check that adding then removing the first string brings us back to the same FT state. Since this test sequence is generated by a purely random walk, it is not very smart (it tests the addS2 action on the full set four times!). However, even such a naive algorithm as this will test every transition (i.e., every action going out of every state) if we generate a long enough test sequence. On average, the random walk algorithm will cover every transition of this small model if we generate a test sequence of about 125 steps. More sophisticated FIGURE 6.5 Screenshot of ModelJUnit GUI and Test Configuration Panel. 156 Model-Based Testing for Embedded Systems 01: /** An example of generating tests from the set model. */ 02: public static void main(String[] args) 03: { 04: Tester tester = new RandomTester(new SimpleSet()); 05: tester.addListener(new VerboseListener()); // print the tests 06: tester.generate(1000); // generate a long sequence of tests 07: } FIGURE 6.6 ModelJUnit code to generate tests by a random traversal of a model. algorithms can cover every transition more quickly. For example, ModelJUnit also has a “Greedy Random Walk” algorithm that gives priority to unexplored paths, and this takes about 55 steps on average to cover every transition. There is also the “Lookahead Walk” algorithm that does a lookahead of several transitions (three by default) to find unexplored paths, and this takes only 25 steps to test all 25 transitions. This happens to be the shortest possible test sequence that ensures all-transitions coverage of this model. Such minimumlength test sequences are called Chinese Postman Tours (Kwan 1962, Thimbleby 2003) because postmen also have the goal of finding the shortest closed circuit that takes them down every street in their delivery area. The ModelJUnit GUI is convenient but not necessary. We can also write code that automates the generation of a test suite from our model. For example, the code shown in Figure 6.6 will generate and print a random sequence of 1000 tests. We put the test generation code inside a main method so that we can execute it from the command line. Another common approach is to put it inside a JUnit test method so that it can be executed as part of a larger suite of tests. 
The code in Figure 6.6 generates a random sequence of 1000 add, remove, and clear calls. First, we create a “tester” object and initialize it to use a RandomTester object, which implements a “Random Walk” algorithm. We pass an instance of our SimpleSet model to the RandomTester object, and it uses Java reflection facilities to determine what actions our model provides. The next line (Line 05) adds a VerboseListener object to the tester so that some information about each test step will be printed to standard output as the tests are generated. The final line asks the tester to generate a sequence of 1000 test steps. This will include a random mixture of add, remove, and clear actions and will also perform a reset action occasionally, which models the action of creating a new instance of the Set class that starts off in the empty state. The reset actions also mean that we generate lots of short understandable test sequences, rather than one long sequence. Although the usual way of generating tests is via the ModelJUnit API, as in Figure 6.6, for simple testing scenarios, the ModelJUnit GUI can write this kind of test generation code for you. As you modify the test configuration options, it displays the Java code that implements the currently chosen options so that you can see how to use the API or cut and paste the code into your Java test generation programs. 6.5 Writing Your Own Model-Based Testing Tool ModelJUnit provides a variety of useful test generation algorithms, model visualization features, model coverage statistics, and other features. However, its core idea of using reflection and randomness to generate tests from a Java model is very simple and can easily be implemented in other languages or in application-specific ways. Figure 6.7 shows the How to Design Extended Finite State Machine Test Models in Java 157 public class SimpleMBT { public static final double RESET_PROBABILITY = 0.01; protected FsmModel model_; protected List methods_ = new ArrayList(); protected Random rand_ = new Random(42L); // use a fixed seed SimpleMBT(FsmModel model) { this.model_ = model; for (Method m : model.getClass().getMethods()) { if (m.getAnnotation(Action.class) != null) { methods_.add(m); } } } /** Generate a random test sequence of length 1. * @return the name of the action done, or "reset". */ public String generate() throws Exception { if (rand_.nextDouble() < RESET_PROBABILITY) { model_.reset(true); return "reset"; } else { int i = rand_.nextInt(methods_.size()); methods_.get(i).invoke(model_, new Object[0]); return methods_.get(i).getName(); } } public static void main(String[] args) throws Exception { FsmModel model = new SimpleSet(); SimpleMBT tester = new SimpleMBT(model); for (int length = 0; length < 100; length++) { System.out.println(tester.generate() + ": " + model.getState()); } } } FIGURE 6.7 A simple MBT tool. code for a simple MBT tool that just performs random walks of all the @Action methods in a given Java model, with a 1% probability of doing a reset at each step instead of an action. This occasional reset helps to prevent the test generation from getting stuck within one part of the model when the model contains irreversible actions. Many variations and improvements of this basic strategy are possible, but this illustrates how easy it can be to develop a simple MBT tool that is tailored to your testing environment. 6.6 Automating the Execution of Tests We have now seen how we can generate tests automatically from a model of the expected behavior of the SUT. 
The generated test sequences have been printed in a human-readable 158 Model-Based Testing for Embedded Systems format. If our SUT has a physical interface or a GUI, we could manually execute these generated test sequences by pushing buttons and looking to see if the current state of the SUT seems to be correct. This can be quite a useful approach for embedded systems that are difficult to connect to a computer. But it would be nice to automate the execution of the tests, as well as the generation, if possible. This section discusses two alternative ways of automating the test execution: offline and online testing. Both approaches can be used on embedded systems. They both require an API connection to the SUT so that commands can be sent to the SUT and its current state can be observed. For embedded SUTs, this API often connects with some hardware, such as digital to analog converters, which connect to the SUT. 6.6.1 Offline testing One simple, low-tech approach to executing the tests is to write a separate adaptor program that reads the generated test sequence, converts each action in a call to a Set implementation, and then checks the new state of that implementation after the call to ensure that it agrees with the expected state, and report a test failure when they disagree. This adaptor program is essentially a little interpreter of the generated test commands, sending commands to the SUT via the API and checking the results. It plays the same role as a human who interfaces to the SUT and executes the tests manually. This approach is called offline testing because the test generation and the test execution are done independently, at separate times and perhaps on separate computers. Offline testing can be useful if you need to execute the generated tests in many different environments or on a different computer to the test generator, or you want to use your existing test management tool to manage and execute the generated tests. 6.6.2 Online testing Online testing is when the tests are being executed on the SUT at the same time as they are being generated from the model. This gives immediate feedback and even allows a test generation algorithm to observe the actual SUT output and adapt its test generation strategy accordingly, which is useful if the model or the SUT is nondeterministic (Hierons 2004, Miller et al. 2005). Online testing creates a tighter, faster connection between the test generator and the SUT, which can permit better error reporting and fast execution of much larger test suites, so it is generally the best approach for embedded systems, unless there are clear reasons why offline testing is preferable. In this section, we shall extend our SimpleSet model so that it performs online testing of a Java SUT object that implements the Set interface. Figure 6.8 shows a Java Model class that is similar to SimpleSet, but also has a pointer (called sut) to a Set implementation that we want to test. Each of the @Action methods is extended so that as well as updating the state of the model (the s1 and s2 variables), it also calls one of the SUT methods. For example, after the addS1 action sets s1 to true (to indicate that string s1 should be in the set after this action), it calls sut.add(s1) to make the corresponding change to the SUT object. 
Then, it calls various query methods to check that the updated SUT state is the same as the state of the model (since all of the @Action methods do the same state checks in this example, we move those checks into a method called checkSUT() and call this at the end of each @Action method). We have written this online testing class as a standalone class so that you can see the model updating code and the SUT updating code next to each other. An alternative style is to use inheritance to extend a model class (like SimpleSet) by creating a subclass that overrides each action method and adds the SUT actions and the checking code. How to Design Extended Finite State Machine Test Models in Java 159 01: public class SimpleSetWithAdaptor implements FsmModel 02: { 03: protected boolean s1, s2; 04: protected Set sut; // the implementation we are testing 05: 06: // our test data for the SUT 07: protected String str1 = "some string"; 08: protected String str2 = ""; // empty string 09: 10: /** Tests a StringSet implementation. */ 11: public SimpleSetWithAdaptor(Set systemUnderTest) 12: { this.sut = systemUnderTest; } 13: 14: public Object getState() 15: { return (s1 ? "T" : "F") + (s2 ? "T" : "F"); } 16: 17: public void reset(boolean testing) 18: { s1 = false; s2 = false; sut.clear(); checkSUT(); } 19: 20: @Action public void addS1() 21: { s1 = true; sut.add(str1); checkSUT(); } 22: 23: @Action public void addS2() 24: { s2 = true; sut.add(str2); checkSUT(); } 25: 26: @Action public void removeS1() 27: { s1 = false; sut.remove(str1); checkSUT(); } 28: 29: @Action public void removeS2() 30: { s2 = false; sut.remove(str2); checkSUT(); } 31: 32: /** Check that the SUT is in the expected state. */ 33: protected void checkSUT() 34: { 35: Assert.assertEquals(s1, sut.contains(str1)); 36: Assert.assertEquals(s2, sut.contains(str2)); 37: int size = (s1 ? 1 : 0) + (s2 ? 1 : 0); 38: Assert.assertEquals(size, sut.size()); 39: Assert.assertEquals(!s1 && !s2, sut.isEmpty()); 40: Assert.assertEquals(!s1 && s2, 41: sut.equals(Collections.singleton(str2))); 42: } 43: } FIGURE 6.8 An extension of SimpleSet that performs online testing. A checking method such as checkSUT() typically calls one or more of the SUT query methods to see if the expected state (of the model) and the actual state of the SUT agree. For this example, we have decided to test a set of strings, using the two sample strings defined as str1 and str2 in Figure 6.8. So we can see if the first string is in the set by calling sut.contains(str1), and we expect that this should be true exactly when our model has set the Boolean variable s1 to true. So we use standard JUnit methods to check that s1 is equal to sut.contains(str1). We check that relationship between str1 and s2 in the same 160 Model-Based Testing for Embedded Systems way. We add several additional checks on the size(), isEmpty(), and equals( ) methods of the SUT, partly to gain more confidence that the SUT state is correct, and partly so that we test those SUT query methods. They will be called many times, in every SUT state that our model allows, so they will be well tested. Finally, note that in Figure 6.8, we are not checking the return value of sut.add( ), but we can easily do this by checking that the return value equals the initial value of the s1 flag. Note how each action method updates the model, then updates the SUT in a similar way, then checks that the model state agrees with the SUT state. 
So as we execute a sequence of these action methods, the model and the SUT are evolving in parallel, each making the same changes, and the checkSUT() method is checking that they agree about what the next state should be. This nicely illustrates the essential idea behind MBT: Implement your system twice and run the two implementations in parallel to check them against each other. But of course, no one really wants to implement a system twice! The trick that makes MBT useful is that those two “implementations” have very different goals: 1. The SUT implementation needs to be efficient, robust, scale to large data sets, and it must implement all the functionality in the requirements. 2. The model “implementation” can be a vastly simplified system that implements only one or two key requirements, handles only a few small data values chosen for testing purposes, and does not need to be efficient or scalable. This difference means that it is usually practical to “implement” (design and code) a model in a few hours or a few days, whereas the real SUT takes months of careful planning and coding. We repeat: the key to cost-effective modeling is finding a good level of abstraction for the model. Abstraction: Deciding which requirements are the key ones that must be tested and which ones can be ignored or simplified for the purposes of testing. 6.6.3 Test execution results We can use this model to test the HashSet class from the standard Java library simply by passing new HashSet() to the constructor of our SimpleSetWithAdaptor class and then using that to generate any number of tests, either by using the ModelJUnit GUI or by executing some test generation code similar to Figure 6.6. When we do this, no errors are detected. This is not surprising since the standard Java library classes are widely used and thoroughly tested. If we write our own simple implementation of Set and insert an off-by-one bug into its equals method (see the StringSetBuggy class in the ModelJUnit distribution for details), we get the following output when we try to generate a test sequence of length 60 using the Greedy Random Walk algorithm. How to Design Extended Finite State Machine Test Models in Java 161 done (FF, addS2, FT) done (FT, addS1, TT) done (TT, removeS1, FT) done (FT, removeS2, FF) done (FF, removeS2, FF) FAILURE: failure in action addS1 from state FF due to AssertionFailedError: expected: but was: ... Caused by: AssertionFailedError: expected: but was: ... at junit.framework.Assert.assertEquals(Assert.java:149) at SimpleSetWithAdaptor.checkSUT(SimpleSetWithAdaptor.java:123) at SimpleSetWithAdaptor.addS1(SimpleSetWithAdaptor.java:87) ... 10 more This pinpoints the failure as being detected by the sut.equals call on line 41 of Figure 6.8, when the checkSUT method was called from the addS1 action with the set being empty. Interestingly, the test sequence shows us that checkSUT had tested the equals method on an empty set several times previously, but the failure did not occur then—it required a removeS1 followed by an addS2 to detect the failure. A manually designed JUnit test suite may not have tested that particular combination, but the random automatic generation will always eventually generate such combinations and detect such failures, if we let it generate long enough sequences. If we fix our off-by-one error, then all the tests pass, and ModelJUnit reports that 100% of the transitions of the model have been tested. 
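For reference, the harness behind these runs can be as small as the sketch below. It follows the pattern of Figure 6.6 but wraps the adaptor model of Figure 6.8 around whichever Set implementation is under test (the ModelJUnit import lines are again omitted; pick the walk algorithm and sequence length to suit).

import java.util.HashSet;

public class TestHashSet {
    public static void main(String[] args) {
        // Wrap the SUT (here the standard HashSet) in the online-testing model of Figure 6.8.
        SimpleSetWithAdaptor model = new SimpleSetWithAdaptor(new HashSet<String>());
        Tester tester = new RandomTester(model);   // or the greedy walk used for the 60-step run above
        tester.addListener(new VerboseListener()); // print each step and report any failure
        tester.generate(60);
    }
}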
If we measure the code coverage of this StringSet implementation, which just implements a set as an ArrayList with no duplicate entries, we find that the generated test suite has covered 93.3% of the code (111 out of 119 JVM instructions, as measured by the EclEmma plugin for Eclipse [Emma 2009]). The untested code is the iterator() method, which we did not call in our checkSUT() method, and one exception case to do with null strings. 6.6.4 Mutation analysis of the effectiveness of our testing It is also interesting to use the Jumble mutation analysis tool (Jumble 2010) to measure the effectiveness of our automatically generated test suite. Jumble analyzes the Java bytecode of a SUT class, creates lots of mutants (minor modifications that cause the program to have different behavior), and then runs our tests on each mutant to see if they detect the error that has been introduced. On this SUT class, StringSet.java, Jumble creates 37 different mutants and reports that our automatically generated tests detect 94% (35 out of 37) of those mutants. Mutating modeljunit.examples.StringSet Tests: modeljunit.examples.StringSetTest Mutation points = 37, unit test time limit 2.58s M FAIL: modeljunit.examples.StringSet:44: changed return value .M FAIL: modeljunit.examples.StringSet:56: 0 -> 1 .................................. Score: 94% This is a high level of error detection, which indicates that our automatically generated tests are testing our simple set implementation quite thoroughly and that our model accurately 162 Model-Based Testing for Embedded Systems captures most of the behavior of the Set interface. One of the two mutants that were not detected is in the iterator() method, which we did not test in our model. The other mutant indicates that we are not testing the case where the argument to the equals method is a different type of object (not a set). This is a low-priority case that could be ignored or could easily be covered by a manually written JUnit test. 6.6.5 Testing with large amounts of data What if we wanted to do some performance testing to test that sets can handle hundreds or thousands of elements? For example, we might know that a SUT like HashSet expands its internal data structures after a certain number of elements have been added, so we suspect that testing a set with only two elements is inadequate. One approach would be to expand our model so that it uses a bit vector to keep track of hundreds of different strings and knows exactly when each string is in or out of the set. It is not difficult to write such a model, but when we start to generate tests, we quickly find that the model has so many states to explore that it will be impossible to test all the states or all the transitions. For example, with 100 strings, the model would have 2100 states and even more transitions. Many of these states would be similar, so many of the tests that we generate would be repetitive and uninteresting. A more productive style is to keep our model small (e.g., two or three Boolean flags), but change our interpretation of one of those Boolean flags s2 so that instead of meaning that “str2 is in the set,” it now means “all the strings ‘x1’, ‘x2’ ... ‘x999’ are in the set.” This leaves the behavior of our model unchanged and means that all we need to change is the code that updates the SUT. 
For example, the addS2() action becomes 23: 24a: 24b: 24c: 24d: 24e: 24f: @Action public void addS2() { s2 = true; for (int i=1; i<1000; i++) { sut.add("x"+i); } checkSUT(); } With this approach, we can generate the same short test sequence as earlier and easily cover all the states and transitions of our small model, while the tests can scale up to any size of set that we want. This is another good example of using abstraction when we design the model—we decided that even though we want to test a thousand strings, it is probably not necessary to test them all independently—testing two groups of strings should give the same fault-finding power. When possible, it is good to keep the model small and abstract and make the adaptor code do the donkey work. 6.7 Testing an Embedded System In this section, we shall briefly see how this same MBT approach can be used to model and test an embedded system such as the Subscriber Identification Module (SIM) card embedded How to Design Extended Finite State Machine Test Models in Java 163 in GSM mobile phones. The SIM card stores various data files that contain private data of the user and of the Telecom provider, so it protects these files via a system of access permissions and PIN codes. When a SIM card is inserted into a mobile phone, the phone communicates with the SIM card by sending small packets of bytes that follow the GSM 11.11 standard protocol (Bernard et al. 2004). A summary of some key features of this GSM 11.11 protocol is given in Utting and Legeard (2007, Chapter 9), together with use cases, UML class diagrams and UML state machine models, plus examples of generating tests from those UML models. In this section, we give a brief overview of how the same system can be modeled in Java and show how we can generate tests that send packets of bytes to the SIM and check the correctness of its responses. We execute the generated tests on a simulator of the SIM card so that we can measure the error detection power using Jumble. The generated tests could equally well be executed on real hardware, if we have a test execution platform with the hardware to connect to the physical SIM and send and receive the low-level packets produced by the tests. 6.7.1 The SIM card model Our test model of the SIM card is defined in a Java class called SimCard (420 source lines of code), which contains 6 enumerations, 12 data variables, and 15 actions, plus the usual reset and getState methods. There is also a small supporting class called SimFile (24 source lines of code) that models the relevant aspects of the File objects stored within the SIM—we do not model the full contents of each file—a couple of bytes of data is sufficient to test that the correct file contents are being retrieved. The full source code of this SIM card model is included as one of the example models in the ModelJUnit distribution. Figure 6.9 shows all the data variables of the model. The files map models the contents of all the files and directories on the SIM—these are constant throughout testing since this model does not include any write operations. The DF and EF variables model the currently selected directory file and the currently selected elementary file within that directory, respectively. The PIN variable corresponds to the correct PIN number, which is set to 11 by the reset method of the model, then may be set to 12 or back to 11 by changePIN actions during testing (two PIN numbers are sufficient for testing purposes). 
The next four variables (status en, counter PIN try, perm session, and status PIN block) model all the PIN-related aspects of the SIM security, and the following two variables (counter PUK try and status blocked) model the Personal Unblocking Key (PUK) checking—entry of a correct PUK code allows a user to unblock a card that had status PIN block set to Blocked because of three incorrect PIN attempts. However, after 10 incorrect PUK attempts, status PUK block will be set to Blocked, which means that all future attempts to unblock the SIM by entering a correct PUK will fail. When testing a real SIM chip this effectively destroys the SIM chip since there is no way of resetting the SIM to normal functionality once it has blocked PUK entry. Figure 6.10 shows one of the more interesting methods in the model, Unblock PIN. This model the user trying to enter a PUK code in order to set the SIM to use a new PIN number, which is typically done after the old PIN is blocked due to three incorrect PIN attempts. The Unblock PIN method takes the PUK code and the new PIN code as inputs, and these are typically eight digit and four digit integers, respectively. If we chose input values at random, there would be 108×104=1012 possible combinations of inputs, most of which would have the same effect. So to focus the test generation on the most interesting cases, we decide to define just two actions that call Unblock PIN—one with the correct PUK number and a new PIN code of 12, and one with an incorrect PUK number. Repeated applications of the latter action will test the PUK blocking features of the SIM. This illustrates a widely used 164 Model-Based Testing for Embedded Systems public class SimCard implements FsmModel { public enum E_Status {Enabled, Disabled}; public enum B_Status {Blocked, Unblocked}; public enum Status_Word {sw_9000, sw_9404, sw_9405, sw_9804, sw_9840, sw_9808, sw_9400}; public enum File_Type {Type_DF, Type_EF}; public enum Permission {Always, CHV, Never, Adm, None}; public enum F_Name {MF, DF_GSM, EF_LP, EF_IMSI, DF_Roaming, EF_FR, EF_UK}; // These variables model the attributes within each Sim Card. protected static final int GOOD_PUK = 1223; // the correct PUK code public static final int Max_Pin_Try = 3; public static final int Max_Puk_Try = 10; /** This models all the files on the SIM and their contents */ protected Map files = new HashMap(); /** The currently-selected directory (never null) */ protected SimFile DF; /** The current elementary file, or null if none is selected */ protected SimFile EF; /** The correct PIN (can be 11 or 12) */ protected int PIN; /** Say whether PIN-checking is Enabled or Disabled */ protected E_Status status_en; /** Number of bad PIN attempts: 0 .. Max_Pin_Try */ protected int counter_PIN_try; /** True means a correct PIN has been entered in this session */ protected boolean perm_session; /** Set to Blocked after too many incorrect PIN attempts */ protected B_Status status_PIN_block; /** Number of bad PUK attempts: 0 .. Max_Puk_Try */ protected int counter_PUK_try; /** Set to Blocked after too many incorrect PUK attempts */ protected B_Status status_PUK_block; /** The status word returned by each command */ protected Status_Word result; /** The data returned by the Read_Binary command */ protected String read_data; /** The adaptor object that interacts with the SIM card */ protected SimCardAdaptor sut = null; FIGURE 6.9 Data variables of the SimCard model. 
test design strategy called equivalence classes (Copeland 2004): when testing a generalpurpose method that has many possible combinations of input values, we manually choose just a strategic few of those input combinations—one for each different kind of behavior that is possible. In our Unblock PIN model method, the choice of a good or bad PUK code determines the outcome of the second if condition (puk == GOOD PUK), and since all the other if conditions are determined by the state variables of the model, these two PUK values are sufficient for us to test all the possible behaviors of this model method. This style of having several @Action methods that all call the same method, with carefully chosen different parameter values, is often used in ModelJUnit to reduce the size of the state space How to Design Extended Finite State Machine Test Models in Java 165 @Action public void unblockPINGood12() { Unblock_PIN(GOOD_PUK,12);} @Action public void unblockPINBad() { Unblock_PIN(12233446,11);} public void Unblock_PIN(int puk, int newPin) { if (status_block == B_Status.Blocked) { result = Status_Word.sw_9840; /*@REQ: Unblock_CHV1 @*/ } else if (puk == GOOD_PUK) { PIN = newPin; counter_PIN_try = 0; counter_PUK_try = 0; perm_session = true; status_PIN_block = B_Status.Unblocked; result = Status_Word.sw_9000; if (status_en == E_Status.Disabled) { status_en = E_Status.Enabled; /*@REQ: Unblock5 @*/ } else { // leave status_en unchanged } /*@REQ: Unblock7,Unblock2 @*/ } else if (counter_PUK_try == Max_Puk_Try - 1) { System.out.println("BLOCKED PUK!!! PUK try counter="+counter_PUK_try); counter_PUK_try = Max_Puk_Try; status_block = B_Status.Blocked; perm_session = false; result = Status_Word.sw_9840; /*@REQ: REQ7, Unblock4 @*/ } else { counter_PUK_try = counter_PUK_try + 1; result = Status_Word.sw_9804; /*@REQ: Unblock3 @*/ } if (sut != null) { sut.Unblock_PIN(puk, newPin, result); } } FIGURE 6.10 The Unblock PIN actions of the SimCard model. that is explored during testing, while still ensuring that the important different behaviors are tested. 6.7.2 Connecting the test model to an embedded SUT The last two lines of the Unblock PIN method in Figure 6.10 show how we can connect the model to an implementation of the SIM, via some adapter code (shown in Figure 6.11) that handles the low-level details of assembling, sending, and receiving packets of bytes. The SimCard model defines quite a large finite-state machine. If we analyze the state variables of the model and think about which combinations of values are possible, we find that there are 10 possible directory/file settings (DF and EF), 2 PIN values, 4 values for counter PIN try, and 11 values for counter PUK try, plus several other flags (but their values are generally correlated with other data values), so there are likely to be around 10 × 2 × 4 × 11 = 880 states in the model and up to 15 times that number of transitions (since there are 15 @Action methods in the model). This is too large for us to want to test exhaustively, but it is easy to use the various random walk test generation algorithms of 166 Model-Based Testing for Embedded Systems public class SimCardAdaptor { protected byte[] apdu = new byte[258]; protected byte[] response = null; protected GSM11Impl sut = new GSM11Impl(); /** Sets up the first few bytes of the APDU, ready to send to the SIM. */ protected void initCmd(int cmdnum, int p1, int p2, int p3) { for (int i=0; i not(A) and pre(never A); tel; c1 c2 c3 c4 ... A false false true false ... never A true true false false ... 
FIGURE 7.3 Example of a Lustre node. identical throughout the program execution; at any cycle, X and E have the same value. Once a node is defined, it can be used inside other nodes similar to any other operator. The operators supported by Lustre are the common arithmetic and logical operators (+, -, *, /, and, or, not) as well as two specific temporal operators: the precedence (pre) and the initialization (->). The pre operator introduces to a sequence a delay of one time unit, while the -> operator –also called followed by (fby)– allows the initialization of a sequence. Let X = (x0, x1, x2, x3, . . .) and E = (e0, e1, e2, e3, . . .) be two Lustre expressions. Then, pre(X ) denotes the sequence (nil, x0, x1, x2, x3, . . .), where nil is an undefined value, while X ->E denotes the sequence (x0, e1, e2, e3, . . .). Lustre neither supports loops (constructs such as for and while) nor recursive calls. Consequently, the execution time of a Lustre program can be statically computed and the satisfaction of the synchrony hypothesis can be checked. A simple Lustre program is given in Figure 7.3, followed by an instance of its execution. This program has a single input Boolean variable and a single output Boolean variable. The output is true if and only if the input has never been true since the beginning of the program execution. 7.1.1 Operator network The transformation of the inputs into the outputs in a Lustre program is done via a set of operators. Therefore, it can be represented by a directed graph, the so-called operator network. An operator network is a graph with a set of N operators that are connected to each other by a set of E ⊆ N × N directed edges. Each operator represents a logical or a numerical computation. With regard to the corresponding Lustre program, an operator Automatic Testing of LUSTRE/SCADE Programs 175 network has as many input and output edges as the program input and output variables, respectively. Figure 7.4 shows the corresponding operator network for the node of Figure 7.3. At the first execution cycle, the output never A is the negation of the input A; for the rest of the execution, the output equals to the result of the conjunction of its previous value and the negation of A. An operator represents a data transformation from an input edge into an output edge. There are two types of operators: 1. The basic operators that correspond to a basic computation. 2. The compound operators that correspond to the case where in a program, a node calls another node. A basic operator is denoted as ei, s , where ei, i = 1, 2, 3, . . ., stands for its inputs edges and s stands for the output edge. 7.1.2 Clocks in LUSTRE In Lustre, any variable or expression denotes a flow that is each infinite sequence of values is defined on a clock, which represents a sequence of time. Thus, a flow is a pair that consists of a sequence of values and a clock. The clock serves to indicate when a value is assigned to the flow. This means that a flow takes the n-th value of its sequence of values at the n-th instant of its clock. Any program has a cyclic behavior and that cycle defines a sequence of times, a clock, which is the basic clock of a program. A flow on the basic clock takes its n-th value at the n-th execution cycle of the program. Slower clocks can be defined through flows of Boolean values. The clock defined by a Boolean flow is the sequence of times at which the flow takes the value true. Two operators affect the clock of a flow: when and current. 1. 
When is used to sample an expression on a slower clock. Let E be an expression and B a Boolean expression with the same clock. Then, X = E when B is an expression whose clock is defined by B and whose values are the same as those of E only when B is true. This means that the resulting flow X does not have the same clock as E or, alternatively, that when B is false, X is not defined at all.

2. Current operates on expressions with different clocks and is used to project an expression on the immediately faster clock. Let E be an expression with the clock defined by the Boolean flow B, which is not the basic clock. Then, Y = current(E) has the same clock as B and its value is the value of E at the last time that B was true. Note that until B is true for the first time, the value of Y will be nil.

[Figure: the operator network of the node Never, with edges labeled A, L1, L2, L3, and never_A; the pre operator produces L2.]
FIGURE 7.4 The operator network for the node Never.

TABLE 7.1 The Use of the Operators when and current

E                e0      e1      e2      e3      e4      e5      e6      e7      e8      ...
B                false   false   true    false   true    false   false   true    true    ...
X = E when B                     x0=e2           x1=e4                   x2=e7   x3=e8   ...
Y = current(X)   y0=nil  y1=nil  y2=e2   y3=e2   y4=e4   y5=e4   y6=e4   y7=e7   y8=e8   ...

node ex2cks(m:int) returns (c:bool; y:int);
var (x:int) when c;
let
  y = if c then current(x) else pre(y)-1;
  c = true -> (pre(y)=0);
  x = m when c;
tel;

[Figure: the operator network of ex2cks, drawn inside a rectangle with a dashed outline, with internal edges labeled M1 to M5.]
FIGURE 7.5 The ex2cks example and the corresponding operator network; two clocks are used, the basic clock and the flow c.

The sampling and the projection are two complementary operations: a projection changes the clock of a flow back to the clock that the flow had before its last sampling operation. Trying to project a flow that was not sampled produces an error. Table 7.1 provides further detail on the use of the two temporal Lustre operators. An example [8] of the use of clocks in Lustre is given in Figure 7.5. The Lustre node ex2cks, as indicated by the rectangle with a dashed outline, receives as input the signal m. Starting from this input value when the clock c is true, the program counts backwards until zero; from that moment on, it restarts from the current input value, and so on.

7.2 Automating the Coverage Assessment

The development of safety-critical software, such as that deployed in aircraft control systems, requires a thorough validation process ensuring that the requirements have been exhaustively checked and that the program code has been adequately exercised. In particular, according to the DO-178B standard, at least one test case must be executed for each requirement, and the achieved code coverage is assessed on the generated C program. Although it is possible to apply many of the usual adequacy criteria to the CFG of the C program, this is not an attractive option, for several reasons. First, the translation from Lustre to C depends on the compiler and the compilation options used. For instance, the C code may implement a sophisticated automaton minimizing the execution time, but it can also be a “single loop” without explicit representation of the program states. Second, it is difficult if not impossible to formally establish a relation between the generated C code and the original Lustre program. As a result, the usual adequacy criteria applied to the generated C code do not provide meaningful information on the coverage of the Lustre program. For these reasons, specific coverage criteria have been defined for Lustre applications.
More precisely, in this section, we describe a coverage assessment approach that conforms to the synchronous data-flow paradigm on which Lustre/Scade applications are based. After a brief presentation of the Lustre language and its basic features, we provide the formal definitions of the structural coverage metrics. Then, we introduce some extensions of these metrics that help adequately handle actual industrial-size applications; such applications are usually composed of several distinct components that constantly interact with each other and some functions may use more than one clock. Therefore, the proposed extensions allow efficiently applying the coverage metrics to complex major applications, taking into account the complete set of the Lustre language. 7.2.1 Coverage criteria for LUSTRE programs The following paragraphs present the basic concepts and definitions of the coverage criteria for Lustre programs. 7.2.1.1 Activation conditions Given an operator network N, paths can be defined in the program, that is, the possible directions of flows from the inputs through the outputs. More formally, a path is a finite sequence of edges e0, e1, . . . , en , such that for ∀i [0, n − 1], ei+1 is a successor of ei in N. A unit path is a path with two edges (thus, with only one successive edge). For instance, in the operator network of Figure 7.4, the following complete paths can be found. p1 = A, L1, never A p2 = A, L1, L3, never A p3 = A, L1, never A, L2, L3, never A p4 = A, L1, L3, never A, L2, L3, never A Obviously, one could discover infinitely many paths in an operator network depending on the number of cycles repeated in the path (i.e., the number of pre operators in the path). However, we only consider paths of finite length by limiting the number of cycles. That is, a path of length n is obtained by concatenating a path of length n−1 with a unit path (of length 2). Thus, beginning from unit paths, longer paths can be built. A path is then finite, if it contains no cycles or if the number of cycles is limited. A Boolean Lustre expression is associated with each pair e, s , denoting the condition on which the data flows from the input edge e through the output s. This condition is called activation condition. The evaluation of the activation condition depends on what type of operators the paths is composed of. Informally, the notion of the activation of a path is strongly related to the propagation of the effect of the input edge through the output edge. More precisely, a path activation condition shows the dependencies between the path inputs and outputs. Therefore, the selection of a test set satisfying the activation conditions of the paths in an operator network leads to a notion for program coverage. Since covering all paths in an operator network could be impossible because of their potentially infinite number and length, in our approach, coverage is defined with regard to a given path length that is actually determined by the number of cycles included in the path. 
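To give an idea of how a tool can build the finite set of paths on which the criteria below are defined, here is a small sketch in Java (the PathEnumerator class and its Edge interface are hypothetical placeholders for whatever representation of the operator network a real tool would use). It follows the inductive construction described above: a path of length n is a path of length n-1 extended by a unit path, and the bound on the length is what cuts the cycles off.

import java.util.ArrayList;
import java.util.List;

/** Sketch of bounded path enumeration over an operator network (illustrative only). */
public class PathEnumerator {

    /** Hypothetical view of an edge of the operator network. */
    interface Edge {
        List<Edge> successors(); // each successor forms a unit path with this edge
        boolean isOutput();      // true for the output edges of the network
    }

    /** All complete paths of length <= maxLength starting from the given input edge. */
    static List<List<Edge>> completePaths(Edge input, int maxLength) {
        List<List<Edge>> result = new ArrayList<List<Edge>>();
        List<Edge> seed = new ArrayList<Edge>();
        seed.add(input);
        extend(seed, maxLength, result);
        return result;
    }

    // A path of length n is obtained by concatenating a path of length n-1 with a unit path.
    private static void extend(List<Edge> path, int maxLength, List<List<Edge>> result) {
        Edge last = path.get(path.size() - 1);
        if (last.isOutput()) {
            result.add(new ArrayList<Edge>(path)); // a complete path: from an input to an output
        }
        if (path.size() == maxLength) {
            return;                                // length bound reached: cycles are cut off here
        }
        for (Edge next : last.successors()) {
            path.add(next);
            extend(path, maxLength, result);
            path.remove(path.size() - 1);          // backtrack and try the next unit path
        }
    }
}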
TABLE 7.2 Activation Conditions for All Lustre Operators
Operator                Activation Condition
s = NOT(e)              AC(e, s) = true
s = AND(a, b)           AC(a, s) = not(a) or b
                        AC(b, s) = not(b) or a
s = OR(a, b)            AC(a, s) = a or not(b)
                        AC(b, s) = b or not(a)
s = ITE(c, a, b)        AC(c, s) = true
                        AC(a, s) = c
                        AC(b, s) = not(c)
relational operator     AC(e, s) = true
s = FBY(a, b)           AC(a, s) = true -> false
                        AC(b, s) = false -> true
s = PRE(e)              AC(e, s) = false -> pre(true)

Table 7.2 summarizes the formal expressions of the activation conditions for all Lustre operators (except, for the moment, when and current). In this table, each operator op, with the input e and the output s, is paired with the respective activation condition AC(e, s) for the unit path e, s. Note that some operators may define several paths through their output, so the activation conditions are listed according to the path inputs.

Let us consider the path p2 = A, L1, L3, never A in the corresponding operator network for the node Never (Figure 7.4). The condition under which that path is activated is represented by a Boolean expression showing the propagation of the input A through the output never A. To calculate its activation condition, we progressively apply the rules for the activation conditions of the corresponding operators according to Table 7.2.∗ Starting from the end of the path, we reach the beginning, moving one step at a time along the unit paths. Therefore, the necessary steps would be the following:

AC(p2) = false -> AC(p′), where p′ = A, L1, L3
AC(p′) = (not(L1) or L2) and AC(p′′) = (A or pre(never A)) and AC(p′′), where p′′ = A, L1
AC(p′′) = true

After backward substitutions, the Boolean expression for the activation condition of the selected path is: AC(p2) = false -> A or pre(never A). In practice, in order for the path output to depend on the input, either the input has to be true at the current execution cycle or the output at the previous cycle has to be true. Note that at the first cycle of the execution, the path is not activated.

∗In the general case (path of length n), a path p containing the pre operator is activated if its prefix p′ is activated at the previous cycle of execution, that is, AC(p) = false -> pre(AC(p′)). Similarly, in the case of the initialization operator fby, the given activation conditions are respectively generalized in the forms AC(p) = AC(p′) -> false (i.e., the path p is activated if its prefix p′ is activated at the initial cycle of execution) and AC(p) = false -> AC(p′) (i.e., the path p is activated whenever its prefix p′ is activated, except at the initial cycle of execution).

7.2.1.2 Coverage criteria

A Lustre/SCADE program is compiled into an equivalent C program. Given that the format of the generated C code depends on the compiler, it is difficult to establish a formal relation between the original Lustre program and the final C one. In addition, major industrial standards, such as DO-178B in the avionics field, demand coverage to be measured on the generated C code. In order to tackle these problems, three coverage criteria specifically defined for Lustre programs have been proposed [14]. They are specified on the operator network according to the length of the paths and the input variable values.
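Before the criteria themselves are formalized, the backward construction of an activation condition just illustrated for p2 can also be sketched programmatically. The fragment below is an illustration only (the operator attached to each unit path is assumed from Figure 7.4); it folds the Table 7.2 rules along the path and yields the same expression as the manual derivation.

# Backward construction of AC(p2) for p2 = A, L1, L3, never_A in the Never network.
# Each unit path is encoded by the Table 7.2 rule of the operator it traverses,
# written as a function that wraps the activation condition of the path prefix.

rules_p2 = [
    lambda prefix: "true",                            # <A, L1>: not operator
    lambda prefix: "(not(L1) or L2) and " + prefix,   # <.., L3>: and operator, other input L2
    lambda prefix: "false -> (" + prefix + ")",       # <.., never_A>: initialization (->) operator
]

def activation_condition(rules):
    ac = "true"                 # activation condition of the empty prefix
    for rule in rules:          # fold the rules from the path input to the output
        ac = rule(ac)
    return ac

print(activation_condition(rules_p2))
# false -> ((not(L1) or L2) and true)
# which, after substituting L1 = not(A) and L2 = pre(never_A) and simplifying,
# is the condition derived above: false -> A or pre(never_A)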
Let T be the set of test sets (input vectors) and Pn = {p|length(p) ≤ n} the set of all complete paths in the operator network whose length is less than or equal to n. Then, the following families of criteria are defined for a given and finite order n ≥ 2. The input of a path p is denoted as in (p), whereas a path edge is denoted as e. 1. Basic Coverage Criterion (BC). This criterion is satisfied if there is a set of test input sequences, T , that activates at least once the set Pn. Formally, ∀p ∈ Pn, ∃t ∈ T : AC (p) = true. The aim of this criterion is basically to ensure that all the dependencies between inputs and outputs have been exercised at least once. In case that a path is not activated, certain errors, such as a missing or misplaced operator, could not be detected. 2. Elementary Conditions Criterion (ECC). In order for an input sequence to satisfy this criterion, it is required that the path p is activated for both input values, true and false (taking into account that only Boolean variables are considered). Formally, ∀p ∈ Pn, ∃t ∈ T : in (p) ∧ AC (p) = true and not (in (p)) ∧ AC (p) = true. This criterion is stronger than the previous one in the sense that it also takes into account the impact that the input value variations have on the path output. 3. Multiple Conditions Criterion (MCC). In this criterion, the path output depends on all the combinations of the path edges, including the internal ones. A test input sequence is satisfied if and only if the path activation condition is satisfied for each edge value along the path. Formally, ∀p ∈ Pn, ∀e ∈ p, ∃t ∈ T : e ∧ AC (p) = true and not (e) ∧ AC (p) = true. The above criteria form a hierarchical relation: MCC satisfies all the conditions that ECC does, which also subsumes BC. The path length is a fundamental parameter of the criteria definition. It is mainly determined by the number of cycles that a complete path contains. In fact, as this number increases, so does the path length as well as the number of the required execution cycles for its activation. Moreover, the coverage of cyclic paths strongly depends on the number of execution cycles and, consequently, on the test input sequences length. In practice, professionals are usually interested in measuring the coverage for a set of paths of a given number of cycles (c ≥ 0)∗ rather than a given path length. Therefore, it is usually more convenient to consider various sets of complete paths in an operator network according to the number of cycles c contained in them and hence determine the path length n in relation to c. 7.2.2 Extension of coverage criteria to when and current operators The above criteria have been extended in order to support the two temporal Lustre operators when and current. These operators allow to handle the case where multiple clocks are present, which is a common case in many industrial applications. The use of multiple clocks implies the filtering of some program expressions. It consists of changing their execution cycle, activating the latter only at certain cycles of the basic clock. Consequently, the associated paths are activated only if the respective clock is true. As a result, the tester must adjust this filtered path activation rate according to the global timing. ∗Note that c = 0 denotes the set of complete cycle-free paths. 180 Model-Based Testing for Embedded Systems 7.2.2.1 Activation conditions for when and current Informally, the activation conditions associated with the when and current operators are based on their intrinsic definition. 
Since the output values are defined according to a condition (i.e., the true value of the clock), these operators can be represented by means of the conditional operator if-then-else. For the expression E and the Boolean expression B with the same clock, 1. X=E when B could be interpreted as X=if B then E else NON_DEFINED and, similarly, 2. Y=current(X) could be interpreted as Y=if B then X else pre(X). Hence, the formal definitions of the activation conditions result as follows:

Definition 1. Let e and s be the input and output edges, respectively, of a when operator and let b be its clock. The activation conditions for the paths p1 = e, s and p2 = b, s are

AC(p1) = b
AC(p2) = true

Definition 2. Let e and s be the input and output edges, respectively, of a current operator and let b be the clock on which it operates. The activation condition for the path p = e, s is AC(p) = b.

As a result, to compute the paths and the associated activation conditions of a Lustre node involving several clocks, one just has to replace the when and current operators by the corresponding conditional operator (see Figure 7.6). At this point, two basic issues must be further clarified. The first one concerns the when case. Actually, there is no definition of the value of the expression X when the clock B is not true (branch NON_DEF in Figure 7.6a). By default, at these instants, X does not occur and such paths (beginning with a nondefined value) are infeasible.∗ In the current case, the operator implicitly refers to the clock parameter B, without using a separate input variable (see Figure 7.6b). This indicates that current always operates on an already sampled expression, so the clock that determines its output activation should be the one on which the input is sampled.

FIGURE 7.6 Modeling the when and current operators using if-then-else: (a) the when operator, (b) the current operator.

∗An infeasible path is a path that is never executed by any test case; hence, it is never covered.

Let us consider the path p = m, x, M1, M2, M3, M4, c in the example of Section 7.1.2, displayed in bold in Figure 7.5. Following the same procedure for the activation condition computation and starting from the last path edge, the activation conditions for the intermediate unit paths are

AC(p) = false -> AC(p1), where p1 = m, x, M1, M2, M3, M4
AC(p1) = true and AC(p2), where p2 = m, x, M1, M2, M3
AC(p2) = false -> pre(AC(p3)), where p3 = m, x, M1, M2
AC(p3) = c and AC(p4), where p4 = m, x, M1
AC(p4) = c and AC(p5), where p5 = m, x
AC(p5) = c

After backward substitutions, the activation condition of the selected path is AC(p) = false -> pre(c). This condition corresponds to the expected result and is consistent with the above definitions, according to which the clock must be true to activate the paths with when and current operators. In order to evaluate the impact of these temporal operators on the coverage assessment, we consider the operator network of Figure 7.5 and the paths

p1 = m, x, M1, y
p2 = m, x, M1, M2, M3, M4, c
p3 = m, x, M1, M2, M3, M5, y

Intuitively, if the clock c holds true, any change of the path input is propagated through the output; hence, the above paths are activated. Formally, the associated activation conditions to be satisfied by a test set are

AC(p1) = c
AC(p2) = false -> pre(c)
AC(p3) = not(c) and (false -> pre(c))

Eventually, the input test sequences satisfy the BC.
Indeed, as soon as the input m causes the clock c to take the required values, the activation conditions are satisfied, since the latter depend only on the clock. In particular, if the value of m at the first cycle is an integer different from zero (for the sake of simplicity, let us consider m = 2), the BC is satisfied in two steps, since the corresponding values for c are c=true, c=false. On the contrary, if at the first execution cycle m is equal to zero, the basic criterion is satisfied after three steps with the corresponding values for c: c=true, c=true, c=false. These two samples of input test sequences and the corresponding outputs are shown in Table 7.3. Admittedly, the difficulty of meeting the criteria is strongly related to the complexity of the system under test as well as to the test case generation effort. Moreover, activation conditions that can be covered with short input sequences are easy to satisfy, as opposed to those requiring long test sequences, which correspond to complex execution instances of the system under test. Experimental evaluation on more complex case studies, including industrial software components, is necessary and is part of our future work in order to address these problems. Nonetheless, the enhanced definitions of the structural criteria presented above complete the coverage assessment issue for Lustre programs, as all the language operators are supported. In addition, the complexity of the criteria is not further affected because, in essence, we use nothing but if-then-else operators.

TABLE 7.3 Test Cases Samples for the Input m
     c1         c2       c3      c4     ...
m    i1 (≠ 0)   i2       i3      i4     ...
c    true       false    false   true   ...
y    i1         i1 − 1   0       i4     ...

     c1         c2       c3      ...
m    i1 (= 0)   i2       i3      ...
c    true       true     false   ...
y    0          i2       i2 − 1  ...

It should be noted that the presented coverage criteria are limited to Lustre specifications that exclusively handle Boolean variables. The definition of the criteria implies that the path activation is examined in relation to the possible values that path inputs can take on, that is, true and false. This means that, in the case of integer inputs, the criteria would be inapplicable. Since, in practice, applications deal with variables of different types, the criteria extension to more variable types appears to be a significant task and must be further studied.

7.2.3 LUSTRUCTU

Lustructu [13] is an academic tool that integrates the above criteria and automatically measures the structural coverage of Lustre/SCADE programs. It requires three inputs: the Lustre program under test, the required path length and the maximum number of loops in a path, and finally the criterion to satisfy. The tool analyzes the program and constructs its operator network. It then finds the paths that satisfy the input parameters and extracts the conditions that a test input sequence must satisfy in order to meet the given criterion. This information is recorded in a separate Lustre file, the so-called coverage node. This node receives as inputs the inputs of the program under test and computes the coverage ratio at the output. The program outputs become the node's local variables. For each path of length lower than or equal to the indicated value, its activation condition and the accumulated coverage ratio are calculated. These coverage nodes are compiled and executed (similar to any other regular Lustre program) over a given test data set∗ and the total coverage ratio† is computed.
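The computation performed by such a coverage node can be approximated in a few lines of Python. The sketch below is only an illustration of the principle (Lustructu itself generates and executes Lustre code); the activation conditions of the ex2cks paths are encoded by hand as predicates over the current and previous cycles, and the coverage ratio is the proportion of activation conditions satisfied at least once over the test sequence.

# Illustration of the coverage-ratio computation of a coverage node.
# Each activation condition is encoded as a predicate over one execution cycle,
# receiving the current cycle values and the previous ones (None at the first cycle).

def coverage_ratio(activation_conditions, trace):
    satisfied = set()
    prev = None
    for cycle in trace:
        for name, cond in activation_conditions.items():
            if cond(cycle, prev):
                satisfied.add(name)
        prev = cycle
    return len(satisfied) / len(activation_conditions)

# Activation conditions of the three ex2cks paths of Section 7.2.2:
# AC(p1) = c, AC(p2) = false -> pre(c), AC(p3) = not(c) and (false -> pre(c))
acs = {
    "p1": lambda cur, prev: cur["c"],
    "p2": lambda cur, prev: prev is not None and prev["c"],
    "p3": lambda cur, prev: (not cur["c"]) and prev is not None and prev["c"],
}

# first test sequence of Table 7.3 (m = 2 at the first cycle): c = true, false, false, true
trace = [{"c": True}, {"c": False}, {"c": False}, {"c": True}]
print(coverage_ratio(acs, trace))   # 1.0 -- every activation condition is met, so BC is satisfied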
An important remark is that the proposed coverage assessment technique is independent of the method used for test data generation. In other words, Lustructu simply considers a given test data set and computes the achieved coverage ratio according to the given criterion. In theory, any test data generation technique may be used. However, in our tests, we generally employ randomly generated test cases in order to obtain unbiased results, independent of any functional or structural requirements. 7.2.4 SCADE MTC In SCADE, coverage is measured through the Model Test Coverage (MTC) module, in which the user can define custom criteria by defining the conditions to be activated during testing. Indeed, MTC measures the coverage of low-level requirements (LLR coverage), with regard to the demands and objectives of DO-178B standard, by assessing how thoroughly the SCADE model (i.e., system specification) has been exercised. In particular, each elementary SCADE operator is associated with a set of features concerning the possible behaviors of the operator. Therefore, structural coverage of the SCADE model is determined by the activation ratio of the features of each operator. Thus, the coverage approach previously presented could be easily integrated in SCADE in the sense that activation conditions ∗Test input sequences are given in a .xml file. †Coverage ratio = Number of satisfied activation conditions Number of activation conditions . Automatic Testing of LUSTRE/SCADE Programs 183 corresponding to the defined criteria (BC, ECC, MCC) could be assessed once they are transformed into suitable MTC expressions. 7.2.5 Integration testing So far, the existing coverage criteria are defined on a unit-testing basis and cannot be applied to Lustre nodes that locally employ user-defined operators (compound operators). The cost of computing the program coverage is affordable as long as the system size remains small. However, large or complex nodes must be locally expanded and code coverage must be globally computed. As a result, the number and the length of the paths to be covered increase substantially, which renders these coverage metrics impracticable when the system size becomes large. In particular, as far as relatively simple Lustre programs are concerned, the required time for coverage computation is rather short. This holds particularly in the case of basic and of elementary condition coverage [11] for which paths are relatively short and the corresponding activation conditions are simple, respectively. As long as the path length remains low, the number of the activation conditions to be satisfied is computationally affordable. However, coverage analysis of complex Lustre nodes (Figure 7.7) may involve a huge number of paths and the coverage cost may become prohibitive and, consequently, the criteria inapplicable. This is particularly true for the MCC criterion, where the number of the activation conditions to be satisfied increases dramatically when the length and the number of paths are high. In fact, in order to measure the coverage of a node that contains several other nodes (compound operators), the internal nodes are unfolded, the paths and the corresponding activation conditions are locally computed, and then they are combined with the global node coverage. This may result in a huge number of paths and activation conditions. Indeed, covering a path of length k requires 2 (k − 1) activation conditions to be satisfied. 
Consequently, satisfying a criterion for the set of paths Pn, ri being the number of paths of length equal to i, requires the satisfaction of 2 (r2 + 2r3 + · · · + (n − 1) rn) activation conditions. We are currently investigating an integration testing technique for the coverage measurement of large-scale Lustre programs that involve several internal nodes. This coverage assessment technique involves an approximation for the coverage of the called nodes by extending the definition of the activation conditions for these nodes. Coverage criteria are redefined, not only according to the length of paths but also with respect to the level of Node1 Node4 Node3 pre Node2 Node5 FIGURE 7.7 Example of the operator network of a complex Lustre program. 184 Model-Based Testing for Embedded Systems integration. This extension reduces the total number of paths at the system level and hence, the overall complexity of the coverage computation. To empirically evaluate the proposed coverage approach, the extended criteria were applied to an alarm management component developed for embedded software used in the field of avionics. This component involves several Lustre nodes and it is representative of typical components in the avionics application area. The module on which we focused during the experiment contains 148 lines of Lustre code with 10 input variables and 3 output variables, forming two levels of integration. The associated operator network comprises 32 basic operators linked to each other by 52 edges. Tests were performed on a Linux Fedora 9, Intel Pentium 2GHz and 1GB of memory. We are interested in complexity issues in terms of the prospective gain in the number of paths with reference to the coverage criteria that do not require full node expansion, the relative difficulty to meet the criteria, as well as the fault detection ability of the criteria.∗ For complete paths with at most three cycles, the preliminary results show a remarkable decrease in the number of paths and activation conditions, particularly for the MCC, which suggests that the extended criteria are useful for measuring the coverage of large-scale programs. The required time to calculate the activation conditions is relatively negligible; a few seconds (maximum 2 minutes) were necessary to calculate complete paths with maximum of 10 cycles and the associated activation conditions. Even for the MCC, this calculation remains minor, considering that the number of paths to be analyzed is computationally affordable. For a complete presentation of the extended criteria as well as their experimental evaluation, the reader is advised to refer to [20]. 7.3 Automating the Test Data Generation This section introduces a technique for automated, functional test data generation, based on formal specifications. The general approach used by Lutess to automatically generate test data for synchronous programs is first presented. It uses a specification language, based on Lustre, including specific operators applied to specify test models. Recent research extended this approach so that programs with integer parameters can be included [27]. Furthermore, existing test operators were adapted to the new context and new operators were added. These extensions are implemented into a new version of the tool, called Lutess V2 [28]. After presenting the specification language and its usage, a general methodology to apply while testing with Lutess V2 is proposed. 
The application of this methodology in a well-known case study [19] showed that it allows for an efficient specification and testing of industrial programs. 7.3.1 LUTESS Lutess is a tool transforming a formal specification into a test data generator. The dynamic generation of test data requires three components to be provided by the user: the software environment specification (∆), the system under test (Σ), and a test oracle (Ω) describing the system requirements, as shown in Figure 7.8. The system under test and the oracle are both synchronous executable programs. ∗Mutation testing [2] was used to simulate various faults in the program. In particular, a set of mutation operators was defined and several mutants were automatically generated. Then, the mutants and the coverage nodes were executed over the same test input data and the mutation score (ratio of killed mutants) was compared with the coverage ratio. Automatic Testing of LUSTRE/SCADE Programs 185 Lutess builds a test input generator from the test specification and links it to the system under test and the oracle. It coordinates their execution and records the input and output sequences as well as the associated oracle verdicts using a trace collector. A test is a sequence of single action–reaction cycles: 1. The generator produces an input vector. 2. It sends this input vector to the system under test. 3. The system reacts with an output vector that is sent back to the generator. The generator produces a new input vector, and this sequence is repeated. At each cycle, the oracle observes the produced inputs and outputs to detect failures. 7.3.1.1 LUTESS V2 testnodes A test specification is defined in a special node, called testnode, written in a language that is a superset of Lustre. The inputs and outputs of the software under test are the outputs and inputs for a testnode, respectively. The general form of a testnode is given in Figure 7.9. Environment description ∆ Input data generator Test harness Dynamically produced input data Program output Oracle Verdict Trace collector System under test Σ Communication link FIGURE 7.8 The Lutess testing environment. Object provided by the user testnode Env() returns (); var ; let environment(Ec1, Ec2, ...., Ecn); prob(C1, E1, P1); ... prob(Cm, Em, Pm); safeprop(Sp1, Sp2, ...., Spk); hypothesis(H1, H2, ...., Hl); ; tel; FIGURE 7.9 Testnode syntax. 186 Model-Based Testing for Embedded Systems There are four operators specifically introduced for testing purposes: 1. The environment operator makes it possible to specify invariant properties of the program environment. 2. The prob operator is used to define conditional probabilities. The expression prob(C,E,P) means that if the condition C holds, then the probability of the expression E to be true is equal to P. 3. The safeprop operator is exploited by Lutess to guide the test generation toward situations that could violate the program safety properties (see safety-propertyguided testing). 4. The hypothesis operator introduces knowledge or assumptions in the test generation process targeting to improve the fault-detection ability of safety-propertyguided testing. These operators are illustrated in a simple example and explained in detail in the next sections. 7.3.1.2 An air-conditioner example Figure 7.10 shows the signature of a simple air conditioner controller. The program has three inputs: 1. OnOff is true when the On/Off button is pressed by the user and false otherwise: 2. Tamb is the ambient temperature expressed in Celsius degrees, 3. 
Tuser is the temperature selected by the user, and two outputs: 1. IsOn indicates that the air conditioner is on, 2. Tout is the temperature of the air emitted by the air conditioner. This program is supposed to compute, according to the difference between the ambient and the user-selected temperature, the temperature of the air to be emitted by the air conditioner. 7.3.2 Using LUTESS V2 The following paragraphs describe the basic steps in the specification of the external environment of a system using Lutess V2. 7.3.2.1 The environment operator Figure 7.11 shows a trivial use of the environment operator. This specification would result in a test data generator issuing random values for OnOff, Tamb, and Tuser. Obviously, the behavior of the actual software environment, although not completely deterministic, is not random. For instance, the temperature variation depends on the respective values of the ambient temperature and of the issued air temperature. This can node AC(OnOff: bool; Tamb, Tuser : int) returns (IsOn: bool; Tout: int) FIGURE 7.10 The interface of the air conditioner. Automatic Testing of LUSTRE/SCADE Programs 187 be expressed by means of two properties, stating that if the air emitted by the air conditioner is either hotter or colder than the ambient temperature, the latter cannot decrease or increase, respectively. Moreover, we can specify that the ambient temperature remains in some realistic interval. We can write such properties with usual relational and arithmetical operators available in the Lustre language. To allow a test generator for producing the test data consistent with such constraints, they are specified in the environment operator, as shown in Figure 7.12. Each property in the environment operator is a Lustre expression that can refer to the present or past values of the inputs and only to past values of the outputs. Therefore, the resulting test generator issues at any instant a random input satisfying the environment properties. Table 7.4 shows an instance of the generated test sequence corresponding to the testnode of Figure 7.12. 7.3.2.2 The prob operator The prob operator enables defining conditional probabilities that are helpful in guiding the test data selection. These probabilities are used to specify advanced execution scenarios such testnode EnvAC(IsOn: bool; Tout: bool) returns (OnOff: bool; Tamb, Tuser : int) let environment(true); tel; FIGURE 7.11 Unconstrained environment. testnode EnvAC(IsOn: bool; Tout: bool) returns (OnOff: bool; Tamb, Tuser : int) let environment( -- the user can choose a -- temperature between 10◦ and 40◦ Tuser >= 10 and Tuser <= 40, -- the ambient temperature -- should be between -20◦ and 60◦ Tamb >= -20 and Tamb <= 60, -- the temperature cannot decrease -- if hot air is emitted true -> implies(pre IsOn and pre (Tout - Tamb) > 0, not(Tamb < pre Tamb)) , -- the temperature cannot increase -- if cold air is emitted true -> implies(pre IsOn and pre (Tout - Tamb) < 0, not(Tamb > preTamb)) ); tel; FIGURE 7.12 Constrained environment for the air conditioner. 188 Model-Based Testing for Embedded Systems as operational profiles [17] or fault simulation. Let us consider Figure 7.13. The previous example of the air-conditioner environment specification has been modified with some of the invariant properties now specified as expressions that hold with some probability. Also, probabilities have been added that specify low and high probability to push the OnOff button when the air conditioner is on and off, respectively. 
This leads to longer sub-sequences with a working air conditioner (IsOn = true). Note that any invariant property included in the environment operator has an occurrence probability equal to 1.0. In other words, environment(E)⇔prob(true,E,1.0). No static check of consistency on the probability definitions is performed, so the user can, in fact, specify a set of conditional probabilities that are impossible to satisfy at a given situation. If the generator encounters such a situation, different options to allow a satisfiable solution, such as partial satisfaction, can be specified. Table 7.5 shows an instance of a test sequence after the execution of the generator corresponding to the testnode of Figure 7.13. 7.3.2.3 The safeprop operator Safety properties express that the system cannot reach highly undesirable situations. They must always hold during the system operation. The safeprop operator automates the searching for the test data according to their ability to violate the safety properties. The basic idea is to ignore such test sequences that cannot violate a given safety property. Consider a simple property i ⇒ o, where i is an input and o an output of the software. In this case, the input i = f alse should not be generated since the property could not be violated regardless of the value of the produced output o. Of course, even after ignoring such sequences, it is not guaranteed that the program under test will reach a faulty situation since outputs are not known. Table 7.6 shows a sequence produced when the following operator is added to the testnode of Figure 7.13: safeprop(implies(IsOn and TambTuser)); TABLE 7.4 Generated Test Data—Version 1 t 0 t 1 t 2 t 3 t 4 t 5 t 6 t 7 t 8 t 9 t 10 t 11 t 12 t 13 OnOff 0 1 1 0 1 0 0 1 0 0 1 0 0 0 Tamb -5 8 31 38 40 4 10 30 43 10 23 28 21 10 Tutil 20 21 14 35 13 17 24 22 20 11 36 17 20 40 IsOn 0 1 0 0 1 1 1 0 0 0 1 1 1 1 Tout 30 25 9 34 4 21 28 20 13 11 40 14 20 50 TABLE 7.5 Generated Test Data—Version 2 t0 OnOff 0 Tamb 28 Tutil 36 t1 t2 t3 t4 t5 t6 100110 31 27 29 4 41 18 26 32 29 40 32 32 t 7 t 8 t 9 t 10 t 11 t 12 t 13 000 0 0 0 0 52 59 7 13 5 57 -2 19 22 36 10 19 12 18 IsOn 0 1 1 1 0 1 1 1 1 1 1 1 1 1 Tout 38 25 33 29 52 29 36 8 10 45 9 23 -3 24 Automatic Testing of LUSTRE/SCADE Programs 189 testnode EnvAC(IsOn: bool; Tout: bool) returns (OnOff: bool; Tamb, Tuser : int) let environment( -- the user can choose a -- temperature between 10◦ and 40◦ Tuser >= 10 and Tuser <= 40, -- the ambient temperature -- should be between -20◦ and 60◦ Tamb >= -20 and Tamb <= 60 ); -- if hot air is emitted, -- the ambient temperature can hardly decrease prob( false -> pre IsOn and pre (Tout-Tamb)>0, true -> Tamb < pre Tamb, 0.1 ); -- if cold air is emitted, t -- the ambient temperature hardly increases prob( false -> pre IsOn and pre (Tout-Tamb)<0, true -> Tamb > pre Tamb, 0.1 ); -- High probability to press the OnOff button -- when the air-conditioner is not On prob( false -> not(pre IsOn), OnOff, 0.9 ); -- Low probability to press the OnOff button -- when the air-conditioner is On prob( false -> pre IsOn, OnOff, 0.1 ); tel; FIGURE 7.13 Using occurrence probabilities for expressions. 
TABLE 7.6 Safety-Property-Guided Testing
        t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10  t11  t12  t13
OnOff   0    0    1    0    0    1    1    1    0    1    0    1    0    1
Tamb    -9   17   27   -20  7    10   6    10   -20  7    -14  14   0    14
Tuser   36   26   32   29   40   32   32   19   40   36   10   19   12   18
IsOn    0    0    1    1    1    0    1    0    0    1    1    0    0    1
Tout    51   29   33   45   51   39   40   22   60   45   18   20   16   19

Note that the generated values satisfy Tamb < Tuser, which is a necessary condition to violate this property. As a rule, a safety property can refer to past values of inputs that are already assigned. Thus, the generator must anticipate values of the present inputs that allow the property to be violated in the future. Given a number of steps k, chosen by the user, safeprop(P) means that test inputs should be generated that can lead to a violation of P within the next k execution cycles. In order to do so, Lutess posts the property constraints for each cycle, according to three strategies:

1. The Union strategy would select inputs able to lead to a violation of P at any of the next k execution cycles: ¬Pt ∨ ¬Pt+1 ∨ ... ∨ ¬Pt+k−1.

2. The Intersection strategy would select inputs able to lead to a violation of P at each of the next k execution cycles: ¬Pt ∧ ¬Pt+1 ∧ ... ∧ ¬Pt+k−1.

3. The Lazy strategy would select inputs able to lead to a violation of P as soon as possible within the next k execution cycles: ¬Pt ∨ (Pt ∧ ¬Pt+1) ∨ ... ∨ ((Pt ∧ ... ∧ Pt+k−2) ∧ ¬Pt+k−1).

Depending on the type of the expression inside the safeprop operator, each of these strategies produces different results. In most cases, as the value of k increases, the union strategy is too weak (input values are not constrained) and the intersection strategy too strong (unsatisfiable). The lazy strategy is a trade-off between these two extremes. To illustrate this, consider the safety property it−1 ∧ ¬it ⇒ ot. In this case, with k = 2, we obtain the following:

1. Using the union strategy, we only impose it = true when it−1 = false; otherwise, any value of it is admitted.

2. Using the intersection strategy, there is no solution at all.

3. Using the lazy strategy, we always impose it = ¬it−1, resulting in a sequence alternating the value of i at each step.

7.3.2.4 The hypothesis operator

The generation mode guided by the safety properties has an important drawback. Since the program under test is considered as a black box, the input computation is made assuming that any reaction of the program is possible. In practice, the program would prevent many of the chosen test inputs from leading to a state where a property violation is possible. Taking into account hypotheses on the program could be an answer to this problem. Such hypotheses could result from program analysis or could be properties that have been successfully tested before. They can provide information, even incomplete, on the manner in which outputs are computed and hence provide better inputs for safety-property-guided testing. By adding to the testnode of Figure 7.13 the following two statements:

hypothesis( true -> OnOff = IsOn<>pre(IsOn) );
safeprop( implies(IsOn and Tamb<Tuser, Tout>Tuser) );

we introduce a hypothesis stating that the OnOff button turns the air conditioner on or off. The condition IsOn=true is necessary to violate the safety property, but since IsOn is an output of the software, we cannot directly set it to true. The hypothesis provides information about the values to be given to the OnOff input in order to obtain IsOn=true as output. The violation of the safety property then depends only on the Tout output.
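Returning to the three generation strategies described in Section 7.3.2.3, their effect on the admissible inputs can be reproduced with a small brute-force sketch. The fragment below is only an illustration (Lutess relies on constraint solving, not on enumeration); it encodes the property it−1 ∧ ¬it ⇒ ot, treats the unknown outputs and the next input existentially, and reads the lazy strategy as "prefer inputs that allow an immediate violation."

# Brute-force illustration of the three safeprop strategies for the property
# P_t: (i_{t-1} and not i_t) => o_t, with a window of k = 2 cycles. The SUT is a
# black box, so its outputs (and the next input) are quantified existentially.

def violation(i_prev, i_cur, o_cur):
    # negation of P at one cycle
    return i_prev and (not i_cur) and (not o_cur)

def admissible_inputs(i_prev, strategy):
    booleans = (False, True)
    def can_violate_now(i_t):
        return any(violation(i_prev, i_t, o) for o in booleans)
    def can_violate_next(i_t):
        return any(violation(i_t, i_next, o) for i_next in booleans for o in booleans)
    if strategy == "union":
        return [i for i in booleans if can_violate_now(i) or can_violate_next(i)]
    if strategy == "intersection":
        return [i for i in booleans if can_violate_now(i) and can_violate_next(i)]
    if strategy == "lazy":
        # violate as soon as possible: prefer inputs allowing an immediate violation
        now = [i for i in booleans if can_violate_now(i)]
        return now if now else [i for i in booleans if can_violate_next(i)]

for i_prev in (False, True):
    for s in ("union", "intersection", "lazy"):
        print(i_prev, s, admissible_inputs(i_prev, s))
# union: only true is allowed when i_{t-1} is false, any value otherwise;
# intersection: no admissible value; lazy: always the negation of i_{t-1}.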
Table 7.7 shows a sequence produced by the test generator corresponding to the above specification. We can remark that the OnOff button is pressed only once, when the air conditioner was off (pre IsOn = false).

TABLE 7.7 Using Hypotheses in Safety-Property-Guided Testing
        t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10  t11  t12  t13
OnOff   0    1    0    0    0    0    0    0    0    0    0    0    0    0
Tamb    -9   17   27   -20  7    10   6    10   -20  7    -14  14   0    14
Tuser   36   26   32   29   40   32   32   19   40   36   10   19   12   18
IsOn    0    1    1    1    1    1    1    1    1    1    1    1    1    1
Tout    51   29   33   45   51   39   40   22   60   45   18   20   16   19

7.3.3 Toward a test modeling methodology

The above operators enable the test engineer to build test models according to a methodology that has been defined and applied in several case studies. One such case study is a steam boiler control system [19], a system that operates on a large set of input/output variables and internal functions. In previous work, it has been used to assess the applicability of several formal methods [1]. The primary function of the boiler controller is to keep the water level between given limits, based on inputs received from different boiler devices. The modeling and testing methodology consists of the following incremental approach:

1. Domain definition: Definition of the domain for integer inputs. For example, the water level cannot be negative or exceed the boiler capacity.

2. Environment dynamics: Specification of different temporal relations between the current inputs and past inputs/outputs. These relations often include, but are not limited to, the physical constraints of the environment. For example, we could specify that when the boiler valve opens, the water level can only decrease. The above specifications are introduced in the testnode by means of the environment operator. Simple random test sequences can be generated, without a particular test objective, but considering all and only the inputs allowed by the environment.

3. Scenarios: Having in mind a specific test objective, the test engineer can specify more precise scenarios by providing additional invariant properties or conditional probabilities (applying the prob operator). As a simple example, consider the stop input that stops the controller when true; a completely random value will stop the controller prematurely and thus prevent the testing of all the following behaviors. In this case, lowering the probability of stop being true keeps the controller running.

4. Property-based testing: This step uses formally specified safety properties in order to guide the generation toward the violation of such a property. Test hypotheses can also be introduced and possibly make this guidance more effective.

Applying this methodology to the steam boiler case study showed that relevant test models for the steam boiler controller were not difficult to build. Modeling the steam boiler environment required a few days of work. Of course, the effort required for a complete test operation is not easy to assess, as it depends on the desired thoroughness of the test sequences, which may lead the tester to write several conditional probabilities corresponding to different situations (and resulting in different testnodes). Building a new testnode to generate a new set of test sequences usually requires a slight modification of a previous testnode. Each of these testnodes can then be used to generate a large number of test sequences with little effort.
Thus, when compared to manual test data construction, which is still a current practice by many test professionals, such an automatic generation of test cases could certainly facilitate the testing process. The steam boiler problem requires exchanging a given number of messages between the system controller and the physical system. The main program handles 38 inputs and 34 outputs, Boolean or integer, and it is composed of 30 internal functions. The main node is comprised, when unfolded, of 686 lines of Lustre code. Each testnode consists of about 20 invariant properties modeling the boiler environment to which various conditional probabilities or safety properties are added. The average size of a testnode, together with 192 Model-Based Testing for Embedded Systems the auxiliary nodes, approximates 200 lines of Lustre code. It takes less than 30 seconds to generate a sequence of hundred steps, for any of the test models we used (tests performed on a Linux Fedora 9, Intel Pentium 2GHz, and 1GB of memory). References 1. Abrial, J.-R. (1995). Steam-boiler control specification problem. Formal Methods for Industrial Applications, Volume 1165 of LNCS, 500–509. 2. Budd, T. A., DeMillo, R. A., Lipton, R. J., and Sayward, F.G. (1980). Theoretical and empirical studies on using program mutation to test the functional correctness of programs. In ACM Symposium on Principles of Programming Languages, Las Vegas, Nevada. 3. Caspi, P., Pilaud, D., Halbwachs, N., and Plaice, J. (1987). Lustre: A declarative language for programming synchronous systems. POPL, 178–188. 4. Chen, T.Y., and Lau, M. F. (2001). Test case selection strategies based on boolean specifications. Software Testing, Verification and Reliability, 11 (3), 165–180. 5. Chilenski, J.J., and Miller, S.P. (1994). Applicability of modified condition/decision coverage to software testing. Software Engineering Journal, 9 (5), 193–200. 6. Clarke, L. A., Podgurski, A., Richardson, D. J., and Zeil, S. J. (1989). A formal evaluation of data flow path selection criteria. IEEE Transactions on Software Engineering, 15 (11), 1318–1332. 7. DO-178B (1992). Software Considerations in Airborne Systems and Equipment Certification. Technical report, RTCA, Inc., www.rtca.org. 8. Girault, A., and Nicollin, X. (2003). Clock-driven automatic distribution of lustre programs. In 3rd International Conference on Embedded Software, EMSOFT’03, Volume 2855 of LNCS, Pages: 206–222. Springer-Verlag, Philadelphia. 9. Halbwachs, N., Caspi, P., Raymond, P., and Pilaud, D. (1991). The synchronous data flow programming language lustre. Proceedings of the IEEE, 79 (9), 1305–1320. 10. Halbwachs, N., Lagnier, F., and Ratel, C., (1992). Programming and verifying realtime systems by means of the synchronous data-flow language lustre. Transactions on Software Engineering, 18 (9), 785–793. 11. Lakehal, A., and Parissis, I. (2007). Automated measure of structural coverage for lustre programs: A case study. In proceedings of the 2nd IEEE International Workshop on Automated Software Testing (AST’2007), a joint event of the 29th ICSE . Minneapolis, MN. 12. Lakehal, A., and Parissis, I. (2005). Lustructu: A tool for the automatic coverage assessment of lustre programs. In Proceedings of the 16th IEEE International Symposium on Software Reliability Engineering, Pages: 301–310. Chicago, IL. 13. Lakehal, A., and Parissis, I. (2005). Lustructu: A tool for the automatic coverage assessment of lustre programs. In IEEE International Symposium on Software Reliability Engineering, Pages: 301–310. 
Chicago, IL. Automatic Testing of LUSTRE/SCADE Programs 193 14. Lakehal, A., and Parissis, I. (2005). Structural test coverage criteria for lustre programs. In Proceedings of the 10th International Workshop on Formal Methods for Industrial Critical Systems: a satellite event of the ESEC/FSE’05, Pages: 35–43, Lisbon, Portugal. 15. Laski, J. W., and Korel, B. (1983). A data flow oriented program testing strategy. IEEE Transactions on Software Engineering 9 (3), 347–354. 16. Marre, B. and Arnould, A. (2000). Test sequences generation from lustre descriptions: Gatel. Proceedings of the 15th IEEE Conference on Automated Software Engineering, Grenoble, France, 229–237. 17. Musa, J. D. (1993). Operational profiles in software-reliability engineering. IEEE Software, 10 (2), 14–32. 18. Ntafos, S. C. (1984). An evaluation of required element testing strategies. In International Conference on Software Engineering, Pages: 250–256. Orlando, FL. 19. Papailiopoulou, V., Seljimi, B., and Parissis, I. (2009). Revisiting the steam-boiler case study with lutess: modeling for automatic test generation. In 12th European Workshop on Dependable Computing, Toulouse, France. 20. Papailiopoulou, V. (2010). Test automatique de programmes lustre/scade. Phd thesis, Universit´e de Grenoble, France. 21. Parissis, I., and Ouabdesselam, F. (1996). Specification-based testing of synchronous software. ACM-SIGSOFT Foundations of Software Engineering, 127–134. 22. Parissis, I., and Vassy, J. (2003). Thoroughness of specification-based testing of synchronous programs. In Proceedings of the 14th IEEE International Symposium on Software Reliability Engineering, 191–202. 23. Pilaud, D., and Halbwachs, N. (1988). From a synchronous declarative language to a temporal logic dealing with multiform time. Proceedings of Formal Techniques in RealTime and Fault-Tolerant Systems, Warwick, United Kingdom, Volume 331 of Lecture Notes in Computer Science, 99–110. 24. Rajan, A. (2008). Coverage metrics for requirements-based testing. Phd thesis, University of Minnesota, Minneapolis. 25. Raymond, P., Nicollin, X. Halbwachs, N., and Weber, D. (1998). Automatic testing of reactive systems. Proceedings of the 19th IEEE Real-Time Systems Symposium, Madrid, Spain, 200–209. 26. Richardson, D., and Clarke, L. (1985). Partition analysis: a method combining testing and verification. IEEE Transactions on Software Engineering, 11 (12), 1477–1490. 27. Seljimi, B., and Parissis, I. (2006). Using CLP to automatically generate test sequences for synchronous programs with numeric inputs and outputs. In 17th International Symposium on Software Reliability Engineering, Pages: 105–116. Raleigh, North Carolina. 28. Seljimi, B., and Parissis, I. (2007). Automatic generation of test data generators for synchronous programs: Lutess V2. In Workshop on Domain Specific Approaches to Software Test Automation, Pages: 8–12. Dubrovnik, Croatia. 29. Vilkomir, S. A., and Bowen, J. P. (2001). Formalization of software testing criteria using the Z notation. In International Computer Software and Applications Conference (COMPSAC), Pages: 351–356. Chicago, IL. 194 Model-Based Testing for Embedded Systems 30. Vilkomir, S. A., and Bowen, J. P. (2002). Reinforced condition/decision coverage (RC/DC): A new criterion for software testing. In International Conference of B and Z Users, Pages: 291–308. Grenoble, France. 31. Woodward, M. R., Hedley, D., and Hennell, M. A. (1980). Experience with path analysis and testing of programs. 
IEEE Transactions on Software Engineering, 6 (3), 278–286.

8 Test Generation Using Symbolic Animation of Models

Frédéric Dadeau, Fabien Peureux, Bruno Legeard, Régis Tissot, Jacques Julliand, Pierre-Alain Masson, and Fabrice Bouquet

CONTENTS
8.1 Motivations and Overall Approach   196
  8.1.1 Context: The B abstract machines notation   197
  8.1.2 Model-based testing process   198
  8.1.3 Plan of the chapter   198
8.2 Principles of Symbolic Animation   199
  8.2.1 Definition of the behaviors   200
  8.2.2 Use of the behaviors for the symbolic animation   200
8.3 Automated Boundary Test Generation   202
  8.3.1 Extraction of the test targets   203
  8.3.2 Computation of the test cases   205
  8.3.3 Leirios test generator for B   206
  8.3.4 Limitations of the automated approach   207
8.4 Scenario-Based Test Generation   208
  8.4.1 Scenario description language   208
    8.4.1.1 Sequence and model layers   208
    8.4.1.2 Test generation directive layer   209
  8.4.2 Unfolding and instantiation of scenarios   210
8.5 Experimental Results   211
  8.5.1 Automated versus manual testing—The GSM 11.11 case study   211
  8.5.2 Completing functional tests with scenarios—The IAS case study   212
  8.5.3 Complementarity of the two approaches   214
8.6 Related Work   214
  8.6.1 Model-based testing approaches using coverage criteria   214
  8.6.2 Scenario-based testing approaches   215
8.7 Conclusion and Open Issues   216
References   217

In the domain of embedded systems, models are often used not only to generate code, possibly after refinement steps, but also to provide a functional view of the modeled system that can be used to produce black-box test cases, without considering the actual implementation details of this system. In this process, the tests are generated by applying given test selection criteria to the model. These test cases are then played on the system, and the results obtained are compared with the results predicted by the model, in order to ensure the conformance between the concrete system and its abstract representation. Test selection criteria aim at achieving a reasonable coverage of the functionalities or requirements of the system without involving heavyweight human intervention. We present in this chapter work on the B notation to support model design, intermediate verification, and test generation. In B machines, the data model is described using
We illustrate the use and the complementarity of these two techniques on the industrial case of a smart card application named IAS— Identification Authentication Signature—an electronic platform for loading applications on latest-generation smart cards. 8.1 Motivations and Overall Approach In the domain of embedded systems, a model-based approach for design, verification, or validation is often required, mainly because these kinds of systems are often of a safetycritical nature (Beizer 1995). In that sense, a defect can be relatively costly in terms of money or human lives. The key idea is thus to detect the possible malfunctions as soon as possible. The use of formal models, on which mathematical reasoning can be performed, is therefore an interesting solution. In the context of software testing, the use of formal models makes it possible to achieve an interesting automation of the process, the model being used as a basis from which the test cases are computed. In addition, the model predicts the expected results, named the oracle, that describe the response that the System Under Test (SUT) should provide (modulo data abstraction). The conformance of the SUT w.r.t. the initial model is based on this oracle. We rely on the use of behavioral models, which are models describing an abstraction of the system, using state variables, and operations that may be executed, representing a transition function described using generalized substitutions. The idea for generating tests from these models is to animate them, that is, to simulate their execution by invoking their operations. The sequences obtained represent abstract test cases that have to be concretized to be run on the SUT. Our approach considers two complementary test generation techniques that use model animation in order to generate the tests. The first one is based on a structural coverage of the operations of the model, and the second is based on dynamic selection criteria using user-defined scenarios. Test Generation Using Symbolic Animation of Models 197 Before going further into the details of our approach, let us define the perimeter of the embedded systems we target. We consider embedded systems that do not present concurrency, or strong real-time constraints (i.e., time constraints that cannot be discretized). Indeed, our approach is suitable for validating the functional behaviors of electronic transaction applications, such as smart cards applets or discrete automotive systems, such as front wipers or cruise controllers. 8.1.1 Context: The B abstract machines notation Our work focuses on the use of the B notation (Abrial 1996) for the design of the model to be used for testing an embedded system. Several reasons motivate this choice. B is a very convenient notation for modeling embedded systems, grounded on a well-defined semantics. It makes it possible to easily express the operations of the system using a functional approach. Thus, each command of the SUT can be modeled by a B operation that acts as a function updating the state variables. Moreover, the operations syntax displays conditional structures (IF...THEN...ELSE...END) that are similar to any programming language. One of the advantages of B is that it does not require the user to know the complete topology of the system (compared to automata-based formal notations), which simplifies its use in the industry. Notice that we do not consider the entire development process described by the B method. 
Indeed, this latter starts from an abstract machine and involves successive refinements that would be useless for test generation purposes (i.e., if the code is generated from the model, there is no need to test the code). Here, we focus on abstract machines; this does not restrict the expressiveness of the language since a set of refinements can naturally be flattened into a single abstract machine. B is based on a set-theoretical data model that makes it possible to describe complex structures using sets, relations (set of pairs), and a large variety of functions (total/partial functions, injections, surjections, bijections), along with numerous set/relational operators. The dynamics of the model, namely the initialization and the operations, are expressed using Generalized Substitutions that describe the possible atomic evolution of the state variables including simple assignments (x := E), multiple assignments (x, y := E, F also written x := E y := F), conditional assignments (IF Cond THEN Subst1 ELSE Subst2 END), bounded choice substitutions (CHOICE Subst1 OR .... OR SubstN END), or unbounded choice substitutions (ANY z WHERE Predicate(z) THEN Subst END) (see Abrial 1996, p. 227 for a complete list of generalized substitutions). An abstract machine is organized in clauses that describe (1) the constants of the system and their associated properties, (2) the state variables and the invariant (containing the data typing information) (3) the actual invariant (properties that one wants to see preserved through the possible execution of the machine), (4) the initial state, and (5) the atomic state evolution described by the operations. Figure 8.1 gives an example of a B abstract machine that will be used to illustrate the various concepts presented in this chapter. This machine models an electronic purse, similar as those embedded on smart cards, managing a given amount of money (variable balance). A PIN code is also used to identify the card holder (variable pin). The holder may try to authenticate using operation VERIFY PIN. Boolean variable auth states whether or not the holder is authenticated. A limited number of tries is given for the holder to authenticate (three in the model). When the user fails to authenticate, the number of tries decreases until reaching zero, corresponding to a state in which the card is definitely blocked (i.e., no command can be successfully invoked). The model provides a small number of operations that make it possible: to set the value of the PIN code (SET PIN operation), to authenticate the holder (VERIFY PIN operation), and to credit the purse (CREDIT operation) or to pay a purchase (DEBIT operation). 198 Model-Based Testing for Embedded Systems MACHINE purse CONSTANTS max tries PROPERTIES max tries ∈ N ∧ max tries = 3 VARIABLES balance, pin, tries, auth INVARIANT balance ∈ N ∧ balance ≥ 0 ∧ pin ∈ -1..9999 ∧ tries ∈ 0..max tries ∧ auth ∈ BOOLEAN ∧ ... INITIALIZATION balance := 0 pin := -1 tries := max tries auth := false OPERATIONS sw ← SET PIN(p) =ˆ ... sw ← VERIFY PIN(p) =ˆ ... sw ← CREDIT(a) =ˆ ... sw ← DEBIT(a) =ˆ ... END FIGURE 8.1 B abstract machine of a simplified electronic purse. 8.1.2 Model-based testing process We present in this part the use of B as a formal notation that makes it possible to describe the behavior of the SUT. In order to produce the test cases from the model, the B model is animated using constraint solving techniques. We propose to develop two test generation techniques based on this principle, as depicted in Figure 8.2. 
The first technique is fully automated and aims at applying structural coverage criteria on the operations of the machine so as to derive test cases that are supposed to exercise all the operations of the system, involving decision coverage and data coverage as a boundary analysis of the state variables. Unfortunately, this automated process shows some limitations, which we will illustrate. This leads us to consider a guided technique based on the design of scenarios. Both techniques rely on the use of animation, either to compute the test sequences by a customized state exploration algorithm or to animate the user-defined scenarios. These two processes compute test cases that are said to be abstract since they are expressed at the model level. These tests thus need to be concretized to be run on the SUT. To achieve that, the validation engineer has to write an adaptation layer that will be in charge of bridging the gap between the abstract and the concrete level (basically model operations are mapped to SUT commands, and abstract data values are translated into concrete data values). 8.1.3 Plan of the chapter The chapter is organized as follows. Section 8.2 describes the principle of symbolic animation that will be used in the subsequent sections. The automated boundary test generation technique is presented in Section 8.3, whereas the SBT approach is described Test Generation Using Symbolic Animation of Models 199 Informal specifications Modeling Formal B model Machine M SETS S1 = {e1, e2, e3} Variables xx, yy, zz Invariant --- Model validation Coverage criteria Boundarybased test generator Symbolic animator Scenariobased test generator Test scenario Abstract test cases Test bench Test execution environment Adaptation layer FIGURE 8.2 Test generation processes based on symbolic animation. in Section 8.4. The usefulness and complementarity of these two approaches are illustrated in Section 8.5 on industrial case studies on smart card applets. Finally, Section 8.6 presents the related works, and Section 8.7 concludes and gives an overview of the open issues. 8.2 Principles of Symbolic Animation For the test generation approaches to be relevant, it is mandatory to ensure that the model behaves as expected since the system will be checked against the model. Model animation is thus used for ensuring that the model behaves as described in the initial requirements. This step is done in a semi-automated way, by using a dedicated tool—a model animator— with which the validation engineer interacts. Concretely, the user chooses which operation he wants to invoke. Depending on the current state of the system and the values of the parameters, the animator computes and displays the resulting states that can be obtained. By comparing these states with the informal specification, the user can evaluate its model and correct it if necessary. This process is complementary to the verification that involves properties that have to be formally verified on the model. The symbolic animation improves the “classical” model animation by giving the possibility to abstract the operation parameters. Once a parameter is abstracted, it is replaced by a symbolic variable that is handled by dedicated constraints solvers. Abstracting all the parameter values turns out to consider each operation as a set of “behaviors” that are the basis from which symbolic animation can be performed (Bouquet et al. 2004). 
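The examples in the remainder of this section all animate operations of the purse machine of Figure 8.1. As a point of reference, the following is a minimal, concrete-valued Python transcription of that machine. It is only an illustration of the modeled behaviors, not part of the approach itself (which animates the B text directly); moreover, since Figure 8.1 elides the bodies of CREDIT and DEBIT, the guards used for them here (authentication required, sufficient balance for a debit) are assumptions.

    MAX_TRIES = 3                      # constant max_tries of the machine

    class Purse:
        def __init__(self):
            # INITIALIZATION clause: balance := 0, pin := -1, tries := max_tries, auth := false
            self.balance, self.pin, self.tries, self.auth = 0, -1, MAX_TRIES, False

        def set_pin(self, p):          # sw <-- SET_PIN(p)
            assert 0 <= p <= 9999      # PRE clause
            if self.pin == -1:
                self.pin = p
                return "ok"
            return "wrong_mode"

        def verify_pin(self, p):       # sw <-- VERIFY_PIN(p), cf. Figure 8.3
            assert 0 <= p <= 9999      # PRE clause
            if self.tries > 0 and self.pin != -1:
                if p == self.pin:
                    self.auth, self.tries = True, MAX_TRIES
                    return "ok"
                self.tries -= 1
                self.auth = False
                return "blocked" if self.tries == 0 else "wrong_pin"
            return "wrong_mode"

        def credit(self, a):           # body not given in Figure 8.1: guard assumed
            if self.auth and a > 0:
                self.balance += a
                return "ok"
            return "wrong_mode"

        def debit(self, a):            # body not given in Figure 8.1: guard assumed
            if self.auth and 0 < a <= self.balance:
                self.balance -= a
                return "ok"
            return "wrong_mode"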
200 Model-Based Testing for Embedded Systems 8.2.1 Definition of the behaviors A behavior is a subpart of an operation that represents one possible effect of the operation. Each behavior can be defined as a predicate, representing its activation condition, and a substitution that represents its effect, namely the evolution of the state variables and the instantiation of the return parameters of the operation. The behaviors are computed as the paths in the control flow graph of the considered B operation, represented as a before–after predicate.∗ Example 1 (Computation of behaviors). Consider a smart card command, named VERIFY PIN aimed at checking a PIN code proposed as parameter against the PIN code of the card. As for every smart card command, this command returns a code, named sw for status word, that indicates whether the operation succeeded or not and possibly indicating the cause of the failure. The precondition specifies the typing information on the parameter p (a four-digit number). First, the command cannot succeed if there are no remaining tries on the card and if the current PIN code of the card has been previously set. If the digits of the PIN code match, the card holder is authentified, otherwise there are two cases: either there are enough tries on the card, and the returned status word indicates that the PIN is wrong, or the holder has performed his/her last try, and the status word indicates that the card is now blocked. This operation is given in Figure 8.3, along with its control flow graph representation. This command presents four behaviors, which are made of the conjunction of the predicates on the edges of a given path, that are denoted by the sequence of nodes from 1 to 0. For example, behavior [1,2,3,4,0], defined by predicate p ∈ 0..9999 ∧ tries > 0 ∧ pin = −1 ∧ p = pin ∧ auth = true ∧ tries = max_tries ∧ sw = ok represents a successful authentication of the card holder. In this predicate, X designates the value of variable X after the execution of the operation. 8.2.2 Use of the behaviors for the symbolic animation When performing the symbolic animation of a B model, the operation parameters are abstracted and the operations are considered through their behaviors. Each parameter is thus replaced by a symbolic variable whose value is managed by a constraint solver. sw ← VERIFY_PIN (p) =^ PRE p ∈ 0 . . 9999 THEN IF tries > 0 ∧ pin ≠ –1 THEN IF p = pin THEN auth : = true || tries : = max_tries || sw := ok ELSE tries : = tries – 1 || auth := false || IF tries = 1 THEN sw := blocked ELSE sw := wrong_pin END END ELSE sw := wrong_mode END END tries > 0 ∧ pin ≠ –1 1 p ∈ 0 . . 9999 2 pin = pin 3 pin ≠ pin ∨ pin = -1 4 5 9 tries’ = tries – 1 ∧ auth’ = false auth’ = true ∧ tries’ = max_tries tries = 1 6 tries ≠ 1 ∧ sw = ok 7 8 sw = wrong_mode sw = sw = blocked wrong_pin 0 FIGURE 8.3 B code and control flow graph of the VERIFY PIN command. ∗A before–after predicate is a predicate involving state variables before the operation and after, using a primed notation. Test Generation Using Symbolic Animation of Models 201 Definition 1 (Constraint Satisfaction Problem [CSP]). A CSP is a triplet X, D, C in which • X = {X1, . . . , XN } is a set of N variables, • D = {D1, . . . , DN } is a set of domains associated to each variable (Xi ∈ Di), • C is a set of constraints that relate variable values altogether. A CSP is said to be consistent if there exists at least one valuation of the variables in X that satisfies the constraints of C. It is inconsistent otherwise. 
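Definition 1 can be made concrete with a deliberately naive check in which domains are finite Python ranges and constraints are Boolean functions; consistency is then decided by enumeration. This sketch is only meant to illustrate the definition, not the actual machinery: the animation engine delegates this task to a dedicated set-theoretical constraint solver, presented later in this section, which propagates constraints over domain representations instead of enumerating candidate valuations.

    from itertools import product

    def consistent(variables, domains, constraints):
        # Return one valuation of the CSP <X, D, C> satisfying all constraints, or None.
        #   variables   : list of variable names (X)
        #   domains     : dict mapping each variable to a finite iterable (D)
        #   constraints : list of Boolean functions over a valuation dict (C)
        for values in product(*(domains[v] for v in variables)):
            valuation = dict(zip(variables, values))
            if all(c(valuation) for c in constraints):
                return valuation      # consistent: a witness valuation exists
        return None                   # inconsistent: no valuation satisfies C

    # A toy CSP in the spirit of the purse: a wrong PIN presentation must be possible.
    X = ["p", "pin", "tries"]
    D = {"p": range(0, 10000), "pin": range(-1, 10000), "tries": range(0, 4)}
    C = [lambda v: v["tries"] > 0,
         lambda v: v["pin"] != -1,
         lambda v: v["p"] != v["pin"]]
    print(consistent(X, D, C))        # e.g. {'p': 0, 'pin': 1, 'tries': 1}

The point here is only the shape of the triplet X, D, C and the meaning of consistency; the variable names and toy constraints are chosen for the purse example and are not part of the definition.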
Activating a transition from a given state is equivalent to solving a CSP whose variables X are given by the state variables of the current state (i.e., the state from which the transition is activated), the state variables of the after state (i.e., the state reached by the activation of the transition), and the parameters of the operation. According to the B semantics, the domains D of the state variables and the operation parameters can be found in the invariant of the machine and in the precondition of the operation, respectively. The constraints C are the predicates composing the behavior that is being activated, enriched with equalities between the before and after variables that are not assigned within the considered behavior. The feasibility of a transition is defined by the consistency of the CSP associated to the activation of the transition from a given state. The iteration over the possible activable behaviors is done by performing a depth-first exploration of the behavior graph. Example 2 (Behavior activation). Consider the activation of the VERIFY PIN operation given in Example 1. Suppose the activation of this operation from the state s1 defined by: tries = 2, auth = false, pin = 1234. Two behaviors can be activated. The first one corresponds to an invocation ok ← VERIFY_PIN(1234) that covers path [1,2,3,4,0], and produces the following consistent CSP (notice that data domains have been reduced so as to give the most human-readable representation of the corresponding states): CSP1 = {tries, auth, pin, p, tries , auth , pin , sw}, {{2}, {f alse}, {1234}, {1234}, {3}, {true}, {1234}, {ok}}, {Inv, Inv , tries > 0, pin = −1, p = pin, tries = 3, auth = true, pin = pin, sw = ok} (8.1) where Inv and Inv designate the constraints from the machine invariant that apply on the variables before and after the activation of the behavior, respectively. The second behavior that can be activated corresponds to an invocation wrong_pin ← VERIFY_PIN(p) that covers path [1,2,3,5,6,8,0] and produces the following consistent CSP: CSP2 = {tries, auth, pin, p, tries , auth , pin , sw}, {{2}, {f alse}, {1234}, 0..1233 ∪ 1235..9999, {1}, {f alse}, {1234}, {wrong pin}}, {Inv, Inv , tries > 0, pin = −1, p = pin, tries = tries − 1, auth = f alse, tries = 1, pin = pin, sw = wrong pin} (8.2) State variables may also become symbolic variables, if their after value is related to the value of a symbolic parameter. A variable is said to be symbolic if the domain of the 202 Model-Based Testing for Embedded Systems variable contains more than one value. A system state that contains at least one symbolic state variable is said to be a symbolic state (as opposed to a concrete state). Example 3 (Computation of Symbolic States). Consider the SET_PIN operation that sets the value of the PIN on a smart card: sw ← SET PIN(p) =ˆ PRE p ∈ 0..9999 THEN IF pin = -1 THEN pin := p ELSE sw := wrong mode END END sw := ok From the initial state, in which auth = false, tries = 3, and pin = -1, the SET PIN operation can be activated to produce a symbolic state associated with the following CSP: CSP0 = {tries, auth, pin, p, tries , auth , pin , sw}, {{3}, {f alse}, {−1}, 0..9999, {3}, {f alse}, 0..9999, {ok}}, {Inv, Inv , pin = −1, pin = p, sw = ok} (8.3) The symbolic animation process works by exploring the successive behaviors of the considered operations. When two operations have to be chained, this process acts as an exploration of the possible combinations of successive behaviors for each operation. 
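The exploration of successive behaviors can be sketched as follows, with each behavior reduced to a guard and an effect over concrete states, and with the parameter enumerated over a small sample domain instead of being kept symbolic as the actual engine does. The behavior names, the explicit target predicate (standing for a state to be reached), and the sample domain are illustrative choices only.

    MAX_TRIES = 3

    # The four behaviors of VERIFY_PIN (cf. Figure 8.3) as (name, guard, effect) triples.
    # A state is a dict of state variables; p is the operation parameter.
    VERIFY_PIN = [
        ("success",      # path [1,2,3,4,0]
         lambda s, p: s["tries"] > 0 and s["pin"] != -1 and p == s["pin"],
         lambda s, p: dict(s, auth=True, tries=MAX_TRIES, sw="ok")),
        ("wrong_pin",    # path [1,2,3,5,6,8,0]
         lambda s, p: s["tries"] > 1 and s["pin"] != -1 and p != s["pin"],
         lambda s, p: dict(s, auth=False, tries=s["tries"] - 1, sw="wrong_pin")),
        ("blocked",      # path [1,2,3,5,6,7,0]
         lambda s, p: s["tries"] == 1 and s["pin"] != -1 and p != s["pin"],
         lambda s, p: dict(s, auth=False, tries=0, sw="blocked")),
        ("wrong_mode",   # path [1,2,9,0]
         lambda s, p: s["tries"] == 0 or s["pin"] == -1,
         lambda s, p: dict(s, sw="wrong_mode")),
    ]

    def chain(state, operations, params, target):
        # Depth-first search, with backtracking, for one feasible chain of behaviors
        # activating the listed operations in sequence and ending in a target state.
        if not operations:
            return [] if target(state) else None
        for name, guard, effect in operations[0]:
            for p in params:                       # sample parameter domain
                if guard(state, p):
                    rest = chain(effect(state, p), operations[1:], params, target)
                    if rest is not None:
                        return [(name, p)] + rest
        return None                                # dead end: backtrack

    s1 = {"tries": 2, "pin": 1234, "auth": False, "sw": None}   # state of Example 2
    blocked = lambda s: s["tries"] == 0
    print(chain(s1, [VERIFY_PIN, VERIFY_PIN], [0, 1234], blocked))
    # e.g. [('wrong_pin', 0), ('blocked', 0)]

This two-call chain corresponds to the activation computed symbolically in Example 2 followed by the blocking behavior; the real engine keeps the parameter constrained rather than enumerated, and instantiates it only once the whole sequence has been played.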
In practice, the selection of the behaviors to be activated is done in a transparent manner and the enumeration of the possible combinations of behaviors chaining is explored using backtracking mechanisms. For animating B models, we use CLPS-BZ (Bouquet, Legeard, and Peureux 2004), a set-theoretical constraint solver written in SICStus Prolog (SIC 2004) that is able to handle a large subset of the data structures existing in the B machines (sets, relations, functions, integers, atoms, etc.). Once the sequence has been played, the remaining symbolic parameters can be instantiated by a simple labeling procedure, which consists of solving the constraints system and producing an instantiation of the symbolic variables, obtaining an abstract test case. It is important to notice that constraint solvers work with an internal representation of constraints (involving constraint graphs and/or polyhedra calculi for relating variable values). Nevertheless, consistency algorithms used to acquire and propagate constraints are insufficient to ensure the consistency of a set of constraints, and a labeling procedure always has to be employed to guarantee the existence of solutions in a CSP associated to a symbolic state. The use of symbolic techniques avoids the complete enumeration of the concrete states when animating the model. It thus makes it possible to deal with large models, which represent billions of concrete states, by gathering them into symbolic states. As illustrated in the experimental section, such techniques ensure the scalability of the overall approach. The next two sections will now describe the use of symbolic animation for the generation of test cases. 8.3 Automated Boundary Test Generation We present in this section, the use of the symbolic animation for automating the generation of model-based test cases. This technique aims at a structural coverage of the transitions of Test Generation Using Symbolic Animation of Models 203 the system. To make it simple, each behavior of each operation of the B machine is targeted; the test cases thus aim at covering all the behaviors. In addition, a symbolic representation of the system states makes it possible to perform a boundary analysis from which the test targets will result (Legeard, Peureux, and Utting 2002, Ambert, Bouquet, Legeard, and Peureux 2003). This technique is recognized as a pertinent heuristics for generating test data (Beizer 1995). The tests that we propose comprise four parts, as illustrated in Figure 8.4. The first part, called preamble, is a sequence of operations that brings the system from the initial state to a state in which the test target, namely a state from which the considered behavior can be activated, is reached. The body is the activation of the behavior itself. Then, the identification phase is made of user-defined calls to observation operations that are supposed to retrieve internal values of the system so that they can be compared to model data in order to establish the conformance verdict of the test. Finally, the postamble phase is similar to the preamble, but it brings the system back to the initial state or to another state that reaches another test target. The latter part is important to chain the test cases. It is particularly useful when testing embedded systems since the execution of the tests on the system is very costly and such systems take usually much time to be reset by hand. This automated test generation technique requires some testability hypotheses to be employed. 
First, the operations of the B machine have to represent the control points of the system to be tested, so as to ease the concretization of the test cases. Second, it is mandatory that the concrete data of the SUT can be compared to the abstract data of the model, so as to be able to compare the results produced by the execution of the test cases with the results predicted by the model. Third, the SUT has to provide observation points that can be modeled in the B machine (either by return values of operations, such as the status words in the smart cards or by observation operations). We will now describe how the test cases can be automatically computed, namely how the test targets are extracted from the B machine and how the test preambles and postambles are computed. 8.3.1 Extraction of the test targets The goal of the tests is to verify that the behaviors described in the model exist in the SUT and produce the same result. To achieve that, each test will focus on one specific behavior of an operation. Test targets are defined as the states from which a given behavior can be activated. These test targets are computed so as to satisfy a structural coverage of the machine operations. Definition 2 (Test Target). Let OP = (Act1, Eff1)[] . . . [](ActN , EffN ) be the set of behaviors extracted from operation OP , in which Acti denotes the activation condition of behavior i, Effi denotes its effect, and [] is an operator of choice between behaviors. Let Preamble Body FIGURE 8.4 Composition of a test case. Postamble 204 Model-Based Testing for Embedded Systems Inv be the machine invariant. A test target is defined by a predicate that characterizes the states of the invariant from which a behavior i can be activated: Inv ∧ Acti. The use of underlying constraint solving techniques makes it possible to provide interesting possibilities for data coverage criteria. In particular, we are able to perform a boundary analysis of the behaviors of the model. Concretely, we will consider boundary goals that are states of the model for which at least one of the state variable is at an extremum (minimum or maximum) of its current domain. Definition 3 (Boundary Goal). Let minimize(V, C) and maximize(V, C) be functions that instantiate a symbolic variable V to its minimal and maximal value, respectively, under the constraints given in C. Let Acti be the activation condition of behavior i, let P be the parameters of the corresponding operation, and let V be the set of state variables that occur in behavior i, the boundary goals for the variables V are computed by BGmin = minimize(f (V ), Inv ∧ ∃P .Acti) BGmax = maximize(f (V ), Inv ∧ ∃P .Acti) in which f is an optimization function that depends on the type of the variable: if X is a set of integers, f (X) = x∈X x if X is a set of sets, f (X) = x∈X card(x) otherwise, f (X) = 1 Example 4 (Boundary Test Targets). Consider behavior [1,2,3,4,5,0] from the VERIFY PIN operation presented in Figure 8.3. The machine invariant gives the following typing informations: Inv =ˆ tries ∈ 0..3 ∧ pin ∈ −1..9999 ∧ auth ∈ {true, f alse} The boundary test targets are computed using the minimization/maximization formulas: BGmin = minimize(tries + pin, Inv ∧ ∃p ∈ 0..9999.(tries > 0 ∧ pin = −1 ∧ pin = p)) ; tries = 1, pin = 0 BGmax = maximize(tries + pin, Inv ∧ ∃p ∈ 0..9999.(tries > 0 ∧ pin = −1 ∧ pin = p)) ; tries = 3, pin = 9999 In order to improve the coverage of the operations, a predicate coverage criterion (Offutt, Xiong, and Liu 1999) can be applied by the validation engineer. 
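For integer state variables, the minimization and maximization of Definition 3 can be mimicked by a brute-force search. The following sketch recomputes the two boundary goals of Example 4 for behavior [1,2,3,4,0] of VERIFY PIN; the reduction of the existential quantification on p to a membership test is specific to this example, and a real implementation optimizes over constrained domains rather than enumerating states.

    from itertools import product

    # Activation condition of behavior [1,2,3,4,0] of VERIFY_PIN with p existentially
    # quantified: exists p in 0..9999 such that p = pin, i.e. pin itself lies in 0..9999.
    def activable(tries, pin):
        return tries > 0 and pin != -1 and 0 <= pin <= 9999

    # States allowed by the typing invariant (tries in 0..3, pin in -1..9999) from
    # which the behavior can be activated.
    states = [(tries, pin)
              for tries, pin in product(range(0, 4), range(-1, 10000))
              if activable(tries, pin)]

    f = lambda s: s[0] + s[1]      # optimization function: sum of the integer variables
    print(min(states, key=f))      # BGmin: (1, 0)     -> tries = 1, pin = 0
    print(max(states, key=f))      # BGmax: (3, 9999)  -> tries = 3, pin = 9999

Boundary goals strengthen data coverage; the predicate coverage criterion mentioned above strengthens, in an orthogonal way, the coverage of the decisions themselves.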
This criterion acts as a rewriting of the disjunctions in the decisions of the B machine. Four rewritings are possible, which enables satisfying different specification coverage criteria, as given in Table 8.1. Rewriting 1 leaves the disjunction unmodified. Thus, the Decision Coverage criterion will be satisfied if a test target satisfies either P1 or P2 indifferently (also satisfying the Condition TABLE 8.1 Decision Coverage Criteria Depending on Rewritings N Rewriting of P1 ∨ P2 Coverage Criterion 1 P1 ∨ P2 Decision Coverage (DC) 2 P1 [] P2 Condition/Decision Coverage (C/DC) 3 P1 ∧ ¬P2 [] ¬P1 ∧ P2 Full Predicate Coverage (FPC) 4 P1 ∧ P2 [] P1 ∧ ¬P2 [] ¬P1 ∧ P2 Multiple Condition Coverage (MCC) Test Generation Using Symbolic Animation of Models 205 Coverage criterion). Rewriting 2 produces two test targets, one considering the satisfaction of P1, and the other the satisfaction of P2. Rewriting 3 will also produce two test targets, considering an exclusive satisfaction of P1 without P2 and vice versa. Finally, Rewriting 4 produces three test targets that will cover all the possibilities to satisfy the disjunctions. Notice that the consistency of the resulting test targets is checked so as to eliminate inconsistent test targets. Example 5 (Decision coverage). Consider behavior [1,2,9,0] from operation VERIFY_PIN presented in Figure 8.3. The selection of the Multiple Condition Coverage criterion will produce the following test targets: 1. Inv ∧ ∃p ∈ 0..9999 . (tries ≤ 0 ∧ pin = −1) 2. Inv ∧ ∃p ∈ 0..9999 . (tries > 0 ∧ pin = −1) 3. Inv ∧ ∃p ∈ 0..9999 . (tries ≤ 0 ∧ pin = −1) providing contexts from which boundary goals will then be computed. We now describe how symbolic animation reaches these targets by computation of the test preamble. 8.3.2 Computation of the test cases Once the test targets and boundary goals are defined, the idea is to employ symbolic animation in an automated manner that will aim at reaching each target. To achieve that, a state exploration algorithm that is a variant of the A* path-finding algorithm and based on a Best-First exploration of the system states has been developed. This algorithm aims at finding automatically a path, from the initial state, that will reach a given set of states characterized by a predicate. A sketch of the algorithm is given in Figure 8.5. From a given state, the symbolic successors, through each behavior, are computed using symbolic animation (procedure compute successors). Each of these successors is then evaluated to compute the distance to the target. This latter is based on a heuristics that considers the “distance” between the current state and the targeted states (procedure compute distance). To do that, the sum of the distances between each state variable is considered; if the domains of the two variables intersect, then the distance for these variables is 0, otherwise a customized formula, involving the type of the variable and the size of the domains, computes the distance (see Colin, Legeard, and Peureux 2004 for more details). The computation of the sequence restarts from the most relevant state, that is, the one presenting the smallest distance to the target (procedure remove minimal distance returning the most interesting triplet state, sequence of behaviors, distance and removing it from the list of visited states). The algorithm starts with the initial state (denoted by s init and obtained by initializing the variables according to the INITIALIZATION clause of the machine denoted by the initialize function). 
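The distance heuristic can be pictured as follows on concrete, integer-valued states, with a test target reduced to per-variable intervals. The interval-gap formula used here is a simplification of the type-dependent formula of Colin, Legeard, and Peureux (2004), and the encoding of the Boolean variable as an integer is only for the sake of the example.

    def distance(state, target):
        # Sum, over the variables constrained by the target, of how far the current
        # value lies outside the targeted interval (0 when the domains intersect).
        d = 0
        for var, (lo, hi) in target.items():
            value = state[var]
            if value < lo:
                d += lo - value
            elif value > hi:
                d += value - hi
        return d

    initial = {"tries": 3, "pin": -1, "auth": 0}          # auth encoded as 0/1
    target  = {"tries": (0, 0), "pin": (0, 9999)}         # "card blocked, PIN set"
    print(distance(initial, target))                      # 3 + 1 = 4

    successors = {"SET_PIN":    {"tries": 3, "pin": 1234, "auth": 0},
                  "VERIFY_PIN": {"tries": 3, "pin": -1,   "auth": 0}}   # wrong_mode branch
    best = min(successors, key=lambda op: distance(successors[op], target))
    print(best)                                           # SET_PIN: the most relevant successor

The best-first exploration of Figure 8.5 simply keeps expanding, at each step, the visited state with the smallest such distance.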
The exploration ends when a zero-distance state is reached by the current sequence, or when all sequences have been explored for a given depth. Since reachability of the test targets cannot be decided, the algorithm is bounded in depth. Its worst-case complexity is O(n^d), where n is the number of behaviors over all the operations of the machine and d is the depth of the exploration (the maximal length of a test sequence). Nevertheless, the heuristic of computing the distance between the explored states and the targeted states, so as to select the most relevant states first, improves the practical results of the algorithm.

    SeqOp <- compute_preamble(Depth, Target)
    begin
      s_init <- initialize ;
      Seq_curr <- [init] ;
      dist_init <- compute_distance(Target, s_init) ;
      visited <- [ ⟨s_init, Seq_curr, dist_init⟩ ] ;
      while visited ≠ [] do
        ⟨s_curr, Seq_curr, MinDist⟩ <- remove_minimal_distance(visited) ;
        if length(Seq_curr) < Depth then
          [(s_1, Seq_1), ..., (s_N, Seq_N)] <- compute_successors((s_curr, Seq_curr)) ;
          for each (s_i, Seq_i) ∈ [(s_1, Seq_1), ..., (s_N, Seq_N)] do
            dist_i <- compute_distance(Target, s_i) ;
            if dist_i = 0 then
              return Seq_i ;
            else
              visited <- visited ∪ ⟨s_i, Seq_i, dist_i⟩ ;
            end if
          done
        end if
      done
      return [] ;
    end

FIGURE 8.5
State exploration algorithm.

The computation of the preamble ends for three possible reasons. It may have found the target, in which case the path is returned as a sequence of behaviors. Notice that, in practice, this path is often the shortest from the initial state, but not always, because of the heuristics used during the search. The algorithm may also end by stating that the target has not been reached. This can happen because the exploration depth was too low, but also because the target is unreachable.

Example 6 (Reachability of the test targets). Consider the three targets given in Example 5. The last two can easily be reached. Target 2 can be reached by setting the value of the PIN, and Target 3 can be reached by setting the value of the PIN, followed by three successive authentication failures. Nevertheless, the first target will never be reached, since the decrementation of the tries can only be done if pin ≠ −1. In order to avoid considering unreachable targets, the machine invariant has to be complete enough to capture the reachable states of the system as precisely as possible, or, at least, to exclude unreachable states. In the example, completing the invariant with pin = −1 ⇒ tries = 3 makes Target 1 inconsistent, and thus removes it from the test generation process.

The sequence returned by the algorithm represents the preamble, to which the invocation of the considered behavior (representing the test body) is concatenated. If operation parameters are still constrained, they are also instantiated to their minimal or maximal value. The observation operations are specified by hand, and the (optional) postamble is computed on the same principle as the preamble.

8.3.3 Leirios test generator for B

This technique has been industrialized by the company Smartesting (www.smartesting.com), a startup created in 2003 from the research work done at the University of Franche-Comté, in a toolset named Leirios Test Generator for B machines (Jaffuel and Legeard 2007), LTG-B for short; Leirios is the former name of the company Smartesting. The tool offers animation, test generation, and test publication features. With industrial use in mind, it also supports requirements traceability.
Requirements can be tagged in the model by simple markers that will make it possible to relate them to the corresponding tests that have been generated (see Bouquet et al. 2005 for more details). The tool also presents test generation reports that show the coverage of the test targets and/or the coverage of the requirements, as illustrated in the screenshot shown in Figure 8.6. 8.3.4 Limitations of the automated approach Even though this automated approach has been successfully used in various industrial case studies on embedded systems (as will be described in Section 8.5), the feedback from the field experience has shown some limitations. The first issue is the problem of reachability of the test targets. Even if the set of system states is well defined by the machine invariant, the experience shows that some test targets require an important exploration depth to be reached automatically, which may strongly increase the test generation time. Second, the lack of observations on the SUT may weaken the conformance relationship. As explained before, it is mandatory to dispose of a large number of observations points on the SUT to improve the accuracy of the conformance verdict. Nevertheless, if a limited number of observation is provided by the test bench (e.g., in smart cards only status words can be observed), it is mandatory to be able to check that the system has actually and correctly evolved. Finally, an important issue is the coverage of the dynamics of the system (e.g., ensure that a given sequence of commands cannot be executed successfully if the sequence is broken). Knowing the test-generation driving possibilites of the LTG-B tool, it is possible to encode the dynamics of the system by additional (ghost) variables on which a specific coverage criterion will be applied. This FIGURE 8.6 A screenshot of the LTG-B user interface. ∗Former name of the company Smartesting. 208 Model-Based Testing for Embedded Systems solution is not recommended because it requires a good knowledge of how the tool works to be employed, which is not necessarily the case of any validation engineer. Again, if limited observation points are provided, this task is all the more complicated. This weakness is amplified by the fact that the preambles are restricted to a single path from the initial state and do not cover possibly interesting situations that would have required different sequences of operation to be computed (e.g., increasing their length, involving repetitions of specific sequences of operations, etc.). These reasons led us to consider a complementary approach, also based on model animation, that would overcome the limitations described previously. This solution is based on user-defined scenarios that will capture the know-how of the validation engineer and assist him in the design of his/her test campaigns. 8.4 Scenario-Based Test Generation SBT is a concept according to which the validation engineer describes scenarios of use cases of the system, thus defining the test cases. In the context of software testing, it consists of describing sequences of actions that exercise the functionalities of the system. We have chosen to express scenarios as regular expressions representing sequences of operations, possibly presenting intermediate states that have to be reached. Such an approach is related to combinatorial testing, which uses combinations of operations and parameter values, as done in the TOBIAS tool (Ledru et al. 2004). 
Nevertheless, combinatorial approaches can be seen as input-only, meaning that they do not produce the oracle of the test and only provide a syntactical means for generating tests, without checking the adequacy of the selected combinations w.r.t. a given specification. Thus, the numerous combinations of operations calls that can be produced may turn out to be not executable in practice. In order to improve this principle, we have proposed to rely on symbolic animation of formal models of the system in order to free the validation engineer from providing the parameters of the operations (Dadeau and Tissot 2009). This makes it possible to only focus on the description of the successive operations, possibly punctuated with checkpoints, as intermediate states, that guide the steps of the scenario. The animation engine is then in charge of computing the feasibility of the sequence at unfolding-time and to instantiate the operation parameters values. One of the advantages of our SBT approach is that it helps the production of test cases by considering symbolic values for the parameters of the operations. Thus, the user may force the animation to reach specific states, defined by predicates, that add constraints to the state variables values. Another advantage is that it provides a direct requirement traceability of the tests, considering that each scenario addresses a specific requirement. 8.4.1 Scenario description language We present here the language that we use for designing the scenarios, first introduced in Julliand, Masson, and Tissot 2008a. As its core are regular expressions that are then unfolded and played by the symbolic animation engine. The language is structured in three layers: the sequence layer, the model layer, and the directive layer, which are described in the following. 8.4.1.1 Sequence and model layers The sequence layer (Figure 8.7) is based on regular expressions that make it possible to define test scenarios as operation sequences (repeated or alternated) that may possibly Test Generation Using Symbolic Animation of Models 209 SEQ ::= OP1 | ”(” SEQ ”)” | SEQ ”.” SEQ | SEQ REPEAT (ALL or ONE)? | SEQ CHOICE SEQ | SEQ ”;(” SP ”)” REPEAT ::= ”?” | n | n..m FIGURE 8.7 Syntax of the sequence layer. OP ::= operation name | ”$OP” | ”$OP \ {” OPLIST ”}” OPLIST ::= operation name | operation name ”,” OPLIST SP ::= state predicate FIGURE 8.8 Syntax of the model layer. lead to specific states. The model layer (Figure 8.8) describes the operation calls and the state predicates at the model level and constitutes the interface between the model and the scenario. A set of rules specifies the language. Rule SEQ (axiom of the grammar) describes a sequence of operation calls as a regular expression. A step in the sequence is either a simple operation call, denoted by OP1, or a sequence of operation calls that leads to a state satisfying a state predicate, denoted by SEQ ;(SP). This latter represents an improvement w.r.t. usual scenarios description languages since it makes it possible to define the target of an operation sequence, without necessarily having to enumerate all the operations that compose the sequence. Scenarios can be composed by the concatenation of two sequences, the repetition of a sequence, and the choice between two or more sequences. In practice, we use bounded repetition operators: 0 or 1, exactly n times, at most m times, and between n and m times. 
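To give an operational flavor of the sequence layer, the following sketch unfolds a scenario expression into the operation-call sequences it denotes. The tuple-based representation, the names used for the constructs, and the way a ";(SP)" target is kept as a marker are illustrative choices only; neither the test generation directives nor the on-the-fly animation that prunes infeasible sequences are modeled here.

    from itertools import product

    # A scenario expression is a nested tuple:
    #   ("op", name)              an operation call
    #   ("seq", [e1, ..., ek])    concatenation e1 . e2 . ... . ek
    #   ("choice", [e1, ..., ek]) alternative between sub-scenarios
    #   ("repeat", e, n, m)       bounded repetition e^{n..m}
    #   ("target", "SP")          the ";(SP)" checkpoint, kept as a marker

    def unfold(expr):
        # Yield every operation-call sequence denoted by a scenario expression.
        kind = expr[0]
        if kind == "op":
            yield [expr[1]]
        elif kind == "seq":
            for parts in product(*(list(unfold(e)) for e in expr[1])):
                yield [step for part in parts for step in part]
        elif kind == "choice":
            for alternative in expr[1]:
                yield from unfold(alternative)
        elif kind == "repeat":
            sub, low, high = expr[1], expr[2], expr[3]
            for k in range(low, high + 1):
                yield from unfold(("seq", [sub] * k))
        elif kind == "target":
            yield [("reach", expr[1])]

    # (VERIFY_PIN^{0..3}) ; (tries = 0): call VERIFY_PIN at most three times
    # until reaching a state in which the card is blocked.
    scenario = ("seq", [("repeat", ("op", "VERIFY_PIN"), 0, 3),
                        ("target", "tries = 0")])
    for sequence in unfold(scenario):
        print(sequence)

The operation calls and state predicates abbreviated here as plain names and markers correspond to the model layer, whose OP and SP rules are detailed next.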
Rule SP describes a state predicate, whereas OP is used to describe the operation calls that can be (1) an operation name, (2) the $OP keyword, meaning “any operation,” or (3) $OP\{OPLIST} meaning “any operation except those of OPLIST.” 8.4.1.2 Test generation directive layer This layer makes it possible to drive the step of test generation, when the tests are unfolded. We propose three kinds of directives that aim at reducing the search for the instantiation of a test scenario. This part of the language is given in Figure 8.9. Rule CHOICE introduces two operators denoted | and ⊗, for covering the branches of a choice. For example, if S1 and S2 are two sequences, S1 | S2 specifies that the test generator has to produce tests that will cover S1 and other tests that will cover sequence S2, whereas S1 ⊗ S2 specifies that the test generator has to produce test cases covering either S1 or S2. 210 CHOICE ::= ”|” | ”⊗” ALL or ONE ::= ” one” Model-Based Testing for Embedded Systems OP1 ::= OP | ”[”OP”]” | ”[” OP ”/w” BHRLIST ”]” | ”[” OP ”/e” BHRLIST ”]” BHRLIST ::= bhr label (”,” bhr label)* FIGURE 8.9 Syntax of the test generation directive layer. Rule ALL or ONE makes it possible to specify if all the solutions of the iteration will be returned (when not present) or if only one will be selected ( one). Rule OP1 indicates to the test generator that it has to cover one of the behaviors of the OP operation (default option). The test engineer may also require all the behaviors to be covered by surrounding the operation with brackets. Two variants make it possible to select the behaviors that will be applied, by specifying which behaviors are authorized (/w) or refused (/e) using labels that have to tag the operations of the model. Example 7 (An example of a scenario). Consider again the VERIFY_PIN operation from the previous example. A piece of scenario that expresses the invocation of this operation until the card is blocked, whatever the number of remaining tries might be, is expressed by (VERIFY_PIN0..3 one) ; (tries=0). 8.4.2 Unfolding and instantiation of scenarios The scenarios are unfolded and animated on the model at the same time, in order to produce the test cases. To do that, each scenario is translated into a Prolog file, directly interpreted by the symbolic animation engine of BZ-Testing-Tools framework. Each solution provides an instantiated test case. The internal backtracking mechanism of Prolog is used to iterate on the different solutions. The instantiation mechanism involved in this part of the process aims at computing the values of the parameters of the operations composing the test case so that the sequence is feasible (Abrial 1996, p. 290). If a given scenario step cannot be activated (e.g., because of an unsatisfiable activation condition), the subpart of the execution tree related to the subsequence steps of the sequence is pruned and will not be explored. Example 8 (Unfolding and instantiation). When unfolded, scenario (VERIFY_PIN0..3 one) ; (tries=0) will produce the following sequences: (1) ; (tries=0) (2) VERIFY_PIN(P1) ; (tries=0) (3) VERIFY_PIN(P1) . VERIFY_PIN(P2) ; (tries=0) (4) VERIFY_PIN(P1) . VERIFY_PIN(P2) . VERIFY_PIN(P3) ; (tries=0) where P1, P2, P3 are variables that will have to be instantiated afterwards. Suppose that the current system state gives tries=2 (remaining tries) and pin=1234. 
Sequence (1) can not be satisfied, (2) does not make it possible to block the card after a single authentication failure, sequence (3) and (4) are feasible, leading to a state in which the card is blocked. According to the selected directive ( one), only one sequence will be kept (here, (3) since it represents the lowest number of iterations). The solver then instantiates parameters P1 and P2 for sequence (3). This sequence activates behavior [1, 2, 3, 5, 6, 8, 0] of VERIFY_PIN followed by behavior [1, 2, 3, 5, 6, 7, 0] that blocks the card (cf. Figure 8.3). The constraints associated with the variables representing Test Generation Using Symbolic Animation of Models 211 FIGURE 8.10 The jSynoPSys SBT tool. the parameters are thus P1 = 1234 and P2 = 1234. A basic instantiation will then return P1 = P2 = 0, resulting in sequence: VERIFY_PIN(0); VERIFY_PIN(0). These principles have been implemented into a tool named jSynoPSys (Dadeau and Tissot 2009), a SBT tool working on B Machines. A screenshot of the tool is displayed in Figure 8.10. The tool makes it possible to design and play the scenarios. Resulting tests can be displayed in the interface or exported to be concretized. Notice that this latter makes it possible to reuse existing concretization layers that would have been developed for LTG-B. 8.5 Experimental Results This section relates the experimental results obtained during various industrial collaborations in the domain of embedded systems: smart card applets (Bernard 2004) or operating systems (Bouquet et al. 2002), ticketing applications, automotive controllers (Bouquet, Lebeau, and Legeard 2004), and space on-board software (Chevalley, Legeard, and Orsat 2005). We first illustrate the relevance of the automated test generation approach compared to manual test design. Then, we show the complementary of the two test generation techniques presented in this chapter. 8.5.1 Automated versus manual testing—The GSM 11.11 case study In the context of an industrial partnership with the smart card division∗ of the Schlumberger company, a comparison has been done between a manual and an automated approach for the generation of test cases. The selected case study was the GSM 11.11 standard (European Telecommunications Standards Institute 1999) that defines, on mobile phones, the interface between the Subscriber Identification Module (SIM) and the Mobile Equipment (ME). The part of the standard that was modeled consisted of the structure of the SIM, namely its organization in directories (called Dedicated Files—DF) or files (called Elementary ∗Now Parkeon – www.parkeon.com. 212 Model-Based Testing for Embedded Systems Files—EF), and the security aspects of the SIM, namely the access control policies applied to the files. Files are accessible for reading, with four different access levels: ALWays (access can always be performed), CHV (access depends on a Card Holder Verification performed previously), ADM (for administration purposes), and NEVer (the file cannot be directly accessed through the interface). The commands modeled were SELECT FILE (used to explore the file system), READ BINARY (used to read in the files if permitted), VERIFY CHV (used to authenticate the holder), and UNBLOCK CHV (used to unblock the CHV when too many unsuccessful authentication attempts with VERIFY CHV happened). In addition, a command named STATUS makes it possible to retrieve the internal state of the card (current EF, current DF, and current values of tries counters). 
Notice that no command was modeled to create/delete files or set access control permission: the file system structure and permission have been modeled as constants and manually created on the test bench. The B model was about 500 lines of code and represents more than a milion of concrete states. Although it was written by our research team members, the model did not involve complicated B structures and thus did not require a high level of expertise in B modeling. A total of 42 boundary goals have been computed, leading to the automated computation of 1008 test cases. These tests have been compared to the existing test suite, which had been handwritten by the Schlumberger validation team and covering the same subset of the GSM 11.11 standard. This team performed the comparison. It showed that the automated test suite included 80% of the manual tests. More precisely, since automated test cases cover behaviors atomically, a single manual test may usually exercise the SUT in the same way that several automated tests would do. On the opposite end of the spectrum, 50% of the automated tests were absent from the manual test suite. Among them, for 20% of tests that were not produced automatically, three reasons appear. Some of the missing tests (5%) considered boundary goals that have not been generated. Other tests (35%) considered the activation of several operations from the boundary state that is not considered by the automated approach. Whereas these two issues are not crucial, and do not put the process into question; it appeared that the rest of the tests (60%) covered parts of the informal requirements that were not expressed in the B model. To overcome this limitation, a first attempt of SBT has been proposed, asking the validation engineer to provide tests designed independently, with the help of the animation tool. The study also compared the efforts for designing the test cases. As shown in Table 8.2, the automated process reduces test implementation time, but adds time for the design of the B model. On the example, the overall effort is reduced by 30%. 8.5.2 Completing functional tests with scenarios—The IAS case study The SBT process has been designed during the French National project POSE∗ that involved the leader of smart cards manufacturers, Gemalto, and that aimed at the validation of security policies for the IAS platform. TABLE 8.2 Comparison in Terms of Time Spent on the Testing Phase in Persons/Day Manual Design Automated Process Design of the test plan Implementation and test execution Total 6 p/d 24 p/d 30 p/d Modeling in B Test generation Test execution Total 12 p/d Automated 6 p/d 18 p/d ∗http://www.rntl-pose.info. Test Generation Using Symbolic Animation of Models 213 IAS stands for Identification, Authentication, and electronic Signature. It is a standard for Smart Cards developed as a common platform for e-Administration in France and specified by GIXEL. IAS provides identification, authentication, and signature services to the other applications running on the card. Smart cards, such as the French identity card or the “Sesame Vitale 2” health card, are expected to conform to IAS. Being based on the GSM 11.11 interface, the models present similarities. This platform presents a file system containing DFs and EFs. In addition, DFs host Security Data Objects (SDO) that are objects of an application containing highly sensitive data such as PIN codes or cryptographic keys. 
The access to an object by an operation in IAS is protected by security rules based on the security attributes of the object. The access rules can possibly be expressed as a conjunction of elementary access conditions, such as Never (which is the rule by default, stating that the command can never access the object), Always (the command can always access the object), or User (user authentication: the user must be authenticated by means of a PIN code). The application of a given command to an object can then depend on the state of some other SDOs, which complicates the access control rules. The B model for IAS is 15,500 lines long. The complete IAS commands have been modeled as a set of 60 B operations manipulating 150 state variables. A first automated test generation campaign was experimented with and produced about 7000 tests. A close examination of the tests found the same weakness as for the GSM 11.11 case study, namely, interesting security properties were not covered at best, and manual testing would be necessary to overcome this weakness. The idea of the experiment was to relate to the Common Criteria (C.C.) norm (CC 2006), a standard for the security of Information Technology products that provides a set of assurances w.r.t. the evaluation of the security implemented by the product. When a product is delivered, it can be evaluated w.r.t. the C.C. that ensure the conformance of the product w.r.t. security guidelines related to the software design, verification, and validation of the standard. In order to pass the current threshold of acceptance, the C.C. require the use of a formal model and evidences of the validation of the given security properties of the system. Nevertheless, tracing the properties in the model in order to identify dedicated tests was not possible since some of the properties were not directly expressed in the original B model. For the experimentation, we started by designing a simplified model called Security Policy Model (SPM) that focuses on access control features. This model is 1100 lines long with 12 operations manipulating 20 state variables and represents the files management with authentications on their associated SDOs. In order to complete the tests that are generated automatically from the complete model, three scenarios have been designed for exercising specific security properties that could not be covered previously. The scenarios and their associated tests provide direct evidences of the validation of given properties. Each scenario is associated with a test need that informally expresses the intention of the scenario w.r.t. the property and provides documentation on the test campaign. • The first scenario exercises a security property stating that the access to an object protected by a PIN code requires authentication by means of the PIN code. The tests produced automatically exercise this property in a case where the authentication is obtained, and in a case where it is not. The scenario completes these tests by considering the case in which the authentication has first been obtained, but lost afterwards. The unfolding of this scenario provided 35 instantiated sequences, illustrating the possible ways of losing an authentication. • The second scenario exercises the case of homonym PIN files located in different DFs, and their involvement in the access control conditions. 
In particular, it aimed at ensuring that an authenticated PIN in a specific DF is not mistakenly considered in an access 214 Model-Based Testing for Embedded Systems control condition that involves another PIN displaying the same name but located in another DF. The unfolding of this scenario resulted in 66 tests. • The third and last scenario exercises a property specifying that the authentication obtained by means of a PIN code not only depends on the location of the PIN but also on the life cycle state of the DF where a command protected by the PIN is applied. This scenario aimed at testing situations where the life cycle state of the directory is not always activated (which was not covered by the first campaign). The unfolding of this scenario produced 82 tests. In the end, the three scenarios produced 183 tests that were run on the SUT. Even if this approach did not reveal any errors, the execution of these tests helps increasing the confidence in the system w.r.t. the considered security properties. In addition, the scenarios could provide direct evidence of the validation of these properties, which were useful for the C.C. evaluation of the IAS. Notice that, when replaying the scenarios on the complete IAS model, the SBT approach detected a nonconformance between the SPM and the complete model because of a different interpretation of the informal requirements in the two models. 8.5.3 Complementarity of the two approaches These two case studies illustrate the complementarity of the approaches. The automated boundary test generation approach is efficient at replacing most of the manual design of the functional tests, saving efforts in the design of the test campaigns. Nevertheless, it is mandatory to complete the test suite to exercise properties related to the dynamics of the system to be tested. To this end, the SBT approach provides an interesting way to assist the validation engineer in the design of complementary tests. In both cases, the use of symbolic techniques ensures the scalability of the approach. Finally, it is important to notice that the effort of model design is made beneficial by the automated computation of the oracle and the possibility to script the execution of the tests and the verdict assignment. Notice also that, if changes appear in the specifications, a manual approach would require the complete test suite to be inspected and updated, whereas our approaches would only require to propagate these changes in the model and let the test generation tool recompute the new test suites, saving time and efforts of test suite maintenance. 8.6 Related Work This section is divided into two subsections The first subsection is dedicated to automated test generation using model coverage criteria. The second compares our SBT process with similar approaches. 8.6.1 Model-based testing approaches using coverage criteria Many model-based testing approaches rely on the use of a Labeled Transition System or a Finite-State Machine from which the tests are generated using dedicated graph exploration algorithms (Lee and Yannakakis 1996). Tools such as TorX (Tretmans and Brinksma 2003) and TGV (Jard and J´eron 2004) use a formal representation of the system written as Input–Output Labeled Transition Systems, on which test purposes are applied to select the relevant test cases to be produced. 
In addition, TorX proposes the use of test heuristics that Test Generation Using Symbolic Animation of Models 215 help filtering the resulting tests according to various criteria (test length, cycle coverage, etc.). The conformance is established using the ioco (Tretmans 1996) relationship. The major differences with our automated approach is that, first, we do not know the topology of the system. As a consequence, the treatment of the model differs. Second, these processes are based on the online (or on-the-fly) testing paradigm in which the model program and the implementation are considered together. On the contrary, our approach is amenable to offline testing that requires a concretization step for the tests to be run on the SUT and the conformance to be established. Notice that the online testing approaches described previously may also be employed offline (J´eron 2009). The STG tool (Clarke et al. 2001) improves the TGV approach by considering Input– Output Symbolic Transitions Systems, on which deductive reasoning applies, involving constraint solvers or theorem provers. Nevertheless, the kind of data manipulated are often restricted to integers and Booleans, whereas our approach manipulates additional data types, such as collections (sets, relation, functions, etc.) that may be useful for the modeling step. Similarly, AGATHA (Bigot et al. 2003) is a test generation tool based on constraint solving techniques that works by building a symbolic execution graph of systems modeled by communicating automata. Tests are then generated using dedicated algorithms in charge of covering all the transitions of the symbolic execution graph. The CASTING (van Aertryck, Benveniste, and Le Metayer 1997) testing method is also based on the use of operations written in DNF for extracting the test cases (Dick and Faivre 1993). In addition, CASTING considers decomposition rules that have to be selected by the validation engineer so as to refine the test targets. CASTING has been implemented for B machines. Test targets are computed as constraints applying on the before and after states of the system. These constraints define states that have to be reached by the test generation process. To achieve that, the concrete state graph is built and explored. Our approach improves this technique by considering symbolic techniques that perform a boundary analysis for the test data, potentially improving the test targets. Moreover, the on-thefly exploration of the state graph avoids the complete enumeration of all the states of the model. Also based on B specifications, ProTest (Satpathy, Leuschel, and Butler 2005) is an automated test generator coupled with the ProB model checker (Leuschel and Butler 2003). ProTest works by first building the concrete system state graph through model animation that is then explored for covering states and transitions using classical algorithms. One point in favor of ProTest/ProB is that it covers a larger subset of the B notation as our approach, notably supporting sequences. Nevertheless, the major drawback is the exhaustive exploration of all the concrete states that complicates the industrial use of the tool on large models. In particular, the IAS model used in the experiment reported in Section 8.5.2 can not be handled by the tool. 8.6.2 Scenario-based testing approaches In the literature, a lot of SBT work focuses on extracting scenarios from UML diagrams, such as the SCENTOR approach (Wittevrongel and Maurer 2001) or SCENT (Ryser and Glinz 1999), both using statecharts. 
The SOOFT approach (Tsai et al. 2003) proposes an objectoriented framework for performing SBT. In Binder (1999), Binder proposes the notion of round-trip scenario test that covers all event-response path of a UML sequence diagram. Nevertheless, the scenarios have to be completely described, contrary to our approach that abstracts the difficult task of finding well-suited parameter values. In the study by Auguston, Michael, and Shing (2005), the authors propose an approach for the automated scenario generation from environment models for testing of real-time reactive systems. The behavior of the system is defined as a set of events. The process 216 Model-Based Testing for Embedded Systems relies on an attributed event grammar (AEG) that specifies possible event traces. Even if the targeted applications are different, the AEG can be seen as a generalization of regular expressions that we consider. Indirectly, the test purposes of the STG (Clarke et al. 2001) tool, described as IOSTS (Input/Output Symbolic Transition Systems), can be seen as scenarios. Indeed, the test purposes are combined with an IOSTS of the SUT by an automata product. This product restricts the possible executions of the system to those evidencing the test purpose. Such an approach has also been adapted to the B machines in (Julliand, Masson, and Tissot 2008b). A similar approach is the test by model checking, where test purposes can be expressed in the shape of temporal logic properties, as is the case in Amman, Ding, and Xu (2001) or Tan, Sokolsky, and Lee (2004). The model checker computes witness traces of the properties by a synchronized product of the automata of the property and of a state/transition model of the sytem under test. These traces are then used as test cases. An input/output temporal logic has also been described in Rapin (2009) to express temporal properties w.r.t. IOSTS. The authors use an extension of the AGATHA tool to process such properties. As explained in the beginning of this chapter, we were inspired by the TOBIAS tool (Ledru et al. 2004) that works with scenarios expressed using regular expressions representing the combinations of operations and parameters. Our approach improves this principle by avoiding the enumeration of the combinations of input parameters. In addition, our tool provides test driving possibilities that may be used to easily tackle the combinatorial explosion, inherent to such an approach. Nevertheless, on some points, the TOBIAS input language is more expressive than ours and a combination of these two approaches, which would employ the TOBIAS tool for describing the test cases, is currently under study. Notice that an experiment has been done in Maury, Ledru, and du Bousquet (2003) for coupling TOBIAS with UCASTING, the UML version of the CASTING tool (van Aertryck, Benveniste, and Le Metayer 1997). This work made it possible to use UCASTING for (1) filtering the large tests sequences combinatorially produced by TOBIAS, by removing traces that were not feasible on the model or (2) to instantiate operation parameters. Even if the outcome is similar, our approach differs since the inconsistency of the test cases is detected without having to completely unfold the test sequences. Moreover, the coupling of these tools did not include as many test driving options, to reduce the number of test cases, as we propose. 
The technique for specifying scenarios can be related to Microsoft Parameterized Unit Tests (PUT for short) (Tillmann and Schulte 2005), in which the user writes skeletons of test cases involving parameterized data that will be instantiated automatically using constraint solving techniques. Moreover, the test cases may contain basic structures such as conditions and iterations, which will be unfolded during the process, so as to produce test cases. Our approach is very similar in its essence, but some differences exist. First, our scenarios do not contain data parameters. Second, we express them on the model, whereas the PUT approach aims at producing test cases that will be directly executed on the code, leaving the question of the oracle not addressed. Nevertheless, the question of refining the scenario description language so as to propagate some symbolic parameterized data along the scenario is under study. 8.7 Conclusion and Open Issues This chapter has presented two test generation techniques using the symbolic animation of formal models, written in B, used for automating test design in the context of embedded systems such as smart cards. The first technique relies on the computation of boundary Test Generation Using Symbolic Animation of Models 217 goals that define tests targets. These are then automatically reached by a customized state exploration algorithm. This technique has been industrialized by the company Smartesting and applied on various case studies in the domain of embedded systems, in particular in the domain of electronic transactions. The second technique considers user-defined scenarios, expressed as regular expressions on the operations of the model and intermediate states, that are unfolded and animated on the model so as to filter the inconsistent test cases. This technique has been designed and experimented with during an industrial partnership. This SBT approach has shown to be very convenient, firstly with the use of a dedicated scenario description language that is easy to put into practice. Moreover, the connection between the tests, the scenarios, and the properties from which they originate can be directly established, providing a means for ensuring the traceability of the tests, which is useful in the context of high-level evaluation of C.C, that requires evidences of the validation of specific properties of the considered software. The work presented here has been applied to B models, but it is not restricted to this formalism, and the adaptation to UML, in partnership with Smartesting, is currently being studied. Even if the SBT technique overcomes the limitations of the automated approach, in terms of relevance of the preambles, reachability of the test targets, and observations, the design of the scenario is still a manual step that requires the validation engineer to intervene. One interesting lead would be to automate the generation of the scenarios, in particular using high-level formal properties that they would exercise. Another approach is to use model abstraction (Ball 2005) for generating the tests cases, based on dynamic test selection criteria, expressed by the scenarios. Finally, we have noticed that a key issue in the process is the ability to deal with changes and evolutions of the software at the model level. We are now working on integrating changes in the Model-based Testing process. The goal is twofold. First, it would avoid the complete recomputation of the test suites, thus saving computation time. 
Second, and more importantly, it would make it possible to classify tests into specific test suites dedicated to the validation of software evolutions by ensuring nonregression and nonstagnation of the parts of system. References Abrial, J. (1996). The B-Book, Cambridge University Press, Cambridge, United Kindgom. Ambert, F., Bouquet, F., Legeard, B., and Peureux, F. (2003). Automated boundary-value test generation from specifications—method and tools. In 4th Int. Conf. on Software Testing, ICSTEST 2003, Pages: 52–68. Cologne, Allemagne. Amman, P., Ding, W., and Xu, D. (2001). Using a model checker to test safety properties. In ICECCS’01, 7th Int. Conf. on Engineering of Complex Computer Systems, Page: 212. IEEE Computer Society, Washington, DC. Auguston, M., Michael, J., and Shing, M.-T. (2005). Environment behavior models for scenario generation and testing automation. In A-MOST ’05: Proceedings of the 1st International Workshop on Advances in Model-Based Testing, Pages: 1–6. ACM, New York, NY. 218 Model-Based Testing for Embedded Systems Ball, T. (2005). A theory of predicate-complete test coverage and generation. In de Boer, F., Bonsangue, M., Graf, S., and de Roever, W.-P., eds, FMCO’04, Volume 3657, of LNCS, Pages: 1–22. Springer-Verlag, Berlin, Germany. Beizer, B. (1995). Black-Box Testing: Techniques for Functional Testing of Software and Systems. John Wiley & Sons, New York, NY. Bernard, E., Legeard, B., Luck, X., and Peureux, F. (2004). Generation of test sequences from formal specifications: GSM 11-11 standard case study. International Journal of Software Practice and Experience 34(10), 915–948. Bigot, C., Faivre, A., Gallois, J.-P., Lapitre, A., Lugato, D., Pierron, J.-Y., and Rapin, N. (2003). Automatic test generation with AGATHA. In Garavel, H. and Hatcliff, J., eds, Tools and Algorithms for the Construction and Analysis of Systems, 9th International Conference, TACAS 2003, Volume 2619, Lecture Notes in Computer Science, Pages: 591–596. Springer-Verlag, Berlin, Germany. Binder, R.V. (1999). Testing Object-oriented Systems: Models, Patterns, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA. Bouquet, F., Jaffuel, E., Legeard, B., Peureux, F., and Utting, M. (2005). Requirement traceability in automated test generation—application to smart card software validation. In Procs. of the ICSE Int. Workshop on Advances in Model-Based Software Testing (A-MOST’05). ACM Press, St. Louis, MO. Bouquet, F., Julliand, J., Legeard, B., and Peureux, F. (2002). Automatic reconstruction and generation of functional test patterns—application to the Java Card Transaction Mechanism (confidential). Technical Report TR-01/02, LIFC—University of FrancheComt´e and Schlumberger Montrouge Product Center. Bouquet, F., Lebeau, F., and Legeard, B. (2004). Test case and test driver generation for automotive embedded systems. In 5th Int. Conf. on Software Testing, ICS-Test 2004, Pages: 37–53. Du¨sseldorf, Germany. Bouquet, F., Legeard, B., and Peureux, F. (2004). CLPS-B: A constraint solver to animate a B specification. International Journal on Software Tools for Technology Transfer, STTT 6(2), 143–157. Bouquet, F., Legeard, B., Utting, M., and Vacelet, N. (2004). Faster analysis of formal specifications. In Davies, J., Schulte, W., and Barnett, M., eds, 6th Int. Conf. on Formal Engineering Methods (ICFEM’04), Volume 3308, of LNCS, Pages: 239–258. SpringerVerlag, Seattle, WA. CC (2006). 
Common Criteria for Information Technology Security Evaluation, version 3.1, Technical Report CCMB-2006-09-001. Chevalley, P., Legeard, B., and Orsat, J. (2005). Automated test case generation for space on-board software. In Eurospace, ed, DASIA 2005, Data Systems In Aerospace Int. Conf., Pages: 153–159. Edinburgh, UK. Clarke, D., J´eron, T., Rusu, V., and Zinovieva, E. (2001). Stg: A tool for generating symbolic test programs and oracles from operational specifications. In ESEC/FSE-9: Proceedings of the 8th European software engineering conference held jointly with 9th ACM SIGSOFT international symposium on Foundations of software engineering, Pages: 301– 302. ACM, New York, NY. Test Generation Using Symbolic Animation of Models 219 Colin, S., Legeard, B., and Peureux, F. (2004). Preamble computation in automated test case generation using constraint logic programming. The Journal of Software Testing, Verification and Reliability 14(3), 213–235. Dadeau, F. and Tissot, R. (2009). jSynoPSys—a scenario-based testing tool based on the symbolic animation of B machines. ENTCS, Electronic Notes in Theoretical Computer Science, MBT’09 proceedings 253(2), 117–132. Dick, J. and Faivre, A. (1993). Automating the generation and sequencing of test cases from model-based specifications. In Woodcock, J. and Gorm Larsen, P. eds, FME ’93: First International Symposium of Formal Methods Europe, Volume 670 of LNCS, Pages: 268– 284. Springer, Odense, Denmark. European Telecommunications Standards Institute (1999). GSM 11-11 V7.2.0 Technical Specifications. Jaffuel, E. and Legeard, B. (2007). LEIRIOS test generator: Automated test generation from B models. In B’2007, the 7th Int. B Conference—Industrial Tool Session, Volume 4355 of LNCS, Pages: 277–280. Springer, Besancon, France. Jard, C. and J´eron, T. (2004). Tgv: Theory, principles and algorithms, a tool for the automatic synthesis of conformance test cases for non-deterministic reactive systems. Software Tools for Technology Transfer (STTT) 6. J´eron, T. (2009). Symbolic model-based test selection. Electronical Notes Theoritical Computer Science 240, 167–184. Julliand, J., Masson, P.-A., and Tissot, R. (2008a). Generating security tests in addition to functional tests. In AST’08, 3rd Int. workshop on Automation of Software Test, Pages: 41–44. ACM Press, Leipzig, Germany. Julliand, J., Masson, P.-A., and Tissot, R. (2008b). Generating tests from B specifications and test purposes. In ABZ’2008, Int. Conf. on ASM, B and Z, Volume 5238 of LNCS, Pages: 139–152. Springer, London, UK. Ledru, Y., du Bousquet, L., Maury, O., and Bontron, P. (2004). Filtering TOBIAS combinatorial test suites. In Wermelinger, M. and Margaria, T., eds, Fundamental Approaches to Software Engineering, 7th Int. Conf., FASE 2004, Volume 2984 of LNCS, Pages: 281– 294. Springer, Barcelona, Spain. Lee, D. and Yannakakis, M. (1996). Principles and methods of testing finite state machines— a survey. In Proceedings of the IEEE, Pages: 1090–1123. Legeard, B., Peureux, F., and Utting, M. (2002). Automated boundary testing from Z and B. In Proc. of the Int. Conf. on Formal Methods Europe, FME’02, Volume 2391 of LNCS, Pages: 21–40. Springer, Copenhaguen, Denmark. Leuschel, M. and Butler, M. (2003). ProB: A model checker for B. In Araki, K., Gnesi, S., and Mandrioli, D., eds, FME 2003: Formal Methods, Volume 2805 of LNCS, Pages: 855– 874. Springer. Maury, O., Ledru, Y., and du Bousquet, L. (2003). Intgration de TOBIAS et UCASTING pour la gnration de tests. 
In 16th International Conference Software and Systems and their applications-ICSSEA, Paris. 220 Model-Based Testing for Embedded Systems Offutt, A., Xiong, Y., and Liu, S. (1999). Criteria for generating specification-based tests. In Proceedings of the 5th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS’99), Pages: 119–131. IEEE Computer Society Press, Las Vegas, Nevada. Rapin, N. (2009). Symbolic execution based model checking of open systems with unbounded variables. In TAP ’09: Proceedings of the 3rd International Conference on Tests and Proofs, Pages: 137–152. Springer-Verlag, Berlin, Heidelberg. Ryser, J. and Glinz, M. (1999). A practical approach to validating and testing software systems using scenarios. Satpathy, M., Leuschel, M., and Butler, M. (2005). ProTest: an automatic test environment for B specifications. Electronic Notes in Theroretical Computer Science 111, 113–136. SIC (2004). SICStus Prolog 3.11.2 manual documents. http://www.sics.se/sicstus.html. Tan, L., Sokolsky, O., and Lee, I. (2004). Specification-based testing with linear temporal logic. In IRI’2004, IEEE Int. Conf. on Information Reuse and Integration, Pages: 413–498. Tillmann, N. and Schulte, W. (2005). Parameterized unit tests. SIGSOFT Softw. Eng. Notes 30(5), 253–262. Tretmans, G.J. and Brinksma, H. (2003). Torx: automated model-based testing. In Hartman, A. and Dussa-Ziegler, K., eds, First European Conference on Model-Driven Software Engineering, Pages: 31–43, Nuremberg, Germany. Tretmans, J. (1996). Conformance testing with labelled transition systems: implementation relations and test generation. Computer Networks and ISDN Systems, 29(1), 49–79. Tsai, W. T., Saimi, A., Yu, L., and Paul, R. (2003). Scenario-based object-oriented testing framework. qsic 00, 410. van Aertryck, L., Benveniste, M., and Le Metayer, D. (1997). Casting: a formally based software test generation method. Formal Engineering Methods, International Conference on 0, 101. Wittevrongel, J. and Maurer, F. (2001). Scentor: scenario-based testing of e-business applications. In WETICE ’01: Proceedings of the 10th IEEE International Workshops on Enabling Technologies, Pages: 41–48. IEEE Computer Society, Washington, DC. Part III Integration and Multilevel Testing This page intentionally left blank 9 Model-Based Integration Testing with Communication Sequence Graphs Fevzi Belli, Axel Hollmann, and Sascha Padberg CONTENTS 9.1 Introduction and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 9.2 Communication Sequence Graphs for Modeling and Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 9.2.1 Fault modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 9.2.2 Communication sequence graphs for unit testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226 9.2.3 Communication sequence graphs for integration testing . . . . . . . . . . . . . . . . . . . . . . . . . 228 9.2.4 Mutation analysis for CSG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 9.3 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 9.3.1 System under consideration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . 232 9.3.2 Modeling the SUC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 9.3.3 Test generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 9.3.4 Mutation analysis and results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238 9.3.5 Lessons learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 9.4 Conclusions, Extension of the Approach, and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 While unit testing is supposed to guarantee the proper function of single units, integration testing (ITest) is intended to validate the communication and cooperation between different components. ITest is important because many events are caused by integration-related faults such as, for example, failures during money transfer, air- and spacecraft crashes, and many more that are not detectable during unit testing. This chapter introduces an approach to model-based integration testing. After a brief review of existing work (1) communication sequence graphs (CSG) are introduced for representing the communication between software components on a meta-level and (2) based on CSG and other introduced notions test coverage criteria are defined. A case study based on a robot-controlling application illustrates and validates the approach. 9.1 Introduction and Related Work Testing is the validation method of choice applied during different stages of software production. In practice, testing is often still carried out at the very end of the software development process. It is encouraging, however, that some companies, for example, in the aircraft industry, follow a systematic approach using phase-wise verification and validation while developing, for example, embedded systems. Disadvantages of a “Big-Bang-Testing” (Myers 1979) that is carried out at the end of development are obvious. Sources of errors interfere with 223 224 Model-Based Testing for Embedded Systems each other resulting in late detection, localization, and correction of faults. This, in turn, becomes very costly and time consuming. Several approaches to ITest have been proposed in the past. Binder (1999) gives different examples of ITest techniques, for example, top-down and bottom-up ITest. Hartmann et al. (Hartmann, Imoberdorf, and Meisinger 2000) use UML statecharts specialized for objectoriented programming (OOP). Delamaro et al. (Delamaro, Maldonado, and Mathur 2001) introduced a communication-oriented ITest approach that mutates the interfaces of software units. An overview of mutation analysis results is given by Offutt (1992). Saglietti et al. (Saglietti, Oster, and Pinte 2007) introduced an interaction-oriented, higher-level approach and several test coverage criteria. In addition, many approaches to ITest of object-oriented software (OOS) have been proposed. Buy et al. (Buy, Orso, and Pezze 2000) defined method sequence trees for representing the call structure of methods. Daniels et al. (Daniels and Tai 1999) introduced different test coverage criteria for method sequences. Martena et al. 
(Martena, DiMilano, Orso, and Pezz`e 2002) defined interclass testing for OOS. Zhao and Lin (2006) extended the approach of Hartmann, Imoberdorf, and Meisinger (2000) by using the method message paths for ITest, illustrating the communication between objects of classes. A method message path is defined as a sequence of method execution paths linked by messages, indicating the interactions between methods in OOS. Hu, Ding, and Pu (2009) introduced a pathbased approach focused on OOS in which a forward slicing technique is used to identify the call statements of a unit and by connecting the units via interface net and mapping tables. The path-based approach considers units as nodes; the interface nets are input and output ports of the nodes representing the parameters of the unit, and the mapping tables describe the internal mapping from the in-ports to the out-ports of a node. Furthermore, Sen (2007) introduced a concolic testing approach that integrates conditions into graphs for concrete and symbolic unit testing. Hong, Hall, and May (1997) detailed test termination criteria and test adequacy for ITest and unit testing. In this chapter, CSG are introduced to represent source code at different levels of abstraction. Software systems with discrete behavior are considered. In contrast to existing, mostly state-based approaches described above, CSG-based models are stateless, that is, they do not concentrate on internal states of the software components,∗ but rather focus on events. CSGs are directed graphs enriched with some semantics to adopt them for ITest. This enables the direct application of well-known algorithms from graph theory, automata theory, operation research, etc. for test generation and test minimization. Of course, UML diagrams could also be used for ITest, done by Hartmann et al. (2000); in this case, however, some intermediate steps would be necessary to enable the application of formal methods. The approach presented in this chapter is applicable to both OOP and non-OOP programming. The syntax of CSG is based on event sequence graphs (ESGs) (Belli, Budnik, and White 2006). ESGs are used to generate test cases for user-centered black-box testing of humanmachine systems. ITest makes use of the results of unit testing. Therefore, a uniform modeling for both unit testing and ITest is aimed at by using the same modeling techniques for both levels. Section 9.2 explains how CSG are deployed for unit testing (Section 9.2.2) and ITest (Section 9.2.3), after a short introduction to fault modeling on ITest (Section 9.2.1). This section also introduces a straightforward strategy for generating test cases and mutation testing to the CSG (Section 9.2.4). A case study in Section 9.3 exemplifies and validates the approach. ∗Note that “software component” and “unit” are used interchangeably. Model-Based ITest with Communication Sequence Graphs 225 For the case study, a robot-control application is chosen that performs a typical assembly process. Using different coverage criteria, test sets are generated from CSG models of the system under consideration (SUC). Mutation analysis is applied to SUC for evaluating the adequacy of the generated test sets. Section 9.4 gives a summary of the approach and concludes the chapter referring to future research work. 9.2 Communication Sequence Graphs for Modeling and Testing Depending on the applied programming language, a software component represents a set of functions including variables forming data structures. 
Classes contain methods and variables in the object-oriented paradigm. In the following, it is assumed that unit tests have already been conducted and ITest is to be started. In case that no model exists, the first step of ITest is supposed to model the components ci of the SUC, represented as C = {c1, . . . , cn}. 9.2.1 Fault modeling Figure 9.1 shows the communication between a calling software component, ci ∈ C, and an invoked component, cj ∈ C. Messages to realize this communication are represented as tuples M of parameter values and global variables and can be transmitted correctly (valid ) or faultily (invalid ), leading to the following combinations: • Mci (ci,cj): correct input from ci to cj, (valid case) • Mco (cj, ci): correct output from cj back to ci, (valid case) • Mfi (ci,cj): faulty input from ci to cj, (invalid case) • Mfo (cj, ci): faulty output from cj back to ci (invalid case). Figure 9.1 illustrates the communication process. Two components, ci, cj ∈ C of a software system C communicate with each other by sending a message from ci to cj , that is, the communication is directed from ci to cj. We assume that either Mci (ci,cj) or Mfi (ci,cj) is the initial invocation. As the reaction of this invocation, cj sends its response back to ci. The response is Mco (cj, ci) or Faulty output (ci) ci Mfi(ci, cj) Mci (ci, cj) Mco(cj, ci) Mfo(cj, ci) Legend: cj Faulty output (cj) Message direction Correct message Faulty message FIGURE 9.1 Message-oriented model of integration faults between two software components ci and cj. 226 Model-Based Testing for Embedded Systems Mfo (cj, ci). We assume further that the tuples of faulty message Mfi (ci,cj) and Mfo (cj, ci) cause faulty outputs of ci and cj as follows: • Component cj produces faulty results based on – Faulty parameters transmitted from ci in Mfi (ci,cj), or – Correct parameters transmitted from ci in Mci (ci,cj), but perturbed during trans- mission resulting in a faulty message. • Component ci produces faulty results based on – Faulty parameters transmitted from ci to cj, causing cj to send a faulty output back to ci in Mfo (cj, ci), or – Correct, but perturbed parameters transmitted from ci to cj, causing cj to send a faulty output back to ci in Mco (cj, ci) resulting in a faulty message. The message direction in this example indicates that cj is reactive and ci is pro-active. If cj sends the message first, the message will be transmitted in the opposite direction. This fault model helps to consider potential system integration faults and thus, to generate tests to detect them. Perturbation during transmission arises if either • The message is being corrupted, or • The messages are re-ordered, or • The message is lost. A message is corrupted when its content is corrupted during transmission. When the order of messages is corrupted, the receiving unit uses faulty data. When a message is lost, the receiving unit does not generate an output. The terminology in this chapter is used in such a manner that faulty and invalid, and correct and valid are interchangeable. A faulty message results from a perturbation of the message content. If a faulty message is sent to a correct software unit, this message can result in a correct output of the unit, but the output deviates from the specified output that corresponds to the input. This is also defined as a faulty output. 
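As a loose illustration of this fault model, the following C++ sketch distinguishes the four message kinds and the three perturbations discussed above; a test harness could use such a wrapper to inject invalid messages on the channel between ci and cj. All types and names (Message, perturb, and so on) are hypothetical and not taken from the case study code.

#include <string>
#include <utility>
#include <vector>

enum class MsgKind { CorrectInput, CorrectOutput, FaultyInput, FaultyOutput }; // Mci, Mco, Mfi, Mfo

struct Message {
    std::string sender;        // e.g., "ci"
    std::string receiver;      // e.g., "cj"
    std::vector<int> payload;  // tuple of parameter values and global variables
    MsgKind kind;
};

enum class Perturbation { Corrupt, Reorder, Lose };

// Apply one perturbation to a message stream: corruption changes the content,
// reordering makes the receiver use stale data, loss suppresses the receiver's output.
std::vector<Message> perturb(std::vector<Message> msgs, Perturbation p) {
    if (msgs.empty()) return msgs;
    switch (p) {
        case Perturbation::Corrupt:
            msgs.front().payload.push_back(-1);           // content deviates from the specification
            msgs.front().kind = MsgKind::FaultyInput;     // a correct Mci becomes an Mfi
            break;
        case Perturbation::Reorder:
            if (msgs.size() > 1) std::swap(msgs[0], msgs[1]);
            break;
        case Perturbation::Lose:
            msgs.erase(msgs.begin());
            break;
    }
    return msgs;
}

Each perturbation then corresponds to one class of invalid test cases for the communication between ci and cj.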
9.2.2 Communication sequence graphs for unit testing In the following, the term actor is used to generalize notions that are specific to the great variety of programming languages, for example, functions, methods, procedures, basic blocks, and so on. An elementary actor is the smallest, logically complete unit of a software component that can be activated by or activate other actors of the same or other components (Section 9.2.3). A software component, c ∈ C, can be represented by a CSG as follows. Definition 1. A CSG for a software component, c ∈ C is a directed graph CSG = (Φ, E, Ξ, Γ), where • The set of nodes Φ comprises all actors of component c, where a node/actor is defined as an abstract node/actor φa in case it can be refined to elementary actors φa1,2,3,...,n. • The set of edges E describes all pairs of valid concluding invocations (calls) within the component, an edge (φ, φ )∈ E denotes that actor φ is invoked after the invocation of actor φ (φ → φ ). • Ξ ⊆ Φ and Γ ⊆ Φ represent initial/final invocations (nodes). Model-Based ITest with Communication Sequence Graphs 227 Figure 9.2 shows a CSG for Definition 1 including an abstract actor φa2 that is refined using a CSG. This helps to simplify large CSGs. In this case, φa2 is an abstract actor encapsulating the actor sequence φa2.1, φa2.2, and φa2.3. To identify the initial and final invocations of a CSG, all φ ∈ Ξ are preceded by a pseudo vertex “[”∈/ Φ (entry) and all φ ∈ Γ are followed by another pseudo vertex “]”∈/ Φ (exit). In OOS, these nodes typically represent invocations of a constructor and destructor of a class. CSG is a derivate of ESG (Belli, Budnik, and White 2006) differing in the following features. • In an ESG, a node represents an event that can be a user input or a system response, both of which lead interactively to a succession of user inputs and expected system outputs. • In a CSG, a node represents an actor invoking another actor of the same or another software component. • In an ESG, an edge represents a sequence of immediately neighboring events. • In a CSG, an edge represents an invocation (call) of the successor node by the preceding node. Readers familiar with ITest modeling will recognize similarity of CSG with call graphs (Grove et al. 1997). However, they differ in many aspects as summarized below. • CSGs have explicit boundaries (entry [begin] and exit [end] in form of initial/final nodes) that enable the representation of not only the activation structure but also the functional structure of the components, such as the initialization and the destruction of a software unit (for example, the call of the constructor and destructor method in OOP). • CSGs are directed graphs for systematic ITest that enable the application of rich notions and algorithms of graph theory. The latter are useful not only for generation of test cases based on criteria for graph coverage but also for optimization of test sets. • CSGs can easily be extended not only to represent the control flow but also to precisely consider the data flow, for example, by using Boolean algebra to represent constraints (see Section 9.4). [ CSGs of software component c Φ1 Φ4 Φa2 Φ3 ] Φa2 [ Φa2.1 Φa2.2 Φa2.3 ] Φa2 FIGURE 9.2 A CSG including a refinement of the abstract actor φa2. 228 Model-Based Testing for Embedded Systems Definition 2. Let Φ and E be the finite set of nodes and arcs of CSG. Any sequence of nodes (φ1, . . . , φk) is called a communication sequence (CS) if (φi, φi+1)∈ E, for i = 1, . . . , k − 1. 
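To make Definition 1 concrete, the following C++ sketch shows one possible in-memory representation of a CSG together with a routine that enumerates the communication sequences of a given length that are introduced next. The identifiers (Csg, enumerateSequences, the string-based actor names) are illustrative assumptions and not part of the authors' tooling.

#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Csg {
    std::set<std::string> nodes;                          // Phi: the actors of the component
    std::map<std::string, std::set<std::string>> edges;   // E: actor -> set of successor actors
    std::set<std::string> initial;                        // Xi: initial invocations (preceded by "[")
    std::set<std::string> final_;                         // Gamma: final invocations (followed by "]")
};

// Recursively extend a prefix along the edge relation until it has the requested length.
void extend(const Csg& g, std::vector<std::string>& prefix, std::size_t len,
            std::vector<std::vector<std::string>>& out) {
    if (prefix.size() == len) { out.push_back(prefix); return; }
    auto it = g.edges.find(prefix.back());
    if (it == g.edges.end()) return;
    for (const auto& succ : it->second) {
        prefix.push_back(succ);
        extend(g, prefix, len, out);
        prefix.pop_back();
    }
}

// All node sequences of the given length that respect the edge relation.
std::vector<std::vector<std::string>> enumerateSequences(const Csg& g, std::size_t len) {
    std::vector<std::vector<std::string>> result;
    for (const auto& start : g.nodes) {
        std::vector<std::string> prefix{start};
        extend(g, prefix, len, result);
    }
    return result;
}

int main() {
    Csg g;                                                // toy component loosely modeled on Figure 9.2
    g.nodes = {"phi1", "phi2", "phi3"};
    g.edges = {{"phi1", {"phi2"}}, {"phi2", {"phi3"}}, {"phi3", {"phi1"}}};
    g.initial = {"phi1"};
    g.final_ = {"phi3"};
    for (const auto& cs : enumerateSequences(g, 2))       // the communication pairs of the graph
        std::cout << cs[0] << " -> " << cs[1] << "\n";
}

Covering such sequences by complete sequences that start in an initial and end in a final invocation is then a path-construction problem on the graph, which is what Algorithms 1 and 2 below automate.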
The function l (length) determines the number of nodes of a CS. In particular, if l (CS) = 1, then it is a CS of length 1, which denotes a single node of CSG. Let α and ω be the functions to determine the initial and final invocation of a CS. For example, given a sequence CS = (φ1, . . . , φk), the initial and final invocation are α (CS) = φ1 and ω (CS) = φk, respectively. A CS = (φ, φ ) of length 2 is called a communication pair (CP). Definition 3. A CS is a complete communication sequence (CCS) if α (CS ) is an initial invocation and ω (CS ) is a final invocation. Now, based on Definitions 2 and 3, the i-sequence coverage criterion can be introduced that requires generation of CCSs that sequentially invoke all CSs of length i ∈ N. At first glance, i-sequence coverage, also called sequence coverage criterion, is similar to All-n-Transitions coverage (Binder 1999). However, i-sequence coverage focuses on CSs. CSG does not have state transitions, but it visualizes CSs of different length (2, 3, 4, . . . , n) that are to be covered by tests cases. Section 9.4 will further discuss this aspect. The i-sequence coverage criterion is fulfilled by covering all sequences of nodes and arcs of a CSG of length i. It can also be used as a test termination criterion (Hong, Hall, and May 1997). All CSs of a given length i of a CSG are to be covered by means of CCSs that represent test cases. Thus, test case generation is a derivation of Chinese Postman Problem, understood as finding the shortest path or circuit in a graph by visiting each arc. Polynomial algorithms supporting this test generation process have been published in previous works (Aho et al. 1991, Belli, Budnik, and White 2006). The coverage criteria introduced in this chapter are named in accordance with the length of the CS to be covered. The coverage criterion of CSs of length 1 is called 1-sequence coverage criterion or actor coverage criterion, where every actor is visited at least once. The coverage criterion of CSs of length 2 is called 2-sequence coverage criterion or communication pair criterion, etc. Finally, coverage criteria of CSs of length len are called len-sequence coverage criterion or communication len-tuple criterion. Algorithm 1 sketches the test case generation process for unit testing. Algorithm 1 Test Case Generation Algorithm for Unit Testing Input: CSG len := maximum length of communication sequences (CS) to be covered Output: Test report of succeeded and failed test cases FOR i := 1 TO len Do Cover all CS of CSG by means of CCS Apply test cases to SUC and observe system outputs 9.2.3 Communication sequence graphs for integration testing For ITest, the communication between software components has to be tested thoroughly. This approach is based on the communication between pairs of components including the study of the control flow. Definition 4. Communication between actors of two different software components, CSGi = (Φi, Ei, Ξi, Γi) and CSGj = (Φj, Ej, Ξj, Γj) is defined as an invocation relation IR(CSGi, CSGj) = {(φ, φ ) |φ ∈ Φi and φ ∈ Φj, where φ activates φ }. Model-Based ITest with Communication Sequence Graphs 229 A φ ∈ Φ may invoke an additional φ ∈ Φ of another component. Without losing generality, the notion is restricted to communication between two units. If φ causes an invocation of a third unit, this can also be represented by a second invocation considering the third one. Definition 5. Given a set of CSG1, . . . , CSGn describing n components of a system C a set of invocation relations IR1, . . . 
, IRm, the composed CSGC is defined as CSGC = ({Φ1 ∪ · · · ∪ Φn}, {E1 ∪ · · · ∪ En ∪ IR1 ∪ · · · ∪ IRm}, {Ξ1 ∪ · · · ∪ Ξn}, {Γ1 ∪ · · · ∪ Γn}).

An example of a composed CSG built of CSG1 = ({φ1, φ2, φ3, φ4}, {(φ1, φ4), (φ1, φ2), (φ2, φ1), (φ2, φ3), (φ3, φ1), (φ3, φ3), (φ4, φ2), (φ4, φ4)}, {φ1}, {φ3}) and CSG2 = ({φ1′, φ2′, φ3′}, {(φ1′, φ2′), (φ2′, φ1′), (φ2′, φ2′), (φ2′, φ3′), (φ3′, φ1′)}, {φ2′}, {φ3′}) for two software components c1 and c2 is given in Figure 9.3. The invocation of φ1′ by φ2 is denoted by a dashed line, that is, IR(CSG1, CSG2) = {(φ2, φ1′)}.

Based on the i-sequence coverage criterion, Algorithm 2 represents a test case generation procedure. For each software component ci ∈ C, a CSGi and invocation relations IR serve as input. As a first step, the composed CSGC is to be constructed. The nodes of CSGC consist of the nodes of CSG1, . . . , CSGn. The edges of CSGC are given by the edges of CSG1, . . . , CSGn and the invocation relations IRs among these graphs. The coverage criteria applied for ITest are named in the same fashion as those for unit testing.

Algorithm 2 Test Case Generation Algorithm for Integration Testing
Input: CSG1, . . . , CSGn; IR1, . . . , IRm; len := maximum length of communication sequences (CS) to be covered
Output: Test report of succeeded and failed test cases
CSGC = ({Φ1 ∪ · · · ∪ Φn}, {E1 ∪ · · · ∪ En ∪ IR1 ∪ · · · ∪ IRm}, {Ξ1 ∪ · · · ∪ Ξn}, {Γ1 ∪ · · · ∪ Γn})
Use Algorithm 1 for test case generation.

[Figure 9.3 depicts CSG1 (actors φ1 to φ4) and CSG2 (actors φ1′ to φ3′); the dashed arc from φ2 to φ1′ represents the invocation between the two components.]
FIGURE 9.3
Composed CSGC consisting of CSG1 and CSG2 and an invocation between them.

9.2.4 Mutation analysis for CSG

The previous sections, 9.2.2 and 9.2.3, defined CSG and introduced algorithms for test case generation with regard to unit testing and ITest. In the following, mutation analysis is used to assess the adequacy of the test cases with respect to their fault detection effectiveness.

Mutation analysis was introduced by DeMillo, Lipton, and Sayward (1978). A set of mutation operators syntactically manipulates the original software and thus seeds semantic faults, leading to a set of mutants that represent faulty versions of the given software. A test set is said to be mutation adequate with respect to the program and mutation operators if at least one test case of the test set detects the seeded faults for each mutant. In this case, the mutant is said to be distinguished or killed. Otherwise, the mutant remains live and the test set is marked as mutation inadequate. The set of live mutants may also contain equivalent mutants that must be excluded from the analysis. Equivalent mutants differ from the original program in their syntax, but they have the same semantics. A major problem of mutation testing is that, in general, equivalent mutants cannot be detected automatically. Thus, the mutation score MS for a given program P and a given test set T is:

MS(P, T) = Number of killed mutants / (Number of all mutants − Number of equivalent mutants).

The ideal situation results in the score 1, that is, all mutants are killed. Applying a mutation operator only once to a program yields a first-order mutant. Multiple applications of mutation operators to generate a mutant are known as higher-order mutants. An important assumption in mutation analysis is the coupling effect, that is, the assumption that test cases that are capable of distinguishing first-order mutants will also most likely kill higher-order mutants.
Therefore, it is common to consider only first-order mutants. A second assumption is the competent programmer hypothesis, which assumes that the SUC is close to being correct (Offutt 1992).

The procedure of integration and mutation testing a system or program P based on CSG is illustrated in Figure 9.4. Algorithms 1 and 2 generate the test sets (see Figure 9.4, arc (1)) to be executed on the system or program P (see arc (2)). If ITest does not reveal any faults, this could mean that

• SUC is fault-free or, more likely,
• The generated test sets are not adequate to detect the remaining faults in SUC.

[Figure 9.4 shows the process: the CSG is used for test generation (1), yielding a test set T that is executed on program P (2); mutant generation (3) produces CSG*, the changes are applied to P (4), yielding mutant P*, on which the test set is executed again (5).]
FIGURE 9.4
Software ITest and mutation analysis with CSG.

TABLE 9.1
List of Mutation Operators for CSGs
AddNod: Inserts a new node into the CSG
DelNod: Deletes a node from the CSG
AddEdg: Inserts a new edge into the CSG (also applicable for self-loops)
DelEdg: Deletes an edge from the CSG by deactivating the destination of the edge (also applicable for self-loops)
AddInvoc: Inserts an invocation from actor φ of software component c to actor φ′ of component c′
DelInvoc: Deletes an invocation from actor φ of component c to actor φ′ of component c′

Therefore, in order to check the adequacy of the test sets, a set of mutation operators modifies the CSG for generating first-order mutants (see Figure 9.4, arc (3)). Based on CSG, six basic operators are defined that realize insertion and/or deletion of nodes or edges of the CSG (compare to Belli, Budnik, and Wong 2006). The operators are listed in Table 9.1. After applying the mutation operators to P (see Figure 9.4, arc (4)) and so producing the mutants P*, the generated test sets are executed on P* (see arc (5)). If some mutants are not killed, the test set is not adequate. In this case, the length of the CSs has to be increased. If all mutants are now killed, the test set is adequate for ITest.

The operator AddNod in Table 9.1 adds a new node to the CSG, generating a new CS from one node via the new node to another node of the same software unit, that is, a new call is inserted in the source code of a component between two calls. DelNod deletes a node from the CSG and connects the former ingoing edges to all successor nodes of the deleted node, that is, an invocation is removed from the source code of a software unit. The mutation operator AddEdg inserts a new edge from one node to another node of the same software component that had no connection before applying the operator. Alternatively, it inserts a self-loop at one node that had no self-loop before applying the operator to the CSG. In other words, after a call it is possible to execute another invocation of the same component that was not a successor before the mutation. If a self-loop is added, a call is repeated using different message data. Similarly, DelEdg deletes an edge from one node to another node of the same software component that had a connection before applying the operator or deletes a self-loop at one node that had a self-loop before applying the operator on the CSG. In this manner, the order of the calls is changed. It is not possible to execute a call that was a successor of another invocation of the same unit before the mutation. In case of removing a self-loop, a call cannot be repeated.
While AddInvoc inserts an invocation between actors of different software units, DelInvoc deletes it. In other words, a call is inserted in or removed from another component. 9.3 Case Study To validate and demonstrate the approach, a case study was performed using a robot (that is, RV–M1 manufactured by Mitsubishi Electronics). Figure 9.5 shows the robot in its working area within a control cabinet. 232 Model-Based Testing for Embedded Systems FIGURE 9.5 Robot System RV-M1 (refer to Belli, Hollmann, and Padberg 2009). Depot/stack 1&2 Robot 1 2 1 Robot-arm Item-matrix FIGURE 9.6 Working area/buffers of the robot (Belli, Hollmann, and Padberg 2009). 9.3.1 System under consideration The SUC is a part of the software system implemented in C++ that controls the robot RV-M1. The robot consists of two mechanical subsystems, an arm and a hand. The arm of RV-M1 can move items within the working area. These items can also be stored in two buffers as sketched in Figure 9.6. The robot can grab items contained in the item matrix and transport them to a depot. For this purpose, its arm is moved to the appropriate position, the hand is closed, moved to the stacking position, and the hand releases the item. 9.3.2 Modeling the SUC The mechanical subsystems of the robot are controlled by 14 software units listed in Table 9.2. An example of the CSG of the software unit, RC constructor/init, including the corresponding implementation, is given in a brief format in Figures 9.7, 9.8, and 9.9. Model-Based ITest with Communication Sequence Graphs 233 TABLE 9.2 List of Software Units Name StackConstruct SC control RobotControl RC constructor/init RC moveMatrixMatrix RC moveMatrixDepot RC moveDepotMatrix RC moveDepotDepot SerialInterface MoveHistory MH Add MH Undo MH UndoAll RoboPos RoboPosPair Description The main application constructing a stack of items in the depot taken from the item matrix. The main control block determines the destination of the items. The RobotControl class controls the other software units from SerialInterface to RoboPosPair. The RobotControl constructor method starts the units SerialInterface to RoboPosPair. The RobotControl method moveMatrixMatrix moves an item from one matrix position to another free position. The RobotControl method moveMatrixDepot moves an item from a matrix position to a free depot position. The RobotControl method moveDepotMatrix moves an item from a depot position to a free matrix position. The RobotControl method moveMatrixMatrix moves an item from one depot position to another free position. The SerialInterface class controls the interface of a PC to robot controlling units. The MoveHistory class saves all executed movements of the robot. The MoveHistory method Add adds a movement to the history. The MoveHistory method Undo removes a movement from the history und reversing the movement. If the history is empty, it corresponds to MH UndoAll. The RoboPos class provides the source and destination positions of the robot. The RoboPosPair class combines two stack positions of one depot destination position. The dashed lines between these graphs represent communication (method invocations) between the components. The RobotControl unit is initialized by its constructor method call RobotControl::RobotControl, which activates the SerialInterface, MoveHistory, RoboPos, and the RoboPosPair software units of the robot system. Figures 9.8 and 9.9 provide commentaries explaining the CSG structure of 9.7. Figure 9.10 shows the StackConstruct application of the robot system. 
The corresponding source code of the application is given in Figure 9.11. The StackConstruct application builds a stack on Depot 1/2 by moving matrix items. The robot system is initialized by rc->init() and the matrix items, 00,01,02,10,11,12,20,21 are moved to the depot positions D1LH (depot1 left hand) floor, D1LH second stack position, D1RH (depot1 right hand) floor position, D1RH second stack position, D2LH floor position, D2LH second stack position, D2RH floor position, or D2RH second stack position. Finally, all items are put back to their initial positions by rc->stop and the robot system is shut down. 9.3.3 Test generation Five test sets were generated by using Algorithm 2. Test sets Ti, consisting of CCSs, were generated to cover all sequences of length i ∈ {1,2,3,4,5}. Test set T1 achieves the actor coverage criterion and the test cases of T1 are constructed to cover all actors of the software 234 RobotControl(...) 2[ printDebugInfos() Init() Wait(...) Move(...) undoLastMove(...) Stop() ]2 undoAllMoves(...) MoveHistory MoveHistory:: 6[ MoveHistory(...) RoboControl::RobotControl(...)[ ] RobotControl::RobotControl(...) MoveHistory(...) RoboPos(...) setAboveFloor(...) SerialInterface() write(...) speed RoboPosPair(...) MoveHistory::add(...) ]6 MoveHistory::undoAll() MoveHistory::undo() RoboPos 3[ RoboPos::RoboPos(...) ]3 RoboPos::getId() Model-Based Testing for Embedded Systems 4[ SerialInterface SerialInterface::write(...) position SerialInterface::write(...)move SerialInterface::write(...)speed SerialInterface::write(...)init SerialInterface::read(...) SerialInterface:: openserialDevice(...) ]4 RoboPosPair RoboPosPair::RoboPosPair() RoboPosPair:: setAboveFloor(...) 5[ RoboPosPair:: getLiftedPosId() RoboPosPair:: getAboveFloor() ]5 RoboPosPair:: setBottomPos(...) RoboPosPair:: getBottomPosId() RoboPosPair:: isItemPresent() RoboPosPair:: setLiftedPos(...) RoboPosPair:: setItemPresent(...) RoboPosPair::RoboPospair(...) FIGURE 9.7 CSG for initializing the robot system (dashed lines represent calls between components). Model-Based ITest with Communication Sequence Graphs RoboControl :: RoboControl(int id, int speed) { if(speed < 0 || speed > 9) { } /* The invocation new SerialInterface() prepares the component SerialInterface, calling * its constructor method SerialInterface : : SerialInterface and new MoveHistory(this) * the unit MoveHistory saving all movements of the robot system */ si = new SerialInterface(); mh = new MoveHistory(this); this -> id = id; this -> speed = speed; /* si -> write(SP,this speed) calls the actor SerialInterface::write(int cmd, int id) * setting the speed of the robot */ si -> write(SP,this speed); // send desired speed to robot /* For defining the start and the destination of intermediate positions of the system new RoboPos(...) * invokes RoboPos::RoboPos(int id, float x, float y, float z, float slope, float roll) * which transfers the position information calling * void SerialInterface::write(int cmd, int id, float x, float y, float z, float slope, float roll). 
* As there are several positions the actors new RoboPos(...), * RoboPos::RoboPos(int id, float x, float y, float z, float slope, float roll) * and void SerialInterface::write(int cmd, int id, float x, float y, float z, float slope, float roll) * have self-loops */ idlePos = intermediatePosMatrix = intermediatePosDepot1 = intermediatePosDepot2 = new RoboPos(1, -2.7, +287.4, +333.3, -90.2, -3.1); new RoboPos(2, +14.4, +314.7, +102.7, -91.6, +.4); new RoboPos(3, +359.5, +23.1, +106.3, -90.5, -3.8); new RoboPos(4, +359.5, -73.1, +106.3, -90.5, +10.1); FIGURE 9.8 Extract of source code of robot component RoboControl::RoboControl including its invocations (part 1). 235 236 /* All placing positions for the items are set by calling new RoboPosPair(...), which comprises nested invocations. * It calls the constructor method calls RoboPosPair:: RoboPosPair and * RoboPosPair:: RoboPosPair(RoboPos* bottom, RoboPos* lifted, bool itemPresent) of the unit RoboPosPair. * This component defines a stack of items which can be constructed at the placing positions. */ matrixPositons[0][0] = new RoboPosPair( new RoboPos(10, . . . ), new RoboPos(20, . . . ), true); matrixPositons[1][0] = new RoboPosPair( new RoboPos(11, . . . ), new RoboPos(21, . . . ), true); matrixPositons[2][0] = new RoboPosPair( new RoboPos(12, . . . ), new RoboPos(22, . . . ), true); matrixPositons[0][1] = new RoboPosPair( new RoboPos(13, . . . ), new RoboPos(23, . . . ), true); matrixPositons[1][1] = new RoboPosPair( new RoboPos(14, . . . ), new RoboPos(24, . . . ), true); matrixPositons[2][1] = new RoboPosPair( new RoboPos(15, . . . ), new RoboPos(25, . . . ), true); matrixPositons[0][2] = new RoboPosPair( new RoboPos(16, . . . ), new RoboPos(26, . . . ), true); matrixPositons[1][2] = new RoboPosPair( new RoboPos(17, . . . ), new RoboPos(27, . . . ), true); matrixPositons[2][2] = new RoboPosPair( new RoboPos(18, . . . ), new RoboPos(28, . . . ), true); /* Additionally new RoboPos(...) is called twice for setting two stack-postions on each depot-position, * again new RoboPos(...) invokes * void SerialInterface::write(int cmd, int id, float x, float y, float z, float slope, float roll) */ depotPositons[0][0] = new RoboPosPair( new RoboPos(30, . . . ), new RoboPos(32, . . . ), false); depotPositons[0][1] = new RoboPosPair( new RoboPos(31, . . . ), new RoboPos(33, . . . ), false); depotPositons[1][0] = new RoboPosPair( new RoboPos(34, . . . ), new RoboPos(36, . . . ), false); depotPositons[1][1] = new RoboPosPair( new RoboPos(35, . . . ), new RoboPos(37, . . . ), false); RoboPosPair* depot1Floor1RH = new RoboPosPair( new RoboPos(40, . . . ), new RoboPos(42, . . . ), false); RoboPosPair* depot1Floor1RH = new RoboPosPair( new RoboPos(41, . . . ), new RoboPos(43, . . . ), false); RoboPosPair* depot2Floor1RH = new RoboPosPair( new RoboPos(44, . . . ), new RoboPos(46, . . . ), false); RoboPosPair* depot2Floor1RH = new RoboPosPair( new RoboPos(45, . . . ), new RoboPos(47, . . . ), false); /* The initialization is finished by defining the four depot positions as well as the actucal * stack-position by calling depotPositions[0][0] -> setAboveFloor (depot1Floor1RH), where this * actor invokes void RoboPosPair::setAboveFloor (RoboPosPair* aboveFloor) * several times indicated by the self-loops on both actors. 
*/ depotPositons[0][0] -> setAboveFloor(depot1Floor1RH); depotPositons[0][1] -> setAboveFloor(depot1Floor1LH); depotPositons[1][0] -> setAboveFloor(depot2Floor1RH); depotPositons[1][1] -> setAboveFloor(depot2Floor1LH); } FIGURE 9.9 Extract of source code of robot component RoboControl::RoboControl including its invocations (part 2). Model-Based Testing for Embedded Systems Model-Based ITest with Communication Sequence Graphs StackConstruct 1[ RoboControl() printDebugInfos() init() move(...) undoAllMoves(...) move(...) stop() ]1 RobotControl 2[ RobotControl:: printDebugInfos() RobotControl:: RobotControl() RobotControl::wait(...) RobotControl:: init() RobotControl:: move(...) ]2 RobotControl::stop() RobotControl:: undoLastMove(...) RobotControl:: undoAllMoves(...) MoveHistory 6[ MoveHistory:: MoveHistory(...) MoveHistory::add(...) ]6 MoveHistory::undoAll() MoveHistory::undo() 4[ SerialInterface SerialInterface::write(...) position SerialInterface::write(...) move SerialInterface::write(...) speed SerialInterface::write(...) init SerialInterface::read(...) SerialInterface:: openSerialDevice(...) ]4 FIGURE 9.10 CSG of StackConstruct application of robot system (dashed lines represent calls between components). 237 238 Model-Based Testing for Embedded Systems int main( int argc, char *argv[]) { RoboControl* rc = new RoboControl(1,4); rc->printDebugInfos(); rc->init(); rc->move(M00,D1LH); rc->move(M01,D1LH); rc->move(M02,D1RH); rc->move(M10,D1LH); rc->move(M11,D2LH); rc->move(M12,D2LH); rc->move(M20,D2RH); rc->move(M21,D2RH); rc->move(M22,M22); rc->undoAllMoves(); rc->stop(); printf("shutting down...\n"); return 0; } FIGURE 9.11 Source code of StackConstruct application. components including connected invocation relations. Test set T2 attains the coverage of the communication pair criterion. Test cases of T2 are generated to cover sequences of length 2. This means that every CP of all units and IRs are covered. Test set T3 fulfills the communication triple criterion. Test cases of T3 are constructed to cover each communication triple, that is, sequences of length 3 of the robot system. The test sets T4 and T5 achieve the communication quadruple and quintuple criterion. The test cases are constructed to cover the robot system sequences of length 4 or 5. The mutation adequacy of these test sets is evaluated by a mutation analysis in the following section. 9.3.4 Mutation analysis and results Each of the six basic mutation operators of Section 9.2.4 was used to construct one mutant of each unit of the SUC (14 software units and 6 mutants each, plus an additional two for adding or deleting self-loops). These 112 mutants were then applied to test sets T1, T2, T3, T4, and T5. Test generation is terminated if a higher coverage does not result in an increased mutation score. After the execution of the test cases of the test set T5, all faults injected were revealed. Figure 9.12 summarizes the complete analysis by calculating the mutation score. As a result of the case study, the mutation scores for the test sets T1, T2, T3 , T4, and T5 improved with respect to their length of CS. Test set T1 only detects the faults injected in the software unit StackConstruct (unit 1[) and its invocations, so this criterion is only applicable for systems having a simple invocation structure. While the length of the CSs increases, the CSs kill more mutants. 
T2 detects all mutants of T1 and faults injected in the RobotControl unit (unit 2[), including its invocations invoked by unit Model-Based ITest with Communication Sequence Graphs 239 Applied test sets based on coverage criteria 6 Mutation operators applied => 8 mutants generated of 14 units each T1 Actor coverage criterion T2 Communication pair criterion T3 Communication triple criterion T4 Communication quadruple criterion T5 Communication quintuple criterion 112 mutants 112 mutants 112 mutants 112 mutants 112 mutants Mutation Score (MS) 0.0680 0.2233 0.6699 0.9223 1.0000 FIGURE 9.12 Results of mutation analysis. StackConstruct (unit 1[). This continues through T5 that then kills all mutants of the robot system. Figure 9.13 shows the CS for detecting one mutant that is detectable by a test case of T5. The test case revealed a perturbed invocation in the MH Add software unit inserted by the mutation operator AddNod. The CS has the length of five to reach the mutation via the actors RobotControl::undoAllMoves(...), MoveHistory::undoAll(), MoveHistory::Add(...), t moveCmd(), and insert(...). Only the test case of T5 detected this mutant because every call of the sequence provided a message to the next call. They were not given in T4 to T1. 9.3.5 Lessons learned Modeling the SUC with CSG and analyzing the generated test cases using mutation analysis revealed some results that are summarized below. Lesson 1. Use different abstraction levels for modeling SUC As methods in classes contain several invocations, the overview of the system becomes unavailable when all invocations are drawn in one CSG of the system. The solution is to focus on the invocations of one software unit to all other units and to build several CSGs. Using abstract actors and their refinement in abstraction helps to keep a manageable view on the system. Lesson 2. Use mutation analysis to determine the maximum length of the communication sequences for generating the test cases Section 9.3.4 showed that all mutants were killed using the test cases of the set T5. Consequently, this SUC needs at least the entire T5 set to test the system thoroughly. In case when no faults can be found in the SUC by traditional testing, mutation analysis can also be used to find the maximum length of the CSs. The maximum length will be achieved if the mutation score reaches 100% by executing the test cases of the last generated test set. 240 StackConstruct 1[ RoboControl() printDebugInfos() init() move(...) undoAllMoves(...) move(...) stop() ]1 MoveHistory 1 4 φa62[ t_moveCmd() 5 insert(...) 6[ MoveHistory:: MoveHistory(...) MoveHistory::add(...) insert(...) ]φa62 RobotControl 2[ RobotControl:: printDebugInfos() RobotControl:: RobotControl() RobotControl::wait(...) ]2 RobotControl:: stop(...) RobotControl:: init() RobotControl:: move(...) RobotControl:: undoLastMove(...) RobotControl:: undoAllMoves(...) 6] 3 MoveHistory::undoAll(...) MoveHistory::undo(...) 2 Model-Based Testing for Embedded Systems FIGURE 9.13 Communication sequence for detecting mutant insert (. . . ) in component MoveHistory. Model-Based ITest with Communication Sequence Graphs 241 j=0 i ≥ 5 && j! = 0 [ Φ0 Φ1 ] i < 5 && j! = 0 Φ2 FIGURE 9.14 CSG augmented by Boolean expression. 9.4 Conclusions, Extension of the Approach, and Future Work This chapter introduced CSG and CSG-related notions, which are used in this approach to ITest. Mutation analysis is used to evaluate the adequacy of generated test sets. 
The CSG of a robot system was modeled and the corresponding implementation was exemplified. The generated test sets for testing the SUC were applied to all mutants of the system according to Figure 9.4. The results are the following: (1) ITest can be performed in a communication-oriented manner by analyzing the communication between the different units, (2) CSG can be used to model the SUC for generating test cases, (3) different abstraction levels of the CSG help to keep the testing process manageable, and (4) mutation analysis helps to determine the maximum length of the CSs. Ongoing research work includes augmenting CSG by labeling the arcs with Boolean expressions. This enables the consideration of guards, that is, conditions that must be fulfilled to invoke φ′ after φ. This extension requires appropriate expansion of the selected test case generation algorithms (Algorithms 1 and 2). An example of a fragment of a CSG augmented by Boolean expressions is given in Figure 9.14. Compared to concolic unit testing (Sen 2007), this approach is easier to apply to integration testing. Similar to the All-transitions criterion specified on state-based models (Binder 1999), an All-Invocations criterion for generating test cases could be introduced to the CSG that covers all the invocations directly as a testing goal. The test cases generated by Algorithm 2, however, already include these invocations. Therefore, a special All-Invocations criterion is not needed. At present, mutation operators reflect insertion and/or deletion of entities of the CSG. Apart from combining these basic operations in order to form operations of higher order, Boolean expressions should be included in the CSG concept. This also enables the consideration of further mutation operators.

References

Aho, A.V., Dahbura, A., Lee, D., and Uyar, M. (1991). An optimization technique for protocol conformance test generation based on UIO sequences and rural Chinese postman tours. IEEE Transactions on Communications, Volume 39, Number 11, Pages: 1604–1615.

Belli, F., Budnik, C.J., and White, L. (2006). Event-based modelling, analysis and testing of user interactions: approach and case study. Software Testing, Verification & Reliability, Pages: 3–32.
DeMillo, R.A., Lipton, R.J., and Sayward, F.G. (1978). Hints on test data selection: help for the practicing programmer. IEEE Computer, Volume 11, Number 4, Pages: 34–41. Grove, D., DeFouw, G., Dean, J., and Chambers, C. (1997). Call graph construction in object-oriented languages. Proceedings of the 12th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, Volume 32, Number 10, Pages: 108–124. ACM, New York, NY. Hartmann, J., Imoberdorf, C., and Meisinger, M. (2000). UML-based integration testing. ISSTA ’00: Proceedings of the 2000 ACM SIGSOFT International Symposium on Software Testing and Analysis, Pages: 60–70. ACM, New York, NY. Hong, Z., Hall, P.A.V., and May, J.H.R. (1997) Software unit test coverage and adequacy. ACM Computing Surveys, Volume 29, Number 4, Pages: 366–427. Hu, J., Ding, Z., and Pu, G. (2009). Path-based Approach to Integration Testing. Proceedings of the Third IEEE International Conference on Secure Software Integration and Reliability Improvement, Pages: 431–432. IEEE Computer Press, Washington, DC. Martena, V., DiMilano, P., Orso, A., and Pezz`e, M. (2002). Interclass testing of object oriented software. Proceedings of the IEEE International Conference on Engineering of Complex Computer System, Pages: 135–144. Georgia Institute of Technology, Washington, DC. Myers, G.J. (1979). Art of Software Testing. John Wiley & Sons, Inc., New York, NY. Offutt, A.J. (1992). Investigations of the software testing coupling effect. ACM Transactions on Software Engineering and Methodology, Pages: 5–20. ACM, New York, NY. Saglietti, F., Oster, N., and Pinte, F. (2007). Interface coverage criteria supporting modelbased integration testing. Workshop Proceedings of 20th International Conference on Model-Based ITest with Communication Sequence Graphs 243 Architecture of Computing Systems (ARCS 2007), Pages: 85–93. Berlin/Offenbach: VDE Verlag, University of Erlangen-Nuremberg, Erlangen, Germany. Sen, K. (2007). Concolic testing. Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE ’07), Pages: 571–572. ACM, New York, NY. Zhao, R. and Lin, L. (2006). An UML statechart diagram-based MM-path generation approach for object-oriented integration testing. International Journal of Applied Mathematics and Computer Sciences, Pages: 22–27. This page intentionally left blank 10 A Model-Based View onto Testing: Criteria for the Derivation of Entry Tests for Integration Testing Manfred Broy and Alexander Pretschner CONTENTS 10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245 10.2 Background: Systems, Specifications, and Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 10.2.1 Interfaces and behaviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 10.2.2 State machines and interface abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 10.2.3 Describing systems by state machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250 10.2.4 From state machines to interface behaviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250 10.2.5 Architectures and composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
250 10.2.6 Glass box views onto interpreted architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 10.2.7 Black box views onto architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 10.2.8 Renaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 10.2.9 Composing state machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 10.3 Model-Based Development: Specification and Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 254 10.4 Testing Systems: Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 10.4.1 System tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 10.4.2 Requirements-based tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 10.5 Model-Based Integration Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 10.5.1 Integration tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 10.5.2 The crucial role of models for testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 10.5.3 Using the architecture to derive entry-level component tests . . . . . . . . . . . . . . . . . . . . 262 10.5.4 A resulting testing methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264 10.5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 10.6 Summary and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266 10.1 Introduction In many application domains, organization, cost, and risk considerations continue to lead to increasingly distributed system and software development processes. In these contexts, suppliers provide components, or entire subsystems, that are assembled by system integrators. One prominent, and tangible, example for such a development paradigm is the automotive domain where multiple embedded systems are integrated into a car (see Pretschner et al. 2007, Reiter 2010). For reasons of economy, suppliers aim at selling their subsystems to as many car manufacturers (usually and somewhat counterintuitively called original equipment manufacturers, or OEMs, in this context) as possible. This requires that their components work correctly in a multitude of different environments, which motivates thorough testing (and specification) of the components of a car under development. Each OEM, on the other 245 246 Model-Based Testing for Embedded Systems hand, wants to make sure that the external components work as expected in its particular cars. 
To reduce the cost of integration, the OEM subjects the external component to a separate set of component tests before integrating the component with the rest of the car, and subsequently performing integration tests. This process comes by the name of entry testing for integration testing, and the rational management of this process is the subject of this chapter. We tackle the following main problem. Assume an OEM orders some external component, to be integrated with the rest of its system, say a “residual” car that lacks this component (or several variants of such a “residual” car). Can we find criteria for test derivation that allows the OEM to reduce the overall cost of testing by pushing effort from the integration test for the residual car that is composed with the component to entry tests for the component only? In other words, is it possible to find circumstances and test criteria for the component that generalize to test criteria for the combination of the residual car and the component? In response to this question, we present three contributions: • First, we provide a formalized conceptual model that captures several testing concepts in the context of reactive systems, including the fundamental notions of module, integration, and system tests. In particular, we investigate the nature of test drivers and stubs for integrated embedded systems. As far as we know, no such set of precise definitions existed before. • Second, we relate these concepts to the activities of the systems development process, thus yielding a comprehensive view of a development process for distributed embedded systems. This comprehensive view relies on the formal framework supporting both architecture and component specifications. • Third, we show the usefulness of the formalized conceptual model by providing criteria for shifting effort from integration testing to component entry tests. We also investigate the benefits for the suppliers that have an interest in defining tests such that their components work correctly in all anticipated environments. Our contributions provide further arguments for the attractiveness of model-based development processes. Moreover, the results generalize to other application domains. For instance, we see an analogous fundamental structure in service-oriented architectures, or the cloud: for a provider P (the integrator, or OEM) to provide a service S (the car), P relies on a set of different services S1, . . . , Sn (components provided by the suppliers H1, . . . , Hm). Obviously, P wants to make sure that the supplied services perform as expected while only caring about the own service S (and its variants). The suppliers Hi, on the other hand, want to sell their services Sj to as many other parties as possible. They must hence find principles as to how to optimize the selection of their component tests. This chapter consists of a conceptual and a methodological part and is structured as follows. We introduce the fundamental concepts of systems, interfaces, behaviors, composition, and architectures in Section 10.2. These are necessary to precisely define a model-based development process and the fundamental notions of architecture and component faults in Section 10.3. Since the focus of this paper is on testing, we use the framework of Sections 10.2 and 10.3 to introduce a few essential testing concepts in Section 10.4. 
In Section 10.5, we continue the discussion on the model-based development process of Section 10.3 by focusing on the integration testing phase and by explaining how to select component tests on the grounds of the architecture. Because these tests are essentially derived from a simulation of the subsystem to be tested, the tests are likely to reflect behaviors that usually are verified A Model-Based View onto Testing 247 at integration time and are hence likely to identify faults that would otherwise surface only at integration testing time. We put our work in context and conclude in Section 10.6. 10.2 Background: Systems, Specifications, and Architectures In this section, we briefly introduce the syntactic and semantic notion of a system, its interface, and that of a component. This theoretical framework is in line with earlier work (Broy and Stølen 2001). While this chapter is self-contained, knowledge of this reference work may help with the intuition behind some of the formalizations. The fundamental concepts of system interfaces and system behaviors are introduced in Section 10.2.1. In Section 10.2.2, we show how to describe system behaviors by means of state machines. Section 10.2.3 introduces the notion of architectures that essentially prescribe how to compose subsystems. The formal machinery is necessary for the definition of a model-based development process in Section 10.2.3.4 and, in particular, for the precise definition of architecture and component faults. 10.2.1 Interfaces and behaviors We start by shortly recalling the most important foundations on which we will base our model-based development process for multifunctional systems. We are dealing with models of discrete systems. A discrete system is a technical or organizational unit with a clear boundary. A discrete system interacts with its environment across this boundary by exchanging messages that represent discrete events. We assume that messages are exchanged via channels. Each instance of sending or receiving a message is a discrete event. We closely follow the Focus approach described in Broy and Stølen (2001). Communication between components takes place via input and output channels over which streams of messages are exchanged. The messages in the streams received over the input channels represent the input events. The messages in the streams sent over the output channels represent the output events. Systems have syntactic interfaces that are described by their sets of input and output channels. Channels are used for communication by transmitting messages and to connect systems. Channels have a type that indicates which messages are communicated over the channels. Hence, the syntactic interfaces describe the set of actions for a system that are possible at its interface. Each action consists in the sending or receiving of an instance of a message on a particular channel. It is helpful to work with messages of different types. A type is a name for a data set, a channel is a name for a communication line, and a stream is a finite or an infinite sequence of data messages. Let TYPE be the set of all types. With each type T ∈ TYPE, we associate the set CAR(T) of its data elements. CAR(T) is called the carrier set for the type T. A set of typed channels is a set of channels where a type is given for each of its channels. Definition 1 (Syntactic Interface). Let I be a set of typed input channels and O be the set of typed output channels. The pair (I O) denotes the syntactic interface of this system. 
For each channel c ∈ I with type T1 and each message m ∈ CAR(T1), the pair (m, c) is called an input message for the syntactic interface (I O). For each channel c ∈ O with type T2 and each message m ∈ CAR(T2), the pair (m, c) is called an output message for the syntactic interface (I O). 248 Model-Based Testing for Embedded Systems x1 : S1 xn : Sn ... ... y1 : T1 F ym : Tm FIGURE 10.1 Graphical representation of a system F as a data flow node with its syntactic interface. The xi are input channels of type Si, and the yj are output channels of type Tj. Channels xi and yi need not be ordered. Figure 10.1 shows the system F with its syntactic interface in a graphical representation by a data flow node. In Focus, a system encapsulates a state and is connected to its environment exclusively by its input and output channels. Streams of messages (see below) of the specified type are transmitted over channels. A discrete system has a semantic interface represented by its interactive behavior. The behavior is modeled by a function mapping the streams of messages given on its input channels to streams of messages provided on its output channels. We call this the black box behavior or the interface behavior of discrete systems. Definition 2 ([Nontimed] Streams). Let IN denote the natural numbers. Given a set M, by M*, we denote the set of finite sequences of elements from M. By M∞, we denote the set of infinite sequences of elements of M that can be represented by functions IN\{0} → M. By Mω, we denote the set M* ∪ M∞, called the set of finite and infinite (nontimed) streams. In the following, we work with streams that include discrete timing information. Such streams represent histories of communications of data messages transmitted within a time frame. To keep the time model simple, we choose a model of discrete time where time is structured into an infinite sequence of finite time intervals of equal length. Definition 3 (Timed Streams). Given a message set M of data elements, we represent a timed stream s by a function s : IN\{0} → M∗, where M* is the set of finite sequences over the set M (which is the carrier set of the type of the stream). By (M*)∞, we denote the set of timed streams. Intuitively, a timed stream maps abstract time intervals to finite sequences of messages. For a timed stream s ∈ (M*)∞ and an abstract time interval t ∈ IN\{0}, the sequence s(t) of messages denotes the sequence of messages communicated within time interval t as part of the stream s. We will later work with one simple basic operator on streams: x↓t denotes the prefix of length t ∈ IN of the stream x (which is a sequence of length t carrying finite sequences as its elements; x↓0 is the empty sequence). A (timed) channel history for a set of typed channels C (which is a set of typed identifiers) assigns to each channel c ∈ C a timed stream of messages communicated over that channel. Definition 4 (Channel History). Let C be a set of typed channels. A (total) channel history is a mapping (let IM be the universe of all messages) x : C → (IN\{0} → IM∗) A Model-Based View onto Testing 249 such that x(c) is a stream of type Type(c) for each channel c ∈ C. We denote the set of all −→ channel histories for the channel set C by C A finite (also called partial) channel history is a mapping x : C → ({1, . . . , t} → IM∗) for some number t ∈ IN. For each history z ∈ −→C and each time t ∈ IN, z↓t yields a finite history for each of the channels in C represented by a mapping of the type C → ({1, . . . , t} → IM*). 
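To make these notions concrete, the following minimal Python sketch (an illustration only; the type aliases and the example channel are invented and not part of the formal development) represents a timed stream as a mapping from time intervals 1, 2, . . . to finite message sequences, a channel history as a mapping from channel names to timed streams, and the prefix operator x↓t as truncation to the first t intervals.

```python
# Minimal sketch of timed streams and channel histories (illustrative names only).
# A timed stream maps each time interval 1, 2, ..., t to a finite sequence of messages;
# a channel history assigns a timed stream to every typed channel.

from typing import Any, Dict, List

TimedStream = Dict[int, List[Any]]       # time interval -> finite message sequence
ChannelHistory = Dict[str, TimedStream]  # channel name -> timed stream


def prefix(history: ChannelHistory, t: int) -> ChannelHistory:
    """The operator x|t: restrict every stream to the intervals 1..t."""
    return {c: {i: s.get(i, []) for i in range(1, t + 1)} for c, s in history.items()}


# Example: one input channel 'x1' carrying integers over three time intervals.
x: ChannelHistory = {"x1": {1: [1, 2], 2: [], 3: [7]}}

assert prefix(x, 2) == {"x1": {1: [1, 2], 2: []}}
```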
For a given syntactic interface (I O), the behavior of a system is defined by a relation that relates the input histories in −→I with the output histories in −→O . This way, we get a (nondeterministic) functional model of a system behavior. For reasons of compositionality, we require behavior functions to be causal. Causality assures a consistent time flow between input and output histories in the following sense: in a causal function, input messages received at time t do only influence output at times ≥t (in the case of strong causality at times ≥ t + 1, which indicates that there is a delay of at least one time interval before input has effect on output). A detailed discussion is contained in earlier work (Broy and Stølen 2001). Definition 5 ( I/O-Behavior). Let ℘(X) denote the powerset of set X. A strongly causal function F: −→ I → −→ ℘( O ) is called I/O-behavior. By IF[I O], we denote the set of all (total and partial) I/O-behaviors with syntactic interface (I O), and by IF, the set of all I/O- behaviors. Definition 6 (Refinement, Correctness). The black box behavior, also called interface behavior of a system with syntactic interface (I O) is given by an I/O-behavior F from IF[I O]. Every behavior F in IF[I O] with F (x) ⊆ F(x) for all x ∈ −→ I is called a refinement of F. A system implementation is correct w.r.t. the specified behavior F if its interface behavior is a refinement of F. 10.2.2 State machines and interface abstractions A system is any syntactic artifact; the semantics of which are defined as or can be mapped to an interface behavior as described above. Examples for systems include those specified by state machines or FOCUS formulae. Systems interact with their environment via their interfaces. Each system can be used as a component of a larger system, and each component is a system by itself. Components and systems can be composed to form larger systems. The composition of systems consists of connecting output channels of one component to one or more input channels of another component. In case of feedback to the same component, causality problems may arise that can be solved by adding delay, or latch, components (Broy and Stølen 2001). While components can of course be broken down hierarchically in a top-down development approach, it is sensible to speak of atomic components when a bottom-up development approach is favored: atomic components are those that are not the result of composing two or more existing components. It is sometimes more convenient to specify atomic components as state machines rather than by relations on streams. However, by virtue of interface abstractions, the former can directly be transformed into the latter. 250 Model-Based Testing for Embedded Systems 10.2.3 Describing systems by state machines In this section, we introduce the concept of a state machine with input and output that relates well to the introduced concept of interface. It will be used as model representing implementations of systems. Definition 7 (State Machine with Input and Output). Given a state space Σ, a state machine (∆, Λ) with input and output according to the syntactic interface (I O) with messages over some set M consists of a set Λ ⊆ Σ of initial states as well as of a state transition function ∆:(Σ × (I → M∗)) → ℘(Σ × (O → M∗)) By SM[I O], we denote the set of all state machines. For each state σ ∈ Σ and each valuation a: I → M* of the input channels in I by sequences, we obtain a set of state transitions. 
Every pair (σ , b) ∈ ∆(σ, a) represents a successor state σ and a valuation b: O → M* of the output channels. The channel valuation b consists of the sequences produced by the state transition as output. (∆, Λ) is a state machine with possibly infinite state space. As shown in Broy (2007a) and Broy (2007b), every such state machine describes an I/Obehavior for each state of its state space. Conversely, every I/O-behavior can be modeled by a state machine with input and output. Partial machines describe services that are partial I/O-behaviors. As shown in Broy (2007a) and (Broy 2007b), there is a duality between state transition machines with input and output and I/O-behaviors. Every state machine specifies an I/O-behavior and every I/O-behavior represents and can be represented by a state machine. Therefore, from a theoretical point of view, there is no difference between state machines and I/O-behaviors. I/O-behaviors specify the set of state machines with identical interface behaviors. 10.2.4 From state machines to interface behaviors Given a state machine, we may perform an interface abstraction. It is given by the step from the state machine to its interface behavior. Definition 8 (Black Box Behavior and Specifying Assertion). Given a state machine A = (∆, Λ), we define a behavior FA as follows (let Σ be the state space for A) −→ FA(x) = {y ∈ O } : ∃ σ : IN → Σ : σ(0) ∈ Λ ∧ ∀ t ∈ IN :(σ(t + 1), y.(t + 1)) ∈ ∆(σ(t), x.(t + 1))}. Here for t ∈ IN\{0}, we write x.t for the mapping in I → M* with (x.t)(c) = (x(c))(t) for c ∈ I. FA is called the black box behavior for A and the logical expression that is equivalent to the proposition y ∈ FA(x) is called the specifying assertion. FA is causal by construction. If A is a Moore machine (i.e., the output depends on the state only), then FA is strongly causal. State machines can be described by state transition diagrams or by state transition tables. 10.2.5 Architectures and composition In this section, we describe how to form architectures from subsystems, called the components of the architecture. Architectures are concepts to build systems. Architectures contain A Model-Based View onto Testing 251 precise descriptions of how the composition of their subsystems takes place. In other words, architectures are described by the sets of systems forming their components together with mappings from output to input channels that describe internal communication. In the following, we assume that each system used in architecture as a component, which has a unique identifier k. Let K be the set of names for the components of an architecture. Definition 9 (Set of Composable Interfaces). A set of component names K with a finite set of interfaces (Ik Ok) for each k ∈ K is called composable, if 1. the sets of input channels Ik, k ∈ K, are pairwise disjoint, 2. the sets of output channels Ok, k ∈ K, are pairwise disjoint, the channels in {c ∈ Ik: k ∈ K} ∩ {c ∈ Ok: k ∈ K} have the same channel types in {c ∈ Ik: k ∈ K} and {c ∈ Ok: k ∈ K}. If channel names are not consistent for a set of systems to be used as components, we simply may rename the channels to make them consistent. Definition 10 (Syntactic Architecture). A syntactic architecture A = (K, ξ) with interface (IA OA) is given by a set K of component names with composable syntactic interfaces ξ(k) = (Ik Ok) for k ∈ K. 1. IA = {c ∈ Ik: k ∈ K}\{c ∈ Ok: k ∈ K} denotes the set of input channels of the architecture, 2. DA = {c ∈ Ok: k ∈ K} denotes the set of generated channels of the architecture, 3. 
OA = DA\ {c ∈ Ik: k ∈ K} denotes the set of output channels of the architecture, 4. DA\OAdenotes the set of internal channels of the architecture, 5. CA = {c ∈ Ik: k ∈ K} ∪ {c ∈ Ok: k ∈ K} the set of all channels. By (IA DA), we denote the syntactic internal interface and by (IA OA), we denote the syntactic external interface of the architecture. A syntactic architecture forms a directed graph with its components as its nodes and its channels as directed arcs. The input channels in IA are ingoing arcs and the output channels in OA are outgoing arcs. Definition 11 (Interpreted Architecture). An interpreted architecture (K, ψ) for a syntactic architecture (K, ξ) associates an interface behavior ψ(k) ∈ IF[Ik Ok] with every component k ∈ K, where ξ(k) = (Ik Ok). In the following sections, we define an interface behavior for interpreted architectures by composing the behaviors of the components. 10.2.6 Glass box views onto interpreted architectures We first define composition of composable systems. It is the basis for giving semantic meaning to architectures. Definition 12 (Composition of Systems—Glass Box View). For an interpreted architecture A with syntactic internal interface (IA DA), we define the glass box interface behavior [×]A∈ IF[IA DA] by the equation (let ψ(k) = Fk): ([×]A)(x) = {y ∈ −→ DA : ∃ z ∈ −→ CA : x = z|IA ∧ y = z|DA ∧ ∀k ∈ K: z|Ok ∈ Fk(z|Ik)}, 252 Model-Based Testing for Embedded Systems where | denotes the usual restriction operator. Internal channels are not hidden by this composition, but the streams on them are part of the output. The formula defines the result of the composition of the k behaviors Fk by defining the output y of the architecture [×] A with the channel valuation z of all channels. The valuation z carries the input provided by x expressed by x = z|IA and fulfills all the input/output relations for the components expressed by z|Ok ∈ Fk(z|Ik). The output of the composite system is given by y which the restriction z|DA of z to the set DA of output channels of the architecture [×] A. For two composable systems Fk ∈ IF[Ik Ok], k = 1, 2, we write F1 × F2 for [×] {Fk: k = 1, 2}. Composition of composable systems is commutative F1 × F2 = F2 × F1 and associative (F1 × F2) × F3 = F1 × (F2 × F3). The proof of this equation is straightforward. We also write therefore with K = {1, 2, 3, . . .} [×]{Fk ∈ IF[Ik Ok] : k ∈ K} = F1 × F2 × F3 × · · · . From the glass box view, we can derive the black box view as demonstrated in the following chapter. 10.2.7 Black box views onto architectures The black box view of the interface behavior of an architecture is an abstraction of the glass box view. Definition 13 (Composition of Systems—Black Box View). Given an interpreted architecture with syntactic external interface (IA OA) and glass box interface behavior [×] A ∈ IF[IA DA], we define the black box interface behavior FA ∈ IF[IA OA] by FA(x) = (F(x))|OA Internal channels are hidden by this composition and in contrast to the glass box view not part of the output. For an interpreted architecture with syntactic external interface (IA OA), we obtain the black box interface behavior FA ∈ IF[IA OA] specified by −→ −→ FA(x) = {y ∈ O A : ∃ z ∈ C A : x = z|IA ∧ y = z|OA ∧ ∀ k ∈ K : z|Ok ∈ Fk(z|Ik)} and write FA = ⊗{Fk ∈ IF[Ik Ok] : k ∈ K}. 
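As an informal illustration of the glass box and black box compositions just defined, the following sketch composes two deterministic toy components that communicate over one internal channel; the glass box view keeps the internal channel in the result, whereas the black box view restricts the result to the external output channels. Nondeterminism, channel typing, and causality delays are deliberately ignored, and all names are invented for the example.

```python
# Sketch of composing two deterministic components F1 and F2 (illustrative only).
# Histories are dicts: channel name -> timed stream (time interval -> message list).
# F1 reads external channel 'a' and writes internal channel 'b';
# F2 reads 'b' and writes external channel 'c'.

def f1(hist):
    # double every message received on 'a' and forward it on 'b'
    return {"b": {t: [2 * m for m in msgs] for t, msgs in hist["a"].items()}}

def f2(hist):
    # add 1 to every message received on 'b' and forward it on 'c'
    return {"c": {t: [m + 1 for m in msgs] for t, msgs in hist["b"].items()}}

def compose_glass_box(x):
    """[x]A: the internal channel 'b' remains visible in the output."""
    z = dict(x)
    z.update(f1(z))          # valuation of the internal channel
    z.update(f2(z))          # valuation of the external output channel
    return {c: z[c] for c in ("b", "c")}   # DA = generated channels

def compose_black_box(x):
    """F_A: restrict the glass box result to the external output channels (OA = {'c'})."""
    return {c: s for c, s in compose_glass_box(x).items() if c == "c"}

x = {"a": {1: [3], 2: [5]}}
assert compose_glass_box(x) == {"b": {1: [6], 2: [10]}, "c": {1: [7], 2: [11]}}
assert compose_black_box(x) == {"c": {1: [7], 2: [11]}}
```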
For two composable systems Fk ∈ IF[Ik Ok], k = 1, 2, we write F1 ⊗ F2 for ⊗{F1,F2} Composition of composable systems is commutative F1 ⊗ F2 = F2 ⊗ F1 A Model-Based View onto Testing 253 I1\C2 F1 C1 F2 O2\C2 O1\C1 C2 I2\C1 FIGURE 10.2 Composition F1 ⊗ F2. and associative (F1 ⊗ F2) ⊗ F3 = F1 ⊗ (F2 ⊗ F3). The proof of this equation is straightforward. We also write therefore with K = {1, 2, 3, ...} ⊗{Fk ∈ IF[Ik Ok] : k ∈ K} = F1 ⊗ F2 ⊗ F3 ⊗ · · · . The idea of the composition of systems as defined above is shown in Figure 10.2 with C1 = I2 ∩ O1 and C2 = I1 ∩ O2. For properties of the algebra, we refer the reader to Broy and Stølen (2001) and Broy (2006). In a composed system, the internal channels are used for internal communication. Given a syntactic architecture A = (K, ξ) and specifying assertions Sk for the systems k ∈ K, the specifying assertion for the glass box behavior is given by ∀ k ∈ K: Sk, and for the black box behavior by ∃ c1, . . . , cj: ∀ k ∈ K: Sk, where {c1, . . . , cj} denotes the set of internal channels. The set of systems together with the introduced composition operators form an algebra. The composition of systems (strongly causal stream processing functions) yields systems and the composition of services yields services. Composition is a partial function on the set of all systems. It is only defined if the syntactic interfaces fit together. Syntactic interfaces fit together if there are no contradictions in the channel names and types. Since it ignores internal communication, the black box view is an abstraction of the glass box view of composition. 10.2.8 Renaming So far, we defined the composition using the names of components to connect them only for sets of components that are composable in the sense that their channel names and types fit together. Often, the names of the components may not fit. Then, renaming may help. Definition 14 (Renaming Components’ Channels). Given a component F ∈ IF [I O], a renaming is a pair of mappings α: I → I and β: O → O , where the types of the channels coincide in the sense that c and α(c) as well as e and β(e) have the same types for all c ∈ I and all e ∈ O. By a renaming ρ = (α, β) of F, we obtain a component ρ[F] ∈ IF [I O ] such that for x ∈ −→I ρ[F](x) = β(F(α(x))), where for x ∈ −→I the history α(x) ∈ −→I prime is defined by α(x)(c) = x(α(c)) for c ∈ I. 254 Model-Based Testing for Embedded Systems Note that by a renaming, a channel in I or O may be used in several copies in I or O . Given an interpreted architecture A = (K, ψ) with a set of components ψ(k) = Fk ∈ [Ik Ok] for k ∈ K} and a set of renamings R = {ρk: k ∈ K}, where ρk is a renaming of Fk for all k ∈ K, we call (A, R, ψ) an interpreted architecture with renaming if the set {ρk[Fk]: k ∈ K} is well defined and composable. The renamings R define the connections that make A an architecture. 10.2.9 Composing state machines A syntactic architecture forms a directed graph with its components as its nodes and its channels as directed arcs. The input channels in IA are ingoing arcs and the output channels in OA are outgoing arcs. Definition 15 (Architecture Implemented by State Machines). An implemented architecture (K, ζ) of a syntactic architecture (K, ξ) associates a state machine ζ(k) = (∆k, Λk) ∈ SM[Ik Ok] with every k ∈ K, where ξ(k) = (Ik Ok). In the following sections, we define an interface behavior for interpreted architectures by composing the behaviors of the components. 
Next, we define the composition of a family of state machines Rk = (∆k, Λk) ∈ SM[Ik Ok] for the syntactic architecture A = (K, ξ) with interface (IA OA) with ξ(k) = (Ik Ok). It is the basis for giving semantic meaning to implementations of architectures. Definition 16 (Composition of State Machines—Glass Box View). For an implemented architecture R = (K, ζ) for a syntactic architecture A = (K, ξ), we define the composition (∆R, ΛR) ∈ SM[IA DA] by the equation (let ζ(k) = (∆k, Λk) with state space Σk): The state ΣR is defined by the direct product (let for simplicity K = {1, 2, 3, . . . }) ΣR = Σ1 × Σ2 × Σ3 × · · · , the initial state is defined by ΛR = Λ1 × Λ2 × Λ3 × · · · , and the state transition function ∆ is defined by ∆R(σ, a) = {(σ , b) : ∃ z : C → M∗ : b = z|DA ∧ a = z|IA ∧ ∀ k ∈ K: (σ k, z|Ok) ∈ ∆k(σk, z|Ik)}. Internal channels are not hidden by this composition, but their messages on them are part of the output. Based on the implementation, we can talk about tests in the following section. 10.3 Model-Based Development: Specification and Implementation In the previous sections, we have introduced a comprehensive set of modeling concepts for systems. We can now put them together in an integrated system description approach. A Model-Based View onto Testing 255 When building a system, in the ideal case, we carry out the following steps that we will be able to cast in our formal framework: 1. System specification 2. Architecture design a. Decomposition of the system into a syntactic architecture b. Component specification (enhancing the syntactic to an interpreted architecture) c. Architecture verification 3. Implementation of the components a. (Ideally) code generation b. Component (module) test and verification 4. Integration a. System integration b. Component entry test c. Integration test and verification 5. System test and verification A system specification is given by a syntactic interface (I O) and a specifying assertion S (i.e., a set of properties), which specifies a system interface behavior F ∈ IF[I O]. An architecture specification is given by a composable set of syntactic interfaces (Ik Ok) for component identifiers k ∈ K and a component specification Sk for each k ∈ K. Each specification Sk specifies a behavior Fk ∈ IF[Ik Ok]. In this manner, we obtain an interpreted architecture. The architecture specification is correct w.r.t. the system specification F if the composi- tion of all components results in the architecture is correct if for a behavior that refines all input histories x ∈ −→the I, system specification F. Formally, ⊗{Fk : k ∈ K}(x) ⊆ F(x). Given an implementation with interface abstraction Rk Fk for each component is correct if for all x identifier ∈ −→ Ik we k ∈ K, have: the implementation Rk Fk(x) ⊆ Fk(x) (note that it does not matter if Fk was generated or implemented manually). Then, we can integrate the implemented components into an implemented architecture F = ⊗{Fk : k ∈ K}. The following basic theorem of modularity is easily proved by the construction of composition (for details see Broy and Stølen 2001). Theorem 1. Modularity. If the architecture is correct (i.e., if ⊗{Fk: k ∈ K}(x) ⊆ F(x)) and if the components are correct (i.e., Fk(x) ⊆ Fk(x) for all k), then the implemented system is correct: F (x) ⊆ F(x) for all x ∈ −→ I. A system (and also a subsystem) is hence called correct if the interface abstraction of its implementation is a refinement of its interface specification. 
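The practical content of the modularity theorem can be illustrated by a small sketch in which behaviors are finite, set-valued maps over a toy domain: if every component implementation refines its specification and the composed specifications refine the system specification, then the composed implementations refine the system specification as well. The concrete maps and the sequential composition used here are invented for illustration and are not taken from the chapter.

```python
# Sketch of the refinement check behind the modularity theorem (illustrative only).
# A behavior is modeled as a finite map from an input value to the set of allowed outputs.

def refines(impl, spec):
    """impl refines spec iff impl(x) is a subset of spec(x) for every input x."""
    return all(impl[x] <= spec[x] for x in spec)

# Component specifications F1, F2 and their implementations on a toy domain.
F1_spec = {0: {0, 1}, 1: {1, 2}}
F1_impl = {0: {0},    1: {2}}                    # deterministic refinement of F1_spec
F2_spec = {0: {"a"}, 1: {"a", "b"}, 2: {"b"}}
F2_impl = {0: {"a"}, 1: {"b"},      2: {"b"}}

def compose(g, f):
    """Toy architecture: feed F1's output into F2 (sequential composition)."""
    return {x: {z for y in f[x] for z in g[y]} for x in f}

system_spec = {0: {"a", "b"}, 1: {"a", "b"}}     # overall requirement F

arch_spec = compose(F2_spec, F1_spec)            # behavior defined by the architecture
arch_impl = compose(F2_impl, F1_impl)            # integrated implementation

# Component correctness + architecture correctness => system correctness (Theorem 1).
assert refines(F1_impl, F1_spec) and refines(F2_impl, F2_spec)
assert refines(arch_spec, system_spec)
assert refines(arch_impl, system_spec)
```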
256 Model-Based Testing for Embedded Systems Before we consider the missing steps (4) and (5) of the development process in more detail in Sections 10.4 and 10.5, it is worthwhile to stress that we clearly distinguish between 1. the architectural design of a system, and 2. the implementation of the components of an architectural design. An architectural design consists in the identification of components, their specification, and the way they interact and form the architecture. If the architectural design and the specification of the constituting components are sufficiently precise, then we are able to determine the result of the composition of the components of the architecture, according to their specification, even without providing an implementation of all components! If the specifications address behavior of the components and the design is modular, then the behavior of the architecture can be derived from the behavior of the components and the way they are connected. In other words, in this case, the architecture has a specification and a—derived—specified behavior. This specified behavior can be put in relation with the requirements specification for the system, and, as we will discuss later, also with component implementations. The above process includes two steps of verification, component verification and architecture verification. These possibly reveal component faults (of a component/subsystem w.r.t. its specification) and architecture faults (of an architecture w.r.t. the system specification). If both verification steps are performed sufficiently carefully and the theory is modular, which holds here (see Broy and Stolen 2001), then correctness of the system follows from both verification steps. The crucial point here is that architecture verification w.r.t. the system specification is enabled without the need for actual implementations of the components. In other words, it becomes possible before the implemented system exists. The precise implementation of the verification of the architecture depends of course on how its components are specified. If the specification consists of state machines, then the architecture can be simulated, and simulation results compared to the system specification. In contrast, if the component specifications are given by descriptive specifications in predicate logic, then deductive verification becomes possible. Furthermore, if we have a hierarchical system, then the scheme of specification, design, and implementation can be iterated for each subhierarchy. An idealized top-down development process then proceeds as follows. We obtain a requirement specification for the system and from this, we derive an architectural design and specification. This results in specifications for components that we can take as requirements specifications for the subsequent step in which the components are designed and implemented. Given a specified architecture, test cases can be derived for integration test. Given component specifications, we implement the components with the specifications in mind and then verify them with respect to their specifications. This of course entails some methodological problems if the code for the components has been generated from the specification in which case only the code generator and/or environment assumptions can be checked, as described in earlier work (Pretschner and Philipps 2005). 
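If the component specifications are executable, the specified architecture can be simulated to derive integration test cases: a stimulus is fed through the composed model, and the recorded outputs serve as the anticipation. The following sketch with two invented toy components indicates the idea only; it is not the test generation technology referenced in the cited work.

```python
# Sketch: simulating an architecture of executable component models to derive
# integration test cases (stimulus plus recorded anticipation). Illustrative only.

def counter(state, msg):
    """Component model 1: counts incoming messages, emits the running count."""
    state += 1
    return state, state

def threshold(state, count):
    """Component model 2: raises an alarm once the count exceeds 2."""
    return state, ("ALARM" if count > 2 else "OK")

def simulate_architecture(stimulus):
    """Feed the stimulus through counter -> threshold and record the outputs."""
    s1, s2, outputs = 0, None, []
    for msg in stimulus:
        s1, count = counter(s1, msg)
        s2, out = threshold(s2, count)
        outputs.append(out)
    return outputs

def derive_test_case(stimulus):
    """An integration test case: (stimulus, anticipation)."""
    return stimulus, simulate_architecture(stimulus)

stimulus, anticipation = derive_test_case(["e1", "e2", "e3", "e4"])
assert anticipation == ["OK", "OK", "ALARM", "ALARM"]
```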
Now, if we have an implemented system for a specification, we can have either errors in the architecture design—in which case the architecture verification would fail—or we can have errors in the component implementation. An obvious question is that of the root cause of an architecture fault. Examples of architecture errors include

1. Connecting an output port to an incorrect input port, or forgetting such a connection altogether.
2. A mismatch between the provided and the expected sampling frequency of signals.
3. A mismatch in the encoding.
4. A mismatch between the expected and the provided units (e.g., km/h instead of m/s).

One fundamental difference between architecture errors and component errors of course is liability: in the first case, the integrator is responsible, while in the second case, responsibility is with the supplier.∗ Assume a specified architecture to be given. Then, a component fault is a mismatch between the component specification, which is provided as part of the architecture, and the component implementation. An architecture fault is a mismatch between the behavior as defined by the architecture and the overall system specification. In an integrated system, we are hence able to distinguish between component faults and architecture faults.

∗Both architecture and component errors can be the result of an invalid specification as well as of an incorrect implementation. This distinction touches the difference between validation and verification. We may safely ignore the case of invalid specifications (i.e., validation) in this chapter.

With the outlined approach, we gain a number of interesting options to make the entire development process more precise and controllable. First of all, we can provide the architecture specification by a model, called the architecture model, where we provide a possibly nondeterministic state machine for each of the components. In this case, we can even simulate and test the architecture before actually implementing it. A more advanced and ambitious idea would be to provide formal specifications for each of the components. This would allow us to verify the architecture by logical techniques since the component specifications can be kept very abstract at the level of what we call a logical architecture. Such a verification could be less involved than one performed at a concrete implementation level. Moreover, by providing state machines for each of the components, we may simulate the architecture. Thus, we can on the one hand test the architecture by integration tests at an early stage, and we can moreover generate integration tests from the architecture model to be used for the integration of the implemented system, as discussed below. The same is possible for each of the components with given state machine descriptions from which we can generate tests. We can, in fact, logically verify the components. Given state machines for the components, we can automatically generate hundreds of test cases, as has been shown in Pretschner et al. (2005). For slightly different development scenarios, this leads to a fully automatic test case generation procedure for the component implementations.

10.4 Testing Systems: Preliminaries

We are now ready to formally define central testing notions and concepts. In Section 10.4.1, we define tests and related concepts as such. In Section 10.4.2, we show how to formally relate requirements to test cases.

10.4.1 System tests

A system test describes an instance of a finite system behavior. A system test case is given by a pair of finite histories. Such a pair is also called a scenario.

Definition 17 (System Test Case). Given a syntactic interface (I O), a system test case till time t ∈ IN is a pair (x↓t, {y1↓t, y2↓t, . . . , yn↓t}) for histories x ∈ −→I and y1, y2, . . . , yn ∈ −→O. The finite history x↓t is called the stimulus, and the set {y1↓t, y2↓t, . . . , yn↓t} is called the anticipation that is used as oracle for the test.

The anticipation specifies the set of correct outputs.

Definition 18 (Test Suite). A test suite is a set of test cases.

Before we turn our attention to the definition of nondeterministic tests, we define what it means for a system to pass a test.

Definition 19 (Passing and Failing Tests). A positive test is a test where we expect the system behavior to match the anticipation. A negative test is a test where we expect the system behavior not to match the anticipation. Given a system with behavior F ∈ IF[I O] and a system test (a, B) till time t ∈ IN, we say that the system behavior F passes a (positive) test if there exist histories x ∈ −→I and y ∈ −→O with y ∈ F(x) and a = x↓t and y↓t ∈ B. Then, we write pt(F, (a, B)). Otherwise, we say that F fails the test. The system F passes the test universally if for all histories x ∈ −→I and y ∈ −→O with y ∈ F(x) and a = x↓t, we get y↓t ∈ B. Then we write ptu(F, (a, B)). We say that the system passes a negative test (a, B) if there exist x ∈ −→I and y ∈ −→O with y ∈ F(x) and a = x↓t such that y↓t ∈/ B. It passes a negative test (a, B) universally if for all x ∈ −→I and y ∈ −→O with y ∈ F(x) and a = x↓t, it holds that y↓t ∈/ B.

In general, we test, of course, not interface behaviors but implementations. An implementation of a system with syntactic interface (I O) is given by a state machine A = (∆, Λ) ∈ SM[I O]. The state machine passes a test if its interface abstraction FA passes the test.

The decision to define anticipations as sets rather than as singletons is grounded in two observations, one relating to abstraction and one relating to nondeterminism. In terms of abstraction, it is not always feasible or desirable to specify the expected outcome in full detail (Utting, Pretschner, and Legeard 2006)—otherwise, the oracle would become a full-fledged, fully detailed model of the system under test. In most cases, this is unrealistic because of cost considerations. Hence, rather than precisely specifying one specific value, test engineers specify sets of values. This is witnessed by most assertion statements in xUnit frameworks, for instance, where the assertions usually consider only a subset of the state variables, and then usually specify sets of possible values for these variables (e.g., greater or smaller than a specific value). Hence, one reason for anticipations being sets is cost effectiveness: to see if a system operates correctly, it is sufficient to see if the expected outcome is in a given range. The second reason is related to nondeterministic systems. Most distributed systems are nondeterministic—events happen in different orders and at slightly varying moments in time; asynchronous bus systems nondeterministically mix up the order of signals—and the same holds for continuous systems—trajectories exhibit jitter in the time and value domains. Testing nondeterministic systems of course is notoriously difficult.
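Definitions 17 through 19 can be rendered operationally once a behavior is available as an enumerable object. In the following sketch, a (possibly nondeterministic) behavior is modeled as a function from a stimulus to the finite set of outputs it may produce; pt checks that some run matches the anticipation, ptu that all runs do. The jittering toy behavior and all names are invented for illustration.

```python
# Sketch of passing a positive test existentially (pt) and universally (ptu),
# in the spirit of Definitions 17-19. A behavior maps a stimulus to the set of
# finite output histories it may produce. Illustrative names only.

def pt(behavior, test):
    """Positive test (a, B): some run on stimulus a yields an output in B."""
    a, anticipation = test
    return any(y in anticipation for y in behavior(a))

def ptu(behavior, test):
    """Positive test passed universally: every run on a yields an output in B."""
    a, anticipation = test
    return all(y in anticipation for y in behavior(a))

# A nondeterministic toy behavior: jitter in the produced values.
def F(stimulus):
    return {tuple(v + d for v in stimulus) for d in (0, 1)}

test = ((1, 2, 3), {(1, 2, 3), (2, 3, 4)})   # anticipation covers both jitter variants
assert pt(F, test) and ptu(F, test)

narrow_test = ((1, 2, 3), {(1, 2, 3)})        # anticipation too narrow for all runs
assert pt(F, narrow_test) and not ptu(F, narrow_test)
```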
Even if a system passes a test case, there may exist runs that produce output that is not specified in the anticipation. Vice versa, if we run the system with input a = x↓t and it produces some y ∈ F(x) with y↓t ∈/ B, we cannot conclude that the system does not pass the test (but we know that it does not pass it universally). Hence, from a practical perspective, the guarantees that are provided by a test suite are rather weak in the nondeterministic case (but this is the nature of the beast, not of our conceptualization). However, from a practical perspective, in order to cater to jitter in the time and value domains as well as to valid permutations of events, it is usually safe to assume that the actual testing infrastructure takes care of this (Prenninger and Pretschner 2004): at the model level, A Model-Based View onto Testing 259 test cases assume deterministic systems, whereas at the implementation level, systems can be nondeterministic as far as jitter and specific event permutations are concerned. A deterministic system that passes a test always passes the test universally. Moreover, if a system passes a test suite (a set of tests) universally, this does not mean that the system is deterministic—it is only deterministic as far as the stimuli in the test suite are concerned. 10.4.2 Requirements-based tests Often it is recommended to produce test cases when documenting requirements. This calls for a consideration of the coverage of requirements. A functional requirement specifies the expected outcome for some input streams in the domain of a system behavior. Hence a functional requirement for a system (a set of which can form the system specification) with a given syntactic interface is a predicate R −→ :( I → −→ ℘( O )) → {true, false}. A test (a, B) is called positively relevant for a requirement if every system behavior F that does not pass the test universally does not fulfill the requirement R. Or expressed positively, if F fulfills requirement R, then it passes the test universally. This is formally expressed by R(F) ⇒ ptu(F, (a, B)). A test (a, B) is called negatively relevant for a requirement if every system behavior F that does pass the test universally does not fulfill the requirement R. Or expressed positively, if F fulfills requirement R, then it does not pass the test universally. This is formally expressed by R(F) ⇒ ¬ptu(F, (a, B)). Two comments are in order here. First, note that in this context, F denotes the set of possible executions of an implemented system. This is different from the specification of a system. In the context of a nondeterministic system, F is a “virtual” artifact as it can be obtained concretely. It is nevertheless necessary to define relevant concepts in the context of testing. Second, the intuition behind these definitions becomes apparent when considering their contraposition, as stated in the definitions. Positive relevance means that if a test does not pass (universally), then the requirement is not satisfied. This seems like a very natural requirement on “useful” test cases, and it will usually come with a positive test case. Negative relevance, in contrast, means that if a test passes universally, then the requirement is not satisfied which, as a logical consequence, is applicable in situations where negative tests are considered. At least from a theoretical perspective, it is perfectly possible to consider dual versions of relevance that we will call significance. 
A test (a, B) is called positively significant for a requirement if every system behavior F that does not fulfill the requirement R does not pass the test universally. Or expressed positively, if F passes the test universally, then it fulfills requirement R. This is formally expressed by ptu(F, (a, B)) ⇒ R(F). Again, by contraposition, significance stipulates that if a requirement is not satisfied, then the test does not pass. The fact that this essentially means that correctness of a system w.r.t. a stated requirement can be proved by testing demonstrates the limited practical applicability of the notion of significance, except maybe for specifications that come in the form of (existentially interpreted [Kru¨ger 2000]) sequence diagrams. 260 Model-Based Testing for Embedded Systems For symmetry, a test (a, B) is called negatively significant for a requirement if every system behavior F that does not pass the test universally fulfills the requirement R. This is formally expressed by ¬ptu(F, (a, B)) ⇒ R(F). Of course, in practice, a significant test is only achievable for very simple requirements. Among other things, testing can be driven by fault models rather than by requirements. Fault-based tests can be designed and run whenever there is knowledge of a class of systems. Typical examples include limit-value testing or stuck-at-1 tests. The idea is to identify those situations that are typically incorrectly developed. These “situations” can be of a syntactic nature (limit tests), can be related to a specific functionality (“we always get this wrong”), etc. In our conceptual model, fault-based tests correspond to tests for requirements where the requirement stipulates that the system is brought into a specific “situation.” 10.5 Model-Based Integration Testing We can now continue our discussion of the model-based development process sketched in Section 10.2.3.4. In Section 10.5.1, we use our formal framework to describe integration tests. In Section 10.5.2, we highlight the beneficial role of executable specifications that, in addition to being specifications, can be used for test case generation and also as stubs. In Section 10.5.3, we argue that these models, when used as environment model for a component to be tested, can be used to guide the derivation of tests that reflect the integration scenario. In Section 10.5.4, we propose a resulting testing methodology that we discuss in Section 10.5.5. 10.5.1 Integration tests The steps from syntactic architecture A = (K, ξ) with interface (IA OA) and an implemented architecture R = (K, ζ) to a system (∆R, ΛR) are called integration. The result of integration is an interpreted architecture B = (K, ψ) with ψ(k) = Fζ(k). Integration can be performed in a single step (called big-bang) by composing all components at once. It can also be performed incrementally by choosing an initial subset of components to be tested and then add some more components to be tested, etc., until the desired system is obtained. In practice, incremental integration requires the implementation of stubs and drivers. Traditionally, drivers are components that provide input to the system to be tested (mainly used in bottom-up integration). Stubs are components that provide interfaces and serve as dummies for the functionality of those components that are required for the system under test calls but that are not implemented yet. In our context, the distinction between stubs and drivers is immaterial. 
At the level of abstraction that we consider here, we do not have a notion of functions that are called. In addition, the purpose of a stub is to provide input (i.e., a return value) to the calling function. In other words, all that is necessary is a component that provides input (perhaps depending on some output received) to: (1) the top-level interface of a system, (2) all those channels that have not yet been connected to components (because these components are not part of the current incomplete architecture). Hence, the only part that matters is the input part—which we can easily encode in a test case! Assuming that developers have access A Model-Based View onto Testing 261 to all channels in the system, we simply move the internal channels for which input must be provided to the external interface of the system. An integration test consists of two parts: 1. A strategy that determines in which order sets of components are integrated with the current partial system under test (a single-stage big-bang, top-down, bottom-up, . . . ). 2. A set of finite histories for the (external) input channels and all those internal (output) channels of the glass box view of the architecture that are directly connected to the current partial system under test. Definition 20 (Integration Strategy). Let S be a system built by an architecture with the set of components that constitute the final glass box architecture with the set of component identifiers K. Any set {K1, . . . , Kj} with K1 ⊂ K2 · · · ⊂Kj = K is called an incremental integration strategy. Given a syntactic architecture A = (K, ξ) with interface (IA OA) and an implemented architecture R = (K, ζ), an integration strategy determines a family of syntactic architectures (Ki, ξ|Ki) with implemented architectures Ri = (Ki, ζ|Ki). We arrive at a family of interpreted architectures Bi = (Ki, ψ|Ki) with ψ(k) = Fζ(k) and interface behaviors Fi. An interesting question is what the relations between the Fi are. In general, these are not refinement relations. Definition 21 (Integration Test Set). Let all definitions be as above. We define behaviors Sj = [×] {Fk ∈ IF[Ik Ok]: k ∈ Ki } with external interface (Ij Oj) be a glass box architecture that is to be tested in iteration i, where 1 ≤ i ≤ j. A set of system test cases for Si is an integration test set for stage i of the integration. Definition 22 (Integration Test). For a glass box architecture consisting of a set of components, K, and a set of internal and external channels I and O, an integration test is a mapping {1, . . . , j}→ ℘(K)×℘(IF[I O]) that stipulates which components are to be tested at stage i of the integration, and by which tests. Note that the notions of failing and passing tests carry over to integration tests unchanged. 10.5.2 The crucial role of models for testing Usually, when we compose subsystems into architectures, the resulting system shows a quite different functionality compared to its subsystems. In particular, properties that hold for a particular subsystem do not hold any longer for the composed system (at least not for the external channels). The inverse direction also is true. This is because in order to go to the black box behavior on the one hand, a projection onto the output channels visible for the system is provided. On the other hand, some of the input of a component is no longer provided by the environment, but instead is now produced inside the systems on the internal channels. 
If this is the case, the behavior of the overall system is likely different from the behavior of its component and every test case of a component does not correspond to a test case, also not to an integration test case, for the overall system. Next, we study a special case. In this case, we assume that we have a subsystem of a larger architecture which receives its input mainly from the system’s environment, even within the overall architecture and produces output for the rest of the system only to a small extent and receives input from the rest of the system only to a small extent. This 262 Model-Based Testing for Embedded Systems is typical for systems in the automotive domain, where suppliers develop components that carry a certain subfunctionality of the overall system. One of the main issues is now to separate system and integration tests in a manner such that many of the system and integration test can be performed at the component test level already while only a subset of the system and the integration test remains for later phases. What we are interested in is finding appropriate methods to decompose system and integration tests such that the system and integration test can be performed as much as possible during the component test phases. The advantage of this is that we do not have more expensive debugging at the integration test and system test level since we can have early partial integration tests and early partial system tests. As a result, the development process is accelerated and moved one step closer to concurrent engineering is done. If the behavior specifications of the components happen to be executable—as, for instance, in the form of executable machines—we are in a particularly advantageous position. Declarative specifications enable us to derive meaningful tests, both the input part and the expected output part called the oracle (remember that an actual implementation is to be tested against this model). Operational specifications, in addition, allow us to directly use them as stubs when actual testing is performed, in the sense of model-in-the-loop testing (Sax, Willibald, and Mu¨ller-Glaser 2002). Hence, in addition to using them as specifications, we can use the models for two further purposes: for deriving tests that include the input and expected output parts, and as simulation components, or stubs, when it comes to performing integration tests.∗ This of course requires runtime driver components that bridge the methodologically necessary gap between the abstraction levels of the actual system under test and the model that serves as stub (Pretschner and Philipps 2005, Utting, Pretschner, and Legeard 2006). 10.5.3 Using the architecture to derive entry-level component tests In the model-based system description sketched in Section 10.2.3, we have access to both a system specification and an architecture. With models of the single components that are connected in the architecture, however, we are ready for testing. We have defined architecture faults to be mismatches between system specification and the interpreted architecture. These mismatches can happen at two levels: at the model level (architecture against system specification) and at the level of the implementation (implemented system against either architecture behavior or system specification). Architecture faults of the latter kind can, of course, only be detected at integration testing time. Component faults, in contrast, can be detected both at integration and module testing time. 
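The use of executable component models as stubs during incremental integration, as discussed above, can be sketched as follows: the specification model of a not-yet-delivered component is plugged into the partially integrated system, simultaneously provides the oracle for the derived test, and is later replaced by the supplied implementation, which is checked against the same test. The brake-controller example and all names are invented; real setups additionally require driver components to bridge the abstraction gap between model and implementation.

```python
# Sketch: using an executable specification model as a stub for a not-yet-integrated
# component during an incremental integration test. Illustrative names only.

def brake_controller_model(pedal_position):
    """Executable specification of the supplied component (serves as the stub)."""
    return min(100, 2 * pedal_position)        # demanded brake torque in percent

def integrated_subsystem(pedal_trace, brake_component):
    """Partially integrated system: pedal sensor -> brake component -> actuator request."""
    return [("actuate", brake_component(p)) for p in pedal_trace]

# Integration test at a stage where the real controller is not yet available:
# the model plays the role of the stub and also yields the anticipation.
stimulus = [0, 10, 60]
expected = [("actuate", 0), ("actuate", 20), ("actuate", 100)]
assert integrated_subsystem(stimulus, brake_controller_model) == expected

# Later, the supplied implementation is plugged in and checked against the same test.
def brake_controller_impl(pedal_position):
    return min(100, 2 * pedal_position)

assert integrated_subsystem(stimulus, brake_controller_impl) == expected
```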
In the following, we will assume that it is beneficial to detect component faults at component testing time rather than at integration testing time, the simple reason being (a) that fault localization is simpler when the currently tested system is small and (b) that (almost) all integration tests have to be performed again by regression test once the faulty component has been shipped back to the supplier, fixed, and reintegrated with the system. One natural goal is then to find as many component faults as early as possible. With our approach of using both component and architecture models at the same time, we can— rather easily, in fact—shift testing effort from the integration testing phase to the component testing phase. The idea is simple. Assume we want to test component C in isolation and ensure that as many as possible faults that are likely to evidence during the integration testing phase are tested for during the component testing phase. Assume, then, that integration stage ∗We deliberately do not consider using models for automatic code generation here. A Model-Based View onto Testing 263 j is the first to contain component C (and all stages after j also contain C). Assuming a suitable model-based testing technology to exist, we can then derive tests for the subsystem of integration phase j + n and project these tests to the input and output channels of C. It is precisely the structure of this composed subsystem at stage j + n that we exploit for testing C in its actual context—without this structure, we would have to assume any environment, that is, no constraints on the possible behaviors. These projections are, without any further changes, tests for component C. By definition, they are relevant to the integration with all those components that are integrated with C at stage j + n. Faults in C that, without a model of the architecture and the other components, would only have been found when performing integration testing, are now found at the component testing stage. In theory, of course, this argument implies that no integration testing would have to be performed. By the very nature of models being abstractions, this is unfortunately not always the case, however. More formally, if we study a system architecture F = ⊗{Fk : k ∈ K} ∈ IF[I O] with the interface (I O), we can distinguish internal components from those that interact with the environment. Actually, we distinguish three classes of system components: 1. Internal components k ∈ K: for them there is no overlap in their input and output channels with the channels of the overall system F: If Fk ∈ IF[Ik Ok], then I ∩ Ik = ∅ and O ∩ Ok = ∅. 2. External output providing components k ∈ K: O ∩ Ok = ∅. 3. External input accepting components k ∈ K: I ∩ Ik = ∅. 4. Both external output providing and external input accepting components k ∈ K: I ∩ Ik = ∅ and O ∩ Ok = ∅. In the case of an external input and output providing component k ∈ K, we can separate the channels of Fk ∈ IF[Ik Ok] as follows: Ik = Ik ∩ I, Ik = Ik\I Ok = Ok ∩ O, Ok = Ok\O This leads to the diagram presented in Figure 10.3 that depicts the component Fk as a part of a system’s architecture. Following Broy (2010a), we specify projections of behaviors for systems. {Fj: j ∈ K\{k}} Ok′ Fk Ik″ Ik′ Ok″ FIGURE 10.3 Component Fk as part of an architecture. 264 Model-Based Testing for Embedded Systems Definition 23 (Projection of Behaviors). 
Given syntactic interfaces (I1 O1) and (I O), where (I1 O1) is a syntactic subinterface of (I O), we define for a behavior function F ∈ IF[I O] its projection F† (I1 O1) ∈ IF[I1 O1−→] to the syntactic interface (I1 O1) by the following equation (for all input histories x ∈ I 1): F†(I1 O1)(x) = {y|O1 : ∃ x −→I ∈ −→I : x = x |I1 ∧ y ∈ F(x )}. When doing component tests for component k, we consider the behavior Fk with interface (Ik ∪ Ik Ok ∪ Ok). The idea essentially is to use a simulation of the environment of k, ⊗{Fj: j ∈ K\{k}}, which, via Ik, provides input to k, to restrict the set of possible traces of k. This directly reduces the set of possible traces that can be used as tests. If tests for F†k (Ik Ok) are not included in the set of traces given by F† (Ik Ok) in the sense that the behavior of k is undefined for the respective input, then the respective component test is useless because it corresponds to a behavior that will never be executed. Using the environment of k, ⊗{Fj: j ∈ K\{k}} allows us to eliminate such useless tests. Note, of course, that these tests are useless only from the system integrator’s perspective. They are certainly not useless for the supplier of the component who, in contrast, wants to see the component used in as many different contexts as possible. This, of course, is the problem that providers of libraries face as well. In other words, we must consider the question of what the relationship and essential difference between the behavior Fk†[Ik Ok] and the system behavior F†[Ik Ok] is. Only if there are test cases (a, B) that can be used for both views, we can push some system tests for system F to the component testing phase. To achieve this, we compose a model (a simulation) of the component’s environment with the model of the component. We now use this composed model rather than the model of the component only for the derivation of tests. Using a projection of the resulting composed behavior to the I/O channels of the component only yields precisely the set of possible traces of the component when composed with the respective environment, and test selection can be restricted to this set. Note that there are no objections to exploiting the inverse of this idea and use tests for one component D, or rather the output part of these tests, as input parts of tests for all those components the input ports of which are connected to the output ports of D. In fact, this idea has been successfully investigated in earlier work (Pretschner 2003), but in this cited work, we failed to see the apparently more relevant opposite direction. 10.5.4 A resulting testing methodology The above considerations naturally lead to a proposal for a development and testing strategy for integrators and component suppliers. Briefly, it consists of the following steps. 1. Build (executable) models for all those components or subsystems that are to be integrated. These models serve as specification for the suppliers, as basis for test case generation at the integrator’s site, and as stubs or simulation components at the integrator’s site when integration tests are to be performed. 2. Build an architecture that specifies precisely which components are connected in which sense. Together with the models of the single components, this provides a behavior specification for each conceivable subsystem that may be relevant in the context of integration testing. A Model-Based View onto Testing 265 3. Derive component-level tests for each supplied component from the respective models in isolation. 4. 
Decide on an integration testing strategy. In other words, decide on subsystems in the architecture that form fundamental blocks. Examples include strongly connected subgraphs, more or less hierarchic substructures, etc. 5. Compose the models, according to the architecture, that correspond to the components at integration stage j. Since by compositionality, this is a model as well, derive tests for this composed model, and project the test cases to the I/O channels of each single component in the set. These projections are test cases for each single component. Collecting these components tests for all components at all stages yields the component tests that are relevant for integration testing. 6. Execute the generated tests from steps (3) and (5) for each component C by possibly using the executable models of the other components as stubs or simulation components. This outlines a methodology for development and testing for integrators and suppliers to save test efforts at the integration test level. 10.5.5 Discussion What we have shown is just an example of applying a strictly model-based theory for discussing different approaches to carry out tests, in this case integration tests. We conent that modeling techniques are useful not only when applied directly in system development: they are certainly useful when working out dedicated methodologies (e.g., for testing). We did not fully evaluate the possibilities of model-based development, for instance for test case generation, but rather designed and assessed certain test strategies on the basis of a model-based system description. 10.6 Summary and Outlook Based on the Focus modeling theory, we have worked out formal descriptions of fundamental notions for testing complex systems. In particular, we have precisely defined the difference between component and architecture errors that naturally leads to the requirement of architecture models in addition to models of components. In a second step, we have shown how to use these architecture models to reduce the set of possible test cases for a component by considering only those traces that occur in the integrated system. By definition, this allows a system integrator to reduce the number of entry-level component tests to those that really matter. Our results are particularly relevant in the automotive domain but generalize to distributed systems as found in service-oriented architectures or when COTS components are to be integrated into a system. The ideas in this chapter clearly advocate the use of model-based engineering processes. While most of these processes relate to the automatic generation of code, we carefully look into the advantages of models for testing, regardless of whether or not code is generated. To our knowledge, this has been done extensively for single components; we are not aware, however, of a systematic and well-founded treatment of model-based testing for distributed systems. 266 Model-Based Testing for Embedded Systems We are aware that there are plenty of well-known obstacles to implementing modeldriven engineering processes on a large scale. We see the pressing need for further research into understanding which general and which domain-specific abstractions can be used, into systematic treatments of bridging the different levels of abstraction when system tests are executed, into whether or not these abstractions discard too much information so as to be useful for testing, and into whether or not model-based testing is indeed a cost-effective technology. 
References Broy, M. and Stølen, K. (2001). Specification and Development of Interactive Systems: Focus on Streams, Interfaces, and Refinement. Springer, New York. Broy, M. (2006). The ‘grand challenge’ in informatics: engineering software-intensive systems. IEEE Computer. 72–80, 39, issue 10. Broy, M., Kru¨ger, I., and Meisinger, C.M. (2007). A formal model of services. TOSEM — ACM Trans. Softw. Eng. Methodol. 16, 1, article no. 5. Broy, M. (2010a). Model-driven architecture-centric engineering of (embedded) software intensive systems: Modeling theories and architectural milestones. Innovations Syst. Softw. Eng. Broy, M. (2010b). Multifunctional software systems: structured modelling and specification of functional requirements. Science of Computer Programming, accepted for publication. Kru¨ger, I. (2000). Distributed System Design with Message Sequence Charts, Ph.D. dissertation, Technische Universita¨t Mu¨nchen. Philipps, J., Pretschner, A., Slotosch, O., Aiglstorfer, E., Kriebel, S., and Scholl, K. (2003). Model-based test case generation for smart cards. Proc. Formal Methods for Industrial Critical Systems, Trondheim, Pages: 168–182. Electronic Notes in Theoretical Computer Science, 80. Prenninger, W. and Pretschner, A. (2005). Abstractions for model-based testing. Proc. 2nd Intl. Workshop on Test and Analysis of Component Based Systems (TACoS’04), Barcelona, March 2004. Electronic Notes in Theoretical Computer Science 116:59–71. Pretschner, A. (2003). Compositional generation of MC/DC integration test suites. ENTCS 82(6):1–11. Pretschner, A. and Philipps, J. (2005). Methodological issues in model-based testing. In Broy, M., Jonsson, B., Katoen, J.-P., Leucker, M., and Pretschner, A. Model-Based Testing of Reactive Systems, Volume 3472 of Springer LNCS, Pages: 281–291. Pretschner, A., Prenninger, W., Wagner, S., Ku¨hnel, C., Baumgartner, M., Sostawa, B., Z¨olch, R., and Stauner, T. (2005). One evaluation of model-based testing and its automation. Proc. 27th Intl. Conf. on Software Engineering (ICSE’05), Pages: 392– 401. St. Louis. A Model-Based View onto Testing 267 Pretschner, A., Broy, M., Kru¨ger, I., and Stauner, T. (2007). Software engineering for automotive systems: A roadmap. Proc. Future of Software Engineering, 55–71. Reiter, H. (2010). Reduktion von Integrationsproblemen fu¨r Software im Automobil durch fru¨hzeitige Erkennung und Vermeidung von Architekturfehlern. Ph. D. Thesis, Technische Universit¨at Mu¨nchen, Fakulta¨t fu¨r Informatik, forthcoming. Sax, E., Willibald, J., and Mu¨ller-Glaser, K. (2002). Seamless testing of embedded control systems. In Proc. 3rd IEEE Latin American Test Workshop, S. 151–153. Utting, M., Pretschner, A., and Legeard, B. (2006). A taxonomy of model-based testing. Technical report 04/2006, Department of Computer Science, The University of Waikato, New Zealand. This page intentionally left blank 11 Multilevel Testing for Embedded Systems Abel Marrero P´erez and Stefan Kaiser CONTENTS 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269 11.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270 11.2.1 Facing complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271 11.2.2 Methods and tools heterogeneity . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272 11.2.3 Integrated test specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272 11.2.4 Test reuse across test levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 11.3 Test Levels for Embedded Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274 11.4 Commonality and Variability Across Test Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276 11.5 Multilevel Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 11.6 Test Level Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278 11.6.1 Top-down refinement versus top-down reuse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 11.6.2 Multilevel test design strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281 11.6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 11.7 Multilevel Test Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 11.7.1 Test model core interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 11.7.2 Test model behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 11.8 Case Study: Automated Light Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287 11.8.1 Test specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287 11.8.2 Test model core design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 11.8.3 Test adapter models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290 11.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296 11.1 Introduction Multilevel testing constitutes an evolving methodology that aims at reducing the effort required for functional testing of large systems, where the test process is divided into a set of subsequent test levels. This is basically achieved by exploiting the full test reuse potential across test levels. 
For this purpose, we analyze the commonality shared between test levels as well as the variability and design a test reuse strategy that takes maximum advantage of the commonality while minimizing the effects of the variability. With this practice, we achieve reductions in test effort for testing system functions across test levels, which are characterized by high commonality and low variability. We focus on large embedded systems such as those present in modern automobiles. Those embedded systems are mainly driven by software (Broy 2006) and consist of a large number of electronic components. The system’s complexity increases continuously as a consequence of new functionality and a higher level of functional distribution. Regarding testing, this implies a necessity to continuously increase the efficiency as well—something we can only achieve by enhancing our testing methods and tools. 269 270 Model-Based Testing for Embedded Systems Increasing the testing efficiency constitutes the fundamental challenge for novel testing methodologies because of the large cost of testing, an activity that consumes around 50% of development cost (Beizer 1990). The separation of the testing process into independent test levels contributes to establishing different methods and tools across test levels. This heterogeneity helps counter any efforts toward test level integration. It also results in a higher effort being required for updating many different methods and tools to the state of the art. Thus, a higher level of homogeneity is desired and in practice often necessary. A further significant problem in the field of large embedded systems is the level of redundancy that the test automation advances of the past decade have produced. Merely repeating test executions or developing additional test cases across test levels does not automatically lead to a higher test quality. The reduction in effort brought about by test automation should never obstruct the view on testing cost. The creation of new test cases and the assessment of new test results (especially for failed test cases) are costly activities that cannot be efficiently automated. We thus need to avoid the execution of similar or even identical test cases at different test levels whenever this repeated execution is redundant. In order to systematically define a test execution at a specific test level as redundant, appropriate test strategies must be applied. They should define what must be tested at the different test levels and should take the entire test process into consideration—instead of defining the testing scope at each test level independently. Such an integrated test strategy will consider the execution of numerous similar or even identical test cases at different test levels. This results from the refinement/abstraction relation between consecutive test levels and does not represent any form of redundancy. Hence, on the one hand, there is evidence of the existence of significant commonalities between test cases across test levels. However, the strict separation of the test process into independent test levels indirectly leads to an underestimation of the potential synergies and commonalities shared by the different test levels. In consequence, multiple test implementations of very similar test artifacts coexist in practice at different test levels. Great effort was necessary for their creation—and is further necessary for their maintenance. 
Our objective is thus to reduce this design and maintenance effort by reusing test cases across test levels. We take advantage of previous work on reusing test specifications and focus on reusing test implementations. Our work is mainly based on multilevel test models and multilevel test cases, which are test design concepts supporting an integrative methodology for all test levels. These concepts are presented and discussed in-depth in this contribution, especially highlighting the potential benefits for the entire test process. This introduction is followed by a summary of related work that provides insight into previous work regarding test case reuse across test levels. The subsequent sections introduce the different test levels for embedded systems, analyze their commonality and variability, and describe our initial solution for multilevel testing: multilevel test cases. In the main segment, we describe strategies for test level integration and introduce multilevel test models as our model-based approach in this context. The contributions are validated using an automated light control (ALC) example before concluding with a summary and a brief discussion of the practical relevance of multilevel testing. 11.2 Related Work Partial solutions for the problems mentioned in the introduction are currently available. Research is in progress in many of these areas. In this section, we follow the argumentation Multilevel Testing for Embedded Systems 271 pattern of the introduction in order to provide an overview of related work in our research field. 11.2.1 Facing complexity Manual testing nowadays appears to be a relict of the past: expensive, not reproducible, and error-prone. The automation of the test execution has significantly contributed to increasing testing efficiency. Since the full potential in terms of efficiency gains has already been achieved, research on test automation does not focus on test execution anymore, but on other test activities such as automatic test case generation. As an example, search-based testing uses optimization algorithms for automatically generating test cases that fulfill some optimization criteria, for example, worst-case scenarios. Such algorithms are also applicable to functional testing (Bu¨hler and Wegener 2008). Automatically searching for the best representatives within data equivalence classes using evolutionary algorithms is proposed in (Lindlar and Marrero P´erez 2009), which leads to an optimization of the test data selection within equivalence classes. Automatic test case generation is the main objective of model-based testing approaches, which take advantage of test models. Generally speaking, models are the result of an abstraction (Prenninger and Pretschner 2005). In this context, model-based testing increases the testing efficiency as it benefits from the loss of information provided by the abstraction. Later on, the missing details are introduced automatically in order to provide concrete test cases. Providing additional details is not necessary when testing abstract test objects such as models (Prenninger and Pretschner 2005). Zander-Nowicka described such an approach for models from the automotive domain (Zander-Nowicka 2008). However, most test objects are not that abstract. The additional details necessary for test execution are provided by test adapters, test case generators, and compilers. 
While test adapters represent separate instances that operate at the test model interfaces, test case generators and compilers perform a transformation of the abstract test model into executable test cases. The utilized approach is typically closely related to the kind of test models used. In our contribution, we apply a combination of both approaches. Basically, we differentiate between test abstraction and interface abstraction. As a consequence, low-level test cases, for instance written in C, can possess a particularly abstract interface and vice versa, an abstract test model can feature a very concrete interface. We use time partition testing (TPT) (Lehmann 2003) for test modeling, which employs a compiler to generate executable test cases from the abstract test models. Additionally, we use test adapters for adapting the abstract test model interface to the concrete test object interface. Abstraction principles go beyond our differentiation in test and interface abstraction. Prenninger and Pretschner describe four different abstraction principles: functional, data, communication, and temporal abstraction (Prenninger and Pretschner 2005). Functional abstraction refers to omitting functional aspects that are not relevant to the current test. It plays the key role in this contribution because multilevel testing addresses test objects at different test levels and hence at different abstraction levels. In this context, selecting the appropriate abstraction level for the test models represents a crucial decision. Data abstraction considers the mapping to concrete values, whereas temporal abstraction typically addresses the description of time in the form of events. Both principles will be considered in the context of the test adapters in this contribution. Only communication abstraction, from our point of view a combination of data and temporal abstraction, is beyond the scope of the contribution. Data abstraction and temporal abstraction are widely used within the model-based testing domain, but in this contribution, we will consider them in the context of what we have previously called interface abstraction. 272 Model-Based Testing for Embedded Systems Most approaches using test adapters mainly consider data abstraction. A recently published report (Aichernig et al. 2008) generically describes test adapters as functions that map abstract test data to concrete values. Temporal abstraction typically represents an additional requirement when time plays a central role for test execution. Larsen et al. present an approach for testing real-time embedded systems online using UPPAAL-TRON. In their work, they use test adapters to map abstract signals and events to concrete physical signals in order to stimulate the test object (Larsen et al. 2005). The concept of test adapters refers to the adapter concept in component-based design introduced by Yellin and Strom (1997). Adapters are placed between components and are responsible for assuring the correct interaction between two functionally compatible components. Adapters are further responsible for what they call interface mapping, typically data type conversion (Yellin and Strom 1997). Thus, clear differences exist between the test adapter concept and the original adapter concept from component-based design. In the latter, adapters are namely not specifically supposed to help bridge abstraction differences between the interfaces they map. 
11.2.2 Methods and tools heterogeneity The lack of homogeneity along the test process has been addressed by industry in recent years. Wiese et al. (2008) describe a set of means for test homogenization within their company. One of their central ideas is making testing technologies portable across test levels. There are several testing technologies supporting multiple test platforms, that is, specific test environments at a specific test level. In the field of embedded systems, the main representatives are TPT (Lehmann 2003) and TTCN-3 (European Telecommunications Standards Institute 2009-06). TPT’s platform independence is based on the TPT virtual machine, which is capable of executing test cases on almost any platform. For test execution, the TPT virtual machine is directly nested in the test platform. TTCN-3 test cases are also executed close to the test platform using a platform and a system adapter. For a more detailed technology overview, please consult Marrero P´erez and Kaiser (2009). Such technologies are reuse friendly and ease homogenization attempts in the industry. For homogenization, however, the test interface represents the central problem, as denoted by Burmester and Lamberg (2008). Implementing abstract interfaces in combination with test adapters (also called mapping layer in this context) rapidly leads to platform independence and thus reusability (Burmester and Lamberg 2008, Wiese et al. 2008). Obviously, any model-based approach providing the appropriate test case generators and/or test adapters can lead to test cases that can be executed at different platforms. All published approaches for homogenization strategies are based on data abstraction. Reuse across test levels is not a great challenge technologically, but methods for implementing test cases capable of testing test objects at different abstraction levels sometimes featuring strongly differing interfaces have not been developed to date. This means that while in theory we can already reuse test cases across test levels today, we do not exactly know what the crucial issues are that have to be taken into account in order to be successful in this practice. 11.2.3 Integrated test specifications Reducing redundancy across test levels implies reducing their independence, that is, establishing relations between them. Hiller et al. (2008) have reported their experience in creating a central test specification for all test levels. A common test specification contributes to test level integration by establishing a central artifact for the different testing teams. The test Multilevel Testing for Embedded Systems 273 levels for which each test case must be executed are declared as an additional attribute in the test specification. The systematic selection of test levels for each test case contributes to avoiding the execution of the same test cases at multiple test levels where this is unreasonable. Hiller et al. argue that the test efficiency can be increased by using tailored test management technologies. For instance, a test case that failed at a specific test level in the current release should temporarily not be executed at any higher test level until the fault has been found and fixed (Hiller et al. 2008). Our approach will benefit from such an integrated test specification for different reasons. Firstly, we can take advantage of the additional attribute in the test specification providing the test levels where the test case should be specified. 
Secondly, we benefit from particularly abstract test cases that were specifically designed for being executable at different test levels. Lastly, the common test specification constitutes a further artifact featuring an integrative function for the different test levels, which represents additional support for our methodology. 11.2.4 Test reuse across test levels Sch¨atz and Pfaller have proposed an approach for executing component test cases at the system level (Sch¨atz and Pfaller 2010). Their goal is to test particular components from the system’s interface, that is, with at least partially limited component interface visibility. For this purpose, they automatically transform component test cases into system test cases using formal descriptions of all other system components. Note that these test cases do not aim at testing the system, but rather a single component that they designate component under test. Thus, their motivation for taking multiple test levels into consideration clearly differs from ours. The transformation performed by Scha¨tz and Pfaller results in a test case that is very similar or even identical to the result of appending a test adapter to the original component test case. Consequently, we can state that their work (Sch¨atz and Pfaller 2010) shows that test adapters can be generated automatically, provided that the behavior of all other system components is exactly known and formally specified. This assumption can be made for software integration testing, where the considered software components may be formally specified. However, when analog hardware parts are considered, the complexity of their physical behavior—including tolerances—often makes a formal description of their behavior with the required precision impossible. Another approach that considers test level integration was presented by Benz (2007). He proposes a methodology for taking advantage of component test models for integration testing. More concretely, Benz uses task models for modeling typically error-prone component interactions at an abstract level. From the abstract test cases generated using the task model, executable test cases that can stimulate the integrated components are generated based on a mapping between the tasks and the component test models. Hence, Benz takes advantage of the test models of another test level in order to refine abstract test cases without actually reusing them. M¨aki-Asiala (2005) introduced the concept of vertical reuse for designating test case reuse across test levels. This concept had been used before in the component-based design for addressing the reuse of components within a well-defined domain. In this context, vertical reuse is also known as domain-specific reuse (Gisi and Sacchi 1993). As in every reuse approach, commonality and variability are decisive for vertical reuse. M¨aki-Asiala (2005) states that similarities between the test levels must be identified, as well as the tests possessing the potential to be reused and to reveal errors at different test levels. His work, however, lacks instructions for these identification processes. There is no description of how to identify reuse potentials and error revelation potentials. 274 Model-Based Testing for Embedded Systems M¨aki-Asiala provides a set of guidelines for test case reuse in TTCN-3, discussing their benefits for vertical reuse. 
The guidelines were designed for reusing tests with little effort without considering test adapters so that interface visibility becomes a major issue (Ma¨kiAsiala 2005). Lehmann (2003) also addresses the interface visibility problem in his thesis, highlighting the impossibility of test reuse for different test object interfaces. As mentioned above, we address this problem by indirect observation, similar to Scha¨tz and Pfaller. Analogous to component-based design, we require functional compatibility between test and test object in order to design our test adapters (Yellin and Strom 1997). Hence, we can conclude that there are only a few approaches to test reuse across test levels—a fact that demonstrates the novelty of our approach. While Ma¨ki-Asiala presents generic approaches to test reuse, Sch¨atz and Pfaller address cross-level test reuse at consecutive test levels only. There is a lack of an integrative approach that considers the entire test process, the test levels of which are described in the subsequent section. 11.3 Test Levels for Embedded Systems In the domain of embedded systems the V model (Spillner et al. 2007, Gruszczynski 2006) constitutes the reference life cycle model for development and testing, especially in the automotive domain (Scha¨uffele and Zurawka 2006). It consists of a left-hand branch representing the development process that is characterized by refinement. The result of each development level is artifacts that—in terms of functionality—specify what must be tested at the corresponding test levels in the V model’s right-hand branch (Deutsche Gesellschaft fu¨r Qualita¨t e.V. 1992). A complete right-hand branch of the V model for embedded systems is shown in Figure 11.1. It starts at the bottom of the V with two branches representing software and hardware. After both hardware and software are integrated and tested, these branches merge at the system component integration test level. Before system integration, the system components are tested separately. The only task remaining after the system has been tested is acceptance testing. Note that in Figure 11.1 each integration test level is followed by a test level at which the integrated components are functionally tested as a whole before a new integration test level is approached. Hence, the V model’s right-hand branch consists of pairs of integration and integrated test levels. From software component testing up to acceptance testing, the V model features three different integration test levels for embedded systems: component integration (either software or hardware), software/hardware integration, and system integration. This makes it different from the V model for software systems which features only a single integration test level (cf. Spillner et al. 2007). But in analogy to that model, a test level comprising the completely integrated unit follows each integration test level, as mentioned before. Since our goal is functional testing, we will exclude all integration test levels from our consideration. In doing so, we assume that integration testing specifically focuses on testing the component interfaces and their interaction, leaving functional aspects to the test level that follows. In fact, integration and system testing are often used equivalently in the automotive domain (see for instance, Scha¨tz and Pfaller 2010). Acceptance testing, which is typically considered as not belonging to development (Binder 1999), is out of the scope of our approach. 
By excluding this test level and the integration test levels, we focus on the remaining four test levels: software (hardware) component testing, software (hardware) testing, system component testing, and system testing. Multilevel Testing for Embedded Systems 275 Acceptance testing System testing System integration testing System component testing Sys. Comp. integration HW testing testing SW HW testing integration testing SW integration HW testing component testing SW component testing FIGURE 11.1 Right-hand branch of the V model for embedded systems, featuring all test levels. Sytem testing System component testing Software testing Software component testing FIGURE 11.2 Test levels considered in this chapter. (Reprinted from The Journal of Systems and Software, 83, no. 12, Marrero P´erez, A., and Kaiser, S., Bottom-up reuse for multi-level testing, 2392– 2415. Copyright 2010, with permission from Elsevier.) As most of the functionality in modern vehicles is implemented in software, we will not consider the hardware branch in Figure 11.1 any further. Hence, the test levels we cover are depicted in Figure 11.2. Our V model’s righthand branch thus starts with software component testing, which typically addresses a single software function (for instance, a C function). Different software components are 276 Model-Based Testing for Embedded Systems integrated to constitute the complete software of a control unit (software testing). After software/hardware integration, the so-called electronic control unit (ECU) is tested at the system component test level. This test typically includes testing the real sensors and actuators connected to the ECU under test. Our test process concludes with the test of the entire system, consisting of a set of ECUs, sensors, and actuators, most of which may be physically available in the laboratory. Any components not available will be simulated, typically in real time. 11.4 Commonality and Variability Across Test Levels A very basic test case for a vehicle’s headlights could look as follows: From the OFF state, turn the lights on. After 10 s turn the lights off. Manually performing this test case in a car is not a big issue. But what happens when the switch is rotated? There is no wire directly supplying the lamps through the switch. Instead, there will be some kind of control unit receiving the information about the actual switch position. Some logic implemented in software then decides if the conditions for turning the headlights on are given, and should this be the case, a driver provides the current needed by the light bulbs. In addition, this functionality may be distributed across different control units within the vehicle. At this point, we can identify the simplest version of the typical embedded control system pattern consisting of a unidirectional data flow from sensors toward actuators: sensor → hardware → sof tware → hardware → actuator Dependencies between components, systems, and functions make reality look less neat. However, a key lesson from this pattern remains valid: as in the headlights example, the software decides and the remaining parts of the system—such as hardware—are gradually taking on secondary roles in terms of system functionality. For the basic headlights test case described above, this implies that it can be employed at the software component test level for testing the component in charge of deciding whether the enabling conditions are met or not. In fact, this test case is valid for all test levels. 
It is possible to test at every test level whether the headlights will switch on in the final system. Not only is it possible to repeat this particular test case at every test level, it is also reasonable and even efficient. The reason for this is that the testing of the functionality of the system has to begin as soon as the first software components are available. There is no point in waiting until the first car prototype is built, because the earlier a fault is detected, the less expensive the fault correction process becomes. Software components are the first test objects available for testing. Along the righthand branch of the V model, further test objects will become available successively till a completely integrated system makes testing at the top test level possible. Because of this temporal availability order, it appears reasonable to at least perform basic functional tests at each test level in order to ensure, for instance, that the headlights will work in the first prototype car. In addition to the benefits reported by testing earlier on in development, less effort is required for revealing and identifying faults at lower than at upper test levels. This keeps the lower test levels attractive for the last part of development where all test levels are already available. The headlights example demonstrates that there are significant similarities between the functional test cases across test levels. Consequently, a large set of functional test cases are Multilevel Testing for Embedded Systems 277 execution candidates for different test levels, provided that they are specified at a reasonable functional abstraction level. For example, because of the abstraction level, a software tester as well as a hardware tester will know how to perform the test case turn the headlights on and off. The key for the commonality is thus the functionality, and the key for having a low variability is the level of abstraction. When test cases address functional details, which are often test level specific, the variability becomes higher and there is less commonality to benefit from. For instance, if the headlights test case described above would utilize specific signal and parameter names, it would be more difficult to reuse. The variability is thus primarily given by the differences between the test objects. Both a software component and an ECU may implement the same function, for example, the headlights control, but their interfaces are completely different. We will address this issue in the following sections in combination with the already mentioned interface abstraction. A further variability aspect concerns the test abstraction level, which increases along the V model’s right-hand branch. Abstract test levels require testing less functional details than concrete test levels. In fact, many functional details are not even testable at abstract test levels because these details are not observable. In other cases, the details are observable but doing so requires a high effort that makes reuse unaffordable. As a consequence, there is no point in trying to reuse every test case at every test level, even if it were technically possible. Our solution to address the variability that originates from the different abstraction levels is to separate test cases into different groups depending on their level of abstraction. This approach will be described in Section 11.6, after the introduction of multilevel test cases. 
11.5 Multilevel Test Cases We introduced multilevel test cases in (Marrero P´erez and Kaiser 2009) as a modularization concept for structuring test cases that permits reusing major parts of test cases across test levels. Multilevel test cases reflect the commonality and variability across test levels. As shown in Figure 11.3, they consist of an abstract test case core (TCC) representing the commonality and test level-specific test adapters that encapsulate the variability. The only form of variability accepted in the TCC is parameterization. The parameters cover both test case variability and test object variability. Within the test case variability, those parameters addressing differences across test levels are of particular interest here. With parameterization, we can rely on invariant test behavior across test levels. The test behavior provides an interface consisting of a set of signals T E (t) for evaluation and another set of signals T S (t) for stimulation. Thus, the interface of the TCC does not take data or temporal abstraction into consideration, but operates at a technical abstraction level using discrete signals, that is, value sequences that are equidistant in time. Without this practice, it would not be possible to consider complex signals because the loss of information caused by abstraction would be prohibitive. Test adapters are divided into three different modules: input test adapter (ITA), output test adapter (OTA), and parameter test adapter (PTA). As their names suggest, they are in charge of observation, stimulation, and parameterization of the different test objects at each test level. Within the embedded systems domain, both ITAs and OTAs are functions relating the TCC interface signals T E(t) and T S(t) to the test object interface signals 278 Model-Based Testing for Embedded Systems Input test adapter Test case core Test behavior Test evaluation Test stimulation Output test adapter Test parameters Parameter test adapter FIGURE 11.3 Structure of multilevel test cases. (Reprinted from The Journal of Systems and Software, 83, no. 12, Marrero P´erez, A., and Kaiser, S., Bottom-up reuse for multi-level testing, 2392– 2415. Copyright 2010, with permission from Elsevier.) T E (t) and T S (t): T E (t) = IT A T E T S (t) = OT A T S (11.1) (11.2) The IT A function typically provides functional abstraction as well as temporal and data abstraction, if necessary. In contrast, the OT A function will refine the test stimulation (TS) functionally but also add temporal and data details if required. The same applies to interface abstraction. Here, the ITA will abstract the test object interface while the OTA will perform a refinement. These differences should be kept in mind, even though Equations 11.1 and 11.2 demonstrate that both functions IT A T E and OT A T S technically represent a mapping between interfaces. Multilevel test cases primarily focus on reducing the test case implementation effort through reuse. For this reason, it is crucial that the design of test adapters and their validation require less effort than creating and validating new test cases. In addition, the maintenance effort necessary in both cases has to be taken into consideration. The resulting test case quality will mainly depend on the test developers, however. There is no evidence indicating that multilevel test cases directly increase the test case quality. Indirectly, our aim is to reduce the test implementation effort so that fewer resources are necessary for reaching the same quality level. 
11.6 Test Level Integration This chapter addresses our methodological approach to test reuse across test levels, which primarily consists of departing from the strict separation of test levels stipulated by conventional test processes. The objective is to show how to take advantage of the common Multilevel Testing for Embedded Systems 279 functionality that must be tested across multiple test levels while taking the cross-level differences into account. These differences concern the abstraction levels and the interfaces, as discussed in Section 11.4. 11.6.1 Top-down refinement versus top-down reuse Test level integration implies relating test levels and hence analyzing and establishing dependencies between them. A straightforward approach for test level integration thus consists of reflecting the abstraction/refinement relationship between consecutive test levels by introducing such a relationship between test cases at the different test levels. Following this approach, abstract test cases at the top test level are refined stepwise towards more concrete test cases at the test levels below (top-down refinement). Figure 11.4 schematically shows the refinement process across test levels. The differences in the rectangles’ size point out the increasing amount of details present in the test cases across the test levels. Top-down refinement can be performed parallel to the refinement process in the V model’s left-hand branch. A further integration approach consists of reusing—not refining—test cases from the top test level down towards the lowest test level (top-down reuse). Instead of adding details with refinement, the details are introduced in form of additional test cases for each test level here. These test cases will be reused at the test levels below, as well, but never at the test levels above since they test details that are out of the scope of more abstract test levels. The principle of top-down reuse is depicted in Figure 11.5. The vertical arrows relating test cases at different test levels indicate reuse. With top-down reuse four different test case groups are created, one at each test level (A through D in Figure 11.5). Top-down refinement best fits the development process in the left-hand branch of the V model. Each development artifact possesses a specific testing counterpart so that traceability is given. Assuming a change in requirements, this change will affect one or more test cases at the top test level. Analogous to the update in requirements causing changes in the artifacts Abstract FIGURE 11.4 Top-down refinement. Test abstraction Detailed 4 3 2 1 System testing System component testing Software testing Software component testing 280 A A + A + A + Abstract FIGURE 11.5 Top-down reuse. Model-Based Testing for Embedded Systems System testing System B component testing B + C B + C Test abstraction Software testing Software + D component testing Detailed at all development levels below, the change in the abstract test case will affect all test cases that refine it. Without appropriate linking mechanisms between the different test levels, a large set of test cases would then have to be updated manually. In fact, the manual effort for updating all test cases in case the refinement cannot be automatically performed makes this approach impracticable nowadays. The second approach—top-down reuse—overrides the described update problem by avoiding test refinement along the V model while preserving the traceability between development and testing as well as across test levels. 
The key idea of this approach consists of describing test cases at the highest possible level of functional abstraction and then reusing these test cases at every lower test level. Thus, top-down reuse does not imitate the refinement paradigm of development on the V model’s right-hand branch. Instead, it provides better support for the testing requirements, as we argue on three accounts. Firstly, testing requires simplicity. The test objects are sufficiently complex—testing has to be kept simple in order to allow focusing on the test object and fault detection and not on the test artifacts themselves. Top-down reuse supports the simple creation of simple test cases at single abstraction levels without taking other abstraction levels into consideration. Additionally, the relationship between test levels is kept simple and clear. Refining test cases towards simple test cases is not trivial, however. Secondly, automatic testing requires executable artifacts—test cases—that must be implemented at each test level. Therefore, top-down reuse has a great effort reduction potential in terms of test implementation, which is not featured by top-down refinement. Thirdly, the current test process lacks collaboration between test teams at different test levels. Team collaboration is clearly improved by both approaches, but top-down reuse goes the extra mile in test level integration by assuring that identical test cases are shared across test levels. Team members at different test levels can thus discuss on the basis of a common specification and implementation. The advantages of top-down reuse are clear and thus lead us to selecting it as our preferred test level integration approach. So far, we only focused on test case specification. The reuse approach also features advantages for test case design and test case implementation, though, as we already alluded to when arguing for top-down reuse. Multilevel Testing for Embedded Systems 281 11.6.2 Multilevel test design strategy The main difference between test specification and test design is that we must take the test interface into account for test design, that is, we must decide which information channels to use for TS and which for test object observation and hence test evaluation (TE). The test interface differs across test levels. In fact, different test objects at different abstraction levels possess different interfaces. However, while designing and implementing test cases, we must define a test interface as a means of communication (stimulation and observation) with the test object. The variability of the test interface across test levels in conjunction with the necessity of defining a test case interface common to all test levels makes it difficult to completely reuse entire test designs and test implementations along the V model. The alternative is introducing interface refinement through test adapters supporting test case reuse across test levels by taking the differences in the test interface into consideration. With this practice, we can design and implement TCCs with abstract interfaces exactly following the top-down specification described in the previous section. Later on, these TCCs will be reused at test levels with more detailed interfaces using test adapters that perform an interface refinement. The described design strategy utilizes the multilevel test case concepts described in Section 11.5. 
In fact, the combination of top-down reuse and interface refinement represents a reasonable approach for multilevel test case design, our solution for test level integration. A central question that has not yet been addressed concerns the concept of an abstract test interface. We can approach this term from the perspective of data, temporal, or functional abstraction. In this contribution, we assume that the test interface consists of two signal sets: a set of test inputs for TE and a set of test outputs for TS. The exclusive use of signals implies a low—and basically constant—level of temporal abstraction. It represents, however, a fundamental decision for supporting complex signals, as already mentioned in Section 11.5. Temporal abstraction differences between signals may exist, but they will be insignificant. For instance, a signal featuring (a few) piecewise constant phases separated by steps may seem to simply consist of a set of events. This could be considered to imply temporal abstraction. However, such a signal will not contain less temporal information than a realworld sensor measurement signal featuring the same sampling rate. In contrast to temporal abstraction, the data abstraction level of a signal can vary substantially. These variations are covered by the typing information of the signals. For example, a Boolean signal contains fewer details than a floating point signal and is thus more abstract. Following the definition from Prenninger and Pretschner (2005) given in Section 11.2.1, we consider an interface to be abstract in terms of functionality if it omits any functional aspects that are not directly relevant for the actual test case. This includes both omitting any irrelevant signals and within each of these also omitting any details without a central significance to the test. This fuzzy definition complicates the exact determination of the functional abstraction level of an interface. However, as we will show below, such accuracy is also unnecessary. Considering the V model’s right-hand branch, starting at the bottom with software components and gradually moving closer to the physical world, we can state that all three aspects of interface abstraction reveal decreasing interface abstraction levels. On the one hand, the closer we are to the real world, the lower the temporal and data abstraction levels of the test object signals tend to be. On the other hand, software functions will not take more arguments or return more values than strictly necessary for performing their functionality. Consequently, the functional abstraction level of the test interface will also 282 Model-Based Testing for Embedded Systems be highest at the lowest test level. At the test levels, above additional components are incrementally integrated, typically causing the test interface to include more signals and additional details not significant to the very core of the function being tested but supporting other integrated components. Hence, we propose a bottom-up approach for test design and test implementation, as shown in Figure 11.6. In addition to the basic schematic from Figure 11.5, test adapters performing interface abstraction are depicted here. They adapt the narrow interface of the test cases to the more detailed—and thus wider—interface of the corresponding test objects. The figure also proposes the concatenation of test adapters for reducing the effort of interface refinement at each test level to a refinement of the interface of the previous test level. 
11.6.3 Discussion In summary, our test level integration approach consists of two parts. Firstly, a top-down reuse approach for test specification that considers the increasing test abstraction∗ along the V model’s right-hand branch (see Figure 11.5). In analogy to Hiller et al. (2008), this implies introducing a central test specification covering all test levels. Secondly, a bottom-up approach for test design and test implementation that takes the decreasing interface abstraction along the test branch of the V model into consideration using interface refinement (see Figure 11.6). Multilevel test cases constitute our candidates for this part. As far as reuse is concerned, Wartik and Davis state that the major barriers to reuse are not technical, but organizational (Wartik and Davis 1999). Our approach covers different abstraction levels mainly taking advantage of reuse but also using refinement. In this Detailed A SYS System testing Interface abstraction A + B SYS-C System component testing A + B + C SW Software testing Abstract Software A + B + C + D SW-C component testing Abstract Detailed Test abstraction FIGURE 11.6 Bottom-up test design and test implementation. (Reprinted from The Journal of Systems and Software, 83, no. 12, Marrero P´erez, A., and Kaiser, S., Bottom-up reuse for multilevel testing, 2392–2415. Copyright 2010, with permission from Elsevier.) ∗Cf. Section 11.2.1. Multilevel Testing for Embedded Systems 283 context, our technical solution achieves a substantially higher relevance than in pure reuse situations without reducing the importance of organizational aspects. One of the most important disadvantages of test level integration is the propagation of faults within test cases across test levels. This represents a straightforward consequence of the reuse approach, which requires high-quality test designs and test implementations in order to minimize the risk of such fault propagation. In effect, not only faults are propagated across test levels but also quality. Reusing high-quality test cases ensures high-quality results. Test level integration introduces several advantages in comparison to fully independent test levels including better traceability for test artifacts, lower update effort, better conditions for collaboration across test levels, and above all, a noticeable reduction of the test specification, test design, and test implementation effort. Although multilevel testing does not constitute a universal approach, and test levels cannot be entirely integrated because of the lack of commonality in many cases, the relevance of test reuse across test levels is significant for large embedded systems. As mentioned before, there is clear evidence that the conventional test process lacks efficiency with respect to functional testing, and multilevel testing represents a tailored solution to this problem. Note that the ensuing lack of universality does not represent a problem, since both multilevel test cases and multilevel test models—which are presented in the subsequent section—are compatible with conventional test approaches. In fact, multilevel testing can be considered a complement of the latter, which supports increasing the testing efficiency with respect to selected functions. The reduction in effort is particularly evident for test specifications and has even been successfully validated in practice (Hiller et al. 2008). 
For test designs and test implementations, the successful application of multilevel testing depends on the effort required for interface refinement. Multilevel testing should only be applied if creating a new test case causes greater effort than refining the interface of an existing one. In order to reduce the interface refinement effort, we already proposed the concatenation of test adapters and the reuse of test adapters within a test suite in Marrero Pérez and Kaiser (2009). Another possible effort optimization for multilevel test cases consists in automatically configuring test adapters from a signal mapping library. In the next section, we will present an approach for a further optimization, consisting of designing multilevel test models instead of multilevel test cases.

11.7 Multilevel Test Models

Multilevel test models constitute an extension of the multilevel test case concept. Instead of designing single scenarios that exercise some part of a system's function, multilevel test models focus on the entire scope and extent of functional tests that are necessary for a system's function. In other words, multilevel test models aim at abstractly representing the entire test behavior for a particular function at all test levels and hence at different functional abstraction levels. Note that the entire test behavior does not imply the overall test object behavior. The test behavior modeled by multilevel test models does not necessarily have to represent a generalization of all possible test cases. Instead, a partial representation is also possible and often more appropriate than a complete one. Multilevel test models are tied to the simplicity principle of testing, too.

The main objective of multilevel test models is the reduction of the design and implementation effort in relation to multilevel test cases by achieving gains in functional abstraction and consequently taking advantage of the commonalities of the different test cases for a specific system's function. In this context, multilevel test models can also be seen as artifacts that can be reused for providing all functional multilevel test cases for a single function. The more commonalities these test cases share, the more the required effort can be reduced by test modeling.

For the design of multilevel test models, we propose the structure shown in Figure 11.7, which basically corresponds to Figure 11.3 and hence best represents the extension of the multilevel test case concept. The essential concept is to design a test model core (TMC) that constitutes an abstract test behavior model featuring two abstract interfaces: one for stimulation and the other for evaluation. On the one hand, the refinement of the stimulation interface toward the test object input interface at each test level will be the responsibility of the output test adapter model (OTAM). On the other hand, the input test adapter model (ITAM) will abstract the test object outputs at each test level toward the more abstract TMC input interface. With this practice, we can derive a multilevel test case from the multilevel test model through functional refinement of both TMC and test adapter model.

The blocks within the TMC in Figure 11.7 reveal that we continue to separate test case stimulation and test case evaluation, as well as data and behavior. With respect to the data, test parameters are used whenever data has to be considered within the test model.
These parameters may be related to the test object (test object parameters) or exclusively regard the test model and the test cases (test parameters).

In addition to extensive structural similarities, a large set of differences exists between test cases and test models in terms of reuse across test levels. These differences start with the design and implementation effort. Furthermore, multilevel test models will typically offer better readability and maintainability, which are important premises for efficiently sharing implementations across test teams. The advantages of gains in abstraction apply to the differences between test adapters and test adapter models, as well. Test adapter models will cover any interface refinement (TMC output) or interface abstraction (TMC input) for a single test level. A test adapter capable of performing this could also be implemented without modeling, but because of the missing functional abstraction, it would clearly possess inferior qualities in terms of design and implementation effort, readability, and maintainability. Additionally, we can concatenate test adapter models in exactly the same way we proposed in Marrero Pérez and Kaiser (2009) for test adapters.

FIGURE 11.7 Structure of multilevel test models. (Adapted from Marrero Pérez, A., and Kaiser, S., Multi-level test models for embedded systems, Software Engineering, pages 213–224, © 2010b/GI.)

Multilevel test cases are always designed at a specific abstraction level. Some multilevel test cases will be more abstract than others, but there is a clear separation of abstraction levels. Multilevel test models cannot rely on this separation, however. They have to integrate test behavior at different functional abstraction levels into a single model. This is one of the most important consequences of the extension of multilevel test cases to multilevel test models. This issue is addressed next.

11.7.1 Test model core interface

The test interface plays a central role for test cases because there is no better way of keeping test cases simple than to focus on the stimulation and evaluation interfaces. However, this is only half the story for test models. Through abstraction and (at least partial) generalization, the interfaces are typically less important for test models than they are for test cases. Unlike common test models, though, TMCs integrate various functional abstraction levels. For this reason, we claim that the TMC interface is of particular relevance for the TMC behavior.

As described in Section 11.6, the test cases for a specific test level are obtained top-down through reuse from all higher test levels and by adding new test cases considering additional details. The test interface of these new test cases will consist of two groups of signals:

• Signals shared with the test cases reused from above.
• Test-level specific signals that are used for stimulating or observing details and are thus not used by any test level above.

With this classification, we differentiate between as many signal groups as test levels are available. Test cases at the lowest test level could include signals from all groups. Furthermore, the signals of each group represent test behavior associated with a specific functional abstraction level. We can transfer this insight to multilevel test models.
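Before doing so, the grouping can be illustrated with a small, purely hypothetical Python sketch: if each signal group collects the signals introduced at one test level, then the interface available to the test cases of a given level is simply the union of its own group and the groups of all higher test levels. The signal names below are invented for illustration and do not belong to the case study presented later.

# Illustrative sketch (hypothetical signal names). Level 4 is the highest
# (system) test level, level 1 the lowest (software component) test level.
signal_groups = {
    4: {"user_request", "lamp_state"},   # shared with the test cases reused from the top
    3: {"bus_timeout_flag"},             # first needed at the next test level down
    2: set(),                            # no additional signals at this level
    1: {"raw_input_code"},               # details only visible at the lowest level
}

def interface_signals(test_level):
    """Signals available to test cases at a given level: its own group plus
    the groups of all test levels above it."""
    return set().union(*(signal_groups[l] for l in signal_groups if l >= test_level))

print(sorted(interface_signals(4)))  # only the shared top-level signals
print(sorted(interface_signals(1)))  # the lowest level sees signals from all groups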
If we consider multilevel test models as compositions of a (finite) set of multilevel test cases designed at different abstraction levels following the design strategy described in Section 11.6.2, we can assume that the TMC interface will be a superset of the test case interfaces and thus include signals from all groups. Note that the signal groups introduced above refer to the abstraction level of the functionality being tested and not to interface abstraction.

11.7.2 Test model behavior

We take an interface-driven approach for analyzing the central aspects of the test model behavior concerning multiple test levels. By doing so, we expect to identify a set of generic requirements that the TMC has to meet in order to be reusable across test levels. In this context, we propose dividing the test behavior into different parts according to the signal groups defined in the previous section. Each part is associated with a specific model region within a multilevel test model. Since there are as many signal groups as test levels, there will also be as many model regions as test levels.

Figure 11.8 schematically shows the structure of a multilevel test model consisting of four regions. Each region possesses its own TS and TE and is responsible for a group of input and output signals. Note that each signal is used by a single region within the test model.

FIGURE 11.8 Division of the test behavior into regions. (Adapted from Marrero Pérez, A., and Kaiser, S., Multi-level test models for embedded systems, Software Engineering, pages 213–224, © 2010b/GI.)

Relating test behavior and test interface in this manner implies separating the different abstraction levels within the test behavior—a direct consequence of the signal group definition provided in the previous section. This is not equivalent to creating separate test models for each functional abstraction level, however. Our proposition considers designing in each region of the TMC only the behavior required by the test level-specific signals of the corresponding group, omitting the behavior of other shared signals (cf. Section 11.7.1), which are provided by other regions. With this approach, the functionally most abstract test cases are derived from the abstract model parts, while the least abstract test cases may be modeled by one or more parts of the model, depending on the signals needed. A link between the different model parts will only be required for the synchronization of the test behavior across abstraction levels. Such a synchronization will be necessary for modeling dependencies between signals belonging to different groups, for instance.

The separation of test behavior into different parts concerning different abstraction levels permits avoiding the design of TMCs mixing abstract and detailed behavior. This aspect is of particular significance for test case derivation. Multilevel test cases testing at a specific functional abstraction level cannot evaluate more detailed behavior for the following three reasons. Firstly, the corresponding signals may not be provided by the test object. In this case, the signals providing detailed information about the test object behavior will be internal signals within the SUT that are not visible at the SUT interface. Secondly, even in case these internal signals were observable, there would be no point in observing them at the current abstraction level.
Functional testing is a black-box technique, and so the only interesting observation signals are those at the SUT interface. Thirdly, the ITA will not be able to create these internal signals because the SUT interface will not provide the details that are necessary for reconstructing those internal signals featuring a lower abstraction level.

In summary, we have presented an approach for extending multilevel test cases to multilevel test models, which possess a similar structure at a higher functional abstraction level. For the design and implementation of multilevel test models, we basically follow the strategy for multilevel test cases presented in Section 11.6 with some extensions such as test behavior parts. The resulting test models will cover all abstraction levels and all interface signal groups for a single system's function. Furthermore, multilevel test cases derived from multilevel test models feature a separation of the test behavior into stimulation and evaluation, as well as into different abstraction levels.

11.8 Case Study: Automated Light Control

This section presents a case study of a multilevel test model for the vehicle function ALC. With this example, we aim at validating the proposed approach. This function is similar to the ALC function presented in Schieferdecker et al. (2006). As a proof of concept, we will follow the design strategy proposed in this contribution for the ALC. Hence, we will start specifying a set of test cases using the top-down reuse approach. We will then proceed with the design of the multilevel test model using Simulink 6.5.∗ The design will also include a set of test adapter models.

∗The MathWorks, Inc.—MATLAB/Simulink/Stateflow, 2006, http://www.mathworks.com/.

11.8.1 Test specification

The ALC controls the state of the headlights automatically by observing the actual outside illumination and switching them on in the darkness. The automated control can be overridden using the headlight rotary switch, which has three positions: ON, OFF, and AUTO. For this functionality at the very top level, we create two exemplary test cases TC1 and TC2. TC1 switches between OFF and ON, while TC2 brings the car into darkness and back to light while the switch is in the AUTO position.

Three ECUs contribute to the ALC, namely the driver panel (DP), the light sensor control (LSC), and the ALC. Figure 11.9 depicts how these ECUs are related. All three control units are connected via a CAN bus. The rotary light switch is connected to the DP, the two light sensors report to the LSC, and the headlights are directly driven from the ALC.

FIGURE 11.9 ALC system.

At the system component test level, we reuse both test cases TC1 and TC2 for testing the ALC, but in addition create a new test case TC3 in which the headlights are switched on because of a timeout in the switch position signal on the CAN bus. The existence of timeouts on the CAN bus is not within the scope of the system integration abstraction level, so that this test case belongs to the second test case group. The effect will be that the headlights are ON as long as the current switch position is not received via the CAN bus.

The ALC ECU includes two software components: the darkness detector (DD) and the headlight control (HLC). The DD receives an illumination signal from the LSC ECU and returns false if it is light outside and true if it is dark. The HLC uses this information as well as the switch position for deciding whether to switch the headlights on.
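For illustration only, the intended interplay of the two software components can be sketched in a few lines of Python. The illumination threshold and the fallback for an unavailable switch position (reflecting the timeout behavior described above) are assumptions made for this sketch; the real components are Simulink models and production software, not Python.

# Minimal sketch of the DD/HLC decision, purely illustrative.
ON, OFF, AUTO = "ON", "OFF", "AUTO"

def darkness_detector(illumination_avg):
    """DD: returns True if it is dark outside, False if it is light."""
    return illumination_avg < 50.0          # threshold is an assumption

def headlight_control(switch_position, dark):
    """HLC: decides whether the headlights are switched on."""
    if switch_position == ON:
        return True
    if switch_position == OFF:
        return False
    if switch_position == AUTO:
        return dark
    return True   # unknown or unavailable position: headlights on (safe side)

assert headlight_control(AUTO, darkness_detector(10.0)) is True
assert headlight_control(OFF, darkness_detector(10.0)) is False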
At the software test level, we reuse all three test cases from the test levels above without adding any new test cases. This means that there are no functional details to test that were not already included in the test cases of the test level above. Finally, at the software component level, we reuse the test cases once again but add two new test cases TC4 and TC5. TC4 tests that the headlights are ON when the value 3 is received at the switch position input. This value is invalid, because only the integer values 0 = OFF, 1 = ON, and 2 = AUTO are permitted. When detecting an invalid value, the headlights should be switched on for safety reasons. TC5 checks that variations in the darkness input do not affect the headlights when the switch is in the ON position. This test case is not interesting at any higher test level, although it would be executable. In terms of functional abstraction, it is too detailed for reuse.

All five specified test cases are included in Table 11.1. Each test case consists of a set of test steps, for each of which there are indications regarding the actions that must be performed and the expected results. In analogy to Hiller et al. (2008), we have also specified at which test levels each test case must be executed. Using a significantly reduced number of test cases and a simple example, this test specification demonstrates how the top-down reuse approach works. We obtain test cases that must be reused at different test levels. Instead of specifying these test cases four times (once at each test level), each time analyzing a different artifact on the V model's left-hand branch, our approach only requires a single specification. The reduction in effort attained by this identification of common test cases is not only limited to test case creation but also applies to other activities such as reviews or updates. These activities benefit from test specifications that do not include multiple test cases being similar or even identical.

11.8.2 Test model core design

We design the core of our multilevel test model in Simulink as shown in Figure 11.10. The TMC generates input values for the HLC software component in the stimulation subsystem, evaluates the headlights state in the evaluation subsystem, and includes an additional test control subsystem that provides the current test step for synchronizing stimulation and evaluation. The test control concept used in this contribution is similar to the one discussed in Zander-Nowicka et al. (2007). The test design follows a bottom-up strategy.

The HLC component features two inputs (Switch and Darkness) and a single output (Headlights). All these inputs are utilized by the top-level test cases in Table 11.1 so that there is a single interface group in this example and all test cases will share these interface signals. As a consequence, there will be only one test behavior part within the TMC, even though we will be designing behavior at different abstraction levels.

Both TS and TE are designed in a similar way (see Figure 11.11). The behavior of each test step is described separately and later merged. Figure 11.12 provides insight into the stimuli generation for the last test step (cf. Table 11.1). Note that the variables are not declared in the model but as test parameters in MATLAB's workspace. Figure 11.13 presents the TE for the last test step; the test verdict for this step is computed as specified in Table 11.1.
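The following Python fragment is a minimal, tool-independent sketch of the kind of step-wise evaluation that Figure 11.13 realizes with Simulink blocks; the expected values are taken from Table 11.1 (here for TC1 and TC5), while the function itself is purely illustrative and not part of the actual test model.

# Expected headlights state per (test case, test step), following Table 11.1.
EXPECTED = {
    ("TC1", 1): "ON", ("TC1", 2): "OFF",
    ("TC5", 1): "ON", ("TC5", 2): "ON", ("TC5", 3): "ON", ("TC5", 4): "ON",
}

def evaluate_step(test_case, test_step, headlights):
    """Local verdict for one test step: pass if the observed headlights state
    matches the pass condition, fail otherwise, none if no condition is given."""
    expected = EXPECTED.get((test_case, test_step))
    if expected is None:
        return "none"
    return "pass" if headlights == expected else "fail"

assert evaluate_step("TC1", 1, "ON") == "pass"
assert evaluate_step("TC1", 2, "ON") == "fail"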
Finally, the test control shown in Figure 11.14 mainly consists of a Stateflow diagram, where each state represents a test step. Within the diagram, the variable t represents the local time, that is, the time the test case has spent in the current state.

TABLE 11.1
Test specification for the automated light control function

TC1: Headlights ON/OFF switching (Test levels: 1, 2, 3, 4)
  Step 1: Set switch to ON. Pass condition: headlights = ON
  Step 2: Set switch to OFF. Pass condition: headlights = OFF

TC2: Headlights automatic ON/OFF (Test levels: 1, 2, 3, 4)
  Step 1: Set switch to AUTO.
  Step 2: Set illumination to LIGHT. Pass condition: headlights = OFF
  Step 3: Set illumination to DARK. Pass condition: headlights = ON
  Step 4: Set illumination to LIGHT. Pass condition: headlights = OFF

TC3: Headlights ON—switch CAN timeout (Test levels: 1, 2, 3)
  Step 1: Set switch to OFF. Pass condition: headlights = OFF
  Step 2: CAN timeout. Pass condition: headlights = ON
  Step 3: Remove timeout. Pass condition: headlights = OFF

TC4: Headlights ON for invalid switch position input (Test levels: 1)
  Step 1: Set switch to AUTO.
  Step 2: Set illumination to LIGHT. Pass condition: headlights = OFF
  Step 3: Set switch to INVALID. Pass condition: headlights = ON
  Step 4: Set switch to AUTO. Pass condition: headlights = OFF

TC5: Darkness variations with switch ON (Test levels: 1)
  Step 1: Set switch to ON. Pass condition: headlights = ON
  Step 2: Set darkness to false. Pass condition: headlights = ON
  Step 3: Set darkness to true. Pass condition: headlights = ON
  Step 4: Set darkness to false. Pass condition: headlights = ON

FIGURE 11.10 Test model core.
FIGURE 11.11 Test stimulation.
FIGURE 11.12 Test stimulation (test step 4).
FIGURE 11.13 Test evaluation (test step 4).
FIGURE 11.14 Test control.

Since the test interface exclusively contains shared signals, both stimulation and evaluation models take advantage of this similarity, making the model simpler (cf. Figures 11.12 and 11.13). Besides this modeling effect, the interface abstraction is also evident, particularly for the darkness signal, which is actually a simple Boolean signal instead of the signal coming from the sensor.

11.8.3 Test adapter models

We implement two test adapter models (input and output) for every test level in Simulink. These models describe the relation between interfaces across test levels. The aim in this case is not to model partial but complete behavior so that the test adapter models are valid for any additionally created scenarios without requiring updates.

FIGURE 11.15 Input test adapter model for system testing.

The ITAMs are in charge of abstracting the test object interface toward the TMC inputs.
In the ALC example, there is only a single observation signal in the TMC, namely the Boolean signal headlights. In fact, there will not be much to abstract from. Within the software, we will be able to access this signal, and for the hardware, we have a test platform that is capable of measuring the state of the driver for the relay on the ECU. Hence, the test adapters basically have to pass the input signals to the outputs while only the name may change (see for instance Figure 11.15).

At the system component test level, the ITAM is of more interest, as shown in Figure 11.16. Here, the test platform provides inverted information on the relay status, that is, Headlights_TP = true implies that the headlights are off. Thus, the test adapter must logically negate the relay signal in order to adapt it to the test model. For the ALC, we applied test adapter model concatenation in order to optimize the modeling. With this practice, it is not necessary to logically negate the relay signal at the system test level anymore (cf. Figure 11.15), even though we observe exactly the same signal. The test team at the system test level uses a test model including a test adapter chain from all test levels below. Hence, the test model already includes the negation.

FIGURE 11.16 Input test adapter model for system component testing.

The OTAMs do not abstract but refine the test model interfaces, introducing additional details. A good example is the OTAM for the software test level that is depicted in Figure 11.17. The first signal to be refined is the switch position. There is another software component within the ALC ECU that merges the signals Switch_Avl and Switch_SW.∗ As a consequence, the test adapter has to invert this behavior by splitting the switch position. When the position is N/A (not available), the Switch_SW signal is set to OFF. The darkness signal also has to be refined toward the Illumination_avg, which is a continuous signal containing the average value of the two light sensors and whose range goes from 0 (dark) to 100 (light). In this case, we invert the function of the DD by setting an illumination of 5% for darkness and 95% for daylight.

FIGURE 11.17 Output test adapter model for software testing.

∗Switch_Avl is a Boolean signal created by the CAN bus driver that indicates a timeout when active, that is, the information on the switch position has not been received for some specified period of time and is hence not available. Switch_SW represents the signal coming from the CAN bus driver into the software, containing the information on the switch position.
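As a minimal, tool-independent illustration of the refinement performed by the software-level OTAM of Figure 11.17, the following Python sketch splits the abstract switch position into Switch_Avl and Switch_SW and maps the Boolean darkness signal to an average illumination of 5% or 95%. The integer encoding of the switch positions follows Section 11.8.1; the representation of the N/A value is an assumption of this sketch.

OFF, ON, AUTO = 0, 1, 2
NOT_AVAILABLE = "N/A"   # assumed abstract value used to request a timeout

def software_otam(switch_position, darkness):
    # Split the abstract switch position into Switch_Avl and Switch_SW:
    # when the position is N/A, the timeout flag is set and the
    # software-side position falls back to OFF.
    switch_avl = (switch_position == NOT_AVAILABLE)
    switch_sw = OFF if switch_avl else switch_position
    # Refine the Boolean darkness signal into an average illumination value
    # between 0 (dark) and 100 (light): 5% for darkness, 95% for daylight.
    illumination_avg = 5.0 if darkness else 95.0
    return switch_avl, switch_sw, illumination_avg

print(software_otam(AUTO, True))            # -> (False, 2, 5.0)
print(software_otam(NOT_AVAILABLE, False))  # -> (True, 0, 95.0)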
The OTAM for system component testing includes a data type conversion block in which a Boolean signal is refined to an 8-bit integer in order to match the data type at the test platform (cf. Figure 11.18). The test platform allows us to filter all CAN messages including the switch position information in order to stimulate a timeout. At the system test level, the status of the switch signal on the CAN bus is out of the abstraction level's scope (see OTAM in Figure 11.19). Had we actually used this signal for the TMC interface instead of mapping it to the software component input, this signal would have belonged to the signal group corresponding to the system component test level.

FIGURE 11.18 Output test adapter model for system component testing.
FIGURE 11.19 Output test adapter model for system testing.

The switch delivers three Boolean signals, one for each position. We have to split this signal here so that the DP can merge it again. For the illumination, we must refine the average into two sensor signals delivering a voltage between 0 and 5 V. We opt to provide identical sensor signals through the test adapter for simplicity's sake. In analogy to the ITAMs, the advantages of test adapter concatenation are clearly visible. Additionally, several interface refinements have been presented.

The ALC case study provides insight into the concepts presented in this chapter, particularly for multilevel test models. It is a concise example that demonstrates that the proposed design strategies are feasible and practicable. Even though the test modeling techniques applied in Simulink feature a rather low abstraction level, the advantages of reuse, particularly in terms of reductions in effort, have become clear.

11.9 Conclusion

Multilevel testing is an integrative approach to testing across test levels that is based on test reuse and interface refinement. In this chapter, we presented test specification and test design strategies both for test cases and test models that aim at integrating test levels. We have identified test level integration as an approach promising a major potential for reduction of test effort. The strategies described in this chapter are applicable to both test models and test cases based on the two key instruments of the developed methodology: multilevel test models and multilevel test cases. We paid particular attention to describing the idiosyncrasies of the model-based approach in comparison to using test cases while aiming at formulating a generic and flexible methodology that is applicable to different kinds of models. There are no comparable approaches in the literature for test level integration, particularly none that are this extensive.

The benefits of multilevel testing include significant reductions in effort, especially if the multilevel test models are utilized only when they are more efficient than conventional approaches. Apart from the efficiency gains, we have discussed further reuse benefits in this contribution. Among the new possibilities provided by our test level integration, these benefits include improving cross-level collaboration between test teams for more systematic testing, supporting test management methods and tools, and automatically establishing a vertical traceability between test cases across test levels.

We have evaluated our multilevel testing approach in four industrial projects within production environments. Details of this effort are documented in other work (Marrero Pérez and Kaiser 2010a).
As a summary of the results, we compared the test cases at two different test levels and found that around 60% of the test cases were shared by both test levels, which is a very high rate considering that the test levels were not consecutive. We applied our multilevel testing approach to reuse the test cases from the lower test level applying interface refinement and noticed substantial reductions in test effort with respect to test design and test management. The only exception in our analysis was the validation of the test adapter models, which caused greater effort than expected because of our manual approach. However, we expect to noticeably reduce this effort by automating the creation of the test adapter models, an approach whose feasibility has been demonstrated in Schätz and Pfaller (2010). Lastly, our evaluation confirmed that our test level integration approach scales. We validated our approach in projects of different sizes including both small and large test suites and obtained comparable results in both cases.

Summing up, we offer an optimized approach tailored to embedded systems, but applicable to other domains whose test process is characterized by multiple test levels. This contribution advocates focusing on systematic methodologies more intensively, after years of concentration on technical aspects within the testing domain.

Acknowledgments

We would like to thank Oliver Heerde for reviewing this manuscript.

References

Aichernig, B., Krenn, W., Eriksson, H., and Vinter, J. (2008). State of the art survey—Part a: Model-based test case generation.

Beizer, B. (1990). Software Testing Techniques. International Thomson Computer Press, London, UK, 2nd edition.

Benz, S. (2007). Combining test case generation for component and integration testing. In 3rd International Workshop on Advances in Model-based Testing (A-MOST 2007), Pages: 23–33.

Binder, R.V. (1999). Testing Object-oriented Systems. Addison-Wesley, Reading, MA.

Broy, M. (2006). Challenges in automotive software engineering. In 28th International Conference on Software Engineering (ICSE 2006), Pages: 33–42.

Bühler, O. and Wegener, J. (2008). Evolutionary functional testing. Computers & Operations Research, 35(10):3144–3160.

Burmester, S. and Lamberg, K. (2008). Aktuelle Trends beim automatisierten Steuergerätetest. In Gühmann, C., editor, Simulation und Test in der Funktions- und Softwareentwicklung für die Automobilelektronik II, Pages: 102–111.

Deutsche Gesellschaft für Qualität e.V. (1992). Methoden und Verfahren der Software-Qualitätssicherung, Volume 12-52 of DGQ-ITG-Schrift. Beuth, Berlin, Germany, 1st edition.

European Telecommunications Standards Institute. The Testing and Test Control Notation version 3; Part 1: TTCN-3 Core Language, 2009-06.

Gisi, M.A. and Sacchi, C. (1993). A positive experience with software reuse supported by a software bus framework. In Advances in Software Reuse: Selected Papers from the 2nd International Workshop on Software Reusability (IWSR-2), Pages: 196–203.

Gruszczynski, B. (2006). An overview of the current state of software engineering in embedded automotive electronics. In IEEE International Conference on Electro/Information Technology, Pages: 377–381.

Hiller, S., Nowak, S., Paulus, H., and Schmitfranz, B.-H. (2008). Durchgängige Testmethode in der Entwicklung von Motorkomponenten zum Nachweis der Funktionsanforderungen im Lastenheft. In Reuss, H.-C., editor, AutoTest 2008.
Larsen, K.G., Mikucionis, M., Nielsen, B., and Skou, A. (2005). Testing real-time embedded software using UPPAAL-TRON. In 5th ACM International Conference on Embedded Software (EMSOFT 2005), Pages: 299–306.

Lehmann, E. (2003). Time partition testing. PhD thesis, Technische Universität Berlin.

Lindlar, F. and Marrero Pérez, A. (2009). Using evolutionary algorithms to select parameters from equivalence classes. In Schlingloff, H., Vos, T.E.J., and Wegener, J., editors, Evolutionary test generation, Volume 08351 of Dagstuhl Seminar Proceedings.

Mäki-Asiala, P. (2005). Reuse of TTCN-3 code, Volume 557 of VTT Publications. VTT, Espoo, Finland.

Marrero Pérez, A. and Kaiser, S. (2009). Integrating test levels for embedded systems. In Testing: Academic & Industrial Conference – Practice and Research Techniques (TAIC PART 2009), Pages: 184–193.

Marrero Pérez, A. and Kaiser, S. (2010a). Bottom-up reuse for multi-level testing. The Journal of Systems and Software, 83(12):2392–2415.

Marrero Pérez, A. and Kaiser, S. (2010b). Multi-level test models for embedded systems. In Software Engineering (SE 2010), Pages: 213–224.

Prenninger, W. and Pretschner, A. (2005). Abstractions for model-based testing. Electronic Notes in Theoretical Computer Science, 116:59–71.

Schätz, B. and Pfaller, C. (2010). Integrating component tests to system tests. Electronic Notes in Theoretical Computer Science, 260:225–241.

Schäuffele, J. and Zurawka, T. (2006). Automotive software engineering. Vieweg, Wiesbaden, Germany, 3rd edition.

Schieferdecker, I., Bringmann, E., and Großmann, J. (2006). Continuous TTCN-3: Testing of embedded control systems. In Workshop on Software Engineering for Automotive Systems (SEAS 2006), Pages: 29–36.

Spillner, A., Linz, T., and Schaefer, H. (2007). Software testing foundations. Rockynook, Santa Barbara, CA, 2nd edition.

Wartik, S. and Davis, T. (1999). A phased reuse adoption model. The Journal of Systems and Software, 46(1):13–23.

Wiese, M., Hetzel, G., and Reuss, H.-C. (2008). Optimierung von E/E-Funktionstests durch Homogenisierung und Frontloading. In Reuss, H.-C., editor, AutoTest 2008.

Yellin, D.M. and Strom, R.E. (1997). Protocol specifications and component adaptors. ACM Transactions on Programming Languages and Systems, 19(2):292–333.

Zander-Nowicka, J. (2008). Model-based testing of real-time embedded systems in the automotive domain. PhD thesis, Technische Universität Berlin.

Zander-Nowicka, J., Marrero Pérez, A., Schieferdecker, I., and Dai, Z.R. (2007). Test design patterns for embedded systems. In Schieferdecker, I. and Goericke, S., editors, Business Process Engineering. 10th International Conference on Quality Engineering in Software Technology (CONQUEST 2007), Pages: 183–200.

12

Model-Based X-in-the-Loop Testing

Jürgen Großmann, Philip Makedonski, Hans-Werner Wiesbrock, Jaroslav Svacina, Ina Schieferdecker, and Jens Grabowski

CONTENTS

12.1 Motivation . . . 300
12.2 Reusability Pattern for Testing Artifacts . . . 301
12.3 A Generic Closed Loop Architecture for Testing . . . 302
12.3.1 A generic architecture for the environment model . . . 303
12.3.1.1 The computation layer . . . 304
12.3.1.2 The pre- and postprocessing layer and the mapping layer . . . 305
12.3.2 Requirements on the development of the generic test model . . . 305
12.3.3 Running example . . . 306
12.4 TTCN-3 Embedded for Closed Loop Tests . . . 309
12.4.1 Basic concepts of TTCN-3 embedded . . . 310
12.4.1.1 Time . . . 311
12.4.1.2 Streams . . . 311
12.4.1.3 Access to time stamps and sampling-related information . . . 313
12.4.1.4 Integration of streams with existing TTCN-3 data structures . . . 314
12.4.1.5 Control flow . . . 317
12.4.2 Specification of reusable entities . . . 318
12.4.2.1 Conditions and jumps . . . 319
12.4.2.2 Symbol substitution and referencing . . . 320
12.4.2.3 Mode parameterization . . . 321
12.5 Reuse of Closed Loop Test Artifacts . . . 322
12.5.1 Horizontal reuse of closed loop test artifacts . . . 322
12.5.2 Vertical reuse of environment models . . . 323
12.5.3 Test reuse with TTCN-3 embedded, Simulink, and CANoe . . . 323
12.5.4 Test asset management for closed loop tests . . . 326
12.6 Quality Assurance and Guidelines for the Specification of Reusable Assets . . . 328
12.7 Summary . . . 330
References . . . 331

Software-driven electronic control units (ECUs) are increasingly adopted in the creation of more secure, comfortable, and flexible systems. Unlike conventional software applications, ECUs are real-time systems that may be affected directly by the physical environment they operate in. Whereas for software applications testing with specified inputs and checking whether the outputs match the expectations are in many cases sufficient, such an approach is no longer adequate for the testing of ECUs.
Because of the real-time requirements and the close interrelation with the physical environment, proper testing of ECUs must directly consider the feedback from the environment, as well as the feedback from the system under test (SUT) to generate adequate test input data and calculate the test verdict. Such simulation and testing approaches dedicated to verify feedback control systems are normally realized using so-called closed loop architectures (Montenegro, Jhnichen, and Maibaum 2006, 299 300 Model-Based Testing for Embedded Systems Lu et al. 2002, Kendall and Jones 1999), where the part of the feedback control system that is being verified is said to be “in the loop.” During the respective stages in the development lifecycle of ECUs, models, software, and hardware are commonly placed in the loop for testing purposes. Currently, often proprietary technologies are used to set up closed loop testing environments and there is no methodology that allows the technology-independent specification and systematic reuse of testing artifacts, such as tests, environment models, etc. for closed loop testing. In this chapter, we propose such a methodology, namely “X-in-the-Loop testing,” which encompasses the testing activities and the involved artifacts during the different development stages. This work is based on the results from the TEMEA project∗. Our approach starts with a systematic differentiation of the individual artifacts and architectural elements that are involved in “X-in-the-loop” testing. Apart from the SUT and the tests, the environment models, in particular, must be considered as a subject of systematic design, development, and reuse. Similar to test cases, they shall be designed to be independent from test platform-specific functionalities and thus be reusable on different testing levels. This chapter introduces a generic approach for the specification of reusable “X-in-theloop” tests on the basis of established modeling and testing technologies. Environment modeling in our context will be based on Simulink r (The MathWorks 2010b). For the specification and realization of the tests, we propose the use of TTCN-3 embedded (TEMEA Project 2010), an extended version of the standardized test specification language TTCN-3 (ETSI 2009b, ETSI 2009a). The chapter starts with a short motivation in Section 12.1 and provides some generic information about artifact reuse in Section 12.2. In Section 12.3, we describe an overall test architecture for reusable closed loop tests. Section 12.4 introduces TTCN-3 embedded, Section 12.5 provides examples on how vertical and horizontal reuse can be applied to test artifacts, and Section 12.6 presents reuse as a test quality issue. Section 12.7 concludes the chapter. 12.1 Motivation An ECU usually interacts directly with its environment, using sensors and actuators in the case of a physical environment, and with network systems in the case of an environment that consists of other ECUs. To be able to run and test such systems, the feedback from the environment is essential and must usually be simulated. Normally, such a simulation is defined by so-called environment models that are directly linked with either the ECU itself during Hardware-in-the-Loop (HiL) tests, the software of the ECU during Software-in-theLoop (SiL) tests, or in the case of Model-in-the-Loop (MiL) tests, with an executable model of the ECU’s software. 
Apart from the technical differences that are caused by the different execution objects (an ECU, the ECU’s software, or a model of it), the three scenarios are based on a common architecture, the so-called closed loop architecture. Following this approach, a test architecture can be structurally defined by generic environment models and specific functionality-related test stimuli that are applied to the closed loop. The environment model and the SUT constitute a self-contained functional entity, which is executable without applying any test stimuli. To accommodate such an architecture, test scenarios in this context apply a systematic interference with the intention to disrupt the functionality of the SUT and the environment model. The specification of ∗The project TEMEA “Testing Specification Technology and Methodology for Embedded Real-Time Systems in Automobiles” (TEMEA 2010) is co-financed by the European Union. The funds are originated from the European Regional Development Fund (ERDF). Model-Based X-in-the-Loop Testing 301 such test scenarios has to consider certain architectural requirements. We need reactive stimulus components for the generation of test input signals that depend on the SUT’s outcome, assessment capabilities for the analysis of the SUT’s reaction, and a verdict setting mechanism to propagate the test results. Furthermore, we recommend a test specification and execution language, which is expressive enough to deal with reactive control systems. Because of the application of model-based software development strategies in the automotive domain, the design and development of reusable models are well known and belong to the state of the art (Conrad and Do¨rr 2006, Fey et al. 2007, Harrison et al. 2009). These approved development strategies and methods can be directly ported to develop highly reusable environment models for testing and simulation and thus provide a basis for a generic test architecture that is dedicated to the reuse of test artifacts. Meanwhile, there are a number of methods and tools available for the specification and realization of environment models, such as Simulink r and Modelica (Modelica Association 2010). Simulink r in particular is supported by various testing tools and is already well established in the automotive industry. Both Modelica and Simulink r provide a solid technical basis for the realization of environment models, which can be used either as self-contained simulation nodes, or, in combination with other simulation tools, as part of a co-simulation environment. 12.2 Reusability Pattern for Testing Artifacts Software reuse (Karlsson 1995) has been an important topic in software engineering in both research and industry for quite a while now. It is gaining a new momentum with emerging research fields such as software evolution. Reuse of existing solutions for complex problems minimizes extra work and the opportunity to make mistakes. The reuse of test specifications, however, has only recently been actively investigated. Notably, the reusability of TTCN-3 tests has been studied in detail as a part of the Tests & Testing Methodologies with Advanced Languages (TT-Medal) (TT-Medal 2010) project, but the issue has been investigated also in Karinsalo and Abrahamsson 2004, Ma¨ki-Asiala 2004. Reuse has been studied on three levels within TT-Medal—TTCN-3 language level (Ma¨ki-Asiala, K¨arki, and Vouffo 2006, Ma¨ki-Asiala et al. 2005), test process level (M¨antyniemi et al. 2005), and test system level (K¨arki et al. 2005). 
In the following, focus will be mainly on reusability on the TTCN-3 language level and how the identified concepts transfer to TTCN-3 embedded. Means for the development of reusable assets generally include establishing and maintaining a good and consistent structure of the assets, the definition of and adherence to standards, norms, and conventions. It is furthermore necessary to establish well-defined interfaces and to decouple the assets from the environment. In order to make the reusable assets also usable, detailed documentation is necessary, but also the proper management of the reusable assets, which involves collection and classification for easy locating and retrieval. Additionally, the desired granularity of reuse has to be established upfront so that focus can be put on a particular level of reuse, for example, on a component level or on a function level. On the other hand, there are the three viewpoints on test reuse as identified in Ka¨rki et al. 2005, M¨aki-Asiala 2004, M¨aki-Asiala et al. 2005: Vertical - which is concerned with the reuse between testing levels or types (e.g., component and integration testing, functional and performance testing); 302 Model-Based Testing for Embedded Systems Horizontal - which is concerned with the reuse between products in the same domain or family (e.g., standardized test suites, tests for product families, or tests for product lines); Historical - which is concerned with the reuse between product generations (e.g., regression testing). While the horizontal and historical viewpoints have been long recognized in software reuse, vertical reuse is predominantly applicable only to test assets. “X-in-the-Loop” testing is closest to the vertical viewpoint on reuse, although it cannot be entirely mapped to any of the viewpoints since it also addresses the horizontal and historical viewpoints as described in the subsequent sections. Nevertheless, the reuse of real-time test assets can be problematic, since, similar to realtime software, context-specific time, and synchronization constraints are often embedded in the reusable entities. Close relations between the base functionality and the real-time constraints often cause interdependencies that reduce the reusability potential. Thus, emphasis shall be placed on context-independent design from the onset of development, identifying possible unwanted dependencies on the desired level of abstraction and trying to avoid them whenever a feasible alternative can be used instead. This approach to reuse, referred to as “revolutionary” (Ma¨ntyniemi et al. 2005), or “reuse by design” involves more upfront planning and a sweeping transformation on the organizational level, which requires significant experience in reuse. The main benefit is that multiple implementations are not necessary. In contrast, the “evolutionary” (Ma¨ntyniemi et al. 2005) approach to reuse involves a gradual transition toward reusable assets, throughout the development, by means of adaptation and refactoring to suit reusability needs as they emerge. The evolutionary approach requires less upfront investment in and knowledge of reuse and involves less associated risks, but in turn may also yield less benefits. Knowledge is accumulated during the development process and the reusability potential is identified on site. Such an approach is better suited for vertical reuse in systems where requirements are still changing often. 
Despite the many enabling factors from a technological perspective, a number of organizational factors inhibiting the adoption of reuse, as well as the risks involved, have been identified in Lynex and Layzell 1997, Lynex and Layzell 1998, Tripathi and Gupta 2006. Such organizational considerations are concerned primarily with the uncertainties related to the potential for reusability of production code and its realization. The basic principle of the evolutionary approach is to develop usable assets first, then turn them into reusable ones. In the context of "X-in-the-Loop" testing, the aim is instead to establish reusability as a design principle, by providing a framework, an architecture, and support at the language level. Consequently, a revolutionary approach to the development of the test assets is necessary.

12.3 A Generic Closed Loop Architecture for Testing

A closed loop architecture describes a feedback control system. In contrast to open loop architectures and simple feedforward controls, models, especially the environment model, form a central part of the architecture. The input data for the execution object is calculated directly by the environment model, which itself is influenced by the output of the execution object. Thus, both execution object and environment model form a self-contained entity. In terms of testing, closed loop architectures are more difficult to handle than open loop architectures. Instead of defining a set of input data and assessing the related output data, tests in a closed loop scenario have to be integrated with the environment model. Usually, neither environment modeling nor the integration with the test system and the individual tests are carried out in a generic way. Thus, it would be rather difficult to properly define and describe test cases, to manage them, and even to reuse them partially. In contrast, we propose a systematic approach on how to design reusable environment models and test cases.

We think of an environment model as defining a generic test scenario. The individual test cases are defined as perturbations of the closed loop runs in a controlled manner. A test case in this sense is defined as a generic environment model (basic test scenario) together with the description of its intended perturbation. Depending on the underlying test strategies and intentions, the relevant perturbations can be designed based on functional requirements (e.g., as black box tests) or derived by manipulating standard inputs stochastically to test robustness. They can also be determined by analyzing limit values or checking interfaces. In all cases, we obtain a well-defined setting for closed loop test specifications. To achieve better maintenance and reusability, we rely on two basic principles. First, we introduce a proper architecture, which allows the reuse of parts of an environment model. Secondly, we propose a standard test specification language, which is used to model the input perturbations and assessments and allows the reuse of parts of the test specification in other test environments.

12.3.1 A generic architecture for the environment model

For the description of the environment model, we propose a three layer architecture, consisting of a computation layer, a pre- and postprocessing layer, and a mapping layer (Figure 12.1).
The computation layer contains the components responsible for the stimulation of the SUT (i.e., the abstract environment model and the perturbation component) and for the assessment of the SUT's output (i.e., the assessment component). The pre- and postprocessing layer contains the preprocessing and postprocessing components, which are responsible for signal transformation. The mapping layer contains the so-called mapping adapters that provide the direct connectivity between the SUT and the environment model.

FIGURE 12.1 Closed loop architecture.

Based on this architecture, we will present a method for the testing of controls with feedback. In what follows, we present a more detailed description of the individual entities of the environment model.

12.3.1.1 The computation layer

An essential part in closed loop test systems is the calculation of the environmental reaction, that is, for each simulation step, the input values for the execution object are computed by means of the output data and the abstract environment model. Moreover, special components are dedicated to testing. The perturbation component is responsible for the production of test stimuli by means of test perturbations and the assessment component is responsible for the verdict setting.

• Abstract environment model
The abstract environment model is a generic model that simulates the environment the ECU operates in. The level of technical abstraction is the same as that of the test cases we intend to specify. In a model-based approach, such a model can be designed as a reusable entity and realized using commercial modeling tools like Simulink, Stateflow® (The MathWorks 2010c), or Modelica. Hardware entities that are connected to the test environment may be used as well. Moreover, the character of the abstract environment model directly depends on the chosen test level. For a module test, such a test model can be designed using Simulink. In an integration test, performed with CANoe (Vector Informatics 2010), for example, the model can be replaced by a Communication Application Programming Language (CAPL) node. And last but not least, in a HiL scenario, this virtual node can be replaced by other ECUs that interoperate with the SUT via Controller Area Network (CAN) bus communication.

• Test perturbation
As sketched above, a test stimulus specification is defined as a perturbation of the closed loop run. Thus, the closed loop is impaired and the calculation of the input data is partially dissociated from its original basis, that is, from the environment model output or the SUT output. In this context, the perturbation is defined by means of data generation algorithms that replace or alter the original input computation of the closed loop. The simplest way is to replace the original input computation by a synthetic signal. With the help of advanced constructs, already existing signals can be manipulated in various ways, such as adding an offset, scaling them by a factor, etc. For the algorithmic specification of perturbation sequences, we will use TTCN-3 embedded, the hybrid and continuous systems extension to TTCN-3.

• Assessment component
In order to conduct an assessment, different concepts are required. The assessment component must observe the outcome of the SUT and set verdicts respectively, but it should not interfere (apart from possible interrupts) with the stimulation of the SUT. TTCN-3 is a standardized test specification and assessment language, and its hybrid and continuous systems extension TTCN-3 embedded provides just the proper concepts for the definition of assessments for continuous and message-based communication. Furthermore, by relying on a standard, assessment specifications can be reused in other environments, for example, conformance testing, functional testing, or interoperability testing.
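How these computation-layer components could interact during a single simulation step can be sketched, purely for illustration, in a few lines of Python. The dynamics, the perturbation window, and the pass/fail criterion are arbitrary assumptions and do not correspond to TTCN-3 embedded or to any particular tool.

def abstract_environment_model(sut_output, state):
    """Compute the next SUT input from the SUT output (the feedback path)."""
    state = 0.9 * state + 0.1 * sut_output          # placeholder environment dynamics
    return {"sensor_value": state}, state

def perturbation(sut_input, t):
    """Disturb the closed loop in a controlled manner, e.g. add an offset."""
    if 1.0 <= t < 2.0:                              # perturb only in a chosen time window
        sut_input["sensor_value"] += 10.0
    return sut_input

def assessment(sut_output, verdict):
    """Observe the SUT output and update the verdict without interfering."""
    if abs(sut_output) > 100.0:                     # illustrative pass/fail criterion
        verdict = "fail"
    return verdict

def run_closed_loop(sut_step, t_end=5.0, dt=0.01):
    state, verdict, sut_output, t = 0.0, "pass", 0.0, 0.0
    while t < t_end:
        sut_input, state = abstract_environment_model(sut_output, state)
        sut_input = perturbation(sut_input, t)
        sut_output = sut_step(sut_input)            # execution object: model, software, or ECU
        verdict = assessment(sut_output, verdict)
        t += dt
    return verdict

# Example with a trivial stand-in for the SUT:
print(run_closed_loop(lambda u: 50.0 - u["sensor_value"]))

Note how the environment model and the execution object form a self-contained loop, while perturbation and assessment only disturb and observe it, which is exactly the separation of concerns intended by the architecture in Figure 12.1.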
12.3.1.2 The pre- and postprocessing layer and the mapping layer

The pre- and postprocessing layer (consisting of pre- and postprocessing components) and the mapping layer (consisting of mapping adapters) provide the intended level of abstraction between the SUT and the computation layer. Please note that we have chosen the following perspective here—preprocessing refers to the preparation of the data that is emitted by the SUT and fed into the testing environment, and postprocessing refers to the preparation of the data that is emitted by the test perturbation or the abstract environment model and sent to the SUT.

• Preprocessing component
The preprocessing component is responsible for the measurement and preparation of the outcome of the SUT for later use in the computation layer. Usually, it is neither intended nor possible to assess the data from the SUT without preprocessing. We need an abstract or condensed version of the output data. For example, the control of a lamp may be performed by pulse-width modulated current. To perform a proper assessment of the signal, it would suffice to know the duty cycle. The preprocessing component serves as such an abstraction layer. This component can be easily designed and developed using modeling tools, such as Simulink, Stateflow, or Modelica.

• Postprocessing component
The postprocessing component is responsible for the generation of the concrete input data for the SUT. It adapts the low-level interfaces of the SUT to the interfaces of the computation layer, which are usually more abstract. This component is best modeled with the help of Simulink, Stateflow, Modelica, or other tools and programming languages, which are available in the underlying test and simulation infrastructure.

• Mapping adapter
The mapping adapter is responsible for the syntactic decoupling of the environment model and the execution object, which in our case is the SUT. Its main purpose is to relate (map) the input ports of the SUT to the output ports of the environment model and vice versa. Thus, changing the names of the interfaces and ports of the SUT would only lead to slight changes in the mapping adapter.

12.3.2 Requirements on the development of the generic test model

Another essential part of a closed loop test is the modeling of the environment feedback. Such a model is an abstraction of the environment the SUT operates in. Because this model is only suited for testing and simulation, generally we need not be concerned with completeness and performance issues in any case. However, closed loop testing always depends on the quality of the test model, and we should carefully develop this model to get reliable results when using real-time environments as well as when using simpler software-based simulation environments.
Besides general quality issues that are well-known from model-based development, the preparation of reusable environment models must address some additional aspects. Because of the signal transformations in the pre- and postprocessing components, we can design the test model at a high level of abstraction. This eases the reuse in different environments. Moreover, we are not bound to a certain processor architecture. When we use processors with floating point arithmetic in our test system, we do not have to bother with scalings. The possibility to upgrade the performance of our test system, for example, by adding more random access memory or a faster processor helps mitigate performance problems. Thus, when we develop a generic test model, we are primarily interested in the correct 306 Model-Based Testing for Embedded Systems functionality and less in the performance or the structural correctness. We will therefore focus on functional tests using the aforementioned model and disregard structural tests. To support the reuse in other projects or in later phases of the same project, we should carefully document the features of the model, their abstractions, their limits, and the meaning of the model parameters. We will need full version management for test models in order to be able to reproducibly check the correctness of the SUT behavior with the help of closed loop tests. In order to develop proper environment models, high skills in the application domain and good capabilities in modeling techniques are required. The test engineer, along with the application or system engineer shall therefore consult the environment modeler to achieve sufficient testability and the most appropriate level of abstraction for the model. 12.3.3 Running example As a guiding example, we will consider an engine controller that controls the engine speed by opening and closing the throttle. It is based on a demonstration example (Engine Timing Model with Closed Loop Control) provided by the MATLAB r (The MathWorks 2010a) and Simulink r tool suites. The engine controller model (Figure 12.2) has three input values, namely, the valve trigger (Valve_Trig_), the engine speed (Eng_Speed_), and the set point for the engine speed (Set_Point_). Its objective is the control of the air flow to the engine by means of the throttle angle (Throttle_Angle). In contrast to the controller model, the controller software is specifically designed and implemented to run on a real ECU with fixed-point arithmetic, thus we must be concerned with scaling and rounding when testing the software. The environment model (Figure 12.3) is a rather abstract model of a four-cylinder spark ignition engine. It processes the engine angular velocity (shortened: engine speed [rad/s]) and the crank angular velocity (shortened: crank speed [rad/s]) controlled by the throttle angle. The throttle angle is the control variable, the engine speed the observable. The angle of the throttle controls the air flow into the intake manifold. Depending on the engine speed, the air is forced from the intake manifold into the compression subsystem and periodically pumped into the combustion subsystem. By changing the period for pumping, the air mass provided for the combustion subsystem is regulated, and the torque of the engine, as well as the engine speed, is controlled. Technically, the throttle angle should be controlled at any valve trigger in such a way that the engine speed approximately reaches the set point speed. 
To model this kind of environment, we use floating point arithmetic. It is a rather small model, but it suffices to test the basic features of our engine controller (e.g., proper functionality, robustness, and stability). Moreover, we can later extend the scope of the model by providing calibration parameters that adapt its behavior to different situations and controllers.

FIGURE 12.2 ECU model for a simple throttle control (inputs Valve_Trig_, Set_Point_ (desired rpm), and Eng_Speed_; output Throttle_Angle).

FIGURE 12.3 Environment model for a simple throttle control (throttle and manifold, compression, combustion, and vehicle dynamics subsystems; input throttle angle; outputs engine speed in rpm and rad/s and crank speed in rad/s).

To test the example system, we will use the test architecture proposed in Section 12.3.1. The perturbation and assessment component is a TTCN-3 embedded component that compiles to a Simulink® S-Function. Figure 12.4 shows the complete architecture and identifies the individual components. The resulting test interface, that is, the values that can be assessed and controlled by the perturbation and assessment component, is depicted in Table 12.1. Note that we use a test system centric perspective for the test interface, that is, system inputs are declared as outputs and system or environment model outputs as inputs. The mapping between the test system-specific names and the system and environment model-specific names is defined in the first and second columns of the table. On the basis of the test architecture and the test interface, we are now able to identify typical test scenarios:

• The set point speed to_Set_Point jumps from low to high. How long will it take until the engine speed ti_Engine_Speed reaches the set point speed?

• The set point speed to_Set_Point falls from high to low. How long will it take until the engine speed ti_Engine_Speed reaches the set point speed? Are there any overregularizations?

• The engine speed sensor retrieves perturbed values via to_Engine_Perturbation. How does the controller behave?

FIGURE 12.4 Test architecture in Simulink® (the SUT controller, the mapping/pre- and postprocessing layer, and the computation layer with the engine environment model and the TTCN-3 perturbation and assessment component).
TABLE 12.1 Engine Controller Test Interface Test System Symbol System Symbol ti_Crank_Speed ti_Engine_Speed ti_Throttle_Angle to_Engine_Perturbation to_Set_Point Crank Speed (rad/s) Engine Speed (rpm) Throttle Angle Out Eng_Speed_ Set_Point_ Direction In In In Out Out Unit rpm rpm rad rpm rpm Data Type Double Double Double Double Double Model-Based X-in-the-Loop Testing 309 12.4 TTCN-3 Embedded for Closed Loop Tests For the specification of the tests, we rely on a formal testing language, which provides dedicated means to specify the stimulation of the system and the assessment of the system’s reaction. To emphasize the reusability, the language should provide at least sufficient support for modularization as well as support for the specification of reusable entities, such as functions, operations, and parameterization. In general, a testing language for software-driven hybrid control systems should provide suitable abstractions to define and assess analog and sampled signals. This is necessary in order to be able to simulate the SUT’s physical environment and to interact with dedicated environment models that show continuous input and output signals. On the other hand, modern control systems consist of distributed entities (e.g., controllers, sensors, actuators) that are interlinked by network infrastructures (e.g., CAN or FlexRay buses in the automotive domain). These distributed entities communicate with each other by exchanging complex messages using different communication paradigms such as asynchronous eventbased communication or synchronous client server communication.∗ Typically, this kind of behavior is tested using testing languages, which provide support for event- or messagebased communication and provide means to assess complex data structures. In recent years, there have been many efforts to define and standardize formal testing languages. In the telecommunications industry, the Testing and Test Control Notation (TTCN-3) (ETSI 2009b, ETSI 2009a) is well established and widely proliferated. The language is a complete redefinition of the Tree and Tabular Combination Notation (TTCN-2) (ISO/IEC 1998). Both notations are standardized by the European Telecommunications Standards Institute (ETSI) and the International Telecommunication Union (ITU). Additional testing and simulation languages, especially ones devoted to continuous systems in particular, are available in the field of hardware testing or control system testing. The Very High Speed Integrated Circuit Hardware Description Language (VHDL) (IEEE 1993) and its derivative for analog and mixed signals (VHDL-AMS) (IEEE 1999) is useful in the simulation of discrete and analogue hardware systems. However, both languages were not specifically designed to be testing languages. The Boundary Scan Description Language (BSDL) (Parker and Oresjo 1991) and its derivative the Analog Boundary Scan Description Language (ABSDL) (Suparjo et al. 2006) are testing languages that directly support the testing of chips using the boundary scan architecture (IEEE 2001) defined by the Institute of Electrical and Electronics Engineers (IEEE). The Time Partition Testing Method (TPT) (Bringmann and Kraemer 2006) and the Test Markup Language (TestML) (Grossmann and Mueller 2006) are approaches that have been developed in the automotive industry about 10 years ago, but are not yet standardized. 
The Abbreviated Test Language for All Systems (ATLAS) (IEEE 1995) and its supplement, the Signal and Method Modeling Language (SMML) (IEEE 1998), define a language set that was mainly used to test control systems for military purposes. Moreover, the IEEE currently finalizes the standardization of an XML-based test exchange format, namely the Automatic Test Mark-up Language (ATML) (SCC20 ATML Group 2006), which is dedicated to exchanging information on test environments, test setups, and test results in a common way. The European Space Agency (ESA) defines requirements on a language used for the development of automated test and operation procedures and standardized a reference language called Test and Operations Procedure Language (ESA-ESTEC 2008). Last, but not ∗Please refer to AUTOSAR (AUTOSAR Consortium 2010), which yields a good example of an innovative industry-grade approach to designing complex control system architectures for distributed environments. 310 Model-Based Testing for Embedded Systems least, there exist a huge number of proprietary test control languages that are designed and made available by commercial test system manufacturers or are developed and used in-house only. Most of the languages mentioned above are neither able to deal with complex discrete data that are exhaustively used in network interaction, nor with distributed systems. On the other hand, TTCN-3, which is primarily specializing in testing distributed network systems, lacks support for discretized or analogue signals to stimulate or assess sensors and actuators. ATML, which potentially supports both, is only an exchange format, yet to be established, and still lacking user-friendly representation formats. The TTCN-3 standard provides a formal testing language that has the power and expressiveness of a normal programming language with formal semantics and a user-friendly textual representation. It also provides strong concepts for the stimulation, control, and assessment of message-based and procedure-based communication in distributed environments. Our anticipation is that these kinds of communication will become much more important for distributed control systems in the future. Additionally, some of these concepts can be reused to define signal generators and assessors for continuous systems and thus provide a solid basis for the definition of analogue and discretized signals. Finally, the overall test system architecture proposed in the TTCN-3 standard (ETSI 2009c) shows abstractions that are similar to the ones we defined in Section 12.3.1. The TTCN-3 system adapter and the flexible codec entities provide abstraction mechanisms that mediate the differences between the technical SUT interface and the specification level interfaces of the test cases. This corresponds to the pre- and postprocessing components from Section 12.3.1. Moreover, the TTCN-3 map statement allows the flexible specification of the mappings between so-called ports at runtime. Ports in TTCN-3 denote the communication-related interface entities of the SUT and the test system. Hence, the map statement directly corresponds to the mapping components from Section 12.3.1. In addition, the TTCN-3 standard defines a set of generic interfaces (i.e., Test Runtime Interface (TRI) (ETSI 2009c), Test Control Interface (TCI) (ETSI 2009d)) that precisely specify the interactions between the test executable, the adapters, and the codecs, and show a generalizable approach for a common test system architecture. 
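To illustrate the correspondence between the TTCN-3 map statement and the mapping components of Section 12.3.1, consider the following sketch. It anticipates the port and component definitions given later in Listing 12.2; the test system interface type and its port names are purely illustrative. A test component port is related to a port of the test system interface at runtime, which is exactly the role the mapping adapters play in the closed loop architecture:

// hypothetical test system interface exposing the SUT/environment signals
type component EngineSystemInterface {
  port FloatInPortType  Engine_Speed_;  // measured engine speed
  port FloatOutPortType Set_Point_;     // desired engine speed
}

testcase TC_Mapping() runs on EngineTester system EngineSystemInterface {
  // the map statements take over the role of the mapping adapters
  map(mtc:ti_Engine_Speed, system:Engine_Speed_);
  map(mtc:to_Set_Point,    system:Set_Point_);
  // ... stimulation and assessment would follow here
}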
Last, but not least, the TTCN-3 standard is actually one of the major European testing standards with a large number of contributors. To overcome its limitations and to open TTCN-3 for embedded systems in general and for continuous real-time systems in particular, the standard must be extended. A proposal for such an extension, namely TTCN-3 embedded (TEMEA Project 2010), was developed within the TEMEA (TEMEA 2010) research project and integrates former attempts to resolve this issue (Schieferdecker and Grossmann 2007, Schieferdecker, Bringmann, and Grossmann 2006). Next, we will outline the basic constructs of TTCN-3 and TTCN-3 embedded and show how the underlying concepts fit to our approach for closed loop testing. 12.4.1 Basic concepts of TTCN-3 embedded TTCN-3 is a procedural testing language. Test behavior is defined by algorithms that typically assign messages to ports and evaluate messages from ports. For the assessment of different alternatives of expected messages, or timeouts, the port queues and the timeout queues are frozen when the assessment starts. This kind of snapshot semantics guarantees a consistent view on the test system input during an individual assessment step. Whereas the snapshot semantics provides means for a pseudo parallel evaluation of messages from several ports, there is no notion of simultaneous stimulation and time-triggered evaluation. To enhance the core language for the requirements of continuous and hybrid behavior, we Model-Based X-in-the-Loop Testing 311 introduce the following: • The notions of time and sampling. • The notions of streams, stream ports, and stream variables. • The definition of statements to model a control flow structure similar to that of hybrid automata. We will not present a complete and exhaustive overview of TTCN-3 embedded.∗ Instead, we will highlight some basic concepts, in part by providing examples and show the applicability of the language constructs to the closed loop architecture defined in the sections above. 12.4.1.1 Time TTCN-3 embedded provides dedicated support for time measurement and time-triggered control of the test system’s actions. Time is measured using a global clock, which starts at the beginning of each test case. The actual value of the clock is given as a float value that represents seconds and which is accessible in TTCN-3 embedded using the keyword now. The clock is sampled, thus it is periodically updated and has a maximum precision defined by the sampling step size. The step size can be specified by means of annotations to the overall TTCN-3 embedded module. Listing 12.1 shows the definition of a TTCN-3 embedded module that demands a periodic sampling with a step size of 1 msec. Listing 12.1 Time module myModule { . . . } with { s t e p s i z e ” 0 . 0 0 1 ” } 1 12.4.1.2 Streams TTCN-3 supports a component-based test architecture. On a conceptual level, test components are the executable entities of a test program. To realize a test setup, at least one test component and an SUT are required. Test components and the SUT are communicating by means of dedicated interfaces called ports. While in standard TTCN-3 interactions between the test components and the SUT are realized by sending and receiving messages through ports, the interaction between continuous systems can be represented by means of so-called streams. In contrast to scalar values, a stream represents the entire allocation history applied to a port. In computer science, streams are widely used to describe finite or infinite data flows. 
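The two communication styles can be contrasted in a short sketch (all type names here are illustrative and not part of the running example; the stream port anticipates the definitions in Listing 12.2):

// standard TTCN-3: discrete, message-based communication via a message port
type record CanFrame { integer id, octetstring data }   // hypothetical frame type
type port CanMsgPortType message { inout CanFrame };

// TTCN-3 embedded: sampled, continuous-valued communication via a stream port
type port SpeedStreamPortType stream { in float };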
To represent the relation to time, so-called timed streams (Broy 1997, Lehmann 2004) are used. Timed streams additionally provide timing information for each stream value and thus enable the traceability of timed behavior. TTCN-3 embedded provides timed streams. In the following, we will use the term (measurement) record to denote the unity of a stream value and the related timing in timed streams. Thus, concerning the recording of continuous data, a record represents an individual measurement, consisting of a stream value that represents the data and timing information that represents the temporal perspective of such a measurement. TTCN-3 embedded sends and receives stream values via ports. The properties of a port are described by means of port types. Listing 12.2 shows the definition of a port type for incoming and outgoing streams of the scalar type float and the definition of a component type that defines instances of these port types (ti_Crank_Speed, ti_Engine_Speed, ∗For further information on TTCN-3 embedded, please refer to (TEMEA 2010). 312 Model-Based Testing for Embedded Systems ti_Throttle_Angle, to_Set_Point, to_Engine_Perturbation) with the characteristics defined by the related port-type specifications (Listing 12.2, Lines 5 and 7). Listing 12.2 Ports type port FloatInPortType stream { in f l o a t } ; 1 type port FloatOutPortType stream {out f l o a t } ; 2 3 type component E n g i n e T e s t e r { 4 port FloatInPortType ti Engine Speed , ti Crank Speed , ti T h r o t t l e A n g l e ; 5 port FloatOutPort to Engine Perturbation , to Set Point ; 6 } 7 With the help of TTCN-3 embedded language constructs, it is possible to modify, access, and assess stream values at ports. Listing 12.3 shows how stream values can be written to an outgoing stream and read from an incoming stream. Listing 12.3 Stream Value Access to Set Point . value := 5000.0; 1 to Set Point . value := ti Engine Speed . value + 2 0 0 . 0 ; 2 Apart from the access to an actual stream value, TTCN-3 embedded provides access to the history of stream values by means of index operators. We provide time-based indices and sample-based indices. The time-based index operator at interprets time parameters as the time that has expired since the test has started. Thus, the expression ti_Engine_Speed.at(10.0).value yields the value that has been available at the stream 10 s after the test case has started. The sample-based index operator prev interprets the index parameter as the number of sampling steps that have passed between the actual valuation and the one that will be returned by the operator. Thus, t_engine.prev(12).value returns the valuation of the stream 12 sampling steps in the past. The expression t_engine.prev.value is a short form of t_engine.prev(1).value. Listing 12.4 shows some additional expressions based on the index operators. Listing 12.4 Stream History Access to Set Point . value := ti Engine Speed . at ( 1 0 . 0 ) . value ; 1 t o S e t P o i n t . value := t i E n g i n e S p e e d . prev . value 2 + t i E n g i n e S p e e d p e r t u r b . prev ( 2 ) . value ; 3 Using the assert statement, we can assess the outcome of the SUT. The assert statement specifies the expected behavior on the SUT by means of relational expressions. Hence, we can use simple relational operators that are already available in standard TTCN-3 and apply them to the stream valuation described above to express predicates on the behavior of the system. 
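For instance, the index operators introduced above can be combined with relational operators inside an assert statement; the following sketch (the thresholds are purely illustrative) requires the engine speed to be non-decreasing and to stay below an assumed upper limit:

assert(ti_Engine_Speed.value >= ti_Engine_Speed.prev.value,   // non-decreasing
       ti_Engine_Speed.value < 8000.0);                        // assumed upper limit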
If any of the predicates specified by an active assert statement is violated, the test verdict is automatically set to fail and the test fails. Listing 12.5 shows the specification of an assertion that checks whether the engine speed is in the range between 1000 and 3000 rpm. Model-Based X-in-the-Loop Testing 313 Listing 12.5 Assert assert ( ti Engine Speed . value > 1000.0 , 1 ti Engine Speed . value < 3000.0) ; 2 // the values must be in the range ]1000.0 , 3000.0[. 3 12.4.1.3 Access to time stamps and sampling-related information To complement the data of a stream, TTCN-3 embedded additionally provides access to sampling-related information, such as time stamps and step sizes, so as to provide access to all the necessary information related to measurement records of a stream. The time stamp of a measurement is obtained by means of the timestamp operator. The timestamp operator yields the exact measurement time for a certain stream value. The exact measurement time denotes the moment when a stream value has been made available to the test system’s input and thus strongly depends on the sampling rate. Listing 12.6 shows the retrieval of measurement time stamps for three different measurement records. Line 3 shows the retrieval of the measurement time for the actual measurement record of the engine speed, Line 4 shows the same for the previous record, and Line 5 shows the time stamp of the last measurement record that has been measured before or at 10 s after the start of the test case. Listing 12.6 Time Stamp Access var float myTimePoint1 , myTimePoint2 , myTimePoint3 ; 1 ... 2 myTimePoint1 := t i E n g i n e S p e e d . timestamp ; 3 myTimePoint2 := t i E n g i n e S p e e d . prev ( ) . timestamp ; 4 myTimePoint3 := t i E n g i n e S p e e d . at ( 1 0 . 0 ) . timestamp ; 5 As already noted, the result of the timestamp operator directly relates to the sampling rate. The result of ti_Engine_Speed.timestamp need not be equal to now, when we consider different sampling rates at ports. The same applies to the expression ti_Engine_Speed.at(10.0).timestamp. Dependent on the sampling rate, it may yield 10.0, or possibly an earlier time (e.g., when the sampling rate is 3.0, we will have measurement records for the time points 0.0, 3.0, 6.0, and 9.0 and the result of the expression will be 9.0). In addition to the timestamp operator, TTCN-3 embedded enables one to obtain the step size that has been used to measure a certain value. This information is provided by the delta operator, which can be used in a similar way as the value and the timestamp operators. The delta operator returns the size of the sampling step (in seconds) that precedes the measurement of the respective measurement record. Thus, ti_Engine_Speed.delta returns: ti_Engine_Speed.timestamp - ti_Engine_Speed.prev.timestamp Please note, TTCN-3 embedded envisions dynamic sampling rates at ports. The delta and timestamp operators are motivated by the implementation of dynamic sampling strategies and thus can only develop their full potential in such contexts. Because of the space limitations, the corresponding, concepts are not explained here. 314 Model-Based Testing for Embedded Systems Listing 12.7 shows the retrieval of the step size for different measurement records. Listing 12.7 Sampling Access var float myStepSize1 , myStepSize2 , myStepSize3 ; 1 ... 2 myStepSize1 := ti Engine Speed . delta ; 3 myStepSize2 := t i E n g i n e S p e e d . prev ( ) . delta ; 4 myStepSize3 := ti Engine Speed . at ( 1 0 . 0 ) . 
delta ; 5 12.4.1.4 Integration of streams with existing TTCN-3 data structures To enable the processing and assessment of stream values by means of existing TTCN-3 statements, we provide a mapping of streams, stream values, and the respective measurement records to standard TTCN-3 data structures, namely records and record-of structures. Thus, each measurement record, which is available at a stream port, can be represented by an ordinary TTCN-3 record with the structure defined in Listing 12.8. Such a record contains fields, which provide access to all value and sampling-related information described in the sections above. Thus, it includes the measurement value (value_) and its type∗ (T), its relation to absolute time by means of the timestamp_ field as well as the time distance to its predecessor by means of the delta_ field. Moreover, a complete stream or a stream segment maps to a record-of structure, which arranges subsequent measurement records (see Listing 12.8, Line 4). Listing 12.8 Mapping to TTCN-3 Data Structures Measurement {T v a l u e , f l o a t d e l t a , f l o a t timestamp } 1 type record of Measurement Float Stream Records ; 2 To obtain stream data in accordance to the structure in Listing 12.8, TTCN-3 embedded provides an operation called history. The history operation extracts a segment of a stream from a given stream port and yields a record-of structure (stream record), which complies to the definitions stated above. Please note, the data type T depends on the data type of the stream port and is set automatically for each operation call. The history operation has two parameters that characterize the segment by means of absolute time values. The first parameter defines the lower temporal limit and the second parameter defines the upper temporal limit of the segment to be returned. Listing 12.9 illustrates the usage of the history operation. We start with the definition of a record-of structure that is intended to hold measurement records with float values. In this context, the application of the history operation in Line 2 yields a stream record that represents the first ten values at ti_Engine_Speed. Please note, the overall size of the record of structure that is the number of individual measurement elements depends on the time interval defined by the parameters of the history operation, as well as on the given sampling rate (see Section 12.4.1). Listing 12.9 The History Operation type record of Measurement Float Stream Records ; 1 var Float Stream Records speed := ti Engine Speed . history ( 0 . 0 , 1 0 . 0 ) ; 2 ∗The type in this case is passed as a type parameter which is possible with the new TTCN-3 advanced parameterization extension (ETSI 2009e). Model-Based X-in-the-Loop Testing 315 We can use the record-of representation of streams to assess complete stream segments. This can be achieved by means of an assessment function, which iterates over the individual measurement records of the stream record, or by means of so-called stream templates, which characterize a sequence of measurement records as a whole. While such assessment functions are in fact only small TTCN-3 programs, which conceptually do not differ from similar solutions in any other programming language, the template concepts are worth explaining in more detail here. A template is a specific data structure that is used to specify the expectations on the SUT not only by means of distinct values but also by means of data-type specific patterns. 
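Such stream records can then be processed with ordinary TTCN-3 functions. The following sketch (assuming the Measurement field names introduced above, value_ in particular) computes the maximum value within a stream segment, which could later be used, for example, to check for overshoot:

function maxOfSegment(in Float_Stream_Records seg) return float {
  var integer i := 1;
  var float maxVal := seg[0].value_;
  while (i < sizeof(seg)) {
    // keep the largest measurement value seen so far
    if (seg[i].value_ > maxVal) { maxVal := seg[i].value_; }
    i := i + 1;
  }
  return maxVal;
}

// maximum engine speed observed during the first ten seconds
var float maxSpeed := maxOfSegment(ti_Engine_Speed.history(0.0, 10.0));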
These patterns allow, among other things, the definition of ranges (e.g., field := (lowerValue .. upperValue)), lists (e.g., field := (1.0, 10.0, 20.0)), and wildcards (e.g., field := ? for any value or field := * for any or no value). Moreover, templates can be applied to structured data types and record-of structures. Thus, we are able to define structured templates that have fields with template values. Last but not least, templates are parameterizable so that they can be instantiated with different value sets.∗ Values received from the SUT are checked against templates by means of certain statements. • The match operation already exists in standard TTCN-3. It tests whether an arbitrary template matches a given value completely. The operation returns true if the template matches and false otherwise. In case of a record-of representation of a stream, we can use the match operation to check the individual stream values with templates that conform to the type definitions in Section 12.8. • The find operation has been newly introduced in TTCN-3 embedded. It scans a recordstructured stream for the existence of a structure that matches the given template. If such a structure exists, the operation returns the index value of the matching occurrence. Otherwise, it returns −1. In case of a record-of representation of a stream, we can use the find statement to search the stream for the first occurrence of a distinct pattern. • The count operation has been newly introduced in TTCN-3 embedded as well. It scans a record-structured stream and counts the occurrences of structures that match a given template. The operation returns the number of occurrences. Please note, the application of the count-operation is not greedy and checks the templates iteratively starting with each measurement record in a given record-of structure. Listing 12.10 shows a usage scenario for stream templates. It starts with the definition of a record template. The template specifies a signal pattern with the following characteristics: Listing 12.10 Using Templates to Specify Signal Shapes 1 template Float Stream Record toTest := { 2 { value := ? , d e l t a := 0 . 0 , timestamp := ?} , 3 { value := (1900.0 . . 2100.0) , d e l t a := 2 . 0 , timestamp := ?} , 4 { value := (2900.0 . . 3100.0) , d e l t a := 2 . 0 , timestamp := ?} , 5 { value := (2950.0 . . 3050.0) , d e l t a := 2 . 0 , timestamp := ?} , 6 { value := ? , d e l t a := 2 . 0 , timestamp := ?} 7 } 8 9 ∗Please note that the TTCN-3 template mechanism is a very powerful concept, which cannot be explained in full detail here. 316 Model-Based Testing for Embedded Systems // checks , whether a d i s t i n c t segment conforms to the template toTest 10 match ( t i E n g i n e S p e e d . h i s t o r y ( 2 . 0 , 1 0 . 0 ) , t o T e s t ) ; 11 // f i n d s the f i r s t occurrence of a stream segment that conforms to toTest 12 find ( ti Engine Speed . history (0.0 , 100.0) , toTest ); 13 // counts a l l occurrences of stream segments that conform to toTest 14 count ( t i E n g i n e S p e e d . history ( 0 . 0 , 1 0 0 . 0 ) , toTest ) ; 15 the signal starts with an arbitrary value; after 2 s, the signal value is between 1900 and 2100; after the next 2 s, the signal value reaches a value between 2900 and 3100, thereafter (i.e., 2 s later), it reaches a value between 2950 and 3050, and finally ends with an arbitrary value. Please note, in this case, we are not interested in the absolute time values and thus allow arbitrary values for the timestamp_ field. 
One should note that a successful match requires that the stream segment and the template have the same length. If this is not the case, the match operation fails. Such a definition of stream templates (see Listing 12.10) can be cumbersome and time consuming. To support the specification of more complex patterns, we propose the use of generation techniques and automation, which can easily be realized by means of TTCN-3 functions and parameterized templates.∗ Listing 12.11 shows such a function that generates a template record out of a given record. The resulting template allows checking a stream against another stream (represented by means of the Float_Stream_Record myStreamR) and additionally allows a parameterizable absolute tolerance for the value side of the stream. Listing 12.11 Template Generation Function 1 function generateTemplate ( in F l o a t S t r e a m R e c o r d myStreamR , in f l o a t t o l V a l ) 2 return template Float Stream Record { 3 var integer i ; 4 var template Float Stream Record toGenerate 5 template Measurement t o l e r a n c e P a t t e r n ( in f l o a t delta , 6 in float value , 7 in float t o l ) := { 8 d e l t a := delta , 9 value := ( ( value − ( t o l / 2 . 0 ) ) . . ( value + ( t o l / 2 . 0 ) ) ) , 10 timestamp := ? 11 } 12 13 for ( i := 0 , i < s i z e o f ( myStreamR ) , i := i + 1){ 14 toGenerate [ i ] := t o l e r a n c e P a t t e r n ( myStreamR [ i ] . delta , 15 myStreamR [ i ] . value , t o l V a l ) ; 16 } 17 return toGenerate ; 18 } 19 ∗Future work concentrates on the extensions for the TTCN-3 template language to describe repetitive and optional template groups. This will yield a regular expression like calculus, which provides a much more powerful means to describe the assessment for stream records. Model-Based X-in-the-Loop Testing 317 These kind of functions are not intended to be specified by the test designer, but rather provided as part of a library. The example function presented here is neither completely elaborated nor does it provide the sufficient flexibility of a state-of-the-art library function. It is only intended to illustrate the expressive power and the potential of TTCN-3 and TTCN-3 embedded. 12.4.1.5 Control flow So far, we have only reflected on the construction, application, and assessment of single streams. For more advanced test behavior, such as concurrent applications, assessment of multiple streams, and detection of complex events (e.g., zero crossings or flag changes), richer capabilities are necessary. For this purpose, we combine the concepts defined in the previous section with state machine-like specification concepts, called modes. Modes are well known from the theory of hybrid automata (Alur, Henzinger, and Sontag 1996, Lynch et al. 1995, Alur et al. 1992). A mode is characterized by its internal behavior and a set of predicates, which dictate the mode activity. Thus, a simple mode specification in TTCN-3 embedded consists of three syntactical compartments: a mandatory body to specify the mode’s internal behavior; an invariant block that defines predicates that must not be violated while the mode is active; and a transition block that defines the exit condition to end the mode’s activity. Listing 12.12 Atomic Mode cont { //body 1 // ramp , t h e v a l u e i n c r e a s e s a t any time s t e p by 3 2 t o S e t P o i n t . value := 3 . 0 ∗ now ; 3 // constant s i g n a l 4 to Engine Perturbation . value := 0 . 
0 ; 5 } 6 inv { // invariants 7 // s t o p s when the s e t p o i n t exceeds a v a l u e o f 20000.0 8 to Set Point . value > 20000.0; 9 } 10 until { // t r a n s i t i o n 11 [ t i E n g i n e S p e e d . value > 2 0 0 0 . 0 ] { t o E n g i n e P e r t u r b a t i o n . value := 2 . 0 ; } 12 } 13 In the example in Listing 12.12, the set point value to_Set_Point increases linearly in time and the engine perturbation to_Engine_Perturbation is set constantly to 0.0. This holds as long as the invariant holds and the until condition does not fire. If the invariant is violated, that is the set point speed exceeds 20000.0, an error verdict is set and the body action stops. If the until condition yields true, that is the value of ti_Engine_Speed exceeds 2000.0, the to_Engine_Perturbation value is set to 2.0 and the body action stops. To combine different modes into larger constructs, we provide parallel and sequential composition of individual modes and of composite modes. The composition is realized by the par operator (for parallel composition) and the seq operator (for sequential composition). Listing 12.13 shows two sequences, one for the perturbation actions and other for the assessment actions, which are themselves composed in parallel. 318 Model-Based Testing for Embedded Systems Listing 12.13 Composite Modes par { // o v e r a l l p e r t u r b a t i o n and assessment 1 seq{// perturbation sequence 2 cont{// perturbation action 1} 3 cont{// perturbation action 2} 4 ...} 5 seq{// assessment sequence 6 cont{// assessment action 1} 7 cont{// assessment action 1} 8 ...} 9 } 10 In general, composite modes show the same structure and behavior as atomic modes as far as invariants and transitions are concerned. Hence, while being active, each invariant of a composite mode must hold. Additionally, each transition of a composite mode ends the activity of the mode when it fires. Moreover, a sequential composition ends when the last contained mode has finished and a parallel composition ends when all contained modes have finished. Furthermore, each mode provides access to an individual local clock that returns the time that has passed since the mode has been activated. The value of the local clock can be obtained by means of the keyword duration. Listing 12.14 Relative Time seq{// p e r t u r b a t i o n sequence 1 cont{ t o S e t P o i n t . value := 20000.0;} until ( duration > 3 . 0 ) 2 cont{ t o S e t P o i n t . value := 40000.0 + 100.0 ∗ duration ;} until ( duration > 2 . 0 ) 3 } until ( duration > 4.0) 4 Listing 12.14 shows the definition of three modes, each of which has a restricted execution duration. The value of to_Set_Point is increased continuously by means of the duration property. The duration property is defined locally for each mode. Thus, the valuation of the property would yield different results in different modes. 12.4.2 Specification of reusable entities This section presents more advanced concepts of TTCN-3 embedded that are especially dedicated to specifying reusable entities. We aim to achieve a higher degree of reusability by modularizing the test specifications and supporting the specification of abstract and modifiable entities. Some of these concepts are well known in computer language design and already available in standard TTCN-3. However, they are only partially available in state-ofthe-art test design tools for continuous and hybrid systems and they must be adapted to the concepts we introduced in Section 12.4.1. 
The concepts dedicated to support modularization and modification are the following: • Branches and jumps to specify repetitions and conditional mode execution. • Symbol substitution and referencing mechanisms. • Parameterization. Model-Based X-in-the-Loop Testing 319 12.4.2.1 Conditions and jumps Apart from the simple sequential and parallel composition of modes, stronger concepts to specify more advanced control flow arrangements, such as conditional execution and repetitions are necessary. TTCN-3 already provides a set of control structures for structured programming. These control structures, such as if statements, while loops, and for loops are applicable to the basic TTCN-3 embedded concepts as well. Hence, the definition of mode repetitions by means of loops, as well as the conditional execution of assertions and assignments inside of modes are allowed. Listing 12.15 shows two different use cases for the application of TTCN-3 control flow structures that directly interact with TTCN-3 embedded constructs. In the first part of the listing (Lines 4 and 6), an if statement is used to specify the conditional execution of assignments inside a mode. In the second part of the listing (Lines 10–14), a while loop is used to repeat the execution of a mode multiple times. Listing 12.15 Conditional Execution and Loops cont { // body 1 // ramp u n t i l d u r a t i o n >= 4 . 0 2 i f ( duration < 4 . 0 ) { t o S e t P o i n t . value := 3 . 0 ∗ now; } 3 // afterwards the value remains constant 4 else { t o S e t P o i n t . value := t o S e t P o i n t . prev . value ; } 5 } 6 7 // saw tooth s i g n a l for 3 minutes with a period of 5.0 seconds 8 while (now < 1 8 0 . 0 ) { 9 cont { 10 to Set Point . value := 3.0 ∗ duration ; 11 } until ( duration > 5.0) 12 } 13 For full compatibility with the concepts of hybrid automata, definition of so-called transitions must be possible as well. A transition specifies the change of activity from one mode to another mode. In TTCN-3 embedded, we adopt these concepts and provide a syntax, which seamlessly integrates with already existing TTCN-3 and TTCN-3 embedded concepts. As already introduced in the previous section, transitions are specified by means of the until block of a mode. In the following, we will show how a mode can refer to multiple consecutive modes by means of multiple transitions and how the control flow is realized. A transition starts with a conditional expression, which controls the activation of the transition. The control flow of transitions resembles the control flow of the already existing (albeit antiquated) TTCN-3 label and goto statements. These statements have been determined sufficiently suitable for specifying the exact control flow after a transition has fired. Thus, there is no need to introduce additional constructs here. A transition may optionally contain arbitrary TTCN-3 statements to be executed when the transition fires. Listing 12.16 illustrates the definition and application of transitions by means of pseudo code elements. The predicate is an arbitrary predicate expression that may relate to time values or stream values or both. The may contain arbitrary TTCN-3 or TTCN-3 embedded statements except blocking or timeconsuming statements (alt statements and modes). Each goto statement relates to a label definition that specifies the place where the execution is continued. 
320 Listing 12.16 Transitions Model-Based Testing for Embedded Systems label labelsymbol 1 1 cont {} until { 2 [< a c t i v a t i o n p r e d i c a t e >] {< o p t i o n a l s t a t e m e n t l i s t >} goto l a b e l s y m b o l 2 ; 3 [< a c t i v a t i o n p r e d i c a t e >] {< o p t i o n a l s t a t e m e n t l i s t >} goto l a b e l s y m b o l 3 ; 4 } 5 label labelsymbol 2 ; cont {} goto labelsymbol 4 ; 6 label labelsymbol 3 ; cont {} goto labelsymbol 1 ; 7 label labelsymbol 4 ; 8 Listing 12.17 shows a more concrete example that relates to our example from Section 12.3.3. We define a sequence of three modes that specify the continuous valuation of the engine’s set point (to_Set_Point), depending on the engine’s speed (engine_speed). When the engine speed exceeds 2000.0 rpm, the set point is decreased (goto decrease), otherwise it is increased (goto increase). Listing 12.17 Condition and Jumps t e s t c a s e myTestcase ( ) runs on E n g i n e T e s t e r { 1 2 // r e u s a b l e mode a p p l i c a t i o n 3 cont { t o S e t P o i n t . value := 3 . 0 ∗ now; } 4 until { 5 [ duration > 1 0 . 0 and e n g i n e s p e e d . value > 2 0 0 0 . 0 ] goto i n c r e a s e ; 6 [ duration > 1 0 . 0 and e n g i n e s p e e d . value <= 2 0 0 0 . 0 ] goto d e c r e a s e 7 } 8 label increase ; 9 cont { t o S e t P o i n t . value := 3 ∗ now; } u n t i l { [ duration > 1 0 . 0 ] goto end ; } 10 label decrease ; 11 cont { t o S e t P o i n t . value := 3 ∗ now; } u n t i l ( duration > 1 0 . 0 ) 12 label end ; 13 } 14 12.4.2.2 Symbol substitution and referencing Similar to the definition of functions and functions calls, it is possible to declare named modes, which can then be referenced from any context that would allow the explicit declaration of modes. Listing 12.18 shows the declaration of a mode type (Line 1), a named mode∗ (Line 4) and a reference to it within a composite mode definition (Line 12). ∗Named modes, similar to other TTCN-3 elements that define test behavior, can be declared with a runs on clause, in order to have access to the ports (or other local fields) of a test component type. Model-Based X-in-the-Loop Testing 321 Listing 12.18 Symbol Substitution type mode ModeType ( ) ; 1 2 // r e u s a b l e mode d e c l a r a t i o n 3 mode ModeType p e r t s e q ( ) runs on E n g i n e T e s t e r seq { 4 cont { t o S e t P o i n t . value := 2 0 0 0 . 0 } u n t i l ( duration >= 2 . 0 ) 5 cont { t o S e t P o i n t . value := 2000.0 + duration / t o S e t P o i n t . delta ∗ 10.0} 6 u n t i l ( duration >= 5 . 0 ) 7 } 8 9 t e s t c a s e myTestcase ( ) runs on E n g i n e T e s t e r { 10 par { 11 p e r t s e q ( ) ; // r e u s a b l e mode a p p l i c a t i o n 12 cont { a s s e r t ( e n g i n e s p e e d . value >= 5 0 0 . 0 ) } 13 } until ( duration > 10.0) 14 } 15 12.4.2.3 Mode parameterization To provide a higher degree of flexibility, it is possible to specify parameterizable modes. Values, templates, ports, and modes can be used as mode parameters. Listing 12.19 shows the definition of a mode type, which allows the application of two float parameters and the application of one mode parameter of the mode type ModeType. Listing 12.19 Mode Parameterization type mode ModeType2 ( i n f l o a t s t a r t V a l , i n f l o a t i n c r e a s e , i n ModeType a s s e r t i o n ) ; 1 2 // r e u s a b l e mode d e c l a r a t i o n 3 mode ModeType a s s e r t m o d e ( ) runs on E n g i n e T e s t e r := 4 cont { a s s e r t ( e n g i n e s p e e d . value >= 5 0 0 . 
0 ) } 5 6 mode ModeType2 p e r t s e q 2 ( i n f l o a t s t a r t V a l , 7 in float increase , 8 in ModeType a s s e r t i o n ) 9 runs on E n g i n e T e s t e r par { 10 seq{// p e r t u r b a t i o n sequence 11 cont { t o S e t P o i n t . value := s t a r t V a l } u n t i l ( duration >= 2 . 0 ) 12 cont { t o S e t P o i n t . value := s t a r t V a l + duration / t o S e t P o i n t . d e l t a ∗ i n c r e a s e } 13 u n t i l ( duration >= 5 . 0 ) 14 } 15 assertion (); 16 } 17 18 t e s t c a s e myTestcase ( ) runs on E n g i n e T e s t e r { 19 // r e u s a b l e mode a p p l i c a t i o n 20 pert seq 2 (1000.0 , 10.0 , assert mode ); 21 p e r t s e q 2 ( 5 0 0 0 . 0 , 1 . 0 , cont { a s s e r t ( e n g i n e s p e e d . value >= 0 . 0 ) } ) ; 22 } 23 322 Model-Based Testing for Embedded Systems Lines 23 and 24 illustrate the application of parameterizable reusable modes. Line 23 applies pert_seq_2 and sets the parameter values for the initial set point to 1000.0 and the parameter for the increase to 10.0, and the assert_mode is passed as a parameter to be applied within the pert_seq_2 mode. Line 24 shows a more or less similar application of pert_seq_2, where an inline mode declaration is passed as the mode parameter. 12.5 Reuse of Closed Loop Test Artifacts Thus far, we have discussed reusability on the level of test specification language elements (referencing mechanisms, modifications operators, parameterization), in this section, we additionally focus on the reuse of the high-level artifacts of our generic closed loop architecture for testing. In this context, however, different aspects must be taken into account. Three-tier development and closed loop testing (MiL, SiL, HiL) methodologies have an increased need for traceability and consistency between the different testing levels. Consider a scenario in which one subcontractor delivers test artifacts that are to be used on components from various suppliers. These components in turn may often utilize the same basic infrastructure and parts of a generic environment, which also contributes to increased reuse potential. Furthermore, there may be conformance tests for certain components provided by standardization organizations, which should again be reusable at the same testing level across different implementations, both of the components, but potentially also of the environment they shall operate in. Therefore, despite impeding factors of an organizational nature when production code is concerned, in the scope the current context, there are many organizational factors that not only facilitate the reuse of test artifacts but also increase the need and potential for reuse. In the following, a general approach for vertical and horizontal test specification and test model reuse at different closed loop test levels will be presented and accompanied by an example that outlines the practical reuse of TTCN-3 embedded specifications and the environment model from Section 12.3.3 in a MiL and a HiL scenario. Since closed loop tests, in general, require a complex test environment, an appropriate test management process and tool support throughout the life cycle of the SUT is required. This subject is addressed in the last subsection. 12.5.1 Horizontal reuse of closed loop test artifacts Reusable test suites can be developed using concepts such as symbol substitution, referencing, and parameterization. 
Environment modeling based on Simulink® provides modularization concepts such as subsystems, libraries, and model references, which facilitate the reusability of environment models. Since the generic closed loop architecture for testing clearly separates the heterogeneous artifacts by using well-defined interfaces, the notation-specific modularization and reuse concepts can be applied without interfering with each other. By lifting reuse to the architecture level, at least the environment model and the specification of the perturbation and assessment functionality can be (re-)used with different SUTs. The SUTs may differ in type and in version, but as long as they are built on common interfaces and share common characteristics, this kind of reuse is applicable. In general, the reuse of test specifications across different products is nowadays common for testing products or components that are based on a common standard (conformance testing). The emerging standardization efforts in embedded systems development (e.g., AUTOSAR [AUTOSAR Consortium 2010]) indicate the growing need for such an approach.

12.5.2 Vertical reuse of environment models

Depending on the type of the SUT, varying testing methods on different test levels may be applied, each suited for a distinct purpose. With the MiL test, the functional aspects of the model are validated. SiL testing is used to detect errors that result from software-specific issues, for instance, the usage of fixed-point arithmetic. MiL and SiL tests are used in the early design and development phases, primarily to discover functional errors within the software components. These types of test can be executed on common PC hardware, for instance through co-simulation, and are therefore not suitable for addressing real-time matters. To validate the types of issues that result from the usage of specific hardware, HiL tests must be used. In principle, the SUTs exhibit the same logical functionality through all testing levels. However, they are implemented with different technologies and show different integration levels with other components, including the hardware. In the case of different implementation technologies, which often result in interfaces with the same semantics but different technological constraints and access methods, the reuse of the environment model and the perturbation and assessment specifications is straightforward. The technological adaptation is realized by means of the mapping components that bridge the technological as well as the abstraction gap between the SUT and the environment (Figure 12.5).

FIGURE 12.5 Vertical reuse of environment models (the same computation layer with abstract environment model, perturbation, and assessment is connected to a model, to code, or to an ECU via MiL, SiL, and HiL adapters).

12.5.3 Test reuse with TTCN-3 embedded, Simulink®, and CANoe

In Section 12.3.3, we provided an example that shows the application of our ideas within a simple MiL scenario. This section demonstrates the execution of test cases in a MiL scenario and outlines the reuse of some of the same artifacts in a HiL scenario. It provides a proof of concept illustration of the applicability of our ideas. The main artifacts for reuse are the environment model (i.e., generic test scenarios) and the test specifications. The reuse of the test specifications depends on their level of abstraction (i.e., the semantics of the specification must fit the test levels we focus on) and on some technological issues (e.g., the availability of a TTCN-3 test engine for the respective test platform).

FIGURE 12.6 Integration with Vector CANoe (the Throttle_Ctrl, TTCN-3_Spec, and Engine_Env nodes connected via a simulated CAN bus).

Within this example,
Within this example, MiL adapter SiL adapter HiL adapter Model Code ECU MiL adapter SiL adapter HiL adapter Computation layer Abstract environment model Perturbation Assessment FIGURE 12.5 Vertical reuse of environment models. 324 Model-Based Testing for Embedded Systems ECU Throttle_Ctrl prog ECU TTCN-3_Spec prog ECU Engine_Env prog Bus Can Can 1 FIGURE 12.6 Integration with Vector CANoe. we show that we are able to find the most appropriate level of abstraction for the test specifications and the environment model. The technological issues are not discussed here. Tests on MiL level are usually executed in the underlying MATLAB r simulation. The environment and the controller have a common time base (the simulation time). Based on the well-tested controller model, the object code will be generated, compiled, and deployed. On PiL level, the target hardware provides its own operating system and thus its own time base. Furthermore, it will be connected to other controllers, actuators, and sensors via a bus system (e.g., CAN bus, FlexRay Bus, etc.) using a HiL environment. In principle, we use pre- and postprocessing components to link our test system to the bus system and/or to analog devices and thus to the controller under test. The wellestablished toolset CANoe (Vector Informatics 2010) supports such kind of integration between hardware-based controllers, Simulink r -based environment models and test executables by means of software-driven network simulations (see Figure 12.6). To reuse the TTCN-3 embedded test specification, we only need a one time effort to build a TTCN-3 embedded test adapter for Simulink r and CANoe. Both being standard tools, most of the effort can be shared. We refer to a model-based development process and specify tests for the controller that was already introduced in Section 12.3.3. The controller regulates the air flow and the crankshaft of a four-cylinder spark ignition engine with an internal combustion engine (Figure 12.3). From the requirements, we deduce abstract test cases. They can be defined semiformally and understandably in natural language. We will use the following test cases as guiding examples. • The set point for the engine speed changes from 2000 to 5000 rpm. The controller should change the throttle angle in such a way that the system takes less than 5 s to reach the desired engine speed with a deviation of 100 rpm. • The set point for the engine speed falls from 7000 to 5000 rpm. The system should take less than 5 s to reach the desired engine speed with a deviation of 100 rpm. No over-regularizations are allowed. • The engine speed sensor data is measured up to an uncertainty of 10 rpm. The control must be stable and robust for that range. Given the manipulated variable, the throttle angle, the deviation caused by some disturbance should be less than 100 rpm. Model-Based X-in-the-Loop Testing 325 In the next step, we implement the test cases, that is, concretize them in TTCN-3 embedded so that they can be executed by a test engine. The ports of our system are as defined in Listing 12.2 (based on the test architecture in Figure 12.4). The first and the second abstract test cases analyze the test behavior in two similar situations. We look into the system when the engine speed changes. In the first test case, it jumps from 2000 to 5000 rpm and in the second from 7000 to 5000 rpm. In order to do the specification job only once, we define a parameterizable mode that realizes a step function (Listing 12.20). 
The engine speed is provided by an environment model. The same applies for the other data. To test the robustness (see abstract test case 3), we must perturb the engine speed. This is realized by means of the output to_Engine_Perturbation. Listing 12.20 Speed Jump type mode Set Value Jump ( in f l o a t s t a r t V a l , in f l o a t endVal ) ; 1 2 mode Set Value Jump p e r t s e q ( in f l o a t s t a r t V a l , in f l o a t endVal ) 3 runs on T h r o t t l e C o n t r o l T e s t e r seq { 4 cont { t o S e t P o i n t . value := s t a r t V a l } u n t i l ( duration >= 5 . 0 ) 5 // t h e f i r s t 5 s e c o n d s t h e s e t p o i n t i s g i v e n as s t a r t V a l rpm 6 cont { t o S e t P o i n t . value := endVal } u n t i l ( duration >= 1 0 . 0 ) 7 // t h e n e x t 10 s e c o n d s t h e s e t p o i n t s h o u l d be endVal rpm 8 } 9 10 testcase TC Speed Jump ( ) runs on T h r o t t l e C o n t r o l T e s t e r { 11 // r e u s a b l e mode a p p l i c a t i o n 12 pert seq (2000.0 , 5000.0); 13 } 14 A refined and more complex version of the mode depicted above, which uses linear interpolation and flexible durations, can easily be developed by using the ideas depicted in Listing 12.17. In order to assess the tests, we have to check whether the controller reacts in time. For this purpose, we have to check whether the values of a stream are in a certain range. This can be modeled by means of a reusable mode too (Listing 12.21). Listing 12.21 Guiding Example Assertion type mode Range Check ( in f l o a t s t a r t T i m e , in f l o a t endTime , in f l o a t s e t V a l u e , 1 in f l o a t Dev , out FloatInPortType measuredStream ) ; 2 3 // r e u s a b l e mode d e c l a r a t i o n 4 mode Range Check r a n g e c h e c k ( in f l o a t s t a r t T i m e , 5 in float endTime , in float setValue , 6 in f l o a t Dev , out FloatInPortType measuredStream ) 7 seq { 8 // wait u n t i l the startTime 9 cont {} u n t i l ( duration >= s t a r t T i m e ) ; 10 // check the engine speed u n t i l the endTime was reached 11 cont { a s s e r t ( measuredStream . value >= ( s e t V a l u e − Dev ) && 12 326 Model-Based Testing for Embedded Systems measuredStream . value <= ( s e t V a l u e + Dev ) ) } 13 u n t i l ( duration >= endTime ) 14 } 15 The executable test specification of the complete abstract test case TC_Speed_Jump is given in Listing 12.22. Listing 12.22 Guiding Example Test Specification testcase TC Speed Jump ( ) runs on T h r o t t l e C o n t r o l T e s t e r { par{ // s e t the jump pert seq (2000.0 , 5000.0); // check the control signal , where Desired Speed i s // the assumed desired value range check (10.0 , 10.0 , Desired Speed , 100.0 , ti Engine Speed } } 1 2 3 4 5 6 7 ); 8 9 10 In order to check for a possible overshoot, we will use the maximum value of a stream over a distinct time interval. This can be easily realized by using the constructs introduced in Section 12.4, thus we will not elaborate further on this here. To test the robustness, a random perturbation of the engine speed is necessary. It is specified by means of the random value function function rnd(float seed) return float of TTCN-3. The function returns a random value between 0.0 and 1.0. Listing 12.23 shows the application of the random function to the engine perturbation port. Listing 12.23 Random Perturbation // rnd ( f l o a t seed ) r e t r i e v e s random v a l u e s between [ 0 , 1 ] 1 to Engine Perturbation := rnd ( 0 . 2 ) ∗ 20.0 − 1 0 . 
This random function can be used to define the perturbation resulting from uncertain measurement. To formulate a proper assessment for the third abstract test case, the parameterized mode range_check can be reused with the stream ti_Throttle_Angle. We will leave this exercise for the reader.

By using a proper TTCN-3 embedded test adapter, we can perform the tests in a MATLAB® simulation and, as outlined above, in a CANoe HiL environment as well. Figure 12.7 shows the result of a test run of the first two test cases defined above with TTCN-3 embedded and Simulink®. While the upper graph shows the progress of the set point value (i.e., to_Set_Point) and the engine speed value (i.e., ti_Engine_Speed), the lower graph represents the time course of the throttle angle (i.e., ti_Throttle_Angle). This example shows the reusability of TTCN-3 embedded constructs; together with co-simulation, we establish an integrated test process over vertical testing levels.

FIGURE 12.7 Test run with Simulink® and TTCN-3 embedded.

12.5.4 Test asset management for closed loop tests

Closed loop tests require extended test asset management with respect to reusability because several different kinds of reusable test artifacts are involved. Managing test data systematically relies on a finite data set representation. For open loop tests, this often consists of input data descriptions that are defined by finitely many support points and interpolation prescriptions, such as step functions, ramps, or splines. The expectation is usually described by the expected values or ranges at specific points in time or time spans. This no longer works for closed loop tests. In such a context, an essential part of a test case specification is defined by a generic abstract environment model, which may contain rather complex algorithms. At a different test level, this software model may be substituted by compiled code or a hardware node. In contrast to open loop testing, there is a need for a distinct development and management process incorporating all additional assets into the management of the test process. In order to achieve reproducibility and reusability, the asset management process must be carefully designed to fit into this context. The following artifacts, which uniquely characterize closed loop test specifications, must be taken into consideration:

• Abstract environment model (the basic generic test scenario).

• Designed perturbation and assessment specification.

• Corresponding pre- and postprocessing components.

All artifacts that are relevant for testing must be versioned and managed in a systematic way. Figure 12.8 outlines the different underlying supplemental processes. The constructive development process that produces the artifacts to be tested is illustrated on the right-hand side. Parallel to it, an analytic process takes place. Whereas in open loop testing the tests are planned, defined, and performed within this analytic process, in closed loop architectures there is a need for an additional complementary development process for the environment models. The skills required to develop such environment models and the points of interest they have to meet distinguish this complementary process from the development processes of the tests and the corresponding system.
FIGURE 12.8 Management of test specifications.

12.6 Quality Assurance and Guidelines for the Specification of Reusable Assets

As outlined above, TTCN-3 embedded, like standard TTCN-3, has been designed with reusability in mind and provides a multitude of variability mechanisms. This implies that similar aspects shall be taken into consideration for the specification of reusable assets using TTCN-3 embedded as well. Reusability is inevitably connected to quality, especially when the development of reusable test assets is concerned. Reusability was even identified as one of the main quality characteristics in the proposed quality model for test specifications (Zeiß et al. 2007). On the other hand, quality is also critically important for reusability. Reusable assets must be of particularly high quality since deficiencies in such assets will have a much larger impact on the systems they are reused in. Defects within reusable assets may or may not affect any given system in which they are used. Furthermore, modifications to remedy an issue in one target system may affect other target systems, both positively and negatively. Thus, special care should be taken to make sure that the reusable assets are of the necessary quality level.

Quality may further affect reusability in terms of adaptability and maintainability. The assets may have to be adapted in some contexts, and they must be maintained to accommodate others, to extend or improve functionality, to correct issues, or simply to be reorganized for even better reusability. If the effort for maintenance or adaptation is too high, it will offset (part of) the benefits of having reusable test assets. Hence, quality is even more important to reuse than reuse is to quality, and thus quality assurance is necessary for the effective development of reusable assets with TTCN-3 embedded. Furthermore, if an established validation process is employed, the use of validated reusable libraries would increase the percentage of validated test code in the testware, as noted in Schulz (2008). Establishing such a process, on the other hand, will increase the confidence in the reusable assets. A validation process will again involve quality assurance.

In addition to the validation of reusable assets, an assessment of the actual reusability of different assets may be necessary to determine possible candidates for validation and possible candidates for further improvement. This can be achieved by establishing reusability goals and means to determine whether these goals are met (e.g., by defining metrics models or through testability analysis). If they are not met, either the asset is not suitable for reuse, or its implementation does not adhere to the reusability specifications for that asset. In Mäki-Asiala (2004), two metrics for the quantitative evaluation of reusability are illustrated in a small case study. The metrics themselves were taken from Caruso (1993).
There are ongoing studies that use advanced approaches to assess the reusability of software components (Sharma, Grover, and Kumar 2009). Such approaches could be adapted to suit the test domain. While there is a significant body of work on quality assurance for standard TTCN-3 (Bisanz 2006, Neukirchen, Zeiß, and Grabowski 2008, Neukirchen et al. 2008, Zeiß 2006), quality assurance measures for TTCN-3 embedded remain to be studied, as TTCN-3 embedded is still in the draft stage. Similar to standard TTCN-3, metrics, patterns, code smells, guidelines, and refactorings should be defined to assess the quality of test specifications in TTCN-3 embedded, detect issues, and correct them efficiently. Based on a survey of existing methodologies, a few examples for quality assurance items for TTCN-3 embedded that are related to reusability will be briefly outlined below. The main difficulty in the design of TTCN-3 libraries, as identified by Schulz 2008, is to anticipate the evolution of use of libraries. Therefore, modularization, in the form of separation of concepts and improved selective usage, and a layered structure of library organization are suggested as general guiding principles when developing libraries of reusable assets. Furthermore, in Schulz 2008, it is also recommended to avoid component variables and timers, as well as local verdicts and stop operations (unless on the highest level, that is, not within a library), when designing reusable behavior entities. The inability to pass functions as parameters and the lack of an effective intermediate verdict mechanism are identified as major drawbacks of the language. Meanwhile, the first issue has been resolved within the TTCN-3 standard by means of an extension package that enables so-called behavior types to be passed as parameters to functions, testcases, and altsteps (ETSI 2010). Similarly, in TTCN-3 embedded modes can be passed as parameters. Generic style guidelines that may affect the reusability potential of assets are largely transferable to TTCN-3 embedded, for example, • Restricting the nesting levels of modes. • Avoiding duplicated segments in modes. • Restricting the use of magic values (explicit literal or numerical values) or if possible avoiding them altogether. • Avoiding the use of over-specific runs on statements. • Proper grouping of certain closely related constructs. • Proper ordering of constructs with certain semantics. In Ma¨ki-Asiala (2004), 10 guidelines for the specification of reusable assets in TTCN-3 were defined. These are concerned with the reusability of testers in concurrent and nonconcurrent contexts, the use and reuse of preambles and postambles, the use of high-level functions, parameterization, the use of selection structures, common types, template modifications, wildcards, and modularization based on components and on features. The guidelines are also related to the reusability factors that contributed to their development. The guidelines are rather generic and as such also fully applicable to TTCN-3 embedded. In Ma¨ki-Asiala, K¨arki, and Vouffo (2006), four additional guidelines specific to the vertical reuse viewpoint are defined. They involve the separation of test configuration from test behavior, the exclusive use of the main test component for coordination and synchronization, redefinition of existing types to address new testing objectives, and the specification of 330 Model-Based Testing for Embedded Systems system- and configuration- related data as parameterized templates. 
Again, these guidelines are valid for TTCN-3 embedded as well. In addition, they can be adapted to the specific features of TTCN-3 embedded: the functional description should ideally be separated from the real-time constraints, and continuous behavior specifications should be separated from noncontinuous behavior. At this stage, only guidelines based on theoretical assumptions and analogies from similar domains can be proposed. The ultimate test for any guideline is putting it into practice. Apart from validating the effectiveness of guidelines, practice also helps with the improvement and extension of existing guidelines, as well as with the definition of new guidelines.

When discussing guidelines for the development of reusable real-time components, the conflicts between performance requirements on the one side and reusability and maintainability on the other are often cited in the literature (Häggander and Lundberg 1998). TTCN-3 embedded, however, abstracts from the specific test platform, and thus issues associated with test performance can be largely neglected at the test specification level. The guidelines shall therefore disregard performance. Ideally, it should be the task of the compilation and adaptation layers to ensure that real-time requirements are met.

The quality issues that may occur in test specifications implemented in TTCN-3 embedded (and particularly those that affect reusability) and the means for their detection and removal remain to be studied in more detail. There is ongoing research within the TEMEA project concerning the quality assurance of test specifications implemented in TTCN-3 embedded. As of this writing, there are no published materials on the subject. Once available, approaches to quality assurance can ultimately be integrated into a test development process and supported by tools to make the development of high-quality reusable test specifications a seamless process. Other future prospects include approaches and tool support for determining the reusability potential of assets both during design and during implementation, to support both the revolutionary and the evolutionary approaches to reuse.

12.7 Summary

The “X-in-the-Loop” testing approach both suggests and presupposes enormous reusability potential. During the development cycle of embedded systems, software models are reused directly (for code generation) or indirectly (for documentation purposes) for the development of the software. The developed software is then integrated into the hardware (with or without modifications). Thus, it makes sense to reuse tests throughout all of the development phases. Another hidden benefit is that tests extended at the SiL and HiL levels can be reused back at earlier levels (if new test cases identified at later levels are also applicable to earlier levels). If, on the other hand, a strict cycle is followed, where changes are only made at the model level and always propagated onward, this still reduces the effort significantly, as those changes have to be made only once. For original equipment manufacturers (OEMs) and suppliers, this also adds transparency and transferability across suppliers (on all levels, meaning reusable tests can be applied to models from one supplier, software from another, and hardware from yet another). The proposed test architecture supports the definition of environment models and test specifications on a level of abstraction that allows the reuse of the artifacts on different test systems and test levels.
For the needs of the present domain, we introduced TTCN-3 embedded, an extension of the standardized test specification language TTCN-3, which provides the capabilities to describe test perturbations and assessments for continuous and hybrid systems. Whereas TTCN-3 is a standard already, we propose the introduced extensions for standardization as well. Thus, the language not only promises to solve the reusability issues on the technical level but also addresses organizational issues, such as long-term availability, standardized tool support, education, and training.

The ideas presented in this chapter are substantial results of the project “Testing Specification Technology and Methodology for Embedded Real-Time Systems in Automobiles” (TEMEA). The project is co-financed by the European Union. The funds originate from the European Regional Development Fund (ERDF).

References

Alur, R., Courcoubetis, C., Henzinger, T. A., and Ho, P.-H. (1992). Hybrid automata: an algorithmic approach to the specification and verification of hybrid systems. In Hybrid Systems, Pages: 209–229.
Alur, R., Henzinger, T. A., and Sontag, E. D. (Eds.) (1996). Hybrid Systems III: Verification and Control, Proceedings of the DIMACS/SYCON Workshop, October 22–25, 1995, Rutgers University, New Brunswick, NJ, USA, Volume 1066 of Lecture Notes in Computer Science. Springer, New York, NY.
AUTOSAR Consortium (2010). Web site of the AUTOSAR (AUTomotive Open System ARchitecture) consortium. URL: http://www.autosar.org.
Bisanz, M. (2006). Pattern-based smell detection in TTCN-3 test suites. Master's thesis, ZFI-BM-2006-44, ISSN 1612-6793, Institute of Computer Science, Georg-August-Universität Göttingen (Accessed on 2010).
Bringmann, E. and Kraemer, A. (2006). Systematic testing of the continuous behavior of automotive systems. In SEAS '06: Proceedings of the 2006 International Workshop on Software Engineering for Automotive Systems, Pages: 13–20. ACM Press, New York, NY.
Broy, M. (1997). Refinement of time. In Bertran, M. and Rus, T. (Eds.), Transformation-Based Reactive System Development, ARTS'97, Number 1231 in Lecture Notes in Computer Science (LNCS), Pages: 44–63. Springer, New York, NY.
Conrad, M. and Dörr, H. (2006). Model-based development of in-vehicle software. In Gielen, G. G. E. (Ed.), DATE, Pages: 89–90. European Design and Automation Association, Leuven, Belgium.
ESA-ESTEC (2008). Space engineering: test and operations procedure language, standard ECSS-E-ST-70-32C.
ETSI (2009a). Methods for Testing and Specification (MTS). The Testing and Test Control Notation Version 3, Part 1: TTCN-3 Core Language (ETSI Std. ES 201 873-1 V4.1.1).
ETSI (2009b). Methods for Testing and Specification (MTS). The Testing and Test Control Notation Version 3, Part 4: TTCN-3 Operational Semantics (ETSI Std. ES 201 873-4 V4.1.1).
ETSI (2009c). Methods for Testing and Specification (MTS). The Testing and Test Control Notation Version 3, Part 5: TTCN-3 Runtime Interfaces (ETSI Std. ES 201 873-5 V4.1.1).
ETSI (2009d). Methods for Testing and Specification (MTS). The Testing and Test Control Notation Version 3, Part 6: TTCN-3 Control Interface (ETSI Std. ES 201 873-6 V4.1.1).
ETSI (2009e). Methods for Testing and Specification (MTS). The Testing and Test Control Notation Version 3, TTCN-3 Language Extensions: Advanced Parameterization (ETSI Std. ES 202 784 V1.1.1).
ETSI (2010). Methods for Testing and Specification (MTS). The Testing and Test Control Notation Version 3, TTCN-3 Language Extensions: Behaviour Types (ETSI Std. ES 202 785 V1.1.1).
Fey, I., Kleinwechter, H., Leicher, A., and Müller, J. (2007). Lessons Learned beim Übergang von Funktionsmodellierung mit Verhaltensmodellen zu Modellbasierter Software-Entwicklung mit Implementierungsmodellen. In Koschke, R., Herzog, O., Rödiger, K.-H., and Ronthaler, M. (Eds.), GI Jahrestagung (2), Volume 110 of LNI, Pages: 557–563. GI.
Frakes, W. and Terry, C. (1996). Software reuse: metrics and models. ACM Comput. Surv. 28 (2), 415–435.
Grossmann, J. and Mueller, W. (2006). A formal behavioral semantics for TestML. In Proc. of IEEE ISoLA 06, Paphos, Cyprus, Pages: 453–460.
Häggander, D. and Lundberg, L. (1998). Optimizing dynamic memory management in a multithreaded application executing on a multiprocessor. In ICPP '98: Proceedings of the 1998 International Conference on Parallel Processing, Pages: 262–269. IEEE Computer Society, Washington, DC.
Harrison, N., Gilbert, B., Lauzon, M., Jeffrey, A., Lalancette, C., Lestage, D. R., and Morin, A. (2009). A M&S process to achieve reusability and interoperability. URL: ftp://ftp.rta.nato.int/PubFullText/RTO/MP/RTO-MP-094/MP-094-11.pdf.
IEEE (1993). IEEE Standard VHDL (IEEE Std. 1076-1993). The Institute of Electrical and Electronics Engineers, Inc., New York, NY.
IEEE (1995). IEEE Standard Test Language for All Systems–Common/Abbreviated Test Language for All Systems (C/ATLAS) (IEEE Std. 716-1995). The Institute of Electrical and Electronics Engineers, Inc., New York, NY.
IEEE (1998). User's manual for the signal and method modeling language. URL: http://grouper.ieee.org/groups/scc20/atlas/SMMLusers manual.doc.
IEEE (1999). IEEE Standard VHDL Analog and Mixed-Signal Extensions (IEEE Std. 1076.1-1999). The Institute of Electrical and Electronics Engineers, Inc., New York, NY.
IEEE (2001). IEEE Standard Test Access Port and Boundary-Scan Architecture (IEEE Std. 1149.1-2001). The Institute of Electrical and Electronics Engineers, Inc., New York, NY.
ISO/IEC (1998). Information technology - open systems interconnection - conformance testing methodology and framework - part 3: The tree and tabular combined notation (second edition). International Standard 9646-3.
Karinsalo, M. and Abrahamsson, P. (2004). Software reuse and the test development process: a combined approach. In ICSR, Volume 3107 of Lecture Notes in Computer Science, Pages: 59–68. Springer.
Kärki, M., Karinsalo, M., Pulkkinen, P., Mäki-Asiala, P., Mäntyniemi, A., and Vouffo, A. (2005). Requirements specification of test system supporting reuse (2.0). Technical report, Tests & Testing Methodologies with Advanced Languages (TT-Medal).
Karlsson, E.-A. (Ed.) (1995). Software Reuse: A Holistic Approach. John Wiley & Sons, Inc., New York, NY.
Kendall, I. R. and Jones, R. P. (1999). An investigation into the use of hardware-in-the-loop simulation testing for automotive electronic control systems. Control Engineering Practice 7 (11), 1343–1356.
Lehmann, E. (2004). Time Partition Testing – Systematischer Test des kontinuierlichen Verhaltens von eingebetteten Systemen. Ph.D. thesis, TU Berlin, Berlin.
Lu, B., McKay, W., Lentijo, S., Monti, X. W. A., and Dougal, R. (2002). The real time extension of the virtual test bed. In Huntsville Simulation Conference. Huntsville, AL.
Lynch, N. A., Segala, R., Vaandrager, F. W., and Weinberg, H. B. (1995). Hybrid I/O automata. See Alur, Henzinger, and Sontag (1996), Pages: 496–510.
Lynex, A. and Layzell, P. J. (1997). Understanding resistance to software reuse. In Proceedings of the 8th International Workshop on Software Technology and Engineering Practice (STEP '97) (including CASE '97), Pages: 339. IEEE Computer Society.
Lynex, A. and Layzell, P. J. (1998). Organisational considerations for software reuse. Ann. Softw. Eng. 5, 105–124.
Mäki-Asiala, P. (2004). Reuse of TTCN-3 code. Master's thesis, University of Oulu, Department of Electrical and Information Engineering, Finland.
Mäki-Asiala, P., Kärki, M., and Vouffo, A. (2006). Guidelines and patterns for reusable TTCN-3 tests (1.0). Technical report, Tests & Testing Methodologies with Advanced Languages (TT-Medal).
Mäki-Asiala, P., Mäntyniemi, A., Kärki, M., and Lehtonen, D. (2005). General requirements of reusable TTCN-3 tests (1.0). Technical report, Tests & Testing Methodologies with Advanced Languages (TT-Medal).
Mäntyniemi, A., Mäki-Asiala, P., Karinsalo, M., and Kärki, M. (2005). A process model for developing and utilizing reusable test assets (2.0). Technical report, Tests & Testing Methodologies with Advanced Languages (TT-Medal).
The MathWorks (2010a). MATLAB® - the language of technical computing. URL: http://www.mathworks.com/products/matlab/.
The MathWorks (2010b). Web site of the Simulink® tool - simulation and model-based design. URL: http://www.mathworks.com/products/simulink/.
The MathWorks (2010c). Web site of the Stateflow® tool - design and simulate state machines and control logic. URL: http://www.mathworks.com/products/stateflow/.
Modelica Association (2010). Modelica - a unified object-oriented language for physical systems modeling. URL: http://www.modelica.org/documents/ModelicaSpec30.pdf.
Montenegro, S., Jähnichen, S., and Maibaum, O. (2006). Simulation-based testing of embedded software in space applications. In Hommel, G. and Huanye, S. (Eds.), Embedded Systems - Modeling, Technology, and Applications, Pages: 73–82. Springer Netherlands. doi: 10.1007/1-4020-4933-1_8.
Neukirchen, H., Zeiß, B., and Grabowski, J. (2008, August). An approach to quality engineering of TTCN-3 test specifications. International Journal on Software Tools for Technology Transfer (STTT), Volume 10, Issue 4 (ISSN 1433-2779), Pages: 309–326.
Neukirchen, H., Zeiß, B., Grabowski, J., Baker, P., and Evans, D. (2008, June). Quality assurance for TTCN-3 test specifications. Software Testing, Verification and Reliability (STVR), Volume 18, Issue 2 (ISSN 0960-0833), Pages: 71–97.
Parker, K. P. and Oresjo, S. (1991). A language for describing boundary scan devices. J. Electron. Test. 2 (1), 43–75.
Poulin, J. and Caruso, J. (1993). A reuse metrics and return on investment model. In Proceedings of the Second International Workshop on Software Reusability, Pages: 152–166.
SCC20 ATML Group (2006). IEEE ATML specification drafts and IEEE ATML status reports.
Schieferdecker, I., Bringmann, E., and Grossmann, J. (2006). Continuous TTCN-3: testing of embedded control systems. In SEAS '06: Proceedings of the 2006 International Workshop on Software Engineering for Automotive Systems, Pages: 29–36. ACM Press, New York, NY.
Schieferdecker, I. and Grossmann, J. (2007). Testing embedded control systems with TTCN-3. In Obermaisser, R., Nah, Y., Puschner, P., and Rammig, F. (Eds.), Software Technologies for Embedded and Ubiquitous Systems, Volume 4761 of Lecture Notes in Computer Science, Pages: 125–136. Springer, Berlin / Heidelberg.
Schulz, S. (2008). Test suite development with TTCN-3 libraries. Int. J. Softw. Tools Technol. Transf. 10 (4), 327–336.
Sharma, A., Grover, P. S., and Kumar, R. (2009). Reusability assessment for software components. SIGSOFT Softw. Eng. Notes 34 (2), 1–6.
Suparjo, B., Ley, A., Cron, A., and Ehrenberg, H. (2006). Analog boundary-scan description language (ABSDL) for mixed-signal board test. In International Test Conference, Pages: 152–160.
TEMEA (2010). Web site of the TEMEA project (Testing Methods for Embedded Systems of the Automotive Industry), funded by the European Community (EFRE). URL: http://www.temea.org.
TEMEA Project (2010). Concepts for the specification of tests for systems with continuous or hybrid behaviour, TEMEA Deliverable. URL: http://www.temea.org/deliverables/D2.4.pdf.
Tripathi, A. K. and Gupta, M. (2006). Risk analysis in reuse-oriented software development. Int. J. Inf. Technol. Manage. 5 (1), 52–65.
TT-Medal (2010). Web site of the TT-Medal project - tests & testing methodologies with advanced languages. URL: http://www.tt-medal.org/.
Vector Informatics (2010). Web site of the CANoe tool - the development and test tool for CAN, LIN, MOST, FlexRay, Ethernet and J1708. URL: http://www.vector.com/vi_canoe_en.html.
Zeiß, B. (2006). A Refactoring Tool for TTCN-3. Master's thesis, ZFI-BM-2006-05, ISSN 1612-6793, Institute of Computer Science, Georg-August-Universität Göttingen.
Zeiß, B., Vega, D., Schieferdecker, I., Neukirchen, H., and Grabowski, J. (2007). Applying the ISO 9126 quality model to test specifications – exemplified for TTCN-3 test specifications. In Software Engineering 2007 (SE 2007). Lecture Notes in Informatics (LNI) 105. Gesellschaft für Informatik, Pages: 231–242. Köllen Verlag, Bonn.

Part IV Specific Approaches

13 A Survey of Model-Based Software Product Lines Testing

Sebastian Oster, Andreas Wübbeke, Gregor Engels, and Andy Schürr

CONTENTS
13.1 Introduction 339
13.2 Software Product Line Fundamentals 340
13.3 Testing Software Product Lines 343
13.4 Running Example 346
13.5 Criteria for Comparison 349
13.6 Model-Based Test Approaches 351
  13.6.1 CADeT 351
  13.6.2 ScenTED 354
  13.6.3 UML-based approach for validating software product lines 360
  13.6.4 Model-checking approach 363
  13.6.5 Reusing state machines for automatic test case generation 367
  13.6.6 Framework for product line test development 370
13.7 Discussion 375
13.8 Conclusion 378
References 379

13.1 Introduction

Software product line (SPL) engineering is an approach to improve the reusability of software within a range of products that share a common set of features [Bos00, CN01, PBvdL05]. Because of the systematic reuse, the time-to-market and the costs for development and maintenance decrease, while the quality of the individual products increases. In this way, SPLs enable developers to provide rapid development of customized products. The concepts behind the product line paradigm are not new. Domains such as the automotive industry have successfully applied product line development for several years. The software-developing industry has recently adopted the idea of SPLs. Especially when analyzing the development of embedded systems, it is evident that the product line paradigm has gained increasing importance for developing products in particular domains, such as control units in the automotive domain [TH02, GKPR08]. In recent years, the development of software for embedded systems has changed to model-based approaches. These approaches are frequently used to realize and to implement SPLs. The development of mobile phone software [Bos05] or automotive electronic control units [vdM07] are examples of embedded systems that are developed using an SPL approach.

However, every development concept is only as sound and reliable as the testing concepts that support it. In single system engineering, testing often consumes up to 25% or even 50% of the development costs [LL05]. Because of the variability within an SPL, the testing of SPLs is more challenging than single system testing. If these challenges are solved by adequate approaches, the benefits outweigh the higher complexity. For example, the testing of a component, which is to be re